Reading an HTML document & extracting content

Discussion in 'Javascript' started by Cognizance, May 23, 2005.

  1. Cognizance

    Cognizance Guest

    Hi gang,

    I'm an ASP developer by trade, but I've written client-side
    scripts in JavaScript many times in the past: simple things like
    validating form elements.

    Now I've been assigned the task of extracting content from a given HTML
    page. If anyone's familiar with the Yahoo! Store order confirmation
    screen, I need to be able to grab the total amount from the table to
    the right-hand side. (Sample File:
    http://www.2beyourself.com/t/sample.html)

    If you view the source, the value sits in a table surrounded by ugly
    HTML, and the value I want to retrieve is wrapped in b tags. Originally
    I was thinking of using innerHTML or innerText to extract the value, but
    we cannot gain control of this piece of the Yahoo! Store to make that
    work!

    So after talking with peers, we thought of reading in the entire HTML
    page and using a regular expression to extract the value.
    Something along the lines of: '\<b\>[0-9]+\.[0-9]{2}\<\/b\>'

    I'm not sure how to accomplish this. Could someone please point me in
    the right direction, if this solution is even a good one? If you have
    something better, I'm all ears (eyes)! If the regular expression
    is a good solution, I need to find out how to read in the entire
    HTML doc and then parse out that piece.

    Any tips and suggestions will be greatly appreciated!

    And I hope your week is starting off right. ^^
     
    Cognizance, May 23, 2005
    #1

  2. McKirahan

    McKirahan Guest

    RegEx would be better but this works:

    <html>
    <head>
    <title>Total.htm</title>
    <script type="text/javascript">
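    function total() {
      var sURL = "http://www.2beyourself.com/t/sample.html";
      try {
        // Fetch the page synchronously (ActiveXObject is the old IE fallback).
        // The request only succeeds if the browser allows access to sURL.
        var oXML = window.XMLHttpRequest
          ? new XMLHttpRequest()
          : new ActiveXObject("Microsoft.XMLHTTP");
        oXML.open("GET", sURL, false);
        oXML.send(null);
        var sXML = oXML.responseText;
        // Find Total's label
        var iTAG = sXML.indexOf("<b>Total:</b>");
        if (iTAG < 0) throw new Error("Total label not found");
        var sVAL = sXML.substr(iTAG);
        // Find Total's decimal point and keep two digits after it
        var iDOT = sVAL.indexOf(".");
        if (iDOT < 0) throw new Error("Total value not found");
        sVAL = sVAL.substr(0, iDOT + 3);
        // Find Total's start (just after the last tag before the number)
        iTAG = sVAL.lastIndexOf(">");
        sVAL = sVAL.substr(iTAG + 1);
        // Show Total's value
        alert(sVAL);
      } catch (e) {
        alert("Could not read the total from " + sURL + ": " + e.message);
      }
    }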
    </script>
    </head>
    <body onload="total()">
    </body>
    </html>
     
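    For comparison, the regular-expression approach mentioned above could
    look roughly like this. It is a minimal sketch only: the extractTotal
    name and the sample markup are made up for illustration, and it assumes
    the amount is the first bold number with two decimals that follows the
    "<b>Total:</b>" label, optionally prefixed by a dollar sign.

```javascript
// Extract the order total from an HTML string with a regular expression.
// Assumption: the amount is the first <b>...</b> number with two decimals
// that follows the "<b>Total:</b>" label, optionally prefixed by "$".
function extractTotal(html) {
  var re = /<b>\s*Total:\s*<\/b>[\s\S]*?<b>\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})\s*<\/b>/i;
  var match = re.exec(html);
  // match[1] is the captured amount; null means the pattern was not found
  return match ? match[1] : null;
}

// Hypothetical markup, not copied from the real sample page.
var sample = '<tr><td><b>Total:</b></td>' +
             '<td align="right"><b>$1,234.50</b></td></tr>';
var total = extractTotal(sample);
```

    In the page above, the string passed to extractTotal would be the
    response text from the request instead of the hard-coded sample. If the
    real markup differs from the assumption, the pattern needs adjusting.
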
    McKirahan, May 23, 2005
    #2