XPath searching

Discussion in 'XML' started by CxT, Apr 13, 2009.

  1. CxT

    CxT Guest

    Hello,

    I am very new to XPath. I *have* read through several online
    tutorials though.

    I have, what I think to be, a very basic question:

    How do I find something specific in an HTML document using XPath?
    What I mean is... I am looking for a specific <div class="foo"...
    which might be nested 100 levels deep - I am trying to pull a stock
    quote from http://moneycentral.msn.com/detail/stock_quote?Symbol=IBM.

    I'd like to use something like "*div[@class='foo']" but that doesn't
    seem to be valid.

    Any guidance would be much appreciated.

    Thanks,
    CxT
    CxT, Apr 13, 2009
    #1

  2. CxT wrote:

    > How do I find something specific in an HTML document using XPath?


    XPath is first of all defined on XML documents, not on HTML documents.
    Depending on the implementation there are however ways to parse HTML
    documents into a suitable data structure for XPath. Which XPath
    implementation do you use?
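    As a rough illustration of "parsing HTML into a suitable data structure", here is a minimal sketch that uses only the Python standard library. The names TreeHTMLParser and VOID_TAGS are made up for this example, and it assumes every non-void tag is explicitly closed; real-world pages usually need a proper HTML parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical helper set: elements that never have an end tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TreeHTMLParser(HTMLParser):
    """Feed HTML parser events into an ElementTree builder (sketch only)."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; store them as "".
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)

    def close(self):
        super().close()
        return self.builder.close()  # the root element


parser = TreeHTMLParser()
parser.feed('<html><body><div class="foo"><p>Hi<br>there</p></div></body></html>')
root = parser.close()

# ElementTree's XPath subset needs a leading "." for descendant searches.
hits = root.findall(".//div[@class='foo']")
print([d.get("class") for d in hits])
```

    The resulting tree can then be queried with the limited XPath subset that ElementTree supports.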

    > What I mean is... I am looking for a specific <div class="foo"...
    > which might be nested 100 levels deep - I am trying to pull a stock
    > quote from http://moneycentral.msn.com/detail/stock_quote?Symbol=IBM.
    >
    > I'd like to use something like "*div[@class='foo']" but that doesn't
    > seem to be valid.


    //div

    would select 'div' elements at all levels and then you can add your
    predicate

    //div[@class = 'foo']

    which should select only those 'div' elements whose class
    attribute has the value 'foo'.
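    A small sketch of the same expression, assuming Python's standard xml.etree.ElementTree and an invented, well-formed sample document. ElementTree implements only a subset of XPath and rejects absolute paths on an element, so the expression is written with a leading dot:

```python
import xml.etree.ElementTree as ET

# Invented sample markup: one matching div is nested, one is not.
doc = ET.fromstring("""
<html><body>
  <div class="bar"><div class="foo">nested</div></div>
  <div class="foo">top</div>
</body></html>
""")

# ".//div" searches all levels below the context node; the predicate
# keeps only the div elements whose class attribute equals 'foo'.
matches = doc.findall(".//div[@class='foo']")
texts = [d.text for d in matches]
print(texts)
```

    Both 'foo' divs are found regardless of how deeply they are nested, while the 'bar' div is skipped.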

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Apr 13, 2009
    #2

  3. CxT

    CxT Guest

    On Apr 13, 7:55 am, Martin Honnen wrote:
    > http://moneycentral.msn.com/detail/stock_quote?Symbol=IBM.
    >
    > > I'd like to use something like "*div[@class='foo']" but that doesn't
    > > seem to be valid.

    >
    > //div
    >
    > would select 'div' elements at all levels and then you can add your
    > predicate
    >
    > //div[@class = 'foo']
    >
    > which should select only those 'div' elements whose class
    > attribute has the value 'foo'.


    That definitely seems to work Martin - thank you!

    Here is the block that I receive:

    <div class="bd">
    <table>
    <tr>
    <td id="detail">
    <table>
    <tr class="rs0">
    <th colspan="4"><span class="s1">119.57</span>
    &nbsp;unch <a href="http://moneycentral.msn.com/investor/invsub/advisor/advisor.asp?symbol=AAPL" class="fyistyle">fyi</a>&nbsp;&nbsp;</th>
    </tr>

    I want to access that value of the span (class=s1) - 119.57. Do I
    have to work my way down from each level (from the div)? For example
    something like: "//div[@class = 'bd']/table/tr/td[@class = 'detail'" -
    which again doesn't seem to be valid.

    Thank you for any guidance... once I understand how to iterate over
    paths I think I should be good to go.

    CxT
    CxT, Apr 13, 2009
    #3
  4. CxT wrote:

    > Here is the block that I receive:
    >
    > <div class="bd">
    > <table>
    > <tr>
    > <td id="detail">
    > <table>
    > <tr class="rs0">
    > <th colspan="4"><span class="s1">119.57</span>
    > &nbsp;unch <a href="http://moneycentral.msn.com/investor/invsub/advisor/advisor.asp?symbol=AAPL" class="fyistyle">fyi</a>&nbsp;&nbsp;</th>
    > </tr>
    >
    > I want to access that value of the span (class=s1) - 119.57. Do I
    > have to work my way down from each level (from the div)? For example
    > something like: "//div[@class = 'bd']/table/tr/td[@class = 'detail'" -
    > which again doesn't seem to be valid.


    A closing square bracket is missing:
    //div[@class = 'bd']/table/tr/td[@class = 'detail']
    is certainly a syntactically correct XPath expression.

    On the other hand SGML/HTML parsing rules might insert an implied tbody so
    //div[@class = 'bd']/table/tbody/tr/td[@class = 'detail']
    could also be possible, depending on the parser used for parsing the HTML.
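    To make the tbody point concrete, here is a sketch assuming Python's standard xml.etree.ElementTree and a closed-up version of the posted fragment (the wrapping body element and the helper first_match are added only for this example). It also shows that a descendant step (//) can skip the intermediate levels, so it is not necessary to spell out every level from the div down:

```python
import xml.etree.ElementTree as ET

# Closed-up version of the posted fragment (the link is omitted).
CELL = ('<td id="detail"><table><tr class="rs0"><th colspan="4">'
        '<span class="s1">119.57</span></th></tr></table></td>')
PLAIN = f'<body><div class="bd"><table><tr>{CELL}</tr></table></div></body>'
WITH_TBODY = (f'<body><div class="bd"><table><tbody><tr>{CELL}</tr>'
              '</tbody></table></div></body>')

STRICT = ".//div[@class='bd']/table/tr/td"        # every level spelled out
LOOSE = ".//div[@class='bd']//span[@class='s1']"  # skips intermediate levels


def first_match(markup, path):
    """Return the first element matching path, or None (hypothetical helper)."""
    return ET.fromstring(markup).find(path)


strict_plain = first_match(PLAIN, STRICT) is not None
strict_tbody = first_match(WITH_TBODY, STRICT) is not None
loose_plain = first_match(PLAIN, LOOSE).text
loose_tbody = first_match(WITH_TBODY, LOOSE).text
print(strict_plain, strict_tbody, loose_plain, loose_tbody)
```

    The step-by-step path stops matching once a tbody sits between table and tr, whereas the descendant form finds the span in both trees.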

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Apr 13, 2009
    #4
  5. Martin Honnen wrote:
    > CxT wrote:
    >
    >> Here is the block that I receive:
    >>
    >> <div class="bd">
    >> <table>
    >> <tr>
    >> <td id="detail">
    >> <table>
    >> <tr class="rs0">
    >> <th colspan="4"><span class="s1">119.57</span>
    >> &nbsp;unch <a href="http://moneycentral.msn.com/investor/invsub/advisor/advisor.asp?symbol=AAPL" class="fyistyle">fyi</a>&nbsp;&nbsp;</th>
    >> </tr>
    >>
    >> I want to access that value of the span (class=s1) - 119.57. Do I
    >> have to work my way down from each level (from the div)? For example
    >> something like: "//div[@class = 'bd']/table/tr/td[@class = 'detail'" -
    >> which again doesn't seem to be valid.

    >
    > A closing square bracket is missing:
    > //div[@class = 'bd']/table/tr/td[@class = 'detail']
    > is certainly a syntactically correct XPath expression.
    >
    > On the other hand SGML/HTML parsing rules might insert an implied tbody so
    > //div[@class = 'bd']/table/tbody/tr/td[@class = 'detail']
    > could also be possible, depending on the parser used for parsing the HTML.


    Additionally, in the code fragment the td element has an _id_ attribute
    with value "detail", not a _class_ attribute with that value.
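    The difference is easy to check with a quick sketch, again assuming Python's standard xml.etree.ElementTree and a trimmed, well-formed version of the fragment:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the posted fragment: the td carries id="detail".
root = ET.fromstring(
    '<div class="bd"><table><tr><td id="detail">x</td></tr></table></div>'
)

# Syntactically valid, but no td has a class attribute, so nothing matches.
by_class = root.findall("./table/tr/td[@class='detail']")

# Testing the id attribute instead finds the cell.
by_id = root.findall("./table/tr/td[@id='detail']")

print(len(by_class), len(by_id))
```

    So the expression with @class is valid XPath that simply selects nothing here; switching the predicate to @id selects the cell.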

    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
    Johannes Koch, Apr 14, 2009
    #5
