Best XPath query?

Discussion in 'XML' started by CxT, May 8, 2009.

  1. CxT

    CxT Guest

    Hi,

    I need to be able to find the value (5.56) in the td that is a
    sibling of the "Earnings/Share" td. I'm not sure how to go about
    using XPath to search for that specific string.
    Any guidance would be much appreciated.
    CxT
    <table>
    <tr>
    <td>Beta</td>
    <td class="cl1">1.66</td>
    </tr>
    <tr>
    <td>Dividend &amp; Yield</td>
    <td class="cl1">NA</td>
    </tr>
    <tr>
    <td>Earnings/Share</td>
    <td class="cl1">5.56</td>
    </tr>
    Note: the above comes from a very long html file (this is just a
    snippet).
     
    CxT, May 8, 2009
    #1

  2. CxT wrote:
    > Hi,
    >
    > I need to be able to find the value (5.56) in the td that is a
    > sibling
    > of the "Earnings/Share" td. I'm not sure how I go about using XPath
    > to search for that specific string.
    > Any guidance would be much appreciated.
    > CxT
    > <table>
    > <tr>
    > <td>Beta</td>
    > <td class="cl1">1.66</td>
    > </tr>
    > <tr>
    > <td>Dividend &amp; Yield</td>
    > <td class="cl1">NA</td>
    > </tr>
    > <tr>
    > <td>Earnings/Share</td>
    > <td class="cl1">5.56</td>
    > </tr>


    //table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']
    --

    Martin Honnen
    http://msmvps.com/blogs/martin_honnen/
     
    Martin Honnen, May 8, 2009
    #2
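For readers who want to try the suggested expression outside the original poster's environment, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Python is an assumption (the thread itself is about NSXML), and ElementTree implements only a subset of XPath, so the path is written in the form that subset accepts. The sample markup is the snippet from the question with a closing `</table>` added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Snippet from the question; the closing </table> is added here so the
# fragment is a well-formed standalone document.
SNIPPET = """
<table>
  <tr><td>Beta</td><td class="cl1">1.66</td></tr>
  <tr><td>Dividend &amp; Yield</td><td class="cl1">NA</td></tr>
  <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
</table>
"""

root = ET.fromstring(SNIPPET)

# [td='Earnings/Share'] keeps only rows that have a td child with exactly
# that text; the final step then picks the td carrying class="cl1".
cells = root.findall(".//tr[td='Earnings/Share']/td[@class='cl1']")
values = [cell.text for cell in cells]
print(values)
```

This only works here because the snippet has no namespace and no stray whitespace around the label.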

  3. CxT

    CxT Guest

    On May 8, 5:45 am, Martin Honnen wrote:
    > CxT wrote:
    > > Hi,

    >
    > > I need to be able to find the value (5.56) in the td that is a
    > > sibling
    > > of the "Earnings/Share" td.  I'm not sure how I go about using XPath
    > > to search for that specific string.
    > > Any guidance would be much appreciated.
    > > CxT
    > >                     <table>
    > >                       <tr>
    > >                         <td>Beta</td>
    > >                         <td class="cl1">1.66</td>
    > >                       </tr>
    > >                       <tr>
    > >                         <td>Dividend &amp; Yield</td>
    > >                         <td class="cl1">NA</td>
    > >                       </tr>
    > >                       <tr>
    > >                         <td>Earnings/Share</td>
    > >                         <td class="cl1">5.56</td>
    > >                       </tr>

    >
    >    //table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']
    > --
    >
    >         Martin Honnen
    >        http://msmvps.com/blogs/martin_honnen/


    Hmmm... that specific query is returning 0 elements. What would I do
    if I wanted to search for an element that contains just the text
    ("Earnings/Share")? I thought I could do "//td[. = 'Earnings/Share']"
    but that isn't returning any hits either. Very confused.

    Thanks for any additional guidance.
    CxT
     
    CxT, May 8, 2009
    #3
  4. CxT wrote:

    >>> <table>
    >>> <tr>
    >>> <td>Beta</td>
    >>> <td class="cl1">1.66</td>
    >>> </tr>
    >>> <tr>
    >>> <td>Dividend &amp; Yield</td>
    >>> <td class="cl1">NA</td>
    >>> </tr>
    >>> <tr>
    >>> <td>Earnings/Share</td>
    >>> <td class="cl1">5.56</td>
    >>> </tr>

    >> //table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']



    > Hmmm... that specific query is returning 0 elements. What would I do
    > if I wanted to search for an element that contains just the text
    > ("Earnings/Share")? I thought I could do "//td[. = 'Earnings/Share']"
    > but that isn't returning any hits either. Very confused.


    Are you trying to use XPath against an XHTML document? In XHTML, elements
    are in the namespace http://www.w3.org/1999/xhtml and '//td' (in XPath
    1.0) always selects elements in no namespace, so that could be one reason
    why the expressions do not find any element.

    Other than that you will need to provide some context as to how exactly
    you use XPath.

    --

    Martin Honnen
    http://msmvps.com/blogs/martin_honnen/
     
    Martin Honnen, May 8, 2009
    #4
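The namespace effect described above can be reproduced with a small sketch. It assumes Python's standard `xml.etree.ElementTree` rather than the poster's toolkit, and the XHTML document is a made-up minimal one, not the actual page from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal XHTML document with the default namespace declared
# on the root element, as on the page discussed in the thread.
XHTML = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(XHTML)

# A namespace-aware parser stores the namespace as part of each element name.
first_td = next(root.iter("{http://www.w3.org/1999/xhtml}td"))
print(first_td.tag)

# An unqualified name matches only elements in no namespace, so this
# search comes back empty even though the document clearly contains td cells.
unqualified = root.findall(".//td")
print(len(unqualified))
```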
  5. CxT

    CxT Guest

    On May 8, 8:09 am, Martin Honnen wrote:
    > CxT wrote:
    > >>>                     <table>
    > >>>                       <tr>
    > >>>                         <td>Beta</td>
    > >>>                         <td class="cl1">1.66</td>
    > >>>                       </tr>
    > >>>                       <tr>
    > >>>                         <td>Dividend &amp; Yield</td>
    > >>>                         <td class="cl1">NA</td>
    > >>>                       </tr>
    > >>>                       <tr>
    > >>>                         <td>Earnings/Share</td>
    > >>>                         <td class="cl1">5.56</td>
    > >>>                       </tr>
    > >>    //table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']

    > > Hmmm... that specific query is returning 0 elements.  What would I do
    > > if I wanted to search for an element that contains just the text
    > > ("Earnings/Share")?  I thought I could do "//td[. = 'Earnings/Share']"
    > > but that isn't returning any hits either.  Very confused.

    >
    > Are you trying to use XPath against an XHTML document? In XHTML elements
    > are in the namespace http://www.w3.org/1999/xhtml and '//td' (in XPath
    > 1.0) always selects elements in no namespace so that could be one reason
    > why the expressions do not find any element.
    >
    > Other than that you will need to provide some context as for how exactly
    > you use XPath.
    >
    > --
    >
    >         Martin Honnen
    >        http://msmvps.com/blogs/martin_honnen/


    I'm using XPath to search through the following URL:

    http://moneycentral.msn.com/detail/stock_quote?Symbol=aapl&getquote=Get Quote

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"
    is present at the top of the file.

    If I can't do a more structured search can I still use XPath to
    perform a simple text search and then obtain the node for where I find
    the text?

    Thank you so much for your help,
    CxT

    PS: Note that other XPath searches work in this document, for example:
    "//table/tr[@class = 'rs0']/th/span[@class = 's1']"
     
    CxT, May 8, 2009
    #5
  6. CxT wrote:

    > I'm using XPath to search through the following URL:
    >
    > http://moneycentral.msn.com/detail/stock_quote?Symbol=aapl&getquote=Get Quote
    >
    > <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"
    > is present at the top of the file.


    So that is XHTML and that means, if the document is parsed by an XML
    parser, that you need to bind a prefix to the namespace URI and use that
    prefix in your XPath expressions.


    > PS: Note that other XPath searches work in this document, for example:
    > "//table/tr[@class = 'rs0']/th/span[@class = 's1']"


    That is rather odd, with the namespace declaration being present on the
    root element. How do you parse the document, which XPath API do you use?
    Is that XPath over HTML, as some browsers like Mozilla or Opera provide?

    --

    Martin Honnen
    http://msmvps.com/blogs/martin_honnen/
     
    Martin Honnen, May 8, 2009
    #6
  7. CxT

    CxT Guest

    On May 8, 8:52 am, Martin Honnen wrote:

    > So that is XHTML and that means, if the document is parsed by an XML
    > parser, that you need to bind a prefix to the namespace URI and use that
    > prefix in your XPath expressions.


    Could you please provide an example of what such an expression would
    look like?

    > > PS: Note that other XPath searches work in this document, for example:
    > > "//table/tr[@class = 'rs0']/th/span[@class = 's1']"

    >
    > That is rather odd, with the namespace declaration being present on the
    > root element. How do you parse the document, which XPath API do you use?
    > Is that XPath over HTML, as some browsers like Mozilla or Opera provide?


    I am using NSXML under Cocoa/Objective-C (Mac OS X).

    Once again, thank you for your help. I didn't even know XPath existed
    until a few days ago. :(
     
    CxT, May 8, 2009
    #7
  8. CxT wrote:
    > On May 8, 8:52 am, Martin Honnen wrote:
    >
    >> So that is XHTML and that means, if the document is parsed by an XML
    >> parser, that you need to bind a prefix to the namespace URI and use that
    >> prefix in your XPath expressions.

    >
    > Could you please provide an example of what such an expression would
    > look like?


    The XPath API needs to provide a way to bind a prefix to a namespace
    URI. Assuming we have bound the prefix 'xhtml' to
    'http://www.w3.org/1999/xhtml' any XPath expression would then use the
    prefix to qualify element names e.g.
    /xhtml:html/xhtml:body//xhtml:table


    > I am using NSXML under Cocoa/Objective-C (Mac OS X).


    I don't know that one. The documentation
    http://developer.apple.com/documentation/Cocoa/Conceptual/NSXML_Concepts/NSXML.html
    says it supports both XQuery and XPath.
    If it really supports XQuery 1.0 then you might be able to avoid the
    prefix and do

    declare default element namespace "http://www.w3.org/1999/xhtml";
    /html/body//table

    But none of that explains why some XPath expressions worked without
    any prefix and others did not. I am afraid you need to find some
    forum/newsgroup/mailing list dealing with NSXML, unless someone here
    comes along who knows NSXML.

    I tried that URL you provided with Saxon 9's XQuery implementation but
    it reports an XML parse error so it is not even able to build a data
    model from that document.

    --

    Martin Honnen
    http://msmvps.com/blogs/martin_honnen/
     
    Martin Honnen, May 8, 2009
    #8
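As an illustration of binding a prefix to the namespace URI, here is a sketch in Python's standard `xml.etree.ElementTree`, where the binding is a plain dictionary passed to `findall`. This is not the NSXML API, which the thread leaves unresolved. The prefix name `xhtml` follows the post above; the sample document is a hypothetical stand-in for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal XHTML document standing in for the real page.
XHTML = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><td>Beta</td><td class="cl1">1.66</td></tr>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

# Prefix-to-URI binding; the prefix is arbitrary, only the URI must match.
NS = {"xhtml": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)

# Every element name in the path is qualified with the bound prefix.
# The class attribute stays unprefixed because it is in no namespace.
path = ".//xhtml:tr[xhtml:td='Earnings/Share']/xhtml:td[@class='cl1']"
cells = root.findall(path, NS)
values = [cell.text for cell in cells]
print(values)
```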
  9. CxT wrote:
    > Could you please provide an example of what such an expression would
    > look like?


    The expression needs to use namespace prefixes, and you need to provide
    a namespace context to the API. Details of the latter depend on what API
    you're using.

    It is possible to do this all within the XPath, but it is EXTREMELY
    ugly: you need to wildcard the element name and then use a predicate
    to specify the local name and the namespace URI.
    /*[local-name()="foo" and namespace-uri()="http://whatever"]
    Since this is uncommon, processors may be slower interpreting this
    version than the prefix-and-bindings version.
     
    Joe Kesselman, May 8, 2009
    #9
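Python's standard `xml.etree.ElementTree` does not implement XPath functions, so the predicate form above cannot be run there directly. As a loosely related sketch of the same wildcard idea, ElementTree (Python 3.8 and later) accepts `{*}name` to match an element name in any namespace. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal XHTML document.
XHTML = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(XHTML)

# {*}td matches td elements regardless of namespace (Python 3.8+),
# so no prefix map is needed for this search.
texts = [td.text for td in root.findall(".//{*}td")]
print(texts)
```

Unlike the predicate form, the wildcard does not pin the match to one specific namespace, so it is looser than the prefix-and-bindings version.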
  10. Quick reminder: The default namespace (xmlns=) is *not* applied to
    attributes. If you actually want an attribute name to be namespaced, you
    must use a prefix on it.
     
    Joe Kesselman, May 8, 2009
    #10
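A small sketch makes the reminder concrete. It assumes Python's standard `xml.etree.ElementTree`, which exposes namespaced names in `{uri}local` form, and uses a hypothetical one-row table.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: default namespace on the root, unprefixed attribute.
doc = ET.fromstring(
    '<table xmlns="http://www.w3.org/1999/xhtml">'
    '<tr><td class="cl1">5.56</td></tr>'
    "</table>"
)

td = doc[0][0]  # first cell of the first row

# The element name picks up the default namespace...
print(td.tag)

# ...but the attribute name does not: it stays a plain "class".
attribute_names = list(td.attrib)
print(attribute_names)
```

This is why a predicate such as `[@class = 'cl1']` needs no prefix even when every element name in the path does.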
  11. CxT

    CxT Guest

    On May 8, 10:07 am, Joe Kesselman wrote:
    > Quick reminder: The default namespace (xmlns=) is *not* applied to
    > attributes. If you actually want an attribute name to be namespaced, you
    > must use a prefix on it.


    This is the query that ended up working... I don't know why:

    "//td[. = 'Earnings/Share ']"

    Thanks for all of the help!!
    CxT
     
    CxT, May 8, 2009
    #11
  12. Peter Flynn

    Peter Flynn Guest

    CxT wrote:
    > On May 8, 10:07 am, Joe Kesselman wrote:
    >> Quick reminder: The default namespace (xmlns=) is *not* applied to
    >> attributes. If you actually want an attribute name to be namespaced, you
    >> must use a prefix on it.

    >
    > This is the query that ended up working... I don't know why:
    >
    > "//td[. = 'Earnings/Share ']"


    I was just about to post that the data might have intrusive spaces: it's
    a common misapprehension by data-providers that leading and trailing
    spaces get trimmed by applications, because that's what browsers do with
    plain ol' HTML. Handling of white-space in XML is defined differently,
    so it's best to assume spaces are significant.

    When I'm scraping data from [X]HTML and need to reference the character
    data content of an element, I tend to normalise it, e.g.

    //td[normalize-space(.)='Earnings/Share']

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, May 13, 2009
    #12
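Python's standard `xml.etree.ElementTree` has no `normalize-space()`, so the sketch below approximates it in plain Python. The helper names `normalize_space` and `value_for` are made up for illustration, and the trailing space in the sample label is an assumption based on the query that ended up working in the thread.

```python
import xml.etree.ElementTree as ET

# Sample rows; the trailing space after "Earnings/Share" is assumed,
# mirroring the query that finally matched in the thread.
SNIPPET = """
<table>
  <tr><td>Beta</td><td class="cl1">1.66</td></tr>
  <tr><td>Earnings/Share </td><td class="cl1">5.56</td></tr>
</table>
"""


def normalize_space(text):
    """Roughly mimic XPath normalize-space(): trim and collapse whitespace.

    str.split() treats any Unicode whitespace as a separator, which is
    slightly broader than the XML whitespace set XPath uses.
    """
    return " ".join(text.split())


def value_for(root, label):
    """Return the text of the cell that follows the cell matching label."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) >= 2:
            first = normalize_space("".join(cells[0].itertext()))
            if first == label:
                return cells[1].text
    return None


root = ET.fromstring(SNIPPET)

# The exact comparison misses because of the trailing space.
exact = root.findall(".//tr[td='Earnings/Share']")

# The normalised comparison finds the row.
value = value_for(root, "Earnings/Share")
print(len(exact), value)
```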