Best XPath query?

CxT

Hi,

I need to be able to find the value (5.56) in the td that is a sibling
of the "Earnings/Share" td. I'm not sure how I go about using XPath
to search for that specific string.
Any guidance would be much appreciated.
CxT
<table>
<tr>
<td>Beta</td>
<td class="cl1">1.66</td>
</tr>
<tr>
<td>Dividend &amp; Yield</td>
<td class="cl1">NA</td>
</tr>
<tr>
<td>Earnings/Share</td>
<td class="cl1">5.56</td>
</tr>
Note: the above comes from a very long HTML file (this is just a
snippet).
 
Martin Honnen

//table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']
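
As a quick sanity check of the expression above, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module, which implements only a subset of XPath. It assumes the snippet parses as XML with no namespace (a closing </table> is added so that it is well-formed), and it writes the inner predicate in ElementTree's shorthand td='...' form. SNIPPET and find_value are illustrative names, not anything from the thread.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, closed with </table> so that it parses as XML.
SNIPPET = """
<table>
  <tr><td>Beta</td><td class="cl1">1.66</td></tr>
  <tr><td>Dividend &amp; Yield</td><td class="cl1">NA</td></tr>
  <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
</table>
"""

def find_value(root, label):
    """Return the text of the class="cl1" cell in the row whose td equals label.

    Hypothetical helper; label must not contain a single quote, because it is
    pasted into the path expression unescaped.
    """
    # tr[td='...'] keeps rows that have a td child whose complete text equals
    # the label; /td[@class='cl1'] then selects the value cell of that row.
    cell = root.find(f".//tr[td='{label}']/td[@class='cl1']")
    return None if cell is None else cell.text

root = ET.fromstring(SNIPPET)
print(find_value(root, "Earnings/Share"))
```

If this works on the snippet but not on the real page, the difference is likely in the document itself (namespaces, stray whitespace) rather than in the shape of the query.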
 
CxT

Martin Honnen said:
   //table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']

Hmmm... that specific query is returning 0 elements. What would I do
if I wanted to search for an element that contains just the text
("Earnings/Share")? I thought I could do "//td[. = 'Earnings/Share']"
but that isn't returning any hits either. Very confused.

Thanks for any additional guidance.
CxT
 
Martin Honnen

CxT said:
//table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']

Hmmm... that specific query is returning 0 elements. What would I do
if I wanted to search for an element that contains just the text
("Earnings/Share")? I thought I could do "//td[. = 'Earnings/Share']"
but that isn't returning any hits either. Very confused.

Are you trying to use XPath against an XHTML document? In XHTML elements
are in the namespace http://www.w3.org/1999/xhtml and '//td' (in XPath
1.0) always selects elements in no namespace so that could be one reason
why the expressions do not find any element.

Other than that you will need to provide some context as to how exactly
you use XPath.
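
To illustrate the namespace point, here is a small sketch in Python with the standard-library xml.etree.ElementTree module. DOC is a made-up stand-in for the real page that only copies its default namespace declaration; the variable names are illustrative. ElementTree is not the API used in this thread, so this only shows the general behaviour of a namespace-aware XML parser.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny stand-in for the real page: same default namespace on the root element.
DOC = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# The parser stores the namespace URI as part of every element name.
print(root.tag)

# An unqualified name only matches elements in no namespace: nothing is found.
unqualified = root.findall(".//td")

# Spelling out the namespace URI in the name does match both cells.
qualified = root.findall(f".//{{{XHTML_NS}}}td")

print(len(unqualified), len(qualified))
```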
 
CxT

Martin Honnen said:
Are you trying to use XPath against an XHTML document? In XHTML elements
are in the namespace http://www.w3.org/1999/xhtml and '//td' (in XPath
1.0) always selects elements in no namespace so that could be one reason
why the expressions do not find any element.

Other than that you will need to provide some context as to how exactly
you use XPath.

I'm using XPath to search through the following URL:

http://moneycentral.msn.com/detail/stock_quote?Symbol=aapl&getquote=Get+Quote

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"
is present at the top of the file.

If I can't do a more structured search can I still use XPath to
perform a simple text search and then obtain the node for where I find
the text?

Thank you so much for your help,
CxT

PS: Note that other XPath searches work in this document, for example:
"//table/tr[@class = 'rs0']/th/span[@class = 's1']"
 
Martin Honnen

CxT said:
I'm using XPath to search through the following URL:

http://moneycentral.msn.com/detail/stock_quote?Symbol=aapl&getquote=Get+Quote

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"
is present at the top of the file.

So that is XHTML and that means, if the document is parsed by an XML
parser, that you need to bind a prefix to the namespace URI and use that
prefix in your XPath expressions.

PS: Note that other XPath searches work in this document, for example:
"//table/tr[@class = 'rs0']/th/span[@class = 's1']"

That is rather odd, with the namespace declaration being present on the
root element. How do you parse the document, which XPath API do you use?
Is that XPath over HTML, as some browsers like Mozilla or Opera provide?
 
CxT

So that is XHTML and that means, if the document is parsed by an XML
parser, that you need to bind a prefix to the namespace URI and use that
prefix in your XPath expressions.

Could you please provide an example of what such an expression would
look like?
PS: Note that other XPath searches work in this document, for example:
"//table/tr[@class = 'rs0']/th/span[@class = 's1']"

That is rather odd, with the namespace declaration being present on the
root element. How do you parse the document, which XPath API do you use?
Is that XPath over HTML, as some browsers like Mozilla or Opera provide?

I am using NSXML under Cocoa/Objective-C (Mac OS X).

Once again, thank you for your help. I didn't even know XPath existed
until a few days ago. :(
 
Martin Honnen

CxT said:
Could you please provide an example of what such an expression would
look like?

The XPath API needs to provide a way to bind a prefix to a namespace
URI. Assuming we have bound the prefix 'xhtml' to
'http://www.w3.org/1999/xhtml' any XPath expression would then use the
prefix to qualify element names e.g.
/xhtml:html/xhtml:body//xhtml:table
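
For readers who want to see a prefix binding in action, here is a hedged sketch using Python's standard-library xml.etree.ElementTree, where the binding is a plain dictionary passed alongside the path. This is not the NSXML API; DOC is a made-up stand-in for the real page and NAMESPACES is an illustrative name. ElementTree does not accept absolute paths on an element, so the paths below are written relative to the root html element.

```python
import xml.etree.ElementTree as ET

DOC = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><td>Beta</td><td class="cl1">1.66</td></tr>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

# Bind the prefix 'xhtml' to the XHTML namespace URI. The prefix is arbitrary;
# only the URI has to match the document.
NAMESPACES = {"xhtml": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(DOC)

# Equivalent of /xhtml:html/xhtml:body//xhtml:table, relative to the root.
tables = root.findall("./xhtml:body//xhtml:table", NAMESPACES)

# The sibling lookup from the question, with every element name qualified.
# The attribute name stays unprefixed.
cell = root.find(
    ".//xhtml:tr[xhtml:td='Earnings/Share']/xhtml:td[@class='cl1']",
    NAMESPACES,
)

print(len(tables), cell.text)
```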

I am using NSXML under Cocoa/Objective-C (Mac OS X).

I don't know that one. The documentation
http://developer.apple.com/documentation/Cocoa/Conceptual/NSXML_Concepts/NSXML.html
says it supports both XQuery and XPath.
If it really supports XQuery 1.0 then you might be able to avoid the
prefix and do

declare default element namespace "http://www.w3.org/1999/xhtml";
/html/body//table
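
A rough analogue of a default element namespace exists in Python's standard-library xml.etree.ElementTree (Python 3.8 or later, as far as I know): an empty-string key in the namespace dictionary is applied to unprefixed element names in the path. The sketch below assumes that behaviour; it is not XQuery, and DOC is again a made-up stand-in for the real page.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

DOC = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# The empty prefix acts as a default namespace for element names in the path.
DEFAULT_NS = {"": XHTML_NS}

# Counterpart of /html/body//table, written relative to the root html element.
tables = root.findall("./body//table", DEFAULT_NS)

# Attribute names are left alone, so @class still works unprefixed.
cell = root.find(".//tr[td='Earnings/Share']/td[@class='cl1']", DEFAULT_NS)

print(len(tables), cell.text)
```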

But none of that explains why some XPath expressions worked without
any prefix and others did not. I am afraid you need to find some
forum/newsgroup/mailing list dealing with NSXML, unless someone here
comes along who knows NSXML.

I tried that URL you provided with Saxon 9's XQuery implementation but
it reports an XML parse error so it is not even able to build a data
model from that document.
 
Joe Kesselman

CxT said:
Could you please provide an example of what such an expression would
look like?

The expression needs to use namespace prefixes, and you need to provide
a namespace context to the API. Details of the latter depend on what API
you're using.

It is possible to do this all within the XPath, but it is EXTREMELY ugly:
you need to wildcard the element name and then use a predicate to specify
the local name and the namespace.
/*[local-name()="foo" and namespace-uri()="http://whatever"]
Since this is uncommon, processors may be slower interpreting this
version than the prefix-and-bindings version.
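
The same idea (match the local name in any namespace, then check the namespace separately) can be sketched in Python's standard-library xml.etree.ElementTree, which does not implement the XPath 1.0 functions local-name() and namespace-uri() but, from Python 3.8 on, accepts a {*} namespace wildcard. The document and the urn:example:other namespace below are invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up document: two XHTML td cells plus one td in an unrelated namespace.
DOC = f"""
<html xmlns="{XHTML_NS}" xmlns:o="urn:example:other">
  <body>
    <table>
      <tr>
        <td>Earnings/Share</td>
        <td class="cl1">5.56</td>
        <o:td>ignored</o:td>
      </tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# {*}td matches elements with local name td in any (or no) namespace.
any_ns_cells = root.findall(".//{*}td")

# Keep only those whose namespace is the XHTML one. This plays the role of
# the namespace predicate in the pure-XPath version.
xhtml_cells = [c for c in any_ns_cells if c.tag == f"{{{XHTML_NS}}}td"]

print(len(any_ns_cells), len(xhtml_cells))
```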
 
Joe Kesselman

Quick reminder: The default namespace (xmlns=) is *not* applied to
attributes. If you actually want an attribute name to be namespaced, you
must use a prefix on it.
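
A short check of this in Python's standard-library xml.etree.ElementTree, on a made-up one-row document: the element names pick up the default namespace, while the class attribute stays in no namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

DOC = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# The element name must be namespace-qualified; the attribute name must not be.
cell = root.find(f".//{{{XHTML_NS}}}td[@class='cl1']")

plain = cell.get("class")                       # attribute in no namespace
qualified = cell.get(f"{{{XHTML_NS}}}class")    # no such namespaced attribute

print(cell.tag, plain, qualified)
```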
 
CxT

Quick reminder: The default namespace (xmlns=) is *not* applied to
attributes. If you actually want an attribute name to be namespaced, you
must use a prefix on it.

This is the query that ended up working... I don't know why:

"//td[. = 'Earnings/Share ']"

Thanks for all of the help!!
CxT
 
Peter Flynn

CxT said:
Quick reminder: The default namespace (xmlns=) is *not* applied to
attributes. If you actually want an attribute name to be namespaced, you
must use a prefix on it.

This is the query that ended up working... I don't know why:

"//td[. = 'Earnings/Share ']"

I was just about to post that the data might have intrusive spaces: it's
a common misapprehension by data-providers that leading and trailing
spaces get trimmed by applications, because that's what browsers do with
plain ol' HTML. Handling of white-space in XML is defined differently,
so it's best to assume spaces are significant.

When I'm scraping data from [X]HTML and need to reference the character
data content of an element, I tend to normalise it, e.g.

//td[normalize-space(.)='Earnings/Share']

///Peter
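
The effect of the trailing space, and of normalising before comparing, can be reproduced with a small Python sketch. The standard-library xml.etree.ElementTree module does not implement normalize-space(), so the sketch approximates it in plain Python; normalize_space and find_value are hypothetical helpers, and the snippet is a namespace-free stand-in with the trailing space added on purpose.

```python
import xml.etree.ElementTree as ET

# Note the trailing space inside the "Earnings/Share " cell.
SNIPPET = """
<table>
  <tr><td>Beta</td><td class="cl1">1.66</td></tr>
  <tr><td>Earnings/Share </td><td class="cl1">5.56</td></tr>
</table>
"""

def normalize_space(text):
    """Approximate XPath normalize-space(): trim and collapse whitespace runs.

    str.split() also treats some Unicode characters as whitespace that XPath
    does not, so this is close to, but not identical with, the XPath function.
    """
    return " ".join(text.split())

def find_value(root, label):
    """Return the text of the cell after the one whose normalised text is label."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) >= 2 and normalize_space("".join(cells[0].itertext())) == label:
            return cells[1].text
    return None

root = ET.fromstring(SNIPPET)

# An exact comparison misses the row because of the trailing space.
exact = root.find(".//tr[td='Earnings/Share']")

# Comparing normalised text finds it.
value = find_value(root, "Earnings/Share")

print(exact, value)
```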
 
