Best XPath query?

CxT · May 8, 2009

Hi,

I need to be able to find the value (5.56) in the td that is a sibling
of the "Earnings/Share" td. I'm not sure how I go about using XPath
to search for that specific string.
Any guidance would be much appreciated.
CxT
<table>
<tr>
<td>Beta</td>
<td class="cl1">1.66</td>
</tr>
<tr>
<td>Dividend & Yield</td>
<td class="cl1">NA</td>
</tr>
<tr>
<td>Earnings/Share</td>
<td class="cl1">5.56</td>
</tr>
Note: the above comes from a very long html file (this is just a
snippet).

Martin Honnen · May 8, 2009

//table/tr[td[. = 'Earnings/Share']]/td[@class = 'cl1']
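For readers who want to try the row-then-sibling idea outside their own toolkit, here is a minimal sketch using Python's standard xml.etree.ElementTree, which implements only a subset of XPath. It assumes a well-formed copy of the snippet (the '&' escaped and a closing </table> added, neither of which is in the post), and find_value is a hypothetical helper name. ElementTree does not accept absolute paths on an element, so the path is written relative to the table.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed copy of the posted snippet
# ('&' escaped as '&amp;', closing </table> added).
SNIPPET = """
<table>
  <tr><td>Beta</td><td class="cl1">1.66</td></tr>
  <tr><td>Dividend &amp; Yield</td><td class="cl1">NA</td></tr>
  <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
</table>
"""


def find_value(table, label):
    """Return the text of the cl1 cell in the row that has a td equal to label."""
    # Pick the row by the text of one of its td children,
    # then step to the td in that row that carries class="cl1".
    cell = table.find(f"./tr[td='{label}']/td[@class='cl1']")
    return None if cell is None else cell.text


table = ET.fromstring(SNIPPET)
earnings = find_value(table, "Earnings/Share")
print(earnings)
```

The comparison is exact, so a label with stray leading or trailing spaces in the source would not match.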

CxT · May 8, 2009

Hmmm... that specific query is returning 0 elements. What would I do
if I wanted to search for an element that contains just the text
("Earnings/Share")? I thought I could do "//td[. = 'Earnings/Share']"
but that isn't returning any hits either. Very confused.

Thanks for any additional guidance.
CxT

Martin Honnen · May 8, 2009

Are you trying to use XPath against an XHTML document? In XHTML, elements
are in the namespace http://www.w3.org/1999/xhtml, and '//td' (in XPath
1.0) always selects elements in no namespace, so that could be one reason
why the expressions do not find any element.
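The namespace mismatch described here can be reproduced with a small sketch in Python's standard xml.etree.ElementTree. The document below is a made-up miniature of an XHTML page, not the real one, and it assumes the page is read by a namespace-aware XML parser.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: a tiny stand-in for an XHTML page with a default namespace.
DOC = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# An unqualified name only matches elements in no namespace,
# so this search comes back empty.
unqualified = root.findall(".//td")

# The parser keeps the namespace URI as part of every element name.
first_cell_tag = next(root.iter(f"{{{XHTML_NS}}}td")).tag

print(len(unqualified))
print(first_cell_tag)
```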

Other than that, you will need to provide some context as to how exactly
you use XPath.

CxT · May 8, 2009

I'm using XPath to search through the following URL:

http://moneycentral.msn.com/detail/stock_quote?Symbol=aapl&getquote=Get+Quote

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"
is present at the top of the file.

If I can't do a more structured search can I still use XPath to
perform a simple text search and then obtain the node for where I find
the text?

Thank you so much for your help,
CxT

PS: Note that other XPath searches work in this document, for example:
"//table/tr[@class = 'rs0']/th/span[@class = 's1']"

Martin Honnen · May 8, 2009

So that is XHTML and that means, if the document is parsed by an XML
parser, that you need to bind a prefix to the namespace URI and use that
prefix in your XPath expressions.

CxT said:
PS: Note that other XPath searches work in this document, for example:
"//table/tr[@class = 'rs0']/th/span[@class = 's1']"

That is rather odd, with the namespace declaration being present on the
root element. How do you parse the document, which XPath API do you use?
Is that XPath over HTML, as some browsers like Mozilla or Opera provide?

CxT · May 8, 2009

Martin Honnen said:
So that is XHTML and that means, if the document is parsed by an XML
parser, that you need to bind a prefix to the namespace URI and use that
prefix in your XPath expressions.

Could you please provide an example of what such an expression would
look like?

Martin Honnen said:
That is rather odd, with the namespace declaration being present on the
root element. How do you parse the document, which XPath API do you use?
Is that XPath over HTML, as some browsers like Mozilla or Opera provide?

I am using NSXML under Cocoa/Objective-C (Mac OS X).

Once again, thank you for your help. I didn't even know XPath existed
until a few days ago.

Martin Honnen · May 8, 2009

CxT said:
Could you please provide an example of what such an expression would
look like?

The XPath API needs to provide a way to bind a prefix to a namespace
URI. Assuming we have bound the prefix 'xhtml' to
'http://www.w3.org/1999/xhtml' any XPath expression would then use the
prefix to qualify element names e.g.
/xhtml:html/xhtml:body//xhtml:table
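As an illustration of binding a prefix, here is a minimal sketch using Python's standard xml.etree.ElementTree, where the binding is a plain dict passed alongside the path. This is not the NSXML API; the document is a made-up miniature, and the 'xhtml' prefix is an arbitrary choice that only has to match between the dict and the expression.

```python
import xml.etree.ElementTree as ET

# The prefix is local to this code; only the URI has to match the document.
NAMESPACES = {"xhtml": "http://www.w3.org/1999/xhtml"}

# Assumption: a tiny stand-in for the real XHTML page.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><td>Beta</td><td class="cl1">1.66</td></tr>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Every element name is qualified with the bound prefix.
# The class attribute stays unprefixed because it is in no namespace.
cell = root.find(
    "./xhtml:body//xhtml:tr[xhtml:td='Earnings/Share']/xhtml:td[@class='cl1']",
    NAMESPACES,
)
value = cell.text
print(value)
```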

CxT said:
I am using NSXML under Cocoa/Objective-C (Mac OS X).

I don't know that one. The documentation
http://developer.apple.com/documentation/Cocoa/Conceptual/NSXML_Concepts/NSXML.html
says it supports both XQuery and XPath.
If it really supports XQuery 1.0 then you might be able to avoid the
prefix and do

declare default element namespace "http://www.w3.org/1999/xhtml";
/html/body//table

But none of that explains why some XPath expressions worked without
any prefix and others did not. I am afraid you need to find some
forum/newsgroup/mailing list dealing with NSXML, unless someone who
knows NSXML comes along here.

I tried that URL you provided with Saxon 9's XQuery implementation but
it reports an XML parse error so it is not even able to build a data
model from that document.

Joe Kesselman · May 8, 2009

CxT said:
Could you please provide an example of what such an expression would
look like?

The expression needs to use namespace prefixes, and you need to provide
a namespace context to the API. Details of the latter depend on what API
you're using.

It is possible to do this all within the XPath, but EXTREMELY ugly --
you need to wildcard the element name and then use a predicate to specify
the local name and namespace URI.
/*[local-name()="foo" and namespace-uri()="http://whatever"]
Since this is uncommon, processors may be slower interpreting this
version than the prefix-and-bindings version.
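Python's standard xml.etree.ElementTree does not implement XPath functions, so the predicate form cannot be shown with it directly. As a comparable prefix-free route, it accepts the namespace URI written into the name itself ({uri}name), and from Python 3.8 a {*} namespace wildcard. The sketch below uses a made-up miniature document and is an analogue of the idea, not the same expression.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: a tiny stand-in for a namespaced XHTML page.
DOC = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Spell out the namespace URI inside the name, so no prefix binding is needed.
by_uri = root.findall(f".//{{{XHTML_NS}}}td")

# Python 3.8+: match td elements regardless of their namespace.
any_namespace = root.findall(".//{*}td")

print(len(by_uri), len(any_namespace))
```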

Joe Kesselman · May 8, 2009

Quick reminder: The default namespace (xmlns=) is *not* applied to
attributes. If you actually want an attribute name to be namespaced, you
must use a prefix on it.
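A quick way to see this is to look at the names a namespace-aware parser reports. The sketch below uses Python's standard xml.etree.ElementTree on a made-up miniature document: the element names carry the default namespace, the plain attributes carry none, and only the explicitly prefixed xml:lang attribute is namespaced.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Assumption: a tiny stand-in for the real page's root element and one cell.
DOC = f"""
<html xmlns="{XHTML_NS}" xml:lang="en-US" lang="en-US">
  <body>
    <table>
      <tr><td>Earnings/Share</td><td class="cl1">5.56</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# The element name is qualified, but @class is matched without any namespace.
cell = root.find(f".//{{{XHTML_NS}}}td[@class='cl1']")

cell_attr_names = set(cell.attrib)
root_attr_names = set(root.attrib)

print(cell.tag)
print(cell_attr_names)
print(root_attr_names)
```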

CxT · May 8, 2009

This is the query that ended up working... I don't know why:

"//td[. = 'Earnings/Share ']"

Thanks for all of the help!!
CxT

Peter Flynn · May 13, 2009

CxT said:
This is the query that ended up working... I don't know why:

"//td[. = 'Earnings/Share ']"

I was just about to post that the data might have intrusive spaces: it's
a common misapprehension by data-providers that leading and trailing
spaces get trimmed by applications, because that's what browsers do with
plain ol' HTML. Handling of white-space in XML is defined differently,
so it's best to assume spaces are significant.

When I'm scraping data from [X]HTML and need to reference the character
data content of an element, I tend to normalise it, e.g.

//td[normalize-space(.)='Earnings/Share']
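Not every XPath implementation provides normalize-space(). Python's standard xml.etree.ElementTree, for one, does not, so the sketch below swaps in the same normalisation done in Python after selecting the cells. The sample data, with its stray spaces, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: sample data with stray spaces around the label.
SNIPPET = """
<table>
  <tr><td>Beta</td><td class="cl1">1.66</td></tr>
  <tr><td>  Earnings/Share </td><td class="cl1">5.56</td></tr>
</table>
"""


def normalize_space(text):
    """Trim both ends and collapse whitespace runs to a single space.

    str.split() also splits on other Unicode whitespace, so this is
    slightly broader than XPath's normalize-space().
    """
    return " ".join(text.split())


def cells_matching(root, label):
    """Return td elements whose normalised text content equals label."""
    return [
        td
        for td in root.iter("td")
        if normalize_space("".join(td.itertext())) == label
    ]


table = ET.fromstring(SNIPPET)

# The exact comparison misses the cell because of the surrounding spaces.
exact = table.findall(".//td[.='Earnings/Share']")

# Comparing after normalisation finds it.
normalized = cells_matching(table, "Earnings/Share")

print(len(exact), len(normalized))
```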

///Peter
