XPath searching

CxT

Hello,

I am very new to XPath. I *have* read through several online
tutorials though.

I have, what I think to be, a very basic question:

How do I find something specific in an HTML document using XPath?
What I mean is... I am looking for a specific <div class="foo"...
which might be nested 100 levels deep - I am trying to pull a stock
quote from http://moneycentral.msn.com/detail/stock_quote?Symbol=IBM.

I'd like to use something like "*div[@class='foo']" but that doesn't
seem to be valid.

Any guidance would be much appreciated.

Thanks,
CxT
 
Martin Honnen

CxT said:
How do I find something specific in an HTML document using XPath?

XPath is first of all defined on XML documents, not on HTML documents.
Depending on the implementation there are however ways to parse HTML
documents into a suitable data structure for XPath. Which XPath
implementation do you use?
What I mean is... I am looking for a specific <div class="foo"...
which might be nested 100 levels deep - I am trying to pull a stock
quote from http://moneycentral.msn.com/detail/stock_quote?Symbol=IBM.

I'd like to use something like "*div[@class='foo']" but that doesn't
seem to be valid.

//div

would select 'div' elements at all levels and then you can add your
predicate

//div[@class = 'foo']

which should select only those 'div' elements where the class
attribute has the value 'foo'.
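
To make the two expressions concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It rests on some assumptions: the markup is a made-up, well-formed sample (not the real page), and ElementTree implements only a subset of XPath in which "//div" evaluated from the root element is written ".//div". Real-world HTML is often not well-formed XML, so an HTML-aware parser may be needed first, as noted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (an assumption for illustration,
# not the actual stock quote page).
SAMPLE = """
<html>
  <body>
    <div class="nav">menu</div>
    <div class="wrapper">
      <div class="foo">first match</div>
      <section>
        <div class="foo">second match</div>
      </section>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# ".//div" is ElementTree's spelling of "//div": div elements at any depth.
all_divs = root.findall(".//div")

# The predicate keeps only the divs whose class attribute equals 'foo'.
foo_divs = root.findall(".//div[@class='foo']")

print(len(all_divs))
print([d.text for d in foo_divs])
```

The nesting depth does not matter: both 'foo' divs are found even though they sit at different levels.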
 
CxT

That definitely seems to work, Martin - thank you!

Here is the block that I receive:

<div class="bd">
<table>
<tr>
<td id="detail">
<table>
<tr class="rs0">
<th colspan="4"><span class="s1">119.57</span>
&nbsp;unch <a href="http://moneycentral.msn.com/investor/invsub/advisor/advisor.asp?symbol=AAPL" class="fyistyle">fyi</a>&nbsp;&nbsp;</th>
</tr>

I want to access that value of the span (class=s1) - 119.57. Do I
have to work my way down from each level (from the div)? For example
something like: "//div[@class = 'bd']/table/tr/td[@class = 'detail'" -
which again doesn't seem to be valid.

Thank you for any guidance... once I understand how to iterate over
paths I think I should be good to go.

CxT
 
Martin Honnen

CxT said:
I want to access that value of the span (class=s1) - 119.57. Do I
have to work my way down from each level (from the div)? For example
something like: "//div[@class = 'bd']/table/tr/td[@class = 'detail'" -
which again doesn't seem to be valid.

A closing square bracket is missing:
//div[@class = 'bd']/table/tr/td[@class = 'detail']
is certainly a syntactically correct XPath expression.

On the other hand SGML/HTML parsing rules might insert an implied tbody so
//div[@class = 'bd']/table/tbody/tr/td[@class = 'detail']
could also be possible, depending on the parser used for parsing the HTML.
 
Johannes Koch

Martin said:
A closing square bracket is missing:
//div[@class = 'bd']/table/tr/td[@class = 'detail']
is certainly a syntactically correct XPath expression.

On the other hand SGML/HTML parsing rules might insert an implied tbody so
//div[@class = 'bd']/table/tbody/tr/td[@class = 'detail']
could also be possible, depending on the parser used for parsing the HTML.

Additionally, in the code fragment the td element has an _id_ attribute
with value "detail", not a _class_ attribute with that value.
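
Putting these points together, a rough sketch with Python's standard-library xml.etree.ElementTree might look like the following. The assumptions: the fragment has been completed with closing tags and wrapped in a body element so it parses as XML, the &nbsp; entities are dropped because a plain XML parser does not know them, and ElementTree's limited XPath subset is used (paths are relative, hence the leading "."). Using "//" between the td and the span avoids depending on whether the parser inserted a tbody.

```python
import xml.etree.ElementTree as ET

# The fragment from the thread, completed and simplified so that it is
# well-formed XML (closing tags and the <body> wrapper are assumptions).
FRAGMENT = """
<body>
  <div class="bd">
    <table>
      <tr>
        <td id="detail">
          <table>
            <tr class="rs0">
              <th colspan="4"><span class="s1">119.57</span> unch</th>
            </tr>
          </table>
        </td>
      </tr>
    </table>
  </div>
</body>
"""

body = ET.fromstring(FRAGMENT)

# Step-by-step path: every level is spelled out, so it breaks if the
# parser adds an implied tbody between table and tr.
step_by_step = body.find(
    "./div[@class='bd']/table/tr/td[@id='detail']/table/tr/th/span[@class='s1']"
)

# Shorter path: match the td by its id attribute and let "//" skip
# whatever levels lie between it and the span.
robust = body.find(".//td[@id='detail']//span[@class='s1']")

# Testing the class attribute finds nothing: the td carries an id,
# not a class, with the value 'detail'.
wrong = body.find(".//td[@class='detail']")

print(step_by_step.text, robust.text, wrong)
```

So working down level by level is possible but not required; anchoring on a distinctive attribute and using "//" for the rest is usually less brittle.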
 
