How do I write this XPath?


Pen Ttt

There is an HTML file:
<table>
<tr>
<td>ok
<strong>Sep 10</strong>
| <a href="ttt">Oct 10</a>
| <a href="kkk">Dec 10</a>
<table>
<tr>
<td>
123
</td>
<td>
567
</td>
</tr>
</table>
</td>
</tr>
</table>
When I open it in Firefox, the output is:
ok Sep 10 | Oct 10 | Dec 10
123 567
What I want to get is:
ok Sep 10 | Oct 10 | Dec 10
Here is my code:
require 'nokogiri'

web = '/home/test'                    # path to the saved HTML file
doc = Nokogiri::HTML(File.read(web))  # parse it with Nokogiri's HTML parser
data = doc.xpath('/html/body/table/tr/td')
puts data
I get:
<td>ok
<strong>Sep 10</strong>
| <a href="ttt">Oct 10</a>
| <a href="kkk">Dec 10</a>
<table><tr>
<td>
123
</td>
<td>
567
</td>
</tr></table>
</td>
How can I get just this:
ok
<strong>Sep 10</strong>
| <a href="ttt">Oct 10</a>
| <a href="kkk">Dec 10</a>
 

Robert Klemme


You want the first row? Try

'/html/body/table/tr[1]/td'

See also
http://www.zvon.org/xxl/XPathTutorial/General/examples.html
http://www.w3schools.com/xpath/

Cheers

robert
 

Robert Klemme

Thanks for your help, but your method doesn't work. I have tried it.

Ah, I see - it's more complicated. I think it should be one of these,
depending on whether you want the text nodes:

/html/body/table[1]/tr[1]/td[1]/(.|strong|a)
/html/body/table[1]/tr[1]/td[1]/(.|strong|a)/text()

but I cannot test it right now since I don't have Nokogiri on this machine.

Kind regards

robert
 

Robert Klemme


Now this looks awful, but apparently it works:

doc.xpath('/html/body/table[1]/tr[1]/td[1]/text()|/html/body/table[1]/tr[1]/td[1]/strong/text()|/html/body/table[1]/tr[1]/td[1]/a/text()')

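If you want the elements rather than their text, remove "/text()" only from the strong and a branches; dropping it from the first branch as well would select the whole td, nested table included.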

The difficult thing here is that you want to select only a portion of
the child nodes of /table/tr/td.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
