select tr>3 with nokogiri

Pen Ttt

I want to get the rows that contain more than 3 columns.
How do I write the XPath for that with Nokogiri?


require 'rubygems'
require 'nokogiri'
item='sometext'
doc = Nokogiri::HTML.parse(open(item))
data=doc.xpath('/html/body/table/tr[@td.size>3]')
puts data
It does not run. Help and advice appreciated.
 
Pen Ttt

for example,
table1:
<table >
<tr>
<td>kk</td>
</tr>
<tr>
<td > 1 </td>
<td > 2 </td>
</tr>
<tr>
<td > 3 </td>
<td > 4 </td>
</tr>
<tr>
<td>qq</td>
</tr>
</table>

table2:
<table >
<tr>
<td>kk</td>
</tr>
<tr>
<td > 1 </td>
<td > 2 </td>
</tr>
<tr>
<td > 3 </td>
<td > 4 </td>
</tr>
</table>

I want to get table2 from table1, i.e. only the rows that contain more than
one column. How do I do that with Nokogiri?
 
Ammar Ali

Use count(), like:

document.xpath("//*[count(td)=2]")

You can also select children at certain offsets with the CSS selector
td:nth-child(N) or the XPath predicate td[position()=N] (td[N] for short).

HTH,
Ammar
 
Pen Ttt

p1
data=doc.xpath('/table/tr/*[count(td)>1]')
puts data
p2
data=doc.xpath('/table/tr/td[count(td)>1]')
puts data
Neither of them works. Why do I get nothing?
 
Pen Ttt

document.xpath("//*[count(td)=2]") works, but I want to know about these:
p1
data=doc.xpath('/table/tr/*[count(td)>1]')
puts data
p2
data=doc.xpath('/table/tr/td[count(td)>1]')
puts data
How do I fix p1 and p2?
 
Ammar Ali

If the table is not the root element, the expression has to start with
two slashes ("//") instead of one. The count function should apply to the
tr, not the td, so you don't need the "*" in p1, or the td in p2. Try this:

doc.xpath('//table/tr[count(td)>1]')

Good Luck,
Ammar
 
Ken Bloom

I want to get the rows that contain more than 3 columns. How do I write
the XPath for that with Nokogiri?


require 'rubygems'
require 'nokogiri'
item='sometext'
doc = Nokogiri::HTML.parse(open(item))
data=doc.xpath('/html/body/table/tr[@td.size>3]')
puts data
It does not run. Help and advice appreciated.

doc.xpath('/html/body/table/tr[count(td)>3]')
 
Pen Ttt

Thanks Ammar. One problem is solved, but another has come up.
Here is the content of /home/pt/mytest:

<table>
<tr bgcolor="F3F3F3">
<td align="right" width="240">reportdate</td>
<td align="right" width="65" class="tickerSm">10/31/09</td>
<td align="right" width="65" class="tickerSm">10/31/08</td>
<td align="right" width="65" class="tickerSm">10/31/07</td>
<td align="right" width="65" class="tickerSm">10/31/06</td>
<td align="right" width="65" class="tickerSm">10/31/05</td>
</tr>
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
<td align="right" class="ticker">2,493</td>
<td align="right" class="ticker">1,429</td>
<td align="right" class="ticker">1,826</td>
<td align="right" class="ticker">2,262</td>
<td align="right" class="ticker">2,251</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
<td align="right" class="ticker">595</td>
<td align="right" class="ticker">770</td>
<td align="right" class="ticker">735</td>
<td align="right" class="ticker">692</td>
<td align="right" class="ticker">753</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
<td align="right" class="ticker">552</td>
<td align="right" class="ticker">646</td>
<td align="right" class="ticker">643</td>
<td align="right" class="ticker">627</td>
<td align="right" class="ticker">722</td>
</tr>
</table>

What I want to get is:
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
<td align="right" class="ticker">595</td>
<td align="right" class="ticker">770</td>
<td align="right" class="ticker">735</td>
<td align="right" class="ticker">692</td>
<td align="right" class="ticker">753</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
<td align="right" class="ticker">552</td>
<td align="right" class="ticker">646</td>
<td align="right" class="ticker">643</td>
<td align="right" class="ticker">627</td>
<td align="right" class="ticker">722</td>
</tr>

p1:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result=doc.xpath('//table/tr[td[@class="ticker"]]')
puts result

I get what I want with p1.

p2:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result=doc.xpath('//table/tr[td[not(@class="tickerSm")]]')
puts result

Why can't I get what I want with p2?
How do I fix p2?
Thanks for your help.
 
Pen Ttt

I found something. If my file /home/pt/mytest is changed to:
<table>
<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
<td align="right" width="65" class="tickerSm">10/31/09</td>
<td align="right" width="65" class="tickerSm">10/31/08</td>
<td align="right" width="65" class="tickerSm">10/31/07</td>
<td align="right" width="65" class="tickerSm">10/31/06</td>
<td align="right" width="65" class="tickerSm">10/31/05</td>
</tr>
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
<td align="right" class="ticker">2,493</td>
<td align="right" class="ticker">1,429</td>
<td align="right" class="ticker">1,826</td>
<td align="right" class="ticker">2,262</td>
<td align="right" class="ticker">2,251</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
<td align="right" class="ticker">595</td>
<td align="right" class="ticker">770</td>
<td align="right" class="ticker">735</td>
<td align="right" class="ticker">692</td>
<td align="right" class="ticker">753</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
<td align="right" class="ticker">552</td>
<td align="right" class="ticker">646</td>
<td align="right" class="ticker">643</td>
<td align="right" class="ticker">627</td>
<td align="right" class="ticker">722</td>
</tr>
</table>

with this code:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result=doc.xpath('//table/tr[*[not(@class="tickerSm")]]')
puts result

what I get is:
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
<td align="right" class="ticker">2,493</td>
<td align="right" class="ticker">1,429</td>
<td align="right" class="ticker">1,826</td>
<td align="right" class="ticker">2,262</td>
<td align="right" class="ticker">2,251</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
<td align="right" class="ticker">595</td>
<td align="right" class="ticker">770</td>
<td align="right" class="ticker">735</td>
<td align="right" class="ticker">692</td>
<td align="right" class="ticker">753</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Notes Receivable</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
<td align="right" class="ticker">0</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Inventories</td>
<td align="right" class="ticker">552</td>
<td align="right" class="ticker">646</td>
<td align="right" class="ticker">643</td>
<td align="right" class="ticker">627</td>
<td align="right" class="ticker">722</td>
</tr>

The following row is not selected by my code:
<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
<td align="right" width="65" class="tickerSm">10/31/09</td>
<td align="right" width="65" class="tickerSm">10/31/08</td>
<td align="right" width="65" class="tickerSm">10/31/07</td>
<td align="right" width="65" class="tickerSm">10/31/06</td>
<td align="right" width="65" class="tickerSm">10/31/05</td>
</tr>

But how do I exclude the following row with XPath?

<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
<td align="right" class="ticker">2,493</td>
<td align="right" class="ticker">1,429</td>
<td align="right" class="ticker">1,826</td>
<td align="right" class="ticker">2,262</td>
<td align="right" class="ticker">2,251</td>
</tr>
This does not work:
xpath('//table/tr[*[not(@class="tickerSm")]]')
Maybe the reason is that some td elements have class "ticker" and others have
"tickerSm".
If I don't want to select that row, how do I express it in XPath?
 
Ammar Ali

xpath('//table/tr[*[not(@class="tickerSm")]]')
maybe the reason is : some class of td is "ticker",another is
"tickerSm",
if i don't want to select it with xpath,how to express it with xpath??

Hi Pen,

I don't know if "not" is valid like that, I have to double check. But
you can use "!=" with attributes.

doc.xpath('//table/tr/*[@class!="tickerSm"]')

I hope it helps,
Ammar
 
Pen Ttt

I found that not and != give the same result in this Nokogiri XPath
expression.
One problem still remains. If my HTML is the following:

<table>
<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
<td align="right" width="65" class="tickerSm">10/31/09</td>
<td align="right" width="65" class="tickerSm">10/31/08</td>
<td align="right" width="65" class="tickerSm">10/31/07</td>
<td align="right" width="65" class="tickerSm">10/31/06</td>
<td align="right" width="65" class="tickerSm">10/31/05</td>
</tr>
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
<td align="right" class="ticker">2,493</td>
<td align="right" class="ticker">1,429</td>
<td align="right" class="ticker">1,826</td>
<td align="right" class="ticker">2,262</td>
<td align="right" class="ticker">2,251</td>
</tr>
<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
<td align="right" class="ticker">595</td>
<td align="right" class="ticker">770</td>
<td align="right" class="ticker">735</td>
<td align="right" class="ticker">692</td>
<td align="right" class="ticker">753</td>
</tr>
</table>

xpath('//table/tr[td[@class="tickerSm"]]') gets:

<tr bgcolor="F3F3F3">
<td align="right" width="240" class="tickerSm">reportdate</td>
<td align="right" width="65" class="tickerSm">10/31/09</td>
<td align="right" width="65" class="tickerSm">10/31/08</td>
<td align="right" width="65" class="tickerSm">10/31/07</td>
<td align="right" width="65" class="tickerSm">10/31/06</td>
<td align="right" width="65" class="tickerSm">10/31/05</td>
</tr>

xpath('//table/tr[td[@class="ticker"]]') gets:

<tr bgcolor="ffffff">
<td class="ticker">Receivables</td>
<td align="right" class="ticker">595</td>
<td align="right" class="ticker">770</td>
<td align="right" class="ticker">735</td>
<td align="right" class="ticker">692</td>
<td align="right" class="ticker">753</td>
</tr>

But how can I get the following with an XPath expression?
<tr bgcolor="ffffff">
<td class="tickerSm">Cash &amp; Equivalents</td>
<td align="right" class="ticker">2,493</td>
<td align="right" class="ticker">1,429</td>
<td align="right" class="ticker">1,826</td>
<td align="right" class="ticker">2,262</td>
<td align="right" class="ticker">2,251</td>
</tr>
 
Pen Ttt

A friend told me that
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
works.
 
Ammar Ali

A friend told me that
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
works.

That's good. Another possible approach, if you don't want the first
td[@class="tickerSm"], is to use following-sibling:

//table/tr/td[1][@class="tickerSm"]/following-sibling::td[@class!="tickerSm"]

Ammar
 
