select tr>3 with nokogiri

Discussion in 'Ruby' started by Pen Ttt, Aug 27, 2010.

  1. Pen Ttt

    Pen Ttt Guest

    I want to select the rows that contain more than 3 columns.
    How do I write that XPath with Nokogiri?


    require 'rubygems'
    require 'nokogiri'
    item='sometext'
    doc = Nokogiri::HTML.parse(open(item))
    data=doc.xpath('/html/body/table/tr[@td.size>3]')
    puts data
    It does not run. Help and advice appreciated.
    --
    Posted via http://www.ruby-forum.com/.
     
    Pen Ttt, Aug 27, 2010
    #1

  2. Pen Ttt

    Pen Ttt Guest

    for example,
    table1:
    <table >
    <tr>
    <td>kk</td>
    </tr>
    <tr>
    <td > 1 </td>
    <td > 2 </td>
    </tr>
    <tr>
    <td > 3 </td>
    <td > 4 </td>
    </tr>
    <tr>
    <td>qq</td>
    </tr>
    </table>

    table2:
    <table >
    <tr>
    <td>kk</td>
    </tr>
    <tr>
    <td > 1 </td>
    <td > 2 </td>
    </tr>
    <tr>
    <td > 3 </td>
    <td > 4 </td>
    </tr>
    </table>

    I want to get table2 from table1, that is, keep only the rows that
    contain more than one column. How do I do it with Nokogiri?

     
    Pen Ttt, Aug 28, 2010
    #2

  3. Ammar Ali

    Ammar Ali Guest

    Re: select tr>3 with nokogiri

    Use count(), like:

    document.xpath("//*[count(td)=2]")

    You can also select children at certain offsets with the CSS selector
    td:nth-child(N) or the XPath predicate td[position()=N]

    HTH,
    Ammar


    On Sat, Aug 28, 2010 at 10:17 AM, Pen Ttt <> wrote:
    > for example,
    > table1:
    > <table >
    > <tr>
    > <td>kk</td>
    > </tr>
    > <tr>
    > <td > 1 </td>
    > <td > 2 </td>
    > </tr>
    > <tr>
    > <td > 3 </td>
    > <td > 4 </td>
    > </tr>
    > <tr>
    > <td>qq</td>
    > </tr>
    > </table>
    >
    > table2:
    > <table >
    > <tr>
    > <td>kk</td>
    > </tr>
    > <tr>
    > <td > 1 </td>
    > <td > 2 </td>
    > </tr>
    > <tr>
    > <td > 3 </td>
    > <td > 4 </td>
    > </tr>
    > </table>
    >
    > i want to get table2 from table1,to get row which contains more than
    > one column,how to do it with nokogiri??
    >
     
    Ammar Ali, Aug 28, 2010
    #3
  4. Pen Ttt

    Pen Ttt Guest

    Re: select tr>3 with nokogiri

    p1
    data=doc.xpath('/table/tr/*[count(td)>1]')
    puts data
    p2
    data=doc.xpath('/table/tr/td[count(td)>1]')
    puts data
    Neither of them works. Why do I get nothing?
     
    Pen Ttt, Aug 28, 2010
    #4
  5. Pen Ttt

    Pen Ttt Guest

    Re: select tr>3 with nokogiri

    document.xpath("//*[count(td)=2]") works, but I still want to know
    what is wrong with these two:
    p1
    data=doc.xpath('/table/tr/*[count(td)>1]')
    puts data
    p2
    data=doc.xpath('/table/tr/td[count(td)>1]')
    puts data
    How do I fix p1 and p2?
     
    Pen Ttt, Aug 28, 2010
    #5
  6. Ammar Ali

    Ammar Ali Guest

    Re: select tr>3 with nokogiri

    A leading "/" anchors the path at the document root, and the HTML
    parser wraps the table in html and body, so start with "//" to match
    the table anywhere. The count function applies to the tr, not the
    td, so you don't need the "*" in p1, or the td in p2. Try this:

    doc.xpath('//table/tr[count(td)>1]')

    Good Luck,
    Ammar


    On Sat, Aug 28, 2010 at 3:33 PM, Pen Ttt <> wrote:
    > document.xpath("//*[count(td)=2]") is right,but i want to know
    > p1
    > data=doc.xpath('/table/tr/*[count(td)>1]')
    > puts data
    > p2
    > data=doc.xpath('/table/tr/td[count(td)>1]')
    > puts data
    > how to fix p1\p2?
     
    Ammar Ali, Aug 28, 2010
    #6
  7. Ken Bloom

    Ken Bloom Guest

    On Fri, 27 Aug 2010 23:26:53 +0900, Pen Ttt wrote:

    > i want to get row which it contains more than 3 columns how to write
    > xpath with nokogiri
    >
    >
    > require 'rubygems'
    > require 'nokogiri'
    > item='sometext'
    > doc = Nokogiri::HTML.parse(open(item))
    > data=doc.xpath('/html/body/table/tr[@td.size>3]')
    > puts data
    > it can not run , help and advices appreciated.


    doc.xpath('/html/body/table/tr[count(td)>3]')



    --
    Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
    Department of Computer Science. Illinois Institute of Technology.
    http://www.iit.edu/~kbloom1/
     
    Ken Bloom, Aug 29, 2010
    #7
  8. Pen Ttt

    Pen Ttt Guest

    Re: select tr>3 with nokogiri

    Thanks Ammar. One problem is solved, but another has come up.
    Here is the content of /home/pt/mytest:

    <table>
    <tr bgcolor="F3F3F3">
    <td align="right" width="240">reportdate</td>
    <td align="right" width="65" class="tickerSm">10/31/09</td>
    <td align="right" width="65" class="tickerSm">10/31/08</td>
    <td align="right" width="65" class="tickerSm">10/31/07</td>
    <td align="right" width="65" class="tickerSm">10/31/06</td>
    <td align="right" width="65" class="tickerSm">10/31/05</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="tickerSm">Cash &amp; Equivalents</td>
    <td align="right" class="ticker">2,493</td>
    <td align="right" class="ticker">1,429</td>
    <td align="right" class="ticker">1,826</td>
    <td align="right" class="ticker">2,262</td>
    <td align="right" class="ticker">2,251</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Receivables</td>
    <td align="right" class="ticker">595</td>
    <td align="right" class="ticker">770</td>
    <td align="right" class="ticker">735</td>
    <td align="right" class="ticker">692</td>
    <td align="right" class="ticker">753</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Notes Receivable</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Inventories</td>
    <td align="right" class="ticker">552</td>
    <td align="right" class="ticker">646</td>
    <td align="right" class="ticker">643</td>
    <td align="right" class="ticker">627</td>
    <td align="right" class="ticker">722</td>
    </tr>
    </table>

    What I want to get is:
    <tr bgcolor="ffffff">
    <td class="ticker">Receivables</td>
    <td align="right" class="ticker">595</td>
    <td align="right" class="ticker">770</td>
    <td align="right" class="ticker">735</td>
    <td align="right" class="ticker">692</td>
    <td align="right" class="ticker">753</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Notes Receivable</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Inventories</td>
    <td align="right" class="ticker">552</td>
    <td align="right" class="ticker">646</td>
    <td align="right" class="ticker">643</td>
    <td align="right" class="ticker">627</td>
    <td align="right" class="ticker">722</td>
    </tr>

    p1:
    require 'rubygems'
    require 'nokogiri'
    doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
    result=doc.xpath('//table/tr[td[@class="ticker"]]')
    puts result

    I can get what I want with p1.

    p2:
    require 'rubygems'
    require 'nokogiri'
    doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
    result=doc.xpath('//table/tr[td[not(@class="tickerSm")]]')
    puts result

    Why can't I get what I want with p2?
    How do I fix p2?
    Thanks for your help.

     
    Pen Ttt, Aug 29, 2010
    #8
  9. Pen Ttt

    Pen Ttt Guest

    I found something. If I change my file /home/pt/mytest to:
    <table>
    <tr bgcolor="F3F3F3">
    <td align="right" width="240" class="tickerSm">reportdate</td>
    <td align="right" width="65" class="tickerSm">10/31/09</td>
    <td align="right" width="65" class="tickerSm">10/31/08</td>
    <td align="right" width="65" class="tickerSm">10/31/07</td>
    <td align="right" width="65" class="tickerSm">10/31/06</td>
    <td align="right" width="65" class="tickerSm">10/31/05</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="tickerSm">Cash &amp; Equivalents</td>
    <td align="right" class="ticker">2,493</td>
    <td align="right" class="ticker">1,429</td>
    <td align="right" class="ticker">1,826</td>
    <td align="right" class="ticker">2,262</td>
    <td align="right" class="ticker">2,251</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Receivables</td>
    <td align="right" class="ticker">595</td>
    <td align="right" class="ticker">770</td>
    <td align="right" class="ticker">735</td>
    <td align="right" class="ticker">692</td>
    <td align="right" class="ticker">753</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Notes Receivable</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Inventories</td>
    <td align="right" class="ticker">552</td>
    <td align="right" class="ticker">646</td>
    <td align="right" class="ticker">643</td>
    <td align="right" class="ticker">627</td>
    <td align="right" class="ticker">722</td>
    </tr>
    </table>

    with this code:
    require 'rubygems'
    require 'nokogiri'
    doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
    result=doc.xpath('//table/tr[*[not(@class="tickerSm")]]')
    puts result

    what I get is:
    <tr bgcolor="ffffff">
    <td class="tickerSm">Cash &amp; Equivalents</td>
    <td align="right" class="ticker">2,493</td>
    <td align="right" class="ticker">1,429</td>
    <td align="right" class="ticker">1,826</td>
    <td align="right" class="ticker">2,262</td>
    <td align="right" class="ticker">2,251</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Receivables</td>
    <td align="right" class="ticker">595</td>
    <td align="right" class="ticker">770</td>
    <td align="right" class="ticker">735</td>
    <td align="right" class="ticker">692</td>
    <td align="right" class="ticker">753</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Notes Receivable</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    <td align="right" class="ticker">0</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Inventories</td>
    <td align="right" class="ticker">552</td>
    <td align="right" class="ticker">646</td>
    <td align="right" class="ticker">643</td>
    <td align="right" class="ticker">627</td>
    <td align="right" class="ticker">722</td>
    </tr>

    This row is not selected by my code:
    <tr bgcolor="F3F3F3">
    <td align="right" width="240" class="tickerSm">reportdate</td>
    <td align="right" width="65" class="tickerSm">10/31/09</td>
    <td align="right" width="65" class="tickerSm">10/31/08</td>
    <td align="right" width="65" class="tickerSm">10/31/07</td>
    <td align="right" width="65" class="tickerSm">10/31/06</td>
    <td align="right" width="65" class="tickerSm">10/31/05</td>
    </tr>

    But how do I also exclude this row with XPath?

    <tr bgcolor="ffffff">
    <td class="tickerSm">Cash &amp; Equivalents</td>
    <td align="right" class="ticker">2,493</td>
    <td align="right" class="ticker">1,429</td>
    <td align="right" class="ticker">1,826</td>
    <td align="right" class="ticker">2,262</td>
    <td align="right" class="ticker">2,251</td>
    </tr>
    This does not exclude it:
    xpath('//table/tr[*[not(@class="tickerSm")]]')
    Maybe the reason is that some of its td cells have class "ticker" and
    another has "tickerSm".
    How do I write an XPath expression that leaves this row out?
     
    Pen Ttt, Aug 29, 2010
    #9
  10. Ammar Ali

    Ammar Ali Guest

    Re: select tr>3 with nokogiri

    > xpath('//table/tr[*[not(@class="tickerSm")]]')
    > maybe the reason is : some class of td is "ticker",another is
    > "tickerSm",
    > if i don't want to select it with xpath,how to express it with xpath??

    Hi Pen,

    I don't know if "not" is valid like that, I have to double check. But
    you can use "!=" with attributes.

    doc.xpath('//table/tr/*[@class!="tickerSm"]')

    I hope it helps,
    Ammar
     
    Ammar Ali, Aug 29, 2010
    #10
  11. Pen Ttt

    Pen Ttt Guest

    Re: select tr>3 with nokogiri

    I found that not() and != give the same result on this file (they can
    differ when a td has no class attribute at all).
    There is still one problem. If my HTML is the following:

    <table>
    <tr bgcolor="F3F3F3">
    <td align="right" width="240" class="tickerSm">reportdate</td>
    <td align="right" width="65" class="tickerSm">10/31/09</td>
    <td align="right" width="65" class="tickerSm">10/31/08</td>
    <td align="right" width="65" class="tickerSm">10/31/07</td>
    <td align="right" width="65" class="tickerSm">10/31/06</td>
    <td align="right" width="65" class="tickerSm">10/31/05</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="tickerSm">Cash &amp; Equivalents</td>
    <td align="right" class="ticker">2,493</td>
    <td align="right" class="ticker">1,429</td>
    <td align="right" class="ticker">1,826</td>
    <td align="right" class="ticker">2,262</td>
    <td align="right" class="ticker">2,251</td>
    </tr>
    <tr bgcolor="ffffff">
    <td class="ticker">Receivables</td>
    <td align="right" class="ticker">595</td>
    <td align="right" class="ticker">770</td>
    <td align="right" class="ticker">735</td>
    <td align="right" class="ticker">692</td>
    <td align="right" class="ticker">753</td>
    </tr>
    </table>

    xpath('//table/tr[td[@class="tickerSm"]]') gets:

    <tr bgcolor="F3F3F3">
    <td align="right" width="240" class="tickerSm">reportdate</td>
    <td align="right" width="65" class="tickerSm">10/31/09</td>
    <td align="right" width="65" class="tickerSm">10/31/08</td>
    <td align="right" width="65" class="tickerSm">10/31/07</td>
    <td align="right" width="65" class="tickerSm">10/31/06</td>
    <td align="right" width="65" class="tickerSm">10/31/05</td>
    </tr>


    xpath('//table/tr[td[@class="ticker"]]') gets:

    <tr bgcolor="ffffff">
    <td class="ticker">Receivables</td>
    <td align="right" class="ticker">595</td>
    <td align="right" class="ticker">770</td>
    <td align="right" class="ticker">735</td>
    <td align="right" class="ticker">692</td>
    <td align="right" class="ticker">753</td>
    </tr>

    But how can I get the following row with an XPath expression?
    <tr bgcolor="ffffff">
    <td class="tickerSm">Cash &amp; Equivalents</td>
    <td align="right" class="ticker">2,493</td>
    <td align="right" class="ticker">1,429</td>
    <td align="right" class="ticker">1,826</td>
    <td align="right" class="ticker">2,262</td>
    <td align="right" class="ticker">2,251</td>
    </tr>
     
    Pen Ttt, Aug 29, 2010
    #11
  12. Pen Ttt

    Pen Ttt Guest

    Re: select tr>3 with nokogiri

    A friend told me this expression works:
    //table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
     
    Pen Ttt, Aug 29, 2010
    #12
  13. Ammar Ali

    Ammar Ali Guest

    Re: select tr>3 with nokogiri

    On Sun, Aug 29, 2010 at 9:40 AM, Pen Ttt <> wrote:
    > a friend tell me,
    > //table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
    > it is ok


    That's good. Another possible approach is using following-sibling, if
    you don't want the first td[@class="tickerSm"]

    //table/tr/td[1][@class="tickerSm"]/following-sibling::td[@class!="tickerSm"]

    Ammar
     
    Ammar Ali, Aug 29, 2010
    #13