Using Hpricot to parse a fiddly table

Discussion in 'Ruby' started by Adam Dullenty, Jan 6, 2008.

  1. Hi there,

    I'm fairly new to Ruby. Previously I was an average Java programmer,
    so it's all a bit foreign to me, especially XPath and CSS. I would be
    grateful if someone could give me a hand with a problem I'm having: I
    have a table whose fields I'm trying to extract in a particular way.
    The table is in the form:

    <table>
    <tr>
    <td>...stuff I don't want...</td>
    </tr>
    <tr>
    <td>
    <table>
    ------------rows i want
    <tr>
    <td>
    <table>
    <tr>
    <td>Field 1</td>
    <td>Field 2</td>
    </tr>
    </table>
    </td>
    <td>Field 3</td>
    <td>Field 4, Field 5</td>
    </tr>
    ------------end of rows i want
    </table>
    </td>
    </tr>
    </table>

    I have managed to get Hpricot to parse the page and return the HTML for
    that table, but I'm struggling to get each row into an array of the form
    ["Field 1", "Field 2", "Field 3", "Field 4", "Field 5"]. I had hoped
    there would be some kind of built-in method for extracting data from a
    table, but I can't find one.

    Thanks again, look forward to a reply :)
    Adam
    --
    Posted via http://www.ruby-forum.com/.
    Adam Dullenty, Jan 6, 2008
    #1

  2. s.ross (Guest)

    For the innermost table, try:

    eles = doc.search('table table table td')

    For the enclosing table:

    eles = doc.search('table table td')

    I don't suppose the semantics can be improved any -- like class names
    or ids?
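
    Without class names or ids, one way to pick out the wanted rows is to
    walk the nesting by position. The sketch below is only an illustration
    and does not use Hpricot: it assumes the markup is well-formed enough
    for REXML (shipped with Ruby) to read, and the helper names
    leaf_cell_texts and rows_as_cells are made up for this example. It
    returns one array per wanted row, keeping only cells that hold no
    nested table.

```ruby
require 'rexml/document'

# Collect the text of every <td> under +node+ that contains no child
# elements, walking depth-first so cells come back in document order.
def leaf_cell_texts(node)
  node.elements.to_a.flat_map do |child|
    if child.name == 'td' && child.elements.empty?
      [child.text.to_s.strip]
    else
      leaf_cell_texts(child)
    end
  end
end

# Assumption: the wanted rows are the <tr> children of the second-level
# table, as in the sample markup from the question.
def rows_as_cells(html)
  doc = REXML::Document.new(html)
  doc.root.get_elements('tr/td/table/tr').map { |row| leaf_cell_texts(row) }
end

html = <<~HTML
  <table>
    <tr><td>...stuff I don't want...</td></tr>
    <tr>
      <td>
        <table>
          <tr>
            <td>
              <table>
                <tr><td>Field 1</td><td>Field 2</td></tr>
              </table>
            </td>
            <td>Field 3</td>
            <td>Field 4, Field 5</td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
HTML

p rows_as_cells(html)
```

    Real-world pages are often not valid XML, so a forgiving HTML parser
    would be needed there; the depth-first walk itself carries over.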


    s.ross, Jan 6, 2008
    #2

  3. Steve Ross wrote:

    > I don't suppose the semantics can be improved any -- like class names
    > or ids?


    Thanks for your reply. Afraid not: there are no handy names or ids. I
    think I was already doing what you posted, in a slightly different form:
    elements2 = (elements/"table//table//td"). Since my last post, though,
    I've managed to sort it out with a lot of array manipulation.
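
    For illustration only (this is not the exact code used here), the kind
    of array manipulation involved might look like the sketch below. It
    assumes the text of each cell in one row has already been extracted as
    a string; split_fields is a made-up helper name, not part of Hpricot.

```ruby
# Turn the cell texts of one row into a flat list of fields, splitting
# any cell that packs several comma-separated values together.
def split_fields(cells)
  cells
    .flat_map { |cell| cell.split(',') } # "Field 4, Field 5" -> two pieces
    .map(&:strip)                        # drop whitespace around each piece
    .reject(&:empty?)                    # ignore blank cells
end

row = ["Field 1", "Field 2", "Field 3", "Field 4, Field 5"]
p split_fields(row)
# => ["Field 1", "Field 2", "Field 3", "Field 4", "Field 5"]
```

    Splitting on every comma is an assumption: it would also break up a
    single field whose text legitimately contains a comma.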

    Thanks for the help though :)
    Adam


    Adam Dullenty, Jan 7, 2008
    #3
