How to get data from html table

Discussion in 'Ruby' started by Vikash Kumar, Nov 27, 2006.

  1. Vikash Kumar

    Vikash Kumar Guest

    I want to store the values of a table in different variables. I have the
    following table structure:

    <table width="579">
    <tr class="even">
    <td class width="65">&nbsp;Case5-04</td>
    <td class width="130">10/11/2006 23:24:33</td>
    <td class width="61">Case5-04</td>
    <td class width="32">1005</td>
    <td class width="59">Sell</td>
    <td class width="36">1,000</td>
    <td class width="34">ARP</td>
    <td class width="52">$36.90</td>
    </tr>
    <tr class="odd">
    <td class width="65">&nbsp;Case5-03</td>
    <td class width="130">10/11/2006 23:20:07</td>
    <td class width="61">Case5-03</a></td>
    <td class width="32">1005</td>
    <td class width="59">Buy</td>
    <td class width="36">1,500</td>
    <td class width="34">ARP</td>
    <td class width="52">$36.70</td>
    </tr>
    <tr class="even">
    <td class width="65">&nbsp;Case4-04</td>
    <td class width="130">10/11/2006 05:28:54</td>
    <td class width="61">Case4-04</a></td>
    <td class width="32">1004</td>
    <td class width="59">Sell</td>
    <td class width="36">300</td>
    <td class width="34">RIL</td>
    <td class width="52">$490.00</td>
    </tr>
    <tr class="odd">
    <td class width="65">&nbsp;Case4-03</td>
    <td class width="130">10/11/2006 05:21:32</td>
    <td class width="61">Case4-03</a></td>
    <td class width="32">1004</td>
    <td class width="59">Buy</td>
    <td class width="36">200</td>
    <td class width="34">RIL</td>
    <td class width="52">$489.90</td>
    </tr>
    </table>

    I want to store the values in variables so that I can compare records.
    Please help me figure out how to do this in Ruby.

    --
    Posted via http://www.ruby-forum.com/.
    Vikash Kumar, Nov 27, 2006
    #1

  2. Vikash Kumar

    Peter Szinek Guest

    > I want to store the values in variables so that I can compare records.
    > Please help me out how to do this in ruby.


    One possible way:

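    require 'hpricot'

    Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                        :buy_link, :some_num2, :letters, :price)
    records = []

    # doc starts out as the HTML string and is replaced by the parsed document
    doc = Hpricot(doc)
    cells = doc/"/table/tr/td"

    # every 8 cells make up one table row
    cells.map { |elem| elem.inner_html }.each_slice(8) do |slice|
      records << Record.new(*slice)
    end

    # slice(1..-1) drops the leading "$" from the price string
    p records.sort_by { |record| record.price.slice(1..-1) }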

    Note that since I did not know the semantics of the table cells, some of
    the field names in the Record struct are just guesses, but you get the
    idea.
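    Once the rows are in Structs, comparing two records is mostly a matter of
    walking the fields. A small sketch, assuming the Record fields above and
    using two rows copied from the table by hand (so it runs without Hpricot);
    `differing_fields` is a hypothetical helper name:

```ruby
Record = Struct.new(:name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)

# Two rows typed in from the table (the leading &nbsp; already removed)
a = Record.new("Case5-04", "10/11/2006 23:24:33", "Case5-04", "1005",
               "Sell", "1,000", "ARP", "$36.90")
b = Record.new("Case5-03", "10/11/2006 23:20:07", "Case5-03", "1005",
               "Buy", "1,500", "ARP", "$36.70")

# Struct#members lists the field names in order; keep those whose values differ
def differing_fields(x, y)
  x.members.select { |m| x[m] != y[m] }
end

p differing_fields(a, b)
# => [:name, :date, :name_again, :buy_link, :some_num2, :price]
```

    Struct#== would only say whether the records are identical; listing the
    differing members shows *how* they differ.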


    Also, I am not 100% sure whether the sort_by should be done on prices
    converted with to_f instead (probably not, because of rounding problems,
    but I wonder whether plain string comparison can cause some odd issues, too).
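    To illustrate the string-versus-number question, here is a small sketch
    using the prices from the table plus one made-up value ("$100.00" is not
    in the original data). String comparison goes character by character, so
    "100.00" sorts before "36.90". Kernel#Rational parses the decimal string
    exactly, which sidesteps the rounding concern; for sorting alone, to_f
    would give the same order here.

```ruby
# Prices from the table, plus a hypothetical "$100.00" to expose the problem
prices = ["$36.90", "$36.70", "$490.00", "$489.90", "$100.00"]

# Compares the strings character by character
by_string = prices.sort_by { |s| s.delete("$,") }

# Rational("36.90") is exactly 369/10, so no floating-point rounding occurs
def price_value(str)
  Rational(str.delete("$,"))
end

by_value = prices.sort_by { |s| price_value(s) }

p by_string  # => ["$100.00", "$36.70", "$36.90", "$489.90", "$490.00"]
p by_value   # => ["$36.70", "$36.90", "$100.00", "$489.90", "$490.00"]
```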

    HTH,
    Peter

    __
    http://www.rubyrailways.com
    Peter Szinek, Nov 27, 2006
    #2

  3. Vikash Kumar

    Park Heesob Guest

    Hi,

    Here is another way:

    After saving the HTML table text to the file 'w.xml', you can process the
    values like this:

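    require 'rexml/document'
    include REXML

    # REXML is an XML parser, so w.xml must be well-formed XML
    # (no stray </a> tags, no valueless attributes like "class", no &nbsp;)
    doc = Document.new(File.new("w.xml"))
    # "*/tr/td" matches every cell under the root <table> element
    doc.elements.each("*/tr/td") { |e|
      puts e.texts
    }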
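    A variation that keeps the cells grouped by row, which is closer to the
    "one record per row" goal. This is a sketch under assumptions: the XML is
    a hand-cleaned subset of the original table (bare "class" attributes,
    stray </a> tags and &nbsp; removed so REXML accepts it), and in recent Ruby
    versions REXML ships as a bundled gem rather than core stdlib.

```ruby
require 'rexml/document'

# Hand-cleaned, well-formed subset of the original table
xml = <<~XML
  <table width="579">
    <tr class="even"><td>Case5-04</td><td>1005</td><td>Sell</td><td>$36.90</td></tr>
    <tr class="odd"><td>Case5-03</td><td>1005</td><td>Buy</td><td>$36.70</td></tr>
  </table>
XML

doc = REXML::Document.new(xml)

# One array of cell strings per <tr>; the "td" XPath is relative to each row
rows = REXML::XPath.match(doc, "/table/tr").map do |tr|
  tr.get_elements("td").map { |td| td.text.to_s.strip }
end

p rows
```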


    Regards,

    Park Heesob

    Park Heesob, Nov 27, 2006
    #3
  4. Vikash Kumar

    Peter Szinek Guest

    Hello,

    > Digression: when solving a problem like this, it is often much easier to
    > write a few lines of HTML than to try to use a high-powered library to
    > accomplish it.


    I don't see why it is an advantage here. The first solution in this thread:

    -------------------------------------------------------------------
    Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                        :buy_link, :some_num2, :letters, :price)
    records = []

    cells = Hpricot(doc)/"/table/tr/td"

    cells.map { |elem| elem.inner_html }.each_slice(8) do |slice|
      records << Record.new(*slice)
    end

    p records.sort_by { |record| record.price.slice(1..-1) }
    ------------------------------------------------------------------

    is shorter, copes with malformed HTML, and even does the sorting, which I
    believe was the main intention of the OP. So why not use a high-powered
    library?

    Disclaimer: that solution was actually mine, but that is not why I am
    referring to it. I refer to it because I think that parsing all the cells
    with a one-liner using a robust HTML parser is much better in practice
    than using a basic set of regexps and then patching their results with
    ad-hoc rules (missing close tags etc.) derived from 3 examples. I believe
    the above Hpricot-powered solution will work with 100 records, too (unless
    the other 97 are *really* messed up, but in that case the regexps will
    fail miserably too), whereas the we-do-not-need-any-high-powered-library
    approach may need another 25 patches due to other errors in the
    100-record HTML...
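    To make the point about ad-hoc patches concrete, here is a small sketch
    using one of the cells with a stray </a> from the original table. A naive
    cell regexp captures the stray tag along with the text, so this defect
    needs its own cleanup rule; the gsub pattern below is one such
    illustrative patch, and every other kind of defect would need another.

```ruby
# Two cells from the original table; note the stray </a> in the first one
row = '<td class width="61">Case5-03</a></td><td class width="32">1005</td>'

# The naive regexp keeps whatever sits between <td ...> and </td>
cells = row.scan(%r{<td.*?>(.*?)</td>}im).flatten
p cells    # => ["Case5-03</a>", "1005"]

# Ad-hoc patch for this particular defect: drop any <a ...> or </a> tag
patched = cells.map { |c| c.gsub(%r{</?a\b[^>]*>}i, "") }
p patched  # => ["Case5-03", "1005"]
```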

    I do not dispute that parsing the page with regexps and seeing what's
    going on under the hood can teach you a lot, but I am quite sure that
    feeding a real-life page to an HTML parser is safer than using the regexp
    approach.

    Of course, if this question is just a theoretical one and there won't be
    100 (or more than 3) records, just these 3, then forget about this mail.

    Cheers,
    Peter

    __
    http://www.rubyrailways.com
    Peter Szinek, Nov 27, 2006
    #4
  5. Vikash Kumar

    Vikash Kumar Guest

    > #!/usr/bin/ruby -w
    >
    > data = File.read(sourcefilename)
    >
    > output = []
    >
    > html_rows = data.scan(%r{<tr.*?>(.*?)</tr>}im).flatten
    >
    > html_rows.each do |row|
    > # filter these undesired elements
    > row.gsub!("&nbsp;","")
    > row.gsub!("</a>","")
    > cells = row.scan(%r{<td.*?>(.*?)</td>}im).flatten
    > output << cells
    > end
    >
    > # done collecting, now display
    >
    > output.each do |row|
    > line = row.join(",")
    > puts line
    > end
    >


    What would be the right solution if someone wants to get the data from
    the Yahoo site http://finance.yahoo.com/q?s=IBM and then display only
    some values, such as Prev Close and Last Trade? Let's suppose we go to
    the URL through:

    require 'watir'
    include Watir
    require 'hpricot'
    include Hpricot
    ie=Watir::IE.new
    ie.goto("http://finance.yahoo.com/q?s=IBM")

    Now, what's next? Also, suppose we want to get all the values of a table
    whose structure we don't know. What would be the correct solution then?
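    For the unknown-structure case, one rough sketch is to ignore the table
    layout entirely: collect every two-cell row (label, value) from all tables
    on the page into a hash, then look up the labels you want. This assumes
    the page source is already available as a string (getting it out of the
    browser is not shown), and the HTML below is a made-up fragment with
    placeholder numbers, not Yahoo's real markup. `label_value_pairs` is a
    hypothetical helper; it uses regexps only to stay dependency-free, and an
    HTML parser such as Hpricot is the more robust choice for real pages.

```ruby
# Made-up fragment with placeholder values; real page markup will differ
html = <<~HTML
  <table><tr><th>Prev Close:</th><td>12.34</td></tr>
  <tr><th>Last Trade:</th><td>56.78</td></tr></table>
HTML

# Build a label => value hash from every row that has exactly two cells,
# without assuming anything about which table holds which value
def label_value_pairs(html)
  html.scan(%r{<tr\b[^>]*>(.*?)</tr>}im).flatten.each_with_object({}) do |row, h|
    cells = row.scan(%r{<t[hd]\b[^>]*>(.*?)</t[hd]>}im).flatten
    # strip nested tags and &nbsp; from each cell
    cells.map! { |c| c.gsub(/<[^>]+>/, "").gsub("&nbsp;", " ").strip }
    h[cells[0]] = cells[1] if cells.size == 2
  end
end

quotes = label_value_pairs(html)
p quotes.values_at("Prev Close:", "Last Trade:")
```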

    --
    Posted via http://www.ruby-forum.com/.
    Vikash Kumar, Nov 28, 2006
    #5
