How to use an array to load HTML data with Ruby

Pen Ttt

There is an HTML file:
<table border=1>
<tr>
<td>report_data</td>
<td><strong>liu_asset</strong></td>
<td><strong> cash</strong></td>
<td><strong>finance_asset</strong></td>
<td><strong> note</strong></td>

<tr>
<td>2009-12-31</td>
<td>0 </td>
<td>1,693,048,000,000</td>
<td>20,147,000,000</td>
<td>500</td>
</tr>

<tr>
<td>2009-09-30</td>
<td>0 </td>
<td>1,777,512,000,000</td>
<td>24,977,000,000</td>
<td>700</td>
</tr>
</table>

How can I use an array to load the data with Ruby? Here is what I want:
array[0,0]=report_data
array[0,1]=liu_asset
array[0,2]=cash
array[0,3]=finance_asset
array[0,4]=note
array[1,0]=2009-12-31
array[1,1]=0
array[1,2]=1,693,048,000,000
array[1,3]=20,147,000,000
array[1,4]=500
array[2,0]=2009-09-30
array[2,1]=0
array[2,2]=1,777,512,000,000
array[2,3]=24,977,000,000
array[2,4]=700
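A note on the notation in the question: in Ruby, array[0,0] does not address a row and a column. Array#[] with two integers means (start, length), so it returns a slice of the outer array. A cell in a nested array is reached as array[0][0], which is the form the Nokogiri reply in this thread uses. If the two-argument form is really wanted, a tiny wrapper class can provide it; the Grid class below is hypothetical, made up only for illustration.

```ruby
# Two rows of the question's data, as a nested array of strings
rows = [
  ["report_data", "liu_asset", "cash"],
  ["2009-12-31", "0", "1,693,048,000,000"]
]

# Array#[] with two integers is (start, length): this is a slice, not a cell
p rows[0, 1]   # => [["report_data", "liu_asset", "cash"]]
# A cell in a nested array is reached with two separate index calls
p rows[1][2]   # => "1,693,048,000,000"

# Hypothetical wrapper so that grid[row, col] reads like the question
class Grid
  def initialize(rows)
    @rows = rows
  end

  # fetch raises IndexError for a missing row or column instead of returning nil
  def [](row, col)
    @rows.fetch(row).fetch(col)
  end
end

grid = Grid.new(rows)
p grid[0, 1]   # => "liu_asset"
```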
 
Josh Cheek

You're missing a </tr> after the first row.

I added it in for you and gave an example of how you can use Hpricot to
load the data. Here is basically everything I know about Hpricot: you can
get the contents of an Hpricot element by calling #innerHTML on it. If you
have an element, you can get a list of the elements inside it with a
specific tag by using division (/); that is how I got all the td's from
each row. You can get the first instance of a specific tag by using %;
that is how I pull out the strong.
That is all I know about Hpricot, but it's all a problem like this requires.


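require 'rubygems'
require 'hpricot'

rows = Array.new
# Take the first <table>, then every <tr> inside it
for row in Hpricot(DATA) % 'table' / 'tr'
  rows.push Array.new
  # For each cell, use its <strong> child if it has one, otherwise the cell itself
  for data in row / 'td'
    rows.last.push((data % 'strong' || data).innerHTML)
  end
end

require 'pp'
pp rows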

__END__
<table border=1>
<tr>
<td>report_data</td>
<td><strong>liu_asset</strong></td>
<td><strong> cash</strong></td>
<td><strong>finance_asset</strong></td>
<td><strong> note</strong></td>
</tr>
<tr>
<td>2009-12-31</td>
<td>0 </td>
<td>1,693,048,000,000</td>
<td>20,147,000,000</td>
<td>500</td>
</tr>

<tr>
<td>2009-09-30</td>
<td>0 </td>
<td>1,777,512,000,000</td>
<td>24,977,000,000</td>
<td>700</td>
</tr>
</table>
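If Hpricot is not available, the three steps described above (find each row, collect its cells, prefer a cell's <strong> contents) can be roughly imitated with regular expressions from core Ruby. This is a sketch under loud assumptions: the table_rows helper is hypothetical, and the patterns assume lowercase, un-nested tags with no attributes on <tr> or <td>. It splits on <tr> instead of matching <tr>...</tr>, so the question's missing </tr> does not merge two rows. For real-world HTML, a parser such as Hpricot or Nokogiri is the safer choice.

```ruby
# Hypothetical helper: a rough, regex-based stand-in for the Hpricot loop.
# It assumes lowercase, un-nested <tr>/<td>/<strong> tags; it is NOT a real HTML parser.
def table_rows(html)
  # Each chunk after a <tr> is one row, so a missing </tr> cannot merge two rows
  html.split(/<tr>/i).drop(1).map do |chunk|
    chunk.scan(%r{<td>(.*?)</td>}im).flatten.map do |cell|
      # Like `data % 'strong' || data`: prefer the <strong> contents if present
      cell[%r{<strong>(.*?)</strong>}im, 1] || cell
    end
  end
end

# The table from the question, including its missing </tr>
html = <<~HTML
  <table border=1>
  <tr>
  <td>report_data</td>
  <td><strong>liu_asset</strong></td>
  <td><strong> cash</strong></td>
  <td><strong>finance_asset</strong></td>
  <td><strong> note</strong></td>

  <tr>
  <td>2009-12-31</td>
  <td>0 </td>
  <td>1,693,048,000,000</td>
  <td>20,147,000,000</td>
  <td>500</td>
  </tr>
  <tr>
  <td>2009-09-30</td>
  <td>0 </td>
  <td>1,777,512,000,000</td>
  <td>24,977,000,000</td>
  <td>700</td>
  </tr>
  </table>
HTML

rows = table_rows(html)
p rows[0]   # => ["report_data", "liu_asset", " cash", "finance_asset", " note"]
```

Like the parser-based versions, this keeps the cells' stray spaces (" cash", "0 ") as they appear in the HTML.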
 
gf

Here's a similar way to do what Josh does, only using Nokogiri:


#!/usr/bin/ruby

require 'rubygems'
require 'nokogiri'

HTML = <<EOT
<table border=1>
<tr>
<td>report_data</td>
<td><strong>liu_asset</strong></td>
<td><strong> cash</strong></td>
<td><strong>finance_asset</strong></td>
<td><strong> note</strong></td>
</tr>
<tr>
<td>2009-12-31</td>
<td>0 </td>
<td>1,693,048,000,000</td>
<td>20,147,000,000</td>
<td>500</td>
</tr>
<tr>
<td>2009-09-30</td>
<td>0 </td>
<td>1,777,512,000,000</td>
<td>24,977,000,000</td>
<td>700</td>
</tr>
</table>
EOT

doc = Nokogiri::HTML.parse(HTML)

# One inner array per <tr>, holding the text of each <td> in that row
array = []
doc.css('tr').each_with_index do |tr, tr_i|
  array[tr_i] = tr.css('td').map { |td| td.text }
end

array[0][0] # => "report_data"
array[0][1] # => "liu_asset"
array[0][2] # => " cash"
array[0][3] # => "finance_asset"
array[0][4] # => " note"
array[1][0] # => "2009-12-31"
array[1][1] # => "0 "
array[1][2] # => "1,693,048,000,000"
array[1][3] # => "20,147,000,000"
array[1][4] # => "500"
array[2][0] # => "2009-09-30"
array[2][1] # => "0 "
array[2][2] # => "1,777,512,000,000"
array[2][3] # => "24,977,000,000"
array[2][4] # => "700"
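The values come back as strings, with stray spaces (" cash", "0 ") and thousands separators. A minimal post-processing sketch, assuming the array has the shape shown above and that the comma-separated figures are whole numbers: strip each cell, treat the first row as column names, and pair each data row with them in a Hash. The to_number helper is hypothetical, and dates are deliberately left as strings.

```ruby
# The cell strings as the Nokogiri example above reports them
array = [
  ["report_data", "liu_asset", " cash", "finance_asset", " note"],
  ["2009-12-31", "0 ", "1,693,048,000,000", "20,147,000,000", "500"],
  ["2009-09-30", "0 ", "1,777,512,000,000", "24,977,000,000", "700"]
]

# Hypothetical helper: "1,693,048,000,000" -> 1693048000000; other strings pass through
def to_number(cell)
  cell.match?(/\A\d{1,3}(,\d{3})*\z/) ? cell.delete(",").to_i : cell
end

# Remove leading/trailing spaces such as " cash" and "0 "
cleaned = array.map { |row| row.map(&:strip) }

# First row is the header; the rest are data rows
headers, *data = cleaned

# One Hash per data row, keyed by column name
records = data.map { |row| headers.zip(row.map { |cell| to_number(cell) }).to_h }

p records.first["cash"]   # => 1693048000000
```

Keeping the nested array and building Hashes only at the end leaves the array[row][col] access from the thread intact while also allowing lookups by column name.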
 
