Reading lines from a file into an array

Jim Burgess

Using Ruby, I am trying to read in lines from a two-column HTML table
and store each row in a two-element array. Each two-element array is in
turn stored in one large array.

The table rows look like this:

<tr class="odd">
<td>Row 1 - Column 1</td>
<td>Row 1 - Column 2</td>
</tr>
...

And, when I'm done, I am hoping for this:
[["row1 - col1", "row1, col2"], ["row2 - col1", "row2, col2"], ...]

Can anyone give me any pointers on the correct way to do this?
The code I have come up with so far is this:

f = File.open("file_containing_table.txt", "r")
lines = f.readlines
array_to_hold_rows = []
index = 0
loop do
  if lines[index] == nil
    break
  elsif lines[index].match "<td"
    array_to_hold_rows << ["#{lines[index]}", "#{lines[index+1]}"]
    index += 2
  else
    index += 1
  end
end

This works and does what I want, but I would like to know if this is the
best / most effective way to go about what I am trying to achieve.

Would be grateful for any help.
 
Jesús Gabriel y Galán

Parsing HTML with regular expressions is usually risky. I'd use a
parser instead, if possible, for example Nokogiri:

require 'nokogiri'
html = <<END
<html>
<body>
<div>
<table>
<tr class="odd">
<td>Row 1 - Column 1</td>
<td>Row 1 - Column 2</td>
</tr>
<tr class="odd">
<td> Row2 - col1</td>
<td> row2 - col2</td>
</tr>
</table>
</div>
</body>
</html>
END
doc = Nokogiri::HTML(html)
result = []
doc.xpath('//tr[@class="odd"]/td').each_slice(2) do |first, second|
  result << [first.inner_html, second.inner_html]
end

result #=> [["Row 1 - Column 1", "Row 1 - Column 2"], [" Row2 - col1", " row2 - col2"]]

> This works and does what I want, but I would like to know if this is the
> best / most effective way to go about what I am trying to achieve.

I think your solution does not strip the markup, so your
array still contains the <td> and </td> tags.
Now that I think about it, you might want to tweak the XPath I wrote,
because maybe not all the <tr> elements have class="odd".

Hope this helps,

Jesus.
 
Jim Burgess

Hi Jesus,

Thanks for your reply. That really helped.

> Usually parsing html with regular expressions is risky. I'd use a
> parser instead, if possible, for example nokogiri:

You are of course correct.
I have followed your suggestion, installed 'nokogiri' and am currently
looking at some examples. I will then tweak what you wrote and use that
in my final version as it is very neat and concise.

However, this:
doc.xpath('//tr[@class="odd"]/td').each_slice(2) do |first, second|
  result << [first.inner_html, second.inner_html]
end

gave me the idea of writing this:

lines.each_slice(4) { |one, two, three, four| result << [two.chomp, three.chomp] }

which also does exactly what I want, so thanks again!
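
For reference, here is a sketch of the same line-based idea that also drops the <td> markup. It assumes the file contains nothing but rows that each span exactly four lines, as in the sample. The inline sample data, the class="even" row, and the `strip_tags` helper are hypothetical, introduced only for this illustration.

```ruby
# Sample input standing in for file_containing_table.txt; a real script
# could use File.readlines("file_containing_table.txt") instead.
lines = <<~HTML.lines
  <tr class="odd">
  <td>Row 1 - Column 1</td>
  <td>Row 1 - Column 2</td>
  </tr>
  <tr class="even">
  <td>Row 2 - Column 1</td>
  <td>Row 2 - Column 2</td>
  </tr>
HTML

# Hypothetical helper: removes anything that looks like a tag, then trims whitespace.
def strip_tags(line)
  line.gsub(/<[^>]*>/, "").strip
end

# Each row is assumed to occupy exactly four lines: <tr>, <td>, <td>, </tr>.
result = lines.each_slice(4).map do |_open, first, second, _close|
  [strip_tags(first), strip_tags(second)]
end

p result
```

This only holds while the file stays that regular. A blank line, an extra column, or a cell wrapped over two lines would shift the slices, which is where the parser-based version is the safer choice.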

Essentially, I have just finished reading the first part of the Pickaxe
book and am trying to implement some of its suggestions and think
about the code I write, as opposed to using the "quick and dirty" method.
 
