Hpricot getting a table

lrlebron · Apr 18, 2007

I am currently trying to scrape some data from the following web page

I am using some hpricot code that looks like this
@doc = Hpricot(open(strLink))

@doc.search("/html/body/table[5]/tr/td[2]/div[3]/table/tr/td/div[1]/table") do |data|
puts data
end

At this point data contains html that looks like this

<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>

More stuff continues ......

I want to capture each of these four tables individually for further
processing. I have tried a variety of methods but nothing seems to
work.

Thanks,
Luis

lrlebron · Apr 18, 2007

Would something as simple as this work? I'm not sure how complex
your tables get.

#!/usr/bin/env ruby

require "hpricot"

doc = Hpricot("<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>")

(doc/"table").map {|t| puts t.to_html}

This outputs:

"<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>"
"<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>"
"<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>"
"<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>"

Note that there's an Hpricot mailing list at
http://code.whytheluckystiff.net/hpricot/ that might be a more
appropriate forum for these questions.

-Drew

Thanks,

This gets me a lot closer to what I need.
I'm having some problems with the syntax. If I'm reading the docs
correctly, map returns an array, so I should be able to do something
like

arrTables = (doc/"tables").map

And then access each table individually. For example

arrTables[0]

Luis
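
A note on the snippet above: map only builds an array when it is given a
block, and the selector probably needs to be "table" rather than "tables"
to match the earlier example. The sketch below illustrates the difference
in plain Ruby. FakeTable is a hypothetical stand-in for an Hpricot element
(it only provides to_html) so that the code runs without the gem; with
Hpricot, the collection would come from (doc/"table") instead.

```ruby
# Hypothetical stand-in for an Hpricot element; only to_html is needed here.
FakeTable = Struct.new(:to_html)

# Stand-in for the collection that (doc/"table") would return.
tables = [
  FakeTable.new("<table><tr><td>First</td></tr></table>"),
  FakeTable.new("<table><tr><td>Second</td></tr></table>")
]

# With a block, map returns an Array of the block's results,
# so each table can then be accessed by index.
arr_tables = tables.map { |t| t.to_html }
puts arr_tables[0]   # prints the first table's HTML

# Without a block, current Ruby versions return an Enumerator,
# which does not support indexing with [].
no_block = tables.map
puts no_block.class              # Enumerator
puts no_block.respond_to?(:[])   # false
```

As far as I know, the result of doc/"table" in Hpricot is already an
array-like collection, so indexing it directly, as in (doc/"table")[0],
may work as well. That has not been verified here.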

Peter Szinek · Apr 18, 2007

What exactly are you trying to do? What should the result be?
Could you please provide some real data? The 'Stuff' placeholders do
not make much sense.

Thanks,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby
