Hpricot getting a table

lrlebron

I am currently trying to scrape some data from the following web page

I am using some Hpricot code that looks like this:

@doc = Hpricot(open(strLink))

@doc.search("/html/body/table[5]/tr/td[2]/div[3]/table/tr/td/div[1]/table") do |data|
  puts data
end

At this point, data contains HTML that looks like this:

<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>

More stuff continues ......

I want to capture each of these four tables individually for further
processing. I have tried a variety of methods, but nothing seems to
work.

Thanks,
Luis
 
lrlebron

Would something as simple as this work? I'm not sure how complex
your tables get.

#!/usr/bin/env ruby

require "hpricot"

doc = Hpricot("<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>")

(doc/"table").map {|t| puts t.to_html}

This outputs:

<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>
<table><tr><td>Stuff</td></tr><tr><td>Stuff</td></tr></table>

Note that there's an Hpricot mailing list at
http://code.whytheluckystiff.net/hpricot/ that might be a more
appropriate forum for these questions.

-Drew

Thanks,

This gets me a lot closer to what I need, but I'm having some problems
with the syntax. If I'm reading the docs correctly, map returns an
array, so I should be able to do something like

arrTables = (doc/"table").map

and then access each table individually. For example:

arrTables[0]

Luis
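
A note on the map question above, as a minimal sketch rather than a definitive answer: map only builds an array from what its block returns, so calling it with no block does not give you the tables. The example below uses a plain Array of strings as a stand-in for the result of (doc/"table"), so it runs without Hpricot installed; the names tables, arr_tables, printed, no_block and first_table are made up for illustration. It assumes the Hpricot result can be indexed like an Array, which I believe is the case but have not verified here.

```ruby
# Stand-in for the collection returned by (doc/"table"): a plain Array of
# HTML strings (hypothetical data, not the real page).
tables = [
  "<table><tr><td>A</td></tr></table>",
  "<table><tr><td>B</td></tr></table>",
  "<table><tr><td>C</td></tr></table>",
  "<table><tr><td>D</td></tr></table>"
]

# map with a block collects the block's return values into a new Array.
arr_tables = tables.map { |t| t }

# With puts inside the block, the strings are printed but the resulting
# Array holds only nils, because puts returns nil.
printed = tables.map { |t| puts t }

# map without a block does not return an Array of the elements.
no_block = tables.map

# Each table can now be picked out by index for further processing.
first_table = arr_tables[0]
puts first_table
```

With real Hpricot elements, the block would return whatever you want to keep for each table, for example t.to_html as in the earlier reply.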
 
Peter Szinek

What exactly are you trying to do? What should the result be?
Could you please provide some real data? The 'Stuff' placeholders do not
make much sense :)


Thanks,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby
 
