Getting all Google results with hpricot and combining two gsub statements into one?

kazaam

Hi,
I'm trying to fetch all Google results with hpricot. For the first page of results I wrote this:

#!/usr/bin/env ruby
$VERBOSE = true

require 'hpricot'
require 'open-uri'

google = Hpricot(open("http://www.google.com/search?name=f&hl=en&q=#{$*}"))
(google/"h2.r/a").each {|line| puts line.to_s.gsub(/^.+href="/,'').gsub(/" .+$/,'')}

So my first question is: can I combine the two gsub statements above into a single gsub, which should increase the speed? Or is there an even better way than gsub to clean up the results?

And the next question is: how can I get all results, not just those from the first page?

greets
 
Gregory Seidman

I'm trying to fetch all Google results with hpricot. For the first page
of results I wrote this:

#!/usr/bin/env ruby
$VERBOSE = true

require 'hpricot'
require 'open-uri'

google = Hpricot(open("http://www.google.com/search?name=f&hl=en&q=#{$*}"))
(google/"h2.r/a").each {|line| puts line.to_s.gsub(/^.+href="/,'').gsub(/" .+$/,'')}

So my first question is: can I combine the two gsub statements above into
a single gsub, which should increase the speed? Or is there an even better
way than gsub to clean up the results?

And the next question is: how can I get all results, not just those from
the first page?

Look into mechanize or scrubyt for this. They sit on top of hpricot, but
are much better suited to screen scraping applications than hpricot alone.
--Greg
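
Whichever library does the fetching, the later result pages are ordinary URLs. A rough sketch, assuming Google's `start` query parameter is the zero-based offset of the first result and that each page holds 10 results; `result_page_urls` is a hypothetical helper, and nothing here performs a request:

```ruby
require 'uri'

# Assumption: 10 results per page, with the offset passed as `start`.
RESULTS_PER_PAGE = 10

# Hypothetical helper: builds the URLs of the first `pages` result pages
# for `query`. It only builds strings and never contacts Google.
def result_page_urls(query, pages, per_page = RESULTS_PER_PAGE)
  (0...pages).map do |page|
    # encode_www_form escapes the query (spaces become "+").
    params = URI.encode_www_form(q: query, hl: 'en', start: page * per_page)
    "http://www.google.com/search?#{params}"
  end
end

result_page_urls('ruby hpricot', 3).each { |url| puts url }
```

Each URL could then be passed to the same open/Hpricot call as in the original script, stopping once a page yields no more links.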
 
Douglas F Shearer

Hi Kazaam,

So my first question is: can I combine the two gsub statements above
into a single gsub, which should increase the speed? Or is there an
even better way than gsub to clean up the results?

why covered this a while ago in one of his blog posts:
http://redhanded.hobix.com/inspect/nostrils.html

The non-graphic pastie version can be seen here:
http://pastie.caboo.se/54741

And just for completeness, I wrapped it up in a Rails controller:
http://douglasfshearer.com/blog/site-search-using-google-in-ruby-on-rails
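
For the single-gsub part of the question, here is a minimal sketch in plain Ruby. It is not the code from the linked posts; the sample anchor string and the method names are made up for illustration, and it assumes the markup looks roughly like what `line.to_s` returned in the original script. The two patterns can be joined with alternation, or the gsub can be dropped in favour of one capture group:

```ruby
# Hypothetical sample, shaped like the anchor markup in the original script.
SAMPLE_ANCHOR = '<a href="http://example.com/page" class="l">Example</a>'

# The original approach: strip everything before the URL, then everything after.
def href_two_gsubs(anchor_html)
  anchor_html.gsub(/^.+href="/, '').gsub(/" .+$/, '')
end

# One gsub: both patterns joined with alternation, so the string is
# scanned once and either kind of match is replaced with nothing.
def href_one_gsub(anchor_html)
  anchor_html.gsub(/^.+href="|" .+$/, '')
end

# No gsub at all: capture the attribute value directly.
# String#[] with a regexp and a group index returns that group (or nil).
def href_capture(anchor_html)
  anchor_html[/href="([^"]*)"/, 1]
end

puts href_two_gsubs(SAMPLE_ANCHOR)
puts href_one_gsub(SAMPLE_ANCHOR)
puts href_capture(SAMPLE_ANCHOR)
```

The capture version also works when `href` is the last attribute in the tag, where the `" .+$` pattern would find no trailing space to match. Whether any of these is measurably faster is something to benchmark rather than assume.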

Cheers.

Douglas F Shearer
http://douglasfshearer.com
 
Daniel DeLorme

Douglas said:
why covered this a while ago in one of his blog posts:
http://redhanded.hobix.com/inspect/nostrils.html

The non-graphic pastie version can be seen here:
http://pastie.caboo.se/54741

And just for completeness, I wrapped it up in a Rails controller:
http://douglasfshearer.com/blog/site-search-using-google-in-ruby-on-rails

I would not suggest doing that, as Google doesn't allow bot requests and
your IP will simply get blacklisted. For site search I recommend Google
Co-op: http://www.google.com/coop/

Daniel
 
