Youtube...urgent, please help

Arun Kumar

Hi,

I'm new to Ruby and my company has given me an assignment in Ruby. It is
about HTML extraction. It works fine except for some sites like
http://www.youtube.com and http://www.gmail.com, where I get the errors
'400 Bad Request' and 'getaddrinfo: Name or service not known
(SocketError)' respectively. I have heard that this may be because the
URL is being redirected, but I'm not sure about it. My code for HTML
extraction is:

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'dbi'

puts "Enter domain name :"
domain = gets
#concatinating 'http://www.' with the url to open the page
url = "http://www."+domain
document = open(url)
#getting the original url of the site
url2 = document.base_uri.to_s

Can anybody please help? It is urgent. I'll be really grateful to those
who reply.

Regards,
Arun Kumar

Attachments:
http://www.ruby-forum.com/attachment/3450/htmlParse.rb
 
David Masover

Arun said:
Hi,

I'm new to Ruby and my company has given me an assignment in Ruby. It is
about HTML extraction.

You probably want Mechanize.

Arun said:
domain = gets
#concatinating 'http://www.' with the url to open the page
url = "http://www."+domain

Take a look at that URL -- I'd say you don't need 'www' in that.

But I'm guessing what's hurting is the newline at the end of it.

Quick fix:

domain = gets.chomp
url = "http://#{domain}"
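To see why the trailing newline matters, here is a minimal sketch. It assumes the user typed youtube.com and pressed Enter; build_url is a hypothetical helper name, not something from the original script.

```ruby
# Hypothetical helper (not from the thread): trims the trailing newline
# that gets leaves on the input, then builds the URL.
def build_url(raw_input)
  domain = raw_input.chomp
  "http://#{domain}"
end

raw = "youtube.com\n"          # what gets returns for "youtube.com" + Enter
broken = "http://www." + raw   # the original concatenation keeps the newline

p broken            # prints "http://www.youtube.com\n"
p build_url(raw)    # prints "http://youtube.com"
```

The first string still ends in a newline, so the host name handed to the HTTP library is not the one intended. Whether that alone explains both errors is a guess.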
 
Arun Kumar

David said:
You probably want Mechanize.


Take a look at that URL -- I'd say you don't need 'www' in that.

But I'm guessing what's hurting is the newline at the end of it.

Quick fix:

domain = gets.chomp
url = "http://#{domain}"

Sorry to say, David, I tried that but it still produces the same error. Is it
because I've not set the user agent? Can u please tell me how to set the
user agent for Mozilla?
Thanks for ur immediate reply
 
Martin DeMello

Arun said:
Sorry to say, David, I tried that but it still produces the same error. Is it
because I've not set the user agent? Can u please tell me how to set the
user agent for Mozilla?

http://mechanize.rubyforge.org/mechanize/EXAMPLES_rdoc.html has some
examples setting the user agent. Google around and see what the
mozilla user agent should be -
http://www.user-agents.org/index.shtml?moz has an extensive list, for
instance.

Arun said:
Thanks for ur immediate reply

Don't do that, it's annoying.

martin
 
Arun Kumar

Martin said:
http://mechanize.rubyforge.org/mechanize/EXAMPLES_rdoc.html has some
examples setting the user agent. Google around and see what the
mozilla user agent should be -
http://www.user-agents.org/index.shtml?moz has an extensive list, for
instance.


Don't do that, it's annoying.

martin

Can I use user agents with Hpricot, or can they be used only with
Mechanize? I've found a user agent for Mozilla:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322; .NET CLR 2.0.50727)
But it is still showing the same error.
 
Serabe

2009/3/17 Arun Kumar said:
Can I use user agents with Hpricot, or can they be used only with
Mechanize? I've found a user agent for Mozilla:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322; .NET CLR 2.0.50727)
But it is still showing the same error.

I found this:

http://schf.uc.org/articles/2007/02/14/scraping-gmail-with-mechanize-and-hpricot

It scrapes Gmail. If my memory serves, that is one of the sites giving
you problems.

Cheers,

Serabe
 
Martin DeMello

Can I use user agents with Hpricot, or can they be used only with
Mechanize?

Hpricot is an html parser, I don't think it concerns itself with
actually fetching the page. Use mechanize for that.

martin
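Since the user agent belongs to the fetching step and not to the parser, one way to set it without Mechanize is to pass a request header to open-uri. This is only a sketch: the USER_AGENT value and the fetch_options and fetch_html names are made up for illustration, and URI.open is the spelling used by current Ruby versions (the script in this thread, written for an older Ruby, calls plain open). Whether a different user agent makes the '400 Bad Request' go away is not confirmed anywhere in this thread.

```ruby
require 'open-uri'

# Hypothetical header value; any browser-like string could be substituted.
USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/0.1)"

# Builds the header options that open-uri accepts alongside the URL.
def fetch_options(user_agent = USER_AGENT)
  { "User-Agent" => user_agent }
end

# Returns the page body as a String. Calling this performs a real
# network request, so it is only defined here, not invoked.
def fetch_html(url, user_agent = USER_AGENT)
  URI.open(url, fetch_options(user_agent)) { |page| page.read }
end
```

The string returned by fetch_html("http://youtube.com") could then be handed to a parser, for example Hpricot(html) or Nokogiri::HTML(html), which keeps fetching and parsing separate as described above.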
 
David Masover

Martin said:
Hpricot is an html parser, I don't think it concerns itself with
actually fetching the page. Use mechanize for that.

What's more, mechanize doesn't even use hpricot anymore -- it uses nokogiri.
 
