open-uri question

akanksha

I am using open-uri for the first time. I need to visit a bunch of URLs
and gather some data. Here is a small code snippet:

require 'open-uri' # allows the use of a file-like API for URLs
open("http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal Server Error (OpenURI::HTTPError)
        from /usr/local/lib/ruby/1.8/open-uri.rb:629:in `buffer_open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:167:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:165:in `open_loop'
        from /usr/local/lib/ruby/1.8/open-uri.rb:135:in `open_uri'
        from /usr/local/lib/ruby/1.8/open-uri.rb:531:in `open'
        from /usr/local/lib/ruby/1.8/open-uri.rb:86:in `open'
        from test.rb:2

However
require 'open-uri' # allows the use of a file-like API for URLs
open("http://www.google.com/") { |file|
  lines = file.read
  puts lines
}

works just fine. What am I doing wrong?
 
ChrisH

akanksha said:
I am using open-uri for the first time. I need to visit a bunch of URLs
and gather some data. Here is a small code snippet:

require 'open-uri' # allows the use of a file-like API for URLs
open("http://no-way-outspaik375.spaces.msn.com/") { |file|
  lines = file.read
  puts lines
}

and here is the error I get
ruby test.rb
/usr/local/lib/ruby/1.8/open-uri.rb:290:in `open_http': 500 Internal Server Error (OpenURI::HTTPError)
....

You can see some info on HTTP 500 errors here:
http://www.checkupdown.com/status/E500.html

Maybe the service was down?
Or they may have restricted it to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...
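When visiting a whole list of URLs, it can help to see which status each server actually returned instead of letting the first failure abort the script. The following is only a sketch: `describe_http_error` and `fetch_or_report` are hypothetical helper names, and `URI.open` is the spelling used by current Ruby versions (on Ruby 1.8, as in the backtrace above, plain `open` is used instead).

```ruby
require 'open-uri'

# Hypothetical helper: summarize an OpenURI::HTTPError.
# error.io is the response object; its status is a [code, reason] pair.
def describe_http_error(error)
  code, reason = error.io.status
  "HTTP #{code} #{reason}"
end

# Hypothetical helper: return the page body, or nil after printing
# the HTTP status to stderr if the server answered with an error.
def fetch_or_report(url)
  URI.open(url) { |file| file.read }
rescue OpenURI::HTTPError => e
  warn "#{url}: #{describe_http_error(e)}"
  nil
end

# Example call (performs a real HTTP request):
#   body = fetch_or_report('http://www.google.com/')
```

A 500 that shows up only from the script and never from a browser would be consistent with the server treating the two clients differently, though it does not prove it.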

Cheers
 
akanksha

ChrisH said:
Maybe the service was down?

The service was not down. Both URLs open in a browser.

ChrisH said:
Or they may have restricted it to prevent scraping?
You may need to provide some info to fool the site into
thinking you're a regular browser...

How would I go about doing that? Could you please point me to some
info?
Thank you.
 
ara.t.howard

akanksha said:
The service was not down. Both URLs open in a browser.

How would I go about doing that? Could you please point me to some
info?
Thank you.

you need to set the user-agent header to that of a 'real' browser. something like 'Mozilla/4.0'
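a minimal sketch of that suggestion, assuming the site only checks the User-Agent header. open-uri treats string keys in the options hash as HTTP request headers; `browser_headers` and `fetch_as_browser` are hypothetical helper names, and 'Mozilla/4.0' is just the example value from above. `URI.open` is the current spelling; on ruby 1.8 plain `open(url, headers)` takes the same hash.

```ruby
require 'open-uri'

# Hypothetical helper: request headers that make the script look like a browser.
# The default User-Agent string is only an example value.
def browser_headers(user_agent = 'Mozilla/4.0')
  { 'User-Agent' => user_agent }
end

# Hypothetical helper: fetch a page body while sending the browser-like headers.
def fetch_as_browser(url, user_agent = 'Mozilla/4.0')
  URI.open(url, browser_headers(user_agent)) { |file| file.read }
end

# Example call (performs a real HTTP request):
#   puts fetch_as_browser('http://no-way-outspaik375.spaces.msn.com/')
```

whether this is enough for a given site is not guaranteed; some servers look at other headers as well.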

-a
 
