Stuck in a Redirect Loop While Crawling

Matt White

Hello,

I am writing a crawler in Ruby to crawl websites. One of the sites I
crawl is very picky about headers, so I am mimicking my Firefox browser
as closely as possible. One of the GETs I make to this site results in
a redirect response. I take the 'Location' field from the redirect
header and go there. When Firefox sends its GET to this location, it
gets a 200 OK response. However, I keep getting redirected every time.

Here is what Firefox is sending:

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Keep-Alive: 300
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5
Cookie: sessionid=6d7dd6277ec64983bf642760d7d77d6a
Connection: keep-alive
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Host: <hostname here>

And here is how the server responds to Firefox:

HTTP/1.x 200 OK
Date: Tue, 12 Jun 2007 17:30:20 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Cache-Control: private
Expires: Tue, 12 Jun 2007 17:29:18 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 81118

I am sending these exact same headers using Ruby's Net::HTTP#get method:

require 'net/http'
server = Net::HTTP.new(uri.host, uri.port)
response = server.get(uri.request_uri, headers)
data = response.body

where headers is a hash with the exact same keys and values as the
Firefox headers above (the cookie value differs, of course, as that is
retrieved and stored dynamically). But I always get redirected to the
exact same URL that I just requested. This is the response I get:

RESPONSE: #<Net::HTTPFound:0x300c604>
Printing Response:

cache-control: private
expires: Tue, 12 Jun 2007 18:17:26 GMT
x-aspnet-version: 1.1.4322
content-type: text/html; charset=utf-8
x-powered-by: ASP.NET
date: Tue, 12 Jun 2007 18:18:26 GMT
microsoftofficewebserver: 5.0_Pub
server: Microsoft-IIS/6.0
content-length: 200
location: <exact same URL I just GETed>
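
For reference, a dump like the one above can be produced with Net::HTTPResponse#each_header, which yields header names in lowercase; that is why the names here differ in case from the Firefox capture. A minimal sketch, where dump_headers is a hypothetical helper name and not something from the original post:

```ruby
require 'net/http'

# Hypothetical helper: returns the status line and every response header
# as an array of printable strings.
def dump_headers(response)
  lines = ["RESPONSE: #{response.code} #{response.message}"]
  # each_header yields lowercased names and comma-joined values.
  response.each_header { |name, value| lines << "#{name}: #{value}" }
  lines
end
```

Calling `puts dump_headers(response)` after the GET prints one header per line.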

Can anyone enlighten me as to what I am doing differently that makes
the site redirect me to the same place? I can't tell if it's something
I'm doing wrong or something Ruby is doing that is not the same as
what Firefox is doing. Thanks.
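
One way to narrow a loop like this down is to follow redirects with a hop limit and to carry forward any cookies the server sets along the way. The sketch below is only an illustration: fetch_once and follow_redirects are hypothetical names, and the idea that the server issues a cookie on the 302 is an assumption (the dump above shows no Set-Cookie header, so that may not be the cause here).

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: performs a single GET and returns the response.
def fetch_once(uri, headers)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.get(uri.request_uri, headers)
end

# Follows redirects for at most max_hops requests. Cookies from any
# Set-Cookie response header are merged into the Cookie request header
# before the next hop. Raises if the limit is reached.
def follow_redirects(url, headers, max_hops: 5, fetcher: method(:fetch_once))
  uri = URI.parse(url)

  # Seed the cookie jar from the Cookie header we were given, if any.
  cookies = {}
  headers.fetch('Cookie', '').split(/;\s*/).each do |pair|
    name, value = pair.split('=', 2)
    cookies[name] = value if name && value
  end

  max_hops.times do
    request_headers = headers.dup
    unless cookies.empty?
      request_headers['Cookie'] = cookies.map { |k, v| "#{k}=#{v}" }.join('; ')
    end

    response = fetcher.call(uri, request_headers)

    # Keep only the name=value part of each Set-Cookie line.
    (response.get_fields('set-cookie') || []).each do |line|
      name, value = line.split(';', 2).first.to_s.split('=', 2)
      next if name.nil? || name.strip.empty?
      cookies[name.strip] = value.to_s
    end

    location = response['location']
    return response unless response.is_a?(Net::HTTPRedirection) && location

    # URI#merge resolves relative Location values against the current URL.
    uri = uri.merge(location)
  end

  raise "gave up after #{max_hops} redirects; last URL was #{uri}"
end
```

Usage would be `response = follow_redirects(uri.to_s, headers)`. The fetcher parameter exists only so the loop can be exercised without a network connection.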
 
Matt White

Presumably this is from LiveHTTPHeaders? I note that the Referer header
is not included herein, but Firefox does send those data by default.
Perhaps that's the substantive difference between the Firefox request
and the Net::HTTP request? Just a thought.

- donald
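
A quick way to check for a difference like this is to diff the captured browser headers against the hash passed to Net::HTTP. The following is a rough sketch; header_diff is a hypothetical helper, and it compares names case-insensitively because HTTP header names are not case-sensitive.

```ruby
# Hypothetical helper: reports which headers the browser sent that the
# crawler does not, which the crawler adds, and which differ in value.
def header_diff(browser, crawler)
  normalize = lambda do |headers|
    headers.each_with_object({}) do |(name, value), out|
      out[name.to_s.downcase] = value.to_s.strip
    end
  end

  b = normalize.call(browser)
  c = normalize.call(crawler)

  {
    missing: b.keys - c.keys,                                # only in the browser request
    extra:   c.keys - b.keys,                                # only in the crawler request
    changed: (b.keys & c.keys).reject { |k| b[k] == c[k] }   # same name, different value
  }
end
```

For example, `header_diff(firefox_headers, headers)` would list `referer` under `:missing` if the browser sent it and the crawler did not (both variable names are placeholders).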

Donald,

Good thought. The GET that Firefox sent right before this one did have
the Referer field, but it wasn't there for this one, so I removed
it. Any other ideas?

Matt
 
