"HTTP error -1" from urllib2

John Nagle

I'm getting a weird error from urllib2 when opening certain
URLs. The code works for most sites, but not all of them.
Here's the traceback:

[Thread-2] InfoSitePage EXCEPTION while processing page
"http://www.fourmilab.ch": Problem with page "http://www.fourmilab.ch": HTTP
error -1 - ..
Traceback (most recent call last):
  File "D:\projects\sitetruth\InfoSitePage.py", line 318, in httpfetch
    fd = url_opener.open(self.requestedurl) # open file by url
  File "D:\projects\sitetruth\miscutils.py", line 149, in open
    result = urllib.FancyURLopener.open(self, url, *args)
  File "D:\python24\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "D:\python24\lib\urllib.py", line 322, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "D:\python24\lib\urllib.py", line 339, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "D:\projects\sitetruth\miscutils.py", line 144, in http_error_default
    raise InfoException.InfoException(self.url, 'HTTP error %s - %s.' %
        (errcode, errmsg))
InfoException: Problem with page "http://www.fourmilab.ch": HTTP error -1 - ..

This fails identically using Python 2.4 on a Windows desktop and on Python 2.5
on a Linux server.

The site being accessed reads fine in a browser. It's not a redirect, and it
doesn't insist on cookies.

See "http://mail.python.org/pipermail/python-list/2005-March/314301.html"
for another problem involving "HTTP error -1".

John Nagle
 
John J. Lee

John Nagle said:
I'm getting a weird error from urllib2 when opening certain
URLs. The code works for most sites, but not all of them.
Here's the traceback: [...]
InfoException: Problem with page "http://www.fourmilab.ch": HTTP error -1 - ..

This fails identically using Python 2.4 on a Windows desktop and on Python 2.5
on a Linux server.

The site being accessed reads fine in a browser. It's not a redirect,
and it doesn't insist on cookies.

See "http://mail.python.org/pipermail/python-list/2005-March/314301.html"
for another problem involving "HTTP error -1".

Can you create an example (preferably small) that fails? Feel free to
email it to me if it includes something you don't want to post.

Simply fetching the URL you mention with urllib2.urlopen() works for
me, so I guess something extra is needed to reproduce the bug:

import urllib2
r = urllib2.urlopen("http://www.fourmilab.ch")
print r.read()


John
 
John Nagle

The crash is a known bug, and is fixed in the Subversion repository,
but not in any released version. The problem is that if the server
returns a blank line instead of "HTTP 1", httplib goes off into
some old HTTP 0.9 code that's broken.
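
To make the failure mode concrete, here is a minimal sketch of a check on the
first line of a response. It is not httplib's code; classify_status_line is a
hypothetical helper, written in Python 3 syntax, that only illustrates the
difference between a proper status line, a blank line, and a line that does
not start with "HTTP/" (which an old client might treat as an HTTP 0.9 body).

```python
def classify_status_line(line):
    """Roughly classify the first line of an HTTP response.

    Hypothetical helper for illustration only; it does not reproduce
    httplib's parsing. Returns a (kind, status_code) tuple.
    """
    stripped = line.strip()
    if not stripped:
        # The server sent a blank line where the status line should be.
        return ("blank", None)
    if not stripped.startswith("HTTP/"):
        # No version prefix: could be an HTTP 0.9 style body, or garbage.
        return ("not-http", None)
    parts = stripped.split(None, 2)
    if len(parts) < 2 or not parts[1].isdigit():
        # Version prefix present, but no usable numeric status code.
        return ("malformed", None)
    return ("ok", int(parts[1]))
```

A caller could treat anything other than "ok" as a protocol error and report
it explicitly, rather than passing along a placeholder code such as -1.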

John Nagle
 
John Nagle

John J. Lee said:
Can you create an example (preferably small) that fails? Feel free to
email it to me if it includes something you don't want to post.

It's not a Python problem, as it turns out. It's a problem in,
surprisingly, Coyote Point load balancers.

This fails:
====
telnet www.coyotepoint.com 80
GET / HTTP/1.0
Host: www.fourmilab.ch
User-agent: am

====

This works:
====
telnet www.coyotepoint.com 80
GET / HTTP/1.0
Host: www.fourmilab.ch
User-agent: an

====

Note the difference in the "User-agent" field; "m" vs. "n".

There's some problem in Coyote Point Equalizer load balancers
in USER-AGENT parsing. If it sees a USER-AGENT string ending in
"m" but with no earlier "m" in the string, and the USER-AGENT field
is the last field in the HTTP header, it drops the packet. One can make
this happen talking to the HTTP server with a Telnet client.
If you paste the sections between "===" lines above into a Windows
command line window, you can demonstrate this too. (Remember to
copy the blank line that ends the header.)
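
The same test can be scripted instead of pasted into Telnet. The following is
only a sketch, in Python 3 syntax rather than the 2.4/2.5 used elsewhere in
this thread. build_request and probe are hypothetical helper names, and the
port, timeout, and 4096-byte read size are assumptions. The request bytes
match the Telnet input above, including the blank line that ends the header.

```python
import socket


def build_request(host_header, user_agent, path="/"):
    """Build the raw HTTP/1.0 request used in the Telnet test.

    User-agent is deliberately the last header field, followed by the
    blank line that terminates the header block.
    """
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host_header,
        "User-agent: %s" % user_agent,
        "",  # blank line ending the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(server, host_header, user_agent, port=80, timeout=10.0):
    """Send the request to server:port and return the first response line.

    Returns None if nothing comes back before the timeout or the
    connection closes without data, which is how a dropped request
    would look from the client side.
    """
    request = build_request(host_header, user_agent)
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return None
    if not data:
        return None
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling probe("www.coyotepoint.com", "www.fourmilab.ch", "am") and then again
with "an" would repeat the two Telnet sessions; a None result for the first
and a status line for the second would match what is described above. Whether
those hosts still behave this way is not something the sketch can promise.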

We found this because we were using a user agent string of
"SiteTruth.com rating system", which ends in "m" but doesn't
contain any other "m" characters. A site run by people we know
wouldn't respond, and we've been working to figure out why. They
own a Coyote Point Equalizer, and after much digging through
log files, it became clear that the load balancer was dropping
these packets, even though it wasn't configured to do so.
So we tried Coyote Point's own site, and it has exactly the
same problem. It's thus probably a generic problem with
Coyote Point load balancers. It's not a configuration
problem; we've checked the load balancer's configuration file.

That load balancer uses regular expressions to parse HTTP
headers. My guess is that we're going to find a "\m" somewhere
that a "\n" was intended.

I'll be on the phone to Coyote Point on Monday.

John Nagle
 
