urllib leaves sockets open?

Chris Tavares · Aug 21, 2005

Hi all. I'm currently tracking down a problem in a little script[1] I have,
and I was hoping that those more experienced than myself could weigh in.

The script's job is to grab the status page off a DLink home router. This is
a really simple job: I just use urllib.urlopen() to grab the status page.
The router uses HTTP Basic authentication, so I've subclassed FancyURLopener
to supply the credentials.

This all worked fine with an older router, but with the newer model there's
a long delay between sending the authentication information and actually
getting the response back. When just going in via a browser, there is no such
delay.

I did a little work with a tracing proxy, and I noticed something
interesting. urllib first makes an HTTP request without authentication
information. This gets back an HTTP 401 error code, as expected. urllib then
opens a second socket, and sends the Authorization header, again just as
expected.

Here's what I noticed: The socket for the first request that failed is still
connected. It looks like what's happening is that the router's only allowing
a single HTTP connection at a time. As a result, the second, authenticated
request doesn't get its response until there's some kind of timeout and
the first socket disconnects.

Is this normal behavior for urllib? Is there a way to force that initial
socket closed earlier? Is there something else I need to do?

Thanks for any insight,

-Chris

[1] The script in question is:

import urllib

router_address = "xxx"
router_port = 80
router_user = "user"
router_password = "password"

class DI604Opener( urllib.FancyURLopener ):
    # Called by urllib when the server answers 401; supplies the credentials
    def prompt_user_passwd( self, host, realm ):
        return ( router_user, router_password )

# Make urllib.urlopen() use the credential-aware opener
urllib._urlopener = DI604Opener()

#
# Kick off the process when run from the command line
#
if __name__ == "__main__":
    status_page = urllib.urlopen( "http://%s:%s/status.htm" % ( router_address,
                                                                 router_port ) )
    print status_page.read()

Paul Rubin · Aug 21, 2005

Chris Tavares said:
Is this normal behavior for urllib? Is there a way to force that initial
socket closed earlier? Is there something else I need to do?

I'd say open a sourceforge bug. There may be a way around it with the
fancy opener methods of urllib2, but it's a bug if regular urllib
opens a second socket without closing the first one. For http 1.1
it should be able to use just one socket anyway.
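
One possible workaround along those lines, offered as a rough sketch rather than a confirmed fix: send the Basic credentials on the very first request, so the router never has to answer with a 401 and there is no unauthenticated socket left hanging. The helper names build_request and fetch_status are made up for illustration, and this assumes the router accepts credentials it has not yet asked for. The import fallback is only there so the same sketch also runs on Python 3, where urllib2 was folded into urllib.request.

```python
import base64

try:
    import urllib2                          # Python 2
except ImportError:
    import urllib.request as urllib2        # Python 3 successor of urllib2


def build_request(url, user, password):
    """Build a request that already carries the Basic credentials.

    Hypothetical helper: because the Authorization header goes out with the
    first request, no separate unauthenticated request is made.
    """
    raw = ("%s:%s" % (user, password)).encode("ascii")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib2.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_status(url, user, password):
    """Fetch the page and close the response explicitly afterwards."""
    response = urllib2.urlopen(build_request(url, user, password))
    try:
        return response.read()
    finally:
        response.close()
```

With the values from the original script this would be called as fetch_status("http://%s:%s/status.htm" % (router_address, router_port), router_user, router_password).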

Chris Tavares · Aug 21, 2005

Paul Rubin said:
I'd say open a sourceforge bug. There may be a way around it with the
fancy opener methods of urllib2, but it's a bug if regular urllib
opens a second socket without closing the first one. For http 1.1
it should be able to use just one socket anyway.

Thanks, I'll do some poking around in urllib first and see if I can narrow
it down.

Is there a way to do HTTP 1.1 with urllib? The docs say 0.9 and 1.0 only.

Thanks,

-Chris

Paul Rubin · Aug 21, 2005

Chris Tavares said:
Is there a way to do HTTP 1.1 with urllib? The docs say 0.9 and 1.0 only.

I'm not sure. urllib2 is worth a try, though I don't know whether it
speaks HTTP 1.1 either.
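
For the HTTP 1.1 question, the lower-level httplib module (http.client on Python 3) does send HTTP/1.1 requests, so the "just one socket" idea can be sketched with it directly: make the first request, read the 401 reply completely, then send the authenticated request on the same connection object. This is a sketch under assumptions, not a tested fix for the DI-604: the function names are hypothetical, and it only reuses the socket if the router keeps the connection open after the 401. If the router closes it, httplib reconnects on the next request, but by then the first socket is already closed.

```python
import base64

try:
    import httplib                          # Python 2
except ImportError:
    import http.client as httplib           # Python 3 name for httplib


def basic_auth_value(user, password):
    """Return the value for an HTTP Basic Authorization header."""
    raw = ("%s:%s" % (user, password)).encode("ascii")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def fetch_on_one_connection(host, port, path, user, password):
    """GET path, answering a 401 challenge on the same connection object.

    Hypothetical helper; returns (status, body) of the final response.
    """
    conn = httplib.HTTPConnection(host, port)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()      # drain the reply before sending again
        if response.status == 401:
            headers = {"Authorization": basic_auth_value(user, password)}
            conn.request("GET", path, headers=headers)
            response = conn.getresponse()
            body = response.read()
        return response.status, body
    finally:
        conn.close()                # never leave the socket open behind us
```

For the script above the call would look like fetch_on_one_connection(router_address, router_port, "/status.htm", router_user, router_password).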
