urllib2 hangs "forever" where there is no network interface

dumbkiwi

I have written a script that uses the urllib2 module to download web
pages for parsing.

If there is no network interface, urllib2 hangs for a very long time
before it raises an exception. I have set the socket timeout with
socket.setdefaulttimeout(); however, where there is no network
interface, this seems to be ignored - presumably because, without a
network interface, there is nothing for the socket module to interact
with.

So, can someone point me in the right direction, so that I can catch
an exception where there is no network interface?
 
John J. Lee

dumbkiwi said:
> I have written a script that uses the urllib2 module to download web
> pages for parsing.
>
> If there is no network interface, urllib2 hangs for a very long time
> before it raises an exception. I have set the socket timeout with
> socket.setdefaulttimeout(), however, where there is no network
> interface, this seems to be ignored - presumably, because without a
> network interface, there is nothing for the socket module to interact
> with.
>
> So, can someone point me in the right direction, so that I can catch
> an exception where there is no network interface?

Are you on Windows or something Unixy?

Presumably Windows? (Unix systems almost always have at least a
loopback interface)


John
 
dumbkiwi

> Are you on Windows or something Unixy?

Linux

> Presumably Windows? (Unix systems almost always have at least a
> loopback interface)
>
> John

Sorry, I should have been more specific. The network interfaces are
up (i.e. lo and eth1); the problem occurs when the wireless connection
has dropped out. Is the best solution to test for a wireless connection
through /proc before trying to download data?
 
John J. Lee

(I'm having news trouble, sorry if anybody sees a similar reply three
times...)

dumbkiwi said:
>>> If there is no network interface, urllib2 hangs for a very long time
>>> before it raises an exception. I have set the socket timeout with
>>> socket.setdefaulttimeout(), however, where there is no network
>>> interface, this seems to be ignored - presumably, because without a
>>> network interface, there is nothing for the socket module to interact
>>> with.
[...]
>> Presumably Windows? (Unix systems almost always have at least a
>> loopback interface)
>>
>> John

> Sorry, I should have been more specific. The network interfaces are
> up - ie lo and eth1, it's where the wireless connection has dropped
> out.

The underlying problem is that Python's socket timeout is implemented
using select() or poll(). Those system calls only allow timing out
activity on file descriptors (e.g. sockets). The problem you're
seeing is caused by getaddrinfo() blocking for a long time, and that
function doesn't involve file descriptors. The problem should really
be fixed at the C level (in Modules/socketmodule.c), using something
like alarm() or a thread to apply a timeout to getaddrinfo() calls.
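
To see where the time actually goes, the name lookup can be timed on its own. The following is only a rough sketch, written for Python 3 rather than the Python 2 that urllib2 belongs to; time_name_lookup and its resolver parameter are hypothetical names introduced here for illustration, not part of any library.

```python
import socket
import time


def time_name_lookup(host, port=80, resolver=socket.getaddrinfo):
    """Return (seconds_taken, result_or_exception) for a single name lookup.

    The lookup runs in the calling thread, so it blocks for as long as the
    system resolver does. socket.setdefaulttimeout() only affects operations
    on socket objects and is not consulted here.
    """
    start = time.monotonic()
    try:
        outcome = resolver(host, port)
    except socket.gaierror as exc:
        # Resolution failed outright; hand the error back to the caller.
        outcome = exc
    return time.monotonic() - start, outcome
```

Calling socket.setdefaulttimeout(5) and then time_name_lookup("www.example.com") while the link is down should show an elapsed time governed by the resolver configuration rather than by the 5-second socket timeout.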

> Is the best solution to test for a wireless connection through /proc
> before trying to download data?

That may be a good practical solution.
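
A check along those lines might look like the sketch below (Python 3). It assumes the kernel exposes /proc/net/wireless in the traditional wireless-extensions layout: two header lines, then one line per interface with the status word and link quality as the first two columns. Not every driver populates that file, and a non-zero link quality does not prove that the network is reachable. The function names are made up for this example.

```python
def parse_wireless_links(text):
    """Map interface name -> link quality from /proc/net/wireless contents."""
    links = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 2:
            continue                             # not an interface line
        try:
            # fields[0] is the status word, fields[1] the link quality ("54.")
            links[name.strip()] = float(fields[1])
        except ValueError:
            continue
    return links


def wireless_is_up(interface, path="/proc/net/wireless"):
    """Best-effort check: True if the interface reports a non-zero link quality."""
    try:
        with open(path) as handle:
            links = parse_wireless_links(handle.read())
    except OSError:
        return False                             # file missing or unreadable
    return links.get(interface, 0.0) > 0.0
```

With this, the download could be skipped whenever wireless_is_up("eth1") returns False.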

Another workaround that might be useful is to do your DNS lookups only
once, then use only IP addresses.
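
As a rough illustration of that workaround (Python 3, where urllib2's functionality lives in urllib.request): resolve each host once while the link is known to be good, then build requests against the cached IP address and pass the original name in the Host header. resolve_once and request_by_ip are hypothetical helpers, and the sketch assumes plain HTTP over IPv4; HTTPS would need extra care because certificates are checked against the host name.

```python
import socket
import urllib.request

_address_cache = {}


def resolve_once(host, resolver=socket.gethostbyname, cache=_address_cache):
    """Look the host up on first use and reuse the cached address afterwards."""
    if host not in cache:
        cache[host] = resolver(host)    # may block; do this while the link is up
    return cache[host]


def request_by_ip(host, path, resolver=socket.gethostbyname, cache=_address_cache):
    """Build a Request that targets the cached IP but names the real host."""
    ip = resolve_once(host, resolver, cache)
    return urllib.request.Request(
        "http://%s%s" % (ip, path),
        headers={"Host": host},         # lets name-based virtual hosts still work
    )
```

The returned object can be handed to urllib.request.urlopen() as usual; since no name lookup is involved at that point, the socket timeout applies to the whole call.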


John
 
John J. Lee

(I'm having news trouble, sorry if anybody sees a similar reply three
times...)

dumbkiwi said:
>>>> If there is no network interface, urllib2 hangs for a very long time
>>>> before it raises an exception. I have set the socket timeout with
>>>> socket.setdefaulttimeout(), however, where there is no network
>>>> interface, this seems to be ignored - presumably, because without a
>>>> network interface, there is nothing for the socket module to interact
>>>> with. [...]
>>> Presumably Windows? (Unix systems almost always have at least a
>>> loopback interface)
>>>
>>> John

>> Sorry, I should have been more specific. The network interfaces are
>> up - ie lo and eth1, it's where the wireless connection has dropped
>> out.

> The underlying problem is that Python's socket timeout is implemented
> using select() or poll(). Those system calls only allow timing out
> activity on file descriptors (e.g. sockets). The problem you're
> seeing is caused by getaddrinfo() blocking for a long time, and that
> function doesn't involve file descriptors. The problem should really
> be fixed at the C level (in Modules/socketmodule.c), using something
> like alarm() or a thread to apply a timeout to getaddrinfo() calls.

Seems doing this portably with threads is a bit of a nightmare,
actually. You'd have to extend every one of CPython's thread
implementations (pthreads, Solaris threads, etc. etc. etc.) -- and I
don't even know if it's possible on all systems.

And since the GIL is released around the getaddrinfo() call in
socketmodule.c (and that can't be changed), one can't guarantee that a
Python thread won't set a different signal handler, so alarm() is not
good.

And of course Windows is a separate case.

> That may be a good practical solution.
>
> Another workaround that might be useful is to do your DNS lookups only
> once, then use only IP addresses.

The portable way to actually solve what I assume is your underlying
problem (latency in a GUI) is to have a Python thread or separate
process do your urlopen()s (this can be done at the Python level).
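
One way this could look at the Python level is sketched below (Python 3, with urllib.request standing in for urllib2). The fetch runs in a daemon thread, and the caller waits on a queue for at most a given number of seconds. fetch_with_deadline is a made-up name, and the approach only stops the caller from waiting: the worker thread stays stuck in getaddrinfo() until the resolver gives up on its own.

```python
import queue
import threading
import urllib.request


def _default_fetch(url):
    """Plain blocking download; may hang in the name lookup."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_deadline(url, deadline, fetch=_default_fetch):
    """Run fetch(url) in a daemon thread and wait at most `deadline` seconds.

    Returns the fetched data, re-raises the worker's exception, or raises
    TimeoutError if the worker has not finished when the deadline passes.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put((True, fetch(url)))
        except Exception as exc:        # pass any failure back to the caller
            results.put((False, exc))

    # Daemon thread: a worker that never returns will not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = results.get(timeout=deadline)
    except queue.Empty:
        raise TimeoutError("no answer from %s within %s s" % (url, deadline))
    if not ok:
        raise value
    return value
```

A GUI would normally not block on the queue like this but poll it from its event loop; the blocking wait just keeps the example short.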


John
 
