Question about using urllib2 to load a url

ken

Hi,

I have the following code to load a URL. My question is: what if I try
to load an invalid URL ("http://www.heise.de/"), will I get an IOError,
or will it wait forever?

Thanks for any help.

import urllib2
import cookielib
from urllib2 import Request, urlopen

cj = cookielib.CookieJar()  # assumed: a cookie jar for the HTTPCookieProcessor
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

txdata = None  # assumed: no POST body, so this is a plain GET
txheaders = {'User-agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3'}

def load_url(url):  # assumed wrapper; the original snippet returns from inside a function
    try:
        req = Request(url, txdata, txheaders)
        handle = urlopen(req)
    except IOError, e:
        print e
        print 'Failed to open %s' % url
        return 0
    return handle
 

Kushal Kumaran

ken said:
Hi,

I have the following code to load a URL. My question is: what if I try
to load an invalid URL ("http://www.heise.de/"), will I get an IOError,
or will it wait forever?

Depends on why the URL is invalid. If the URL refers to a non-existent
domain, the DNS lookup will fail and you will get an "urllib2.URLError:
<urlopen error (-2, 'Name or service not known')>". If the name
resolves but the host is not reachable, the connect code will time out
(eventually) and result in an "urllib2.URLError: <urlopen error (113,
'No route to host')>". If the host exists but does not have a web
server running, you will get an "urllib2.URLError: <urlopen error (111,
'Connection refused')>". If a web server is running but the requested
page does not exist, you will get an "urllib2.HTTPError: HTTP Error
404: Not Found".

The URL you gave above does not meet any of these conditions, so it
results in a valid handle to read from.

If, at any time, an error response fails to reach your machine, the
code will have to wait for a timeout. It should not have to wait
forever.
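
Here is a minimal sketch of how those cases can be told apart in code
(HTTPError is a subclass of URLError, so catch it first; the test URLs
are only illustrations):

import urllib2

def check(url):
    try:
        handle = urllib2.urlopen(url)
    except urllib2.HTTPError, e:
        # the server answered, but with an error status (e.g. 404)
        print url, '-> HTTP error:', e.code
    except urllib2.URLError, e:
        # no usable response: DNS failure, connection refused, no route, ...
        print url, '-> failed:', e.reason
    else:
        print url, '-> OK, reading from', handle.geturl()

check('http://www.heise.de/')                  # valid, opens fine
check('http://no-such-host.invalid/')          # DNS failure -> URLError
check('http://www.heise.de/no-such-page-xyz')  # missing page -> HTTPError 404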
 

John J. Lee

Kushal Kumaran said:
If, at any time, an error response fails to reach your machine, the
code will have to wait for a timeout. It should not have to wait
forever.
[...]

...but it might have to wait a long time. Even if you use
socket.setdefaulttimeout(), DNS lookups can block for a long time.
The way around that is to use Python threads (no need to try to "kill"
the thread that's doing the urlopen() -- just ignore that thread if it
takes too long to finish).
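
A rough sketch of that approach (the helper name and the ten-second
limit are just for illustration):

import threading
import urllib2

def fetch(url, limit=10.0):
    result = {}

    def worker():
        try:
            result['handle'] = urllib2.urlopen(url)
        except IOError, e:
            result['error'] = e

    t = threading.Thread(target=worker)
    t.setDaemon(True)      # don't keep the process alive for a stuck lookup
    t.start()
    t.join(limit)
    if t.isAlive():        # still blocked, possibly inside the DNS lookup
        return None        # give up and simply ignore the thread
    if 'error' in result:
        raise result['error']
    return result['handle']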

Looks like 2.6 will have socket timeouts exposed at the urllib2 level
(so no need to call socket.setdefaulttimeout() any more), but the need
to use threads with urllib2 to get timeouts will remain in many cases,
due to the DNS thing (the same applies to urllib, or any other module
that ends up doing DNS lookups).
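
With 2.6 that would look something like this (keyword name per the 2.6
docs; it bounds the socket operations, not a blocking name lookup):

import urllib2

# Python 2.6+: per-call timeout, no socket.setdefaulttimeout() needed
handle = urllib2.urlopen('http://www.heise.de/', timeout=5)   # seconds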


John
 
