Question about using urllib2 to load a url

ken

Hi,

I have the following code to load a URL. My question is: what if I try
to load an invalid URL ("http://www.heise.de/"), will I get an IOError,
or will it wait forever?

Thanks for any help.

import urllib2
import cookielib
from urllib2 import Request, urlopen

cj = cookielib.CookieJar()  # assumed: a cookie jar for the HTTPCookieProcessor
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

txdata = None  # assumed: no POST body, so this is a plain GET
txheaders = {'User-agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3'}

def load_url(url):  # assumed wrapper; the original snippet returns from inside a function
    try:
        req = Request(url, txdata, txheaders)
        handle = urlopen(req)
    except IOError, e:
        print e
        print 'Failed to open %s' % url
        return 0
    return handle
 

Kushal Kumaran

ken said:
Hi,

I have the following code to load a URL. My question is: what if I try
to load an invalid URL ("http://www.heise.de/"), will I get an IOError,
or will it wait forever?

Depends on why the URL is invalid. If the URL refers to a non-existent
domain, the DNS lookup will fail and you will get an "urllib2.URLError:
<urlopen error (-2, 'Name or service not known')>". If the name
resolves but the host is not reachable, the connect code will time out
(eventually) and result in an "urllib2.URLError: <urlopen error (113,
'No route to host')>". If the host exists but does not have a web
server running, you will get an "urllib2.URLError: <urlopen error (111,
'Connection refused')>". If a web server is running but the requested
page does not exist, you will get an "urllib2.HTTPError: HTTP Error
404: Not Found".

The URL you gave above does not meet any of these conditions, so it
results in a valid handle to read from.

If, at any time, an error response fails to reach your machine, the
code will have to wait for a timeout. It should not have to wait
forever.
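
Here is a minimal sketch of how those cases can be told apart in code
(HTTPError is a subclass of URLError, so catch it first; the test URLs
are only illustrations):

import urllib2

def check(url):
    try:
        handle = urllib2.urlopen(url)
    except urllib2.HTTPError, e:
        # the server answered, but with an error status (e.g. 404)
        print url, '-> HTTP error:', e.code
    except urllib2.URLError, e:
        # no usable response: DNS failure, connection refused, no route, ...
        print url, '-> failed:', e.reason
    else:
        print url, '-> OK, reading from', handle.geturl()

check('http://www.heise.de/')                  # valid, opens fine
check('http://no-such-host.invalid/')          # DNS failure -> URLError
check('http://www.heise.de/no-such-page-xyz')  # missing page -> HTTPError 404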
 

John J. Lee

Kushal Kumaran said:
If, at any time, an error response fails to reach your machine, the
code will have to wait for a timeout. It should not have to wait
forever.
[...]

...but it might have to wait a long time. Even if you use
socket.setdefaulttimeout(), DNS lookups can block for a long time.
The way around that is to use Python threads (no need to try to "kill"
the thread that's doing the urlopen() -- just ignore that thread if it
takes too long to finish).
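
A rough sketch of that approach (the helper name and the ten-second
limit are just for illustration):

import threading
import urllib2

def fetch(url, limit=10.0):
    result = {}

    def worker():
        try:
            result['handle'] = urllib2.urlopen(url)
        except IOError, e:
            result['error'] = e

    t = threading.Thread(target=worker)
    t.setDaemon(True)      # don't keep the process alive for a stuck lookup
    t.start()
    t.join(limit)
    if t.isAlive():        # still blocked, possibly inside the DNS lookup
        return None        # give up and simply ignore the thread
    if 'error' in result:
        raise result['error']
    return result['handle']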

Looks like 2.6 will have socket timeouts exposed at the urllib2 level
(so no need to call socket.setdefaulttimeout() any more), but the need
to use threads with urllib2 to get timeouts will remain in many cases,
due to the DNS thing (the same applies to urllib, or any other module
that ends up doing DNS lookups).
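
With 2.6 that would look something like this (keyword name per the 2.6
docs; it bounds the socket operations, not a blocking name lookup):

import urllib2

# Python 2.6+: per-call timeout, no socket.setdefaulttimeout() needed
handle = urllib2.urlopen('http://www.heise.de/', timeout=5)   # seconds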


John
 
