urllib timeout issues


supercooper

I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:

Traceback (most recent call last):
  File "ftp_20070326_Downloads_cooperc_FetchLibreMapProjectDRGs.py", line 108, in ?
    urllib.urlretrieve(fullurl, localfile)
  File "C:\Python24\lib\urllib.py", line 89, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python24\lib\urllib.py", line 222, in retrieve
    fp = self.open(url, data)
  File "C:\Python24\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "C:\Python24\lib\urllib.py", line 322, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "C:\Python24\lib\urllib.py", line 335, in http_error
    result = method(url, fp, errcode, errmsg, headers)
  File "C:\Python24\lib\urllib.py", line 593, in http_error_302
    data)
  File "C:\Python24\lib\urllib.py", line 608, in redirect_internal
    return self.open(newurl)
  File "C:\Python24\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "C:\Python24\lib\urllib.py", line 313, in open_http
    h.endheaders()
  File "C:\Python24\lib\httplib.py", line 798, in endheaders
    self._send_output()
  File "C:\Python24\lib\httplib.py", line 679, in _send_output
    self.send(msg)
  File "C:\Python24\lib\httplib.py", line 646, in send
    self.connect()
  File "C:\Python24\lib\httplib.py", line 630, in connect
    raise socket.error, msg
IOError: [Errno socket error] (10060, 'Operation timed out')


I have searched this forum extensively and tried to avoid timing out,
but to no avail. Does anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didn't.

Thanks.

<--- CODE --->

# Imports needed by the script below (truncated from the original post).
# AddPrintMessage, StartFinishMessage and Timer are helper functions defined
# elsewhere in the poster's script and are not shown here.
import os, sys, socket, string, time, urllib
from time import strftime, localtime

images = [['34095e3','Clayton'],
['35096d2','Clearview'],
['34095d1','Clebit'],
['34095c3','Cloudy'],
['34096e2','Coalgate'],
['34096e1','Coalgate SE'],
['35095g7','Concharty Mountain'],
['34096d6','Connerville'],
['34096d5','Connerville NE'],
['34096c5','Connerville SE'],
['35094f8','Cookson'],
['35095e6','Council Hill'],
['34095f5','Counts'],
['35095h6','Coweta'],
['35097h2','Coyle'],
['35096c4','Cromwell'],
['35095a6','Crowder'],
['35096h7','Cushing']]

exts = ['tif', 'tfw']
envir = 'DEV'
# URL of our image(s) to grab
url = 'http://www.archive.org/download/'
logRoot = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'
logFile = os.path.join(logRoot, 'FetchLibreDRGs_' + strftime('%m_%d_%Y_%H_%M_%S', localtime()) + '_' + envir + '.log')

# Local dir to store files in
fetchdir = logRoot
# Entire process start time
start = time.clock()

msg = envir + ' - ' + "Script: " + os.path.join(sys.path[0], sys.argv[0]) + \
      ' - Start time: ' + strftime('%m/%d/%Y %I:%M:%S %p', localtime()) + \
      '\n--------------------------------------------------------------------------------------------------------------\n\n'
AddPrintMessage(msg)
StartFinishMessage('Start')

# Loop thru image list, grab each tif and tfw
for image in images:
    # Try and set socket timeout default to none
    # Create a new socket connection for every time through list loop
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('archive.org', 80))
    s.settimeout(None)

    s2 = time.clock()
    msg = '\nProcessing ' + image[0] + ' --> ' + image[1]
    AddPrintMessage(msg)
    print msg
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        localfile = fetchdir + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext
        urllib.urlretrieve(fullurl, localfile)
    e2 = time.clock()
    msg = '\nDone processing ' + image[0] + ' --> ' + image[1] + \
          '\nProcess took ' + Timer(s2, e2)
    AddPrintMessage(msg)
    print msg
    # Close socket connection, only to reopen with next run thru loop
    s.close()

end = time.clock()
StartFinishMessage('Finish')
msg = '\n\nDone! Process completed in ' + Timer(start, end)
AddPrintMessage(msg)
 

Gabriel Genellina

I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:

urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')

I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.

You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)
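For example, here is a minimal sketch of that approach, in the Python 2.4
style of the script above (the 30-second timeout, the retry count and the
single example URL are illustrative assumptions, not anything the script
requires):

import socket, urllib

# Fail fast: abort a stalled connection after 30 seconds instead of
# letting it hang for hours (the value is just an example).
socket.setdefaulttimeout(30)

# Pending work: (url, local filename) pairs still to fetch.
# One hypothetical entry, in the same pattern as the script above.
pending = [('http://www.archive.org/download/usgs_drg_ok_34095_e3/o34095e3.tif',
            'o34095e3.tif')]
failed = []

MAX_RETRIES = 3
while pending:
    fullurl, localfile = pending.pop(0)
    for attempt in range(MAX_RETRIES):
        try:
            urllib.urlretrieve(fullurl, localfile)
            break                                   # success, move on
        except (IOError, socket.error), e:
            print 'attempt %d failed for %s: %s' % (attempt + 1, fullurl, e)
    else:
        failed.append((fullurl, localfile))         # give up on this one for now

if failed:
    print 'Could not fetch:', failed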
 

supercooper

I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.

You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)

Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

chad
 

Gabriel Genellina

I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.

You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)

Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first

Exactly. The error is out of your control: maybe the server is down,
unresponsive or overloaded, a proxy has problems, there is some network
problem, etc.
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

Exactly!
Another option: Python is cool, but there is no need to reinvent the
wheel. Use wget instead :)
 

Nick Vatamaniuc

On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <[email protected]>
wrote:
I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.
You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)

Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

chad

Chad,

Just run the retrieval in a Thread. If the thread is not done after x
seconds, then handle it as a timeout and then retry, ignore, quit or
anything else you want.
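
A rough sketch of that (the helper name and the 60-second limit are made
up for illustration; note that the hung download itself cannot be killed,
only abandoned):

import threading, urllib

def fetch(fullurl, localfile):
    urllib.urlretrieve(fullurl, localfile)

def fetch_with_timeout(fullurl, localfile, seconds=60):
    """Run one retrieval in a worker thread; a slow worker counts as a timeout."""
    worker = threading.Thread(target=fetch, args=(fullurl, localfile))
    worker.setDaemon(True)       # a hung download won't keep the process alive at exit
    worker.start()
    worker.join(seconds)         # wait at most `seconds` for it to finish
    return not worker.isAlive()  # True = finished in time, False = treat as timeout

# Example use: retry once before giving up on this (hypothetical) image.
fullurl = 'http://www.archive.org/download/usgs_drg_ok_34095_e3/o34095e3.tif'
if not fetch_with_timeout(fullurl, 'o34095e3.tif'):
    print 'timed out, retrying once...'
    fetch_with_timeout(fullurl, 'o34095e3.tif')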

Even better, what I did for my program was first gather all the URLs (I
assume you can do that), then group them by server, i.e. n images from
foo.com, m from bar.org, and so on. Then start a thread for each server
(possibly with some maximum number of threads); each of those threads is
responsible for retrieving images from only one server (this is to
prevent a DoS pattern). Let each of the server threads start a 'small'
retriever thread for each image (this is to handle the timeout you
mention).

So you have two kinds of threads: one per server to parallelize
downloading, each of which in turn spawns one thread per download to
handle the timeout. This way you will (ideally) saturate your bandwidth,
but you only fetch one image per server at a time, so you still 'play
nice' with each of the servers. If you want a maximum number of server
threads running (in case you have way too many servers to deal with),
then run batches of server threads.
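
In outline, the grouping part might look something like this (it reuses
the fetch_with_timeout helper sketched above; the urlparse-based grouping
and the single example job are assumptions for illustration):

import threading, urlparse

def group_by_server(jobs):
    """jobs: list of (url, localfile) pairs -> dict mapping hostname to its jobs."""
    groups = {}
    for url, localfile in jobs:
        host = urlparse.urlsplit(url)[1]          # network location, e.g. 'www.archive.org'
        groups.setdefault(host, []).append((url, localfile))
    return groups

def server_worker(job_list):
    """One thread per server: fetch that server's images one at a time."""
    for url, localfile in job_list:
        fetch_with_timeout(url, localfile)        # per-download timeout thread, as above

jobs = [('http://www.archive.org/download/usgs_drg_ok_34095_e3/o34095e3.tif',
         'o34095e3.tif')]
threads = [threading.Thread(target=server_worker, args=(job_list,))
           for job_list in group_by_server(jobs).values()]
for t in threads:
    t.start()
for t in threads:
    t.join()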

Hope this helps,
Nick Vatamaniuc
 

supercooper

On Tue, 27 Mar 2007 17:41:44 -0300, supercooper <[email protected]>
wrote:


On Tue, 27 Mar 2007 16:21:55 -0300, supercooper <[email protected]>
wrote:
I am downloading images using the script below. Sometimes it will go
for 10 mins, sometimes 2 hours before timing out with the following
error:
urllib.urlretrieve(fullurl, localfile)
IOError: [Errno socket error] (10060, 'Operation timed out')
I have searched this forum extensively and tried to avoid timing out,
but to no avail. Anyone have any ideas as to why I keep getting a
timeout? I thought setting the socket timeout did it, but it didnt.
You should do the opposite: timing out *early* -not waiting 2 hours- and
handling the error (maybe using a queue to hold pending requests)
Gabriel, thanks for the input. So are you saying there is no way to
realistically *prevent* the timeout from occurring in the first

Exactly. The error is out of your control: maybe the server is down,
unresponsive or overloaded, a proxy has problems, there is some network
problem, etc.
place? And by timing out early, do you mean to set the timeout for x
seconds and if and when the timeout occurs, handle the error and start
the process again somehow on the pending requests? Thanks.

Exactly!
Another option: Python is cool, but there is no need to reinvent the
wheel. Use wget instead :)

Gabriel...thanks for the tip on wget...it's awesome! I even built it on
my Mac. It is working like a champ for hours on end...

Thanks!

chad




import os, shutil, string

images = [['34095d2','Nashoba'],
['34096c8','Nebo'],
['36095a4','Neodesha'],
['33095h7','New Oberlin'],
['35096f3','Newby'],
['35094e5','Nicut'],
['34096g2','Non'],
['35096h6','North Village'],
['35095g3','Northeast Muskogee'],
['35095g4','Northwest Muskogee'],
['35096f2','Nuyaka'],
['34094e6','Octavia'],
['36096a5','Oilton'],
['35096d3','Okemah'],
['35096c3','Okemah SE'],
['35096e2','Okfuskee'],
['35096e1','Okmulgee Lake'],
['35095f7','Okmulgee NE'],
['35095f8','Okmulgee North'],
['35095e8','Okmulgee South'],
['35095e4','Oktaha'],
['34094b7','Old Glory Mountain'],
['36096a4','Olive'],
['34096d3','Olney'],
['36095a6','Oneta'],
['34097a2','Overbrook']]

wgetDir = 'C:/Program Files/wget/o'
exts = ['tif', 'tfw']
url = 'http://www.archive.org/download/'
home = '//fayfiler/seecoapps/Geology/GEOREFRENCED IMAGES/TOPO/Oklahoma UTMz14meters NAD27/'

for image in images:
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        os.system('wget %s -t 10 -a log.log' % fullurl)
        shutil.move(wgetDir + image[0] + '.' + ext,
                    home + 'o' + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext)
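
For what it's worth, wget also has a -T/--timeout option, and the return
value of os.system() can be checked so a failed tile is remembered rather
than silently skipped. A possible variation on the loop above (the
30-second value and the failed list are illustrative additions):

failed = []
for image in images:
    for ext in exts:
        fullurl = url + 'usgs_drg_ok_' + image[0][:5] + '_' + image[0][5:] + '/o' + image[0] + '.' + ext
        # -T: network timeout in seconds, -t: number of tries, -a: append to a log file
        status = os.system('wget %s -T 30 -t 10 -a log.log' % fullurl)
        if status != 0:
            failed.append(fullurl)     # remember it so it can be retried or reported later
        else:
            shutil.move(wgetDir + image[0] + '.' + ext,
                        home + 'o' + image[0] + '_' + string.replace(image[1], ' ', '_') + '.' + ext)

if failed:
    print 'Could not fetch:', failed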
 
