chrispoliquin
Hi,
I have a small Python script to fetch some pages from the internet.
There are a lot of pages and I am looping through them and then
downloading the page using urlretrieve() in the urllib module.
The problem is that after 110 pages or so the script sort of hangs and
then I get the following traceback:
Traceback (most recent call last):
  File "volume_archiver.py", line 21, in <module>
    urllib.urlretrieve(remotefile,localfile)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 89, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 222, in retrieve
    fp = self.open(url, data)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 328, in open_http
    errcode, errmsg, headers = h.getreply()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 1195, in getreply
    response = self._conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 924, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 385, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 343, in _read_status
    line = self.fp.readline()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/socket.py", line 331, in readline
    data = recv(1)
IOError: [Errno socket error] (54, 'Connection reset by peer')
My script code is as follows:
-----------------------------------------
import os
import urllib

volume_number = 149 # The volumes number 150 to 544
while volume_number < 544:
    volume_number = volume_number + 1
    localfile = '/Users/Chris/Desktop/Decisions/' + str(volume_number) + '.html'
    remotefile = 'http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=us&navby=vol&vol=' + str(volume_number)
    print 'Getting volume number:', volume_number
    urllib.urlretrieve(remotefile,localfile)
    print 'Download complete.'
-----------------------------------------
Once I have hit the error, running the script again doesn't do much
good: it usually gets two or three pages and then hangs again.
What is causing this?
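
The traceback only shows that the remote end reset the connection; it does not say why. One possibility (an assumption, not something the traceback confirms) is that the server drops clients that request many pages in quick succession. A minimal sketch for testing that idea is below: it pauses between downloads and retries a failed download with a growing wait. It is written for Python 3, where the call is urllib.request.urlretrieve and socket errors derive from OSError; on Python 2.5 the equivalents would be urllib.urlretrieve and IOError. The names fetch_with_retry and download_volumes, the retry count and the pause lengths are all hypothetical choices for illustration.

```python
import os
import time
import urllib.request

# URL prefix taken from the script above; the volume number is appended.
BASE_URL = ('http://caselaw.lp.findlaw.com/scripts/getcase.pl?'
            'court=us&navby=vol&vol=')


def fetch_with_retry(fetch, retries=3, pause=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retries run out.

    Waits pause, 2*pause, 4*pause, ... seconds between attempts and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:  # includes ConnectionResetError and URLError
            if attempt == retries:
                raise
            sleep(pause * 2 ** attempt)


def download_volumes(first, last, directory, delay=2.0):
    """Download volumes first..last (inclusive), pausing between requests."""
    for volume_number in range(first, last + 1):
        localfile = os.path.join(directory, str(volume_number) + '.html')
        remotefile = BASE_URL + str(volume_number)
        print('Getting volume number:', volume_number)
        # Default arguments bind the current URL and path for this attempt.
        fetch_with_retry(
            lambda r=remotefile, l=localfile:
                urllib.request.urlretrieve(r, l))
        print('Download complete.')
        time.sleep(delay)  # assumed politeness delay; tune as needed
```

Calling download_volumes(150, 544, '/Users/Chris/Desktop/Decisions') would cover the same range as the original loop. If the resets stop once the pauses are in place, that would point toward rate limiting on the server side; if they continue, the cause lies elsewhere.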