urllib2 pinger: insight as to use, cause of hang-up?

EP · Jun 6, 2005

Hello patient and tolerant Pythonistas,

While iterating through a long list of arbitrary (and possibly syntactically flawed) URLs with a urllib2 pinging function, I get a hang-up. No exception is raised; however, according to Windows Task Manager, python.exe stops using any CPU time, its memory use neither increases nor decreases, and the script does not progress (it seems permanently stalled). As an example, the function below has been stuck on URL number 364 for about 40 minutes.

Does this simply indicate the need for a time-out function, or could there be something else going on (an error in my usage) that I've overlooked?

If it requires a time-out control, is there a way to implement one without using separate threads? Any best-practice recommendations?

Here's my function:

--------------------------------------------------
import urllib2

def testLinks2(urlList=[]):
    goodLinks=[]
    badLinks=[]
    user_agent = 'mySpider Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    print len(urlList), " links to test"
    count=0
    for url in urlList:
        count+=1
        print count,
        try:
            request = urllib2.Request(url)
            request.add_header('User-Agent', user_agent)
            handle = urllib2.urlopen(request)
            handle.close()   # only the status matters; release the connection
            goodLinks.append(url)
        except urllib2.HTTPError, e:
            # the server answered, but with an error status (404, 500, ...)
            badLinks.append({url:e.code})
            print e.code,": ",url
        except:
            # anything else: URLError (bad host, refused connection), malformed URL, ...
            print "unknown error: ",url
            badLinks.append({url:"unknown error"})
    print len(goodLinks)," working links found"
    return goodLinks, badLinks

good, bad=testLinks2(linkList)
--------------------------------------------------
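For comparison, here is a minimal sketch of the same loop in modern Python 3, where urllib2 has become urllib.request and urllib.error, and urlopen() accepts a per-call timeout argument (Python 2.6 and later accept it too), so no threads are needed. The name check_links, the 10-second default, and returning the failures as a dict mapping URL to reason are choices made for this sketch, not part of the original function.

```python
import urllib.error
import urllib.request

USER_AGENT = 'mySpider Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def check_links(url_list, timeout=10.0):
    """Check each URL; return (good, bad) where bad maps URL -> reason.

    `timeout` (seconds) bounds each blocking socket operation, so a server
    that accepts the connection but never answers raises an exception
    instead of stalling the loop forever.
    """
    good, bad = [], {}
    print(len(url_list), 'links to test')
    for count, url in enumerate(url_list, 1):
        print(count, end=' ', flush=True)
        try:
            # Request() raises ValueError for URLs it cannot parse at all.
            request = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
            with urllib.request.urlopen(request, timeout=timeout):
                good.append(url)
        except urllib.error.HTTPError as e:
            bad[url] = e.code  # server replied with a 4xx/5xx status
        except OSError as e:
            # URLError (DNS failure, refused connection) subclasses OSError,
            # and a read timeout may surface as a bare socket.timeout (also OSError).
            bad[url] = str(getattr(e, 'reason', e))
        except ValueError as e:
            bad[url] = 'malformed URL: %s' % e
    print()
    print(len(good), 'working links found')
    return good, bad
```

Called as good, bad = check_links(linkList, timeout=15), a URL whose server never answers ends up in bad with a timeout reason rather than blocking the rest of the list.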

Thanks in advance for your thoughts.

Eric Pederson

Mahesh · Jun 6, 2005

Timing it out will probably solve it.
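Timing out does not require threads: each socket can carry its own timeout, after which a blocking call raises socket.timeout instead of waiting forever. The sketch below assumes the stall comes from a server that accepts the connection but never answers (the thread does not establish the actual cause). wait_for_reply is a hypothetical helper, and the demo uses a local listening socket that never calls accept() to play the stalled server.

```python
import socket


def wait_for_reply(host, port, timeout):
    """Send a bare HTTP request and wait for the first byte of a reply.

    Returns that byte, b'' if the peer closed the connection, or None if
    nothing arrived within `timeout` seconds. No threads are involved:
    the socket itself enforces the timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode("ascii") + b"\r\n\r\n")
        try:
            return sock.recv(1)
        except socket.timeout:
            return None


if __name__ == "__main__":
    # A listening socket that never calls accept() stands in for a stalled
    # server: the OS completes the TCP handshake, but no reply is ever sent.
    with socket.socket() as silent:
        silent.bind(("127.0.0.1", 0))
        silent.listen(1)
        port = silent.getsockname()[1]
        print(wait_for_reply("127.0.0.1", port, timeout=0.5))  # None, after about 0.5 s
```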

EP · Jun 6, 2005

"Mahesh" advised:

Timing it out will probably solve it.

Thanks.

Follow-on question regarding implementing a timeout for use by urllib2: I am guessing the simplest way to do this is via socket.setdefaulttimeout(), but I am not sure whether this sets a global parameter and, if so, whether it might be reset by the use of urllib, urllib2, httplib, etc. I assume the socket module and its timeout parameter live in the global namespace, and that I can reset it at will and have it apply to all of the socket module's 'users'. Is that right?
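On the global question: socket.setdefaulttimeout() stores a single process-wide default in the socket module, and it applies to every socket created after the call, whichever module creates it (httplib, and therefore urllib2, included); sockets that already exist keep their own setting. Because the value is global, a small helper that sets it and then restores the previous value keeps the change contained. default_socket_timeout is a hypothetical name used only for this Python 3 sketch.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default timeout for new sockets.

    socket.setdefaulttimeout() is global to the interpreter, so the previous
    value is restored on exit, even if the body raises.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


with default_socket_timeout(15.0):
    s = socket.socket()    # created inside the block, so it inherits the default
    print(s.gettimeout())  # 15.0
    s.close()
```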

(TIA)

[experimenting]

Traceback (most recent call last):
  File "<pyshell#52>", line 1, in -toplevel-
    urllib2plus.urlopen('http://zomething.com')
  File "C:\Python24\lib\urllib2plus.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2plus.py", line 361, in open
    response = self._open(req, data)
  File "C:\Python24\lib\urllib2plus.py", line 379, in _open
    '_open', req)
  File "C:\Python24\lib\urllib2plus.py", line 340, in _call_chain
    result = func(*args)
  File "C:\Python24\lib\urllib2plus.py", line 1024, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python24\lib\urllib2plus.py", line 999, in do_open
    raise URLError(err)

Traceback (most recent call last):
  File "<pyshell#60>", line 1, in -toplevel-
    urllib2plus.urlopen('http://zomething.com')
  File "C:\Python24\lib\urllib2plus.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2plus.py", line 361, in open
    response = self._open(req, data)
  File "C:\Python24\lib\urllib2plus.py", line 379, in _open
    '_open', req)
  File "C:\Python24\lib\urllib2plus.py", line 340, in _call_chain
    result = func(*args)
  File "C:\Python24\lib\urllib2plus.py", line 1024, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python24\lib\urllib2plus.py", line 999, in do_open
    raise URLError(err)

<addinfourl at 12449992 whose fp = <socket._fileobject object at 0x00BE1420>>
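Both tracebacks end at raise URLError(err) with the final message line cut off, so the actual cause is not visible here. When a URLError is caught, its reason attribute carries the underlying error, which is how a timeout can be told apart from a failed name lookup. In Python 3's urllib.request, a timeout while waiting for the response can also escape urlopen() unwrapped, as a plain socket.timeout. A minimal Python 3 sketch; classify_failure and its labels are hypothetical.

```python
import socket
import urllib.error


def classify_failure(exc):
    """Map an exception raised by urllib.request.urlopen() to a short label."""
    if isinstance(exc, urllib.error.HTTPError):
        # Checked first: HTTPError is a subclass of URLError.
        return 'http %d' % exc.code
    if isinstance(exc, urllib.error.URLError):
        reason = exc.reason  # the underlying OSError, or a plain message string
        if isinstance(reason, socket.timeout):
            return 'timeout'
        if isinstance(reason, socket.gaierror):
            return 'dns failure'  # the host name could not be resolved
        return 'url error: %s' % reason
    if isinstance(exc, socket.timeout):
        return 'timeout'  # a read timeout that was not wrapped in URLError
    return 'other: %r' % exc
```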

Mahesh · Jun 6, 2005

socket.setdefaulttimeout() is what I have used in the past, and it has
worked well. I think it is set in the global namespace, though I could
be wrong. I think it retains its value within the module it is called
in. If you use it in a different module it will probably get reset,
though that is easy enough to test.
