urllib hangs


Jay Donnell

This is a basic version of my code:

import urllib

# reUrl (a compiled regex) and urls are defined elsewhere
def check(url):
    fp = urllib.urlopen(url)
    lines = fp.readlines()

    #print lines
    for line in lines:
        #print line
        if reUrl.search(line):
            print 'found'
            return 1
    else:
        print 'not found'
        return 0

for url in urls:
    check(url)

This occasionally hangs on certain URLs. If I press Ctrl-C
(Linux), it moves on to the next URL. How can I get this to time out
and move on to the next URL?
 

Benjamin Niemann

Since 2.3 there's socket.setdefaulttimeout(); this should do the job:

import socket
socket.setdefaulttimeout(10)
# blocking socket operations now raise socket.timeout after 10s;
# the default is to wait indefinitely (or at least a very, very long time...)

For older Python versions, ask Google for timeoutsocket.py.
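A minimal sketch of how socket.setdefaulttimeout() might fit into the loop from the original question, written for Python 3 with a fallback import for Python 2 (the `except ... as` syntax needs 2.6 or later, so this is not 2.2 code). `page_matches` is a hypothetical helper name. It returns None instead of hanging when a server stops answering, so the caller can move on to the next URL:

```python
import re
import socket

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

# Sockets created after this call raise socket.timeout when a single
# blocking operation (connect, send, recv) waits longer than 10 seconds.
socket.setdefaulttimeout(10)


def page_matches(url, pattern):
    """Return True if some line of the page matches `pattern`, False if
    none does, or None if the URL could not be fetched (timeouts included)."""
    try:
        fp = urlopen(url)
        try:
            for line in fp:
                if pattern.search(line.decode("latin-1")):
                    return True
            return False
        finally:
            fp.close()
    except (socket.timeout, IOError) as err:
        # socket.timeout: the server went silent in the middle of the request.
        # IOError (an alias of OSError in Python 3) also covers URLError,
        # which is how urllib reports a timeout while connecting.
        print("skipping %s: %s" % (url, err))
        return None
```

A loop over the URLs would then call something like `page_matches(url, reUrl)` for each one and skip those that return None. The timeout bounds each blocking socket operation separately, not the total time per URL, so a server that trickles data out slowly can still take longer than 10 seconds.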
 

Jay Donnell

Thanks, but I'm on Python 2.2 and don't have the ability to upgrade it
on the server. How would I do this in 2.2?
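Another option on Python 2.2 under Linux, separate from timeoutsocket.py, is to interrupt the blocking call with an alarm signal, much as Ctrl-C interrupts it with SIGINT. A minimal sketch, assuming a Unix platform and a single-threaded script (signal handlers only run in the main thread); `Timeout` and `call_with_alarm` are hypothetical names, not part of any library:

```python
import signal


class Timeout(Exception):
    """Raised when the alarm fires before the guarded call returns."""


def _on_alarm(signum, frame):
    # Runs when SIGALRM arrives; raising here aborts the blocked call.
    raise Timeout()


def call_with_alarm(seconds, func, *args):
    """Run func(*args), giving up after `seconds` (whole seconds, Unix only)."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)                       # schedule SIGALRM
    try:
        return func(*args)
    finally:
        signal.alarm(0)                         # cancel it if func finished first
        signal.signal(signal.SIGALRM, old_handler)
```

Fetching a page would then look like `lines = call_with_alarm(30, lambda: urllib.urlopen(url).readlines())` inside a `try`/`except Timeout` block that continues with the next URL. Note that `signal.alarm()` only accepts whole seconds.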
 

Jay Donnell

Don't you have something better to do with your time?
I missed that last sentence because I was at work and trying to do a
few things at once.
 

Peter Hansen

Jay said:
I found this link
http://www.timo-tasi.org/python/timeoutsocket.py
which I'm assuming is the most recent version of timeoutsocket.py

It's very light on the explanatory side. Do I simply need to drop this
file into the same directory as my script, import it, and set the
timeout???

At the risk of getting beaten up by you for the same reason
you beat up Baalbek (which was, by the way, a wholly unjustified
beating), have you actually looked at the comments in that
module yet? It does say exactly what you need to do...

(Feel free to beat me up, but note that it really does look
like you aren't taking even a few minutes to help yourself out,
so please consider not abusing those who are helping you,
even if they are a little brusque in their replies...)

-Peter
 

Jay Donnell

have you actually looked at the comments in that
module yet? It does say exactly what you need to do...

I don't understand the inner workings of the socket or timeoutsocket
modules. On its face it doesn't make sense that importing
timeoutsocket would magically override the behaviour of socket without
me doing anything else to the socket module. This appears to be what
happens, but it certainly isn't clear to someone who doesn't know how
it works. Here is what timeoutsocket says: "After this module
has been imported, all socket creation goes through this shim."
After reading this I was unsure whether I needed to install
timeoutsocket.py into the base Python distro because, again, it seems
odd that simply dropping timeoutsocket.py into my cwd and importing it
will override the behaviour of the socket module. This didn't say
"exactly what I need to do". It assumed a few things that seemed odd
to me. How does timeoutsocket.py "insert a shim into the socket
module"? What does that mean??? It wasn't clear! In the time it took
you and the other guy to criticize me you could have simply said,
"yeah, just drop it into your cwd and import it".

P.S. - I really do appreciate the help that Benjamin gave.

Here is what timeoutsocket says.
--------------------------------------------------------------------------
"This module enables a timeout mechanism on all TCP connections.
After this module has been imported, all socket creation goes through
this shim. As a result, every TCP connection will support a timeout.

The beauty of this method is that it immediately and transparently
enables the entire python library to support timeouts on TCP sockets.
As an example, if you wanted SMTP connections to have a 20 second
timeout:

import timeoutsocket
import smtplib
timeoutsocket.setDefaultSocketTimeout(20)


The timeout applies to the socket functions that normally block on
execution: read, write, connect, and accept. If any of these
operations exceeds the specified timeout, the exception Timeout
will be raised.
---------------------------------------------------------------------
 

Benjamin Niemann

I don't know how it exactly works myself. But for dynamic languages like
python it is part of the design that things can be changed on-the-fly:

###### foo.py
# assume this module is part of the standard lib
def A():
    print "A called"

def B():
    A() # this could also be in another module
        # the effect would be the same

###### bar.py
# this is an addon that modifies foo
import foo

origA = foo.A
def newA():
    print "before A"
    origA()
    print "after A"
foo.A = newA

###### test.py
import foo
B()
import bar
B()

------------

It doesn't matter where the files are placed - once modules are found in
a directory listed in sys.path they are all 'equal'.
This is a powerful feature (if you know how to use it correctly) and one
thing that makes dynamic languages dynamic. Once a module is imported,
all imports result in a reference to the same module - if that one is
modified, the modification will be visible everywhere the module
is imported (but this is not true for 'from foo import A', because that
binds a name to the original function object, and rebinding foo.A later
does not change what that name refers to).
(My first try was to overwrite time.time(), hoping that time.ctime()
would show a modified time. This didn't work, probably because both
time() and ctime() make a call to a lower level function. A good example
of not tampering with other people's code unless you know how it works...)
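The point about `from foo import A` can be checked without creating any files. This sketch (Python 3) builds a stand-in `foo` module in memory with `types.ModuleType`; `foo_again`, `captured_A` and the string return values are illustrative choices, and `newA` plays the role of the replacement defined in bar.py:

```python
import sys
import types

# An in-memory stand-in for foo.py, so the example needs no files on disk.
foo = types.ModuleType("foo")
exec("def A():\n    return 'original A'\n", foo.__dict__)
sys.modules["foo"] = foo          # later "import foo" statements find this object

import foo as foo_again           # another reference to the same module object
from foo import A as captured_A   # a reference to the function object itself


def newA():
    return "patched A"


foo.A = newA                      # the equivalent of what bar.py does

print(foo_again is foo)           # True
print(foo_again.A())              # patched A  (looked up through the module)
print(captured_A())               # original A (this name still holds the old function)
```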

OK, that's enough to explain why you don't have to modify any file of the
standard library in order to change its behaviour.
The author of timeoutsocket was in fact not clear about the location
where the actual file has to be placed, probably because he assumed that
everyone knows that it doesn't matter.
 

Benjamin Niemann

Bernd said:
I also would use a few Threads, they will speed up your script.
He didn't mention any performance issues. Don't try to optimize for
speed before you have a valid complaint about the program being too slow
- especially if optimization involves threads!
 

Jay Donnell

Performance isn't an issue with this script. It runs in the background
from a cron job. It just needs to finish before the day is over which
it easily does :)
It currently runs through ~20,000 URLs, which takes about 3 hours.
It should be faster now that it times out.

If I had used threads at first I probably wouldn't have noticed that
it wasn't timing out on a few URLs.
 

Jay Donnell

Your example needed the module name in front of the function calls in
test.py.
Here it is in case anyone is interested. I had no idea that this sort
of thing could be done.

###### foo.py
def A():
    print "original A called"

def B():
    A() # this could also be in another module
        # the effect would be the same


###### bar.py
# this is an addon that modifies foo
import foo

origA = foo.A
def newA():
    print "new A called"
foo.A = newA


#!/usr/bin/python
###### test.py
import foo
foo.B()
import bar
foo.B()
 

Peter Hansen

Jay said:
I don't understand the inner workings of the socket or timeoutsocket
modules. On its face it doesn't make sense that importing
timeoutsocket would magically override the behaviour of socket without
me doing anything else to the socket module. This appears to be what
happens, but it certainly isn't clear to someone who doesn't know how
it works. Here is what timeoutsocket says: "After this module
has been imported, all socket creation goes through this shim."
After reading this I was unsure whether I needed to install
timeoutsocket.py into the base Python distro because, again, it seems
odd that simply dropping timeoutsocket.py into my cwd and importing it
will override the behaviour of the socket module. This didn't say
"exactly what I need to do".

Yes it does. It may seem odd, but it does in fact say something
that is exactly what you should do.
It assumed a few things that seemed odd
to me. How does timeoutsocket.py "insert a shim into the socket
module"? What does that mean??? It wasn't clear! In the time it took
you and the other guy to criticize me you could have simply said,
"yeah, just drop it into your cwd and import it".

Just because you disbelieved the words doesn't mean they aren't
clear. Maybe it should say "do this... it works, really!". Would
that have helped?

Anyway, in the time it took us all to have this discussion, you
could really easily have just tried it and seen for yourself
that it did work. Python has a nice interactive interpreter
prompt just for such things, and it's good to get in the habit
of using it.
P.S. - I really do appreciate the help that Benjamin gave.

And yet he says he didn't know how it worked either, but you
believed him and not the comments in the code itself. I can
see that despite anything any of us say about the inadequacy
of the approach you took you are right and we're wrong. Carry
on...

-Peter
 
