How to prevent the script from stopping before it should


python

I have a script that downloads some web pages. The problem is that sometimes, after I download a few pages, the script hangs (stops).
(But sometimes it runs fine to the end and downloads all the pages I want.)
I think the script stops when the internet connection to the server I am downloading from is rather poor.
Is there a way to prevent the script from hanging before all the pages are downloaded?

Thanks for your help
Lad
 

wittempj

import urllib, sys

pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error if a page fails to open; the rest open fine.
 

Steve Holden

import urllib, sys

pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error if a page fails to open; the rest open fine.
More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.

regards
Steve
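For concreteness, here is a minimal Python 2 sketch of Steve's suggestion; the 30-second value is just an example and should be tuned to your connection.

import socket
import sys
import urllib

socket.setdefaulttimeout(30)    # seconds; applies to all sockets created afterwards

pages = ['http://www.python.org', 'http://xxx']
for url in pages:
    try:
        u = urllib.urlopen(url)
        data = u.read()
        print '%s: %d bytes' % (u.geturl(), len(data))
    except Exception, e:        # socket.timeout, IOError, ...
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

With the default timeout set, a server that stops responding raises an exception after 30 seconds instead of blocking the script forever, and the loop simply moves on to the next page.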
 

python

Steve said:
import urllib, sys

pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error if a page fails to open; the rest open fine.
More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.

regards
Steve

Thank you (e-mail address removed) and Steve for the ideas. Detecting that the script has hung is not a big problem.
However, I would need a solution where I do not have to restart the script by hand; the script should restart itself. I am thinking about two threads: the main (master) thread would supervise a slave thread. The slave thread would download the pages, and whenever there is a timeout the master thread would restart the slave thread.
Is that a good solution? Or is there a better one?
Thanks for your help
Lad
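For illustration only (this is not Lad's code, and the names and the 60-second default are arbitrary), here is a minimal Python 2 sketch of that watchdog idea: the main thread runs each download in a worker thread and gives up on it after a timeout. Note that an abandoned worker may still sit blocked on its socket in the background, which is why the per-socket timeout Steve suggests is usually the simpler fix.

import threading
import urllib

def fetch(url, results):
    # worker: download one page and record the result (or None on error)
    try:
        results[url] = urllib.urlopen(url).read()
    except Exception, e:
        results[url] = None

def fetch_with_watchdog(url, timeout=60):
    # master: start a worker and wait at most `timeout` seconds for it
    results = {}
    worker = threading.Thread(target=fetch, args=(url, results))
    worker.setDaemon(True)      # a stuck worker should not keep the script alive
    worker.start()
    worker.join(timeout)
    if worker.isAlive():
        return None             # worker is stuck; the caller can retry this URL
    return results.get(url)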
 

Fredrik Lundh

Steve said:
You will need to import the socket module and then call socket.setdefaulttimeout() to ensure that
communication with non-responsive servers results in a socket exception that you can trap.

Or you can use asynchronous sockets, so your program can keep processing the sites that do respond while it's waiting for the ones that don't. For one way to do that, see "Using HTTP to Download Files" here:

http://effbot.org/zone/effnews-1.htm

(make sure you read the second and third article as well)

</F>
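Fredrik's effnews articles build up a full asyncore-based client step by step; the following is only a much-reduced sketch of the same idea (Python 2, plain HTTP on port 80, no redirect or error handling to speak of), not the effnews code itself. Each URL gets its own dispatcher, so one stalled server does not hold up the others, and asyncore.loop() returns once every channel has closed.

import asyncore
import socket
import urlparse

class AsyncPage(asyncore.dispatcher_with_send):
    # fetch a single page; create one instance per URL
    def __init__(self, url):
        asyncore.dispatcher_with_send.__init__(self)
        parts = urlparse.urlparse(url)
        self.host = parts[1]
        self.path = parts[2] or '/'
        self.data = ''
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((self.host, 80))

    def handle_connect(self):
        self.send('GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (self.path, self.host))

    def handle_read(self):
        self.data = self.data + self.recv(4096)

    def handle_close(self):
        print '%s: %d bytes (headers included)' % (self.host, len(self.data))
        self.close()

    def handle_error(self):
        print '%s: failed' % self.host
        self.close()

for url in ['http://www.python.org/', 'http://www.example.com/']:
    AsyncPage(url)

asyncore.loop(timeout=1)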
 

python

Fredrik said:

Or you can use asynchronous sockets, so your program can keep processing the sites that do respond while it's waiting for the ones that don't. For one way to do that, see "Using HTTP to Download Files" here:

http://effbot.org/zone/effnews-1.htm

(make sure you read the second and third article as well)
Dear Fredrik Lundh,
Thank you for the link. I checked it, but I did not find an answer to my question there.
My problem is that I sometimes cannot finish downloading all the pages. Sometimes my script freezes, and I can do nothing but restart it from the last successfully downloaded web page. There is no error message saying that anything went wrong. I do not know why; maybe the server is programmed to limit the number of connections, or there may be other reasons. So my idea was two threads: one master supervising a slave thread that does the downloading, and if the slave thread stops, the master thread starts another slave. Is that a good solution? Or is there a better one?
Thanks for your help
Lad
 

Fuzzyman

Steve said:
import urllib, sys

pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error if a page fails to open; the rest open fine.
More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.

So adding:

import socket
socket.setdefaulttimeout(30)   # timeout in seconds

is *necessary* in order to avoid hangs when using urllib2 to fetch web resources?

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml
 

export

Fuzzyman said:
Steve said:
import urllib, sys

pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error if a page fails to open; the rest open fine.
More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.

So adding:

import socket
socket.setdefaulttimeout(30)   # timeout in seconds

is *necessary* in order to avoid hangs when using urllib2 to fetch web resources?

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml

Fuzzy,
I use httplib with timeoutsocket, but no timeout is raised and the script still freezes sometimes. I suspect the server I download the pages from does this to keep traffic down. I have to restart my script.
Do you think urllib2 would be better?
Or is there a better solution?
Regards,
Lad
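As a rough illustration of the urllib2 route Lad asks about (not a guarantee that it helps against server-side throttling), here is a Python 2 sketch that combines urllib2 with the global socket timeout discussed earlier in the thread and retries each page a few times; the timeout and retry count are arbitrary example values.

import socket
import urllib2

socket.setdefaulttimeout(30)    # seconds; example value

def fetch(url, retries=3):
    # try a page a few times; with the timeout set, a hung server raises
    # an exception (urllib2.URLError / socket.timeout) instead of freezing
    for attempt in range(retries):
        try:
            return urllib2.urlopen(url).read()
        except Exception, e:
            print 'attempt %d failed for %s: %s' % (attempt + 1, url, e)
    return None

Note that in the Python versions discussed in this thread, urllib2 by itself does not apply any timeout; the setdefaulttimeout() call (or the timeoutsocket module) is what actually prevents the hang.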
 
