How to prevent the script from stopping before it should

Discussion in 'Python' started by python@hope.cz, Jan 17, 2005.

  1. Guest

    I have a script that downloads some web pages. The problem is that,
    after downloading a few pages, the script sometimes hangs (stops).
    (Other times it runs to the end and downloads all the pages I want.)
    I think the script stops when the internet connection to the server
    (from which I download the pages) is rather poor.
    Is there a way to prevent the script from hanging before all the
    pages are downloaded?

    Thanks for help
    Lad.
    , Jan 17, 2005
    #1

  2. Guest

    import urllib, sys

    pages = ['http://www.python.org', 'http://xxx']
    for i in pages:
        try:
            u = urllib.urlopen(i)
            print u.geturl()
        except Exception, e:
            print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)
    This will print an error if a page fails to open; the rest open fine.
    , Jan 17, 2005
    #2

  3. Steve Holden Guest

    wrote:

    > import urllib, sys
    >
    > pages = ['http://www.python.org', 'http://xxx']
    > for i in pages:
    >     try:
    >         u = urllib.urlopen(i)
    >         print u.geturl()
    >     except Exception, e:
    >         print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)
    >
    > This will print an error if a page fails to open; the rest open fine.
    >

    More generally you may wish to use the timeout features of TCP sockets.
    These were introduced in Python 2.3, though Tim O'Malley's module
    "timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
    available for earlier versions.

    You will need to import the socket module and then call
    socket.setdefaulttimeout() to ensure that communication with
    non-responsive servers results in a socket exception that you can trap.
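    For example, a minimal sketch of that approach might look like this
    (Python 2.3 or later; the 30-second value and the URL list are only
    placeholders):

    import socket, urllib, sys

    socket.setdefaulttimeout(30)    # any new socket gives up after 30 seconds

    for url in ['http://www.python.org', 'http://xxx']:
        try:
            data = urllib.urlopen(url).read()
        except Exception, e:        # socket.timeout, IOError, ...
            print >> sys.stderr, '%s failed: %s' % (url, e)
        else:
            print url, len(data), 'bytes'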

    regards
    Steve
    --
    Steve Holden http://www.holdenweb.com/
    Python Web Programming http://pydish.holdenweb.com/
    Holden Web LLC +1 703 861 4237 +1 800 494 3119
    Steve Holden, Jan 17, 2005
    #3
  4. Guest

    Steve Holden wrote:
    > wrote:
    >
    > > import urllib, sys
    > >
    > > pages = ['http://www.python.org', 'http://xxx']
    > > for i in pages:
    > >     try:
    > >         u = urllib.urlopen(i)
    > >         print u.geturl()
    > >     except Exception, e:
    > >         print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)
    > >
    > > This will print an error if a page fails to open; the rest open fine.
    > >

    > More generally you may wish to use the timeout features of TCP sockets.
    > These were introduced in Python 2.3, though Tim O'Malley's module
    > "timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
    > available for earlier versions.
    >
    > You will need to import the socket module and then call
    > socket.setdefaulttimeout() to ensure that communication with
    > non-responsive servers results in a socket exception that you can trap.
    >
    > regards
    > Steve
    > --
    > Steve Holden http://www.holdenweb.com/
    > Python Web Programming http://pydish.holdenweb.com/
    > Holden Web LLC +1 703 861 4237 +1 800 494 3119


    Thank you and Steve for the ideas. Detecting that the script has hung
    is not a big problem.
    What I need, however, is a solution where I do not have to restart
    the script by hand; the script should restart itself. I am thinking
    about two threads: a main (master) thread that supervises a slave
    thread. The slave thread downloads the pages, and whenever there is a
    timeout the master thread restarts the slave thread.
    Is that a good solution, or is there a better one?
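    For illustration, a rough sketch of what such a master/slave
    arrangement might look like (fetch_page, fetch_with_deadline and the
    60-second limit are made-up names and values, not anything from this
    thread):

    import threading, urllib

    def fetch_page(url, results):
        try:
            results[url] = urllib.urlopen(url).read()
        except Exception:
            results[url] = None

    def fetch_with_deadline(url, seconds=60):
        results = {}
        worker = threading.Thread(target=fetch_page, args=(url, results))
        worker.setDaemon(True)    # a stuck worker cannot keep the script alive
        worker.start()
        worker.join(seconds)      # the master waits at most `seconds`
        if worker.isAlive():
            print url, 'timed out; retry or skip it'
            return None
        return results.get(url)

    Note that Python threads cannot be forcibly stopped, so the master can
    only give up on a hung slave and start a new one; the daemon flag keeps
    abandoned slaves from blocking interpreter exit.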
    Thanks for help
    Lad
    , Jan 17, 2005
    #4
  5. Fredrik Lundh Guest

    Steve Holden wrote:

    > You will need to import the socket module and then call socket.setdefaulttimeout() to ensure that
    > communication with non-responsive servers results in a socket exception that you can trap.


    or you can use asynchronous sockets, so your program can keep processing
    the sites that do respond at once while it's waiting for the ones that don't. for
    one way to do that, see "Using HTTP to Download Files" here:

    http://effbot.org/zone/effnews-1.htm

    (make sure you read the second and third article as well)
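    For illustration, a rough sketch of that asynchronous style (not the
    code from those articles; it assumes plain HTTP on port 80 and the
    standard asyncore module in Python 2.x):

    import asyncore, socket, urlparse

    class Fetcher(asyncore.dispatcher):
        def __init__(self, url):
            asyncore.dispatcher.__init__(self)
            scheme, host, path = urlparse.urlparse(url)[:3]
            self.host = host
            self.request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path or "/", host)
            self.data = ""
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.connect((host, 80))

        def handle_connect(self):
            pass

        def writable(self):
            return len(self.request) > 0    # still something to send?

        def handle_write(self):
            sent = self.send(self.request)
            self.request = self.request[sent:]

        def handle_read(self):
            self.data = self.data + self.recv(8192)

        def handle_close(self):
            self.close()
            print self.host, "returned", len(self.data), "bytes"

    # start all fetches at once; slow servers don't block the fast ones
    for url in ["http://www.python.org/", "http://www.example.com/"]:
        Fetcher(url)

    asyncore.loop()    # runs until every channel has closed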

    </F>
    Fredrik Lundh, Jan 17, 2005
    #5
  6. Guest

    Fredrik Lundh wrote:
    > Steve Holden wrote:
    >
    > > You will need to import the socket module and then call
    > > socket.setdefaulttimeout() to ensure that communication with
    > > non-responsive servers results in a socket exception that you can trap.
    >
    > or you can use asynchronous sockets, so your program can keep processing
    > the sites that do respond at once while it's waiting for the ones that
    > don't. for one way to do that, see "Using HTTP to Download Files" here:
    >
    > http://effbot.org/zone/effnews-1.htm
    >
    > (make sure you read the second and third article as well)
    >

    Dear Fredrik Lundh,
    Thank you for the link. I checked it, but I have not found an answer
    to my question there.
    My problem is that sometimes I cannot finish downloading all the
    pages. Sometimes my script freezes and I can do nothing but restart
    it from the last successfully downloaded web page. There is no error
    message saying that anything went wrong. I do not know why; maybe the
    server is programmed to limit the number of connections, or there may
    be other reasons. So my idea was two threads: a master thread
    supervising a slave thread that does the downloading, and if the
    slave thread stops, the master thread starts another slave. Is that a
    good solution, or is there a better one?
    Thanks for help
    Lad
    , Jan 18, 2005
    #6
  7. Fuzzyman Guest

    Steve Holden wrote:
    > wrote:
    >
    > > import urllib, sys
    > >
    > > pages = ['http://www.python.org', 'http://xxx']
    > > for i in pages:
    > >     try:
    > >         u = urllib.urlopen(i)
    > >         print u.geturl()
    > >     except Exception, e:
    > >         print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)
    > >
    > > This will print an error if a page fails to open; the rest open fine.
    > >

    > More generally you may wish to use the timeout features of TCP sockets.
    > These were introduced in Python 2.3, though Tim O'Malley's module
    > "timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
    > available for earlier versions.
    >
    > You will need to import the socket module and then call
    > socket.setdefaulttimeout() to ensure that communication with
    > non-responsive servers results in a socket exception that you can trap.
    >


    So adding:

    import socket
    socket.setdefaulttimeout()

    is *necessary* in order to avoid hangs when using urllib2 to fetch web
    resources?
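    For reference, a small sketch of how that might be used together with
    urllib2 (the 20-second value is only a placeholder):

    import socket, urllib2

    socket.setdefaulttimeout(20)    # applies to every socket opened after this

    try:
        data = urllib2.urlopen('http://www.python.org/').read()
    except (urllib2.URLError, socket.timeout), e:
        print 'request failed:', e

    urllib2 in 2.3/2.4 has no per-call timeout argument, so the module-wide
    default is the usual way to bound a hung connection.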

    Regards,

    Fuzzy
    http://www.voidspace.org.uk/python/index.shtml

    > regards
    > Steve
    > --
    > Steve Holden http://www.holdenweb.com/
    > Python Web Programming http://pydish.holdenweb.com/
    > Holden Web LLC +1 703 861 4237 +1 800 494 3119
    Fuzzyman, Jan 19, 2005
    #7
  8. Guest

    Fuzzyman wrote:
    > Steve Holden wrote:
    > > wrote:
    > >
    > > > import urllib, sys
    > > >
    > > > pages = ['http://www.python.org', 'http://xxx']
    > > > for i in pages:
    > > >     try:
    > > >         u = urllib.urlopen(i)
    > > >         print u.geturl()
    > > >     except Exception, e:
    > > >         print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)
    > > >
    > > > This will print an error if a page fails to open; the rest open fine.
    > > >

    > > More generally you may wish to use the timeout features of TCP sockets.
    > > These were introduced in Python 2.3, though Tim O'Malley's module
    > > "timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
    > > available for earlier versions.
    > >
    > > You will need to import the socket module and then call
    > > socket.setdefaulttimeout() to ensure that communication with
    > > non-responsive servers results in a socket exception that you can trap.
    > >

    >
    > So adding:
    >
    > import socket
    > socket.setdefaulttimeout()
    >
    > is *necessary* in order to avoid hangs when using urllib2 to fetch web
    > resources?
    >
    > Regards,
    >
    > Fuzzy
    > http://www.voidspace.org.uk/python/index.shtml


    Fuzzy,
    I use httplib with timeoutsocket, but there is no timeout; the script
    still freezes sometimes. I suspect the server I download the pages
    from does that to limit heavy traffic, and I must restart my script
    by hand.
    Do you think urllib2 would be better?
    Or is there a better solution?
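    For reference, a minimal sketch of bounding an httplib fetch with the
    standard library's socket.setdefaulttimeout() instead of the
    timeoutsocket module (Python 2.3 or later; the host and the 30-second
    value are only placeholders):

    import socket, httplib

    socket.setdefaulttimeout(30)    # every new socket gives up after 30 seconds

    conn = httplib.HTTPConnection('www.python.org')
    try:
        conn.request('GET', '/')
        response = conn.getresponse()
        data = response.read()
        print response.status, len(data), 'bytes'
    except (socket.timeout, socket.error), e:
        print 'fetch failed:', e
    conn.close()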
    Regards,
    Lad
    , Jan 19, 2005
    #8
