urllib2 - safe way to download something

Discussion in 'Python' started by konstantin, Nov 14, 2008.

  1. konstantin

    konstantin Guest

    Hi,

    I wonder if there is a safe way to download a page with urllib2. I've
    written the following function to catch all possible exceptions.

    def retrieve(url):
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = {'User-Agent': user_agent}
        request = urllib2.Request(url, headers=headers)
        try:
            handler = urllib2.urlopen(request)
            data = handler.read()
            handler.close()
        except urllib2.HTTPError, e:
            log.warning("Server couldn't fulfill the request: %s, %s" %
                        (url, e.code))
            return None
        except urllib2.URLError, e:
            log.warning("Failed to reach a server: %s, %s" % (url, e.reason))
            return None
        except HTTPException, e:
            log.warning("HTTP exception: %s, %s" %
                        (url, e.__class__.__name__))
            return None
        except socket.timeout:
            log.warning("Timeout expired: %s" % (url))
            return None
        return data


    But then I suddenly got the following:

    Traceback (most recent call last):
      File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
        self.run()
      File "/home/light/prj/ym-crawl/shops/dispatcher.py", line 122, in run
        self.task(self.queue, item)
      File "scrawler.py", line 24, in spider
        data = retrieve(url)
      File "scrawler.py", line 44, in retrieve
        data = handler.read()
      File "/usr/lib/python2.5/socket.py", line 291, in read
        data = self._sock.recv(recv_size)
      File "/usr/lib/python2.5/httplib.py", line 509, in read
        return self._read_chunked(amt)
      File "/usr/lib/python2.5/httplib.py", line 563, in _read_chunked
        value += self._safe_read(chunk_left)
      File "/usr/lib/python2.5/httplib.py", line 602, in _safe_read
        chunk = self.fp.read(min(amt, MAXAMOUNT))
      File "/usr/lib/python2.5/socket.py", line 309, in read
        data = self._sock.recv(recv_size)
    error: (104, 'Connection reset by peer')

    What did I miss? I don't really want to catch all errors. Thanks!
    konstantin, Nov 14, 2008
    #1

  2. konstantin

    konstantin Guest

    I mean I don't want to catch all unexpected errors with a bare
    "except:" :).
    konstantin, Nov 14, 2008
    #2

  3. On Fri, 14 Nov 2008 06:35:27 -0800, konstantin wrote:

    > Hi,
    >
    > I wonder if there is a safe way to download a page with urllib2. I've
    > written the following function to catch all possible exceptions.


    See here:

    http://niallohiggins.com/2008/04/05/python-and-poor-documentation-urllib2urlopen-exception-layering-problems/

    There are probably others as well... I seem to recall getting
    socket.error at some point myself.
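
    For what it's worth, here is a minimal sketch of a variant that also
    catches the socket-level error from the traceback. It assumes Python 3,
    where urllib2 was split into urllib.request and urllib.error and
    socket.error is an alias of OSError. The `timeout` and `opener`
    parameters are not part of the original function; they are added here
    only so the sketch can be exercised without a network. This is one
    possible approach, not an official recipe.

```python
import http.client
import logging
import socket
import urllib.error
import urllib.request

log = logging.getLogger(__name__)

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def retrieve(url, timeout=10, opener=urllib.request.urlopen):
    """Return the body of `url` as bytes, or None on a known failure.

    `timeout` and `opener` are illustrative additions (assumptions),
    not parameters of the function in the original post.
    """
    request = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    try:
        handler = opener(request, timeout=timeout)
        try:
            return handler.read()
        finally:
            # Close the response even if read() fails halfway through.
            handler.close()
    except urllib.error.HTTPError as e:
        log.warning("Server couldn't fulfill the request: %s, %s", url, e.code)
    except urllib.error.URLError as e:
        log.warning("Failed to reach a server: %s, %s", url, e.reason)
    except http.client.HTTPException as e:
        log.warning("HTTP exception: %s, %s", url, e.__class__.__name__)
    except socket.timeout:
        log.warning("Timeout expired: %s", url)
    except OSError as e:
        # Covers socket.error, e.g. (104, 'Connection reset by peer').
        log.warning("Socket error: %s, %s", url, e)
    return None
```

    The order of the except clauses matters: in Python 3, HTTPError is a
    subclass of URLError, which is itself a subclass of OSError, so the
    most specific handlers have to come first.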


    --
    Steven
    Steven D'Aprano, Nov 14, 2008
    #3
  4. konstantin

    konstantin Guest

    On 14 Nov, 18:12, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
    > On Fri, 14 Nov 2008 06:35:27 -0800, konstantin wrote:
    > > Hi,
    > >
    > > I wonder if there is a safe way to download a page with urllib2. I've
    > > written the following function to catch all possible exceptions.
    >
    > See here:
    >
    > http://niallohiggins.com/2008/04/05/python-and-poor-documentation-urllib2urlopen-exception-layering-problems/
    >
    > There are probably others as well... I seem to recall getting
    > socket.error at some point myself.
    >
    > --
    > Steven


    Thanks, it's a nice post, but it seems there is no clear solution.
    I remember catching IOError and ValueError as well.
    I think urllib2 needs more unified exception handling. The current
    mix of exception types breaks simplicity, and that is no good.
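
    To illustrate the kind of unification I mean, here is a rough sketch
    that wraps every failure seen so far in a single exception type. It
    assumes Python 3 (urllib.request instead of urllib2, IOError and
    socket.error both being aliases of OSError). `DownloadError`, `fetch`,
    `timeout` and `opener` are hypothetical names made up for this example
    and are not part of any library.

```python
import contextlib
import http.client
import urllib.request


class DownloadError(Exception):
    """Hypothetical single exception type for any download failure."""


def fetch(url, timeout=10, opener=urllib.request.urlopen):
    """Return the body of `url` as bytes or raise DownloadError.

    `opener` is an illustrative hook (an assumption) that makes the
    function testable without network access.
    """
    try:
        # Request() raises ValueError for strings that are not URLs.
        request = urllib.request.Request(url)
        with contextlib.closing(opener(request, timeout=timeout)) as handler:
            return handler.read()
    except (OSError, http.client.HTTPException, ValueError) as e:
        # OSError covers URLError, HTTPError, socket.timeout and
        # socket.error; the original exception is kept as __cause__.
        raise DownloadError('%s: %s' % (url, e)) from e
```

    The caller then needs only one `except DownloadError:` clause, and the
    underlying error is still available through `__cause__` for logging.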

    But anyway, thanks.

    P.S. Maybe I could contribute to this module, but I don't really know
    how or where to start.
    konstantin, Nov 14, 2008
    #4
