Downloading Large Files -- Feedback?

Discussion in 'Python' started by mwt, Feb 12, 2006.

  1. mwt

    mwt Guest

    This code works fine to download files from the web and write them to
    the local drive:

    import urllib
    f = urllib.urlopen("http://www.python.org/blah/blah.zip")
    g = f.read()
    file = open("blah.zip", "wb")
    file.write(g)
    file.close()

    The process is pretty opaque, however. This downloads and writes the
    file with no feedback whatsoever. You don't see how many bytes you've
    downloaded already, etc. Especially the "g = f.read()" step just sits
    there while downloading a large file, presenting a pregnant, blinking
    cursor.

    So my question is, what is a good way to go about coding this kind of
    basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
    wrong with the download.

    Thanks for any pointers. I'm busily Googling away.
    mwt, Feb 12, 2006
    #1

2. Paul Rubin

    Paul Rubin Guest

    "mwt" <> writes:
    > f = urllib.urlopen("http://www.python.org/blah/blah.zip")
    > g = f.read() # ...


    > So my question is, what is a good way to go about coding this kind of
    > basic feedback? Also, since my testing has only *worked* with this
> code, I'm curious if it will throw a visible error if something goes
    > wrong with the download.


    One obvious type of failure is running out of memory if the file is
    too large. Python can be fairly hosed (VM thrashing etc.) by the time
    that happens. Normally you shouldn't read a potentially big file of
    unknown size all in one gulp like that. You'd instead say something
    like

while True:
    block = f.read(4096)  # read a 4k block from the file
    if len(block) == 0:
        break  # end of file
    # do something with the block

    Your "do something with..." could involve updating a status display
    or something, saying how much has been read so far.
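
Spelled out, a minimal sketch of that loop (reusing the URL from the original post, and writing each block to disk as it arrives) might look like:

import urllib

f = urllib.urlopen("http://www.python.org/blah/blah.zip")
out = open("blah.zip", "wb")
bytes_read = 0
while True:
    block = f.read(4096)  # read a 4k block
    if len(block) == 0:
        break  # end of file
    out.write(block)  # save the block as we go
    bytes_read += len(block)
    print "%d bytes read so far" % bytes_read
out.close()
f.close()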
    Paul Rubin, Feb 12, 2006
    #2

  3. mwt

    mwt Guest

Pardon my ignorance here, but could you give me an example of what
would constitute a file that is unreasonably or dangerously large? I'm
running Python on an Ubuntu box with about a gig of RAM.

    Also, do you know of any online examples of the kind of robust,
    real-world code you're describing?

    Thanks.
    mwt, Feb 12, 2006
    #3
  4. mwt <> wrote:
    ...
    > The process is pretty opaque, however. This downloads and writes the
    > file with no feedback whatsoever. You don't see how many bytes you've
    > downloaded already, etc. Especially the "g = f.read()" step just sits
    > there while downloading a large file, presenting a pregnant, blinking
    > cursor.
    >
    > So my question is, what is a good way to go about coding this kind of
    > basic feedback? Also, since my testing has only *worked* with this


You may use urlretrieve instead of urlopen: urlretrieve accepts an
optional argument named reporthook, and calls it once in a while ("zero
or more times"...;-) with three arguments: block_count (number of
blocks downloaded so far), block_size (size of each block in bytes),
and file_size (total size of the file in bytes if known, otherwise -1).
The reporthook function (or other callable) may display a progress bar
or whatever you like best.

urlretrieve saves what it's downloading to a disk file (you may specify
a filename, or let it pick an appropriate temporary filename) and
returns two things: the filename where it's downloaded the data, and a
mimetools.Message instance whose headers carry metadata (such as
content type information).
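
For example, a rough sketch of such a hook (the function name here is mine; note the -1 case for unknown sizes, and that the last call may overshoot the total slightly):

import urllib

def report(block_count, block_size, total_size):
    downloaded = block_count * block_size
    if total_size > 0:
        print "%d of %d bytes (%.1f%%)" % (downloaded, total_size,
                                           100.0 * downloaded / total_size)
    else:
        print "%d bytes downloaded (total size unknown)" % downloaded

filename, headers = urllib.urlretrieve(
    "http://www.python.org/blah/blah.zip", "blah.zip", reporthook=report)
print "saved to %s, content type %s" % (filename, headers.gettype())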

    If that doesn't fit your needs well, you may study the sources of
    urllib.py in your Python's library source directory, to see exactly what
    it's doing and code your own modified version.


Alex
    Alex Martelli, Feb 12, 2006
    #4
  5. mwt wrote:

    > Pardon my ignorance here, but could you give me an example of what
    > would constitute file that is unreasonably or dangerously large? I'm
    > running python on a ubuntu box with about a gig of ram.


    1GB of RAM plus (say) 2GB of virtual memory = 3GB in total.

    Your OS and other running processes might be using
    (say) 1GB. So 2GB might be the absolute limit.

    Of course your mileage will vary, and in practice your
    machine will probably start slowing down long before
    that limit.


    > Also, do you know of any online examples of the kind of robust,
    > real-world code you're describing?


It isn't written in Python, but get your hands on wget. It
    is probably already on your Linux distro, but if not,
    check it out here:

    http://www.gnu.org/software/wget/wget.html



    --
    Steven.
    Steven D'Aprano, Feb 13, 2006
    #5
  6. mwt

    mwt Guest

    Thanks for the explanation. That is exactly what I'm looking for. In a
    way, it's kind of neat that urlopen just *does* it, no questions asked,
    but I'd like to just know the basics, which is what it sounds like
    urlretrieve covers. Excellent. Now, let's see what I can whip up with
    that.

    -- just bought "cookbook" and "nutshell" moments ago btw....
    mwt, Feb 13, 2006
    #6
  7. mwt

    mwt Guest

    mwt, Feb 13, 2006
    #7
  8. mwt <> wrote:

    > Thanks for the explanation. That is exactly what I'm looking for. In a
    > way, it's kind of neat that urlopen just *does* it, no questions asked,
    > but I'd like to just know the basics, which is what it sounds like
    > urlretrieve covers. Excellent. Now, let's see what I can whip up with
    > that.


    Yes, I entirely understand your mindset, because mine is so similar: I
    prefer using higher-level "just works" abstractions, BUT also want to
    understand what's going on "below"... "just in case"!-)

    > -- just bought "cookbook" and "nutshell" moments ago btw....


    Nice coincidence, and thanks!-)


    Alex
    Alex Martelli, Feb 13, 2006
    #8
  9. mwt

    mwt Guest

    So, I just put this little chunk to the test, which does give you
    feedback about what's going on with a file download. Interesting that
    with urlretrieve, you don't do all the file opening and closing stuff.

    Works fine:

    ------------------
    import urllib

def download_file(filename, URL):
    f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)

def my_report_hook(block_count, block_size, total_size):
    total_kb = total_size / 1024
    print "%d kb of %d kb downloaded" % (block_count * (block_size / 1024), total_kb)

if __name__ == "__main__":
    download_file("test_zip.zip", "http://blah.com/blah.zip")
    mwt, Feb 13, 2006
    #9
  10. mwt <> wrote:
    ...
    > import urllib
    >
    > def download_file(filename, URL):
>     f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)


    If you wanted to DO anything with the results, you'd probably want to
    assign to
    f, m = ...
    not just f. This way, f is the filename, m a message object useful for
    metadata (e.g., content type).
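
For instance, a corrected version of that function might read:

def download_file(filename, URL):
    # fn is the local filename, m a mimetools.Message with the headers
    fn, m = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)
    print "saved %s (content type: %s)" % (fn, m.gettype())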

    Otherwise looks fine.


    Alex
    Alex Martelli, Feb 13, 2006
    #10
11. Fuzzyman

    Fuzzyman Guest

    mwt wrote:
    > This code works fine to download files from the web and write them to
    > the local drive:
    >
    > import urllib
    > f = urllib.urlopen("http://www.python.org/blah/blah.zip")
    > g = f.read()
    > file = open("blah.zip", "wb")
    > file.write(g)
    > file.close()
    >
    > The process is pretty opaque, however. This downloads and writes the
    > file with no feedback whatsoever. You don't see how many bytes you've
    > downloaded already, etc. Especially the "g = f.read()" step just sits
    > there while downloading a large file, presenting a pregnant, blinking
    > cursor.
    >
    > So my question is, what is a good way to go about coding this kind of
    > basic feedback? Also, since my testing has only *worked* with this
> code, I'm curious if it will throw a visible error if something goes
    > wrong with the download.
    >


By the way, you can achieve what you want with urllib2; you may also
want to check out the pycurl library, which is a Python interface to a
very good C library called curl.

    With urllib2 you don't *have* to read the whole thing in one go -

    import urllib2
    f = urllib2.urlopen("http://www.python.org/blah/blah.zip")
    g = ''
while True:
    a = f.read(1024*10)
    if not a:
        break
    print 'Read another 10k'
    g += a

    file = open("blah.zip", "wb")
    file.write(g)
    file.close()
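
For a really large file, the same loop can be adapted (my variation, not part of the original post) to write each chunk straight to disk instead of building the whole thing up in g:

import urllib2

f = urllib2.urlopen("http://www.python.org/blah/blah.zip")
out = open("blah.zip", "wb")
read_so_far = 0
while True:
    a = f.read(1024*10)
    if not a:
        break
    out.write(a)  # write the chunk immediately instead of accumulating it
    read_so_far += len(a)
    print 'Read %dk so far' % (read_so_far / 1024)
out.close()
f.close()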

    All the best,

    Fuzzyman
    http://www.voidspace.org.uk/python/index.shtml
    Fuzzyman, Feb 13, 2006
    #11
