Memory errors with large zip files

Discussion in 'Python' started by Lorn, May 20, 2005.

  1. Lorn

    Lorn Guest

    Is there a limitation in Python's zipfile module that limits the
    size of a file that can be extracted? I'm currently trying to extract
    125 MB zip files whose contents uncompress to > 1 GB and am
    receiving memory errors. Indeed my RAM gets maxed out during
    extraction and then the script quits. Is there a way to spool to disk
    on the fly, or is it necessary for Python to read the entire file into
    memory before writing? The code below iterates through a directory of
    zip files and extracts them (thanks John!), though for testing I've
    just been using one file:

    zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
    for zipname in zipnames:
        zf = zipfile.ZipFile(zipname, 'r')
        for zfilename in zf.namelist():
            newFile = open(zfilename, 'wb')
            newFile.write(zf.read(zfilename))
            newFile.close()
        zf.close()


    Any suggestions or comments on how I might be able to work with zip
    files of this size would be very helpful.
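    One way to avoid holding each member in memory would be to copy it to
    disk in fixed-size chunks. A sketch, assuming a Python version where
    ZipFile.open is available (it was added to the stdlib after this
    thread), and assuming, like the loop above, that member names are
    plain file names without directory components:

    ```python
    import os
    import shutil
    import zipfile

    def extract_streaming(zipname, dest='.'):
        # Copy each member to disk in 64 KB chunks via the file-like
        # object returned by ZipFile.open, so RAM use stays small even
        # when a member uncompresses to more than 1 GB.
        with zipfile.ZipFile(zipname, 'r') as zf:
            for name in zf.namelist():
                target = os.path.join(dest, name)
                with zf.open(name) as src, open(target, 'wb') as dst:
                    shutil.copyfileobj(src, dst, 64 * 1024)
    ```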

    Best regards,
    Lorn
     
    Lorn, May 20, 2005
    #1

  2. Lorn

    Lorn Guest

    OK, I'm not sure if this helps, but in debugging it a bit I see the
    script stalls on:

    newFile.write (zf.read (zfilename))

    The memory error generated references line 357 of the zipfile.py
    program at the point of decompression:

    elif zinfo.compress_type == ZIP_DEFLATED:
        if not zlib:
            raise RuntimeError, \
                  "De-compression requires the (missing) zlib module"
        # zlib compress/decompress code by Jeremy Hylton of CNRI
        dc = zlib.decompressobj(-15)
        bytes = dc.decompress(bytes)  ### <------ right here

    Is there any way to modify how my code approaches this, or perhaps
    how the zipfile code handles it, or do I need to just invest in more
    RAM? I currently have 512 MB and thought that would be plenty...
    perhaps I was wrong :-(. If anyone has any ideas it would truly be
    very helpful.

    Lorn
     
    Lorn, May 21, 2005
    #2

  3. Do Re Mi chel La Si Do

    Hi,


    I made this test:

    - create 12 text files of 100 MB each (exactly 102,400,000 bytes)
    - create the file "tst.zip" containing these 12 files (but the
      resulting file is only 1,095,965 bytes...)
    - delete the 12 text files
    - try your code

    And... it's OK for me.

    But: the compressed file is only 1 MB in size; I have 1 GB of RAM;
    I use Windows XP.

    Sorry, because:
    1) my English is bad
    2) I could not reproduce your problem


    Michel Claveau
     
    Do Re Mi chel La Si Do, May 21, 2005
    #3
  4. John Machin

    John Machin Guest

    On 20 May 2005 18:04:22 -0700, "Lorn" <> wrote:

    >Ok, I'm not sure if this helps any, but in debugging it a bit I see the
    >script stalls on:
    >
    >newFile.write (zf.read (zfilename))
    >
    >The memory error generated references line 357 of the zipfile.py
    >program at the point of decompression:
    >
    >elif zinfo.compress_type == ZIP_DEFLATED:
    > if not zlib:
    > raise RuntimeError, \
    > "De-compression requires the (missing) zlib module"
    > # zlib compress/decompress code by Jeremy Hylton of CNRI
    > dc = zlib.decompressobj(-15)
    > bytes = dc.decompress(bytes) ### <------ right here
    >



    The basic problem is that the zipfile module is asking the "dc" object
    to decompress the whole file at once -- so you would need (at least)
    enough memory to hold both the compressed file (C) and the
    uncompressed file (U). There is also a possibility that this could
    rise to 2U instead of U+C -- read a few lines further on:

    bytes = bytes + ex

    >Is there anyway to modify how my code is approaching this


    You're doing the best you can, as far as I can tell.

    > or perhaps
    >how the zipfile code is handling it


    Read this:
    http://docs.python.org/lib/module-zlib.html

    If you think you can work out how to modify zipfile.py to feed the
    decompression object a chunk of data at a time, properly manipulating
    dc.unconsumed_tail, and keeping memory usage to a minimum, then go for
    it :)

    Reading the source of the Python zlib module, plus this page from the
    zlib website could be helpful, perhaps even necessary:
    http://www.gzip.org/zlib/zlib_how.html
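    The chunk-at-a-time approach described above can be sketched with
    plain zlib, outside zipfile.py. The -15 window-bits value matches what
    zipfile passes for raw deflate streams; feeding bounded chunks keeps
    memory proportional to the chunk size rather than the file size
    (dc.unconsumed_tail only comes into play if you pass decompress a
    max_length, which this sketch does not):

    ```python
    import zlib

    def decompress_stream(read_chunk, write_chunk, chunk_size=64 * 1024):
        # Decompress a raw-deflate stream a chunk at a time.
        # read_chunk(n) returns up to n compressed bytes (b'' at EOF);
        # write_chunk(data) receives decompressed output.
        dc = zlib.decompressobj(-15)  # -15 = raw deflate, as in zipfile.py
        while True:
            data = read_chunk(chunk_size)
            if not data:
                break
            write_chunk(dc.decompress(data))
        write_chunk(dc.flush())  # emit any buffered remainder
    ```

    With read_chunk bound to the archive's raw member data and
    write_chunk to the output file's write method, peak memory stays near
    chunk_size plus zlib's internal window.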

    See also the following post to this newsgroup:
    From: John Goerzen <>
    Newsgroups: comp.lang.python
    Subject: Fixes to zipfile.py [PATCH]
    Date: Fri, 07 Mar 2003 16:39:25 -0600

    .... his patch obviously wasn't accepted :-(


    > or do I need to just invest in more
    >RAM? I currently have 512 MB and thought that would be plenty....
    >perhaps I was wrong :-(.


    Before you do anything rash (hacking zipfile.py or buying more
    memory), take a step back for a moment:

    Is this a one-off exercise or a regular one? Does it *really*
    need to be done programmatically? There will be at least one
    command-line unzipper program for your platform.
    One-off requirement: do it manually.
    Regular: try using the unzipper manually; if all the available
    unzippers on your platform die with a memory allocation problem, then
    you really have a problem. If it works, then instead of using the
    zipfile module, drive the unzipper program from your Python code via a
    subprocess.

    HTH,
    John
     
    John Machin, May 21, 2005
    #4
  5. Marcus Lowland

    Thanks for the detailed reply, John! I guess it turned out to be a bit
    tougher than I originally thought :)....

    Reading over your links, I think I'd better not attempt rewriting the
    zipfile.py program... a little over my head :). The best solution,
    from everything I read, seems to be calling an unzipper program from a
    subprocess. I assume you mean using execfile()? I can't think of
    another way.

    Anyway, thank you very much for your help, it's been very educational.

    Best regards,
    Lorn
     
    Marcus Lowland, May 23, 2005
    #5
  6. John Machin

    John Machin Guest

    On 23 May 2005 09:28:15 -0700, "Marcus Lowland" <>
    wrote:

    >Thank for the detailed reply John! I guess it turned out to be a bit
    >tougher than I originally thought :)....
    >
    >Reading over your links, I think I better not attempt rewriting the
    >zipfile.py program... a little over my head :). The best solution,
    >from everything I read seems to be calling an unzipper program from a
    >subprocess. I assume you mean using execfile()? I can't think of
    >another way.


    Errrmmmm ... no, execfile runs a Python source file.

    Check out the subprocess module:

    """
    6.8 subprocess -- Subprocess management

    New in version 2.4.

    The subprocess module allows you to spawn new processes, connect to
    their input/output/error pipes, and obtain their return codes. This
    module intends to replace several other, older modules and functions,
    such as:

    os.system
    os.spawn*
    os.popen*
    popen2.*
    commands.*
    """
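    For example, driving an external unzipper (here Info-ZIP's unzip, a
    hypothetical choice; the actual program name and flags depend on your
    platform) might look like this sketch:

    ```python
    import subprocess

    def run_unzipper(argv):
        # Run an external command and raise if it reports failure,
        # so a broken extraction doesn't pass silently.
        rc = subprocess.call(argv)
        if rc != 0:
            raise RuntimeError("command failed with exit code %d" % rc)
        return rc

    # Hypothetical usage, assuming Info-ZIP's unzip is on the PATH:
    # run_unzipper(['unzip', '-o', 'bigfile.zip', '-d', 'outdir'])
    ```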
     
    John Machin, May 23, 2005
    #6
