Memory errors with large zip files

Discussion in 'Python' started by Lorn, May 20, 2005.

  1. Lorn

    Lorn Guest

    Is there a limitation in Python's zipfile module that limits the
    size of a file that can be extracted? I'm currently trying to extract
    125 MB zip files whose contents uncompress to > 1 GB and am
    receiving memory errors. Indeed my RAM gets maxed out during
    extraction and then the script quits. Is there a way to spool to disk
    on the fly, or is it necessary for Python to read the entire file into
    memory before writing? The code below iterates through a directory of
    zip files and extracts them (thanks John!), though for testing I've
    just been using one file:

    zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
    for zipname in zipnames:
        zf = zipfile.ZipFile(zipname, 'r')
        for zfilename in zf.namelist():
            newFile = open(zfilename, 'wb')
            newFile.write(zf.read(zfilename))
            newFile.close()
        zf.close()


    Any suggestions or comments on how I might be able to work with zip
    files of this size would be very helpful.
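    One way to avoid holding each member in memory would be to copy it to
    disk in fixed-size chunks. A sketch, assuming a Python version where
    ZipFile.open is available (it was added to the stdlib after this
    thread), and assuming, like the loop above, that member names are
    plain file names without directory components:

    ```python
    import os
    import shutil
    import zipfile

    def extract_streaming(zipname, dest='.'):
        # Copy each member to disk in 64 KB chunks via the file-like
        # object returned by ZipFile.open, so RAM use stays small even
        # when a member uncompresses to more than 1 GB.
        with zipfile.ZipFile(zipname, 'r') as zf:
            for name in zf.namelist():
                target = os.path.join(dest, name)
                with zf.open(name) as src, open(target, 'wb') as dst:
                    shutil.copyfileobj(src, dst, 64 * 1024)
    ```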

    Best regards,
    Lorn
     
    Lorn, May 20, 2005
    #1

  2. Lorn

    Lorn Guest

    OK, I'm not sure if this helps, but in debugging it a bit I see the
    script stalls on:

    newFile.write (zf.read (zfilename))

    The memory error generated references line 357 of the zipfile.py
    program at the point of decompression:

    elif zinfo.compress_type == ZIP_DEFLATED:
        if not zlib:
            raise RuntimeError, \
                  "De-compression requires the (missing) zlib module"
        # zlib compress/decompress code by Jeremy Hylton of CNRI
        dc = zlib.decompressobj(-15)
        bytes = dc.decompress(bytes)  ### <------ right here

    Is there any way to modify how my code approaches this, or perhaps
    how the zipfile code handles it, or do I need to just invest in more
    RAM? I currently have 512 MB and thought that would be plenty...
    perhaps I was wrong :-(. If anyone has any ideas it would truly be
    very helpful.

    Lorn
     
    Lorn, May 21, 2005
    #2

  3. Do Re Mi chel La Si Do

    Hi,


    I made this test:

    - create 12 text files of 100 MB each (exactly 102,400,000 bytes)
    - create the file "tst.zip" containing these 12 files (but the
      resulting file is only 1,095,965 bytes...)
    - delete the 12 text files
    - try your code

    And... it's OK for me.

    But: the compressed file is only 1 MB in size; I have 1 GB of RAM;
    I use Windows XP.

    Sorry, because:
    1) my English is bad
    2) I could not reproduce your problem


    Michel Claveau
     
    Do Re Mi chel La Si Do, May 21, 2005
    #3
  4. John Machin

    John Machin Guest

    On 20 May 2005 18:04:22 -0700, "Lorn" <> wrote:

    >Ok, I'm not sure if this helps any, but in debugging it a bit I see the
    >script stalls on:
    >
    >newFile.write (zf.read (zfilename))
    >
    >The memory error generated references line 357 of the zipfile.py
    >program at the point of decompression:
    >
    >elif zinfo.compress_type == ZIP_DEFLATED:
    > if not zlib:
    > raise RuntimeError, \
    > "De-compression requires the (missing) zlib module"
    > # zlib compress/decompress code by Jeremy Hylton of CNRI
    > dc = zlib.decompressobj(-15)
    > bytes = dc.decompress(bytes) ### <------ right here
    >



    The basic problem is that the zipfile module is asking the "dc" object
    to decompress the whole file at once -- so you would need (at least)
    enough memory to hold both the compressed file (C) and the
    uncompressed file (U). There is also a possibility that this could
    rise to 2U instead of U+C -- read a few lines further on:

    bytes = bytes + ex

    >Is there anyway to modify how my code is approaching this


    You're doing the best you can, as far as I can tell.

    > or perhaps
    >how the zipfile code is handling it


    Read this:
    http://docs.python.org/lib/module-zlib.html

    If you think you can work out how to modify zipfile.py to feed the
    decompression object a chunk of data at a time, properly manipulating
    dc.unconsumed_tail, and keeping memory usage to a minimum, then go for
    it :)

    Reading the source of the Python zlib module, plus this page from the
    zlib website could be helpful, perhaps even necessary:
    http://www.gzip.org/zlib/zlib_how.html
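    The chunk-at-a-time approach described above can be sketched with
    plain zlib, outside zipfile.py. The -15 window-bits value matches what
    zipfile passes for raw deflate streams; feeding bounded chunks keeps
    memory proportional to the chunk size rather than the file size
    (dc.unconsumed_tail only comes into play if you pass decompress a
    max_length, which this sketch does not):

    ```python
    import zlib

    def decompress_stream(read_chunk, write_chunk, chunk_size=64 * 1024):
        # Decompress a raw-deflate stream a chunk at a time.
        # read_chunk(n) returns up to n compressed bytes (b'' at EOF);
        # write_chunk(data) receives decompressed output.
        dc = zlib.decompressobj(-15)  # -15 = raw deflate, as in zipfile.py
        while True:
            data = read_chunk(chunk_size)
            if not data:
                break
            write_chunk(dc.decompress(data))
        write_chunk(dc.flush())  # emit any buffered remainder
    ```

    With read_chunk bound to the archive's raw member data and
    write_chunk to the output file's write method, peak memory stays near
    chunk_size plus zlib's internal window.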

    See also the following post to this newsgroup:
    From: John Goerzen <>
    Newsgroups: comp.lang.python
    Subject: Fixes to zipfile.py [PATCH]
    Date: Fri, 07 Mar 2003 16:39:25 -0600

    .... his patch obviously wasn't accepted :-(


    > or do I need to just invest in more
    >RAM? I currently have 512 MB and thought that would be plenty....
    >perhaps I was wrong :-(.


    Before you do anything rash (hacking zipfile.py or buying more
    memory), take a step back for a moment:

    Is this a one-off exercise or a regular one? Does it *really*
    need to be done programmatically? There will be at least one
    command-line unzipper program for your platform.
    One-off requirement: do it manually.
    Regular: try using the unzipper manually; if all the available
    unzippers on your platform die with a memory allocation problem, then
    you really have a problem. If it works, then instead of using the
    zipfile module, drive the unzipper program from your Python code via a
    subprocess.

    HTH,
    John
     
    John Machin, May 21, 2005
    #4
  5. Marcus Lowland

    Thanks for the detailed reply, John! I guess it turned out to be a bit
    tougher than I originally thought :)....

    Reading over your links, I think I'd better not attempt rewriting the
    zipfile.py program... a little over my head :). The best solution,
    from everything I read, seems to be calling an unzipper program from a
    subprocess. I assume you mean using execfile()? I can't think of
    another way.

    Anyway, thank you very much for your help, it's been very educational.

    Best regards,
    Lorn
     
    Marcus Lowland, May 23, 2005
    #5
  6. John Machin

    John Machin Guest

    On 23 May 2005 09:28:15 -0700, "Marcus Lowland" <>
    wrote:

    >Thank for the detailed reply John! I guess it turned out to be a bit
    >tougher than I originally thought :)....
    >
    >Reading over your links, I think I better not attempt rewriting the
    >zipfile.py program... a little over my head :). The best solution,
    >from everything I read seems to be calling an unzipper program from a
    >subprocess. I assume you mean using execfile()? I can't think of
    >another way.


    Errrmmmm ... no, execfile runs a Python source file.

    Check out the subprocess module:

    """
    6.8 subprocess -- Subprocess management

    New in version 2.4.

    The subprocess module allows you to spawn new processes, connect to
    their input/output/error pipes, and obtain their return codes. This
    module intends to replace several other, older modules and functions,
    such as:

    os.system
    os.spawn*
    os.popen*
    popen2.*
    commands.*
    """
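    For example, driving an external unzipper (here Info-ZIP's unzip, a
    hypothetical choice; the actual program name and flags depend on your
    platform) might look like this sketch:

    ```python
    import subprocess

    def run_unzipper(argv):
        # Run an external command and raise if it reports failure,
        # so a broken extraction doesn't pass silently.
        rc = subprocess.call(argv)
        if rc != 0:
            raise RuntimeError("command failed with exit code %d" % rc)
        return rc

    # Hypothetical usage, assuming Info-ZIP's unzip is on the PATH:
    # run_unzipper(['unzip', '-o', 'bigfile.zip', '-d', 'outdir'])
    ```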
     
    John Machin, May 23, 2005
    #6
