How to "gunzip-iterate" over a file?

kj

I need to iterate over the lines of *very* large (>1 GB) gzipped
files. I would like to do this without having to read the full
compressed contents into memory just to apply zlib.decompress
to them. I would also like to avoid having to gunzip the file
(i.e. creating an uncompressed version of the file in the
filesystem) prior to iterating over it.

Basically I'm looking for something that will give me the same
functionality as Perl's gzip IO layer, which looks like this (from
the documentation):

use PerlIO::gzip;
open FOO, "<:gzip", "file.gz" or die $!;
print while <FOO>; # And it will be uncompressed...

What's the best way to achieve the same functionality in Python?

TIA!

kynn
 
Robert Kern

kj said:
What's the best way to achieve the same functionality in Python?

http://docs.python.org/library/gzip

import gzip

f = gzip.open('filename.gz')
for line in f:
    print line
f.close()
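
For anyone on Python 3, here is a minimal sketch of the same loop. It assumes the file holds UTF-8 text; `iter_gzip_lines` is a hypothetical helper name, not something the gzip module provides. Opening in "rt" mode yields str lines rather than bytes, and the `with` block closes the file even if the loop raises.

```python
import gzip


def iter_gzip_lines(path, encoding="utf-8"):
    """Yield lines from a gzipped text file one at a time.

    Hypothetical helper for illustration; the UTF-8 default is an assumption.
    """
    # "rt" = read in text mode, so gzip decodes bytes to str for us.
    with gzip.open(path, "rt", encoding=encoding) as f:
        for line in f:
            # Strip the trailing newline so callers get the bare line text.
            yield line.rstrip("\n")
```

Usage would look like `for line in iter_gzip_lines('filename.gz'): print(line)`. The file is decompressed incrementally as the loop advances, so nothing is written to disk and the whole file is not held in memory at once.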

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Paul Rubin

Robert Kern said:
f = gzip.open('filename.gz')
for line in f:
    print line

Or use f.read(nbytes) to read up to nbytes uncompressed bytes from f.
Note that the standard iterator (which iterates over lines) can
potentially consume an unbounded amount of memory if the file contains
no newlines.
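
A rough sketch of the fixed-size read approach, written for Python 3. The helper name `iter_gzip_chunks` and the 64 KiB default are assumptions made for illustration. Each call to f.read(nbytes) returns at most nbytes of uncompressed data, so memory use stays bounded even when the file has no newlines at all.

```python
import gzip


def iter_gzip_chunks(path, nbytes=64 * 1024):
    """Yield successive blocks of at most nbytes uncompressed bytes.

    Hypothetical helper for illustration; the chunk size is an arbitrary choice.
    """
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(nbytes)
            if not chunk:
                # An empty bytes object signals end of file.
                break
            yield chunk
```

The trade-off is that chunks do not respect line boundaries, so a caller that needs lines has to split the chunks and carry over any partial line to the next chunk.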
 