Read a gzip file from inside a tar file

Discussion in 'Python' started by rohisingh@gmail.com, Dec 13, 2004.

  1. Guest

    I have a tar file. Its contents are as follows.

    rohits@sandman 12-08-04 $ tar tvf 20041208.tar
    drwxr-xr-x root/root 0 2004-12-08 21:39:19 20041208/
    -rw-r--r-- root/root 1576 2004-12-08 21:39:19 20041208/README
    drwxr-xr-x root/root 0 2004-12-08 21:27:31 20041208/snapshot_01/
    -rw-r--r-- was/was 103010606 2004-12-08 16:37:38 20041208/snapshot_01/tpv-20041208-1350.xml.gz


    What is the best way to read the contents of
    tpv-20041208-1350.xml.gz?

    I want to do the following with minimum code :)
    1) read above tar file
    2) find the gzip file
    3) read the content of this file
    4) perform operations on content
    5) continue

    I tried various combinations of the following code, but it does not
    work as intended:

    import sys
    import gzip
    import tarfile
    import StringIO

    fileName = sys.argv[1]
    print "File Name is ", fileName
    tar = tarfile.open(fileName, "r:")
    for tarinfo in tar:
        if tarinfo.isreg():
            print tarinfo.name
            if tarinfo.name.find("tpv") != -1:
                # read the gzip file
                print "\thttp plugin file"
                fileLike = tar.extractfile(tarinfo)
                # reads the whole compressed member into memory
                fileText = fileLike.read()
                stringio = StringIO.StringIO(fileText)
                fileRead = gzip.GzipFile(stringio)
                for aLine in fileRead:
                    print aLine
    Dec 13, 2004
    #1

  2. Rohit Guest

    If I change fileText = fileLike.read() to fileText =
    fileLike.readlines(), it works for a while before it gets killed for
    running out of memory.

    These are huge files. My goal is to analyze the content of the gzip
    file in the tar file without having to un-gzip it, if that is possible.
    Rohit, Dec 13, 2004
    #2

  3. Craig Ringer Guest

    On Tue, 2004-12-14 at 02:39, Rohit wrote:
    > If I change fileText = fileLike.read() to fileText =
    > fileLike.readlines(), it works for a while before it gets killed for
    > running out of memory.
    >
    > These are huge files. My goal is to analyze the content of the gzip
    > file in the tar file without having to un-gzip it, if that is possible.


    As far as I know, gzip is a stream compression algorithm that can't be
    decompressed in small blocks. That is, I don't think you can seek 500k
    into a 1MB file and decompress the next 100k.

    I'd say you'll have to progressively read the file from the beginning,
    processing and discarding as you go. It looks like a no-brainer to me -
    see zlib.decompressobj.

    Note that you _do_ have to ungzip it, you just don't have to store the
    whole decompressed thing in memory / on disk at once. If you need to do
    anything to it that does require the entire thing to be loaded (or
    anything that means you have to seek around the file), I'd say you're
    SOL.
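
    A minimal sketch of that progressive read, written for current
    Python 3 rather than the Python 2 of the code earlier in the thread.
    The function name, the name_fragment argument and the chunk size are
    made up for illustration, and it assumes the member holds a single
    gzip stream. Only one chunk of compressed data and one partial line
    are held in memory at a time.

```python
import tarfile
import zlib


def iter_gzip_member_lines(tar_path, name_fragment, chunk_size=64 * 1024):
    """Yield decompressed lines (as bytes, without the newline) from every
    regular tar member whose name contains name_fragment."""
    with tarfile.open(tar_path, "r:") as tar:
        for info in tar:
            if not (info.isreg() and name_fragment in info.name):
                continue
            member = tar.extractfile(info)  # file-like view into the tar
            # 16 + MAX_WBITS asks zlib to expect a gzip header and trailer
            decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
            pending = b""
            while True:
                chunk = member.read(chunk_size)
                if not chunk:
                    break
                pending += decomp.decompress(chunk)
                lines = pending.split(b"\n")
                pending = lines.pop()  # keep the trailing partial line
                for line in lines:
                    yield line
            # emit whatever is left once the compressed input is exhausted
            tail = pending + decomp.flush()
            if tail:
                yield from tail.splitlines()
```

    Something like "for line in iter_gzip_member_lines('20041208.tar',
    'tpv'): ..." would then process the file line by line. The gzip
    module can do the same job with less code if the member is handed
    over as gzip.GzipFile(fileobj=member); the first positional
    parameter of GzipFile is a filename, so the file object has to be
    passed by keyword.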

    --
    Craig Ringer
    Craig Ringer, Dec 13, 2004
    #3
  4. Fredrik Lundh Guest

    Craig Ringer wrote:

    >> These are huge files. My goal is to analyze the content of the gzip
    >> file in the tar file without having to un-gzip it, if that is possible.

    >
    > As far as I know, gzip is a stream compression algorithm that can't be
    > decompressed in small blocks. That is, I don't think you can seek 500k
    > into a 1MB file and decompress the next 100k.


    correct.

    > I'd say you'll have to progressively read the file from the beginning,
    > processing and discarding as you go. It looks like a no-brainer to me -
    > see zlib.decompressobj.


    it can be a bit tricky to set things up properly, though. here's a piece
    of code that uses Python's good old consumer interface to decode things
    incrementally:

    http://effbot.org/zone/consumer-gzip.htm

    you can either use this as is; just create a "target consumer", wrap it in the
    gzip consumer, and feed data to the gzip consumer in suitable pieces.

    alternatively, hack it until it does what you want.
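
    To illustrate the shape of that arrangement, here is a rough sketch
    in current Python 3. It is not the code from the page linked above:
    the class and function names are hypothetical, and it leans on
    zlib's own gzip header handling. A consumer here is simply an object
    with feed(data) and close() methods.

```python
import zlib


class LineCountingTarget:
    """Hypothetical target consumer: counts newlines in the data it is fed."""

    def __init__(self):
        self.lines = 0
        self.closed = False

    def feed(self, data):
        self.lines += data.count(b"\n")

    def close(self):
        self.closed = True


class StreamingGzipConsumer:
    """Decompresses gzip data fed in pieces and forwards it to a target."""

    def __init__(self, target):
        self.target = target
        # 16 + MAX_WBITS asks zlib to expect a gzip header and trailer
        self.decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def feed(self, data):
        out = self.decomp.decompress(data)
        if out:
            self.target.feed(out)

    def close(self):
        out = self.decomp.flush()
        if out:
            self.target.feed(out)
        self.target.close()


def feed_in_pieces(consumer, fileobj, chunk_size=64 * 1024):
    """Read fileobj chunk by chunk and push each chunk into the consumer."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        consumer.feed(chunk)
    consumer.close()
```

    The file object passed to feed_in_pieces could be the one returned
    by tar.extractfile(), so the compressed member never has to be read
    into memory in one go.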

    </F>
    Fredrik Lundh, Dec 13, 2004
    #4
