tarfile...bug?


Anurag

Hi,

I am trying to use the tarfile module to list the contents of a 'gz' file, but
it seems to hang on large files and CPU usage goes to 100%,
even though 'tar -tvf' lists the contents of the same file in a couple of seconds.

Here is a test script that shows the problem. I am using
Python 2.4.3.

------------
import tarfile

bigFilePath = "/tmp/bigFile"
bigFileTGZ = "/tmp/big.tar.gz"

# create a big file
print "Creating big file...",bigFilePath
f = open(bigFilePath,"w")
for i in xrange(100):
    f.write("anurag"*1024*1024)
f.close()

#create a tarfile from big file
print "pack to...",bigFileTGZ
tar = tarfile.open(bigFileTGZ, "w:gz")
tar.add(bigFilePath,"bigFile")
tar.close()

print "unpack...",bigFileTGZ
# now try to list contents of tar
tar = tarfile.open(bigFileTGZ, "r")
tar.list() #hangs
 

Anurag

Hi,

Has anyone faced this problem? I assume it must be common if it can
be reproduced so easily, or is something wrong with my system?

Also, if I use tar.members instead of tar.getmembers(), it works.
So what is the difference between tar.members and tar.getmembers()?

rgds
Anurag
 

alan.haffner

If you are not fully dependent on tar files, have a look at the zipfile
library in Python. Every time I start to use the tarfile library, the
zipfile library turns out to be a better solution.
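
A minimal sketch of listing a zip archive, written for current Python 3 rather than the 2.4.3 from the original post. The helper names make_demo_zip and list_zip and the member names are made up for illustration; zipfile.ZipFile.infolist() is the standard-library call that returns the directory entries.

```python
import io
import zipfile


def make_demo_zip():
    """Build a small zip archive in memory (hypothetical demo data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("first.txt", "anurag" * 1000)
        zf.writestr("second.txt", "tarfile" * 10)
    buf.seek(0)
    return buf


def list_zip(fileobj):
    """Return (name, uncompressed size) pairs read from the zip directory."""
    with zipfile.ZipFile(fileobj, "r") as zf:
        # infolist() is filled from the archive's directory, so no member
        # data has to be decompressed to produce this listing.
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Calling list_zip(make_demo_zip()) returns the two names with their uncompressed sizes; the same function should work on a file opened in binary mode.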

Cheers,

--Alan
 

Scott David Daniels

And here's why:
The tar-gzip format (sometimes .tar.gz, sometimes .tgz) is defined by
taking a fully expanded archive (a tar archive) and compressing it
_as_a_whole_ with gzip compression. It is not possible to see the
last bytes of the .tgz file without uncompressing _all_ of the file.
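
As a rough illustration, here is a sketch in current Python 3 (not the 2.4.3 from the original post) that walks a .tar.gz one member at a time. The helper names make_demo_tgz and iter_tgz_names and the member names are invented for the example; it does not try to reproduce or explain the hang reported above.

```python
import io
import tarfile


def make_demo_tgz(names):
    """Build a small .tar.gz in memory with one member per name (demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = name.encode("ascii") * 100
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def iter_tgz_names(fileobj):
    """Yield member names in archive order."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        # Member headers sit between the file contents inside a single
        # gzip stream, so reaching a later member means decompressing
        # everything stored before it.
        for member in tar:
            yield member.name
```

For example, list(iter_tgz_names(make_demo_tgz(["a.txt", "b.txt", "c.txt"]))) gives the names in the order they were added.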

The zip format compresses the contained files individually, and keeps
a separate directory. So it can expand only the file you want whether
it is at the beginning or the end of the zip file. This is also (one
of) the reason(s) the .zip format gets less compression than the .tgz
format. Each file in the zip is separately compressed, so redundancy
between files is not compressed out.
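
The size difference can be seen with a small experiment, sketched here in current Python 3. The function names and the payload are assumptions made for the demo: two members with identical, hard-to-compress contents are packed once as .tar.gz and once as .zip. The files are kept small on purpose, because deflate only finds repeats within a 32 KiB window, so the effect shrinks for large members.

```python
import io
import random
import tarfile
import zipfile


def demo_payload(size=4096, seed=0):
    """Deterministic pseudo-random bytes that barely compress on their own."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def tgz_size(files):
    """Size in bytes of a .tar.gz holding the given (name, data) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


def zip_size(files):
    """Size in bytes of a .zip holding the same pairs, each deflated alone."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files:
            zf.writestr(name, data)
    return len(buf.getvalue())
```

With payload = demo_payload() and files = [("a.bin", payload), ("b.bin", payload)], tgz_size(files) comes out smaller than zip_size(files), since gzip can encode the second copy as a reference to the first while the zip compresses each member independently.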

-Scott David Daniels
 
