sebastian.noack
Hi,
is there a way to use tarfile to create a gzip- or bzip2-compressed
archive in memory, or at least a reason why I cannot?
You might want to answer "use StringIO", but this isn't as easy as it
seems. I am using Python 2.5.2, by the way. I think this is a bug, at
least in this version of Python, but maybe StringIO just isn't
file-like enough for this "quirky" tarfile module. But that would
conflict with its documentation.
"For special purposes, there is a second format for mode:
'filemode|[compression]'. open() will return a TarFile object that
processes its data as a stream of blocks. No random seeking will be done
on the file. If given, fileobj may be any object that has a read() or
write() method (depending on the mode)."
Sounds good, but doesn't work. ;P StringIO provides read() and write()
methods, among others. But tarfile has problems with the StringIO
object, especially in this mode.
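To make that reading of the documentation concrete, here is a minimal sketch of what stream mode appears to promise: a sink that offers nothing but write(), handed to tarfile.open() in 'w|gz' mode. It is an illustration under assumptions, not the code from my project: it targets Python 3 (io.BytesIO and bytes payloads) rather than 2.5.2, and ChunkSink, stream_archive and the member name are hypothetical names made up for the example.

```python
import io
import tarfile


class ChunkSink(object):
    """Hypothetical write-only sink: the only method it offers is write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Keep a copy of every block tarfile hands over.
        self.chunks.append(bytes(data))
        return len(data)


def stream_archive(name, payload, mode="w|gz"):
    """Write one in-memory member through a stream-mode TarFile."""
    sink = ChunkSink()
    tar = tarfile.open(fileobj=sink, mode=mode)
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
    # The chunks are only joined after close(), once tarfile has had the
    # chance to flush its buffers and write the compression trailer.
    tar.close()
    return b"".join(sink.chunks)
```

Calling stream_archive("foo/hello.txt", b"hello\n") should return the bytes of a complete .tar.gz archive, which can be read back with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz").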
I extracted the code from my project into a standalone Python script
to demonstrate the issue at the lowest level. You can run the script
below as follows: ./StringIO-tarfile.py file1 [file2] [...]
#
# File: StringIO-tarfile.py
#
#!/usr/bin/env python
from StringIO import StringIO
import tarfile
import sys

def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
    tar_file = tarfile.open(mode=mode, fileobj=fileobj)
    for f in filenames:
        tar_file.add(f)
    result = result_cb(fileobj)
    tar_file.close()
    return result

if __name__ == '__main__':
    files = sys.argv[1:]
    modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]
    string_io_cb = lambda f: f.getvalue()
    for mode in modes:
        ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
        # StringIO test.
        content = create_tar_file(files, StringIO(), mode, string_io_cb)
        fd = open('StringIO%s' % ext, 'w')
        fd.write(content)
        fd.close()
        # file object test.
        fd = open('file%s' % ext, 'w')
        create_tar_file(files, fd, mode)
As test input, I used a directory containing a single text file. As you
can see below, all tests using plain file objects were successful. But
when using StringIO, I can only create uncompressed tar files. Even
though I don't get any errors when creating them, most of the files are
just empty or truncated.
$ for f in `ls *.tar{,.gz,.bz2}`; do echo -n $f; du -h $f | \
  awk '{print " ("$1"B)"}'; tar -tf $f; echo; done
file-pipe.tar (84KB)
foo/
foo/ksp-fosdem2008.txt
file-pipe.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt
file-pipe.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt
file.tar (84KB)
foo/
foo/ksp-fosdem2008.txt
file.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt
file.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt
StringIO-pipe.tar (76KB)
foo/
foo/ksp-fosdem2008.txt
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
StringIO-pipe.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors
StringIO-pipe.tar.gz (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors
StringIO.tar (76KB)
foo/
foo/ksp-fosdem2008.txt
StringIO.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors
StringIO.tar.gz (4.0KB)
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error exit delayed from previous errors
Can somebody reproduce this problem? Did I misunderstand the API? What
would be the best workaround, if I am right? I am thinking about
using the gzip and bz2 modules directly.
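One thing that may be worth ruling out first: in the script above, result_cb(fileobj) runs before tar_file.close(), so getvalue() may be reading the buffer before tarfile has flushed its compressor and written the end-of-archive blocks. The following is a minimal sketch of that idea, not a confirmed diagnosis. It assumes Python 3 (io.BytesIO instead of StringIO), and tar_to_bytes, its members mapping and the read_early flag are hypothetical names introduced only for this illustration.

```python
import io
import tarfile


def tar_to_bytes(members, mode, read_early=False):
    """Build a tar archive in memory and return its bytes.

    members maps archive names to bytes payloads (a made-up input shape).
    read_early=True returns what the buffer held *before* close(), which
    mirrors the call order in the script above.
    """
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode=mode)
    for name, payload in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    early = buf.getvalue()   # snapshot taken before close()
    tar.close()              # flushes buffers, writes trailer blocks
    return early if read_early else buf.getvalue()
```

If this guess is right, the snapshot taken before close() should be shorter than the final value for every mode, and only the final value should be readable as an archive, for example via tarfile.open(fileobj=io.BytesIO(data), mode="r:*").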
Regards
Sebastian Noack