tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

Discussion in 'Python' started by sebastian.noack@googlemail.com, May 26, 2008.

  1. Guest

    Hi,

    Is there a way to use tarfile to create a gzip- or bzip2-compressed
    archive in memory, or at least a reason why I cannot?

    You might want to answer "use StringIO", but this isn't as easy as it
    seems. ;) I am using Python 2.5.2, by the way. I think this is a bug,
    at least in this version of Python, but maybe StringIO just isn't
    file-like enough for this "quirky" tarfile module. But that would
    conflict with its documentation:

    "For special purposes, there is a second format for mode: 'filemode|
    [compression]'. open() will return a TarFile object that processes its
    data as a stream of blocks. No random seeking will be done on the
    file. If given, fileobj may be any object that has a read() or write()
    method (depending on the mode)."

    Sounds good, but doesn't work. ;P StringIO provides read() and
    write() methods, among others. But tarfile has problems with the
    StringIO object, especially in this mode.

    I extracted the code from my project into a standalone Python script
    to demonstrate the issue at the lowest level. You can run the script
    below as follows: ./StringIO-tarfile.py file1 [file2] [...]


    #!/usr/bin/env python
    #
    # File: StringIO-tarfile.py
    #

    from StringIO import StringIO
    import tarfile
    import sys

    def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
        # Write the given files into a tar archive backed by fileobj.
        tar_file = tarfile.open(mode=mode, fileobj=fileobj)
        for f in filenames:
            tar_file.add(f)
        # The callback runs before tar_file.close().
        result = result_cb(fileobj)
        tar_file.close()
        return result

    if __name__ == '__main__':
        files = sys.argv[1:]
        # All combinations: 'w:', 'w:gz', 'w:bz2', 'w|', 'w|gz', 'w|bz2'.
        modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]

        string_io_cb = lambda f: f.getvalue()

        for mode in modes:
            ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
            # StringIO test.
            content = create_tar_file(files, StringIO(), mode, string_io_cb)
            fd = open('StringIO%s' % ext, 'w')
            fd.write(content)
            fd.close()

            # file object test.
            fd = open('file%s' % ext, 'w')
            create_tar_file(files, fd, mode)


    As test input, I used a directory containing a single text file. As
    you can see below, all tests using plain file objects were successful.
    But when using StringIO, I can only create uncompressed tar files.
    Even though I don't get any errors when creating them, most of the
    files are just empty or truncated.


    $ for f in `ls *.tar{,.gz,.bz2}`; do echo -n $f; du -h $f | awk '{print " ("$1"B)"}'; tar -tf $f; echo; done

    file-pipe.tar (84KB)
    foo/
    foo/ksp-fosdem2008.txt

    file-pipe.tar.bz2 (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    file-pipe.tar.gz (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    file.tar (84KB)
    foo/
    foo/ksp-fosdem2008.txt

    file.tar.bz2 (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    file.tar.gz (20KB)
    foo/
    foo/ksp-fosdem2008.txt

    StringIO-pipe.tar (76KB)
    foo/
    foo/ksp-fosdem2008.txt
    tar: Unexpected EOF in archive
    tar: Error is not recoverable: exiting now

    StringIO-pipe.tar.bz2 (0B)
    tar: This does not look like a tar archive
    tar: Error exit delayed from previous errors

    StringIO-pipe.tar.gz (0B)
    tar: This does not look like a tar archive
    tar: Error exit delayed from previous errors

    StringIO.tar (76KB)
    foo/
    foo/ksp-fosdem2008.txt

    StringIO.tar.bz2 (0B)
    tar: This does not look like a tar archive
    tar: Error exit delayed from previous errors

    StringIO.tar.gz (4.0KB)

    gzip: stdin: unexpected end of file
    tar: Child returned status 1
    tar: Error exit delayed from previous errors


    Can somebody reproduce this problem? Did I misunderstand the API? What
    would be the best workaround, if I am right? I am thinking about
    using the gzip and bz2 modules directly.
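
    A minimal sketch of that workaround idea, assuming Python 3 (where
    io.BytesIO takes the place of StringIO for binary data): build an
    uncompressed tar in memory, then compress the bytes with the gzip or
    bz2 module. The helper name make_tar_bytes and the sample member are
    hypothetical and only serve as illustration.

```python
import bz2
import gzip
import io
import tarfile


def make_tar_bytes(members):
    """Return an uncompressed tar archive built from {name: bytes} pairs."""
    buf = io.BytesIO()
    tar = tarfile.open(mode="w", fileobj=buf)
    for name, data in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    # Close first so the end-of-archive blocks are written, then read the buffer.
    tar.close()
    return buf.getvalue()


raw = make_tar_bytes({"foo/example.txt": b"hello\n"})
gz_bytes = gzip.compress(raw)    # gzip-compressed tar, still in memory
bz2_bytes = bz2.compress(raw)    # bzip2-compressed tar, still in memory

# Sanity check: tarfile can read both results back.
for blob, mode in ((gz_bytes, "r:gz"), (bz2_bytes, "r:bz2")):
    with tarfile.open(fileobj=io.BytesIO(blob), mode=mode) as tar:
        print(mode, tar.getnames())
```

    This keeps tarfile out of the compression step entirely, at the cost
    of holding the uncompressed archive in memory as well.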

    Regards
    Sebastian Noack
     
    , May 26, 2008
    #1

  2. On Mon, 26 May 2008 17:44:28 -0300,
    <> wrote:

    > Is there a way to use tarfile to create a gzip- or bzip2-compressed
    > archive in memory, or at least a reason why I cannot?
    >
    > You might want to answer "use StringIO", but this isn't as easy as it
    > seems. ;) I am using Python 2.5.2, by the way. I think this is a bug,
    > at least in this version of Python, but maybe StringIO just isn't
    > file-like enough for this "quirky" tarfile module. But that would
    > conflict with its documentation.


    > def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
    >     tar_file = tarfile.open(mode=mode, fileobj=fileobj)
    >     for f in filenames:
    >         tar_file.add(f)
    >     result = result_cb(fileobj)
    >     tar_file.close()
    >     return result


    It's not a bug, you must extract the StringIO contents *after* closing
    tar_file, else you won't get the last blocks pending to be written.
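
    A minimal sketch of that ordering, assuming Python 3 and io.BytesIO
    in place of StringIO; the member name and payload are made up for
    illustration. Reading the buffer before close() yields a gzip stream
    that has not been finished yet, while reading it after close()
    yields a complete archive.

```python
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(mode="w:gz", fileobj=buf)

payload = b"some text\n" * 100
info = tarfile.TarInfo(name="foo/example.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))

early = buf.getvalue()   # taken too early: the gzip stream is not finished
tar.close()              # writes the trailing blocks and finishes the stream
final = buf.getvalue()   # complete archive

print(len(early), len(final))

# Only the bytes taken after close() form a readable archive.
with tarfile.open(fileobj=io.BytesIO(final), mode="r:gz") as check:
    print(check.getnames())
```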

    --
    Gabriel Genellina
     
    Gabriel Genellina, May 27, 2008
    #2

  3. Guest

    On May 27, 2:17 am, "Gabriel Genellina" <>
    wrote:
    > It's not a bug, you must extract the StringIO contents *after* closing
    > tar_file, else you won't get the last blocks pending to be written.


    I looked at tarfile's source code last night after I wrote this
    message and figured it out. But the problem is that TarFile's close
    method closes the underlying file object after the last block is
    written, and once StringIO is closed you cannot get its content
    anymore. Why does it close the underlying file? There is absolutely
    no reason for doing this. Are you still sure this isn't a bug?

    Regards
    Sebastian Noack
     
    , May 27, 2008
    #3
  4. Guest

    I have written a FileWrapper class as a workaround, which works for
    me (see the code below). The FileWrapper object holds an internal
    file-like object and maps its attributes, but prevents the user (in
    this case tarfile) from closing the internal file, so I can still
    access StringIO's content after closing the TarFile object.

    But this should not be required to create in-memory tar files. It is
    definitely a bug that TarFile closes external file objects passed to
    tarfile.open when the TarFile object is closed. The code which opens
    a file is also responsible for closing it.

    Regards
    Sebastian Noack


    #!/usr/bin/env python
    #
    # File: StringIO-tarfile.py
    #

    from StringIO import StringIO
    import tarfile
    import sys

    class FileWrapper(object):
        def __init__(self, fileobj):
            self.file = fileobj
            self.closed = fileobj.closed

        def __getattr__(self, name):
            # Raise AttributeError, if it isn't a file attribute.
            if name not in dir(file):
                raise AttributeError(name)

            # Get the attribute of the internal file object.
            value = getattr(self.file, name)

            # Raise a ValueError, if the attribute is callable (e.g. an instance
            # method) and the FileWrapper is closed.
            if callable(value) and self.closed:
                raise ValueError('I/O operation on closed file')
            return value

        def close(self):
            # Only mark the wrapper as closed; the internal file stays open.
            self.closed = True

    def create_tar_file(filenames, fileobj, mode):
        tar_file = tarfile.open(mode=mode, fileobj=fileobj)
        for f in filenames:
            tar_file.add(f)
        tar_file.close()

    if __name__ == '__main__':
        files = sys.argv[1:]
        modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]

        for mode in modes:
            ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
            # StringIO test.
            stream = FileWrapper(StringIO())
            create_tar_file(files, stream, mode)
            fd = open('StringIO%s' % ext, 'w')
            fd.write(stream.file.getvalue())
            stream.file.close()
            fd.close()

            # file object test.
            fd = open('file%s' % ext, 'w')
            create_tar_file(files, fd, mode)
     
    , May 27, 2008
    #4
  5. On Tue, May 27, 2008 at 01:51:47AM -0700, wrote:
    > I have written a FileWrapper class as a workaround, which works for
    > me (see the code below). The FileWrapper object holds an internal
    > file-like object and maps its attributes, but prevents the user (in
    > this case tarfile) from closing the internal file, so I can still
    > access StringIO's content after closing the TarFile object.
    >
    > But this should not be required to create in-memory tar files. It is
    > definitely a bug that TarFile closes external file objects passed to
    > tarfile.open when the TarFile object is closed. The code which opens
    > a file is also responsible for closing it.


    You're right, _BZ2Proxy.close() calls the wrapped file object's close() method
    and that is definitely not the desired behaviour. So, if you can do without 'bz2'
    modes for now, your problem is gone; all other modes work fine.

    I fixed it (r63744), so the next beta release will work as expected. Your test
    script helped a lot, thanks.
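
    A quick way to check this behaviour on a current interpreter, as a
    sketch assuming Python 3 with io.BytesIO standing in for StringIO
    (the helper build_in_memory and the member data are hypothetical).
    For each of the six write modes from the test script, it closes the
    TarFile and then reads the buffer, which would raise ValueError if
    the buffer had been closed along with it.

```python
import io
import tarfile

# Same mode matrix as the test script: 'w:', 'w:gz', 'w:bz2', 'w|', 'w|gz', 'w|bz2'.
MODES = ["w%s%s" % (sep, comp) for sep in (":", "|") for comp in ("", "gz", "bz2")]


def build_in_memory(mode, name="foo/example.txt", data=b"hello\n"):
    """Create a tar archive in a BytesIO buffer and return the buffer."""
    buf = io.BytesIO()
    tar = tarfile.open(mode=mode, fileobj=buf)
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
    tar.close()
    return buf


for mode in MODES:
    buf = build_in_memory(mode)
    # Re-open with the matching read mode ('r:gz', 'r|bz2', ...).
    read_mode = "r" + mode[1:]
    with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode=read_mode) as tar:
        print(mode, buf.closed, tar.getnames())
```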

    Regards,

    --
    Lars Gustäbel


    A casual stroll through a lunatic asylum shows that
    faith does not prove anything.
    (Friedrich Nietzsche)
     
    Lars Gustäbel, May 27, 2008
    #5
  6. On Tue, 27 May 2008 02:43:53 -0300,
    <> wrote:

    > On May 27, 2:17 am, "Gabriel Genellina" <>
    > wrote:
    >> It's not a bug, you must extract the StringIO contents *after* closing
    >> tar_file, else you won't get the last blocks pending to be written.

    >
    > I looked at tarfile's source code last night after I wrote this
    > message and figured it out. But the problem is that TarFile's close
    > method closes the underlying file object after the last block is
    > written, and once StringIO is closed you cannot get its content
    > anymore. Why does it close the underlying file? There is absolutely
    > no reason for doing this. Are you still sure this isn't a bug?


    Ouch, sorry, I only tried with gzip (which worked fine), not bz2 (which
    is buggy).

    --
    Gabriel Genellina
     
    Gabriel Genellina, May 27, 2008
    #6
  7. Guest

    That is right, only bz2 is affected. I am happy that I could help. ;)

    Regards
    Sebastian Noack
     
    , May 28, 2008
    #7