bz2 module doesn't work properly with all bz2 files

Discussion in 'Python' started by Magdoll, Jun 4, 2010.

  1. Magdoll

    Magdoll Guest

    I'm not sure what's causing this, but depending on the compression
    program used, the bz2 module sometimes stops reading early.

    I used pbzip2 to compress my files and then read through them using
    the bz2 module. Reading always stops well before the actual EOF.
    If I use bzip2 instead of pbzip2 to compress the files, then
    everything is fine.

    My files are generally big (several GB), so decompressing them first
    is not a wise choice, and it is a little unfortunate that I can't use
    pbzip2, because it's usually much faster than bzip2.
     
    Magdoll, Jun 4, 2010
    #1

  2. On 04Jun2010 12:53, Magdoll <> wrote:
    | I'm not sure what's causing this, but depending on the compression
    | program used, the bz2 module sometimes stops reading early.
    |
    | I used pbzip2 to compress my files and then read through them using
    | the bz2 module. Reading always stops well before the actual EOF.
    | If I use bzip2 instead of pbzip2 to compress the files, then
    | everything is fine.
    |
    | My files are generally big (several GB), so decompressing them first
    | is not a wise choice, and it is a little unfortunate that I can't use
    | pbzip2, because it's usually much faster than bzip2.

    Have you tested the decompression of the problematic files with the
    bunzip2 command, just to ensure the bug is in the Python bz2 module
    and not in the pbzip2 utility?
    --
    Cameron Simpson <> DoD#743
    http://www.cskk.ezoshosting.com/cs/

    A lot of people don't know the difference between a violin and a viola, so
    I'll tell you. A viola burns longer. - Victor Borge
     
    Cameron Simpson, Jun 4, 2010
    #2

  3. Magdoll

    Magdoll Guest

    On Jun 4, 3:05 pm, Cameron Simpson <> wrote:
    > Have you tested the decompression of the problematic files with the
    > bunzip2 command, just to ensure the bug is in the Python bz2 module
    > and not in the pbzip2 utility?

    Yes. Decompressing them with either pbzip2 or bunzip2 works fine,
    so the problem is not with pbzip2.
     
    Magdoll, Jun 4, 2010
    #3
  4. On Fri, 04 Jun 2010 12:53:26 -0700, Magdoll wrote:

    > I'm not sure what's causing this, but depending on the compression
    > program used, the bz2 module sometimes exits earlier.

    [...]

    The current bz2 module only supports files written as a single stream,
    and not multiple stream files. This is why the BZ2File class has no
    "append" mode. See this bug report:

    http://bugs.python.org/issue1625

    Here's an example:

    >>> import bz2
    >>> bz2.BZ2File('a.bz2', 'w').write('this is the first chunk of text')
    >>> bz2.BZ2File('b.bz2', 'w').write('this is the second chunk of text')
    >>> bz2.BZ2File('c.bz2', 'w').write('this is the third chunk of text')
    >>> # concatenate the files
    ... d = file('concate.bz2', 'wb')
    >>> for name in "abc":
    ...     f = file('%c.bz2' % name, 'rb')
    ...     d.write(f.read())
    ...
    >>> d.close()
    >>>
    >>> bz2.BZ2File('concate.bz2', 'r').read()
    'this is the first chunk of text'

    Sure enough, BZ2File only sees the first chunk of text, but if I open it
    in (e.g.) KDE's Ark application, I see all the text.

    So this is a known bug, sorry.


    --
    Steven
     
    Steven D'Aprano, Jun 5, 2010
    #4
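
Following up on the single-stream limitation described above: one possible workaround is to drive `bz2.BZ2Decompressor` by hand and start a fresh decompressor each time a stream ends. The sketch below is not from the thread. It assumes Python 3.3 or later (where `BZ2Decompressor` has the `eof` attribute), and the helper name `iter_multistream` and its `chunk_size` parameter are made up for illustration. It also assumes that the pbzip2 output is simply several bzip2 streams written back to back, which is what the concatenation example above imitates. Note that on Python 3.3+ `bz2.BZ2File` and `bz2.decompress` accept multi-stream input themselves, so a loop like this should only be needed on older setups or when you want explicit control over buffering.

```python
import bz2
import io


def iter_multistream(fileobj, chunk_size=1 << 20):
    """Yield decompressed bytes from a binary file object that holds
    one or more bzip2 streams written back to back.

    Hypothetical helper for illustration; not part of the bz2 module.
    """
    decomp = bz2.BZ2Decompressor()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of the compressed file
        while chunk:
            data = decomp.decompress(chunk)
            if data:
                yield data
            if decomp.eof:
                # This stream is finished. Any bytes left over belong to
                # the next stream, so hand them to a new decompressor.
                chunk = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
            else:
                chunk = b""  # everything consumed; read more input


# Small in-memory demonstration mirroring the a/b/c example above.
parts = [
    b"this is the first chunk of text",
    b"this is the second chunk of text",
    b"this is the third chunk of text",
]
blob = b"".join(bz2.compress(p) for p in parts)  # three streams, concatenated
result = b"".join(iter_multistream(io.BytesIO(blob), chunk_size=16))
print(result)
```

For a real file, something like `for block in iter_multistream(open(path, 'rb')):` would process the data piece by piece without holding the whole multi-GB file in memory, though the blocks are arbitrary byte runs, so line splitting would still have to be done on top.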