On Friday, 5 November 2004 19:19, Josiah Carlson wrote:
> I am not aware of any such method. I am fairly certain gzip (and the
> associated zlib) does the following:
>
> while bytes remaining:
>     reset/initialize state
>     while state is not crappy and bytes remaining:
>         compress portion of remaining bytes
>         update state
>
> Even if one could discover the last reset/initialization of state, one
> would still need to decompress the data from then on in order to
> discover the two empty blocks.

This is not entirely true... There is a full flush, which as far as I
remember is done every n bytes (n > 100000 bytes, IIRC) and which can also
be forced by the programmer. Once you do a full flush, the compressed data
written so far is complete as is, up to the point where you did the flush.
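
To illustrate the forced full flush, here is a minimal sketch, assuming Python 3 (where zlib works on bytes rather than str). The variable names are made up for the illustration. It shows that the output produced up to a Z_FULL_FLUSH already decompresses completely, even though no end-of-stream marker has been written yet.

```python
import zlib

data = b"hahahahahaha" * 20

comp = zlib.compressobj(6)
# Push out everything compressed so far and force a full flush.
prefix = comp.compress(data) + comp.flush(zlib.Z_FULL_FLUSH)

# No Z_FINISH has been written, yet the prefix decodes completely.
decomp = zlib.decompressobj()
recovered = decomp.decompress(prefix)

print(recovered == data)  # True
print(decomp.eof)         # False: the end-of-stream marker is still missing
```
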
From the documentation:
"""flush([mode])
All pending input is processed, and a string containing the remaining
compressed output is returned. mode can be selected from the constants
Z_SYNC_FLUSH, Z_FULL_FLUSH, or Z_FINISH, defaulting to Z_FINISH. Z_SYNC_FLUSH
and Z_FULL_FLUSH allow compressing further strings of data and are used to
allow partial error recovery on decompression, while Z_FINISH finishes the
compressed stream and prevents compressing any more data. After calling
flush() with mode set to Z_FINISH, the compress() method cannot be called
again; the only realistic action is to delete the object."""
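
The flush modes quoted above can be tried out with a small sketch, again assuming Python 3. The helper name flush_then_continue and the sample strings are hypothetical. It checks that the output of a sync or full flush ends with the bytes 00 00 FF FF (an empty stored block), and that the compression object can still be used afterwards until Z_FINISH is written.

```python
import zlib

# Both flush modes end their output with an empty stored block,
# whose last four bytes are 00 00 FF FF.
FLUSH_MARKER = b"\x00\x00\xff\xff"

def flush_then_continue(mode):
    comp = zlib.compressobj(6)
    out = comp.compress(b"some data, ") + comp.flush(mode)
    ends_with_marker = out.endswith(FLUSH_MARKER)
    # The object is still usable after a sync or full flush.
    out += comp.compress(b"more data")
    out += comp.flush()  # mode defaults to Z_FINISH
    return ends_with_marker, zlib.decompress(out)

for mode in (zlib.Z_SYNC_FLUSH, zlib.Z_FULL_FLUSH):
    print(flush_then_continue(mode))
```
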

Anyway, the state is reset to the initial state after the full flush, so that
the next block of data is independent of the block that was flushed. So
you could start writing right after the full flush, but you'd have to make
sure that the appended stream uses the same format specification as the one
previously written (see the compression level parameter of
compress/decompress), that the gzip header of the appended data is
suppressed, and that the FINISH compression block correctly reflects the
data that was appended as well (because you basically overwrite the finish
block of the first compress).
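
The independence of the data after a full flush can be checked with a rough sketch like the following, assuming Python 3. It uses a raw deflate stream (wbits=-15) so that no header gets in the way. The helper name tail_decodes_alone is made up for the illustration. After a full flush the second piece decodes on its own. After a mere sync flush it typically does not, because the compressor may still refer back to the earlier data.

```python
import zlib

def tail_decodes_alone(mode):
    """Compress two identical chunks, flushing with `mode` after each, and
    report whether the second piece can be decoded without the first."""
    chunk = b"hahahahahaha" * 20
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no header
    first = comp.compress(chunk) + comp.flush(mode)  # deliberately unused
    second = comp.compress(chunk) + comp.flush(mode)
    try:
        return zlib.decompressobj(-15).decompress(second) == chunk
    except zlib.error:
        return False

print(tail_decodes_alone(zlib.Z_FULL_FLUSH))  # True
print(tail_decodes_alone(zlib.Z_SYNC_FLUSH))
```
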

Little example:

>>> import zlib
>>> x = zlib.compressobj(6)
>>> x
>>> a = x.compress("hahahahahaha"*20)
>>> a += x.flush(zlib.Z_FULL_FLUSH)
>>> a
'x\x9c\xcaH\xcc\x18Q\x10\x00\x00\x00\xff\xff'
>>> b = x.flush(zlib.Z_FINISH)
>>> b
'\x03\x00^\x84^9'
>>> x = zlib.compressobj(6)  # New compression object with same compression.
>>> c = x.compress("hahahahahaha"*20)
>>> c += x.flush(zlib.Z_FULL_FLUSH)
>>> c
'x\x9c\xcaH\xcc\x18Q\x10\x00\x00\x00\xff\xff'
>>> d = x.flush(zlib.Z_FINISH)
>>> d
'\x03\x00^\x84^9'
>>> e = a+c[2:]  # Strip header of second block.
>>> x = zlib.decompressobj()
>>> f = x.decompress(e)
>>> len(f)  # Two times 240 = 480.
480
>>> f
'haha...' # Rest stripped for clarity.
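
For readers on Python 3, the same experiment can be written as a short script. This is only a sketch of the session above with bytes instead of str. The helper name compress_with_full_flush is hypothetical.

```python
import zlib

def compress_with_full_flush(data, level=6):
    """Return (body, trailer): the stream up to a full flush, and the
    bytes that Z_FINISH writes afterwards."""
    comp = zlib.compressobj(level)
    body = comp.compress(data) + comp.flush(zlib.Z_FULL_FLUSH)
    trailer = comp.flush(zlib.Z_FINISH)
    return body, trailer

payload = b"hahahahahaha" * 20
a, b = compress_with_full_flush(payload)
c, d = compress_with_full_flush(payload)

e = a + c[2:]  # strip the 2-byte zlib header of the second stream
f = zlib.decompressobj().decompress(e)
print(len(f))  # 480
```
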

So, as far as this goes, it works. But:

>>> x = zlib.decompressobj()
>>> e = a+c[2:]+d
>>> f = x.decompress(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -3 while decompressing: incorrect data check

You see here that if you append the end-of-stream marker of the second
block (which is written by x.flush(zlib.Z_FINISH)), the data check fails:
the checksum is always computed over the entire data of a stream, and the
one in d only covers the 240 bytes of the second block, not all 480 bytes.
Leaving out the end-of-stream marker, on the other hand, doesn't cause
decompression to fail.
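
One way around the checksum problem is to write a new end-of-stream marker whose checksum covers all of the data. The following is a sketch under stated assumptions, not a tested recipe for arbitrary streams: it assumes Python 3, a zlib stream whose trailer is a 4-byte big-endian Adler-32, and that both parts were written with a full flush before Z_FINISH. The helper name full_flushed and the sample data are made up for the illustration.

```python
import struct
import zlib

def full_flushed(data, level=6):
    comp = zlib.compressobj(level)
    body = comp.compress(data) + comp.flush(zlib.Z_FULL_FLUSH)
    tail = comp.flush(zlib.Z_FINISH)  # final empty block + 4-byte Adler-32
    return body, tail

first = b"hahahahahaha" * 20
second = b"hohohohohoho" * 20

a, _ = full_flushed(first)
c, d = full_flushed(second)

final_block = d[:-4]  # keep the final block, drop the old checksum
# Running Adler-32 over both parts, continued from the first part's value.
checksum = zlib.adler32(second, zlib.adler32(first))

joined = a + c[2:] + final_block + struct.pack(">I", checksum)
print(zlib.decompress(joined) == first + second)  # True
```

For a real append, the running checksum of the existing data would have to be known or recomputed, which in general means reading the old data once.
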

I know too little about the internal format of a gzip file (which adds more
header data, but otherwise is just a zlib compressed stream) to tell whether
an approach such as this one would also work on gzip files, but I presume it
should.
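
A rough way to test that presumption, assuming Python 3 and a zlib build with gzip support: passing wbits=31 to zlib.compressobj asks for a gzip container instead of a zlib one. The sketch assumes the header written this way is the plain 10-byte gzip header (no file name or extra fields) and that the trailer is a CRC-32 followed by the uncompressed length, both 4 bytes little-endian. The helper name gzip_full_flushed is hypothetical. Files written by other tools may carry longer headers, so the fixed offset below is an assumption.

```python
import gzip
import struct
import zlib

def gzip_full_flushed(data, level=6):
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # 16 + 15: gzip container
    body = comp.compress(data) + comp.flush(zlib.Z_FULL_FLUSH)
    tail = comp.flush(zlib.Z_FINISH)  # final block + CRC-32 + length
    return body, tail

first = b"hahahahahaha" * 20
second = b"hohohohohoho" * 20

a, _ = gzip_full_flushed(first)
c, d = gzip_full_flushed(second)

HEADER_LEN = 10  # assumed: minimal gzip header as written by zlib
total = first + second
trailer = struct.pack("<II", zlib.crc32(total), len(total) & 0xFFFFFFFF)

joined = a + c[HEADER_LEN:] + d[:-8] + trailer
print(gzip.decompress(joined) == total)  # True
```
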
Hope this little explanation helps!
Heiko.