Zlib: correct checksum but error decompressing

Andre

I have been trying to solve this issue for a while now. I receive
compressed data over a TCP connection. I know the correct checksum
for the data, and both the client and the server generate the same
checksum. However, when I try to decompress the data in Python I
get the exception: "Error -5 while decompressing data"! I would assume
that if the string in Python produces the correct checksum,
then the decompress function should also work on the same string, but
that's clearly not the case.

import zlib
from array import array

# convert data to a byte array (raw_data is what came off the socket)
data = array('b', raw_data)
# print checksum for visual inspection
print zlib.crc32(data.tostring())
# try to decompress, but fails!
text = zlib.decompress(data.tostring())

Does anyone know what's going on?
 
InvisibleRoads Patrol

Hi Andre,

Hmm. Can you decompress the string on the server before it is sent?
Maybe the zipfile or gzip module will work.
Reference:
http://bytes.com/topic/python/answers/42131-zlib-decompress-cannot-gunzip-can
from cStringIO import StringIO
from gzip import GzipFile
body = GzipFile('', 'r', 0, StringIO(raw_data)).read()

You might want to try experimenting with the wbits parameter of
zlib.decompress()
Reference:
http://mail.python.org/pipermail/python-list/2008-December/691694.html
zlib.decompress(data, -15)
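
For anyone reading this on Python 3, here is a minimal sketch of what the wbits argument changes. The payload and variable names are made up for illustration, and it assumes the sender produced a raw deflate stream with no zlib header, which may or may not be what Andre's server does.

```python
import gzip
import zlib

payload = b"example string " * 20  # made-up sample data

# Build a raw deflate stream: negative wbits means no header and no trailer.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
raw_deflate = compressor.compress(payload) + compressor.flush()

# The default wbits expects a zlib header, so a raw stream raises zlib.error.
try:
    zlib.decompress(raw_deflate)
    default_failed = False
except zlib.error:
    default_failed = True

# wbits=-15 tells zlib to expect a raw deflate stream.
restored = zlib.decompress(raw_deflate, -15)

# wbits=47 (32 + 15) auto-detects a zlib or a gzip header.
from_zlib = zlib.decompress(zlib.compress(payload), 47)
from_gzip = zlib.decompress(gzip.compress(payload), 47)
```

Note that 47 does not cover the raw deflate case, so a headerless stream still needs -15.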

The zlib module seems to work fine with both strings and byte arrays.
import array, zlib
dataAsString = zlib.compress('example string')
dataAsArray = array.array('b', dataAsString)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray.tostring())
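
A rough Python 3 version of the same comparison, in case it helps later readers. The variable names are mine; the only real differences are that the input must be bytes and that tostring() is called tobytes().

```python
import array
import zlib

data_as_bytes = zlib.compress(b"example string")
data_as_array = array.array("b", data_as_bytes)

# zlib.decompress accepts any bytes-like object, including an array.
from_bytes = zlib.decompress(data_as_bytes)
from_array = zlib.decompress(data_as_array)
from_tobytes = zlib.decompress(data_as_array.tobytes())

all_equal = from_bytes == from_array == from_tobytes
```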
 
Paul Rubin

Andre said:
I have been trying to solve this issue for a while now. I receive data
from a TCP connection which is compressed.

Are you sure it is compressed with zlib? If yes, does it include the
standard zlib header? Some applications save a few bytes by stripping
the header. See the zlib doc page for how to deal with that: there is
a flag that causes the header check to be skipped on decompression if
you pass a negative number. That's the first thing I would try.
 
John Machin

Paul Rubin said:
Are you sure it is compressed with zlib? If yes, does it include the
standard zlib header? Some applications save a few bytes by stripping
the header. See the zlib doc page for how to deal with that: there is
a flag that causes the header check to be skipped on decompression if
you pass a negative number. That's the first thing I would try.

Short answer:

Try this:
zlib.decompress(incoming_data, -15)
If that doesn't work:
print repr(incoming_data[:30])
# post the results here

Longer answer:

A zlib stream consists of a deflate stream preceded by
a 2-byte header and followed by a 4-byte Adler32
checksum of the original data.
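
The layout described above can be checked by hand. This is only a sketch in Python 3 syntax with made-up sample data; it assumes a stream produced by zlib.compress() with no preset dictionary, so the header is exactly 2 bytes.

```python
import struct
import zlib

original = b"example string " * 20  # made-up sample data
stream = zlib.compress(original)

header = stream[:2]    # CMF and FLG bytes
body = stream[2:-4]    # the deflate stream itself
trailer = stream[-4:]  # Adler-32, stored big-endian

# A valid header, read as a 16-bit big-endian number, is a multiple of 31.
header_ok = (header[0] * 256 + header[1]) % 31 == 0

# The trailer covers the uncompressed data, not the compressed bytes.
(stored_adler,) = struct.unpack(">I", trailer)
adler_ok = stored_adler == zlib.adler32(original)

# The middle part decompresses on its own as a raw deflate stream.
body_ok = zlib.decompress(body, -15) == original
```

By contrast, the crc32 in the original post is computed over the compressed bytes as received. A match there shows the bytes arrived unchanged, but says nothing about whether they form a complete zlib stream.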

The problem occurs not out of a desire to save 6 bytes
but through compounding of 2 mistakes:

Mistake (1) is in the HTTP protocol.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
The "deflate" content coding should have been called "zlib".
Read this and weep:
"""deflate The "zlib" format defined in RFC 1950 [31] in
combination with the "deflate" compression mechanism
described in RFC 1951 [29]."""

Mistake (2) happens when software implementers read only
the first word of the above quote and provide only a
deflate stream.

A reader can handle both possibilities by checking for a
(usual, default) zlib header:

data[0] == '\x78' and (ord(data[1]) + 0x7800) % 31 == 0
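
The check above could be wrapped up roughly as follows in Python 3, where indexing bytes already gives integers. The function names has_zlib_header and inflate are hypothetical, and the check is a heuristic: a raw deflate stream could in principle start with bytes that look like a zlib header.

```python
import zlib


def has_zlib_header(data: bytes) -> bool:
    """Look for the usual zlib header: first byte 0x78 and a valid check value."""
    return (
        len(data) >= 2
        and data[0] == 0x78
        and (data[0] * 256 + data[1]) % 31 == 0
    )


def inflate(data: bytes) -> bytes:
    """Decompress a zlib stream, or a raw deflate stream if the header is missing."""
    wbits = zlib.MAX_WBITS if has_zlib_header(data) else -zlib.MAX_WBITS
    return zlib.decompress(data, wbits)
```

With this, inflate(zlib.compress(b"abc")) and inflate() on the same data compressed as raw deflate should both give back b"abc".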

HTH,
John
 
