Deflater/Inflater and dictionary for Huffman

NOBODY · Oct 17, 2003

I want to compress/decompress small data, like strings of 1k or less.
As far as I know, Huffman compression with a static dictionary is the best
solution for this.

It seems I could use java.util.zip.Deflater with the HUFFMAN_ONLY strategy
and maybe prepare a dictionary.

Can anyone explain how to use this? I couldn't find any docs after 2
hours on Google and the groups! The Javadoc is useless.

Thanks!

Roedy Green · Oct 17, 2003

It seems I could use java.util.zip.Deflater with the HUFFMAN_ONLY strategy
and maybe prepare a dictionary.

For sample code see http://mindprod.com/fileio.html

I don't know if you can force Java's GZIP to use only Huffman though.
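
On the Huffman-only question: Deflater (the class, rather than the GZIP streams) has a setStrategy() method that accepts Deflater.HUFFMAN_ONLY, and a setDictionary() method for a preset dictionary. Below is a minimal sketch of both, not a definitive implementation. The class name SmallDataCodec and its method names are made up for illustration. Note that the preset dictionary is a byte array of likely substrings, not a Huffman code table; it helps the string-matching stage, which HUFFMAN_ONLY switches off, so the sketch keeps the two options in separate methods.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper; class and method names are illustrative only.
final class SmallDataCodec {

    // Huffman coding only: no string matching is attempted.
    static byte[] deflateHuffmanOnly(byte[] data) {
        Deflater def = new Deflater();
        def.setStrategy(Deflater.HUFFMAN_ONLY);
        return run(def, data);
    }

    // Default strategy, primed with a preset dictionary of likely substrings.
    static byte[] deflateWithDictionary(byte[] data, byte[] dictionary) {
        Deflater def = new Deflater(Deflater.BEST_COMPRESSION);
        def.setDictionary(dictionary);
        return run(def, data);
    }

    // Feeds all input at once and drains the Deflater until it is finished.
    private static byte[] run(Deflater def, byte[] data) {
        try {
            def.setInput(data);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!def.finished()) {
                int n = def.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            def.end(); // release native resources
        }
    }

    // Pass the same dictionary that was used for compression, or null if none.
    static byte[] inflate(byte[] compressed, byte[] dictionary) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                out.write(buf, 0, n);
                if (n == 0 && !inf.finished()) {
                    if (inf.needsDictionary()) {
                        if (dictionary == null) {
                            throw new IllegalStateException("stream needs a dictionary");
                        }
                        inf.setDictionary(dictionary);
                    } else if (inf.needsInput()) {
                        throw new IllegalStateException("compressed data is truncated");
                    }
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }
}
```

Both sides must agree on the dictionary out of band; the compressed stream only carries a checksum of it. Whether either option actually beats plain default compression for strings under 1k is something to measure on real data.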

Thomas Weidenfeller · Oct 17, 2003

NOBODY said:
It seems I could use java.util.zip.Deflater with the HUFFMAN_ONLY strategy
and maybe prepare a dictionary.

Can anyone explain how to use this? I couldn't find any docs after 2
hours on Google and the groups! The Javadoc is useless.

Yep, it is knowledge that is passed from father to son with a secret
handshake. I once tried to interest some Java journals in publishing an
article about "all" the magic of the Zip and Jar classes, but no one was
interested.

First of all, consider using a DeflaterOutputStream instead of a raw
Deflater. It handles all the ugly details of feeding data to the
compression algorithm and writing the results. If you want to have the
result in memory, provide a ByteArrayOutputStream to the
DeflaterOutputStream's constructor.
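
A minimal sketch of the in-memory approach just described, assuming Java 7 or later (for try-with-resources and StandardCharsets). The class name StreamCodec and its methods are hypothetical, and the UTF-8 encoding is an assumption about the strings being compressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper; class and method names are illustrative only.
final class StreamCodec {

    // Compresses a string into a byte[] held entirely in memory.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the stream finishes the compression and flushes into sink.
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // we supplied the Deflater, so we release it
        }
        return sink.toByteArray();
    }

    // Reverses compress().
    static String decompress(byte[] compressed) {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
            return new String(result.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be along the lines of StreamCodec.decompress(StreamCodec.compress("some short string")), which should give back the original string.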

If you need to do it on your own, I suggest studying
DeflaterOutputStream's source code (the source comes with the J2SDK in a
file called src.zip or src.jar).

Please note that DeflaterOutputStream uses a Deflater differently than
the Deflater documentation suggests. The documentation suggests
checking needsInput() if deflate() returns 0. DeflaterOutputStream
ignores the return value and just loops until needsInput() returns
true. It can do this because it ensures its output buffer has a size
of at least 1, and it immediately writes out any result and reuses
the buffer.
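
The needsInput()-driven loop described above can be sketched roughly as follows. This is written from the description in this post, not copied from the JDK source, and the class name NeedsInputLoop is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical demo class; modeled on the description above, not on JDK code.
final class NeedsInputLoop {

    static byte[] compress(byte[] input) {
        Deflater def = new Deflater();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buf = new byte[512]; // non-empty output buffer, reused every call
        try {
            def.setInput(input, 0, input.length);
            // Write phase: the return value is not used as a signal,
            // the loop simply runs until all input has been read.
            while (!def.needsInput()) {
                int n = def.deflate(buf, 0, buf.length);
                sink.write(buf, 0, n); // n may be 0, which writes nothing
            }
            // Finish phase: flush whatever is still buffered internally.
            def.finish();
            while (!def.finished()) {
                int n = def.deflate(buf, 0, buf.length);
                sink.write(buf, 0, n);
            }
        } finally {
            def.end();
        }
        return sink.toByteArray();
    }
}
```

The loop is safe only because buf is never empty and is drained after every call, which is the condition the post points out.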

To use a Deflater, you need two sets of "pointers" (this is where the
underlying C library shines through). One "pointer" points to the part
of the data that still needs to be processed, the other to some
storage location to which compressed data should be written.

Of course, you don't have pointers in Java, so what the API expects is
a pair of a byte[] and an integer. The byte[] holds the data, the
integer serves as a pointer into the byte[]. In addition, you need
to know how much data remains in the byte[], and that's another
integer. So for both the input data and the output data you have a
triplet of

    byte[] b;   // memory holding the data
    int off;    // position "pointer" into b
    int len;    // For input: remaining input data in this buffer
                // For output: remaining free memory in this buffer
                // In both cases: off + len <= b.length

You provide such a triplet to setInput() to tell the Deflater where to
get the data from. And you provide another such triplet to deflate() to
tell the Deflater where to place the output.
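
To make the two triplets concrete, here is a small sketch that compresses a slice of one array into a slice of another. TripletDemo and its method names are hypothetical, and the sketch assumes the caller provides a destination that is large enough (it throws otherwise).

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; names are illustrative only.
final class TripletDemo {

    // Compresses src[srcOff, srcOff + srcLen) into dst, starting at dstOff.
    // Returns the number of compressed bytes written to dst.
    static int deflateSlice(byte[] src, int srcOff, int srcLen, byte[] dst, int dstOff) {
        Deflater def = new Deflater();
        try {
            def.setInput(src, srcOff, srcLen); // input triplet
            def.finish();                      // no further input will follow
            int off = dstOff;                  // output "pointer"
            int len = dst.length - dstOff;     // free space, off + len == dst.length
            while (!def.finished()) {
                if (len == 0) {
                    throw new IllegalArgumentException("dst is too small");
                }
                int n = def.deflate(dst, off, len); // output triplet
                off += n;                           // advance the "pointer"
                len -= n;                           // shrink the free space
            }
            return off - dstOff;
        } finally {
            def.end();
        }
    }

    // Inflates src[srcOff, srcOff + srcLen), producing at most maxOut bytes.
    static byte[] inflateSlice(byte[] src, int srcOff, int srcLen, int maxOut) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(src, srcOff, srcLen);
            byte[] out = new byte[maxOut];
            int filled = 0;
            while (!inf.finished() && filled < maxOut) {
                int n = inf.inflate(out, filled, maxOut - filled);
                if (n == 0 && !inf.finished()
                        && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalStateException("incomplete compressed data");
                }
                filled += n;
            }
            return Arrays.copyOf(out, filled);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }
}
```

Inflater takes the same kind of triplets, which is why inflateSlice() mirrors deflateSlice() so closely.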

Now the really tricky part starts. You call deflate() and then you have
to react according to the return value of deflate():

deflate() == 0 && needsInput():

    All data provided via setInput() has been read (but maybe not
    completely processed). Either provide more data by calling
    setInput() again, or finish the compression.

    Finishing the compression is tricky, too:
    (a) Call finish()
    (b) Flush all data still in internal Deflater buffers. You do this
        by calling deflate() in a loop while finished() returns false.
        Check the return value of deflate(), because you still
        might need to provide more output memory.

deflate() == 0 && !needsInput():

    This is undocumented, but important. The Deflater needs more
    output storage. Save the already compressed data in the
    output byte[], and call deflate() again with more memory.

deflate() > 0:

    Some output data is available. The data is in the byte[] provided
    to deflate(), starts at off as provided to deflate() and has a
    length as returned by deflate().

    Do whatever you want with the data, then adjust off and len if
    necessary, and call deflate() again.
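
Putting the three cases together, a raw Deflater loop might look like the sketch below. It feeds the input in small chunks and uses a deliberately small output buffer so that every branch gets exercised. The class name RawDeflaterLoop and the chunkSize and outSize parameters are hypothetical, and this is one possible arrangement of the steps above rather than the only correct one.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical demo class; names and parameters are illustrative only.
final class RawDeflaterLoop {

    static byte[] compress(byte[] input, int chunkSize, int outSize) {
        if (chunkSize < 1 || outSize < 1) {
            throw new IllegalArgumentException("chunkSize and outSize must be >= 1");
        }
        Deflater def = new Deflater();
        ByteArrayOutputStream saved = new ByteArrayOutputStream();
        byte[] out = new byte[outSize];
        int inOff = 0;             // input "pointer"
        boolean finishing = false; // true once finish() has been called
        try {
            while (!def.finished()) {
                int n = def.deflate(out, 0, out.length);
                if (n > 0) {
                    // deflate() > 0: save the output, then reuse the buffer.
                    saved.write(out, 0, n);
                } else if (def.needsInput() && !finishing) {
                    // deflate() == 0 && needsInput(): feed more data or finish.
                    if (inOff < input.length) {
                        int len = Math.min(chunkSize, input.length - inOff);
                        def.setInput(input, inOff, len);
                        inOff += len;
                    } else {
                        def.finish();
                        finishing = true;
                    }
                }
                // deflate() == 0 && !needsInput(): more output space is wanted.
                // The buffer is drained after every call, so just call again.
            }
        } finally {
            def.end();
        }
        return saved.toByteArray();
    }
}
```

For example, RawDeflaterLoop.compress(data, 7, 3) pushes the data through 7 bytes at a time with a 3-byte output buffer; in real code both sizes would be much larger.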

HTH

/Thomas
