Quick compression

kk

Hello everybody!
I want to build a distributed application, and I want to compress the
packets that are sent between the system's nodes.

Is there a class that I could use that provides quick compression/
decompression?
 
Arne Vajhøj

kk said:
I want to build a distributed application, and I want to compress the
packets that are sent between the system's nodes.

Is there a class that I could use that provides quick compression/
decompression?

One obvious solution is the classes in java.util.zip.
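
As a minimal sketch of what using java.util.zip for in-memory packets could look like: Deflater and Inflater are the real JDK classes, but the QuickZip class and its method names are made up for illustration. Deflater.BEST_SPEED trades compression ratio for speed, which may suit the "quick" requirement; that choice is an assumption, not something measured here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper; the class and method names are illustrative only.
class QuickZip {

    // Compresses a byte array at the given level (e.g. Deflater.BEST_SPEED).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverses compress(); rejects truncated or corrupt input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unexpected input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("node-status: ok; load=0.42; peers=17\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original, Deflater.BEST_SPEED);
        System.out.println(original.length + " bytes -> " + packed.length + " bytes");
        System.out.println("round trip ok: "
                + java.util.Arrays.equals(original, decompress(packed)));
    }
}
```

Whether BEST_SPEED is fast enough, or whether the default level is a better trade-off, is something to measure on real packets.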

Arne
 
rossum

kk said:
Hello everybody!
I want to build a distributed application, and I want to compress the
packets that are sent between the system's nodes.

Is there a class that I could use that provides quick compression/
decompression?

How big are the packets? How many possible different packets are
there?

For large packets, Zip might well be useful, as Arne said. For a
reasonably small number of possible different packets, a code book
would suffice:

1 -> armadillo
2 -> aardvark
3 -> pangolin
etc.

You could also use a code book if a high percentage of the packets are
taken from a small set of all possible packets. In this case you need
to be sure that no allowed code can be mistaken for one of the
uncommon packets.
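
A minimal sketch of the code-book idea, under stated assumptions: the CodeBook class, the one-byte code width, and the use of byte 0 as an escape marker for uncommon packets are all illustrative choices, not something specified in this thread. Reserving 0 as the escape is one way to make sure a coded packet cannot be mistaken for a literal one.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical code book: common packets become one byte, others are sent
// as an escape byte followed by the raw UTF-8 text.
class CodeBook {
    private static final byte ESCAPE = 0; // assumed marker for a literal packet
    private final List<String> entries;
    private final Map<String, Integer> codes = new HashMap<>();

    CodeBook(List<String> entries) {
        if (entries.size() > 255) {
            throw new IllegalArgumentException("at most 255 entries fit in one byte");
        }
        this.entries = entries;
        for (int i = 0; i < entries.size(); i++) {
            codes.put(entries.get(i), i + 1); // codes start at 1; 0 is the escape
        }
    }

    byte[] encode(String message) {
        Integer code = codes.get(message);
        if (code != null) {
            return new byte[] { (byte) code.intValue() };
        }
        byte[] raw = message.getBytes(StandardCharsets.UTF_8);
        byte[] packet = new byte[raw.length + 1];
        packet[0] = ESCAPE;
        System.arraycopy(raw, 0, packet, 1, raw.length);
        return packet;
    }

    // Assumes a non-empty packet produced by encode() with the same code book.
    String decode(byte[] packet) {
        if (packet[0] == ESCAPE) {
            return new String(packet, 1, packet.length - 1, StandardCharsets.UTF_8);
        }
        return entries.get((packet[0] & 0xFF) - 1);
    }
}
```

With the code book from the list above, `encode("aardvark")` yields the single byte 2, while a message that is not in the book costs one extra byte over its plain UTF-8 form. Both nodes must hold the same code book in the same order.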

For a lot of small packets, you have a difficult problem.

rossum
 
kk

Thank you all for your answers!
I use UDP packets (<1 KB) (in a similar way to what rossum described) to
regulate the way that the nodes interact, but at certain times I need
to exchange some data between two nodes. The amount of data that I may
need to transfer could be from 1 KB to 1 MB (pure text), and I was
wondering whether there was a quicker way (than zip) for the
compression/decompression.
A friend of mine recommended jzlib (http://www.jcraft.com/jzlib/) but
I will look first at what Roedy suggested.

Thanks!
 
Arne Vajhøj

kk said:
A friend of mine recommended jzlib (http://www.jcraft.com/jzlib/) but

If you read the page, you can see that it basically provides
the same functionality as java.util.zip, just with some more
flexibility.

Unless you are absolutely sure that you need that extra flexibility,
you should stick with the standard.

Arne
 
Roedy Green

kk said:
I use UDP packets (<1 KB) (in a similar way to what rossum described) to
regulate the way that the nodes interact, but at certain times I need
to exchange some data between two nodes. The amount of data that I may
need to transfer could be from 1 KB to 1 MB (pure text), and I was
wondering whether there was a quicker way (than zip) for the
compression/decompression.

Compression can't get much traction unless it has a fair-sized chunk to
work on. There is not as much repetition to find within a tiny
packet.

You might want to look into some type of dictionary compression where
you convert tokens to ints and DON'T transmit the meaning of the
tokens with each packet, but rather pre-transmit an entire dictionary
of what they mean.

In the simplest case, if your packets consisted of words separated by
a single space, create a dictionary of all the words you ever use,
sorted by frequency. Then assign numbers. The lowest numbers can get
8-bit codes, the next lowest 16-bit, the next lowest 24-bit.

Then encode your message as a string of ints. Each word includes the
space.
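
A minimal sketch of such a word dictionary, with several assumptions that are not part of the description above: the WordCodec class is hypothetical, and so is the way the decoder tells the code lengths apart (the top bits of the first byte select a 1-, 2-, or 3-byte code, which leaves 7, 14, and 22 bits for the word number). Both nodes are assumed to already share the same frequency-sorted word list, and the single space between words is implied rather than stored.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical codec: frequent words get short codes, rare words longer ones.
class WordCodec {
    private static final int ONE_BYTE = 1 << 7;                 // indexes 0..127
    private static final int TWO_BYTE = ONE_BYTE + (1 << 14);   // next 16384 indexes
    private static final int THREE_BYTE = TWO_BYTE + (1 << 22); // remaining indexes

    private final List<String> words;
    private final Map<String, Integer> index = new HashMap<>();

    // wordsByFrequency: most frequent word first, shared by sender and receiver.
    WordCodec(List<String> wordsByFrequency) {
        if (wordsByFrequency.size() > THREE_BYTE) {
            throw new IllegalArgumentException("dictionary too large");
        }
        this.words = wordsByFrequency;
        for (int i = 0; i < words.size(); i++) {
            index.put(words.get(i), i);
        }
    }

    byte[] encode(String message) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (message.isEmpty()) {
            return out.toByteArray();
        }
        for (String word : message.split(" ")) {
            Integer i = index.get(word);
            if (i == null) {
                throw new IllegalArgumentException("not in dictionary: " + word);
            }
            if (i < ONE_BYTE) {
                out.write(i);                          // 0xxxxxxx
            } else if (i < TWO_BYTE) {
                int v = i - ONE_BYTE;
                out.write(0x80 | (v >> 8));            // 10xxxxxx + 1 byte
                out.write(v & 0xFF);
            } else {
                int v = i - TWO_BYTE;
                out.write(0xC0 | (v >> 16));           // 11xxxxxx + 2 bytes
                out.write((v >> 8) & 0xFF);
                out.write(v & 0xFF);
            }
        }
        return out.toByteArray();
    }

    String decode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        while (pos < data.length) {
            int b = data[pos++] & 0xFF;
            int i;
            if (b < 0x80) {
                i = b;
            } else if (b < 0xC0) {
                i = ONE_BYTE + (((b & 0x3F) << 8) | (data[pos++] & 0xFF));
            } else {
                i = TWO_BYTE + (((b & 0x3F) << 16)
                        | ((data[pos++] & 0xFF) << 8) | (data[pos++] & 0xFF));
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(words.get(i));
        }
        return sb.toString();
    }
}
```

Words missing from the dictionary are rejected here; a real protocol would need an escape mechanism or a way to extend the dictionary on both nodes.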

You can then apply GZIP compression on top of that.

see http://mindprod.com/project/supercompressor.html
 
Arne Vajhøj

Roedy said:
Compression can't get much traction unless it has a fair-sized chunk to
work on. There is not as much repetition to find within a tiny
packet.

1 KB is enough to give reasonable compression with ZIP in most cases.

Roedy said:
You might want to look into some type of dictionary compression where
you convert tokens to ints and DON'T transmit the meaning of the
tokens with each packet, but rather pre-transmit an entire dictionary
of what they mean.

In the simplest case, if your packets consisted of words separated by
a single space, create a dictionary of all the words you ever use,
sorted by frequency. Then assign numbers. The lowest numbers can get
8-bit codes, the next lowest 16-bit, the next lowest 24-bit.

I seriously doubt that would give better compression than ZIP in
general cases like English and source code.

Roedy said:
You can then apply GZIP compression on top of that.

It is very rarely a good idea to double compress.

Or to put it another way: one definition of a good compression
algorithm is that compressing its output with another algorithm
will not shrink it any further.
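
A rough way to check the double-compression claim is to deflate some text and then deflate the result again. This is only a sketch: the DoubleCompress class, the made-up vocabulary, and the fixed random seed are assumptions chosen to produce repeatable sample text, and real traffic may behave differently.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical experiment: compare one pass of Deflater with two passes.
class DoubleCompress {

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Builds repeatable pseudo-random "text" from a small made-up vocabulary.
    static byte[] sampleText(int wordCount) {
        String[] vocab = {"node", "packet", "send", "receive", "timeout", "retry",
                "ack", "peer", "route", "queue", "drop", "join", "leave", "sync",
                "state", "data"};
        Random random = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            sb.append(vocab[random.nextInt(vocab.length)]).append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] text = sampleText(4000);
        byte[] once = deflate(text);
        byte[] twice = deflate(once);
        System.out.println("original:         " + text.length + " bytes");
        System.out.println("compressed once:  " + once.length + " bytes");
        System.out.println("compressed twice: " + twice.length + " bytes");
    }
}
```

On input like this the second pass is expected to gain little or nothing, since the first pass has already removed most of the redundancy.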

#Ordinary ZIP (Lempel, Ziv Welch) compression

ZIP is LZ77

LZW is LZ78

ZIP is not LZW

Arne
 
Roedy Green

Arne said:
I seriously doubt that would give better compression than ZIP in
general cases like English and source code.

Obviously you can do better. Zip has to pack the dictionary with each
packet. I did some experiments on compacting HTML this way some
years ago. It does not buy you much for big documents, but it would for
little chunks.
 
Christian

Roedy said:
Obviously you can do better. Zip has to pack the dictionary with each
packet. I did some experiments on compacting HTML this way some
years ago. It does not buy you much for big documents, but it would for
little chunks.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Why should it have to pass the dictionary?
The dictionary is built up on the fly while decompressing.

If you don't use such a dictionary method, you would have to create a
table with the probabilities of characters and words (as in Huffman
coding) and transmit that, or whatever else the compression scheme
needs. Otherwise you could not make use of letters or words occurring
more often than others.

Christian
 
Arne Vajhøj

Roedy said:
Obviously you can do better. Zip has to pack the dictionary with each
packet.

No. That is not how ZIP works. The dictionary is the data already
read/written.
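
For small packets, java.util.zip also lets both sides seed that "data already read/written" with a preset dictionary, via Deflater.setDictionary and Inflater.setDictionary, so nothing extra travels with the packet apart from a short dictionary checksum. The following is a minimal sketch: the PresetDictionary class and the dictionary contents are made up, and both nodes are assumed to hold identical dictionary bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper showing a shared preset dictionary for small packets.
class PresetDictionary {

    // dictionary may be null to compress without a preset dictionary.
    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater();
        if (dictionary != null) {
            deflater.setDictionary(dictionary); // must be set before deflating
        }
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed, byte[] dictionary) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0) {
                    if (inflater.needsDictionary()) {
                        // The stream says it was compressed with a preset dictionary.
                        if (dictionary == null) {
                            throw new IllegalArgumentException("dictionary required");
                        }
                        inflater.setDictionary(dictionary);
                    } else if (inflater.needsInput()) {
                        throw new IllegalArgumentException("truncated input");
                    }
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] dict = "status=ok status=fail peer= load= timeout= retry="
                .getBytes(StandardCharsets.UTF_8);
        byte[] packet = "status=ok peer=17 load=0.42 timeout=30 retry=3"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("plain:           " + packet.length + " bytes");
        System.out.println("deflate:         " + compress(packet, null).length + " bytes");
        System.out.println("deflate + dict:  " + compress(packet, dict).length + " bytes");
    }
}
```

The benefit depends on how well the dictionary matches real packets, and it fades as packets grow, because the compressor then finds enough repetition in the data itself.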

Arne
 
