Quick compression

Discussion in 'Java' started by kk, Feb 9, 2008.

  1. kk

    kk Guest

    Hello everybody!
    I want to build a distributed application, and I want to compress the
    packets that are sent between the system's nodes.

    Is there a class that I could use that provides quick compression/
    decompression?
    kk, Feb 9, 2008
    #1

  2. kk

    Arne Vajhøj Guest

    kk wrote:
    > I want to build a distributed application, and I want to compress the
    > packets that are sent between the system's nodes.
    >
    > Is there a class that I could use that provides quick compression/
    > decompression?


    One obvious solution is to use the classes in java.util.zip.
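
    As a rough illustration of the java.util.zip suggestion, here is a minimal
    sketch of a byte-array round trip with Deflater and Inflater. The class name
    QuickCompress and the choice of Deflater.BEST_SPEED are assumptions made for
    this example, not something from the thread; other levels trade speed for
    size.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, named for this example only.
class QuickCompress {

    /** Compresses data in zlib format, favouring speed over ratio. */
    public static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Reverses compress(); rejects corrupt or truncated input. */
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            text.append("node ").append(i % 7).append(" reports status ok\n");
        }
        byte[] original = text.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        System.out.println(original.length + " bytes -> " + packed.length + " bytes");
    }
}
```

    For stream-oriented code, GZIPOutputStream and GZIPInputStream from the same
    package wrap the same algorithm behind the usual stream interfaces.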

    Arne
    Arne Vajhøj, Feb 9, 2008
    #2

  3. kk

    rossum Guest

    On Sat, 9 Feb 2008 15:37:55 -0800 (PST), kk <> wrote:

    >Hello everybody!
    >I want to build a distributed application, and I want to compress the
    >packets that are sent between the system's nodes.
    >
    >Is there a class that I could use that provides quick compression/
    >decompression?

    How big are the packets? How many possible different packets are
    there?

    For large packets, Zip might well be useful, as Arne said. For a
    reasonably small number of possible different packets, a code book
    would suffice:

    1 -> armadillo
    2 -> aardvark
    3 -> pangolin
    etc.

    You could also use a code book if a high percentage of the packets are
    taken from a small set of all possible packets. In this case you need
    to be sure that none of the allowed codes can be mistaken for one of
    the uncommon packets.
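
    One way to picture the code book idea is the sketch below. The class name
    CodeBook, the one-byte code size, and the use of a leading zero byte to mark
    an uncommon packet sent in full are all assumptions made for this example;
    the marker is what keeps a code from being confused with a literal packet.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical code book: common messages travel as a single byte (1..255),
// anything else travels as a 0 marker byte followed by the raw UTF-8 text.
class CodeBook {
    private static final int LITERAL = 0; // marker: raw message follows

    private final List<String> messages;  // code n maps to messages.get(n - 1)
    private final Map<String, Integer> codes = new HashMap<>();

    public CodeBook(List<String> commonMessages) {
        if (commonMessages.size() > 255) {
            throw new IllegalArgumentException("at most 255 entries fit in one byte");
        }
        this.messages = new ArrayList<>(commonMessages);
        for (int i = 0; i < messages.size(); i++) {
            codes.put(messages.get(i), i + 1);
        }
    }

    public byte[] encode(String message) {
        Integer code = codes.get(message);
        if (code != null) {
            return new byte[] { (byte) code.intValue() };
        }
        byte[] raw = message.getBytes(StandardCharsets.UTF_8);
        byte[] packet = new byte[raw.length + 1];
        packet[0] = (byte) LITERAL;
        System.arraycopy(raw, 0, packet, 1, raw.length);
        return packet;
    }

    public String decode(byte[] packet) {
        if (packet.length == 0) {
            throw new IllegalArgumentException("empty packet");
        }
        int first = packet[0] & 0xFF;
        if (first == LITERAL) {
            return new String(packet, 1, packet.length - 1, StandardCharsets.UTF_8);
        }
        // A coded packet is exactly one byte long and must name a known entry.
        if (packet.length != 1 || first > messages.size()) {
            throw new IllegalArgumentException("unknown code: " + first);
        }
        return messages.get(first - 1);
    }
}
```

    Both nodes have to build the CodeBook from the same list in the same order,
    for example new CodeBook(Arrays.asList("armadillo", "aardvark", "pangolin")).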

    For a lot of small packets, you have a difficult problem.

    rossum
    rossum, Feb 10, 2008
    #3
  4. kk

    Roedy Green Guest

    Roedy Green, Feb 10, 2008
    #4
  5. kk

    kk Guest

    Thank you all for your answers!
    I use UDP packets (<1 KB) (in a similar way to what rossum described) to
    regulate the way that the nodes interact, but at certain times I need
    to exchange some data between two nodes. The amount of data that I may
    need to transfer could be from 1 KB to 1 MB (pure text), and I was
    wondering whether there was a quicker way (than zip) for the
    compression/decompression.
    A friend of mine recommended jzlib (http://www.jcraft.com/jzlib/) but
    I will look first at what Roedy suggested.

    Thanks!
    kk, Feb 10, 2008
    #5
  6. kk

    Arne Vajhøj Guest

    kk wrote:
    > A friend of mine recommended jzlib (http://www.jcraft.com/jzlib/) but


    If you read the page then you can see that it basically provides
    the same functionality as java.util.zip just adding some more
    flexibility.

    Unless you are absolutely sure that you need that extra flexibility
    then you should stick with the standard.

    Arne
    Arne Vajhøj, Feb 10, 2008
    #6
  7. kk

    Roedy Green Guest

    On Sun, 10 Feb 2008 08:30:36 -0800 (PST), kk <> wrote,
    quoted or indirectly quoted someone who said :

    >I use UDP packets (<1 KB) (in a similar way to what rossum described) to
    >regulate the way that the nodes interact, but at certain times I need
    >to exchange some data between two nodes. The amount of data that I may
    >need to transfer could be from 1 KB to 1 MB (pure text), and I was
    >wondering whether there was a quicker way (than zip) for the
    >compression/decompression.


    Compression can't get much traction unless it has a fair-sized chunk to
    work on. There is not as much repetition to find within a tiny
    packet.

    You might want to look into some type of dictionary compression where
    you convert tokens to ints and DON'T transmit the meaning of the
    tokens with each packet, but rather pre-transmit an entire dictionary
    of what they mean.

    In the simplest case, if your packets consisted of words separated by
    a single space, create a dictionary of all the words you ever use,
    sorted by frequency. Then assign numbers. The lowest numbers can get
    8-bit codes, the next lowest 16-bit, the next lowest 24-bit.

    Then encode your message as a string of ints. Each word includes the
    space.

    You can then apply GZIP compression on top of that.
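
    The word dictionary scheme above could be sketched roughly as follows. The
    class name WordCodec and the bit layout are assumptions for illustration: the
    top bits of the first byte say whether a code is 8, 16, or 24 bits long, which
    leaves 7, 14, and 22 bits for the word number. The single space between words
    is implied by each code rather than stored, and the sketch assumes every word
    is already in the shared dictionary.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical codec: words are numbered by frequency rank (0 = most frequent).
//   0xxxxxxx                    8-bit code, ranks 0..127
//   10xxxxxx + 1 byte          16-bit code, the next 16384 ranks
//   11xxxxxx + 2 bytes         24-bit code, the next 4194304 ranks
class WordCodec {
    private static final int ONE_BYTE_LIMIT = 1 << 7;
    private static final int TWO_BYTE_LIMIT = ONE_BYTE_LIMIT + (1 << 14);
    private static final int THREE_BYTE_LIMIT = TWO_BYTE_LIMIT + (1 << 22);

    private final List<String> words;
    private final Map<String, Integer> rankOf = new HashMap<>();

    /** The list must be identical, and in the same order, on both nodes. */
    public WordCodec(List<String> wordsByFrequency) {
        if (wordsByFrequency.size() > THREE_BYTE_LIMIT) {
            throw new IllegalArgumentException("dictionary too large");
        }
        this.words = new ArrayList<>(wordsByFrequency);
        for (int i = 0; i < words.size(); i++) {
            rankOf.put(words.get(i), i);
        }
    }

    /** Encodes words separated by single spaces; unknown words are rejected. */
    public byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (text.isEmpty()) {
            return out.toByteArray();
        }
        for (String word : text.split(" ")) {
            Integer rank = rankOf.get(word);
            if (rank == null) {
                throw new IllegalArgumentException("word not in dictionary: " + word);
            }
            if (rank < ONE_BYTE_LIMIT) {
                out.write(rank);
            } else if (rank < TWO_BYTE_LIMIT) {
                int v = rank - ONE_BYTE_LIMIT;
                out.write(0x80 | (v >> 8));
                out.write(v & 0xFF);
            } else {
                int v = rank - TWO_BYTE_LIMIT;
                out.write(0xC0 | (v >> 16));
                out.write((v >> 8) & 0xFF);
                out.write(v & 0xFF);
            }
        }
        return out.toByteArray();
    }

    /** Decodes well-formed input; truncated input is not handled gracefully. */
    public String decode(byte[] data) {
        StringBuilder text = new StringBuilder();
        int pos = 0;
        while (pos < data.length) {
            int first = data[pos++] & 0xFF;
            int rank;
            if ((first & 0x80) == 0) {
                rank = first;
            } else if ((first & 0x40) == 0) {
                rank = ONE_BYTE_LIMIT + (((first & 0x3F) << 8) | (data[pos++] & 0xFF));
            } else {
                rank = TWO_BYTE_LIMIT
                        + (((first & 0x3F) << 16) | ((data[pos++] & 0xFF) << 8) | (data[pos++] & 0xFF));
            }
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(words.get(rank));
        }
        return text.toString();
    }
}
```

    A real implementation would also need an escape for words missing from the
    dictionary and a way to keep the dictionary in sync between nodes.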

    see http://mindprod.com/project/supercompressor.html
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Feb 11, 2008
    #7
  8. kk

    Arne Vajhøj Guest

    Roedy Green wrote:
    > On Sun, 10 Feb 2008 08:30:36 -0800 (PST), kk <> wrote,
    > quoted or indirectly quoted someone who said :
    >> I use UDP packets (<1 KB) (in a similar way to what rossum described) to
    >> regulate the way that the nodes interact, but at certain times I need
    >> to exchange some data between two nodes. The amount of data that I may
    >> need to transfer could be from 1 KB to 1 MB (pure text), and I was
    >> wondering whether there was a quicker way (than zip) for the
    >> compression/decompression.

    >
    > Compression can't get much traction unless it has a fair-sized chunk to
    > work on. There is not as much repetition to find within a tiny
    > packet.


    1 KB is enough to give reasonable compression with ZIP in most cases.

    > You might want to look into some type of dictionary compression where
    > you convert tokens to ints and DON'T transmit the meaning of the
    > tokens with each packet, but rather pre-transmit an entire dictionary
    > of what they mean.
    >
    > In the simplest case, if your packets consisted of words separated by
    > a single space, create a dictionary of all the words you ever use,
    > sorted by frequency. Then assign numbers. The lowest numbers can get
    > 8-bit codes, the next lowest 16-bit, the next lowest 24-bit.


    I seriously doubt that would create better compression than ZIP in
    general cases like English and source code.

    > You can then apply GZIP compression on top of that.


    It is very rarely a good idea to double compress.

    Or to put another way: one definition of a good compression
    algorithm is that compressing the output with another algorithm
    will not shrink it additionally.

    > see http://mindprod.com/project/supercompressor.html


    #Ordinary ZIP (Lempel, Ziv Welch) compression

    ZIP (Deflate) is based on LZ77.

    LZW is based on LZ78.

    ZIP is not LZW.

    Arne
    Arne Vajhøj, Feb 12, 2008
    #8
  9. kk

    Roedy Green Guest

    On Mon, 11 Feb 2008 21:49:55 -0500, Arne Vajhøj <>
    wrote, quoted or indirectly quoted someone who said :

    >I seriously doubt that would create better compression than ZIP in
    >general cases like English and source code.


    Obviously you can do better. Zip has to pack the dictionary with each
    packet. I did some experiments on compacting HTML this way some
    years ago. It does not buy you much for big documents, but it would for
    little chunks.
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Feb 12, 2008
    #9
  10. kk

    Christian Guest

    Roedy Green wrote:
    > On Mon, 11 Feb 2008 21:49:55 -0500, Arne Vajhøj <>
    > wrote, quoted or indirectly quoted someone who said :
    >
    >> I seriously doubt that would create better compression than ZIP in
    >> general cases like English and source code.

    >
    > Obviously you can do better. Zip has to pack the dictionary with each
    > packet. I did some experiments on compacting HTML this way some
    > years ago. It does not buy you much for big documents, but it would for
    > little chunks.
    > --
    >
    > Roedy Green Canadian Mind Products
    > The Java Glossary
    > http://mindprod.com


    Why should it have to pass the dictionary?
    The dictionary is built up on the fly while decompressing.

    If you don't use such a dictionary method, you would have to create a
    table with the probabilities of characters and words (as in Huffman
    coding) and transmit that. Otherwise, whatever compression you use could
    not exploit letters or words that occur more often than others.

    Christian
    Christian, Feb 12, 2008
    #10
  11. kk

    Arne Vajhøj Guest

    Roedy Green wrote:
    > On Mon, 11 Feb 2008 21:49:55 -0500, Arne Vajhøj <>
    > wrote, quoted or indirectly quoted someone who said :
    >> I seriously doubt that would create better compression than ZIP in
    >> general cases like English and source code.

    >
    > Obviously you can do better. Zip has to pack the dictionary with each
    > packet.


    No. That is not how ZIP works. The dictionary is the data already
    read/written.
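
    Because the dictionary is simply the data already seen, java.util.zip also
    lets both sides prime it with agreed-upon bytes before the first packet, via
    Deflater.setDictionary and Inflater.setDictionary. Nobody in the thread
    proposed this; the sketch below, with the hypothetical class name
    PresetDictionary, only shows how a shared dictionary can help very small
    messages without leaving the standard classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: both nodes must hold the same dictionary bytes.
class PresetDictionary {

    /** Compresses data; a null dictionary means plain Deflate. */
    public static byte[] compress(byte[] data, byte[] dictionary) {
        Deflater deflater = new Deflater();
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary); // must precede deflate()
            }
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Decompresses data, supplying the dictionary when the stream asks for it. */
    public static byte[] decompress(byte[] compressed, byte[] dictionary) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0) {
                    if (inflater.needsDictionary()) {
                        if (dictionary == null) {
                            throw new IllegalArgumentException("stream requires a preset dictionary");
                        }
                        // Throws IllegalArgumentException if the checksum does not match.
                        inflater.setDictionary(dictionary);
                    } else if (inflater.needsInput()) {
                        throw new IllegalArgumentException("truncated input");
                    }
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
    }
}
```

    The dictionary should contain strings that are likely to appear in the
    messages; how much it helps depends entirely on how well it matches the
    real traffic.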

    Arne
    Arne Vajhøj, Feb 13, 2008
    #11
