compress a short string to an even shorter string

Discussion in 'Java' started by Austin, Nov 28, 2003.

  1. Austin

    Austin Guest

    I am looking for a way to compress a short string to a shorter string
    then be able to un-compress it at the other end.

    To be more specific I have a site with an extremely long set of
    parameters in the URL, I want to compress these and use one very short
    parameter which I can then un-compress and read them all back. I have
    already taken other steps to reduce the length.

    To throw a bit more information into the pot, the string is made up of
    [a-z0-9] and the compressed string can use any of the UTF-8 chars.

    Our server side code is J2EE Java and I will be using this for the
    compression/un-compression.

    I have tried Gzip, and briefly looked at Triple DES encryption,
    not because I want to make this secure but because for some reason I
    thought it might give me a short encrypted string.
     
    Austin, Nov 28, 2003
    #1

  2. Are we to assume the String is transmitted in UTF-16
    (16 bits per character)?


    Two approaches come to mind (there may be more).

    Base conversion.
    A base-36 String [0-9a-z] takes 16 bits to encode each digit.
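    You can convert it to base-16 using BigInteger
    String asHex = new BigInteger(s, 36).toString(16); // s holds the base-36 digits
    and from that create a byte array encoding 2 hex digits in each byte.
    (Leading zeros in s are lost in the conversion, so they need separate handling.)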
    The byte array will take up about a third of the original String.
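
    A minimal sketch of the base-conversion idea, assuming the input contains
    only lowercase [0-9a-z]. The class and method names are made up for
    illustration. It goes straight to bytes with BigInteger.toByteArray()
    instead of via a hex String, and prepends a "1" digit so that leading
    zeros survive the round trip.

```java
import java.math.BigInteger;

final class Base36Packer {
    // Pack a lowercase base-36 string into bytes.
    // The "1" prefix is a sentinel that preserves leading zeros in the input.
    static byte[] pack(String base36) {
        return new BigInteger("1" + base36, 36).toByteArray();
    }

    // Reverse of pack(): rebuild the number, print it in base 36,
    // and drop the sentinel digit.
    static String unpack(byte[] packed) {
        return new BigInteger(packed).toString(36).substring(1);
    }

    public static void main(String[] args) {
        String original = "00abc123xyz789";
        byte[] packed = pack(original);
        System.out.println(original.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + unpack(packed).equals(original));
    }
}
```

    The packed bytes are arbitrary binary, so they cannot go into a URL as
    they are. Re-encoding them with a URL-safe alphabet such as Base64
    (6 bits per character) would leave a string roughly 14% shorter than
    the base-36 original, since each base-36 digit carries about 5.17 bits.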

    Lempel-Ziv compression.
    If your data contains repetitions you'll want to have a
    look at the LZ compression algorithm.
    Google "Lempel-Ziv Java".
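
    One readily available LZ-based option in the JDK is java.util.zip.Deflater
    (DEFLATE combines LZ77 with Huffman coding). The sketch below is an
    assumption about how it might be used here, not something from the
    original post; the class name is hypothetical. It uses the "nowrap" mode
    to skip the zlib header and checksum, which matters for short inputs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DeflateSketch {
    static byte[] compress(String text) {
        // nowrap = true: raw DEFLATE stream, no zlib header or Adler-32 trailer
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static String decompress(byte[] data) {
        Inflater inflater = new Inflater(true);
        // The Inflater documentation asks for one extra dummy byte in nowrap mode.
        inflater.setInput(Arrays.copyOf(data, data.length + 1));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Stop on truncated input instead of looping forever.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid DEFLATE stream", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

    This only pays off when the input has repetitions. The Gzip format wraps
    the same DEFLATE data in at least 18 bytes of header and trailer, which
    is one reason it tends to make short strings longer rather than shorter.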
     
    Thomas Schodt, Nov 28, 2003
    #2

  3. (Austin) writes:
    > To be more specific I have a site with an extremely long set of
    > parameters in the URL, I want to compress these and use one very short
    > parameter which I can then un-compress and read them all back. I have
    > already taken other steps to reduce the length.


    The usual way to handle this is to change from an HTTP GET request with
    a long URL in the header to an HTTP POST request with the data in the
    body of the request.
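
    A rough client-side sketch of that switch, assuming Java 10 or later
    (for URLEncoder.encode with a Charset). The class name and parameter
    names are hypothetical. It builds an application/x-www-form-urlencoded
    body and sends it with HttpURLConnection.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

final class FormPost {
    // Turn name/value pairs into "a=1&b=2" form encoding.
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Send the parameters in the request body and return the HTTP status code.
    static int post(URL url, Map<String, String> params) throws IOException {
        byte[] body = encode(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        return conn.getResponseCode();
    }
}
```

    On the servlet side nothing much changes: HttpServletRequest.getParameter()
    reads form-encoded POST parameters the same way it reads query-string ones.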

    /Thomas
     
    Thomas Weidenfeller, Nov 28, 2003
    #3
  4. Austin

    jpshahom Guest

    (Austin) wrote in message news:<>...

    How about using a Hashtable on the server side if enough of your data is repetitive?
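
    One way to read this suggestion: keep the long parameter string on the
    server and put only a short lookup key in the URL. The sketch below is
    an interpretation, not code from the thread; the class and method names
    are hypothetical, and it uses ConcurrentHashMap in place of Hashtable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ParamStore {
    private final Map<String, String> byToken = new ConcurrentHashMap<>();
    private final Map<String, String> byValue = new ConcurrentHashMap<>();
    private final AtomicLong next = new AtomicLong();

    // Return a short base-36 token for the parameter string.
    // The same parameter string always maps to the same token.
    String tokenFor(String params) {
        return byValue.computeIfAbsent(params, p -> {
            String token = Long.toString(next.getAndIncrement(), 36);
            byToken.put(token, p);
            return token;
        });
    }

    // Look the original parameter string up again; null if the token is unknown.
    String paramsFor(String token) {
        return byToken.get(token);
    }
}
```

    The trade-off is that a token only means something while the server
    still holds the table, so links break after a restart unless the table
    is persisted, and sequential tokens are easy to guess.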

    > I am looking for a way to compress a short string to a shorter string
    > then be able to un-compress it at the other end.
    >
    > To be more specific I have a site with an extremely long set of
    > parameters in the URL, I want to compress these and use one very short
    > parameter which I can then un-compress and read them all back. I have
    > already taken other steps to reduce the length.
    >
    > To throw a bit more information into the pot, the string is made up of
    > [a-z0-9] and the compressed string can use any of the UTF-8 chars.
    >
    > Our server side code is J2EE Java and I will be using this for the
    > compression/un-compression.
    >
    > I have tried Gzip, and briefly looked at Triple DES encryption,
    > not because I want to make this secure but because for some reason I
    > thought it might give me a short encrypted string.
     
    jpshahom, Nov 28, 2003
    #4
  5. Austin

    nos Guest

    I think base 36 needs only 5.17 bits for each digit,
    not 16 bits.

    "Thomas Schodt" <news0310@xenoc.$DEMON.co.uk> wrote in message
    news:Xns944195BE8137Exenoc@158.152.254.254...
    > Are we to assume the String is transmitted in UTF-8
    > (16 bits per character)?
    >
    >
    > Two approaches come to mind (there may be more).
    >
    > Base conversion.
    > A base-36 String [0-9a-z] takes 16 bits to encode each digit.
    > You can convert it to base-16 using BigInteger
    > String asHex = new BigInteger(String,36).toString(16);
    > and from that create a byte array encoding 2 hex digits in each byte.
    > The byte array will take up about a third of the original String.
    >
    > Lempel-Ziv compression.
    > If your data contains repetitions you'll want to have a
    > look at the LZ compression algorithm.
    > Google "Lempel-Ziv Java".
     
    nos, Nov 29, 2003
    #5
  6. "nos" <> wrote in news:T4Wxb.249368$9E1.1349089@attbi_s52:

    > "Thomas Schodt" <news0310@xenoc.$DEMON.co.uk> wrote in message
    > news:Xns944195BE8137Exenoc@158.152.254.254...

    ....
    >> A base-36 String [0-9a-z] takes 16 bits to encode each digit.


    > i think base 36 needs only 5.17 bits for each digit
    > not 16 bits


    Yes, but
    encoded in a [java.lang.]String it will take 16 bits per digit.
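
    Both figures can be checked with a little arithmetic. The sketch below
    uses hypothetical helper names: it computes the information content of
    one digit as log2(radix) and the minimum number of whole bytes needed
    for a given number of digits.

```java
final class BitsPerDigit {
    // Information carried by one digit of the given radix, in bits.
    static double bitsPerDigit(int radix) {
        return Math.log(radix) / Math.log(2);
    }

    // Minimum whole bytes needed to hold any value with this many digits.
    static int packedBytes(int digits, int radix) {
        return (int) Math.ceil(digits * bitsPerDigit(radix) / 8.0);
    }

    public static void main(String[] args) {
        int digits = 100;
        System.out.println("bits per base-36 digit: " + bitsPerDigit(36));
        System.out.println("packed bytes for " + digits + " digits: " + packedBytes(digits, 36));
        // A java.lang.String stores each char in 16 bits, i.e. 2 bytes per digit.
        System.out.println("bytes as 16-bit chars: " + (digits * 2));
    }
}
```

    For 100 digits that is 65 packed bytes against 200 bytes as 16-bit
    chars, which matches the "about a third" estimate; against one byte
    per character the saving is closer to a third than to two thirds.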
     
    Thomas Schodt, Nov 29, 2003
    #6
  7. Austin

    Tim Tyler Guest

    nos <> wrote or quoted:

    > i think base 36 needs only 5.17 bits for each digit
    > not 16 bits


    ...not that URLs are in anything like base 36 - they can be case
    sensitive for one thing...
    --
    __________
    |im |yler http://timtyler.org/ Remove lock to reply.
     
    Tim Tyler, Dec 2, 2003
    #7
  8. Austin

    Tim Tyler Guest

    Tim Tyler <> wrote or quoted:
    > nos <> wrote or quoted:


    >> i think base 36 needs only 5.17 bits for each digit
    >> not 16 bits

    >
    > ...not that URLs are in anything like base 36 - they can be case
    > sensitive for one thing...


    I see the OP /did/ say his strings were of this form - though it seems
    rather surprising that there are no "=" or "&" characters involved.
    --
    __________
    |im |yler http://timtyler.org/ Remove lock to reply.
     
    Tim Tyler, Dec 2, 2003
    #8