compress a short string to an even shorter string

Discussion in 'Java' started by Austin, Nov 28, 2003.

  1. Austin

    Austin Guest

    I am looking for a way to compress a short string to a shorter string
    then be able to un-compress it at the other end.

    To be more specific I have a site with an extremely long set of
    parameters in the URL, I want to compress these and use one very short
    parameter which I can then un-compress and read them all back. I have
    already taken other steps to reduce the length.

    To throw a bit more information into the pot, the string is made up of
    [a-z0-9] and the compressed string can use any of the UTF-8 chars.

    Our server side code is J2EE Java and I will be using this for the
    compression/un-compression.

    I have tried Gzip, and briefly looked at Triple DES encryption,
    not because I want to make this secure but because for some reason I
    thought it might give me a short encrypted string.
     
    Austin, Nov 28, 2003
    #1

  2. Are we to assume the String is transmitted in UTF-8
    (16 bits per character)?


    Two approaches come to mind (there may be more).

    Base conversion.
    A base-36 String [0-9a-z] takes 16 bits to encode each digit.
    You can convert it to base-16 using BigInteger
    String asHex = new BigInteger(input, 36).toString(16);
    and from that create a byte array encoding 2 hex digits in each byte.
    The byte array will take up about a third of the original String.
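
    A minimal sketch of the base-conversion idea, assuming Java 8 or later for
    java.util.Base64. The class and method names are made up for illustration.
    Two details go beyond what the post describes: a sentinel "1" digit is
    prepended so that leading zeros survive the round trip, and the bytes are
    written out as URL-safe Base64 so they can go back into a URL. Base64
    carries 6 bits per character against roughly 5.17 for base 36, so the
    saving in URL characters is modest.

```java
import java.math.BigInteger;
import java.util.Base64;

class Base36Packer {
    // Pack a [0-9a-z] string into bytes. A leading "1" digit is prepended
    // so that leading zeros in the input are not lost.
    static byte[] pack(String base36) {
        return new BigInteger("1" + base36, 36).toByteArray();
    }

    // Reverse of pack(): rebuild the number and drop the sentinel "1".
    static String unpack(byte[] packed) {
        return new BigInteger(packed).toString(36).substring(1);
    }

    // URL-safe text form of the packed bytes.
    static String toUrlToken(String base36) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(pack(base36));
    }

    static String fromUrlToken(String token) {
        return unpack(Base64.getUrlDecoder().decode(token));
    }

    public static void main(String[] args) {
        String params = "00abc123xyz789def456ghi000jkl"; // hypothetical input
        String token = toUrlToken(params);
        System.out.println(params.length() + " chars -> " + pack(params).length
                + " bytes -> " + token.length() + " token chars");
        System.out.println("round trip ok: " + fromUrlToken(token).equals(params));
    }
}
```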

    Lempel-Ziv compression.
    If your data contains repetitions you'll want to have a
    look at the LZ compression algorithm.
    Google "Lempel-Ziv Java".
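
    For a quick experiment, the JDK already ships an LZ77-based compressor in
    java.util.zip. The sketch below is an assumption about how one might try
    it, not a recommendation: the class name is invented, and on very short
    inputs the zlib header and checksum can make the output longer than the
    input, which may be what happened with Gzip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateDemo {
    // Compress text with DEFLATE in the default zlib wrapper.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of compress(); rejects corrupt input with an unchecked exception.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String shortText = "a1b2c3";
        String repetitive = new String(new char[50]).replace("\0", "abc123");
        System.out.println(shortText.length() + " -> " + compress(shortText).length + " bytes");
        System.out.println(repetitive.length() + " -> " + compress(repetitive).length + " bytes");
    }
}
```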
     
    Thomas Schodt, Nov 28, 2003
    #2

  3. (Austin) writes:
    > To be more specific I have a site with an extremely long set of
    > parameters in the URL, I want to compress these and use one very short
    > parameter which I can then un-compress and read them all back. I have
    > already taken other steps to reduce the length.


    The usual way to handle this is to change from an HTTP GET request with
    a long URL in the header to an HTTP POST request with the data in the
    body of the request.
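
    As a rough illustration of the POST approach from a Java client, the
    sketch below form-encodes the parameters and writes them to the request
    body. The class name, parameter names and endpoint are hypothetical; a
    browser form with method="post" does the same encoding on its own.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormPost {
    // Encode name/value pairs as an application/x-www-form-urlencoded body.
    static String encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return body.toString();
    }

    // Send the pairs in the request body instead of the URL; returns the HTTP status.
    static int post(String endpoint, Map<String, String> params) throws IOException {
        byte[] body = encodeForm(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "abc123"); // hypothetical parameters
        params.put("b", "x y");
        System.out.println(encodeForm(params));
    }
}
```

    On the servlet side, request.getParameter() reads form fields from a POST
    body the same way it reads them from a query string, so the receiving code
    may not need to change.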

    /Thomas
     
    Thomas Weidenfeller, Nov 28, 2003
    #3
  4. jpshahom

    jpshahom Guest

    (Austin) wrote in message news:<>...

    How about using a Hashtable on the server side if enough of your data is repetitive?
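
    One way to read this suggestion: keep the long parameter string on the
    server and put only a short lookup key in the URL. The sketch below
    assumes that reading and uses ConcurrentHashMap rather than Hashtable;
    the class and method names are invented. It is a lookup table, not
    compression, so a token only works while the server still holds the entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class ParamStore {
    private final Map<String, String> byToken = new ConcurrentHashMap<>();
    private final Map<String, String> byParams = new ConcurrentHashMap<>();
    private final AtomicLong next = new AtomicLong();

    // Return a short token for the parameter string, reusing the
    // existing token if the same string has been stored before.
    String tokenFor(String params) {
        return byParams.computeIfAbsent(params, p -> {
            String token = Long.toString(next.getAndIncrement(), 36);
            byToken.put(token, p);
            return token;
        });
    }

    // Look up the original parameter string; null if the token is unknown.
    String paramsFor(String token) {
        return byToken.get(token);
    }

    public static void main(String[] args) {
        ParamStore store = new ParamStore();
        String token = store.tokenFor("abc123def456ghi789"); // hypothetical parameters
        System.out.println(token + " -> " + store.paramsFor(token));
    }
}
```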

    > I am looking for a way to compress a short string to a shorter string
    > then be able to un-compress it at the other end.
    >
    > To be more specific I have a site with an extremely long set of
    > parameters in the URL, I want to compress these and use one very short
    > parameter which I can then un-compress and read them all back. I have
    > already taken other steps to reduce the length.
    >
    > To throw a bit more information into the pot, the string is made up of
    > [a-z0-9] and the compressed string can use any of the UTF-8 chars.
    >
    > Our server side code is J2EE Java and I will be using this for the
    > compression/un-compression.
    >
    > I have tried Gzip, and briefly looked at Triple DES encryption,
    > not because I want to make this secure but because for some reason I
    > thought it might give me a short encrypted string.
     
    jpshahom, Nov 28, 2003
    #4
  5. nos

    nos Guest

    I think base 36 needs only 5.17 bits for each digit,
    not 16 bits.
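
    The 5.17 figure is log2(36). A small sketch of the arithmetic, with
    invented names, shows why packed digits come to roughly a third of the
    size of the same digits held as 16-bit chars:

```java
class BitsPerDigit {
    // Information content of one digit in the given radix, in bits.
    static double bitsPerDigit(int radix) {
        return Math.log(radix) / Math.log(2);
    }

    // Minimum whole bytes needed to hold `digits` digits of the given radix.
    static int packedBytes(int digits, int radix) {
        return (int) Math.ceil(digits * bitsPerDigit(radix) / 8.0);
    }

    public static void main(String[] args) {
        int digits = 100;
        System.out.printf("bits per base-36 digit: %.2f%n", bitsPerDigit(36));
        System.out.println("packed bytes for " + digits + " digits: " + packedBytes(digits, 36));
        System.out.println("bytes as 16-bit chars: " + digits * 2);
    }
}
```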

    "Thomas Schodt" <news0310@xenoc.$DEMON.co.uk> wrote in message
    news:Xns944195BE8137Exenoc@158.152.254.254...
    > Are we to assume the String is transmitted in UTF-8
    > (16 bits per character)?
    >
    >
    > Two approaches come to mind (there may be more).
    >
    > Base conversion.
    > A base-36 String [0-9a-z] takes 16 bits to encode each digit.
    > You can convert it to base-16 using BigInteger
    > String asHex = new BigInteger(input, 36).toString(16);
    > and from that create a byte array encoding 2 hex digits in each byte.
    > The byte array will take up about a third of the original String.
    >
    > Lempel-Ziv compression.
    > If your data contains repetitions you'll want to have a
    > look at the LZ compression algorithm.
    > Google "Lempel-Ziv Java".
     
    nos, Nov 29, 2003
    #5
  6. "nos" <> wrote in news:T4Wxb.249368$9E1.1349089@attbi_s52:

    > "Thomas Schodt" <news0310@xenoc.$DEMON.co.uk> wrote in message
    > news:Xns944195BE8137Exenoc@158.152.254.254...

    ....
    >> A base-36 String [0-9a-z] takes 16 bits to encode each digit.


    > I think base 36 needs only 5.17 bits for each digit
    > not 16 bits


    Yes, but
    encoded in a [java.lang.]String it will take 16 bits per digit.
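
    To make the distinction concrete: a Java char is a 16-bit UTF-16 code
    unit, while the same [a-z0-9] text encoded as UTF-8 takes one byte per
    character. The sketch below, with invented names, only counts char values
    and encoded bytes; it does not measure actual JVM memory use.

```java
import java.nio.charset.StandardCharsets;

class CharSizes {
    // Size of the string counted as 16-bit char values (UTF-16 code units).
    static int utf16Bytes(String s) {
        return s.length() * Character.BYTES;
    }

    // Size of the string once encoded as UTF-8 for transmission.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String digits = "abc123";
        System.out.println("as chars: " + utf16Bytes(digits) + " bytes");
        System.out.println("as UTF-8: " + utf8Bytes(digits) + " bytes");
    }
}
```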
     
    Thomas Schodt, Nov 29, 2003
    #6
  7. Tim Tyler

    Tim Tyler Guest

    nos <> wrote or quoted:

    > I think base 36 needs only 5.17 bits for each digit
    > not 16 bits


    ...not that URLs are in anything like base 36 - they can be case
    sensitive for one thing...
    --
    __________
    |im |yler http://timtyler.org/ Remove lock to reply.
     
    Tim Tyler, Dec 2, 2003
    #7
  8. Tim Tyler

    Tim Tyler Guest

    Tim Tyler <> wrote or quoted:
    > nos <> wrote or quoted:


    >> I think base 36 needs only 5.17 bits for each digit
    >> not 16 bits

    >
    > ...not that URLs are in anything like base 36 - they can be case
    > sensitive for one thing...


    I see the OP /did/ say his strings were of this form - though it seems
    rather surprising that there are no "=" or "&" characters involved.
    --
    __________
    |im |yler http://timtyler.org/ Remove lock to reply.
     
    Tim Tyler, Dec 2, 2003
    #8