compress a short string to an even shorter string

Austin

I am looking for a way to compress a short string into a shorter string
and then un-compress it at the other end.

To be more specific, I have a site with an extremely long set of
parameters in the URL. I want to compress these into one very short
parameter, which I can then un-compress to read them all back. I have
already taken other steps to reduce the length.

To throw a bit more information into the pot: the string is made up of
[a-z0-9], and the compressed string can use any UTF-8 characters.

Our server-side code is J2EE Java, and I will be using this for the
compression/un-compression.

I have tried Gzip, and briefly looked at Triple DES encryption,
not because I want to make this secure but because for some reason I
thought it might give me a short encrypted string.
 
Thomas Schodt

Are we to assume the String is transmitted at 16 bits per character
(UTF-16)? In UTF-8, the characters [a-z0-9] take only 8 bits each.


Two approaches come to mind (there may be more).

Base conversion.
A base-36 String [0-9a-z] takes 16 bits to store each digit as a Java char.
You can convert it to base 16 using BigInteger
String asHex = new BigInteger(str, 36).toString(16);
and from that create a byte array encoding 2 hex digits in each byte.
The byte array will take up about a third of the space of the original
String (at 16 bits per character).
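
A minimal sketch of the base-conversion idea, under a few assumptions that are not in the post above: the class name Base36Packer is made up, java.util.Base64 (Java 8+) is used to turn the bytes back into URL-safe text, and a sentinel digit is prefixed because BigInteger would otherwise drop leading zeros ("007" would come back as "7"). It skips the hex step and goes straight to bytes with toByteArray().

```java
import java.math.BigInteger;
import java.util.Base64;

// Hypothetical helper, for illustration only.
final class Base36Packer {
    // Sentinel digit so leading zeros in the input survive the BigInteger round trip.
    private static final String SENTINEL = "1";

    /** Packs a [0-9a-z] string into a URL-safe Base64 token. */
    static String pack(String base36) {
        byte[] bytes = new BigInteger(SENTINEL + base36, 36).toByteArray();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Reverses pack(): decodes the token and strips the sentinel digit. */
    static String unpack(String token) {
        byte[] bytes = Base64.getUrlDecoder().decode(token);
        String digits = new BigInteger(bytes).toString(36);
        return digits.substring(SENTINEL.length());
    }
}
```

Each base-36 digit carries log2(36), about 5.17 bits, while each Base64 character carries 6, so for long inputs the token should come out roughly 14% shorter than the original text. The "about a third" figure only applies when comparing raw bytes against 16-bit characters.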

Lempel-Ziv compression.
If your data contains repetitions you'll want to have a
look at the LZ compression algorithm.
Google "Lempel-Ziv Java".
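
One way to try this without a third-party library is java.util.zip.Deflater, since DEFLATE combines LZ77 with Huffman coding. The sketch below is only an illustration: the class name DeflateCodec is made up, and the Base64 step is an assumption added so the result can go into a URL. It uses the zlib wrapper (6 bytes of overhead) rather than Gzip, whose header and trailer add at least 18 bytes, which matters a lot for short strings.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, for illustration only.
final class DeflateCodec {
    /** Deflates the text (zlib format) and returns it as a URL-safe Base64 token. */
    static String compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out.toByteArray());
    }

    /** Reverses compress(); throws IllegalArgumentException on corrupt input. */
    static String decompress(String token) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getUrlDecoder().decode(token));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

This only pays off when the input repeats itself. For a short string with little repetition, the token can easily end up longer than the original.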
 
Thomas Weidenfeller

To be more specific, I have a site with an extremely long set of
parameters in the URL. I want to compress these into one very short
parameter, which I can then un-compress to read them all back. I have
already taken other steps to reduce the length.

The usual way to handle this is to change from an HTTP GET request with
a long URL in the request line to an HTTP POST request with the data in
the body of the request.

/Thomas
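
For illustration, here is roughly what moving the parameters into a POST body looks like from a Java client. This is a sketch under assumptions: the class name FormPost, the example URL and the parameter names are all made up, and it needs Java 11+ for java.net.http. In a browser the same effect usually comes from a form with method="post".

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only.
final class FormPost {
    /** Encodes parameters as an application/x-www-form-urlencoded body. */
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Builds (but does not send) a POST request carrying the parameters in its body. */
    static HttpRequest buildPost(String url, Map<String, String> params) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(params)))
                .build();
    }
}
```

The URL itself then stays short, for example https://example.com/search, and the long parameter list travels in the body. The trade-off is that the resulting page can no longer be bookmarked or linked with its parameters.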
 
jpshahom

In reply to Austin:
How about using a Hashtable on the server side, if enough of your data is repetitive?
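
One reading of this suggestion is to keep the long parameter strings on the server and hand out short keys for them. The sketch below follows that reading and is an assumption, not something the post spells out: the class name ParameterStore is made up, and ConcurrentHashMap stands in for Hashtable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory store, for illustration only.
final class ParameterStore {
    private final Map<String, String> tokenToParams = new ConcurrentHashMap<>();
    private final Map<String, String> paramsToToken = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Returns a short token for the parameter string, reusing the token if it was seen before. */
    String store(String params) {
        return paramsToToken.computeIfAbsent(params, p -> {
            // Base-36 counter values keep the tokens short: "0", "1", ..., "z", "10", ...
            String token = Long.toString(nextId.getAndIncrement(), 36);
            tokenToParams.put(token, p);
            return token;
        });
    }

    /** Returns the original parameter string, or null for an unknown token. */
    String lookup(String token) {
        return tokenToParams.get(token);
    }
}
```

This is not compression in the strict sense, since the URL only works while the server still holds the entry. An in-memory map like this one loses its tokens on restart and is not shared between servers, so a real deployment would probably need a database or shared cache behind it.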
 
Tim Tyler

nos said:
I think base 36 needs only 5.17 bits for each digit,
not 16 bits.

...not that URLs are in anything like base 36 - they can be case
sensitive, for one thing...
 
Tim Tyler

Tim Tyler said:
...not that URLs are in anything like base 36 - they can be case
sensitive for one thing...

I see the OP /did/ say his strings were of this form - though it seems
rather surprising that there are no "=" or "&" characters involved.
 
