Compressing a String?

sven abels

Hi!
I need a method to compress a (quite large) String and to uncompress it
later.

I wrote the following code, which uses Java's ZIP functions.
However, these functions (I think it's the compress function) don't work
properly on some strings.
I get one of these error messages when uncompressing the string:

java.util.zip.ZipException: incomplete distance tree

or sometimes:

java.util.zip.ZipException: invalid entry size

Can somebody give me a tip?

If somebody has a faster or different way to simply compress and uncompress
a String (even with fairly poor compression), that would be perfectly fine
too...


Here's my code, which has the described bug:

------------
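import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public static String compress(String stream)
{
    ByteArrayInputStream fis = null;
    ByteArrayOutputStream fos = null;
    String erg = "";
    try {
        // Encode the text with the platform default charset
        fis = new ByteArrayInputStream(stream.getBytes());
        fos = new ByteArrayOutputStream();

        // Write the bytes as a single ZIP entry into the in-memory buffer
        ZipOutputStream zos = new ZipOutputStream(fos);
        ZipEntry ze = new ZipEntry("name1");
        zos.putNextEntry(ze);
        final int BUFSIZ = 4096;
        byte inbuf[] = new byte[BUFSIZ];
        int n;
        while ((n = fis.read(inbuf)) != -1)
            zos.write(inbuf, 0, n);
        fis.close();
        fis = null;
        zos.close();   // finishes the entry and writes the ZIP directory
    }
    catch (IOException e) {
        System.err.println(e);
        e.printStackTrace();
    }
    finally {
        try {
            if (fis != null)
                fis.close();
            if (fos != null)
            {
                fos.close();
                // Suspected cause of the ZipException: decoding binary ZIP
                // data with the default charset is not guaranteed to be lossless
                erg = new String(fos.toByteArray());
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
    return erg;
}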



public static String uncompress(String stream)
{
    ByteArrayInputStream fis = null;
    ByteArrayOutputStream fos = null;
    String erg = "";
    try {
        // Suspected cause of the ZipException: re-encoding the String with the
        // default charset may not restore the original compressed bytes
        fis = new ByteArrayInputStream(stream.getBytes());
        fos = new ByteArrayOutputStream();
        ZipInputStream zis = new ZipInputStream(fis);
        ZipEntry ze = zis.getNextEntry();   // position at the single entry
        final int BUFSIZ = 4096;
        byte inbuf[] = new byte[BUFSIZ];
        int n;
        while ((n = zis.read(inbuf, 0, BUFSIZ)) != -1)
        {
            System.out.println(".");   // debug output, one dot per chunk
            fos.write(inbuf, 0, n);
        }
        zis.close();
        fis = null;
        fos.close();
    }
    catch (IOException e) {
        System.err.println(e);
        e.printStackTrace();
    }
    finally {
        try {
            if (fis != null)
                fis.close();
            if (fos != null)
            {
                fos.close();
                // only read the buffer if it was actually created
                erg = new String(fos.toByteArray());
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
    return erg;
}
-------------
 
Kerry Shetline

A quick guess about what's wrong: the platform-specific conversions
between bytes and characters performed by new String(byte[] b) and
String.getBytes() are mucking up your ZIP data.

ZIP decompression is very sensitive -- one bit out of place and it
usually fails completely before there's even a chance of getting garbled
data.
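To see why such a round trip can break, here is a small sketch that pushes a few bytes through a String and back. It uses UTF-8 explicitly as a stand-in for the platform default charset (an assumption: the real default differs between systems), because bytes that are not valid in the charset get replaced during decoding and cannot be recovered:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyCharsetDemo {
    public static void main(String[] args) {
        // A few byte values that are not valid UTF-8 on their own,
        // as can occur anywhere inside compressed data
        byte[] original = { (byte) 0x50, (byte) 0x4B, (byte) 0xFF, (byte) 0x80 };

        // bytes -> String -> bytes, with UTF-8 standing in for the default charset
        String asText = new String(original, StandardCharsets.UTF_8);
        byte[] roundTrip = asText.getBytes(StandardCharsets.UTF_8);

        // Each invalid byte was decoded as U+FFFD, which re-encodes as EF BF BD
        System.out.println(Arrays.toString(original));
        System.out.println(Arrays.toString(roundTrip));
        System.out.println("identical: " + Arrays.equals(original, roundTrip));
    }
}
```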

I think (that's *think*) that the deprecated method...

String.getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin)

...along with the deprecated String constructor:

String(byte[] ascii, int hibyte)

...will fix your problems in a quick-and-dirty way, if I'm on the right
track. If this works, you'll want to write your own code for doing what
the deprecated constructor and method do, rather than sticking with code
that's deprecated.

In your compress method, you'd use:

erg=new String(fos.toByteArray(), 0);

...and in your decompress method you'd use:

byte[] b = new byte[stream.length()];
stream.getBytes(0, b.length, b, 0);
fis = new ByteArrayInputStream(b);

The reason I'm a little hesitant is that I suspect, but don't absolutely
know, that getBytes(int, int, byte[], int) is deprecated because it
simply copies the low-order byte of each String character -- exactly
what you want here, but not ideal for actual character conversions.

If the deprecated method *doesn't* work as I suspect, you'll need to
write your own code right from the start for pulling the low-order bytes
out of a String, without any conversion.
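Writing that code yourself amounts to masking each byte into a char and taking the low-order byte back out. A minimal sketch, with hypothetical helper names bytesToString and stringToBytes (not part of any library), assuming the String only ever holds chars produced by bytesToString:

```java
import java.util.Arrays;

public class LowByteCodec {
    // Hypothetical helper: one char per byte, char value = unsigned byte (0..255)
    public static String bytesToString(byte[] data) {
        char[] chars = new char[data.length];
        for (int i = 0; i < data.length; i++) {
            chars[i] = (char) (data[i] & 0xFF);   // mask to avoid sign extension
        }
        return new String(chars);
    }

    // Hypothetical helper: keep only the low-order byte of each char
    public static byte[] stringToBytes(String s) {
        byte[] data = new byte[s.length()];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) s.charAt(i);          // drops the high-order byte
        }
        return data;
    }

    public static void main(String[] args) {
        // Check every possible byte value survives the round trip
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        byte[] back = stringToBytes(bytesToString(all));
        System.out.println("round trip ok: " + Arrays.equals(all, back));
    }
}
```

Any char above 0xFF would silently lose its high byte in stringToBytes, which is why this is only safe for Strings that were built from bytes in the first place.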

Please note that in your compress function, "stream.getBytes()" is fine
as is. Also, "erg=new String(fos.toByteArray())" in decompress is fine
too. It's the conversion from bytes on the way out of the compress
method, and the conversion to bytes at the beginning of the decompress
method, that are the likely problem areas.

-Kerry
 
X_AWemner_X

1) stream.getBytes("ISO-8859-1");
Use a named charset when converting the string to a byte array.

2) erg = new String(fos.toByteArray(), "ISO-8859-1");
Use the same charset when decoding the array back into a Java Unicode string.
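A quick way to check that ISO-8859-1 is safe for this: Java's ISO-8859-1 decoder maps each byte value 0 to 255 to the char with the same code, and the encoder maps those chars back. A small sketch testing all 256 values; it assumes Java 7 or later for StandardCharsets.ISO_8859_1, which also avoids the checked UnsupportedEncodingException thrown by the overloads that take a charset name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    public static void main(String[] args) {
        // Every possible byte value, as arbitrary compressed data might contain
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        // ISO-8859-1 maps byte b to char (b & 0xFF) and back again
        String text = new String(all, StandardCharsets.ISO_8859_1);
        byte[] back = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("length " + text.length() + ", lossless: " + Arrays.equals(all, back));
    }
}
```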

BUT: that may not work properly anyway. Most likely you should encode the
compressed byte array into printable characters (Base64 is the most common
choice) if you want to move it around as a String instance. Then decode the
Base64 string back into a byte array before decompressing it.
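A minimal sketch of the Base64 approach. It assumes Java 8 or later for java.util.Base64, and swaps the ZIP container for plain GZIP (GZIPOutputStream/GZIPInputStream), since a single string needs no entry names. The class and method names are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StringCompressor {
    // text -> UTF-8 bytes -> GZIP -> Base64 (printable ASCII)
    public static String compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the GZIP trailer
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Base64 -> GZIP bytes -> UTF-8 bytes -> text
    public static String uncompress(String encoded) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("quite large string ");
        }
        String original = sb.toString();
        String packed = compress(original);
        System.out.println(original.length() + " chars -> " + packed.length() + " Base64 chars");
        System.out.println("round trip ok: " + original.equals(uncompress(packed)));
    }
}
```

Base64 output is about 4/3 the size of the compressed bytes, so this only pays off when the text compresses well; if the data never has to be a printable String, keeping the compressed byte[] avoids that overhead.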
 
Roedy Green

I need a method to compress a (quite large) String and to uncompress it
later.

Typically you are using only some subset of the characters. A simple
encoding would be to create a bit string formed by concatenating base-n
numbers, one per character, where n is the size of that subset (some
number less than 2^16). You map chars onto the subset.
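A rough sketch of that idea, with illustrative names (SubsetPacker, pack, unpack) that are not from any library. It simplifies the scheme slightly: instead of true base-n arithmetic, each character's index in the subset is stored in a fixed number of bits (n rounded up to the next power of two), so a 27-symbol alphabet costs 5 bits per char instead of the 16 bits of a Java char. The caller has to keep the original length, because the last byte may contain padding bits:

```java
public class SubsetPacker {
    // Number of bits needed to store an index in 0..n-1 (at least 1)
    public static int bitsPerChar(int n) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(n - 1));
    }

    // Map each char to its index in `alphabet` and append that index
    // to a bit string, most significant bit first
    public static byte[] pack(String text, String alphabet) {
        int bits = bitsPerChar(alphabet.length());
        byte[] out = new byte[(text.length() * bits + 7) / 8];
        int bitPos = 0;
        for (int i = 0; i < text.length(); i++) {
            int index = alphabet.indexOf(text.charAt(i));
            if (index < 0) {
                throw new IllegalArgumentException("char not in alphabet: " + text.charAt(i));
            }
            for (int b = bits - 1; b >= 0; b--, bitPos++) {
                if (((index >> b) & 1) != 0) {
                    out[bitPos / 8] |= (byte) (0x80 >>> (bitPos % 8));
                }
            }
        }
        return out;
    }

    // Inverse of pack; `length` is the original number of chars
    public static String unpack(byte[] data, int length, String alphabet) {
        int bits = bitsPerChar(alphabet.length());
        StringBuilder sb = new StringBuilder(length);
        int bitPos = 0;
        for (int i = 0; i < length; i++) {
            int index = 0;
            for (int b = 0; b < bits; b++, bitPos++) {
                int bit = (data[bitPos / 8] >> (7 - bitPos % 8)) & 1;
                index = (index << 1) | bit;
            }
            sb.append(alphabet.charAt(index));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz ";  // 27 symbols -> 5 bits each
        String text = "the quick brown fox jumps over the lazy dog";
        byte[] packed = pack(text, alphabet);
        System.out.println(text.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + text.equals(unpack(packed, text.length(), alphabet)));
    }
}
```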
 
sven abels

1) stream.getBytes("ISO-8859-1");
Use a named charset when converting the string to a byte array.

2) erg = new String(fos.toByteArray(), "ISO-8859-1");
Use the same charset when decoding the array back into a Java Unicode string.


This works fine, thanks.

It's strange, because I thought Java would choose a safe decoding for
both functions when calling them without a charset name...

Anyway, it seems to work now...

Thank you!
 
Michael Borgwardt

sven said:
It's strange, because I thought Java would choose a safe decoding for
both functions when calling them without a charset name...

No. Why should it? Representing arbitrary binary data as text strings
is certainly not what they are intended for.
 
