How to check if byte array is compressed or not?

prakasan

Hi all,

I have some data stored in the database as a BLOB. I can read this BLOB
into a byte array using JDBC. Now I need to read the byte array through
an InputStream. The problem is that I don't know whether the data in the
database is stored in compressed form. If the byte array is compressed, I
have to use InflaterInputStream; otherwise I can use ObjectInputStream.

Before choosing which stream to use with the byte array, I want to know
whether the byte array is compressed or not.

Any idea how this can be done?

Thanks in advance,
Prakasan
 
Ingo R. Homann

Hi,

I am not familiar with InflaterInputStream, and you did not say much
about the kind of compression. In fact, *every* byte array can be viewed
as 'compressed' if you pick a suitable compression algorithm. So,
without knowing the algorithm, your problem cannot be solved.

Having said this, look at some of your byte arrays in detail. Normally,
because of what I said above, most compression algorithms add some kind
of header to mark the byte array as 'compressed by this algorithm'.

Hth,
Ingo

PS: Note that even if a byte array starts with "Compressed with GZIP
Version 3.45", this might be a coincidental sequence of bytes! :)
So, if you *really* want to be sure, it would be a good idea to add a
header in every case (even if the byte array is not compressed).
 
Antti S. Brax

My first suggestion would be to add a table column which
indicates the compression algorithm (if any) used in the
data. If you can't change the database schema, then consider
adding a header. But then be aware that the data will become
useless for programs that don't understand your magic header.
 
John Currier

A simplistic approach is to just try to decompress it. If that fails
then it's not compressed (or is corrupted). This simplistic approach
assumes that the majority of the stored data is compressed. The
approach can be slightly optimized by checking the header yourself
before attempting a decompression. I've used this technique in
interceptors for decompressing CORBA traffic.

The only real risk is of successfully decompressing something that
wasn't compressed.
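
A minimal sketch of the "just try it" approach, assuming the compressed
blobs are complete zlib streams (what InflaterInputStream reads by
default). The class and method names are made up for illustration. Any
IOException while inflating, including ZipException for a bad header, is
taken to mean "not compressed", so a corrupted compressed blob is also
returned unchanged:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of any library.
final class TryInflate {

    /**
     * Returns the inflated bytes if data can be read as a complete zlib
     * stream; otherwise returns the input array unchanged.
     */
    static byte[] inflateOrPassThrough(byte[] data) {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // ZipException (invalid data) and EOFException (truncated or
            // empty input) both end up here: treat the blob as uncompressed.
            return data;
        }
    }
}
```

The result can then be wrapped in a ByteArrayInputStream and handed to
ObjectInputStream or whatever reader the payload needs.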

Antti's approach, however, is cleaner.

John
http://schemaspy.sourceforge.net
 
Harald

The entropy of compressed data tends to be higher than that of
uncompressed data. But this is just a statistical observation. ;-)
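
For anyone who wants to try the statistical route, here is a rough sketch
that estimates the Shannon entropy of a byte array in bits per byte. The
class name is made up, and where to draw the line between "looks
compressed" and "looks uncompressed" is left open on purpose: small
arrays give unreliable estimates, and encrypted or already-compressed
payloads inside an uncompressed container score high as well.

```java
// Hypothetical helper, not part of any library.
final class ByteEntropy {

    /** Shannon entropy of the byte values, in bits per byte (0.0 to 8.0). */
    static double bitsPerByte(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        // Count how often each of the 256 possible byte values occurs.
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xFF]++;
        }
        double entropy = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // a value that never occurs contributes nothing
            }
            double p = (double) c / data.length;
            entropy -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return entropy;
    }
}
```

A value close to 8 suggests the bytes are nearly uniformly distributed,
which is typical of compressed data; it is a hint, not proof.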

Strange setup you have there where you don't know what kind of data
format you are dealing with.

Harald.
 
Roedy Green

Putting a boolean at the front of the stream to tell you is the easiest
way.

I am beginning to realize the wisdom of putting a field on the front
of any stream giving the version. Otherwise it becomes impossible to
deal with old format files later on.
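
A sketch of what such a leading field could look like, assuming you
control the code that writes the blobs. The class name and the format
values 0 and 1 are invented for this example; using a whole byte rather
than a boolean leaves room for further formats or versions later:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical blob layout: one format byte followed by the payload.
final class TaggedBlob {
    static final int FORMAT_RAW = 0;      // payload stored as-is
    static final int FORMAT_DEFLATE = 1;  // payload is a zlib stream

    /** Builds a blob: format byte first, then the (optionally deflated) payload. */
    static byte[] encode(byte[] payload, boolean compress) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(compress ? FORMAT_DEFLATE : FORMAT_RAW);
        if (compress) {
            // Closing the DeflaterOutputStream finishes the zlib stream.
            try (OutputStream out = new DeflaterOutputStream(bos)) {
                out.write(payload);
            }
        } else {
            bos.write(payload);
        }
        return bos.toByteArray();
    }

    /** Reads the format byte and returns a stream over the decoded payload. */
    static InputStream open(byte[] blob) throws IOException {
        InputStream in = new ByteArrayInputStream(blob);
        int format = in.read();
        switch (format) {
            case FORMAT_RAW:
                return in;
            case FORMAT_DEFLATE:
                return new InflaterInputStream(in);
            default:
                throw new IOException("Unknown format byte: " + format);
        }
    }
}
```

The stream returned by open can be wrapped in an ObjectInputStream in
both cases, so the reading code no longer has to guess.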

Other than that, look at the two files with a hex viewer to see if
there is a recognisable signature.

see http://mindprod.com/jgloss/hex.html

--
Bush crime family lost/embezzled $3 trillion from Pentagon.
Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video.
http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm

Canadian Mind Products, Roedy Green.
See http://mindprod.com/iraq.html photos of Bush's war crimes
 
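The easiest way is to inspect the first bytes of the byte array. A zlib stream (what InflaterInputStream reads by default) normally starts with 0x78 followed by 0x01, 0x5E, 0x9C or 0xDA depending on the compression level (0x78 0x9C at the default level), whilst Zip files start with the bytes 0x50 0x4B 0x03 0x04, which is 0x04034b50 when read as a little-endian int.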
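
A small sketch of such a check. The class and method names are made up.
The zlib test follows the header rule from RFC 1950 (compression method
8, window size field at most 7, and the first two bytes read as a
big-endian number divisible by 31) rather than comparing against one
fixed value, since the second byte varies with the compression level.
The gzip and Java serialization signatures are included because they are
the other formats likely to turn up here. A match only means the bytes
look like that format; uncompressed data can start with the same bytes
by coincidence.

```java
// Hypothetical helper, not part of any library.
final class CompressionSniffer {

    /** zlib header check per RFC 1950: CM == 8, CINFO <= 7, FCHECK valid. */
    static boolean looksLikeZlib(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF;
        int flg = data[1] & 0xFF;
        return (cmf & 0x0F) == 8          // compression method: deflate
            && (cmf >> 4) <= 7            // window size up to 32 KB
            && ((cmf << 8) | flg) % 31 == 0;
    }

    /** gzip members start with 0x1F 0x8B. */
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xFF) == 0x1F
            && (data[1] & 0xFF) == 0x8B;
    }

    /** Zip local file headers start with "PK" 0x03 0x04. */
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
            && data[0] == 'P' && data[1] == 'K'
            && data[2] == 3 && data[3] == 4;
    }

    /** Java serialization streams start with the magic number 0xACED. */
    static boolean looksLikeJavaSerialization(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xFF) == 0xAC
            && (data[1] & 0xFF) == 0xED;
    }
}
```

Since the uncompressed blobs in the question are read with
ObjectInputStream, checking for 0xACED first and only then for a zlib
header keeps the two cases apart fairly reliably.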

E:)
 
