How to check if byte array is compressed or not?

Discussion in 'Java' started by prakasan@gmail.com, Aug 3, 2005.

  1. Guest

    hi All,

    I have some data stored in the database as a blob. I can read this blob
    into a byte array using JDBC. Now I need to read the byte array using
    an InputStream. The problem is that I want to know whether the data in
    the database is stored in compressed form. If the byte array is
    compressed I have to use InflaterInputStream; otherwise I can use
    ObjectInputStream.

    Before choosing which stream to use with the byte array, I want to know
    whether the byte array is compressed or not.

    Any idea how this can be done?

    thanks in advance,
    Prakasan
     
    , Aug 3, 2005
    #1

  2. Hi,

    wrote:
    > hi All,
    >
    > I have some data stored in the database as blob. I can read this blob
    > into byte array using jdbc. Now I need to read the byte array using
    > InputStream. The problem is I want to know if the data in the database
    > is stored in compressed form. If the byte array is compressed I have
    > to use InflaterInputStream. Otherwise I can use ObjectInputStream.
    >
    > Before using which stream to use with the byte array I want to know if
    > the byte array is compressed or not?
    >
    > Any idea how this can be done??


    I am not familiar with InflaterInputStream, and you did not say much
    about the kind of compression. In fact, *every* byte array can be viewed
    as 'compressed' if you specify an adequate compression algorithm. So,
    without knowing the algorithm, your problem cannot be solved.

    Having said this, look at some of your byte arrays in detail. Normally,
    because of what I said above, most compression algorithms add some kind
    of header to mark the byte array as 'compressed by this algorithm'.

    Hth,
    Ingo

    PS: Note that even if a byte array starts with "Compressed with GZIP
    Version 3.45", this might be a coincidental sequence of bytes! :)
    So, if you *really* want to be sure, it would be a good idea to add a
    header in any case (even if the byte array is not compressed).
     
    Ingo R. Homann, Aug 3, 2005
    #2

  3. On 3 Aug 2005 00:52:13 -0700, wrote:

    > hi All,


    Hello again.. Please refrain from multi-posting.
    <http://www.physci.org/codes/javafaq.jsp#xpost>

    Also, please be clear that you are posting to Usenet, not Google.
    <http://www.physci.org/codes/javafaq.jsp#usenet>

    --
    Andrew Thompson
    physci.org 1point1c.org javasaver.com lensescapes.com athompson.info
    Beats A Hard Kick In The Face
     
    Andrew Thompson, Aug 3, 2005
    #3
  4. wrote in comp.lang.java.programmer:
    > wrote:
    >> I have some data stored in the database as blob. I can read this blob
    >> into byte array using jdbc. Now I need to read the byte array using
    >> InputStream. The problem is I want to know if the data in the database
    >> is stored in compressed form. If the byte array is compressed I have
    >> to use InflaterInputStream. Otherwise I can use ObjectInputStream.
    >>
    >> Before using which stream to use with the byte array I want to know if
    >> the byte array is compressed or not?

    <snip>
    > PS: Note that even if a byte array starts with "Compressed with GZIP
    > Version 3.45", this might be a coincidental sequence of bytes! :)
    > So, if you *really* want to be sure, it would be a good idea to add a
    > header in any case (even if the byte array is not compressed).


    My first suggestion would be to add a table column which
    indicates the compression algorithm (if any) used in the
    data. If you can't change the database schema, then consider
    adding a header. But then be aware that the data will become
    useless for programs that don't understand your magic header.

    --
    Antti S. Brax Rullalautailu pitää lapset poissa ladulta
    http://www.iki.fi/asb/ http://www.cs.helsinki.fi/u/abrax/hlb/

    [1385 messages expunged from folder "Spam"]
     
    Antti S. Brax, Aug 3, 2005
    #4
  5. John Currier Guest

    wrote:
    > I have some data stored in the database as blob. I can read this blob
    > into byte array using jdbc. Now I need to read the byte array using
    > InputStream. The problem is I want to know if the data in the database
    > is stored in compressed form. If the byte array is compressed I have
    > to use InflaterInputStream. Otherwise I can use ObjectInputStream.
    >
    > Before using which stream to use with the byte array I want to know if
    > the byte array is compressed or not?


    A simplistic approach is to just try to decompress it. If that fails
    then it's not compressed (or is corrupted). This simplistic approach
    assumes that the majority of the stored data is compressed. The
    approach can be slightly optimized by checking the header yourself
    before attempting a decompression. I've used this technique in
    interceptors for decompressing CORBA traffic.

    The only real risk is of successfully decompressing something that
    wasn't compressed.
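
    The try-it-and-fall-back idea might look roughly like the sketch below.
    It assumes the compressed blobs are zlib streams (what
    DeflaterOutputStream writes by default); the class and method names
    are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

final class TryInflate {
    /**
     * Returns the inflated bytes if the input can be read as a zlib stream,
     * otherwise returns the input array unchanged.
     */
    static byte[] inflateOrPassThrough(byte[] data) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // ZipException (bad header or data) and EOFException (truncated
            // input) are both IOExceptions: treat the blob as not compressed.
            return data;
        }
    }
}
```

    Anything that fails the zlib header check or ends early comes back
    untouched. The risk described above remains: bytes that merely happen
    to parse as zlib would be inflated too.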

    Antti's approach, however, is cleaner.

    John
    http://schemaspy.sourceforge.net
     
    John Currier, Aug 3, 2005
    #5
  6. Harald Guest

    writes:

    > hi All,
    >
    > I have some data stored in the database as blob. I can read this blob
    > into byte array using jdbc. Now I need to read the byte array using
    > InputStream. The problem is I want to know if the data in the database
    > is stored in compressed form. If the byte array is compressed I have
    > to use InflaterInputStream. Otherwise I can use ObjectInputStream.
    >
    > Before using which stream to use with the byte array I want to know if
    > the byte array is compressed or not?


    The entropy of compressed data tends to be higher than that of
    uncompressed data. But this is just a statistical observation.-)
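
    For what it's worth, the byte-level Shannon entropy can be estimated
    in a few lines. This is only a sketch: the class name is made up, and
    any cut-off you pick (say, "above 7.5 bits per byte is probably
    compressed") would be a guess to tune on real data, since short blobs
    and encrypted blobs blur the picture.

```java
final class ByteEntropy {
    /**
     * Shannon entropy of the byte histogram, in bits per byte:
     * 0.0 for constant data, 8.0 when all 256 values are equally frequent.
     */
    static double bitsPerByte(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xFF]++;
        }
        double entropy = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // a value that never occurs contributes nothing
            }
            double p = (double) c / data.length;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }
}
```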

    Strange setup you have there where you don't know what kind of data
    format you are dealing with.

    Harald.
    --
    ---------------------+---------------------------------------------
    Harald Kirsch (@home)|
    Java Text Crunching: http://www.ebi.ac.uk/Rebholz-srv/whatizit/software
     
    Harald, Aug 3, 2005
    #6
  7. Roedy Green Guest

    On 3 Aug 2005 00:52:13 -0700, wrote or quoted :

    >
    >Any idea how this can be done??


    Putting a boolean on the front of the stream to tell you is the
    easiest way.

    I am beginning to realize the wisdom of putting a field on the front
    of any stream giving the version. Otherwise it becomes impossible to
    deal with old format files later on.
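
    A leading format byte could be handled along these lines. It is a
    sketch only: the tag values, class name and method names are
    hypothetical, and it only helps for blobs written after you start
    adding the tag.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

final class TaggedBlob {
    // Hypothetical tag values; the first byte doubles as a format version.
    static final byte FORMAT_RAW = 0;
    static final byte FORMAT_DEFLATE = 1;

    /** Prepends the one-byte format tag to the payload. */
    static byte[] tag(byte format, byte[] payload) {
        byte[] out = new byte[payload.length + 1];
        out[0] = format;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    /** Reads the tag and returns a stream over the payload that follows it. */
    static InputStream open(byte[] blob) {
        if (blob.length == 0) {
            throw new IllegalArgumentException("empty blob: missing format tag");
        }
        InputStream body = new ByteArrayInputStream(blob, 1, blob.length - 1);
        switch (blob[0]) {
            case FORMAT_RAW:
                return body;
            case FORMAT_DEFLATE:
                return new InflaterInputStream(body);
            default:
                throw new IllegalArgumentException("unknown format tag: " + blob[0]);
        }
    }
}
```

    The stream returned by open() can then be wrapped in an
    ObjectInputStream either way, so the caller no longer has to guess.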

    Other than that, look at the two files with a hex viewer to see if
    there is a recognisable signature.

    see http://mindprod.com/jgloss/hex.html

    --
    Bush crime family lost/embezzled $3 trillion from Pentagon.
    Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video.
    http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm

    Canadian Mind Products, Roedy Green.
    See http://mindprod.com/iraq.html photos of Bush's war crimes
     
    Roedy Green, Aug 4, 2005
    #7
  8. espenskogen

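    Easiest way is to inspect the first bytes of the byte array. A zlib stream most commonly starts with 0x78 0x9C (also 0x78 0x01, 0x78 0x5E or 0x78 0xDA, depending on the compression level), whilst Zip files start with the bytes 50 4B 03 04 (0x04034b50 read as a little-endian integer).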
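
    A signature check of this kind could be sketched as follows. The class
    and method names are made up; the byte values are the ones given in
    RFC 1950 (zlib), RFC 1952 (gzip), the ZIP local file header and
    java.io.ObjectStreamConstants.STREAM_MAGIC. A match only means the
    blob "looks like" that format, since uncompressed data can start with
    the same bytes by coincidence.

```java
final class MagicBytes {
    /** zlib: low nibble of byte 0 is 8 (deflate), window bits <= 7, and the
     *  first two bytes read as a big-endian number are a multiple of 31. */
    static boolean looksLikeZlib(byte[] b) {
        if (b.length < 2) {
            return false;
        }
        int cmf = b[0] & 0xFF;
        int flg = b[1] & 0xFF;
        return (cmf & 0x0F) == 8 && (cmf >> 4) <= 7 && ((cmf << 8) | flg) % 31 == 0;
    }

    /** gzip member header: 0x1F 0x8B. */
    static boolean looksLikeGzip(byte[] b) {
        return b.length >= 2 && (b[0] & 0xFF) == 0x1F && (b[1] & 0xFF) == 0x8B;
    }

    /** ZIP local file header: 'P' 'K' 0x03 0x04. */
    static boolean looksLikeZip(byte[] b) {
        return b.length >= 4 && b[0] == 0x50 && b[1] == 0x4B && b[2] == 0x03 && b[3] == 0x04;
    }

    /** Java serialization stream (what ObjectInputStream expects): 0xAC 0xED. */
    static boolean looksLikeJavaSerialized(byte[] b) {
        return b.length >= 2 && (b[0] & 0xFF) == 0xAC && (b[1] & 0xFF) == 0xED;
    }
}
```

    Since a serialized object stream starts with 0xAC 0xED, whose low
    nibble is not 8, it can never pass the zlib check, which covers the
    InflaterInputStream versus ObjectInputStream choice in the original
    question.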

    E:)
     
    espenskogen, Jan 31, 2011
    #8
