Can Inflater be used to uncompress GZIP data?

chattycow

I am trying to uncompress GZIP data that is being given to me slowly, one
byte array at a time.
It appears that Inflater may be able to do something like this, but can
it inflate GZIP data?

Thanks again, everybody.

---Dean.
 
Roland de Ruiter

chattycow said:
I am trying to uncompress GZIP data that is being given to me slowly, one
byte array at a time.
It appears that Inflater may be able to do something like this, but can
it inflate GZIP data?

Thanks again, everybody.

---Dean.

How do you get your data exactly?

You can use GZIPInputStream to uncompress gzipped data. The following
example uses it to uncompress a gzipped file.

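// UNTESTED
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

public class Gunzip {
    public static void main(String[] args) throws IOException {
        // Backslashes in Java string literals must be escaped
        String inFilename = "c:\\path\\to\\gzippedfile.gz";
        String outFilename = "c:\\path\\to\\unzippedfile.xyz";

        // Open the gzip file and the output file; try-with-resources
        // closes both streams even if an exception is thrown
        try (GZIPInputStream gzipInputStream =
                 new GZIPInputStream(new FileInputStream(inFilename));
             OutputStream out = new FileOutputStream(outFilename)) {

            // Transfer bytes from the compressed file to the output file
            byte[] buf = new byte[1024];
            int len;
            while ((len = gzipInputStream.read(buf)) >= 0) {
                out.write(buf, 0, len);
            }
        }
    }
}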
 
Chris Uppal

chattycow said:
I am trying to uncompress GZIP data that is being given to me slowly, one
byte array at a time.
It appears that Inflater may be able to do something like this, but can
it inflate GZIP data?

It might be able to. I think you'd be better off using a custom InputStream as
described in my earlier post (different thread, same topic). But if you want
to go this route then you should familiarise yourself with the zlib library
(www.zlib.org) for which the Java util.zip stuff is just a thin and incomplete
wrapper.

That library implements zlib compression (as you might guess ;-) and gzip is
one of three closely related compression formats ("gzip", "deflate", and "raw")
which use zlib compression. The only difference is that they use different
envelopes for the data (header, trailer, checksum, etc.). Somewhere in the
zlib documentation it claims that it will automatically recognize "gzip" vs.
"deflate" data and decompress either, but I haven't found that to be true -- I
have to request the correct decompression explicitly. And the Java wrapper
omits the necessary API for configuring this. So, it might work if either (a)
the library /does/ auto-configure itself despite my experience, or (b) Java
happens to set the library up with the configuration you need.

-- chris
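
A note on the java.util.zip side of this: Inflater's default constructor expects the zlib envelope, and new Inflater(true) expects raw deflate data with no envelope; neither one parses a gzip header or trailer for you. The sketch below only illustrates the incremental part of the question, handing an Inflater small byte ranges as they arrive. It assumes raw deflate input (produced by a Deflater purely for the demo), and the class and method names are made up for illustration. For real gzip data you would still have to deal with the gzip header and trailer yourself, or use GZIPInputStream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ChunkedInflateSketch {

    // Demo helper: compress to a raw deflate stream (no zlib or gzip envelope).
    static byte[] rawDeflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Hand the compressed bytes to an Inflater chunkSize bytes at a time,
    // draining whatever output is available after each chunk.
    static byte[] inflateInChunks(byte[] compressed, int chunkSize)
            throws DataFormatException {
        Inflater inflater = new Inflater(true); // true = raw deflate, no envelope
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            for (int off = 0;
                 off < compressed.length && !inflater.finished();
                 off += chunkSize) {
                int len = Math.min(chunkSize, compressed.length - off);
                inflater.setInput(compressed, off, len);
                int n;
                do {
                    // Returns 0 once this chunk is used up and more input is needed.
                    n = inflater.inflate(buf);
                    out.write(buf, 0, n);
                } while (n > 0 && !inflater.finished());
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "hello, hello, hello, deflate".getBytes(StandardCharsets.UTF_8);
        byte[] restored = inflateInChunks(rawDeflate(original), 3);
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

In a real receiver the for loop would be replaced by whatever delivers each network block, and the drained output would go to a file rather than a ByteArrayOutputStream.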
 
chattycow

Chris,
Thanks for the information.
Unfortunately, all the posts and information so far assume that I'm
getting the data from a file, and it's actually coming from a network
connection.

Here is my situation:
I'm getting the data from a network socket in packets of about 32 KB.
The files were already saved in a database on the other side of the
network connection, so I don't have much ability to change this. It was
done for network compression and then easy direct extraction in a
different instance, without the end user having to use uncommon tools to
extract a file.

I get these packets at a regular pace, but the files can be quite large
in total (1 GB+) and there isn't enough memory to just create a big
byte array.
I basically need a way to feed the byte blocks into a GZIPInputStream
one or more bytes at a time until I have enough to uncompress, then save
that portion to a file, then go on to the next 32 KB chunk and do it
over again until the entire file has been transferred via these 32 KB
chunks of bytes.
Like you were saying, I would really rather stay away from having to
unravel the inner workings of zlib and gzip to accomplish this. If I
could get it to do what I want, GZIPInputStream seems like the best
bet. I just can't figure out how to make it take chunks of bytes at a
time.

Thanks again for your help,
----Dean.
 
chattycow

Chris,

Do you remember the thread name so I can search for it? I haven't
had much luck finding anything that does what I need. I found one that
was close, but the people in the list weren't too happy and never did
reveal the answer.

Thanks,
---Dean.
 
Chris Uppal

chattycow said:
Do you remember the thread name so I can search for it?

Very quick reply: I only meant my own post in your earlier "GZIPInputStream
(how to uncompress from a succession of byte arrays)" thread.

If that isn't clear enough, or if I'm missing the point, then feel free to ask
more...

-- chris
 
chattycow

All of your ideas finally led me to the idea of extending the
InputStream and OutputStream classes to handle my own internal buffer.
This way I can write to the buffer (using OutputStream) while reading
from it (using InputStream) in another thread. Once I create the
InputStream, I feed that into the GZIPInputStream class and start
uncompressing. The GZIPInputStream never sees an end to the stream, so
it doesn't have any problems.

Thanks for all your help,
----Dean.
 
Dale King

chattycow said:
All of your ideas finally led me to the idea of extending the
InputStream and OutputStream classes to handle my own internal buffer.
This way I can write to the buffer (using OutputStream) while reading
from it (using InputStream) in another thread. Once I create the
InputStream, I feed that into the GZIPInputStream class and start
uncompressing. The GZIPInputStream never sees an end to the stream, so
it doesn't have any problems.

Note that the Java API already provides such an abstraction, the
PipedInputStream and PipedOutputStream combination.
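
As a rough illustration of the piped approach, here is a minimal sketch in which a producer thread writes each incoming block to a PipedOutputStream while the calling thread reads decompressed bytes from a GZIPInputStream wrapped around the matching PipedInputStream. The class name, method names, and the use of a List of byte arrays to stand in for the network blocks are assumptions made for the demo. The two-argument PipedInputStream constructor used here (available since Java 6) sets the pipe's buffer size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PipedGunzipSketch {

    // Decompress gzip data that arrives as a succession of byte arrays.
    static byte[] gunzipChunks(List<byte[]> chunks)
            throws IOException, InterruptedException {
        PipedOutputStream sink = new PipedOutputStream();
        PipedInputStream source = new PipedInputStream(sink, 32 * 1024);

        // Producer: push each chunk into the pipe, then close it so the
        // reader sees end-of-stream instead of a broken pipe.
        Thread producer = new Thread(() -> {
            try (OutputStream out = sink) {
                for (byte[] chunk : chunks) {
                    out.write(chunk);
                }
            } catch (IOException e) {
                // The reading side went away; nothing more to deliver.
            }
        });
        producer.start();

        // Consumer: pull decompressed bytes out of the pipe.
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(source)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) >= 0) {
                result.write(buf, 0, n);
            }
        }
        producer.join();
        return result.toByteArray();
    }

    // Demo helper: gzip a byte array and cut the result into small pieces.
    static List<byte[]> gzipInPieces(byte[] data, int pieceSize) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        byte[] all = bos.toByteArray();
        List<byte[]> pieces = new ArrayList<>();
        for (int off = 0; off < all.length; off += pieceSize) {
            pieces.add(Arrays.copyOfRange(all, off, Math.min(all.length, off + pieceSize)));
        }
        return pieces;
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "piped gzip demo, piped gzip demo".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzipChunks(gzipInPieces(original, 5));
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```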

But I think you are still making this too complicated. How are you
getting the data you want to pump into your buffer? Presumably you are
reading it on some other input stream somewhere. So why not eliminate
the middleman? Perhaps you should consider something like
SequenceInputStream, which concatenates InputStreams.

You still have not adequately explained how the chunks of data are
getting to the machine to be decompressed. Why isn't it a single stream
of data?
 
chattycow

The data is coming from a socket/network InputStream, delivered to me
~32 KB at a time.

No doubt PipedInputStream/PipedOutputStream are an option. However, there
are a few limitations with piped streams, namely the static 1K
buffer, the need to connect them, two classes as opposed to one, and any
other overhead associated with these. I'm sure there are more, though.


What I wrote is a very simple extension of InputStream and OutputStream
(~300 lines with comments) connected by a single byte-buffer array.
I write blocks (32 KB at a time) to the OutputStream that I extended.
In another thread, I connect the GZIPInputStream to the InputStream that I
extended and start to uncompress.

SequenceInputStream is fine if you already have all your streams
identified, but I don't have all the data yet and I can't put it all in
memory. SequenceInputStream will combine all the streams that you tell
it, but it can't combine streams on the fly as you add them so it
doesn't help.

There are two features I'm trying to get:
1) The socket connection is also being used for general communication
before and after the gzip file is transferred, to transmit other items,
informational messages, etc. If I give the socket stream over to
GZIPInputStream, when it completes it will close the entire connection.
In addition, GZIPInputStream doesn't know it's done until it reads a
"-1" from the InputStream; it assumes it's reading from a file
connected to an input stream.
2) I want to be able to give a status of the file download while it's
downloading, via another monitoring thread. As far as I know, you can't
do that with a stream that you're already using for something else, so
every 32 KB I check to make sure the data is OK and count how much I've
sent/received, how fast, etc.

I'm sure everything I just said brought up a lot more questions. I
posted back because I wanted others to see what my solution was, so that
they might have an answer when they come across the same problem.

In any case, the problem is solved.

Thanks again,
----Dean.
 
Eric Sosman

chattycow wrote on 05/19/06 16:15:
[...]
SequenceInputStream is fine if you already have all your streams
identified, but I don't have all the data yet and I can't put it all in
memory. SequenceInputStream will combine all the streams that you tell
it, but it can't combine streams on the fly as you add them so it
doesn't help.

I confess I haven't actually used it myself, but from
the documentation it certainly appears you can "dream up"
additional inputs as you go along. Use the

SequenceInputStream(Enumeration)

constructor, with an implementation of Enumeration that
decides (at run time) whether to produce another InputStream
or to call it quits.
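
A minimal sketch of that idea, assuming the incoming blocks are handed over on a BlockingQueue and that a designated empty array marks the end of the data (both are assumptions made for illustration, as are the class and method names). hasMoreElements() blocks until the next block arrives, so SequenceInputStream only learns about each stream when it asks for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ChunkEnumeration implements Enumeration<InputStream> {
    // Hypothetical end marker; compared by identity, not by content.
    static final byte[] END = new byte[0];

    private final BlockingQueue<byte[]> queue;
    private byte[] next;
    private boolean done;

    ChunkEnumeration(BlockingQueue<byte[]> queue) {
        this.queue = queue;
    }

    @Override
    public boolean hasMoreElements() {
        if (done) {
            return false;
        }
        if (next == null) {
            try {
                next = queue.take(); // wait for the next block to arrive
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                done = true;
                return false;
            }
            if (next == END) {
                next = null;
                done = true;
                return false;
            }
        }
        return true;
    }

    @Override
    public InputStream nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        InputStream in = new ByteArrayInputStream(next);
        next = null;
        return in;
    }

    // Decompress everything that is delivered on the queue up to END.
    static byte[] gunzip(BlockingQueue<byte[]> queue) throws IOException {
        InputStream joined = new SequenceInputStream(new ChunkEnumeration(queue));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(joined)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "sequence demo, sequence demo".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(original);
        }
        byte[] all = bos.toByteArray();
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
        for (int off = 0; off < all.length; off += 4) {
            queue.add(Arrays.copyOfRange(all, off, Math.min(all.length, off + 4)));
        }
        queue.add(END);
        System.out.println("round trip ok: " + Arrays.equals(original, gunzip(queue)));
    }
}
```

In the demo the queue is filled before reading starts; in a real program another thread would add blocks as they come off the network.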
 
Dale King

chattycow said:
The data is coming from a socket/network InputStream, delivered to me
~32 KB at a time.

No doubt PipedInputStream/PipedOutputStream are an option. However, there
are a few limitations with piped streams, namely the static 1K
buffer, the need to connect them, two classes as opposed to one, and any
other overhead associated with these. I'm sure there are more, though.

I don't really see any limitations there. The 1K is not really a
limitation. The thread pushing data will have to block at some point.
Whether it blocks at 1K or 32K is not going to have any real impact that
I can see.

The fact that it is two classes makes sense because you have two
abstractions used from two different threads. Each end of the pipe gets
its own object.

chattycow said:
What I wrote is a very simple extension of InputStream and OutputStream
(~300 lines with comments) connected by a single byte-buffer array.

Which is not as good a design as putting it into two separate classes.

chattycow said:
I write blocks (32 KB at a time) to the OutputStream that I extended.
In another thread, I connect the GZIPInputStream to the InputStream that I
extended and start to uncompress.

But I'm really not trying to push you to the piped stream or your
implementation. I think both are the wrong idea. What you have is a
pull-push-pull design. You have one thread that is pulling data from the
server and then pushing it to a buffer, and you have another thread that is
reading from the GZIPInputStream, which is pulling from the buffer. I see
no reason to do all that. You should be able to do that with just a pull
system where the GZIPInputStream pulls from the server, eliminating the
middleman. But you have not been forthcoming on how you get the data
from the server, so I can't advise you more completely.

chattycow said:
SequenceInputStream is fine if you already have all your streams
identified, but I don't have all the data yet and I can't put it all in
memory. SequenceInputStream will combine all the streams that you tell
it, but it can't combine streams on the fly as you add them so it
doesn't help.

As someone else pointed out, you can set up your own enumeration, so that
is not true.

chattycow said:
There are two features I'm trying to get:
1) The socket connection is also being used for general communication
before and after the gzip file is transferred, to transmit other items,
informational messages, etc. If I give the socket stream over to
GZIPInputStream, when it completes it will close the entire connection.
In addition, GZIPInputStream doesn't know it's done until it reads a
"-1" from the InputStream; it assumes it's reading from a file
connected to an input stream.
2) I want to be able to give a status of the file download while it's
downloading, via another monitoring thread. As far as I know, you can't
do that with a stream that you're already using for something else, so
every 32 KB I check to make sure the data is OK and count how much I've
sent/received, how fast, etc.

I'm sure everything I just said brought up a lot more questions. I
posted back because I wanted others to see what my solution was, so that
they might have an answer when they come across the same problem.

I still see nothing that justifies creating the other thread and using
the pull-push-pull design. The simplest thing to do here is to create
your own implementation of InputStream that wraps your server
connection. That implementation would contain the logic that is
currently in your thread that is pulling the data from the server; it's
just that it won't be done in a loop, but instead will respond to calls
for more data.
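
For illustration only, here is one way such a wrapper might look. It is a sketch under a big assumption that the thread has not confirmed: that the protocol announces the compressed length before the gzip data starts. Given that, the wrapper reports end-of-stream after exactly that many bytes, ignores close() so the shared connection stays open, and keeps a running count that a monitoring thread could poll. The class and method names are hypothetical, and a ByteArrayInputStream stands in for the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BoundedCountingInputStream extends InputStream {
    private final InputStream in;
    private long remaining;
    private final AtomicLong bytesRead = new AtomicLong();

    // length = number of compressed bytes the sender announced (assumed protocol)
    BoundedCountingInputStream(InputStream in, long length) {
        this.in = in;
        this.remaining = length;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = in.read();
        if (b >= 0) {
            remaining--;
            bytesRead.incrementAndGet();
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining <= 0) return -1;
        // Never read past the announced length.
        int n = in.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
            bytesRead.addAndGet(n);
        }
        return n;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    @Override
    public void close() {
        // Deliberately leave the underlying connection open.
    }

    // Safe to call from a monitoring thread.
    long bytesRead() {
        return bytesRead.get();
    }

    static byte[] gunzip(InputStream bounded) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(bounded)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(gzBytes)) {
            gz.write("file contents".getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(gzBytes.toByteArray());
        wire.write("NEXT MESSAGE".getBytes(StandardCharsets.UTF_8));
        InputStream connection = new ByteArrayInputStream(wire.toByteArray());

        BoundedCountingInputStream bounded =
            new BoundedCountingInputStream(connection, gzBytes.size());
        System.out.println(new String(gunzip(bounded), StandardCharsets.UTF_8));
        System.out.println("compressed bytes consumed: " + bounded.bytesRead());
        System.out.println("bytes still on the connection: " + connection.available());
    }
}
```

If the protocol frames the data differently (for example, length-prefixed blocks), the same shape applies; only the logic that decides when to return -1 changes.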

If you were to post the code that is reading from the server and pumping
it to your new class, then perhaps I can show you how it would look.

chattycow said:
In any case, the problem is solved.

Solved quite incorrectly, in my opinion.
 
