Combining GZIP with OutputStreamWriter

Roedy Green

I am finishing up a little tool to create a Google Sitemap: a list of
all your files in XML, with their last updates, how frequently you update
them, and how important they are.

Google prefers that the whole file be gzipped.

Is there a way to plug together the GZIP and OutputStreamWriter so that
you compress on the fly?

It seems I may need to create the file, then read it back as bytes and
create the gzip, or use two passes with a ByteArrayOutputStream.
 
JScoobyCed

Roedy said:
I am finishing up a little tool to create a Google Sitemap: a list of
all your files in XML, with their last updates, how frequently you update
them, and how important they are.

Google prefers that the whole file be gzipped.

Is there a way to plug together the GZIP and OutputStreamWriter so that
you compress on the fly?

It seems I may need to create the file, then read it back as bytes and
create the gzip, or use two passes with a ByteArrayOutputStream.

Yes, there is: GZIPOutputStream. But if you do so, I doubt you want to
use an OutputStreamWriter.
<code>

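import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Wraps any OutputStream so that bytes written to it are gzip-compressed.
public OutputStream toGzipOutputStream(OutputStream os) throws IOException {
    return new GZIPOutputStream(os);
}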

</code>
If you really want the OutputStreamWriter:
<code>
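OutputStreamWriter osw = null;
try {
    // Name the encoding explicitly; the one-argument constructor
    // falls back to the platform default charset.
    osw = new OutputStreamWriter(toGzipOutputStream(outStream), "UTF-8");
} catch (IOException ioe) {
    // TODO: handle the failure to open the stream
}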
</code>
 
Roedy Green

Is there a way to plug together the GZIP and OutputStreamWriter so that
you compress on the fly?

It's pretty obvious.

FileOutputStream fos = new FileOutputStream(
        new File( webRoot, "googlesitemap.gz" ) );
GZIPOutputStream gzos = new GZIPOutputStream( fos, 10 * 1024 );

OutputStreamWriter eosw = new OutputStreamWriter( gzos, "UTF-8" );

I think my problem was getting into a mindset of applying the gzip
last, by thinking of creating the FileOutputStream and
OutputStreamWriter as if they were an atomic pair.

When you start combining layers like this, I wonder what rules of
thumb there are for where you put the buffering.
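
For anyone who wants to check the layering end to end, here is a minimal sketch of the same chain (characters to UTF-8 bytes to gzip), plus the reverse chain to read the result back. The class and method names are made up for illustration, and it writes to whatever OutputStream it is handed rather than to googlesitemap.gz, so it can run against an in-memory buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of the original tool.
public class GzipWriterSketch {

    // Characters are encoded to UTF-8 first, then the bytes are compressed.
    static void writeGzippedText(OutputStream sink, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(sink, 10 * 1024), StandardCharsets.UTF_8)) {
            w.write(text);
        } // closing the writer finishes the gzip stream and closes sink
    }

    // The reverse chain: decompress the bytes, then decode them as UTF-8.
    static String readGzippedText(byte[] gz) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gz)),
                StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeGzippedText(sink, "<urlset></urlset>\n");
        System.out.print(readGzippedText(sink.toByteArray()));
    }
}
```

The important detail is closing the outermost Writer: that is what flushes the encoder and lets GZIPOutputStream write its trailer. A file that is never closed will usually be rejected as truncated by gzip readers.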
 
John C. Bollinger

Roedy said:
FileOutputStream fos = new FileOutputStream( new File( webRoot,
"googlesitemap.gz" ) );
GZIPOutputStream gzos = new GZIPOutputStream( fos, 10 * 1024 );

OutputStreamWriter eosw = new OutputStreamWriter( gzos, "UTF-8" );
[...]

When you start combining layers like this, I wonder what rules of
thumb there are for where you put the buffering.

The usual rule of thumb is to put buffering as close as possible to the
external device. In this case that would mean inserting a
BufferedOutputStream between the GZIPOutputStream and the FileOutputStream.
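
A minimal sketch of that arrangement, assuming a made-up class name, a helper that takes the target File as a parameter, and an arbitrary 64 KB buffer size:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class illustrating "buffer next to the device".
public class BufferNearFile {

    static void writeSitemap(File target, String xml) throws IOException {
        // Layers, outermost first: chars -> UTF-8 bytes -> gzip -> buffer -> file
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(
                        new BufferedOutputStream(
                                new FileOutputStream(target), 64 * 1024)),
                StandardCharsets.UTF_8)) {
            w.write(xml);
        } // close() flushes every layer down to the file
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("googlesitemap", ".gz");
        tmp.deleteOnExit();
        writeSitemap(tmp, "<urlset></urlset>\n");
        System.out.println("wrote " + tmp.length() + " bytes to " + tmp);
    }
}
```

The buffer size is a guess, not a measured optimum.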
 
John C. Bollinger

JScoobyCed said:
Roedy Green wrote:

[For creating a GZIPped XML file.]
yes there is: GZIPOutputStream. But if you do so, I doubt you want to
use an OutputStreamWriter.

Why do you doubt that? Roedy has character data that he wants to
deliver to a byte stream. OutputStreamWriter is the bridge between
character data and binary streams. Why /wouldn't/ he want to use one?
 
JScoobyCed

John said:
Why do you doubt that? Roedy has character data that he wants to
deliver to a byte stream. OutputStreamWriter is the bridge between
character data and binary streams. Why /wouldn't/ he want to use one?

Yes, I missed the point here :) I inverted the way the data are pulled
and considered the Writer would get bytes as input data (from the
Gzipped stream). But in fact data is gzipped after conversion.
Still under New Year's party influence maybe :)
 
Chris Uppal

John said:
The usual rule of thumb is to put buffering as close as possible to the
external device. In this case that would mean inserting a
BufferedOutputStream between the GZIPOutputStream and the
FileOutputStream.

In this case the rule of thumb might be misleading. GZIPOutputStream does a
fair bit of buffering of the compressed output (in the underlying zlib
implementation), so omitting the buffering around the file will do much less
damage than would normally be the case. On the other hand, the cost of writing
a single byte/character to a GZIPOutputStream may be higher than is usual for a
stream which is not connected to an external device. If each write() results
in crossing the JNI barrier into zlib (even if the supplied data is just copied
into zlib's internal buffers, as would typically be the case), then the writes
will take a performance hit.

I admit that I've never tried to measure the various overheads in this
situation, but there is at least a chance that putting buffering around
GZIPOutputStream would bring greater benefits than putting it around the raw
file. (In practise, I would put the buffering around the file only, but that's
only on a "well, at the worst it won't be /too/ far off optimal, and I can
always tune it later if I feel the need" basis.)
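
For comparison, a sketch of the other arrangement: buffering in front of the GZIPOutputStream as well, so that many small write() calls are collected before they reach the compressor. The class and method names are hypothetical and the default buffer sizes are used; whether this actually beats buffering only around the file would need measuring. The OutputStreamWriter documentation itself suggests wrapping the writer in a BufferedWriter to avoid frequent converter invocations.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: buffering on both sides of the compressor.
public class BufferBothSides {

    static Writer openGzipWriter(OutputStream rawSink) throws IOException {
        // Buffer next to the device (or whatever rawSink represents).
        OutputStream buffered = new BufferedOutputStream(rawSink);
        OutputStream gzip = new GZIPOutputStream(buffered);
        // Buffer on the character side so single-char writes are batched
        // before they are encoded and handed to the compressor.
        return new BufferedWriter(
                new OutputStreamWriter(gzip, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = openGzipWriter(sink)) {
            // Deliberately tiny writes, the case where batching matters.
            for (char c : "<urlset></urlset>".toCharArray()) {
                w.write(c);
            }
        }
        System.out.println("compressed size: " + sink.size() + " bytes");
    }
}
```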

-- chris
 
