Chunked GZIP processing using Java Sockets


aztechnology

Hi,

My Java client program reads web sites using low-level sockets.
The HTML responses from the web site are chunked and gzipped. I am aware
of the HttpClient and JRE HttpURLConnection APIs that can handle this
directly; however, I must use low-level sockets because the error
control I need to implement is not available via HttpClient or the JRE.

Could anyone kindly point me to how to read an HTTP response that is
chunked and gzipped using Java sockets? Are there any classes that
can coalesce the chunked stream and then inflate (decompress) the
gzipped contents?

Thanks
 

Chris Smith

> Could anyone kindly point me to how to read an HTTP response that is
> chunked and gzipped using Java sockets? Are there any classes that
> can coalesce the chunked stream and then inflate (decompress) the
> gzipped contents?

You'll probably have to implement your own class to handle chunking as a
subclass of FilterInputStream, if you can't use a higher-level API like
HttpClient. Chunking isn't difficult, so this shouldn't take long.
There is already a GZIPInputStream in java.util.zip. So you'd do this:

InputStream base = ...;
InputStream logical = new GZIPInputStream(
        new MyChunkedInputStream(base));

...
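A minimal sketch of such a FilterInputStream subclass, reusing the hypothetical name MyChunkedInputStream from the snippet above. It follows the chunked transfer coding described in RFC 2616 section 3.6.1 (a hex size line, that many data bytes, a CRLF, repeated until a zero-size chunk). It assumes the status line and headers have already been read off the stream, ignores chunk extensions after ';', and discards any trailer headers; it does only basic checks on malformed input.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of a decoder for "Transfer-Encoding: chunked" bodies.
 * Assumes the wrapped stream is positioned at the first chunk-size line,
 * i.e. the status line and headers have already been consumed.
 */
class MyChunkedInputStream extends FilterInputStream {
    private long remaining = 0;   // bytes left in the current chunk
    private boolean eof = false;  // set once the zero-size last chunk is read

    MyChunkedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (eof) return -1;
        if (remaining == 0) {
            remaining = readChunkSize();
            if (remaining == 0) {
                while (!readLine().isEmpty()) { }  // skip optional trailer headers
                eof = true;
                return -1;
            }
        }
        int n = in.read(b, off, (int) Math.min(len, remaining));
        if (n == -1) throw new EOFException("stream ended inside a chunk");
        remaining -= n;
        if (remaining == 0) readLine();            // consume the CRLF that ends the chunk data
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // FilterInputStream.skip would skip raw bytes, including chunk framing.
        byte[] tmp = new byte[512];
        long total = 0;
        while (total < n) {
            int r = read(tmp, 0, (int) Math.min(tmp.length, n - total));
            if (r == -1) break;
            total += r;
        }
        return total;
    }

    @Override
    public int available() throws IOException {
        // Only report bytes of the current chunk, never framing bytes.
        return (eof || remaining == 0) ? 0 : (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    private long readChunkSize() throws IOException {
        String line = readLine();
        int semi = line.indexOf(';');              // drop chunk extensions
        if (semi >= 0) line = line.substring(0, semi);
        return Long.parseLong(line.trim(), 16);
    }

    /** Reads one line byte by byte, so nothing past the LF is consumed. */
    private String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != '\n') {
            if (c == -1) throw new EOFException("stream ended inside chunk framing");
            if (c != '\r') buf.write(c);
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

If the headers are still on the stream when this class sees it, the first "size" line it parses is the status line ("HTTP/1.1 200 OK"), and Long.parseLong throws a NumberFormatException, so the header block has to be consumed first.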

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

aztechnology

Makes sense. Does anyone have chunked input stream logic that I can
readily use? Any pointers there?

I tried the ones from other libraries and for some reason they choke.
Most likely the HTTP header needs to be stripped before passing the
stream on to the chunked input stream class; I do not think these
implementations expect the headers to be intact (at the beginning of the
response), so I will need to strip the headers before passing the stream
on to the stream handler.
 

aztechnology

HTTP/1.1 200 OK
Cache-Control: private,max-age=0
Date: Thu, 23 Mar 2006 21:26:29 GMT
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Wed, 01 Jan 1997 12:00:00 GMT
Vary: Accept-Encoding
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Server: Unauthorized-Use-Prohibited

These are the headers I get in the output. If I try to jam this
input stream through just GZIP, of course it is not going to work, so I
need to take the input stream, wrap it in GZIP, and then wrap that in
Chunked, right?

Any references for a chunked stream reader? Also, do I need to strip the
HTTP headers (as above) before passing the stream to the stream handlers?

Thanks
 

Chris Smith

> HTTP/1.1 200 OK
> Cache-Control: private,max-age=0
> Date: Thu, 23 Mar 2006 21:26:29 GMT
> Transfer-Encoding: chunked
> Content-Type: text/html; charset=utf-8
> Content-Encoding: gzip
> Expires: Wed, 01 Jan 1997 12:00:00 GMT
> Vary: Accept-Encoding
> X-Powered-By: ASP.NET
> X-AspNet-Version: 1.1.4322
> Server: Unauthorized-Use-Prohibited
>
> These are the headers I get in the output. If I try to jam this
> input stream through just GZIP, of course it is not going to work, so I
> need to take the input stream, wrap it in GZIP, and then wrap that in
> Chunked, right?

The other way around. You need to un-chunk it first, then take that
result and gunzip it. Yes, you definitely need to remove the HTTP
headers.
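The ordering can be illustrated with a small, fully buffered sketch. The class and method names (ChunkedGzipDecoder, unchunk, decodeBody) are hypothetical, and the sketch assumes the body has already been separated from the headers and fits in memory. It strips the chunk framing first and only then hands the result to java.util.zip.GZIPInputStream; feeding the still-chunked bytes to GZIPInputStream fails immediately, because the data starts with a hex size line instead of the gzip magic bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class ChunkedGzipDecoder {
    /** Step 1: remove the "size CRLF data CRLF" framing, stopping at the 0-size chunk. */
    static byte[] unchunk(byte[] body) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = indexOfCrlf(body, pos);
            if (lineEnd < 0) throw new IOException("missing chunk-size line");
            String sizeField = new String(body, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeField.indexOf(';');            // ignore chunk extensions
            if (semi >= 0) sizeField = sizeField.substring(0, semi);
            int size = Integer.parseInt(sizeField.trim(), 16);
            pos = lineEnd + 2;                              // skip CRLF after the size
            if (size == 0) return out.toByteArray();        // trailers, if any, are ignored
            if (pos + size > body.length) throw new IOException("truncated chunk");
            out.write(body, pos, size);
            pos += size + 2;                                // skip the data and its CRLF
        }
    }

    private static int indexOfCrlf(byte[] b, int from) {
        for (int i = from; i + 1 < b.length; i++) {
            if (b[i] == '\r' && b[i + 1] == '\n') return i;
        }
        return -1;
    }

    /** Step 2: only the un-chunked bytes form a valid gzip stream. */
    static byte[] decodeBody(byte[] chunkedGzipBody) throws IOException {
        try (InputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(unchunk(chunkedGzipBody)))) {
            return gz.readAllBytes();
        }
    }
}
```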

 

tom fredriksen

> Hi,
>
> My Java client program reads web sites using low-level sockets.
> The HTML responses from the web site are chunked and gzipped. I am aware
> of the HttpClient and JRE HttpURLConnection APIs that can handle this
> directly; however, I must use low-level sockets because the error
> control I need to implement is not available via HttpClient or the JRE.
>
> Could anyone kindly point me to how to read an HTTP response that is
> chunked and gzipped using Java sockets? Are there any classes that
> can coalesce the chunked stream and then inflate (decompress) the
> gzipped contents?

Could you give a little more information as to why you need to use the
socket? It seems like bad design to me, but maybe I am wrong.
Could you not split it into two separate tasks, where the socket
operation handles any errors while the API handles all non-error
situations? That way you would have automatic de-chunking and
decompression, while at the same time having error control.

I am sorry if this sounds like a bad idea, but I don't have any info on
the architecture, so it's difficult to say if there is a better way of
doing it. It just sounds to me like what you have now is not good
architecture :(

/tom
 

aztechnology

Sockets give me control over handling network problems and server
responses the way I like. I need total control, so for my needs I know
I cannot use the JRE's HttpURLConnection API.

So I guess I need to do this:

InputStream is = new GZIPInputStream(
        new MyChunkedInputStream(socket.getInputStream()));

And then I need to find the de-chunking stream code.

Also, how do I strip the headers before calling the above line? Just
advance the stream until it is past the headers? I could call
readLine() perhaps 13 or 14 times to get the stream to point at the
chunked, gzipped data.

Thanks
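Rather than calling readLine() a fixed 13 or 14 times (the count changes whenever the server adds or drops a header), a sketch can read the status line and headers until the empty line that ends them. It deliberately reads the stream one byte at a time: wrapping the socket stream in a BufferedReader would read ahead and swallow the start of the chunked body. The names HttpHeaderReader, readHeaders and the ":status-line" key are hypothetical choices for illustration; repeated header names simply overwrite earlier ones here.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class HttpHeaderReader {
    /**
     * Consumes the status line and all header lines, leaving the stream at the
     * first byte of the body. Names are matched case-insensitively; the status
     * line is stored under ":status-line", which cannot clash with a real
     * header name because header names may not contain ':'.
     */
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.put(":status-line", readLine(in));    // e.g. "HTTP/1.1 200 OK"
        String line;
        while (!(line = readLine(in)).isEmpty()) {    // a blank line ends the headers
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    /** Reads up to and including LF, dropping CR, without reading any further. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != '\n') {
            if (c == -1) throw new EOFException("connection closed before end of headers");
            if (c != '\r') buf.write(c);
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

To avoid a system call per byte, one option is to wrap socket.getInputStream() once in a BufferedInputStream and pass that same buffered stream to both the header reader and the de-chunker; no bytes are lost, because the buffer's read-ahead stays inside the one stream both of them read from.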
 

tom fredriksen

> Sockets give me control over handling network problems and server
> responses the way I like. I need total control, so for my needs I know
> I cannot use the JRE's HttpURLConnection API.
>
> So I guess I need to do this:
>
> InputStream is = new GZIPInputStream(
>         new MyChunkedInputStream(socket.getInputStream()));
>
> And then I need to find the de-chunking stream code.
>
> Also, how do I strip the headers before calling the above line? Just
> advance the stream until it is past the headers? I could call
> readLine() perhaps 13 or 14 times to get the stream to point at the
> chunked, gzipped data.

Chunking is part of the HTTP RFC, and as far as I remember it's not
all that difficult to implement, at least the basic principle isn't.
The problem is that you need the headers to be able to do the
de-chunking properly, so what you actually need to do is create a sort
of filter applied at the socket level which is activated when you see a
chunked transmission. Perhaps just looking at the source code of
HttpURLConnection could help you; if it's possible, you could even copy
it. Other than that, I think your best bet is to search the web for an
HTTP protocol implementation in Java that you can borrow some code from.
The irony is that you are, in effect, creating an HTTP socket on top of
an HTTP socket.

/tom
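Tom's point that the headers decide how the body must be decoded can be sketched as a small factory. Given header values parsed from the lines before the blank line, it applies a de-chunker only when Transfer-Encoding says chunked and GZIPInputStream only when Content-Encoding says gzip, in that order. The de-chunker is passed in as a UnaryOperator<InputStream> because java.* offers no public chunked decoder; BodyDecoding, bodyStream and its parameters are hypothetical names, and Content-Length handling and other codings are deliberately left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;

class BodyDecoding {
    /**
     * Wraps the raw body stream according to the response headers: the
     * transfer coding (chunked) is undone first, then the content coding (gzip).
     * Assumes a case-insensitive header map; other codings are not handled.
     */
    static InputStream bodyStream(InputStream raw, Map<String, String> headers,
                                  UnaryOperator<InputStream> dechunker) throws IOException {
        InputStream body = raw;
        String te = headers.getOrDefault("Transfer-Encoding", "");
        if (te.toLowerCase(Locale.ROOT).contains("chunked")) {
            body = dechunker.apply(body);        // remove chunk framing first
        }
        String ce = headers.getOrDefault("Content-Encoding", "");
        if (ce.trim().equalsIgnoreCase("gzip")) {
            body = new GZIPInputStream(body);    // then decompress the entity
        }
        return body;
    }
}
```

For the fixed client and server described in the next post, the headers would always select both wrappers, which is why hard-coding the pipeline can work there; the header check only matters if the server's behaviour might change.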
 

aztechnology

This is special-purpose code (known client and server), so in this
case the HTTP response is always going to be gzipped and chunked, and I
am OK hard-coding the assumption of a gzipped, chunked response. So the
HTTP headers are not meaningful in my case; I know the response type
beforehand.
 
