Chunked GZIP processing using Java Sockets

Discussion in 'Java' started by aztechnology, Mar 23, 2006.

  1. aztechnology

    aztechnology Guest

    Hi,

    My Java client program reads web sites using low-level sockets. The
    HTML response from the web site is chunked and gzipped. I am aware of
    HttpClient and the JRE's HttpURLConnection API, which can handle this
    directly; however, I must use low-level sockets because the error
    control that I need to implement is not available via HttpClient or the
    JRE.

    Could anyone point me to how to read an HTTP response that is chunked
    and gzipped using Java sockets? Are there any classes that can coalesce
    the chunked stream and then decompress the gzipped contents?

    Thanks
     
    aztechnology, Mar 23, 2006
    #1

  2. aztechnology

    Chris Smith Guest

    You'll probably have to implement your own class to handle chunking as a
    subclass of FilterInputStream, if you can't use a higher-level API like
    HttpClient. Chunking isn't difficult, so this shouldn't take long.
    There is already a GZIPInputStream in java.util.zip. So you'd do this:

    InputStream base = ...;
    InputStream logical = new GZIPInputStream(
        new MyChunkedInputStream(base));

    ...
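
    A minimal sketch of what such a FilterInputStream subclass might look
    like. Only the class name MyChunkedInputStream comes from the snippet
    above; the fields and helper methods are illustrative assumptions. It
    ignores chunk extensions, discards trailers, and leaves skip(),
    mark/reset and available() as inherited, so they are not chunk-aware.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical minimal decoder for "Transfer-Encoding: chunked" bodies. */
class MyChunkedInputStream extends FilterInputStream {
    private int remaining = 0;     // bytes left in the current chunk
    private boolean first = true;  // no chunk-size line has been read yet
    private boolean done = false;  // the terminating zero-size chunk was seen

    MyChunkedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (done) return -1;
        if (remaining == 0 && !nextChunk()) return -1;
        // Never read past the end of the current chunk.
        int n = in.read(buf, off, Math.min(len, remaining));
        if (n == -1) throw new EOFException("stream ended inside a chunk");
        remaining -= n;
        return n;
    }

    /** Reads the next chunk-size line; returns false after the last chunk. */
    private boolean nextChunk() throws IOException {
        if (!first) readLine();        // CRLF that ends the previous chunk's data
        first = false;
        String line = readLine();
        int semi = line.indexOf(';');  // drop optional chunk extensions
        if (semi >= 0) line = line.substring(0, semi);
        remaining = Integer.parseInt(line.trim(), 16);  // size is hexadecimal
        if (remaining == 0) {
            // Skip optional trailer headers up to the final empty line.
            while (!readLine().isEmpty()) { }
            done = true;
            return false;
        }
        return true;
    }

    /** Reads one CRLF-terminated ASCII line from the underlying stream. */
    private String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != '\n') {
            if (c == -1) throw new EOFException("stream ended inside a chunk header");
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }
}
```

    The stream handed to the constructor is assumed to be positioned at the
    first chunk-size line, that is, after the HTTP headers have already
    been consumed.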

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Mar 23, 2006
    #2

  3. aztechnology

    aztechnology Guest

    Makes sense. Does anyone have chunked input stream logic that I can
    readily use? Any pointers would be welcome.

    I tried the ones from other libraries and for some reason they choke.
    Most likely the HTTP headers need to be stripped before the stream is
    passed to the chunked input stream class; I do not think these
    implementations expect the headers to be intact at the beginning of the
    response.
     
    aztechnology, Mar 23, 2006
    #3
  4. aztechnology

    Roedy Green Guest

    Roedy Green, Mar 23, 2006
    #4
  6. aztechnology

    aztechnology Guest

    HTTP/1.1 200 OK
    Cache-Control: private,max-age=0
    Date: Thu, 23 Mar 2006 21:26:29 GMT
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=utf-8
    Content-Encoding: gzip
    Expires: Wed, 01 Jan 1997 12:00:00 GMT
    Vary: Accept-Encoding
    X-Powered-By: ASP.NET
    X-AspNet-Version: 1.1.4322
    Server: Unauthorized-Use-Prohibited

    These are the headers I get in the response. If I feed this input
    stream straight into Gzip, of course it is not going to work, so I need
    to take the input stream, wrap it in Gzip, and then wrap that in
    Chunked, right?

    Is there any reference for a chunked stream reader? Also, do I need to
    strip the HTTP headers (as above) before passing the stream to the
    stream handlers? Again, I am using raw sockets.

    Thanks
     
    aztechnology, Mar 23, 2006
    #6
  7. aztechnology

    Chris Smith Guest

    The other way around. You need to un-chunk it first, then take that
    result and gunzip it. Yes, you definitely need to remove the HTTP
    headers.
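
    To illustrate why the order matters, here is a small in-memory sketch.
    The class and method names are hypothetical. It gzips some text, frames
    it in chunks the way a server would, and then decodes it in the order
    described above: strip the chunk framing first, then gunzip. It works
    on whole byte arrays for brevity and ignores chunk extensions and
    trailers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical demo of the decoding order: de-chunk first, then gunzip. */
class DecodeOrderDemo {

    /** Gzips the text, then frames the compressed bytes as chunked data. */
    static byte[] gzipThenChunk(String text, int chunkSize) throws IOException {
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(zipped);
        gz.write(text.getBytes("UTF-8"));
        gz.close();
        byte[] data = zipped.toByteArray();

        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int n = Math.min(chunkSize, data.length - off);
            wire.write((Integer.toHexString(n) + "\r\n").getBytes("US-ASCII"));
            wire.write(data, off, n);
            wire.write("\r\n".getBytes("US-ASCII"));
        }
        wire.write("0\r\n\r\n".getBytes("US-ASCII"));  // terminating chunk
        return wire.toByteArray();
    }

    /** Step 1: strips the chunk framing and returns the raw (gzipped) body. */
    static byte[] dechunk(byte[] wire) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int eol = pos;
            while (wire[eol] != '\r') eol++;  // end of the chunk-size line
            String hex = new String(wire, pos, eol - pos, "US-ASCII").trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) return body.toByteArray();
            body.write(wire, eol + 2, size);  // skip CRLF, copy the chunk data
            pos = eol + 2 + size + 2;         // skip the data and its CRLF
        }
    }

    /** Step 2: gunzips the reassembled body. */
    static String gunzip(byte[] body) throws IOException {
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(body));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        return out.toString("UTF-8");
    }
}
```

    Calling gunzip(dechunk(wire)) recovers the original text. Calling
    gunzip(wire) directly fails with an IOException, because the first
    bytes on the wire are a hexadecimal chunk size rather than the gzip
    header.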

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Mar 24, 2006
    #7
  8. Could you give a little more information as to why you need to use the
    socket? It seems to me like bad design to do it that way, though maybe
    I am wrong. Could you not split it into two separate tasks, where the
    socket operation handles any errors while the API handles all non-error
    situations? That way you would have automatic de-chunking and
    gunzipping, and error control at the same time.

    I am sorry if this sounds like a bad idea, but I don't have any info on
    the architecture, so it's difficult to say whether there is a better way
    of doing it. It just sounds to me like what you have now is not good
    architecture :(

    /tom
     
    tom fredriksen, Mar 24, 2006
    #8
  9. aztechnology

    aztechnology Guest

    Sockets give me control over handling network problems and server
    responses the way I like. I need total control, so for my needs I know
    I cannot use the JRE HttpURLConnection API.

    So I guess I need to do this:

    InputStream is = new GZIPInputStream(
        new MyChunkedInputStream(socket.getInputStream()));

    And then I need to find the chunked stream code.

    Also, how do I strip the headers before calling the above line? Just
    advance the stream until the input stream is past the headers? I could
    perhaps call readLine() 13 or 14 times to get the stream to point to
    the chunked/gzipped data.
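
    On the header question: the header block ends at the first empty line,
    so reading until that line is more robust than counting a fixed number
    of lines. A sketch, with a hypothetical helper name. It reads one byte
    at a time on purpose: a BufferedReader would typically buffer ahead and
    swallow part of the binary body along with the headers.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: consumes the status line and headers, nothing more. */
class HttpHeaderReader {

    /**
     * Reads lines up to and including the empty line that ends the header
     * block. Afterwards the stream is positioned at the first body byte.
     */
    static List<String> readHeaders(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        while (true) {
            int c = in.read();
            if (c == -1) throw new EOFException("connection closed inside the headers");
            if (c == '\r') continue;               // ignore the CR of CRLF
            if (c != '\n') { line.append((char) c); continue; }
            if (line.length() == 0) return lines;  // blank line: end of headers
            lines.add(line.toString());
            line.setLength(0);
        }
    }
}
```

    To avoid one system call per byte, the socket stream can be wrapped in
    a single BufferedInputStream, as long as that same object is then used
    for both the headers and the body.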

    Thanks
     
    aztechnology, Mar 25, 2006
    #9
  10. Chunking is part of the HTTP RFC and, as far as I remember, it's not
    all that difficult to implement; at least the basic principle isn't.
    The problem is that you need the headers to be able to do the
    de-chunking properly, so what you actually need to do is create a sort
    of filter applied at the socket level which is activated when you see a
    chunked transmission. Perhaps just looking at the HttpURLConnection
    code could help you; if it's possible, you could even copy it. Other
    than that, I think your best bet is to search the web for an HTTP
    protocol implementation in Java that you can borrow from. The irony is
    that you are, in effect, creating an HTTP socket on an HTTP socket.
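
    A rough sketch of the "activate the filter when you see a chunked
    transmission" idea. The class name is hypothetical. It takes the
    status line plus header lines as a list of strings, and reports which
    decoders the body needs. If a header is repeated, only the last value
    is kept, which is a simplification.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that inspects response headers to pick the decoders. */
class ResponseEncoding {
    final boolean chunked;
    final boolean gzip;

    ResponseEncoding(List<String> headerLines) {
        // Header names are case-insensitive, so use a case-insensitive map.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        // Index 0 is the status line ("HTTP/1.1 200 OK"), so start at 1.
        for (int i = 1; i < headerLines.size(); i++) {
            String line = headerLines.get(i);
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
        chunked = hasToken(headers.get("Transfer-Encoding"), "chunked");
        gzip = hasToken(headers.get("Content-Encoding"), "gzip");
    }

    /** True if the comma-separated header value contains the given token. */
    private static boolean hasToken(String value, String token) {
        if (value == null) return false;
        for (String part : value.split(",")) {
            if (part.trim().equalsIgnoreCase(token)) return true;
        }
        return false;
    }
}
```

    The caller would then wrap the body stream in a de-chunking stream only
    when chunked is true, and in a GZIPInputStream on top of that only when
    gzip is true.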

    /tom
     
    tom fredriksen, Mar 25, 2006
    #10
  11. aztechnology

    aztechnology Guest

    This is special-purpose code (known client and server), so in this
    case the HTTP response is always going to be gzipped and chunked, and I
    am OK with hard-coding that assumption. The HTTP headers are therefore
    not meaningful in my case; I know the response type beforehand.
     
    aztechnology, Mar 29, 2006
    #11
