Bytes coming through OK, but only up to a point

Discussion in 'Java' started by AndrewTK, Jul 23, 2006.

  1. AndrewTK

    AndrewTK Guest

    Hi,

    I'm looking for an explanation for my problem (bad data) or an idea for an
    alternative solution...

    I'm trying to get data from a form posted via HTTP/1.1, but the lack
    of information in the HTTP headers is off-putting...

    The data I want to extract is partly text, partly binary (a ZIP file).

    I tried to make a home-grown implementation of what I called a
    "BoundInputStream", which simulates the end of a stream if it finds a
    certain byte sequence (the boundary) in its internal buffer. I need this
    because, once I pass the data on to the extractor, I have no control
    over what it reads: it could swallow whole chunks of the data that
    follows, or misinterpret extra data as part of the ZIP file...

    It works OK to a certain extent, but when I pass this input stream to
    the ZIP extractor, the latter extracts the first entry (perfectly well
    mind you) and invariably chokes on the second.

    The same happens when I simply wrap a FileInputStream in a
    BoundInputStream: extraction starts playing up on the second entry.
    Passing the FileInputStream as is to the ZIP extractor, on the other
    hand, works just fine. In one of my ZIPs, the first file was 1.9 MB and
    came through fine, but the second one didn't extract. In another file,
    the first entry was a mere 50K, but the second entry never made it...

    I suspect the BoundInputStream is mangling the bytes that it is passing
    up to the ZipInputStream that wraps it in turn but I have no way of
    properly checking. The troubling part is that the first file always
    comes through fine, so the bytes can't be that mangled...

    It uses a byte array as a data buffer, which I suspect might at some
    point start kicking in with the "precision loss" effect...
    The main code follows. Any idea what is causing the bug?
    (The files are also in my web directory at
    http://www.dcs.st-and.ac.uk/~atk1/zip_server2/ and the form that sends
    the data is at /~atk1/index.php?page=upload ; "password" is not
    necessary, and the rest is fairly bogus info to enter):

    // ==============================================

    private byte[] buffer;
    private byte[] boundary = null;

    private int BSIZE = 1024;

    private int offset = 0;
    private int end = 0;
    private int bpos = 0;

    // ......................

    /** Read data into the specified buffer.

        See the general contract for read(byte[] buff, int off, int len) in
        java.io.InputStream
    */
    public int read(byte[] buff, int off, int len) throws IOException {
        if (closed) { throw new IOException("The stream is closed."); }

        if (boundary == null) {
            if (offset < end) {
                int x;
                // stops when max read or meet EOS
                for (x = 0; x < len && offset < end; x++) {
                    buff[x] = buffer[offset++];
                }

                return x;
            } else {
                refill(); //System.out.println("### read(byte[], int, int) buffer null ###");
                return read(buff, off, len);
            }
        }

        if (offset < bpos) {
            int x;
            // stops when max read or meet "EOS"
            for (x = 0; x < len && offset < bpos; x++) {
                //System.out.println(x + ":" + buff[x] + " := " + buffer[offset] + "\tO:" + offset + "\tB:" + bpos + "\tE:" + end);
                buff[x] = buffer[offset++];
            }

            return x;
        } else if (boundaryAbsent() || boundaryPartial()) {
            // offset == bpos, but we are not in presence of a definitive boundary
            refill(); //System.out.println("### read(byte[], int, int) ###");
            return read(buff, off, len);
        } else { // boundary present, and offset == bpos
            return -1;
        }
    }

    // ...............

    private int refill() throws IOException {
        // move remaining bytes to start of the buffer
        int datastart = (          // where the start of the significant data is
                offset == bpos     // there is no significant data
                && bpos != end     // a boundary start has been found
                && ((boundary == null) ? false : end - bpos >= boundary.length) // the full boundary has been found
            )
                ? offset + boundary.length // the boundary is entirely in the buffer, get rid of it
                : offset;                  // keep everything we have

        //System.out.println("REFILL from " + datastart + " - @start:\tOFST:" + offset + "\tBPOS:" + bpos + "\tEND:" + end); // ###

        // more efficient to wrap around the buffer array...?
        System.arraycopy(buffer, datastart, buffer, 0, end - datastart);

        end = end - datastart;
        offset = 0;

        int c = source.read(buffer, end, buffer.length - end);

        end += c == -1 ? 0 : c;

        locate();

        is_eos = c == -1;
        return c;
    }

    /**
        Find the index of the boundary in the buffer.
    */
    public int locate() {
        if (boundary == null) {
            bpos = end;
            return -1;
        }
        if (boundary.length == 0) {
            bpos = end;
            return -1;
        }

        int loc = indexOf(boundary, buffer, offset, end);

        if (loc == -1) {
            bpos = end;
        } else if (loc > -1) {
            bpos = loc;
        } else {
            bpos = -(loc + 1);
        }

        return loc;
    }

    /**
        Find the index of one byte sequence in another.

        If the needle is found in the haystack, its position in the haystack
        will be returned; otherwise this method will return an int smaller
        than zero.

        -1 indicates that the byte sequence was not found.

        Any other negative return value indicates the offset at which the
        start of the needle was found (a partial match running up to the
        end of the haystack).

        The index is the returned value, negated and decremented by 1.

        For example, if the value returned was -5, the start of the boundary
        was found in the haystack at 5-1=4, and continues until the haystack
        ends.
    */
    public static int indexOf(byte[] needle, byte[] haystack, int start,
            int finish) {
        int n = 0;
        int h = start;
        int pos = -1;
        while (h < finish) {
            if (needle[n] == haystack[h]) {
                // if first time we are finding the start of the needle, register this position
                pos = n == 0 ? h : pos;
                n++; // next position of needle
                if (n == needle.length) { // all pieces of needle found in order in sequence in haystack
                    return pos; // return last registered position
                }
                // else just continue
            } else { // did not coincide
                n = 0;    // reset the needle
                pos = -1; // false alarm. initialize
            }

            h++; // ever incrementing on the haystack
        }

        return pos != -1 ? -(pos + 1) : -1; // never found the full item
    }
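    A separate thing worth checking in indexOf above: after a mismatch it
    resets n to 0 and carries on at h + 1, without retrying the search from
    pos + 1. So a needle that starts inside a false partial match is missed.
    For example, searching for "ab" in "aab": the first 'a' matches, the
    second 'a' fails against 'b', and the search carries on at 'b', so the
    match at index 1 is never found. A boundary preceded by a byte equal to
    its own first byte (an extra '-' or CR, say) would slip through the same
    way. Below is a minimal sketch of a naive search that restarts one
    position further on after each failed attempt, keeping the posted
    return-value contract (index, -1, or -(p + 1) for a partial match that
    runs to the end). ByteSearch is a hypothetical class name. Note that under
    this contract a partial match starting at index 0 also encodes to -1, so
    it cannot be told apart from "not found".

```java
/** Hypothetical corrected search; same return contract as the posted indexOf. */
final class ByteSearch {
    static int indexOf(byte[] needle, byte[] haystack, int start, int finish) {
        outer:
        for (int h = start; h < finish; h++) {
            int n = 0;
            while (n < needle.length && h + n < finish) {
                if (haystack[h + n] != needle[n]) {
                    continue outer; // mismatch: retry from the next haystack position
                }
                n++;
            }
            // No mismatch: either the whole needle matched, or the haystack
            // ended part-way through it (a partial match at the tail).
            return n == needle.length ? h : -(h + 1);
        }
        return -1; // not even a partial match at the end
    }
}
```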
    AndrewTK, Jul 23, 2006
    #1
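    One likely cause of the "first entry fine, second entry broken" symptom:
    the contract of java.io.InputStream.read(byte[] b, int off, int len) says
    the bytes read are stored starting at b[off], but both copy loops in the
    posted read() write to buff[x], i.e. from index 0, ignoring off. Any
    caller that asks for data at a nonzero offset gets the bytes it already
    had overwritten. Below is a minimal sketch of a boundary-limited stream
    that honours off. BoundedStream and reopen() are hypothetical names, not
    the poster's class, and it reads byte-at-a-time through a
    PushbackInputStream for clarity rather than speed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/**
 * Hypothetical sketch: reports end-of-stream at each occurrence of the
 * boundary, consumes the boundary, and resumes after reopen().
 * Byte-at-a-time for clarity, so it is slow on large inputs.
 */
class BoundedStream extends InputStream {
    private final PushbackInputStream src;
    private final byte[] boundary;
    private boolean atBoundary = false;

    BoundedStream(InputStream source, byte[] boundary) {
        if (boundary.length == 0) {
            throw new IllegalArgumentException("empty boundary");
        }
        this.src = new PushbackInputStream(source, boundary.length);
        this.boundary = boundary.clone();
    }

    /** Skips the rest of the current segment (and its boundary), then lifts the simulated EOF. */
    void reopen() throws IOException {
        while (read() != -1) {
            // discard the remainder of this segment
        }
        atBoundary = false;
    }

    @Override
    public int read() throws IOException {
        if (atBoundary) {
            return -1;
        }
        int first = src.read();
        if (first == -1 || (byte) first != boundary[0]) {
            return first;
        }
        // Possible boundary: look ahead at the next boundary.length - 1 bytes.
        byte[] ahead = new byte[boundary.length - 1];
        int got = 0;
        while (got < ahead.length) {
            int c = src.read();
            if (c == -1) {
                break;
            }
            ahead[got++] = (byte) c;
        }
        boolean match = got == ahead.length;
        for (int k = 0; match && k < got; k++) {
            match = ahead[k] == boundary[k + 1];
        }
        if (match) {
            atBoundary = true; // whole boundary consumed: simulate end of stream
            return -1;
        }
        src.unread(ahead, 0, got); // not a boundary: give the look-ahead bytes back
        return first;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            b[off + n] = (byte) c; // store from b[off], as the InputStream contract requires
            n++;
        }
        return n == 0 ? -1 : n;
    }

    /** Leaves the shared source open; only the owner of the source should close it. */
    @Override
    public void close() {
    }
}
```

    In this sketch close() deliberately leaves the shared source open, and
    reopen() skips whatever remains of the current segment plus its boundary,
    which is roughly the reopen() behaviour described in the follow-up below.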

  2. Tony Morris

    Tony Morris Guest

    On Sun, 23 Jul 2006 00:14:55 -0700, AndrewTK wrote:

    > It works OK to a certain extent, but when I pass this input stream to
    > the ZIP extractor, the latter extracts the first entry (perfectly well
    > mind you) and invariably chokes on the second.


    First guess based on experience alone, streams aren't being closed
    properly.
    First guess further backed up by the lack of a finally clause in any of
    the code posted. Further still, backed up by the lack of a call to
    close(). And finally, the assignment of local declarations to 'null'.

    Asymptotically conclusive.

    --
    Tony Morris
    http://tmorris.net/
    Tony Morris, Jul 23, 2006
    #2

  3. AndrewTK

    AndrewTK Guest

    Thanks for the reply

    Tony Morris wrote:
    > On Sun, 23 Jul 2006 00:14:55 -0700, AndrewTK wrote:


    > First guess based on experience alone, streams aren't being closed
    > properly.


    Well I don't actually want to close any streams at all. In fact, there
    is only one real stream and if I close it I effectively lose any other
    data I was planning to get.

    The "real stream" might contain, say:

    abcdefghijk :+: [some binary] :+: lmnopqrst

    What the BoundInputStream does is set the boundary to ":+:" and let me
    read the first part, "abcdefghijk". When the BIS sees the boundary, it
    pretends it has found the end of the stream, and calls to read return -1.

    By calling a reopen() function, I make the "reading head" so to speak
    go to right after the boundary (ready to read the binary data). I can
    then pass the stream on to a ZipStreamExtractor that just reads the
    stream. It will also get a simulated EOS when the next boundary is
    reached. The extractor calls close() when it gets the data it wants, or
    on error, and then the next call to reopen() will allow us to read the
    second string "lmnopqrst"
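    For testing, it helps to have an in-memory reference for the segmentation
    described above. A minimal sketch, assuming the whole input fits in a
    byte array: split it on every occurrence of the boundary and drop the
    boundary bytes. BoundarySplit.split is a hypothetical helper; its output
    is what the successive reads between reopen() calls should produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reference splitter for checking a streaming implementation. */
final class BoundarySplit {
    /** Splits data on every occurrence of boundary; the boundary bytes themselves are dropped. */
    static List<byte[]> split(byte[] data, byte[] boundary) {
        if (boundary.length == 0) {
            throw new IllegalArgumentException("empty boundary");
        }
        List<byte[]> parts = new ArrayList<byte[]>();
        int segStart = 0;
        int i = 0;
        while (i <= data.length - boundary.length) {
            if (matchesAt(data, i, boundary)) {
                parts.add(Arrays.copyOfRange(data, segStart, i));
                i += boundary.length; // skip past the boundary
                segStart = i;
            } else {
                i++; // naive search: retry one byte further on
            }
        }
        parts.add(Arrays.copyOfRange(data, segStart, data.length)); // trailing segment
        return parts;
    }

    private static boolean matchesAt(byte[] data, int at, byte[] boundary) {
        for (int k = 0; k < boundary.length; k++) {
            if (data[at + k] != boundary[k]) {
                return false;
            }
        }
        return true;
    }
}
```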

    So no closing is required, as far as I can see. I certainly don't
    want to close the underlying "real" stream. What were you actually
    suggesting...?
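    Whether or not it is the bug here, the usual way to let a consumer call
    close() without closing a shared underlying stream is a wrapper whose
    close() does nothing. java.util.zip.ZipInputStream.close() does close
    the stream it wraps, so handing it the real stream directly would close
    it. A minimal sketch; NonClosingInputStream is a hypothetical name
    (Apache Commons IO ships a ready-made CloseShieldInputStream for the same
    purpose).

```java
import java.io.FilterInputStream;
import java.io.InputStream;

/** Hypothetical wrapper: passes reads through but ignores close(). */
class NonClosingInputStream extends FilterInputStream {
    NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Deliberately does not call in.close(): whoever owns the
        // underlying stream stays responsible for closing it.
    }
}
```

    Usage would look like new ZipInputStream(new NonClosingInputStream(segment)),
    where segment is whatever stream carries the ZIP part.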

    > First guess further backed up by the lack of a finally clause


    I don't normally use finally clauses, they are not a necessary piece,
    and I don't see what that would change. Please enlighten me.

    > Further still, backed up by the lack of a call to
    > close(). And finally, the assignment of local declarations to 'null'.


    For close() see above. I have to declare locals as null to give them
    scope outwith the place where they are first initialized and avoid the
    compiler moaning that "variable might not have been initialized" and
    not compiling. Unless there's another way around that?
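    On the finally question: a finally block runs whether the try block
    returns normally or throws, so it is where cleanup such as close()
    normally goes. It is also exactly the case where declaring a local as
    null outside the try is legitimate: finally needs the variable in scope,
    and the null check skips close() if the constructor itself threw. A
    minimal sketch; FinallyDemo.countBytes is a hypothetical method. (From
    Java 7 onward, try-with-resources does the same with less code.)

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class FinallyDemo {
    /** Hypothetical example: counts the bytes in a file, always closing the stream. */
    static long countBytes(String path) throws IOException {
        InputStream in = null; // declared outside try so the finally block can see it
        try {
            in = new FileInputStream(path); // may throw; then in stays null
            long n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } finally {
            if (in != null) {
                in.close(); // runs after a normal return and after an exception
            }
        }
    }
}
```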

    Thanks for the comments but I don't quite see the sparks of light yet...
    AndrewTK, Jul 23, 2006
    #3
  4. Chris Uppal

    Chris Uppal Guest

    AndrewTK wrote:

    > I tried to make a home grown implementation of what I called a
    > "BoundInputStream" which simulates the end of a stream if it finds a
    > certain byte sequence (the boundary) in its internal buffer, because
    > when I have to pass the data on to the extractor, I have no control
    > over what it reads - hence it could swallow whole chunks of data that
    > follows or misinterpret extra data as part of the ZIP file...


    I can't see anything wrong with the basic approach. So, unless I'm missing
    something, the problem must be either in your custom stream implementation, or
    in the way you are asking the ZIP library to /use/ that stream.

    I would try splitting the problem up. Use your stream to split the underlying
    input into several files (actually written out to the file system) by reading
    up to each fake EOS in turn. If that produces the output you expect, and a ZIP
    utility can read the files that should be in ZIP format, then you know the
    problem is in the way that the ZIP library uses your stream(s). If not, then
    it's a simple bug in your stream, which you will be able to find easily enough
    once you know for sure that it's there.
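    Chris's suggestion can be scripted along these lines: copy each segment,
    up to its real or simulated EOF, into a file, then list the file with
    java.util.zip.ZipFile, which reads the ZIP central directory rather than
    streaming, so it is a fairly independent check of the bytes. A minimal
    sketch; SegmentCheck, dump and entryNames are hypothetical names. Because
    dump() always reads into its buffer at offset 0, a stream that ignores
    the off argument would still pass this check, which would itself point
    at the way the ZIP library calls read().

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class SegmentCheck {
    /** Copies everything the stream yields, up to its real or simulated EOF, into file. */
    static long dump(InputStream segment, File file) throws IOException {
        OutputStream out = new FileOutputStream(file);
        try {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = segment.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        } finally {
            out.close();
        }
    }

    /** Lists entry names via ZipFile, which reads the central directory rather than streaming. */
    static List<String> entryNames(File zip) throws IOException {
        ZipFile zf = new ZipFile(zip);
        try {
            List<String> names = new ArrayList<String>();
            for (Enumeration<? extends ZipEntry> e = zf.entries(); e.hasMoreElements();) {
                names.add(e.nextElement().getName());
            }
            return names;
        } finally {
            zf.close();
        }
    }
}
```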


    > The same happens when simply wrapping a FileInputStream in a
    > BoundInputStream, but simply using the FileInputStream as is and
    > passing it to the ZIP extractor works just fine and only on the second
    > one does it start playing up.


    That's good news, because it means that you can create a simple,
    self-contained example which illustrates the problem. If you can't find the
    bug using the above technique, then by all means post an SCE. (Please note that
    the files you posted a link to do not -- not at all -- constitute an SCE;
    there's a hell of a lot of stuff at that link and it's by no means clear what's
    relevant and what's not. Firstprototype.zip looked promising at first, but it
    has a load of clearly irrelevant stuff like sockets and threads in it.)
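    A self-contained example of the kind Chris describes can be very small.
    As far as I can tell from the OpenJDK sources, ZipInputStream wraps its
    source in a PushbackInputStream and, after a deflated entry, pushes back
    the bytes it read past the end of that entry. When a later read asks for
    more than the pushed-back bytes, PushbackInputStream copies them to
    b[off...] and asks the wrapped stream for the rest at a nonzero offset.
    The sketch below reproduces just that step with a stand-in stream whose
    read() copies into b[x] like the posted loops. OffIgnoringStream and
    PushbackRepro are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Hypothetical stand-in mimicking the posted loops: writes to b[x], ignoring off. */
class OffIgnoringStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    OffIgnoringStream(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (pos >= data.length) {
            return -1;
        }
        int x;
        for (x = 0; x < len && pos < data.length; x++) {
            b[x] = data[pos++]; // deliberately wrong: should be b[off + x]
        }
        return x;
    }
}

final class PushbackRepro {
    /** Pushes back {1, 2}, then asks for 4 bytes: 2 come from the pushback
        buffer, and the wrapped stream is asked for 2 more at offset 2. */
    static byte[] readAfterUnread(InputStream source) throws IOException {
        PushbackInputStream p = new PushbackInputStream(source, 2);
        p.unread(new byte[] {1, 2});
        byte[] dst = new byte[4];
        p.read(dst, 0, dst.length);
        return dst;
    }
}
```

    With a source that respects off (a ByteArrayInputStream over {3, 4}) the
    result is {1, 2, 3, 4}; with the off-ignoring stand-in the pushed-back
    bytes 1 and 2 are overwritten, giving {3, 4, 0, 0}.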

    -- chris
    Chris Uppal, Jul 25, 2006
    #4

Similar Threads
  1. PyCon is Coming! PyCon is Coming!
     Steve Holden, Jan 5, 2006, in forum: Python. Replies: 0, Views: 294; Steve Holden, Jan 5, 2006
  2. Bytes coming through as -1
     AndrewTK, Aug 2, 2006, in forum: Java. Replies: 15, Views: 602; AndrewTK, Aug 3, 2006
  3. Replies: 5, Views: 509; Flash Gordon, Apr 9, 2006
  4. Replies: 8, Views: 474; Bob Hairgrove, Apr 10, 2006
  5. Saraswati lakki
     Replies: 0, Views: 1,273; Saraswati lakki, Jan 6, 2012
