Bytes coming through OK, but only up to a point

AndrewTK

Hi,

I'm looking for an explanation for my problem (bad data) or an alternative
solution idea...

I'm trying to get data from a form posted via HTTP/1.1, but the lack
of information in the HTTP headers is off-putting...

The data I want to extract is part text, part binary (a ZIP file).

I tried to make a home grown implementation of what I called a
"BoundInputStream" which simulates the end of a stream if it finds a
certain byte sequence (the boundary) in its internal buffer, because
when I have to pass the data on to the extractor, I have no control
over what it reads - hence it could swallow whole chunks of data that
follows or misinterpret extra data as part of the ZIP file...

It works OK to a certain extent, but when I pass this input stream to
the ZIP extractor, the latter extracts the first entry (perfectly well
mind you) and invariably chokes on the second.

The same happens when simply wrapping a FileInputStream in a
BoundInputStream, but simply using the FileInputStream as is and
passing it to the ZIP extractor works just fine and only on the second
one does it start playing up. In one of my ZIPs, the first file was 1.9
MB which came through fine and the second one didn't extract. In
another file, the first entry was a mere 50K but the second entry never
made it...



I suspect the BoundInputStream is mangling the bytes that it is passing
up to the ZipInputStream that wraps it in turn but I have no way of
properly checking. The troubling part is that the first file always
comes through fine, so the bytes can't be that mangled...
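
One way to check this, sketched under assumptions: read the same data through two streams (for example the plain FileInputStream and the wrapped one) and report the first position at which they differ. StreamCompare and firstMismatch are hypothetical names, not part of the posted code. The reads deliberately use a non-zero offset, since a consumer is free to call read(byte[], int, int) that way.

```java
import java.io.IOException;
import java.io.InputStream;

final class StreamCompare {
    /** Reads up to len bytes into buf starting at off, looping over short reads. */
    static int readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n == -1) break; // end of stream
            total += n;
        }
        return total;
    }

    /** Returns the index of the first differing byte, or -1 if both streams match. */
    static long firstMismatch(InputStream expected, InputStream actual) throws IOException {
        final int off = 7;    // deliberately non-zero
        final int chunk = 4096;
        byte[] a = new byte[off + chunk];
        byte[] b = new byte[off + chunk];
        long pos = 0;
        while (true) {
            int na = readFully(expected, a, off, chunk);
            int nb = readFully(actual, b, off, chunk);
            int common = Math.min(na, nb);
            for (int i = 0; i < common; i++) {
                if (a[off + i] != b[off + i]) return pos + i;
            }
            if (na != nb) return pos + common; // one stream ended early
            if (na == 0) return -1;            // both ended at the same point
            pos += na;
        }
    }
}
```

A result of -1 would suggest the wrapper passes bytes through unchanged for this read pattern; any other value is the position to inspect.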

It uses a byte array as a data buffer, which I suspect might at some
point start kicking in with the "precision loss" effect...
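
For what it's worth, a byte[] cannot lose precision: each element stores exactly eight bits. The nearby pitfall is sign extension, because Java bytes are signed and a single-byte read() has to return 0..255, with -1 reserved for end of stream. The single-byte read() is not part of the posted listing, so whether this applies here is only a guess. SignedByteDemo is a hypothetical name used for illustration.

```java
final class SignedByteDemo {
    /** Value a single-byte read() should return for a buffered byte: always 0..255. */
    static int asUnsigned(byte b) {
        return b & 0xFF;
    }

    /** What an unmasked return would yield: bytes 0x80..0xFF come out negative. */
    static int signExtended(byte b) {
        return b;
    }
}
```

With the unmasked version, a data byte of 0xFF is indistinguishable from the end-of-stream marker.
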
The main code follows. Any idea what is causing the bug?
(The files are also in my web directory at
http://www.dcs.st-and.ac.uk/~atk1/zip_server2/ and the form that sends
the data is at /~atk1/index.php?page=upload ; "password" is not
necessary, and the rest is fairly bogus info to enter.)

// ==============================================

private byte[] buffer;
private byte[] boundary = null;

private int BSIZE = 1024;

private int offset = 0;
private int end = 0;
private int bpos = 0;

// ......................

/**
 * Read data into the specified buffer.
 *
 * See the general contract for read(byte[] buff, int off, int len) in
 * java.io.InputStream
 */
public int read(byte[] buff, int off, int len) throws IOException {
    if (closed) { throw new IOException("The stream is closed."); }

    if (boundary == null) {
        if (offset < end) {
            int x;
            // stops when max read or meet EOS
            for (x = 0; x < len && offset < end; x++) {
                buff[x] = buffer[offset++];
            }

            return x;
        } else {
            refill(); //System.out.println("### read(byte[], int, int) buffer null ###");
            return read(buff, off, len);
        }
    }

    if (offset < bpos) {
        int x;
        // stops when max read or meet "EOS"
        for (x = 0; x < len && offset < bpos; x++) {
            //System.out.println( x+":"+buff[x] +" := "+buffer[offset]+"\tO:"+offset+"\tB:"+bpos+"\tE:"+end );
            buff[x] = buffer[offset++];
        }

        return x;
    } else if (boundaryAbsent() || boundaryPartial()) {
        // offset == bpos, but we are not in presence of a definitive boundary
        refill(); //System.out.println("### read(byte[], int, int) ###");
        return read(buff, off, len);
    } else { // boundary present, and offset == bpos
        return -1;
    }
}

// ...............

private int refill() throws IOException {
    // move remaining bytes to start of the buffer
    int datastart = ( // where the start of the significant data is
            offset == bpos // there is no significant data
            && bpos != end // a boundary start has been found
            && ((boundary == null) ? false : end - bpos >= boundary.length) // the full boundary has been found
        ) ?
        offset + boundary.length : // the boundary is entirely in the buffer, get rid of it
        offset; // keep everything we have

    //System.out.println("REFILL from "+datastart+" - @start:\tOFST:"+offset+"\tBPOS:"+bpos+"\tEND:"+end);// ###

    System.arraycopy(buffer, datastart, buffer, 0, end - datastart); // more efficient to wrap around the buffer array...?

    end = end - datastart;
    offset = 0;

    int c = source.read(buffer, end, buffer.length - end);

    end += c == -1 ? 0 : c;

    locate();

    is_eos = c == -1;
    return c;
}

/**
 * Find the index of the boundary in the buffer.
 */
public int locate() {
    if (boundary == null) {
        bpos = end;
        return -1;
    }
    if (boundary.length == 0) {
        bpos = end;
        return -1;
    }

    int loc = indexOf(boundary, buffer, offset, end);

    if (loc == -1) {
        bpos = end;
    } else if (loc > -1) {
        bpos = loc;
    } else {
        bpos = -(loc + 1);
    }

    return loc;
}

/**
 * Find the index of one byte sequence in another.
 *
 * If the needle is found in the haystack, its position in the haystack
 * will be returned, otherwise this method will return an int smaller
 * than zero.
 *
 * -1 indicates that the byte sequence was not found.
 *
 * Any other negative return value is an indication of the offset at
 * which the start of the needle was found.
 *
 * The index is the returned value, negated and decremented by 1.
 *
 * For example, if the value returned was -5, the start of the boundary
 * was found in haystack at 5-1=4, and continues until the haystack ends.
 */
public static int indexOf(byte[] needle, byte[] haystack, int start, int finish) {
    int n = 0;
    int h = start;
    int pos = -1;
    while (h < finish) {
        if (needle[n] == haystack[h]) {
            // if first time we are finding the start of the needle, register this position
            pos = n == 0 ? h : pos;
            n++; // next position of needle
            if (n == needle.length) {
                // all pieces of needle have been found in order in sequence in haystack
                return pos; // return last registered position
            }
            // else just continue
        } else { // did not coincide
            n = 0; // reset the needle
            pos = -1; // false alarm. initialize
        }

        h++; // ever incrementing on the haystack
    }

    return pos != -1 ? -(pos + 1) : -1; // never found the full item
}
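
A note on the matching loop in the posted indexOf, offered as a hedged sketch and not a diagnosis confirmed against the real data: on a mismatch it resets n and moves on without re-examining the current byte, so a boundary that begins inside a failed partial match can be skipped (for example, searching "::+:" for ":+:" reports a partial match at index 3 instead of the full match at index 1). The version below keeps the same return convention but tests every candidate start position. BoundarySearch is a hypothetical name.

```java
final class BoundarySearch {
    /**
     * Same return convention as the posted indexOf:
     *   >= 0 : index of the first full match in haystack[start..finish)
     *   -1   : nothing found
     *   < -1 : -(p + 1), where a prefix of needle starts at p and runs up to finish
     * As in the original, a partial match at p == 0 also encodes as -1.
     */
    static int indexOf(byte[] needle, byte[] haystack, int start, int finish) {
        for (int h = start; h < finish; h++) {
            int n = 0;
            // count how many needle bytes match starting at h
            while (n < needle.length && h + n < finish && haystack[h + n] == needle[n]) {
                n++;
            }
            if (n == needle.length) return h;      // full match
            if (h + n == finish) return -(h + 1);  // prefix reaches the end of the data
        }
        return -1;
    }
}
```

This is a plain quadratic scan; for boundary-sized needles and a 1 KB buffer that cost is unlikely to matter.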
 
Tony Morris

AndrewTK said:
It works OK to a certain extent, but when I pass this input stream to
the ZIP extractor, the latter extracts the first entry (perfectly well
mind you) and invariably chokes on the second.

First guess based on experience alone, streams aren't being closed
properly.
First guess further backed up by the lack of a finally clause in any of
the code posted. Further still, backed up by the lack of a call to
close(). And finally, the assignment of local declarations to 'null'.

Asymptotically conclusive.
 
AndrewTK

Thanks for the reply.

Tony Morris said:
First guess based on experience alone, streams aren't being closed
properly.

Well I don't actually want to close any streams at all. In fact, there
is only one real stream and if I close it I effectively lose any other
data I was planning to get.

The "real stream" might contain, say:

abcdefghijk :+: [some binary] :+: lmnopqrst

What the BoundInputStream does is set the boundary to ":+:", and let me
read the first part, "abcdefghijk". When the BIS sees the boundary, it
pretends it found the end of the stream, and calls to read return -1.

By calling a reopen() function, I make the "reading head" so to speak
go to right after the boundary (ready to read the binary data). I can
then pass the stream on to a ZipStreamExtractor that just reads the
stream. It will also get a simulated EOS when the next boundary is
reached. The extractor calls close() when it gets the data it wants, or
on error, and then the next call to reopen() will allow us to read the
second string "lmnopqrst"
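
The intended behaviour on that example can be illustrated with a much simpler stand-in, sketched here over a byte array that is already fully in memory, so it sidesteps the buffering and refill logic the real BoundInputStream needs. SegmentedBytes and nextSegment are hypothetical names; nextSegment plays the role of reopen() followed by reading up to the simulated EOS. A non-empty boundary is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

final class SegmentedBytes {
    private final byte[] data;
    private final byte[] boundary; // assumed non-empty
    private int pos = 0;           // start of the next segment; > data.length once exhausted

    SegmentedBytes(byte[] data, byte[] boundary) {
        this.data = data;
        this.boundary = boundary;
    }

    /** Returns a stream over the next segment, or null when no segments remain. */
    InputStream nextSegment() {
        if (pos > data.length) return null;
        int hit = find(pos);
        int segEnd = (hit == -1) ? data.length : hit;
        InputStream segment = new ByteArrayInputStream(data, pos, segEnd - pos);
        // skip past the boundary, or mark the data as exhausted
        pos = (hit == -1) ? data.length + 1 : hit + boundary.length;
        return segment;
    }

    /** Index of the first full boundary at or after from, or -1 if there is none. */
    private int find(int from) {
        for (int i = from; i + boundary.length <= data.length; i++) {
            int n = 0;
            while (n < boundary.length && data[i + n] == boundary[n]) n++;
            if (n == boundary.length) return i;
        }
        return -1;
    }
}
```

Each returned stream ends on its own, so a consumer such as a ZIP extractor cannot read past its segment.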

So no closing is required, as far as I can see. I certainly don't
want to close the underlying "real" stream. What were you actually
suggesting...?

Tony Morris said:
First guess further backed up by the lack of a finally clause

I don't normally use finally clauses, they are not a necessary piece,
and I don't see what that would change. Please enlighten me.

Tony Morris said:
Further still, backed up by the lack of a call to
close(). And finally, the assignment of local declarations to 'null'.

For close() see above. I have to declare locals as null to give them
scope outwith the place where they are first initialized and avoid the
compiler moaning that "variable might not have been initialized" and
not compiling. Unless there's another way around that?

Thanks for the comments but I don't quite see the sparks of light yet...
 
Chris Uppal

AndrewTK said:
I tried to make a home grown implementation of what I called a
"BoundInputStream" which simulates the end of a stream if it finds a
certain byte sequence (the boundary) in its internal buffer, because
when I have to pass the data on to the extractor, I have no control
over what it reads - hence it could swallow whole chunks of data that
follows or misinterpret extra data as part of the ZIP file...

I can't see anything wrong with the basic approach. So, unless I'm missing
something, the problem must be either in your custom stream implementation, or
in the way you are asking the ZIP library to /use/ that stream.
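
On the custom-stream side, one detail in the posted read(byte[], int, int) may be worth checking first (an observation from the listing, not a confirmed diagnosis): the copy loops store into buff[x] and never use off, whereas the java.io.InputStream contract puts the first byte read at buff[off]. Any caller that passes a non-zero off, for instance while topping up a partly filled buffer after a short read, would then find the new bytes at the wrong index. A minimal sketch of a copy step that honors off follows; BufferCopy and copyOut are made-up names, and limit stands in for bpos or end from the posted code.

```java
final class BufferCopy {
    /**
     * Copies up to len bytes from src[srcPos..limit) into dst, starting at dst[off].
     * Returns the number of bytes copied (0 if nothing is available).
     */
    static int copyOut(byte[] src, int srcPos, int limit, byte[] dst, int off, int len) {
        int n = Math.min(len, limit - srcPos);
        if (n <= 0) return 0;
        System.arraycopy(src, srcPos, dst, off, n);
        return n;
    }
}
```

In the posted loops the equivalent change would be storing into buff[off + x] instead of buff[x].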

I would try splitting the problem up. Use your stream to split the underlying
input into several files (actually written out to the file system) by reading
up to each fake EOS in turn. If that produces the output you expect, and a ZIP
utility can read the files that should be in ZIP format, then you know the
problem is in the way that the ZIP library uses your stream(s). If not then
it's a simple bug in your stream, which you will be able to find easily enough
once you know for sure that it's there.
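
The splitting step could look roughly like the sketch below, which copies one segment (everything up to its simulated end of stream) into a file of its own. SegmentDump and dump are hypothetical names; the reopen() mentioned afterwards is the method AndrewTK describes, whose exact signature is not shown in the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class SegmentDump {
    /** Copies segment to target until read() reports end of stream; returns the byte count. */
    static long dump(InputStream segment, Path target) throws IOException {
        // only the output file is closed here; the input stream is left open
        try (OutputStream out = Files.newOutputStream(target)) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = segment.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

Calling dump once per segment, with reopen() in between, should leave one file per part; the part that is meant to be a ZIP can then be opened with any ZIP utility.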

AndrewTK said:
The same happens when simply wrapping a FileInputStream in a
BoundInputStream, but simply using the FileInputStream as is and
passing it to the ZIP extractor works just fine and only on the second
one does it start playing up.

That's good news, because it means that you can create a simple,
self-contained, example which illustrates the problem. If you can't find the
bug using the above technique then by all means post an SCE. (Please note that
the files you posted a link to do not -- not at all -- constitute an SCE;
there's a hell of a lot of stuff at that link and it's by no means clear what's
relevant and what's not. Firstprototype.zip looked promising at first, but it
has a load of clearly irrelevant stuff like sockets and threads in it)

-- chris
 
