Ridiculous readInt() bug? Read-head not advancing far enough?


nobrow

This one is just fantastic! I have a large binary file being processed
in Java. After insertion of much debugging code, and with a hex editor
I have discovered the following behaviour.

At some point the DataInputStream (dis) which I am using to read the
file hits the following sequence of bytes:

.... 05 00 00 00 00 00 03 07 C0 08 05 24 18 4D 11 E0 A8 ...

The following sequence of methods is executed:

dis.read() ... gives 5 (0x5) ... fine
dis.readLong() ... gives 198592 (0x00000000000307C0) ... fine
dis.readInt() ... gives 134554648 (0x08052418) ... fine
dis.readInt() ... gives 407704032 (0x184D11E0) ... WTF!?

Notice anything about that last one? ... The last byte read by the
preceding readInt() is being read as the first byte by this
readInt()!!!!!
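For comparison, this is what a correctly behaving DataInputStream makes of exactly the bytes in the dump. It is a minimal sketch rather than the OP's program: the bytes come from a ByteArrayInputStream, so no file, filesystem or OS read path is involved, and the class name `HexDumpCheck` is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class HexDumpCheck
{
    // The bytes from the hex dump, from the 0x05 up to and including the 0xA8.
    static final byte[] DUMP = {
        0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x07, (byte) 0xC0,
        0x08, 0x05, 0x24, 0x18, 0x4D, 0x11, (byte) 0xE0, (byte) 0xA8
    };

    // Same call sequence as the OP: read(), readLong(), readInt(), readInt().
    static long[] decode(byte[] data) throws IOException
    {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
        try {
            long first = dis.read();       // 1 byte
            long second = dis.readLong();  // next 8 bytes
            long third = dis.readInt();    // next 4 bytes
            long fourth = dis.readInt();   // the 4 bytes after those, no overlap
            return new long[] { first, second, third, fourth };
        } finally {
            dis.close();
        }
    }

    public static void main(String[] args) throws IOException
    {
        long[] v = decode(DUMP);
        for (int i = 0; i < v.length; i++)
            System.out.println(v[i] + " (0x" + Long.toHexString(v[i]) + ")");
    }
}
```

Run as-is it prints 5, 198592, 134554648 and 1293017256 (0x4d11e0a8). The OP's 0x184D11E0 is that last value with the 0x18 from the previous int delivered a second time and the 0xA8 not yet reached, which fits a stream handing one byte back twice.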

The really annoying thing is that it's intermittent. Happens every time
today. Worked fine yesterday. Happened every time the day before. There
is nothing unusual about my system. No background processes that could
be getting involved. No changes in it from day to day.

I know posting code would be a good move but the program is quite
involved and difficult to chop down into a minimal example. Suffice it
to say that there is nothing complicated about the offending portion of
the code. The DataInputStream is not being shared across threads or
anything, and those methods are executed in succession, with nothing
else happening in between.

I am running on Linux.

$ java -version
java version "1.4.2_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_02-b03)
Java HotSpot(TM) Client VM (build 1.4.2_02-b03, mixed mode)

Seriously! What's that about? Anyone ever seen anything like this before?
 

El

Also, on some occasions, an EOFException is thrown, despite the fact
that the DataInputStream is nowhere near the end of the file.
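One way to test the "nowhere near the end" claim at the moment the exception fires is to ask the file where it is. A sketch, assuming the DataInputStream sits directly on a FileInputStream (as the OP confirms later in the thread); `EofReport` and `readRecords` are hypothetical names, with `readRecords` standing in for the real parsing loop. `FileInputStream.getChannel()` is available from 1.4 on, and with no buffering layer in between the channel position equals the number of bytes consumed so far.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

final class EofReport
{
    // Hypothetical stand-in for the real parsing loop: reads ints until EOF.
    static void readRecords(DataInputStream dis) throws IOException
    {
        while (true)
            dis.readInt();
    }

    // Runs the loop and reports where the file actually was when EOF hit.
    static String run(String path) throws IOException
    {
        FileInputStream fis = new FileInputStream(path);
        DataInputStream dis = new DataInputStream(fis); // unbuffered, as in the OP
        FileChannel ch = fis.getChannel();
        try {
            readRecords(dis);
            return "no EOF";
        } catch (EOFException e) {
            // With no buffering layer, the channel position equals the
            // number of bytes the DataInputStream has consumed.
            return "EOF at position " + ch.position() + " of " + ch.size();
        } finally {
            dis.close();
        }
    }

    // Demo: a 10-byte file holds two whole ints plus 2 leftover bytes.
    static String demo() throws IOException
    {
        File f = File.createTempFile("eof-demo", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[10]);
        out.close();
        return run(f.getPath());
    }
}
```

`EofReport.demo()` returns "EOF at position 10 of 10". In the real program, a position well short of the size would point at the read path rather than at the data.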
 

Daniel Dyer

El said:
This one is just fantastic! I have a large binary file being processed
in Java. After insertion of much debugging code, and with a hex editor
I have discovered the following behaviour.

At some point the DataInputStream (dis) which I am using to read the
file hits the following sequence of bytes:

... 05 00 00 00 00 00 03 07 C0 08 05 24 18 4D 11 E0 A8 ...

Where is the DataInputStream getting this data from? Can you be sure the
bug is in the DataInputStream and not somewhere else? Have you wrapped
the DataInputStream around some other input stream (is the data coming
from a file or a socket)? If the bug is intermittent are you certain that
the above sequence of bytes is exactly what is being fed to the
DataInputStream every time?
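To answer that last question directly, a small tap between the FileInputStream and the DataInputStream can record exactly what is handed up. This is a hypothetical debugging class (`TapInputStream` is not a JDK type): it counts the bytes delivered, which for an unbuffered FileInputStream opened at the start of the file is the file offset, and keeps the last 16 bytes so they can be compared against the hex editor. It does not track skip().

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical debugging tap: sits between the FileInputStream and the
// DataInputStream and remembers how many bytes, and which, were handed up.
// skip() is passed straight through and is NOT counted.
final class TapInputStream extends FilterInputStream
{
    private final int[] ring = new int[16];  // last 16 bytes delivered
    private long offset = 0;                 // bytes delivered so far

    TapInputStream(InputStream in)
    {
        super(in);
    }

    public int read() throws IOException
    {
        int b = in.read();
        if (b >= 0)
            ring[(int) (offset++ % ring.length)] = b;
        return b;
    }

    public int read(byte[] buf, int off, int len) throws IOException
    {
        int n = in.read(buf, off, len);
        for (int i = 0; i < n; i++)
            ring[(int) (offset++ % ring.length)] = buf[off + i] & 0xFF;
        return n;
    }

    long offset()
    {
        return offset;
    }

    // The last (up to 16) bytes delivered, oldest first, e.g. "08 05 24 18".
    String recent()
    {
        StringBuffer sb = new StringBuffer();
        for (long p = Math.max(0L, offset - ring.length); p < offset; p++)
        {
            int b = ring[(int) (p % ring.length)];
            if (sb.length() > 0)
                sb.append(' ');
            sb.append(Character.toUpperCase(Character.forDigit(b >> 4, 16)));
            sb.append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
        }
        return sb.toString();
    }
}
```

In use it would be wired as `tap = new TapInputStream(new FileInputStream(path))` and `dis = new DataInputStream(tap)`, printing `tap.offset()` and `tap.recent()` around the suspicious readInt() calls.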

Dan.
 

Chris Uppal

El said:
The really annoying thing is that it's intermittent. Happens every time
today. Worked fine yesterday. Happened every time the day before. There
is nothing unusual about my system. No background processes that could
be getting involved. No changes in it from day to day.

I think there must be something very strange about the stream you are reading
from. The source to DataInputStream.readInt() is straightforward and could not
possibly cause the results you are seeing (at least, that is true of the 1.4.2
source for Windows; I assume the Linux version is identical).
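The 1.4-era readInt() amounts to four single-byte in.read() calls combined big-endian, with an EOFException if any of them returns -1. The sketch below models that logic as a standalone method (a paraphrase, not the JDK source; newer JDKs implement it differently) and adds a hypothetical `ReplayOnceStream` that hands one byte back twice. Fed the last eight bytes of the OP's dump, the faulty stream reproduces 0x184D11E0 exactly, which is consistent with the fault lying in the stream underneath. The same logic also means readInt() only throws EOFException when the underlying stream itself reports end-of-file.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadIntSketch
{
    // The last eight bytes of the OP's dump.
    static final byte[] BYTES = {
        0x08, 0x05, 0x24, 0x18, 0x4D, 0x11, (byte) 0xE0, (byte) 0xA8
    };

    // Modelled on the 1.4-era DataInputStream.readInt(): four single-byte
    // reads, big-endian, and EOFException only if one of them returns -1.
    static int readInt(InputStream in) throws IOException
    {
        int ch1 = in.read();
        int ch2 = in.read();
        int ch3 = in.read();
        int ch4 = in.read();
        if ((ch1 | ch2 | ch3 | ch4) < 0)
            throw new EOFException();
        return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
    }

    // The OP's last two calls: two consecutive readInt()s on one stream.
    static int[] readTwo(InputStream in) throws IOException
    {
        int a = readInt(in);
        int b = readInt(in);
        return new int[] { a, b };
    }

    // Hypothetical faulty stream: after `after` bytes it returns the previous
    // byte once more, as if the read head failed to advance a single time.
    // Only read() is overridden, which is all readInt() above uses.
    static final class ReplayOnceStream extends FilterInputStream
    {
        private final int after;
        private int delivered = 0;
        private int last = -1;

        ReplayOnceStream(InputStream in, int after)
        {
            super(in);
            this.after = after;
        }

        public int read() throws IOException
        {
            if (delivered == after && last >= 0)
            {
                delivered++;      // replay only once
                return last;      // the previous byte, a second time
            }
            last = in.read();
            if (last >= 0)
                delivered++;
            return last;
        }
    }
}
```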

Unless someone else recognises the symptoms, I think you'll have to give more
detail about how you are creating the DataInputStream.

Incidentally, can you reproduce the effect on a different Linux box (ideally
one that does not have an identical installation)?

-- chris
 

bugbear

El said:
[snip]

The really annoying thing is that it's intermittent. Happens every time
today. Worked fine yesterday. Happened every time the day before.

[snip]

I would recommend interposing a BufferedInputStream between
your DataInputStream and your actual InputStream, and
messing around with the buffer size to see what happens.
I suspect this will "stir the pot".

This sounds (horribly) like a buffer boundary problem
somewhere in your layers of InputStream-ness.
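A minimal sketch of that experiment, assuming the data is read from a plain file path; `BufferedOpen` and `lastInt` are hypothetical names, and `lastInt` replays the OP's read(), readLong(), readInt(), readInt() sequence. Each buffer size places the internal buffer boundaries at different file offsets, so a result that changes with the size would point at a boundary problem.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

final class BufferedOpen
{
    // FileInputStream -> BufferedInputStream(bufferSize) -> DataInputStream.
    static DataInputStream open(String path, int bufferSize) throws IOException
    {
        return new DataInputStream(
            new BufferedInputStream(new FileInputStream(path), bufferSize));
    }

    // Replays the OP's call sequence and returns the value of the last readInt().
    static int lastInt(String path, int bufferSize) throws IOException
    {
        DataInputStream dis = open(path, bufferSize);
        try {
            dis.read();
            dis.readLong();
            dis.readInt();
            return dis.readInt();
        } finally {
            dis.close();
        }
    }
}
```

Trying a handful of sizes (1, 5, 8192, ...) against the same file and comparing the results is one way to do the stirring.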

BugBear
 

Alex Buell

bugbear said:
I would recommend interposing a BufferedInputStream between
your DataInputStream and your actual InputStream, and
messing around with the buffer size to see what happens.
I suspect this will "stir the pot".

This sounds (horribly) like a buffer boundary problem
somewhere in your layers of InputStream-ness.

Change to BufferedReader instead. Isn't DataInputStream old hat
anyway?

Cheers,
Alex.
 

El

The DataInputStream is wrapping a FileInputStream.

The nature of the app means that the file varies considerably from
execution to execution. My OP was just one example, but the same thing
happens (with different numbers) each execution.
 

El

There really is nothing special about how the stream is created. A
FileInputStream is created and then wrapped in a DataInputStream.

It is difficult to test on other systems, as the project is cumbersome
in terms of the other bits and pieces that have to be configured in
order to run it, so I'm pretty much stuck where I am.
 

El

I threw a BufferedInputStream into the mix and it worked. I should add
that it's only worked once ... the program is slow, so it'll take a while
to gain confidence in this result.

That's just bad. Do you understand what the problem actually is?

Thanks for the suggestion.
 

Nigel Wade

Alex said:
Change to BufferedReader instead. Isn't DataInputStream old hat
anyway?

Cheers,
Alex.

No, he certainly doesn't want to use any kind of io.Reader. They are for
reading character streams, which would be no use for binary data.
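A quick illustration of that point, using one byte from the OP's dump: 0xC0 is never valid in UTF-8, so a UTF-8 Reader turns it into the replacement character U+FFFD and the original value is gone. `ReaderVsBytes` is a hypothetical name, and the replacement behaviour described is that of current JDK decoders.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

final class ReaderVsBytes
{
    // Pushes raw bytes through a Reader and returns the chars it produced.
    static String viaReader(byte[] data, String charset) throws IOException
    {
        Reader r = new InputStreamReader(new ByteArrayInputStream(data), charset);
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = r.read()) != -1)
            sb.append((char) c);
        r.close();
        return sb.toString();
    }
}
```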

To the OP, could this be a problem with the underlying filesystem? What
happens if you strip out everything apart from the FileInputStream and
DataInputStream to read the data?
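One way to strip things down along these lines is to read the same file by two independent paths and compare. A sketch under stated assumptions: `CrossCheck` is a hypothetical name, the file is small enough to load into memory, and it compares whole ints from offset 0 rather than the OP's mixed record layout. Path one is a bulk read decoded with `ByteBuffer` (big-endian by default, the same byte order DataInputStream uses); path two is the OP's unbuffered DataInputStream.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

final class CrossCheck
{
    // Returns the byte offset of the first int where the two read paths
    // disagree, or -1 if they agree for every whole int in the file.
    static long firstMismatch(File file) throws IOException
    {
        // Path 1: bulk read of the whole file (fine for a test file, not a huge one).
        byte[] all = new byte[(int) file.length()];
        FileInputStream bulk = new FileInputStream(file);
        try {
            int off = 0;
            while (off < all.length)
            {
                int n = bulk.read(all, off, all.length - off);
                if (n < 0)
                    throw new EOFException("file shrank while reading");
                off += n;
            }
        } finally {
            bulk.close();
        }
        ByteBuffer expected = ByteBuffer.wrap(all);

        // Path 2: the OP's stack, an unbuffered DataInputStream over FileInputStream.
        DataInputStream dis = new DataInputStream(new FileInputStream(file));
        try {
            for (int pos = 0; pos + 4 <= all.length; pos += 4)
                if (dis.readInt() != expected.getInt(pos))
                    return pos;
            return -1;
        } finally {
            dis.close();
        }
    }
}
```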
 

Tor Iver Wilhelmsen

Alex Buell said:
Change to BufferedReader instead. Isn't DataInputStream old hat
anyway?

Of course not: Readers are for character data, this is binary data.
 

Thomas Weidenfeller

El said:
That's just bad. Do you understand what the problem actually is?

We can only guess. E.g. is the file still written while you start to
read it (maybe the OS gets confused)? Do you (accidentally) share the
reference to the reader (e.g. in another thread)? Does the problem
happen at some magic position in the file (multiple of 512 bytes, 1k,
2k, 1G, etc.)?
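For the "magic position" question, once the offset of a failing readInt() is known, a little arithmetic shows which block sizes it lines up with. A sketch; `Boundaries` is a hypothetical helper, and the 512-byte to 1 GiB range is just the range mentioned above.

```java
final class Boundaries
{
    // Lists the power-of-two block sizes (512 bytes up to 1 GiB)
    // that the given file offset is an exact multiple of.
    static String alignedTo(long offset)
    {
        StringBuffer sb = new StringBuffer();
        for (long block = 512; block <= (1L << 30); block <<= 1)
            if (offset % block == 0)
            {
                if (sb.length() > 0)
                    sb.append(", ");
                sb.append(block);
            }
        return sb.length() == 0 ? "none" : sb.toString();
    }

    // True if a 4-byte int starting at `offset` spans two blocks of size `block`.
    static boolean straddles(long offset, long block)
    {
        return offset % block > block - 4;
    }
}
```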

Does it happen only if you read the file from that particular file
system? Can you change the type of the file system? Is it by any
chance a network-mounted / shared file system? Does it happen with
different hard drives, or only with one particular drive? Does it happen
on different machines / different motherboards, or only on a particular
one, or ones of a particular type? Does it happen with other VMs, too?
Are you absolutely sure the input data is correct?
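For "are you absolutely sure the input data is correct?", a cheap first check is to hash the file several times and confirm that every pass yields the same digest. A sketch; `RepeatableRead` is a hypothetical name and MD5 is used only as a checksum. Repeat reads may well be served from the OS page cache, so a match is weak evidence, but a mismatch would implicate the IO stack directly.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RepeatableRead
{
    // MD5 of the whole file as lowercase hex, read in 64 KiB chunks.
    static String digest(String path) throws IOException, NoSuchAlgorithmException
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        InputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[65536];
            int n;
            while ((n = in.read(buf)) != -1)
                md.update(buf, 0, n);
        } finally {
            in.close();
        }
        byte[] d = md.digest();
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < d.length; i++)
            sb.append(Integer.toHexString((d[i] & 0xFF) | 0x100).substring(1));
        return sb.toString();
    }

    // True if `times` consecutive reads of the file all produce the same digest.
    static boolean isRepeatable(String path, int times)
        throws IOException, NoSuchAlgorithmException
    {
        String first = digest(path);
        for (int i = 1; i < times; i++)
            if (!first.equals(digest(path)))
                return false;
        return true;
    }
}
```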

Your best bet would be to create a stand-alone test case which
reproduces the bug, at least most of the time, in an acceptable time
frame. That gives you something much better to work with.

/Thomas
 

bugbear

El said:
I threw a BufferedInputStream into the mix and it worked. [snip]

That's just bad. Do you understand what the problem actually is?

Thanks for the suggestion.

It wasn't meant to be a fix - just part of an information-gathering
exercise leading to a diagnosis - with any luck!

BugBear
 

bugbear

El said:
I threw a BufferedInputStream into the mix and it worked. [snip]

BTW, for *performance*, you should have a BufferedInputStream
over your FileInputStream anyway. FileInputStream
was never meant to be accessed a byte at a time.

BugBear
 

Chris Uppal

El said:
There really is nothing special about how the stream is created. A
FileInputStream is created and then turned to a DataInputStream.

Well, that /shouldn't/ cause problems, but it obviously is. So it seems that
/something/ in the IO stack between the lower levels of Java and the actual
disk is flaky on your box -- it could be the disk, the file-system, the kernel,
the IO libraries linked into the JVM, ....

Whatever, I recommend ensuring that your disaster recovery plans are adequate,
and that you have proper (validated) backups.

However, you shouldn't be putting DataInputStream directly around a
FileInputStream; there should be a layer of buffering in between, otherwise --
for instance -- reading one int will cause 4 separate read()s to hit the
kernel. That will be /killing/ your performance. It is also possible that
such an unusual load is showing up bugs that would not affect an application
that read data in "normal" sized chunks. If so then fixing the buffering may
fix the problem. Unfortunately it may instead only /mask/ the problem, so that
it seems as if it's fixed but it's really still waiting to happen again later
(or even for the disk to crash...).
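The point about the number of calls can be measured. On the OP's 1.4.2 JDK, readInt() issues four single-byte read() calls; some newer JDKs issue one read(byte[], int, int) per int instead, so this sketch counts both kinds of call at the bottom of the stack. `CallCounter` and `CountingStream` are hypothetical names, and a ByteArrayInputStream stands in for the FileInputStream so the counts don't depend on a real file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CallCounter
{
    // Counts every read call that reaches the wrapped stream, standing in
    // for the native reads a FileInputStream would make.
    static final class CountingStream extends FilterInputStream
    {
        int calls = 0;

        CountingStream(InputStream in)
        {
            super(in);
        }

        public int read() throws IOException
        {
            calls++;
            return in.read();
        }

        public int read(byte[] b, int off, int len) throws IOException
        {
            calls++;
            return in.read(b, off, len);
        }
    }

    // Reads `ints` ints through a DataInputStream, optionally with a
    // BufferedInputStream in between, and returns how many read calls
    // reached the bottom of the stack.
    static int callsFor(int ints, boolean buffered) throws IOException
    {
        CountingStream bottom =
            new CountingStream(new ByteArrayInputStream(new byte[ints * 4]));
        InputStream middle = buffered ? new BufferedInputStream(bottom) : (InputStream) bottom;
        DataInputStream dis = new DataInputStream(middle);
        for (int i = 0; i < ints; i++)
            dis.readInt();
        dis.close();
        return bottom.calls;
    }
}
```

Unbuffered, every readInt() reaches the bottom at least once (four times on 1.4). With a BufferedInputStream and its default 8192-byte buffer, the 4000 bytes for 1000 ints typically arrive in a single bulk read.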

-- chris
 

Steve Horsley

El said:
I threw a BufferedInputStream into the mix and it worked. [snip]

So I guess the system has errors when the IO is being hit really
hard. You should worry about that.

It occurs to me that if you are writing results to another file
byte-by-byte as well (i.e. a FileOutputStream not wrapped in a
BufferedOutputStream), then this will also be generating lots of
unnecessary I/O calls and hurting performance.
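A sketch of the buffered output side, assuming the results are written with a DataOutputStream (the thread doesn't say how the OP writes them); `BufferedWrite` is a hypothetical name. The one thing to get right is closing (or flushing) the stream, since anything still sitting in the BufferedOutputStream is otherwise never written.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

final class BufferedWrite
{
    // FileOutputStream -> BufferedOutputStream -> DataOutputStream.
    static void writeInts(String path, int[] values) throws IOException
    {
        DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(path)));
        try {
            for (int i = 0; i < values.length; i++)
                out.writeInt(values[i]);
        } finally {
            out.close();  // flushes the BufferedOutputStream, then closes the file
        }
    }
}
```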

Steve
 

Joseph Dionne

El said:
This one is just fantastic! I have a large binary file being processed
in Java. After insertion of much debugging code, and with a hex editor
I have discovered the following behaviour.

[snip]

Am running on Linux.

$ java -version
java version "1.4.2_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_02-b03)
Java HotSpot(TM) Client VM (build 1.4.2_02-b03, mixed mode)

Seriously! What's that about? Anyone ever seen anything like this before?

I too am running on Linux, SuSE 9.2, kernel 2.6.8, and Java 1.4.2_06. I
tried to replicate your symptoms in Java (code below), and used a C app
(code below) to diff the outputs. The only oddity I could find is that
Long.toHexString(long) drops leading zeros, i.e. 0x02 is printed as '2'.
That is documented behaviour (the method emits no extra leading 0s), so
each value has to be zero-padded before the two outputs can be diffed.

You need to look elsewhere in your application for the cause of your
failure. The alternating pattern is very interesting, and might be
telling you where the problem is. As a general rule, and IMHO,
software tends to fail the same way all the time, and not just part of the time.

Hope this helps. Code below:

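// tdis.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    /* Record layout mirrored by tDis.java: 1, 8, 8, 4, 4 bytes; 0 restarts. */
    int fd, ii = 0, rdType[] = { 1, 8, 8, 4, 4, 0 };
    unsigned char byte[8];   /* unsigned, so 0xC0 prints as c0, not ffffffc0 */

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }
    if ((fd = open(argv[1], O_RDONLY)) == -1)
    {
        perror(argv[1]);
        return EXIT_FAILURE;
    }

    for (;;)
    {
        int jj;

        /* Stop at EOF, on a short read (partial record) or on error. */
        if (read(fd, byte, rdType[ii]) != rdType[ii])
        {
            close(fd);
            return EXIT_SUCCESS;
        }

        for (jj = 0; jj < rdType[ii]; jj++)
            printf("%02x", byte[jj]);   /* two digits per byte, zero-padded */
        printf("\n");

        if (!rdType[++ii]) ii = 0;
    }
}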

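// tDis.java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class tDis
{
    // Zero-padded hex, so each value has the same width as tdis.c prints it
    // (Long.toHexString alone drops leading zeros). Works on 1.4.
    private static String hex(long value, int digits)
    {
        StringBuffer sb = new StringBuffer(Long.toHexString(value));
        while (sb.length() < digits)
            sb.insert(0, '0');
        return sb.toString();
    }

    public static void main(String[] args) throws IOException
    {
        // Deliberately unbuffered, to mirror the setup being tested.
        DataInputStream dis = new DataInputStream(new FileInputStream(args[0]));

        try {
            while (true)
            {
                System.out.println(hex(dis.readByte() & 0xFFL, 2));
                System.out.println(hex(dis.readLong(), 16));
                System.out.println(hex(dis.readLong(), 16));
                System.out.println(hex(dis.readInt() & 0xFFFFFFFFL, 8));
                System.out.println(hex(dis.readInt() & 0xFFFFFFFFL, 8));
            }
        } catch (EOFException eof) {
            // Normal end of the test: the file ran out (possibly mid-record).
        } finally {
            dis.close();
        }
    }
}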

joseph
 

El

Thanks for all the replies. I'm going to leave this for the time being.
Inserting the BufferedInputStream is consistently working (don't know
why I didn't have it in there anyway).

I may come back to this when I have more time and see if I can pinpoint
the problem, so keep your eyes peeled for a future post.

Thanks again.
 
