Why would file reading stop at 1124 bytes with this code?

Mickey Segal

The code below runs in a servlet; it reads the file and then deletes it.
Usually this works fine, but with larger files about 10% of the time the
reportString gets truncated at 1124 bytes. My guess is that File.delete is
being called before adding to reportString is finished, but I thought
File.delete would only be called after file reading finishes.

Can someone help by pointing out what I am doing wrong here? Suggestions
about how to fix the code would be particularly welcome.

The code:

final synchronized String read_delete(String fullPathAndFileString)
{
    FileInputStream fis = null;
    BufferedInputStream bis = null;
    DataInputStream dataIS = null;
    String reportString = "";
    try
    {
        File fNow = new File(fullPathAndFileString);
        fis = new FileInputStream(fNow);
        bis = new BufferedInputStream(fis);
        dataIS = new DataInputStream(bis);
        int bytesAvailable;
        while ((bytesAvailable = bis.available()) > 0)
        {
            byte[] b = new byte[bytesAvailable];
            dataIS.readFully(b);
            reportString += new String(b);
        }
        fNow.delete();
    }
    catch (EOFException ignored) {}
    catch (FileNotFoundException ignored) {}
    catch (IOException ignored) {}
    finally
    {
        try
        {
            if (dataIS != null) dataIS.close();
            if (bis != null) bis.close();
            if (fis != null) fis.close();
        }
        catch (IOException ignored) {}
    }
    return(reportString);
}
 
Carl Howells

Mickey said:
The code below runs in a servlet; it reads the file and then deletes it.
Usually this works fine, but with larger files about 10% of the time the
reportString gets truncated at 1124 bytes. My guess is that File.delete is
being called before adding to reportString is finished, but I thought
File.delete would only be called after file reading finishes.

Can someone help by pointing out what I am doing wrong here? Suggestions
about how to fix the code would be particularly welcome.

The code:

final synchronized String read_delete(String fullPathAndFileString)
{
FileInputStream fis = null;
BufferedInputStream bis = null;
DataInputStream dataIS = null;
String reportString = "";

Ack. You only ever use dataIS, outside of initialization and closing
(and it's not necessary to use the others there, either)... Why do you
have variables for the other two? Just chain the constructors together.
It makes it clearer that they're not being used except as intermediate
values.

Additionally, you don't need to use a DataInputStream for this at all.
BufferedInputStream's read() method is clearly sufficient.
try
{
File fNow = new File(fullPathAndFileString);
fis = new FileInputStream(fNow);
bis = new BufferedInputStream(fis);
dataIS = new DataInputStream(bis);
int bytesAvailable;
while ((bytesAvailable = bis.available()) > 0)

Bad! Don't use available for *anything*... It's fundamentally broken.
It's certainly the cause of your problem. Just use the return value
from read(). If it's negative, you've reached the end of the file.
{
byte[] b = new byte[bytesAvailable];
dataIS.readFully(b);
reportString += new String(b);

Bad #1: You should use an explicit conversion from bytes to String,
rather than the system default.

Bad #2: You're using += with strings, in a loop. Unless you need to
use each string generated this way (which you clearly don't, in this
case... You only need the final result), this is awfully inefficient.
Use a StringBuffer or StringBuilder, depending on your target Java
version (StringBuilder in 1.5).
}
fNow.delete();
}
catch (EOFException ignored) {}
catch (FileNotFoundException ignored) {}
catch (IOException ignored) {}
finally
{
try
{
if (dataIS != null) dataIS.close();
if (bis != null) bis.close();
if (fis != null) fis.close();

Calling close on any of the decorator InputStreams also closes the
stream they decorate. Closing only the outermost one is fine.
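
Putting those points together, a minimal sketch of the method might look like
the following. It is only an illustration: it assumes the file contains UTF-8
text (use whatever charset actually wrote the file), reads with a plain read()
loop, collects the raw bytes and converts them once at the end (so a multi-byte
character can never be split between two conversions), and closes only the
outermost stream.

// Sketch only - needs import java.io.*; the UTF-8 charset is an assumption.
final synchronized String read_delete(String fullPathAndFileString)
{
    File fNow = new File(fullPathAndFileString);
    String reportString = "";
    BufferedInputStream bis = null;
    try
    {
        bis = new BufferedInputStream(new FileInputStream(fNow));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        // read() returns -1 at end of file, so available() is never consulted
        while ((n = bis.read(buf)) > 0)
        {
            bytes.write(buf, 0, n);
        }
        // one conversion at the end, with an explicit charset
        reportString = bytes.toString("UTF-8");
    }
    catch (IOException e)
    {
        // in real code, log this rather than swallowing it
    }
    finally
    {
        try
        {
            // closing the outermost stream also closes the FileInputStream it wraps
            if (bis != null) bis.close();
        }
        catch (IOException ignored) {}
    }
    fNow.delete(); // delete only after the stream has been closed
    return reportString;
}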
 
Steve Horsley

Mickey said:
The code below runs in a servlet; it reads the file and then deletes it.
Usually this works fine, but with larger files about 10% of the time the
reportString gets truncated at 1124 bytes. My guess is that File.delete is
being called before adding to reportString is finished, but I thought
File.delete would only be called after file reading finishes.

Can someone help by pointing out what I am doing wrong here? Suggestions
about how to fix the code would be particularly welcome.

The code:
<snip>

I suspect the problem is caused by your using InputStream.available().
This merely returns the number of bytes that can be guaranteed to be
supplied without the possibility of blocking. If there is any doubt as
to whether a read would block or not then available() MUST return 0.
This doesn't mean you've reached the end of the file, just that there
might be a pause before you can read any more.

I think you should not use available() at all - just read until you
get an EOF.

Steve.
 
Carl

Response clipped, question inline...
Mickey Segal wrote:

{
byte[] b = new byte[bytesAvailable];
dataIS.readFully(b);
reportString += new String(b);


Bad #1: You should use an explicit conversion from bytes to String,
rather than the system default.

Would you mind elaborating on this? I assume you are referring to the
"new String(b)" statement. Why is this wrong, and what exactly is the
system default?

Thanks,
Carl.
 
Mickey Segal

Carl Howells said:
Additionally, you don't need to use a DataInputStream for this at all.
BufferedInputStream's read() method is clearly sufficient.
Bad! Don't use available for *anything*... It's fundamentally broken.
It's certainly the cause of your problem. Just use the return value from
read(). If it's negative, you've reached the end of the file.

I used DataInputStream to get DataInputStream.readFully(), but I will try
the approach you suggested, using read() instead.
Bad #1: You should use an explicit conversion from bytes to String,
rather than the system default.

Bad #2: You're using += with strings, in a loop. Unless you need to use
each string generated this way (which you clearly don't, in this case...
You only need the final result), this is awfully inefficient. Use a
StringBuffer or StringBuilder, depending on your target java version
(StringBuilder in 1.5).

The vast majority of files read are tiny so I was trying to avoid
StringBuffer code, but I suppose it makes more sense to optimize for the
rare longer files.

Thanks. They should put warning labels on methods that are fundamentally
broken. Does someone keep a list of such methods?
 
Tor Iver Wilhelmsen

Mickey Segal said:
The vast majority of files read are tiny so I was trying to avoid
StringBuffer code, but I suppose it makes more sense to optimize for the
rare longer files.

Using string concatenation is the same as implicitly using
StringBuffer anyway. Except you create a new one and throw it away for
each iteration.
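
A tiny sketch of the difference (illustrative values only; this is roughly what
the compiler generates, not the exact bytecode):

// Each += on a String effectively becomes this, per iteration:
String s = "";
String chunk = "...next block of text...";
s = new StringBuffer().append(s).append(chunk).toString(); // new StringBuffer built and discarded

// Reusing one buffer for the whole loop avoids that churn:
StringBuffer sb = new StringBuffer();
for (int i = 0; i < 3; i++)
{
    sb.append(chunk);
}
String result = sb.toString();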
 
Chris Smith

Carl said:
Would you mind elaborating on this? I assume you are referring to the
"new String(b)" statement. Why is this wrong, and what exactly is the
system default?

What Carl is referring to is the character encoding. A character
encoding is the conversion between characters and bytes. For example,
ASCII maps a certain limited set of characters to the numbers 0-127.
Various ISO8859 standards extend ASCII with different (and incompatible)
characters mapped to bytes 128-255, so that the whole ASCII character
set plus a few extra characters can be included. There are also
multibyte encodings, such as UTF-16BE and UTF-16LE, which use two bytes
per character and can express the entire Unicode character set.
Finally, there are variable-length encodings such as UTF-8, in which
some characters (such as ASCII characters) take up only one byte, but
others can take two or more.

Each operating system chooses a default character encoding (sometimes
depending on how the OS is localized). That's okay for temporary
storage that will never extend beyond this system, such as for simple
application preferences. For anything that will be stored in a user-
exposed file, transferred over a network, or ANYTHING else where there's
a remote possibility that a file will be moved to a different machine,
you should pick a very specific encoding, such as ASCII or UTF-8.

All conversions between bytes and characters can take a character
encoding name as a parameter. Look for that overload in the API docs.
In particular, the constructor for String and String.getBytes both have
these overloads. InputStreamReader and OutputStreamWriter both have
constructors that take an encoding parameter as well.
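
For example (UTF-8 is used here purely as an illustration, "report.txt" is just
a placeholder name, and these calls would sit inside a method that declares
throws IOException):

// byte[] <-> String with an explicit charset instead of the platform default
byte[] encoded = "some report text".getBytes("UTF-8");
String decoded = new String(encoded, "UTF-8");

// Readers and Writers can be told the charset when they are constructed
Reader in = new InputStreamReader(new FileInputStream("report.txt"), "UTF-8");
Writer out = new OutputStreamWriter(new FileOutputStream("report.txt"), "UTF-8");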

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
John C. Bollinger

Carl said:
Response clipped, question inline...
Mickey Segal wrote:

{
byte[] b = new byte[bytesAvailable];
dataIS.readFully(b);
reportString += new String(b);



Bad #1: You should use an explicit conversion from bytes to String,
rather than the system default.

Would you mind elaborating on this? I assume you are referring to the
"new String(b)" statement. Why is this wrong, and what exactly is the
system default?

In answer to your first question, it's wrong because there is no
universal answer to your second question. When you want to convert
between bytes and characters you should _always_ specify the charset to
use. Not only does it ensure that your program does not break (in this
regard) when you move it between machines or upgrade your machine's JRE,
it also documents your assumptions about the data your program is
intended to work with.


John Bollinger
 
Mickey Segal

Carl said:
Would you mind elaborating on this? I assume you are referring to the "new
String(b)" statement. Why is this wrong, and what exactly is the system
default?

In re-writing the program I see another problem with the "new String(b)"
statement. On the final loop, when the byte array does not get filled
completely, there are stale bytes left in the unused part at the end of the
array, so one needs to create a String from only the newly read portion of
the byte array.
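
For example, inside the read loop (the charset here is an assumption; pick the
one the file was actually written with):

byte[] b = new byte[8192];
int length = bis.read(b); // may fill only part of the array
if (length > 0)
{
    // convert only the bytes actually read, not the stale tail of the array
    reportString += new String(b, 0, length, "UTF-8");
}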
 
Mickey Segal

Chris Smith said:
All conversions between bytes and characters can take a character
encoding name as a parameter. Look for that overload in the API docs.
In particular, the constructor for String and String.getBytes both have
these overloads. InputStreamReader and OutputStreamWriter both have
constructors that take an encoding parameter as well.

How far back in Java does this feature go? We are keeping all our
client-side programming at Java 1.1 for now.
 
Mickey Segal

Mickey Segal said:
How far back in Java does this feature go? We are keeping all our
client-side programming at Java 1.1 for now.

Oops. I just found this in Java 1.1.
 
Michael Borgwardt

Chris said:
encoding is the conversion between characters and bytes. For example,
ASCII maps a certain limited set of characters to the numbers 0-127.
Various ISO8859 standards extend ASCII with different (and incompatible)
characters mapped to bytes 128-255, so that the whole ASCII character
set plus a few extra characters can be included. There are also
multibyte encodings, such as UTF-16BE and UTF-16LE, which use two bytes
per character and can express the entire Unicode character set.
Finally, there are variable-length encodings such as UTF-8, in which
some characters (such as ASCII characters) take up only one byte, but
others can take two or more.

It gets even worse: some encodings are *stateful*, most notably ISO-2022.
 
Mickey Segal

Carl Howells said:
Bad! Don't use available for *anything*... It's fundamentally broken.
It's certainly the cause of your problem. Just use the return value from
read(). If it's negative, you've reached the end of the file.

I got rid of the available() method and the problem remains. The new source
code is below. I also tried increasing the byte array size from 1024 to
8192 and this did not fix the problem or change the number of bytes that
were read (1124). Getting rid of the fNow.delete() method did not get rid
of the problem either.

Any other hypotheses as to the cause of the problem?

New source code:

final synchronized String read_delete(String fullPathAndFileString)
{
    FileInputStream fis = null;
    BufferedInputStream bis = null;
    String reportString = "";
    try
    {
        File fNow = new File(fullPathAndFileString);
        fis = new FileInputStream(fNow);
        bis = new BufferedInputStream(fis);
        byte[] b = new byte[8192];
        int length;
        while ((length = bis.read(b)) > 0) reportString += new String(b, 0, length);
        fNow.delete();
    }
    catch (Exception e) {}
    finally
    {
        try
        {
            if (bis != null) bis.close();
            if (fis != null) fis.close();
        }
        catch (Exception e) {}
    }
    return(reportString);
}
 
John C. Bollinger

Mickey said:
I got rid of the available() method and the problem remains. The new source
code is below. I also tried increasing the byte array size from 1024 to
8196 and this did not fix the problem or change the number of bytes that
were read (1124). Getting rid of the fNow.delete() method did not get rid
of the problem either.

Does the problem manifest consistently on specific files, or are there
files for which it sometimes works and sometimes doesn't?
Any other hypotheses as to the cause of the problem?

Have you considered actually displaying or logging the exceptions you
may be catching? That might be very revealing, and it would certainly
be a better general practice. Also, it would be better form to close
your input stream before deleting the underlying file. As others have
written, you only need to close the outermost stream.
New source code:

final synchronized String read_delete(String fullPathAndFileString)
{
FileInputStream fis = null;
BufferedInputStream bis = null;
String reportString = "";
try
{
File fNow = new File(fullPathAndFileString);
fis = new FileInputStream(fNow);
bis = new BufferedInputStream(fis);
byte[] b = new byte[8192];
int length;
while ((length = bis.read(b)) > 0) reportString += new String(b, 0,
length);

If your system's default character encoding is a multi-byte encoding
then this may fail from time to time when a partial read from the file
ends in the middle of the byte sequence of one encoded character. UTF-8
is a reasonably common default, so that's not so unlikely. If your file
contains only character data then you should be using a suitably
configured Reader instead of an InputStream (a sketch follows the quoted
code below); otherwise you shouldn't be trying to convert the data to
String form at all.
fNow.delete();
}
catch (Exception e) {}
finally
{
try
{
if (bis != null) bis.close();
if (fis != null) fis.close();
}
catch (Exception e) {}
}
return(reportString);
}
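
As a sketch of the Reader suggestion above (UTF-8 assumed; use whatever charset
the file was actually written with), reading through an InputStreamReader
cannot split a multi-byte character, because the decoding happens inside the
Reader:

BufferedReader reader = null;
StringBuffer report = new StringBuffer();
try
{
    reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(fullPathAndFileString), "UTF-8"));
    char[] buf = new char[8192];
    int n;
    while ((n = reader.read(buf)) > 0)
    {
        report.append(buf, 0, n);
    }
}
catch (IOException e)
{
    // log, rather than ignore, in real code
}
finally
{
    try
    {
        if (reader != null) reader.close(); // closes the wrapped streams as well
    }
    catch (IOException ignored) {}
}
String reportString = report.toString();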

Are you sure you're actually using the revised code? I suspect that
most here have had the occasional sheepish "Doh!" moment when we realize
that the reason our debugging isn't going anywhere is that we are still
running the original code instead of the debugged. It has happened to
me, at least.


John Bollinger
 
Mickey Segal

John C. Bollinger said:
Does the problem manifest consistently on specific files, or are there
files for which it sometimes works and sometimes doesn't?

When the file is about 4000 bytes the truncation occurs almost always. The
truncation seems much less frequent for files of 2000 bytes.
Have you considered actually displaying or logging the exceptions you may
be catching? That might be very revealing, and it would certainly be a
better general practice. Also, it would be better form to close your
input stream before deleting the underlying file. As others have written,
you only need to close the outermost stream.

I had full Exception code in the test version that I ran in a local applet
and never got any Exceptions. Of course that could be different running as
a servlet, but it seemed unlikely that the servlet was exiting with an
Exception before file reading was complete, since execution proceeds after
file reading to the next step of file deletion. But something is going
wrong so maybe I should test more explicitly.

I'll try moving File.delete to after stream closing, but since the problem
occurs with File.delete commented out it does not seem to be the problem.

Which stream counts as the "outermost" stream? FileInputStream or
BufferedInputStream?
If your system's default character encoding is a multi-byte encoding then
this may crash from time when a partial read from the file ends in the
middle of the sequence of bytes of one encoded character. UTF-8 is a
reasonably common default, so that's not so unlikely. If your file
contains only character data then you should be using a suitably
configured Reader instead of an InputStream; otherwise you shouldn't be
trying to convert the data to String form at all.

The files consist entirely of Strings generated by a Java applet and written
on the server using the servlet method for which code is listed below. I
have verified that the files written are complete. I don't remember why the
writing method used non-analogous classes to those used in the
reading/deleting method. Would this lead to different String encoding?
Are you sure you're actually using the revised code? I suspect that most
here have had the occasional sheepish "Doh!" moment when we realize that
the reason our debugging isn't going anywhere is that we are still running
the original code instead of the debugged. It has happened to me, at
least.

This thought occurred to me part way through testing but I satisfied myself
this was not the problem for the following reason: when I modified the
servlet code to remove file deletion, the file was not deleted.

Source code for writing method:

final synchronized String write(String fullPathAndFileString, String actionString, String dataString)
{
    String reportString = "";
    FileWriter fileWriter = null;
    PrintWriter printWriter = null;
    try
    {
        fileWriter = new FileWriter(fullPathAndFileString, actionString.equals("append"));
        printWriter = new PrintWriter(fileWriter);
        printWriter.println(dataString);
        reportString = "OK";
    }
    catch (IOException e)
    {
        reportString = "error";
    }
    finally
    {
        try
        {
            if (printWriter != null) printWriter.close();
            if (fileWriter != null) fileWriter.close();
        }
        catch (IOException ignored) {}
    }
    return(reportString);
}
 
John C. Bollinger

Mickey said:
When the file is about 4000 bytes the truncation occurs almost always. The
truncation seems much less frequent for files of 2000 bytes.




I had full Exception code in the test version that I ran in a local applet
and never got any Exceptions. Of course that could be different running as
a servlet, but it seemed unlikely that the servlet was exiting with an
Exception before file reading was complete since execution proceeds on after
file reading to the next step of file deletion. But something is going
wrong so maybe I should test more explicitly.

You should never catch general Exception and ignore it. If you want to
catch and ignore more specific exceptions then that may be OK, but you
should put a comment in the catch block that explains why you can safely
ignore the exception.
I'll try moving File.delete to after stream closing, but since the problem
occurs with File.delete commented out it does not seem to be the problem.

No, it probably isn't causing your problem, but do observe good form all
the same. Doing so habitually will help you avoid potential problems
you may not even have been aware of.
Which stream counts as the "outermost" stream? FileInputStream or
BufferedInputStream?

The BufferedInputStream, because you passed the FileInputStream to its
constructor. You need not even retain a reference to the
FileInputStream anywhere in your own code, and you absolutely shouldn't
manipulate it while the BufferedInputStream is still in use. (And
probably not afterward, either.)
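
In code terms (fNow as in the quoted method), the chained construction and the
single close look roughly like this:

BufferedInputStream bis = new BufferedInputStream(new FileInputStream(fNow));
try
{
    // ... read from bis ...
}
finally
{
    bis.close(); // also closes the FileInputStream it wraps
}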
The files consist entirely of Strings generated by a Java applet and written
on the server using the servlet method for which code is listed below. I
have verified that the files written are complete. I don't remember why the
writing method used non-analagous classes to those used in the
reading/deleting method. Would this lead to different String encoding?

If the files are written by other code running in the same VM instance,
or by code running in a different instance of the same VM on the same
machine, then there should not be an encoding difference. That does not
ensure that the encoding is suitable for the data, but I doubt whether
your problem runs along those lines.

Here's a thought, though: is the same file written and then read and
deleted within the scope of processing one request, or are the write and
the read / delete split between two different requests? If in two
different requests then could they be in flight concurrently? That
could very easily explain how the files are [appearing to be] truncated.
The file size trend you're seeing could be related to an I/O buffer size.
Source code for writing method:

final synchronized String write(String fullPathAndFileString, String
actionString, String dataString)
{
String reportString = "";
FileWriter fileWriter = null;
PrintWriter printWriter = null;
try
{
fileWriter = new FileWriter(fullPathAndFileString,
actionString.equals("append"));

FileWriter is a broken class. It automatically uses the system's
default charset, with no option to configure a different one. If there
is any reason at all to care about the charset (and there almost always
is at least _some_ reason) then you should instead use a suitably
configured OutputStreamWriter wrapped around a FileOutputStream. You
can wrap that in a PrintWriter if you like (a sketch follows below the
quoted code).
printWriter = new PrintWriter(fileWriter);
printWriter.println(dataString);
reportString = "OK";
}
catch (IOException e)
{
reportString = "error";
}

[...]
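
A sketch of the write method with that change (UTF-8 is assumed here only so
that both sides state the same charset explicitly):

final synchronized String write(String fullPathAndFileString, String actionString, String dataString)
{
    String reportString = "";
    PrintWriter printWriter = null;
    try
    {
        boolean append = actionString.equals("append");
        // explicit charset on the writing side, matching whatever the reading side uses
        printWriter = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(fullPathAndFileString, append), "UTF-8"));
        printWriter.println(dataString);
        reportString = "OK";
    }
    catch (IOException e)
    {
        reportString = "error";
    }
    finally
    {
        // closing the PrintWriter closes the wrapped Writer and stream
        if (printWriter != null) printWriter.close();
    }
    return(reportString);
}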

John Bollinger
 
Mickey Segal

John C. Bollinger said:
Here's a thought, though: is the same file written and then read and
deleted within the scope of processing one request, or are the write and
the read / delete split between two different requests? If in two
different requests then could they be in flight concurrently? That could
very easily explain how the files are [appearing to be] truncated. The
file size trend you're seeing could be related to an I/O buffer size.

The servlet write is called by one user and the servlet read/delete is
called by another user, each using Java applets, so there is no concurrency
issue of that sort.

I've now made a test version of this applet so I can fiddle more than I
could with the version that is in use by our software. One of the things I can
now test is whether the problem is reading the file or sending out the
response string.
 
Mickey Segal

Mickey Segal said:
I've now made a test version of this applet so I can fiddle more than I
could with the version that is in use by our software. One of the things I
can now test is whether the problem is reading the file or sending out the
response string.

Using my new test version I can verify that the full string is read properly
by the servlet. The problem seems to be later: sending a large string back
as a response doesn't work. The reportString output of my read_delete
method is handed off to what I thought was pretty generic code in the
servlet's doPost method:

res.setContentType("text/plain");
PrintWriter out = res.getWriter();
out.print(reportString);
out.flush();

Is there some problem using such code for a String of longer than 1124
bytes? Sometimes a longer String is sent without truncation to 1124 bytes,
but usually it gets truncated to 1124 bytes.
 
Mickey Segal

Mickey Segal said:
Using my new test version I can verify that the full string is read
properly by the servlet. The problem seems to be later: sending a large
string back as a response doesn't work.

The problem may be even later, in the receiving code in the applet. I will
re-do that; hopefully it will solve the problem.
 
Steve Horsley

Mickey said:
The problem may be even later, in the receiving code in the applet. I will
re-do that; hopefully it will solve the problem.
Ah! You're not using that nasty available() thing at the receiver, are you?
available() may well return 0 if not all the reply has arrived yet. This
is more likely to happen with larger strings. 1120 could be the size of the
payload after the HTTP header that happens to fit in one packet.
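
If the applet reads the reply through a URLConnection, a loop along these lines
reads until end-of-stream instead of trusting available() (a sketch only: the
URL is a placeholder, UTF-8 is an assumption, and the statements would sit in a
method that declares throws IOException):

// read the whole reply, however many packets it arrives in
URLConnection conn = new URL("http://example.com/servlet/report").openConnection();
BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"));
StringBuffer reply = new StringBuffer();
char[] buf = new char[4096];
int n;
while ((n = in.read(buf)) > 0)
{
    reply.append(buf, 0, n);
}
in.close();
String reportString = reply.toString();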

Steve
 
