java.io.File to java.lang.String

Benjamin

What's the best way to get the contents of a file represented by a
java.io.File object into a String?
 
Knute Johnson

Benjamin said:
What's the best way to get the contents of a file represented by a
java.io.File object into a String?

You don't specify what you consider best, so how about simple as best?
Now for my curiosity, why would you want to do this?

import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        File f = new File(args[0]);
        FileInputStream fis = new FileInputStream(f);
        int n;
        while ((n = fis.read()) != -1)
            baos.write(n);
        fis.close();
        String str = baos.toString();
        System.out.println(str);
    }
}
 
Tom Hawtin

import java.io.*;

public class test {
public static void main(String[] args) throws Exception {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
File f = new File(args[0]);
FileInputStream fis = new FileInputStream(f);

Going for a FileReader would probably be better.

The next few lines should be wrapped in a try/finally.
int n;
while ((n = fis.read()) != -1)

One byte at a time. Not going to be fast.
baos.write(n);
fis.close();
String str = baos.toString();

I don't believe that will do anything useful.
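Putting Tom's points together (use a Reader, wrap the stream handling in try/finally, and read more than one char at a time), a minimal sketch might look like the following; the class and method names are my own, not anything from the thread:

```java
import java.io.*;

public class ReadToString {
    // Read an entire text file into a String: a char[] buffer instead of
    // one-byte-at-a-time reads, and try/finally so the Reader is always closed.
    public static String readAll(File f) throws IOException {
        Reader in = new FileReader(f);
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8192];
            int n;
            // read() returns the number of chars read, or -1 at end of stream
            while ((n = in.read(buf)) != -1)
                sb.append(buf, 0, n);
            return sb.toString();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0)
            System.out.println(readAll(new File(args[0])));
    }
}
```

Note this still uses the platform default encoding, as FileReader always does.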
 
Jeff Higgins

Tom said:
One byte at a time. Not going to be fast.

Hi Tom,

Your comment prompted me to look for ways to do
block (bulk) read operations on text files.

One way I've come up with is below.
Will you comment on this, and can you suggest alternatives?

Thanks,
Jeff Higgins

import java.io.*;
import java.nio.CharBuffer;

public class TestBlockRead {
    public static void main(String[] args) {
        try {
            File file = new File("file.9612544.bytes");
            FileReader fileReader = new FileReader(file);
            CharBuffer charBuffer = CharBuffer.allocate((int) file.length());
            fileReader.read(charBuffer);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
 
Tom Hawtin

Jeff said:
File file = new File("file.9612544.bytes");

Still need try-finally.
FileReader fileReader = new FileReader(file);
CharBuffer charBuffer = CharBuffer.allocate((int)file.length());

This could allocate a buffer three times too large, or way too small for
a huge file. allocateDirect may be a win if it were reused as a
temporary buffer (but I bet the implementation messes up somewhere).
fileReader.read(charBuffer);

This does not necessarily read all that could be read. Should be in a loop.

Tom Hawtin
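For the record, one way to put the read(CharBuffer) call in a loop, as Tom suggests, is to keep reading until -1 is returned. This sketch (names mine) still sizes the buffer from file.length(), so the caveat above about encodings applies; it assumes a single-byte encoding:

```java
import java.io.*;
import java.nio.CharBuffer;

public class LoopRead {
    // A single read(CharBuffer) call may return before end of file,
    // so keep calling read() until it reports -1.
    public static String readAll(File file) throws IOException {
        FileReader in = new FileReader(file);
        try {
            // one spare slot so hasRemaining() can tell "buffer full"
            // apart from "end of file"
            CharBuffer buf = CharBuffer.allocate((int) file.length() + 1);
            while (buf.hasRemaining() && in.read(buf) != -1)
                ; // keep filling
            buf.flip();                // switch from writing to reading
            return buf.toString();     // chars between position and limit
        } finally {
            in.close();
        }
    }
}
```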
 
Benjamin

You don't specify what you consider best so how about simple as best?
Now for my curiosity, why would you want to do this?
You're right, I should be more specific. I did mean simplest. I am
writing a program which requires me to retrieve the full content of a
file.
import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        File f = new File(args[0]);
        FileInputStream fis = new FileInputStream(f);
        int n;
        while ((n = fis.read()) != -1)
            baos.write(n);
        fis.close();
        String str = baos.toString();
        System.out.println(str);
    }
}
 
Jeff Higgins

Tom said:
Still need try-finally.

Yes, thanks.
This could allocate a buffer three times too large,

Going over Javadocs... could you elaborate?
or way too small for a huge file.

OK, yes.

allocateDirect may be a win if it were reused as a
temporary buffer (but I bet the implementation messes up somewhere).

Skipping over this for the time being.
This does not necessarily read all that could be read. Should be in a
loop.

Again, I'm sorry but I haven't been able to figure out what might
cause read(charBuffer) to not read all that could be read?

Is this a sufficient loop?
while (fileReader.ready()) { fileReader.read(charBuffer); }

Appreciate your help.

Thanks,

Jeff Higgins
 
Lew

Jeff said:
Going over Javadocs... could you elaborate?

Because Strings and chars are encoded, as are files. UTF-8, for example, uses
one to four bytes per character, depending on the character.

I'm not sure about how Tom arrived at three times as large but I can easily
see how the CharBuffer could be twice as large as the file data. CharBuffers
are allocated at two bytes per character. A file encoding that uses 8 bits
per character will only fill half such a buffer. I'm guessing that Tom is
familiar with some combination of encoding schemes that would have the
CharBuffer wind up three times too large for the file.

The buffer could instead end up too small if the file uses a multibyte
encoding with lots of characters that require more than two bytes each.
Again, I'm sorry but I haven't been able to figure out what might
cause read(charBuffer) to not read all that could be read?

Is this a sufficient loop?
while (fileReader.ready()) { fileReader.read(charBuffer); }

No. You'll have to fill the buffer, flip() it, read it to store or process
the data, then rewind() and repeat. I haven't played with java.nio much but
if I erred here someone should step up and correct me pretty quickly.

<http://java.sun.com/developer/technicalArticles/releases/nio/index.html>
<http://www.javaworld.com/javaworld/jw-09-2001/jw-0907-merlin.html>

GIYF.
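A fill/flip loop along the lines Lew describes might look like the sketch below (names mine). One detail differs from his wording: clear() rather than rewind() before the next read, so each pass starts with an empty buffer:

```java
import java.io.*;
import java.nio.CharBuffer;

public class FillFlipLoop {
    // Reuse one fixed-size buffer for a file of any length:
    // fill it, flip() to read mode, drain it into the StringBuilder,
    // then clear() and repeat until read() returns -1.
    public static String readAll(File file) throws IOException {
        Reader in = new FileReader(file);
        try {
            CharBuffer buf = CharBuffer.allocate(4096);
            StringBuilder sb = new StringBuilder();
            while (in.read(buf) != -1) {
                buf.flip();      // limit = chars read, position = 0
                sb.append(buf);  // appends position..limit
                buf.clear();     // ready for the next fill
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }
}
```

Unlike the allocate((int) file.length()) approach, the buffer size here is independent of the file size, so there is no over- or under-allocation to worry about.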
 
Jeff Higgins

Lew said:
Because Strings and Chars are encoded, as are files. ...

OK, chars are not bytes. (int)file.length() not a good choice here.

if file.length() > Integer.MAX_VALUE file == huge file
No. You'll have to fill the buffer, flip() it, read it to store or
processe the data, then rewind() and repeat. I haven't played with
java.nio much but if I erred here someone should step up and correct me
pretty quickly.

Going back over Javadocs -- silly condition.

Thanks for the pointers. I read the javaworld article, very interesting.
GIGR. The Google is a great resource.

Back to the OP which caught my eye, and to Tom's response,
"One byte at a time. Not going to be fast."

OK, scratch the CharBuffer solution. Now my latest solution:
[snippet]

startBlock = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
    File file = new File("file.9612544.bytes");
    byte[] a = new byte[(int) file.length()];
    FileInputStream fis = new FileInputStream(file);
    fis.read(a);
    String str = new String(a, "US-ASCII");
    fis.close();
}
endBlock = System.currentTimeMillis();

startLoop = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
    File file = new File("file.9612544.bytes");
    byte[] a = new byte[(int) file.length()];
    FileInputStream fis = new FileInputStream(file);
    int n;
    int c = 0;
    while ((n = fis.read()) != -1) {
        a[0] = (byte)n;
    }
    String str = new String(a, "US-ASCII");
    fis.close();
}
endLoop = System.currentTimeMillis();

Block 1547
Loop 287750

Thanks,
appreciate the OP
and all the comments.
Jeff Higgins
 
Knute Johnson

Jeff said:
Lew said:
Because Strings and Chars are encoded, as are files. ...

OK, chars are not bytes. (int)file.length() not a good choice here.

if file.length() > Integer.MAX_VALUE file == huge file
No. You'll have to fill the buffer, flip() it, read it to store or
process the data, then rewind() and repeat. I haven't played with
java.nio much but if I erred here someone should step up and correct me
pretty quickly.

Going back over Javadocs -- silly condition.

Thanks for the pointers. I read the javaworld article, very interesting.
GIGR. The Google is a great resource.

Back to the OP which caught my eye, and to Tom's response,
"One byte at a time. Not going to be fast."

OK, scratch the CharBuffer solution. Now my latest solution:
[snippet]

startBlock = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
    File file = new File("file.9612544.bytes");
    byte[] a = new byte[(int) file.length()];
    FileInputStream fis = new FileInputStream(file);
    fis.read(a);

This may or may not read as many bytes as the length of the array a and
is therefore guaranteed not to work every time. See the docs.

    String str = new String(a, "US-ASCII");
    fis.close();
}
endBlock = System.currentTimeMillis();
startLoop = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
    File file = new File("file.9612544.bytes");
    byte[] a = new byte[(int) file.length()];
    FileInputStream fis = new FileInputStream(file);
    int n;
    int c = 0;
    while ((n = fis.read()) != -1) {
        a[0] = (byte)n;

        a[c++] = (byte)n;
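Pulling both corrections together: the byte-at-a-time loop needs `a[c++]` rather than `a[0]`, and the bulk read(byte[]) call needs to loop until the array is full, since a single call may return fewer bytes than requested. A sketch of the bulk version (names mine; DataInputStream.readFully does essentially the same job):

```java
import java.io.*;

public class BlockRead {
    // Fill the whole array, looping because read(byte[], off, len)
    // may return early with fewer bytes than asked for.
    public static byte[] readFully(File file) throws IOException {
        byte[] a = new byte[(int) file.length()];
        FileInputStream fis = new FileInputStream(file);
        try {
            int off = 0;
            while (off < a.length) {
                int n = fis.read(a, off, a.length - off);
                if (n == -1)
                    throw new EOFException("file shrank while reading");
                off += n;
            }
            return a;
        } finally {
            fis.close();
        }
    }
}
```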
 
Guest

Knute said:
Jeff said:
File file = new File("file.9612544.bytes");
byte[] a = new byte[(int)file.length()];
FileInputStream fis = new FileInputStream(file);
fis.read(a);

This may or may not read as many bytes as the length of the array a and
is therefore guaranteed not to work every time. See the docs.

s/guaranteed not/not guaranteed/w

Arne
 
Esmond Pitt

Jeff said:
Again, I'm sorry but I haven't been able to figure out what might
cause read(charBuffer) to not read all that could be read?

The fact that the Javadoc specifically says so?
 
Patricia Shanahan

Jeff said:
:) Yup, it is what it is.
Better for me to focus on what rather than why.

I think the "why" is because part of the file may be buffered in memory.
Disk reads are always in fixed block sizes, and the data required to
fill the program buffer may cross block boundaries.

Suppose some, but not all, of the data for the read call is already in
memory. The system could make you wait many milliseconds for a physical
read to let it fill your buffer. It is often more efficient to let you
get on with processing the data that is already available, in parallel
with a physical read to get more data. For example, the read call may be
being issued by a BufferedReader doing a readLine, and it can return
data to its caller as soon as it has a whole line.

Patricia
 
Roedy Green

For example, the read call may be
being issued by a BufferedReader doing a readLine, and it can return
data to its caller as soon as it has a whole line.

Even though we did double buffering and the like back in the days of
16K machines, I don't think java.io itself is that smart. I don't
think it is clever enough to read ahead another buffer while processing
the previous one, or to let you start processing lines before the
I/O completes.
 
Jeff Higgins

Patricia said:
I think the "why" is because part of the file may be buffered in memory.
Disk reads are always in fixed block sizes, and the data required to
fill the program buffer may cross block boundaries.

Suppose some, but not all, of the data for the read call is already in
memory. The system could make you wait many milliseconds for a physical
read to let it fill your buffer. It is often more efficient to let you
get on with processing the data that is already available, in parallel
with a physical read to get more data. For example, the read call may be
being issued by a BufferedReader doing a readLine, and it can return
data to its caller as soon as it has a whole line.
LOL :) What us noobs won't go through to gain a little understanding!
Yup, during the course of this discussion I spent a good bit of energy
exploring some of the issues you describe. Mostly what I took away from it
was: when using the basic IO facilities I should be concentrating on what
I'm hoping to accomplish, not on how the JVM is fetching bytes from
whatever physical medium.

What caused most of my confusion, I suppose, was that I didn't have a real
use-case in mind for this exploration. The OP wanted to know how to read
the contents of a file into a String, and I immediately reacted by trying
to find a solution to that problem, when I may well have been better off
asking "What am I hoping to accomplish here?". When given the advice,
"This does not necessarily read all that could be read. Should be in a
loop.", and after having consulted the javadocs, my next question should
probably have been "OK, now what?" instead of "Well, why not?".

Anyway, it's been a pleasant line of inquiry, and fun.
Thanks for the response, much appreciated.
JH
 
