Try to read a file faster

A

aquafresh3

Hello,

I have to analyse the content of a file to find specific words in it.
At first I was using the BufferedReader.readLine() method to read the
file, but sometimes the file consists of a single line as large as
30 MB (e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method that reads a line with a maximum
of 204800 characters per line.

My problem is that reading the file takes several minutes (about 30
minutes for a 1.68 MB file).

Here is the method I use to read the file

/**
 * Reads a line with a maximum of 204800 characters per line.
 * @param br the BufferedReader to read from
 * @return the line read, or null at end of file
 * @throws IOException
 */
private String readLineWithMaxSize(BufferedReader br) throws IOException {
    String finalLine = null;
    int readCharacter = -1;
    char[] lineChars = new char[204800];
    boolean bufferFull = false;
    if (br != null) {
        int index = 0;
        readCharacter = br.read();
        // Consume characters until a newline or end of file.
        while (readCharacter != -1 && readCharacter != '\r'
                && readCharacter != '\n') {
            // If the buffer is not full, append the character.
            if (!bufferFull) {
                lineChars[index] = (char) readCharacter;
                index++;
                bufferFull = index >= lineChars.length;
            }
            readCharacter = br.read();
        }
        // If the line ends with \r\n, skip the \n as well.
        if (readCharacter == '\r') {
            br.mark(2);
            int nextReadCharacter = br.read();
            if (nextReadCharacter != '\n') {
                br.reset();
            }
        }
        // Build the String from only the characters actually read.
        if (index != 0) {
            finalLine = new String(lineChars, 0, index);
        } else if (readCharacter == '\r' || readCharacter == '\n') {
            finalLine = "";
        }
    }
    return finalLine;
}

Is there a better solution/method to read a big file faster?

Thanks in advance for your kind assistance
 
S

Shanmuhanathan T

Hello,

I have to analyse the content of a file to find specific words in it.
At first I was using the BufferedReader.readLine() method to read the
file, but sometimes the file consists of a single line as large as
30 MB (e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method that reads a line with a maximum
of 204800 characters per line.

My problem is that reading the file takes several minutes (about 30
minutes for a 1.68 MB file).

Here is the method I use to read the file SNIP
Is there a better solution/method to read a big file faster?

Thanks in advance for your kind assistance

Hi,
If you are using JDK version >= 1.4, you could try the
New I/O API (java.nio).

Regards,
 
S

Skip

First I was using the BufferedReader.readLine() method to read the
file, but sometimes the file consists of a single line as large as
30 MB (e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method that reads a line with a maximum
of 204800 characters per line.

Please do NOT read/write binary data with Readers/Writers. They are for
text only.
You could look at java.nio.* in Java 1.4, yes, but java.io.* could be
enough in this case. Take a look at {Buffered | File}Input/OutputStreams.
 
M

Michael Borgwardt

aquafresh3 said:
I have to analyse the content of a file to find specific words in it.
At first I was using the BufferedReader.readLine() method to read the
file, but sometimes the file consists of a single line as large as
30 MB (e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method that reads a line with a maximum
of 204800 characters per line.

My problem is that reading the file takes several minutes (about 30
minutes for a 1.68 MB file).

Here is the method I use to read the file

Your method wastes a lot of time clinging to the illusion of
working with lines, which achieves nothing. It reads one character
at a time and does a lot of absolutely pointless work for each of
them. In the end, your size limit means that you are NOT working
with lines, so why pretend to?

Forget about lines. Forget about BufferedReader. Just use a FileReader
(or, if the encoding is an issue, an InputStreamReader wrapping a
FileInputStream). Use its read(char[]) method with a reasonably sized
char[] array acting as a buffer (don't forget to look at the method's
return value - the buffer is not necessarily filled), and take
care that your word search won't miss words that span the boundaries
between two calls to read().
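
A minimal sketch of such a read loop (the class name, the 8192-char
buffer size, and the counting-only body are my own illustration, not
from the thread; the word search would go where the comment indicates):

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class ChunkReader {
    // Reads the file in fixed-size chunks. read() may fill only part of
    // the buffer, so only buf[0..n) is valid on each pass.
    public static long countChars(String path) throws IOException {
        char[] buf = new char[8192];   // a "reasonably sized" buffer
        long total = 0;
        Reader r = new FileReader(path);
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                total += n;            // scan buf[0..n) for your words here
            }
        } finally {
            r.close();                 // try/finally, as in the thread's era
        }
        return total;
    }
}
```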
 
M

Michael Borgwardt

Skip said:
Please do NOT read/write binary data with Readers/Writers.

He's reading the data to look for "words" in it, so it can't be binary,
at least not all of it.
 
B

bugbear

Michael said:
Your method wastes a lot of time clinging to the illusion of
working with lines, which achieves nothing. It reads one character
at a time and doeas a lot of absolutely pointless work for each of
them. In the end, your size limit means that you are NOT working
with lines, so why pretend to?

Forget about lines. Forget about BufferedReader. Just use a FileReader
(or, if the encoding is an issue, an InputStreamReader wrapping a
FileInputStream). Use its read(char[]) method with a reasonably sized
char[] array acting as buffer (don't forget to look at the method's
return value - the buffern is not necessarily filled), and take
care that your word search won't miss words that span the boundaries
between two calls to read().

Agreed; to achieve this, process all the words from the
buffer until a word ends at the end of the buffer.

This word *may* be partial.

Copy the partial word "tail" down to the base of your buffer,
and read some more. (Note that read() can read at an offset into the
buffer.)

Repeat.

If you can do all this in 1 allocated buffer you'll also
do fewer new byte[] allocations, which can't hurt.
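
Putting those steps together as Java, a sketch that counts matches of a
single word (the buffer size, names, and single-word simplification are
my own choices, and I assume the word is far shorter than the buffer):

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class OverlapSearch {
    // Counts occurrences of `word`, copying a tail of word.length()-1
    // characters down to the base of the buffer between reads so that
    // matches spanning two reads are still found. Nothing is counted
    // twice: no full match can fit inside the short tail alone.
    public static int countOccurrences(String path, String word) throws IOException {
        char[] buf = new char[8192];
        int keep = word.length() - 1;      // longest possible partial match
        int count = 0;
        Reader r = new FileReader(path);
        try {
            int filled = 0;                // valid chars currently in buf
            int n;
            while ((n = r.read(buf, filled, buf.length - filled)) != -1) {
                filled += n;
                String chunk = new String(buf, 0, filled);
                for (int i = chunk.indexOf(word); i != -1;
                         i = chunk.indexOf(word, i + 1)) {
                    count++;
                }
                // Copy the tail to the base of the same buffer, then
                // read the next chunk in at offset `filled`.
                int tail = Math.min(keep, filled);
                System.arraycopy(buf, filled - tail, buf, 0, tail);
                filled = tail;
            }
        } finally {
            r.close();
        }
        return count;
    }
}
```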

BugBear
 
J

JScoobyCed

Michael said:
He's reading the data to look for "words" in it, so it can't be binary,
at least not all of it.

Well... not sure: he says he is reading a .rar file.
 
S

Steve Horsley

aquafresh3 said:
Hello,

I have to analyse the content of a file to find specific words in it.
At first I was using the BufferedReader.readLine() method to read the
file, but sometimes the file consists of a single line as large as
30 MB (e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method that reads a line with a maximum
of 204800 characters per line.

My problem is that reading the file takes several minutes (about 30
minutes for a 1.68 MB file).

For a start, try working in bytes rather than text. This removes
a lot of character-set conversion work, and discourages creation of
lots of String objects.

Use String.getBytes(charsetName) to retrieve the byte sequences
you are searching for.

Use a BufferedInputStream to read the bytes into a sensibly sized
byte[] - maybe a few K. Then use a byte-pattern-matching routine to
scan for any of the desired byte sequences. After you have scanned
the buffer, copy the tail end of the buffer (enough to capture any
search sequences that were incomplete) to the start, then fill it up
with more data and search again.

I think this should go quite fast.
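
A sketch of that byte-level approach (the names and buffer size are
mine, and I've simplified to a single search pattern where Steve
describes scanning for several at once):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteSearch {
    // Converts the search word to bytes once via String.getBytes(charsetName),
    // then scans raw bytes; no Strings are created while scanning.
    public static boolean contains(String path, String word, String charset)
            throws IOException {
        byte[] target = word.getBytes(charset);
        byte[] buf = new byte[8192];
        int keep = target.length - 1;  // overlap carried between reads
        InputStream in = new BufferedInputStream(new FileInputStream(path));
        try {
            int filled = 0;
            int n;
            while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
                filled += n;
                // Naive byte-pattern match over the valid region.
                for (int i = 0; i + target.length <= filled; i++) {
                    int j = 0;
                    while (j < target.length && buf[i + j] == target[j]) j++;
                    if (j == target.length) return true;
                }
                // Copy the possibly-incomplete tail to the start and refill.
                int tail = Math.min(keep, filled);
                System.arraycopy(buf, filled - tail, buf, 0, tail);
                filled = tail;
            }
        } finally {
            in.close();
        }
        return false;
    }
}
```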

Steve
 
W

Will Hartung

aquafresh3 said:
Hello,

I have to analyse the content of a file to find specific words in it.
At first I was using the BufferedReader.readLine() method to read the
file, but sometimes the file consists of a single line as large as
30 MB (e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method that reads a line with a maximum
of 204800 characters per line.

My problem is that reading the file takes several minutes (about 30
minutes for a 1.68 MB file).

Here is the method I use to read the file

If you're working with text files, use Readers/Writers. If you're working
with binary files, Input/OutputStreams. If you're working with text and
happen to have an intimate understanding of Unicode, then you can use
Streams and code that knowledge into your program. (I don't have intimate
knowledge of Unicode, so I let the Readers/Writers do the work for me.)

If you KNOW that the file will ALWAYS fit in your heap space, then:

public char[] readIt(File f) throws Exception {
    int size = (int) f.length();  // assumes the length fits in an int
    char[] buf = new char[size];
    BufferedReader br = new BufferedReader(new FileReader(f));
    br.read(buf, 0, size);        // may read fewer chars than bytes
    br.close();
    return buf;
}

Otherwise you have to page the file in through a smaller buffer, and deal
with that.

The NIO mapping functions MAY be faster, but probably not enough to worry
about unless you're dealing with ENORMOUS files. It would certainly be
more complicated.

Regards,

Will Hartung
([email protected])
 
R

Richard

aquafresh3 said:
Hello,

I have to analyse the content of a file to find specific words in it.
At first I was using the BufferedReader.readLine() method to read the
file, but sometimes the file consists of a single line as large as
30 MB (e.g. a .rar file), which resulted in an OutOfMemoryError.
So I decided to use another method that reads a line with a maximum
of 204800 characters per line.

My problem is that reading the file takes several minutes (about 30
minutes for a 1.68 MB file).

Here is the method I use to read the file SNIP

Is there a better solution/method to read a big file faster?

Thanks in advance for your kind assistance

Here's how I did it
http://homepage.ntlworld.com/j.palethorpe/Programming/ITF/

The source code is in the jar file, it's a bit messy, but it works.
 