Try to read a file faster

Discussion in 'Java' started by aquafresh3, Sep 29, 2004.

  1. aquafresh3

    aquafresh3 Guest

    Hello,

    I have to analyse the content of a file to find some specific words in
    it.
    First I was using the BufferedReader.readLine() method to read my
    file, but sometimes the file consists of only one line with a
    size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    So I decided to use another method, which reads a line with a
    maximum of 204800 characters per line.

    My problem is that reading the file takes a long time (about 30
    minutes for a 1.68 MB file).

    Here is the method I use to read the file

    /**
     * This method is used to read a line with a maximum of characters
     * read set to 204800 per line
     * @param br The BufferedReader on which read
     * @return finalLine The String representing the line read
     * @throws IOException
     */
    private String readLineWithMaxSize(BufferedReader br) throws IOException {
        String finalLine = null;
        int readCharacter = -1;
        char[] lineChars = new char[204800];
        boolean bufferFull = false;
        if (br != null) {
            int index = 0;
            readCharacter = br.read();
            // If the read character does not correspond to a new line or to
            // an end of file, we treat it.
            while (readCharacter != -1 && readCharacter != '\r'
                    && readCharacter != '\n') {
                // if the buffer is not full, we add the character to
                // the array of characters
                if (!bufferFull) {
                    lineChars[index] = (char) readCharacter;
                    index++;
                    bufferFull = index >= lineChars.length;
                }
                readCharacter = br.read();
            }
            // If the read character is \r and the next one is \n, we skip it.
            if (readCharacter == '\r') {
                br.mark(2);
                int nextReadCharacter = br.read();
                if (nextReadCharacter != '\n') {
                    br.reset();
                }
            }
            // We construct a string representing the line from the buffer of
            // characters read
            if (index != 0) {
                finalLine = new String(lineChars);
            } else if (readCharacter == '\r' || readCharacter == '\n') {
                finalLine = "";
            }
        }
        return finalLine;
    }

    Is there a better solution/method to read a big file faster?

    Thanks in advance for your kind assistance
    aquafresh3, Sep 29, 2004
    #1

  2. on 9/29/2004 1:53 PM aquafresh3 Wrote:
    > Hello,
    >
    > I have to analyse the content of a file to find some specific words in
    > it.
    > First I was using the BufferedReader.readLine() method to read my
    > file, but sometimes the file consists of only one line with a
    > size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    > So I decided to use another method, which reads a line with a
    > maximum of 204800 characters per line.
    >
    > My problem is that reading the file takes a long time (about 30
    > minutes for a 1.68 MB file).
    >
    > Here is the method I use to read the file

    SNIP
    > Is there a better solution/method to read a big file faster?
    >
    > Thanks in advance for your kind assistance


    Hi,
    If you are using JDK version >= 1.4, you could try with the
    Native IO Methods (NIO).

    Regards,
    --
    Shanmu.
    Shanmuhanathan T, Sep 29, 2004
    #2

  3. aquafresh3

    Skip Guest

    > First I was using the BufferedReader.readLine() method to read my
    > file, but sometimes the file consists of only one line with a
    > size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    > So I decided to use another method, which reads a line with a
    > maximum of 204800 characters per line.


    Please do NOT read/write binary data with Readers/Writers. They are for text
    only.
    You could look at java.nio.* in Java 1.4, yes, but java.io.* could be enough
    in this case. Take a look at {Buffered | File}Input/OutputStreams.
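
To illustrate why binary data and Readers do not mix, here is a small sketch. The class and method names (ReaderRoundTrip, throughReader) are made up for this example, and UTF-8 is an assumed encoding; it also uses try-with-resources, so it needs a newer JDK than 1.4. It pushes a few bytes through an InputStreamReader and encodes the resulting text back to bytes. Bytes that are not valid in the chosen encoding get replaced during decoding, so the round trip does not return the original data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReaderRoundTrip {
    /** Decodes the bytes through a Reader as UTF-8, then encodes the text back to bytes. */
    static byte[] throughReader(byte[] original) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(original), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            // read() reports how many chars it actually delivered
            while ((n = r.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary sample bytes; 0xFF and 0xFE are never valid in UTF-8.
        byte[] binary = { 0x52, 0x61, 0x72, 0x21, (byte) 0xFF, (byte) 0xFE, 0x00 };
        byte[] back = throughReader(binary);
        System.out.println("unchanged after round trip: " + Arrays.equals(binary, back));
    }
}
```

Pure ASCII text survives the same round trip unchanged, which is why the problem tends to show up only once a file contains real binary content.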
    Skip, Sep 29, 2004
    #3
  4. aquafresh3 wrote:
    > I have to analyse the content of a file to find some specific words in
    > it.
    > First I was using the BufferedReader.readLine() method to read my
    > file, but sometimes the file consists of only one line with a
    > size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    > So I decided to use another method, which reads a line with a
    > maximum of 204800 characters per line.
    >
    > My problem is that reading the file takes a long time (about 30
    > minutes for a 1.68 MB file).
    >
    > Here is the method I use to read the file


    Your method wastes a lot of time clinging to the illusion of
    working with lines, which achieves nothing. It reads one character
    at a time and does a lot of absolutely pointless work for each of
    them. In the end, your size limit means that you are NOT working
    with lines, so why pretend to?

    Forget about lines. Forget about BufferedReader. Just use a FileReader
    (or, if the encoding is an issue, an InputStreamReader wrapping a
    FileInputStream). Use its read(char[]) method with a reasonably sized
    char[] array acting as buffer (don't forget to look at the method's
    return value - the buffer is not necessarily filled), and take
    care that your word search won't miss words that span the boundaries
    between two calls to read().
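
A minimal sketch of that approach, assuming a single search word. The names ChunkedSearch, contains and chunkSize are invented for this example, not taken from the thread. It reads blocks with read(char[]), honours the return value, and carries the last word.length()-1 characters over to the next block so that a match spanning two read() calls is not missed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ChunkedSearch {
    /** Returns true if word occurs anywhere in the text delivered by in. */
    static boolean contains(Reader in, String word, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (word.isEmpty()) {
            return true;
        }
        char[] buf = new char[chunkSize];
        // Last word.length()-1 chars seen so far; a match that straddles
        // two read() calls has to start somewhere in here.
        String carry = "";
        int n;
        while ((n = in.read(buf)) != -1) {   // n may be smaller than buf.length
            String window = carry + new String(buf, 0, n);
            if (window.indexOf(word) >= 0) {
                return true;
            }
            int keep = Math.min(word.length() - 1, window.length());
            carry = window.substring(window.length() - keep);
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // A tiny chunk size forces "needle" to span several reads.
        Reader in = new StringReader("hay hay needle hay");
        System.out.println(contains(in, "needle", 4));
    }
}
```

For a real file the Reader would be a FileReader, or an InputStreamReader wrapping a FileInputStream when the encoding matters, with a chunk size of a few thousand chars. Building a String per block is not the cheapest option, but it keeps the sketch short.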
    Michael Borgwardt, Sep 29, 2004
    #4
  5. Skip wrote:

    >>First I was using the BufferedReader.readLine() method to read my
    >>file, but sometimes the file consists of only one line with a
    >>size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    >>So I decided to use another method, which reads a line with a
    >>maximum of 204800 characters per line.

    >
    >
    > Please do NOT read/write binary data with Readers/Writers.


    He's reading the data to look for "words" in it, so it can't be binary,
    at least not all of it.
    Michael Borgwardt, Sep 29, 2004
    #5
  6. aquafresh3

    bugbear Guest

    Michael Borgwardt wrote:

    >
    > Your method wastes a lot of time clinging to the illusion of
    > working with lines, which achieves nothing. It reads one character
    > at a time and does a lot of absolutely pointless work for each of
    > them. In the end, your size limit means that you are NOT working
    > with lines, so why pretend to?
    >
    > Forget about lines. Forget about BufferedReader. Just use a FileReader
    > (or, if the encoding is an issue, an InputStreamReader wrapping a
    > FileInputStream). Use its read(char[]) method with a reasonably sized
    > char[] array acting as buffer (don't forget to look at the method's
    > return value - the buffer is not necessarily filled), and take
    > care that your word search won't miss words that span the boundaries
    > between two calls to read().


    Agreed; to achieve this, process all the words from the
    buffer until a word ends at the end of the buffer.

    This word *may* be partial.

    Copy the partial word "tail" down to the base of your buffer,
    and read some more. (note that read will read at an offset in the
    buffer).

    Repeat.

    If you can do all this in one allocated buffer, you'll also
    do fewer new char[] allocations, which can't hurt.
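
The steps above could look roughly like the sketch below. It assumes that "words" are separated by whitespace, and the names TailCopyWordScanner, countMatches and bufSize are invented for illustration. One char[] is allocated up front; after each pass the unfinished word at the end is copied down to index 0 and the next read() fills the buffer from that offset. A word longer than the whole buffer is simply skipped here, which is a simplification.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Set;

final class TailCopyWordScanner {
    /** Counts how many whitespace-separated words from in are contained in wanted. */
    static int countMatches(Reader in, Set<String> wanted, int bufSize) throws IOException {
        if (bufSize <= 0) {
            throw new IllegalArgumentException("bufSize must be positive");
        }
        char[] buf = new char[bufSize];   // the only buffer allocation
        int filled = 0;                   // number of valid chars in buf
        int matches = 0;
        boolean skipping = false;         // inside a word too long for buf
        int n;
        while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
            filled += n;
            int start = 0;                // start of the word being scanned
            for (int i = 0; i < filled; i++) {
                if (Character.isWhitespace(buf[i])) {
                    if (!skipping && i > start
                            && wanted.contains(new String(buf, start, i - start))) {
                        matches++;
                    }
                    skipping = false;
                    start = i + 1;
                }
            }
            // buf[start..filled) is a word that may continue in the next read.
            int tail = filled - start;
            if (skipping || tail == buf.length) {
                skipping = true;          // oversized word: drop it, skip to its end
                tail = 0;
            }
            System.arraycopy(buf, start, buf, 0, tail);   // copy the tail to the base
            filled = tail;
        }
        // Whatever is left at end of input is the last word.
        if (!skipping && filled > 0 && wanted.contains(new String(buf, 0, filled))) {
            matches++;
        }
        return matches;
    }
}
```

The sketch still creates one String per word for the set lookup; avoiding that would need a comparison that works directly on the char[] range.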

    BugBear
    bugbear, Sep 29, 2004
    #6
  7. aquafresh3

    JScoobyCed Guest

    Michael Borgwardt wrote:

    > Skip wrote:
    >
    >>> First I was using the BufferedReader.readLine() method to read my
    >>> file, but sometimes the file consists of only one line with a
    >>> size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    >>> So I decided to use another method, which reads a line with a
    >>> maximum of 204800 characters per line.

    >>
    >>
    >>
    >> Please do NOT read/write binary data with Readers/Writers.

    >
    >
    > He's reading the data to look for "words" in it, so it can't be binary,
    > at least not all of it.


    Well... not sure: he says he is reading a .rar file.

    --
    JScoobyCed
    What about a JScooby snack Shaggy ? ... Shaggy ?!
    JScoobyCed, Sep 29, 2004
    #7
  8. aquafresh3

    Sudsy Guest

    Shanmuhanathan T wrote:
    <snip>
    > If you are using JDK version >= 1.4, you could try with the
    > Native IO Methods (NIO).


    NIO stands for New I/O, not Native I/O. Documentation can be found here:
    <http://java.sun.com/j2se/1.4.2/docs/guide/nio/>
    Sudsy, Sep 29, 2004
    #8
  9. on 9/29/2004 6:19 PM Sudsy Wrote:
    > Shanmuhanathan T wrote:
    > <snip>
    >
    >> If you are using JDK version >= 1.4, you could try with the
    >> Native IO Methods (NIO).

    >
    >
    > NIO stands for New I/O, not Native I/O. Documentation can be found here:
    > <http://java.sun.com/j2se/1.4.2/docs/guide/nio/>
    >

    Thanks Sudsy.
    Must have been confused when I posted that.
    Regards,
    --
    Shanmu.
    Shanmuhanathan T, Sep 29, 2004
    #9
  10. aquafresh3 wrote:
    > Hello,
    >
    > I have to analyse the content of a file to find some specific words in
    > it.
    > First I was using the BufferedReader.readLine() method to read my
    > file, but sometimes the file consists of only one line with a
    > size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    > So I decided to use another method, which reads a line with a
    > maximum of 204800 characters per line.
    >
    > My problem is that reading the file takes a long time (about 30
    > minutes for a 1.68 MB file).
    >


    For a start, try working in bytes rather than text. This removes
    a lot of characterset conversion work, and discourages creation of
    lots of String objects.

    Use String.getBytes(charsetName) to retrieve the byte sequences
    you are searching for.

    Use a BufferedInputStream to read the bytes into a sensible-sized
    byte[] - maybe a few K. Then use a byte-pattern-matching routine to
    scan for any of the desired byte sequences. After you have scanned
    the buffer, copy the tail-end of the buffer (enough to capture any
    search sequences that were incomplete) to the start, and fill up
    with more data and search again.

    I think this should go quite fast.
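
A rough sketch of this byte-oriented variant. ByteScanner, containsAny and the naive indexOf helper are names invented here; a real implementation might use a smarter matching routine. The buffer must be larger than the longest pattern, and after each scan the last longest-1 bytes are copied to the start so that a pattern cut off by the end of the buffer is completed by the next read.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ByteScanner {
    /** Returns true if any of the patterns occurs in the stream. */
    static boolean containsAny(InputStream raw, byte[][] patterns, int bufSize)
            throws IOException {
        int longest = 0;
        for (byte[] p : patterns) {
            longest = Math.max(longest, p.length);
        }
        if (bufSize <= longest) {
            throw new IllegalArgumentException("buffer must exceed the longest pattern");
        }
        InputStream in = new BufferedInputStream(raw);
        byte[] buf = new byte[bufSize];
        int filled = 0;   // number of valid bytes in buf
        int n;
        while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
            filled += n;
            for (byte[] p : patterns) {
                if (indexOf(buf, filled, p) >= 0) {
                    return true;
                }
            }
            // Keep the tail that could hold the start of an incomplete match.
            int tail = Math.min(Math.max(longest - 1, 0), filled);
            System.arraycopy(buf, filled - tail, buf, 0, tail);
            filled = tail;
        }
        return false;
    }

    /** Naive search for p within buf[0..len); returns the index or -1. */
    static int indexOf(byte[] buf, int len, byte[] p) {
        outer:
        for (int i = 0; i + p.length <= len; i++) {
            for (int j = 0; j < p.length; j++) {
                if (buf[i + j] != p[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

The patterns would come from something like "word".getBytes("ISO-8859-1"), using whatever encoding the file is expected to be in; the caller is responsible for closing the stream.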

    Steve
    Steve Horsley, Sep 29, 2004
    #10
  11. aquafresh3

    Will Hartung Guest

    "aquafresh3" <> wrote in message
    news:...
    > Hello,
    >
    > I have to analyse the content of a file to find some specific words in
    > it.
    > First I was using the BufferedReader.readLine() method to read my
    > file, but sometimes the file consists of only one line with a
    > size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    > So I decided to use another method, which reads a line with a
    > maximum of 204800 characters per line.
    >
    > My problem is that reading the file takes a long time (about 30
    > minutes for a 1.68 MB file).
    >
    > Here is the method I use to read the file


    If you're working with text files, use Readers/Writers. If you're working
    with binary files, Input/OutputStreams. If you're working with text and
    happen to have intimate understanding of Unicode, then you can use Streams
    and code that knowledge into your program. (I don't have intimate knowledge
    of Unicode, so I let the Readers/Writers do the work for me).

    If you KNOW that the file will ALWAYS fit in your heap space, then:

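    public char[] readIt(File f)
    throws IOException
    {
        // needs java.io.File, FileReader, BufferedReader and IOException
        // f.length() is a long byte count; this assumes the file is under 2 GB
        // and that no character is decoded from less than one byte.
        char[] buf = new char[(int) f.length()];
        BufferedReader br = new BufferedReader(new FileReader(f));
        try {
            int filled = 0;
            int n;
            // read() may return fewer chars than requested, so loop until EOF
            while (filled < buf.length
                    && (n = br.read(buf, filled, buf.length - filled)) != -1) {
                filled += n;
            }
            return java.util.Arrays.copyOf(buf, filled);
        } finally {
            br.close();
        }
    }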

    Otherwise you have to page the file in through a smaller buffer, and deal
    with that.

    The NIO mapping functions MAY be faster, but probably not enough to worry
    about unless you're dealing with ENORMOUS files. It would certainly be more
    complicated.
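
For comparison, a memory-mapped search might look like the sketch below. MappedSearch and its indexOf method are names made up for this example; FileChannel.map is the actual NIO call. The sketch uses try-with-resources from later JDKs, works on raw bytes, and only handles files that fit in a single mapping (under 2 GB). Whether it is faster than a plain buffered read would have to be measured.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

final class MappedSearch {
    /** Returns the offset of the first occurrence of pattern in the file, or -1. */
    static long indexOf(File file, byte[] pattern) throws IOException {
        try (FileInputStream in = new FileInputStream(file);
             FileChannel ch = in.getChannel()) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single mapping");
            }
            if (size == 0) {
                return pattern.length == 0 ? 0 : -1;
            }
            // The OS pages the file in on demand; no explicit read() calls.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int limit = map.limit();
            outer:
            for (int i = 0; i + pattern.length <= limit; i++) {
                for (int j = 0; j < pattern.length; j++) {
                    if (map.get(i + j) != pattern[j]) {
                        continue outer;
                    }
                }
                return i;
            }
            return -1;
        }
    }
}
```

Because the whole file appears as one buffer, there is no chunk boundary to worry about, which is the main simplification mapping offers over the paged approach.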

    Regards,

    Will Hartung
    Will Hartung, Sep 29, 2004
    #11
  12. aquafresh3

    Richard Guest

    aquafresh3 wrote:
    > Hello,
    >
    > I have to analyse the content of a file to find some specific words in
    > it.
    > First I was using the BufferedReader.readLine() method to read my
    > file, but sometimes the file consists of only one line with a
    > size of 30 MB (e.g. a rar file), so it resulted in an OutOfMemoryError.
    > So I decided to use another method, which reads a line with a
    > maximum of 204800 characters per line.
    >
    > My problem is that reading the file takes a long time (about 30
    > minutes for a 1.68 MB file).
    >
    > Here is the method I use to read the file
    >
    > <snip>
    >
    > Is there a better solution/method to read a big file faster?
    >
    > Thanks in advance for your kind assistance


    Here's how I did it
    http://homepage.ntlworld.com/j.palethorpe/Programming/ITF/

    The source code is in the jar file, it's a bit messy, but it works.
    Richard, Sep 30, 2004
    #12
