Re: Why is java.io.FileInputStream.readBytes my performance bottleneck

Discussion in 'Java' started by Roedy Green, Jul 21, 2003.

  1. Roedy Green


    On 21 Jul 2003 11:16:09 -0700, Harald Kirsch wrote
    or quoted:

    >java.io.FileInputStream.readBytes
    >seems to be the bottleneck


    Try wrapping the FileInputStream in a BufferedInputStream.
    see http://mindprod.com/fileio.html for sample code.
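
    A minimal sketch of that suggestion (not the sample code from the
    page linked above). The class name BufferedCount and the file path
    taken from args[0] are hypothetical, chosen only for illustration.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedCount {
    // Counts bytes one read() call at a time. When the stream is a
    // BufferedInputStream, most calls are served from its in-memory
    // buffer instead of going to the underlying file each time.
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a hypothetical input file path.
        try (InputStream in =
                 new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(countBytes(in));
        }
    }
}
```

    Without the BufferedInputStream wrapper, every read() on the
    FileInputStream is a separate call into native code, which is the
    usual reason a profiler points at the read path.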

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Jul 21, 2003
    #1

  2. Roedy Green wrote:
    > On 21 Jul 2003 11:16:09 -0700, Harald Kirsch wrote
    > or quoted:
    >
    > >java.io.FileInputStream.readBytes
    > >seems to be the bottleneck

    >
    > Try wrapping the FileInputStream in a BufferedInputStream.
    > see http://mindprod.com/fileio.html for sample code.


    Well, yes and no. What I am actually working with in the end
    is a Reader. I used a BufferedInputStream and it changed nothing.
    Then I put a BufferedReader in the chain and the speedup is nice,
    but still not convincing. Funnily enough, java.io.FileInputStream.readBytes
    is still reported by -Xrunhprof:cpu=samples as consuming most of the
    processor time.

    To investigate it a bit, I wrote this dummy test program:


    import java.io.*;

    public class Javabug {
        public static void main(String[] argv)
            throws java.io.IOException
        {
            InputStream is = System.in;
            BufferedInputStream bis = new BufferedInputStream(is);
            InputStreamReader isr = new InputStreamReader(bis);
            BufferedReader br = new BufferedReader(isr);

            // a reader with the stream unbuffered
            BufferedReader xbr = new BufferedReader(new InputStreamReader(is));

            int ch;
            int count = 0;
            while( -1!=(ch=br.read()) ) count += 1;
        }
    }

    By changing the last line, I played with different types of input
    and measured throughput with this command line:

    cpipe -vw </dev/zero |java Javabug

    cpipe (for counting pipe, see freshmeat.net) measures how long it takes
    to write output and prints statistics to stderr as shown below:

    1) reading from InputStreamReader isr (via BufferedInputStream)
    out: 34.307ms at 3.6MB/s ( 3.7MB/s avg) 100.0MB

    This means that after 100MB had been written, the average output rate
    was 3.7MB/s (the last 128k buffer took 34.307ms, equivalent to 3.6MB/s).

    2) reading from BufferedReader br:
    out: 4.698ms at 26.6MB/s ( 25.2MB/s avg) 100.0MB

    3) reading from BufferedReader xbr (stream not buffered)
    out: 4.626ms at 27.0MB/s ( 25.3MB/s avg) 100.0MB


    This shows that buffering the reader helps, but buffering the input
    stream does not seem to help when using a reader. Finally, for a really
    unfair comparison, look at this:

    % cpipe -vw </dev/zero |cat >/dev/null
    out: 0.233ms at 536.5MB/s ( 524.4MB/s avg) 100.0MB

    Of course 'cat' does not have to deal with character encoding.
    Well then, let's compare with buffered reading straight
    from System.in. I understand no character encoding happens then, right?

    4) reading from bis:
    out: 3.666ms at 34.1MB/s ( 32.0MB/s avg) 100.0MB

    Compared with 'cat' this is still very bad, i.e. a factor of 16 slower.

    Nevertheless, I will change from the 1.4.2-beta version to the real thing
    and see what happens.
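
    The four variants measured above can also be selected at run time
    instead of by editing the last line. This is only a sketch: the class
    name ReadVariants and the mode names "isr", "br", "xbr" and "bis" are
    made up here to mirror the variable names in Javabug. The
    InputStreamReader documentation itself recommends wrapping it in a
    BufferedReader for efficiency, which fits the gap between 1) and 2).

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReadVariants {
    // Counts single read() calls until end of input for one chain.
    // The mode names are hypothetical labels for the four test cases.
    static long count(String mode, InputStream is) throws IOException {
        long n = 0;
        if (mode.equals("bis")) {
            // 4) bytes only, no character decoding
            InputStream in = new BufferedInputStream(is);
            while (in.read() != -1) n++;
            return n;
        }
        Reader r;
        if (mode.equals("isr")) {
            // 1) unbuffered reader over a buffered stream
            r = new InputStreamReader(new BufferedInputStream(is));
        } else if (mode.equals("br")) {
            // 2) buffered reader over a buffered stream
            r = new BufferedReader(
                    new InputStreamReader(new BufferedInputStream(is)));
        } else if (mode.equals("xbr")) {
            // 3) buffered reader over the raw stream
            r = new BufferedReader(new InputStreamReader(is));
        } else {
            throw new IllegalArgumentException("unknown mode: " + mode);
        }
        while (r.read() != -1) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        String mode = args.length > 0 ? args[0] : "br";
        System.out.println(count(mode, System.in));
    }
}
```

    Like Javabug, the reader variants use the platform default charset.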

    Harald.
     
    Harald Kirsch, Jul 22, 2003
    #2

  3. Harald Kirsch, Tue, 22 Jul 2003 03:38:20 -0700:

    > Roedy Green wrote:
    >> On 21 Jul 2003 11:16:09 -0700, Harald Kirsch wrote
    >> or quoted:


    > To investigate it a bit, I wrote this dummy test program:
    >
    >
    > import java.io.*;
    >
    > public class Javabug {
    >     public static void main(String[] argv)
    >         throws java.io.IOException
    >     {
    >         InputStream is = System.in;
    >         BufferedInputStream bis = new BufferedInputStream(is);
    >         InputStreamReader isr = new InputStreamReader(bis);
    >         BufferedReader br = new BufferedReader(isr);
    >
    >         // a reader with the stream unbuffered
    >         BufferedReader xbr = new BufferedReader(new InputStreamReader(is));
    >
    >         int ch;
    >         int count = 0;
    >         while( -1!=(ch=br.read()) ) count += 1;
    >     }
    > }
    >
    > By changing the last line, I played with different types of input
    > and measured throughput with this command line:
    >
    > cpipe -vw </dev/zero |java Javabug
    >
    > cpipe (for counting pipe, see freshmeat.net) measures how long it takes
    > to write output and prints statistics to stderr as shown below:
    >
    > 1) reading from InputStreamReader isr (via BufferedInputStream)
    > out: 34.307ms at 3.6MB/s ( 3.7MB/s avg) 100.0MB
    >
    > This means that after 100MB had been written, the average output rate
    > was 3.7MB/s (the last 128k buffer took 34.307ms, equivalent to 3.6MB/s).
    >
    > 2) reading from BufferedReader br:
    > out: 4.698ms at 26.6MB/s ( 25.2MB/s avg) 100.0MB
    >
    > 3) reading from BufferedReader xbr (stream not buffered)
    > out: 4.626ms at 27.0MB/s ( 25.3MB/s avg) 100.0MB
    >
    >
    > This shows that buffering the reader helps, but buffering the input
    > stream does not seem to help when using a reader. Finally, for a really
    > unfair comparison, look at this:
    >


    [---snip--]

    Is this test program valid? System.in usually blocks until a newline is
    entered (that's what I noticed); did you put the terminal into raw mode
    before executing the test?

    Greets
    Bhun.
     
    dhek bhun kho, Jul 22, 2003
    #3
  4. On Tue, 22 Jul 2003 15:42:35 GMT, dhek bhun kho wrote:
    > Is this test program valid? System.in usually blocks until a newline
    > is entered (that's what I noticed); did you put the terminal into
    > raw mode before executing the test?


    System.in doesn't wait for newline, it's the terminal (or console, or
    whatever) that waits before passing any input to the process.

    If the process' stdin isn't connected to a terminal (such as in his
    example), there is no line-based buffering involved. The terminal mode
    has no bearing on this case.

    /gordon

    --
    [ do not send me private copies of your followups ]
    g o r d o n . b e a t o n @ e r i c s s o n . c o m
     
    Gordon Beaton, Jul 22, 2003
    #4
  5. Harald Kirsch:

    > InputStream is = System.in;
    > BufferedInputStream bis = new BufferedInputStream(is);


    Try FileReader reader = new FileReader(FileDescriptor.in); instead of
    System.in (another way of getting standard input). IIRC I got a
    speed-up from that once. Depending on operating system, Java version
    and implementation, the result may be different for you, though.
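
    A small sketch of this variant, assuming a hypothetical class name
    StdinViaDescriptor. The FileReader is wrapped in a BufferedReader
    here, since the measurements earlier in the thread showed that
    buffering the reader is what made the difference. Whether this beats
    System.in is not something the sketch can promise.

```java
import java.io.BufferedReader;
import java.io.FileDescriptor;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class StdinViaDescriptor {
    // Counts characters one read() call at a time.
    static long countChars(Reader r) throws IOException {
        long count = 0;
        while (r.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // FileDescriptor.in is the standard input descriptor;
        // FileReader decodes it with the platform default charset.
        Reader reader = new BufferedReader(new FileReader(FileDescriptor.in));
        System.out.println(countChars(reader));
    }
}
```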

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Jul 23, 2003
    #5
  6. Gordon Beaton, Tue, 22 Jul 2003 16:43:03 +0000:

    > On Tue, 22 Jul 2003 15:42:35 GMT, dhek bhun kho wrote:
    >> Is this test program valid? System.in usually blocks until a newline
    >> is entered (that's what I noticed); did you put the terminal into
    >> raw mode before executing the test?

    > System.in doesn't wait for newline, it's the terminal (or console, or
    > whatever) that waits before passing any input to the process.
    >
    > If the process' stdin isn't connected to a terminal (such as in his
    > example), there is no line-based buffering involved. The terminal mode
    > has no bearing on this case.
    >
    > /gordon


    Oops. :) Thanks. Learnt something new.
     
    dhek bhun kho, Jul 23, 2003
    #6
  7. On Wed, 23 Jul 2003 12:11:44 -0700, Harald Kirsch wrote:

    > Harald Kirsch wrote:
    >> % cpipe -vw </dev/zero |cat >/dev/null
    >> out: 0.233ms at 536.5MB/s ( 524.4MB/s avg) 100.0MB
    >>
    >> Of course 'cat' does not have to deal with character encoding.
    >> Well then, let's compare with buffered reading straight
    >> from System.in. I understand no character encoding happens then, right?
    >>
    >> 4) reading from bis:
    >> out: 3.666ms at 34.1MB/s ( 32.0MB/s avg) 100.0MB
    >>
    >> Compared with 'cat' this is still very bad, i.e. a factor of 16 slower.

    >
    > Nobody pointed out that 'cat' is really the wrong thing to compare to.
    > But now I have the real two things to compare against each other:
    >
    > A) JAVA
    > import java.io.InputStream;
    > public class Jbug {
    >     public static void main(String[] argv) throws java.io.IOException {
    >         int count = 0;
    >         while( -1!=(System.in.read()) ) count += 1;
    >     }
    > }
    >
    > B) plain old C
    > #include <stdio.h>
    > int
    > main(int argc, char **argv) {
    >     int count = 0;
    >     while( EOF!=getchar() ) count += 1;
    >     return 0;
    > }
    >
    >
    > Now the surprising part:
    >
    > JAVA:
    > % head -c `expr 1024 \* 1024 \* 400` /dev/zero |cpipe -vw -b 1024 | java Jbug
    > out: 29.227ms at 34.2MB/s ( 33.9MB/s avg) 400.0MB
    >
    > plain old C after compilation with "cc -O2 -W -Wall -ansi Jbug.c -o Jbug":
    > % head -c `expr 1024 \* 1024 \* 400` /dev/zero |cpipe -vw -b 1024|./Jbug
    > out: 34.247ms at 29.2MB/s ( 27.7MB/s avg) 400.0MB
    >
    > Zap ... Ouuch! Java is 22% *faster* here. I am impressed/puzzled.
    >
    > Harald.


    I guess Java has the better optimisation in its compiler. I also guess
    that a lot of the time taken up by the Java version is VM startup time.
    If I'm right, then I predict 2 things:

    * Java will look even better against larger files.

    * Java will slow down if you bother to print count at the end, because
    counting can no longer be optimised away.
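
    Both guesses can be checked with a variant of Jbug that prints the
    count, so the loop result is actually used, and times the loop inside
    the process, so VM startup is not included. This is a sketch only:
    the class name JbugCount is made up, and System.nanoTime needs Java 5
    or later, which is newer than the JVM used in this thread.

```java
import java.io.IOException;
import java.io.InputStream;

public class JbugCount {
    // Same loop as Jbug: one read() call per byte until end of input.
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long count = countBytes(System.in);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Printing the count makes the result observable, so the
        // counting cannot simply be dropped as dead code.
        System.out.println(count + " bytes in " + elapsedMs + " ms");
    }
}
```

    It can be fed the same way as Jbug, for example with the
    head -c ... /dev/zero | cpipe pipeline quoted above.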

    Steve
     
    Steve Horsley, Jul 23, 2003
    #7
