Why is Java so slow????

  • Thread starter Java Performance Export


Java Performance Export

Hi all,

I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

I'm simply trying to print out the first N integers like

"This is line <nnnn>"

as a simple benchmark.

My Java version is over 60 times slower than my "C" version and I
would like to establish a lower bound on how long the very fastest
Java version could take, by applying every possible performance
speedup available in the Java environment.

I've profiled with "-Xrunhprof" and looked at the output (below) and
was surprised by what I saw. Over 50 different methods are involved
before I arrive at the point where 80% of the cumulative CPU usage for
the run is accounted for! What the heck is this stuff?????

Is this really happening, and is there a way to get around it?

My client is threatening to implement in "C" and I am trying to talk
him out of it.

I'd be very curious to see how this equivalent benchmark performs on
others' environments.

Thanks,

Larry



MY ENVIRONMENT

Machine: Intel Core 2 Duo 2 GHz processor, 8GB of ram
O/S: Linux kernel V. 2.6.18-8.1.14.el5, x86_64 architecture

Java: >> java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Server VM (build 1.6.0_03-b05, mixed mode)



MY BENCHMARK


This "C" program takes about 11.5 seconds to print 50,000,000 lines or
4.3 million lines / sec... output is to /dev/null

Compiled with:
gcc -O4 -o t t.c

======================== t.c
#include <stdio.h>
#include <stdlib.h>   /* atoi */

int main(int argc, char ** argv)
{
    int lim = atoi(argv[1]);   /* number of lines to print */
    int i;
    for (i=1; i <= lim; i++)
        printf("this is line %d\n", i);
    return 0;
}
======================== t.c






This Java class takes about 15.6 seconds to print 5,000,000 lines or
63,700 lines / sec ... 67 times slower!

======================= t.java
class t
{
    static public void main(String[] argv)
    {
        int lim = new Integer(argv[0]);
        int i;
        for (i=0; i < lim; i++)
            System.out.println("This is line " + i);
    }
}
======================= t.java






Here is the result of running with
-Xrunhprof:cpu=times
and a # iterations argument of 10,000

============== java.hprof.txt


JAVA PROFILE 1.0.1, created Mon Nov 19 13:40:52 2007
Header for -agentlib:hprof (or -Xrunhprof) ASCII Output (JDK 5.0 JVMTI based)
....
....
rank   self  accum   count trace method
   1 10.64% 10.64%   20000 300932 sun.nio.cs.US_ASCII$Encoder.encodeArrayLoop
   2  5.59% 16.23%   20000 300958 sun.nio.cs.StreamEncoder.writeBytes
   3  4.22% 20.45%   20000 300939 sun.nio.cs.StreamEncoder.implWrite
   4  3.00% 23.45%   20000 300936 java.nio.charset.CharsetEncoder.encode
   5  2.97% 26.42%   20000 300956 java.io.PrintStream.write
   6  2.84% 29.26%   20000 300933 sun.nio.cs.US_ASCII$Encoder.encodeLoop
   7  2.23% 31.49%   10000 300985 java.io.PrintStream.newLine
   8  2.15% 33.64%   20000 300955 java.io.BufferedOutputStream.flush
   9  2.10% 35.74%   10000 300964 java.io.PrintStream.write
  10  2.01% 37.75%   20013 300087 java.nio.Buffer.<init>
  11  1.96% 39.71%       1 301017 t.main
  12  1.70% 41.40%   60027 300305 java.nio.ByteBuffer.arrayOffset
  13  1.61% 43.01%   60027 300301 java.nio.CharBuffer.arrayOffset
  14  1.51% 44.53%   10000 300922 java.io.BufferedWriter.write
  15  1.41% 45.94%   10000 300912 java.lang.AbstractStringBuilder.append
  16  1.37% 47.31%   10000 300971 java.io.BufferedWriter.write
  17  1.35% 48.66%   20000 300926 java.nio.CharBuffer.<init>
  18  1.34% 50.00%   20000 300928 java.nio.CharBuffer.wrap
  19  1.29% 51.29%   20000 300952 java.io.FileOutputStream.write
  20  1.26% 52.55%   20000 300927 java.nio.HeapCharBuffer.<init>
  21  1.25% 53.80%   20000 300953 java.io.BufferedOutputStream.flushBuffer
  22  1.23% 55.03%   10000 300960 sun.nio.cs.StreamEncoder.flushBuffer
  23  1.17% 56.20%   10000 300986 java.io.PrintStream.println
  24  1.13% 57.33%   40018 300306 java.nio.Buffer.position
  25  1.13% 58.46%   10000 300977 java.io.BufferedWriter.flushBuffer
  26  1.12% 59.58%   10000 300923 java.io.Writer.write
  27  1.08% 60.66%   40018 300303 java.nio.Buffer.limit
  28  1.07% 61.73%   40018 300302 java.nio.Buffer.position
  29  1.06% 62.80%   10000 300979 sun.nio.cs.StreamEncoder.implFlushBuffer
  30  1.06% 63.85%   10000 300942 java.io.BufferedWriter.flushBuffer
  31  1.03% 64.88%   10000 300972 java.io.Writer.write
  32  1.00% 65.88%   10000 300980 sun.nio.cs.StreamEncoder.flushBuffer
  33  0.98% 66.87%   10000 300959 sun.nio.cs.StreamEncoder.implFlushBuffer
  34  0.97% 67.84%   10000 300908 java.lang.AbstractStringBuilder.append
  35  0.97% 68.81%   10000 300975 sun.nio.cs.StreamEncoder.write
  36  0.95% 69.76%   10000 300984 java.io.BufferedOutputStream.flush
  37  0.94% 70.69%   10000 300940 sun.nio.cs.StreamEncoder.write
  38  0.87% 71.56%   10000 300941 java.io.OutputStreamWriter.write
  39  0.75% 72.32%   10000 300914 java.util.Arrays.copyOfRange
  40  0.72% 73.03%    9000 300988 java.util.Arrays.copyOf
  41  0.71% 73.74%   10000 300915 java.lang.String.<init>
  42  0.68% 74.42%   10000 300905 java.lang.StringBuilder.<init>
  43  0.66% 75.08%   10000 300963 java.lang.String.indexOf
  44  0.66% 75.74%   10000 300981 java.io.OutputStreamWriter.flushBuffer
  45  0.65% 76.39%   10000 300976 java.io.OutputStreamWriter.write
  46  0.63% 77.03%   10000 300909 java.lang.StringBuilder.append
  47  0.63% 77.66%   10000 300916 java.lang.StringBuilder.toString
  48  0.63% 78.29%   20000 300947 java.nio.Buffer.position
  49  0.63% 78.93%   10000 300961 java.io.OutputStreamWriter.flushBuffer
  50  0.61% 79.54%   20000 300930 java.nio.CharBuffer.hasArray
  51  0.61% 80.15%   10000 300913 java.lang.StringBuilder.append
....
 100  0.02% 99.51%       4 300254 sun.net.www.ParseUtil.decode
 101  0.02% 99.53%       4 300257 java.io.UnixFileSystem.normalize
 102  0.02% 99.55%       1 300619 sun.net.www.protocol.file.Handler.createFileURLConnection
 103  0.02% 99.57%       2 300717 java.io.FilePermission$1.run
CPU TIME (ms) END
 

Mark Thornton

Java said:
Hi all,

I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

I'm simply trying to print out the first N integers like

"This is line <nnnn>"

as a simple benchmark.

Your Java example flushes the output buffer on every line while the C
example does not. However even if you fixed that difference it is hard
to see what useful conclusion could be drawn from such a benchmark.
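
A minimal sketch of what removing the per-line flush could look like, assuming the cause is that System.out is normally created with autoflush enabled, so every println() pushes the buffer out. The class name BufferedLines, the helper printLines and the 64 KB buffer size are illustrative choices, not something from the original post:

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

class BufferedLines {
    // Writes "This is line <i>" for i in [0, lim) to the given stream and
    // flushes once at the end instead of after every line.
    static void printLines(OutputStream sink, int lim) {
        // autoflush=false: println() no longer forces a flush per line
        PrintStream out = new PrintStream(new BufferedOutputStream(sink, 1 << 16), false);
        for (int i = 0; i < lim; i++) {
            out.println("This is line " + i);
        }
        out.flush();
    }

    public static void main(String[] argv) {
        int lim = argv.length > 0 ? Integer.parseInt(argv[0]) : 10;
        // Assumption: writing to the stdout file descriptor directly,
        // bypassing the autoflushing System.out wrapper.
        printLines(new FileOutputStream(FileDescriptor.out), lim);
    }
}
```

This still goes through the String concatenation and the charset encoder on every line, so it only addresses the flushing difference with the C version, nothing else.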

Mark Thornton
 

Mike Schilling

Java said:
Hi all,

I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

I'm simply trying to print out the first N integers like

"This is line <nnnn>"

as a simple benchmark.

Your program does nothing besides convert integers to strings and write to
standard output. Is this a good simulation of what your application will
do? If not, the benchmark tells you nothing useful.
 

Ramon F Herrera

Hi all,

I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

Larry:

I am as much of a Java fan and advocate as the next Java developer,
but let's face it: Java is slower than native code. It has to be.

Your short answer is: that's the price we have to pay for portability,
interoperability and freedom.

-Ramon
 

Daniel Pitts

Ramon said:
Larry:

I am as much of a Java fan and advocate as the next Java developer,
but let's face it: Java is slower than native code. It has to be.

Your short answer is: that's the price we have to pay for portability,
interoperability and freedom.

-Ramon

60 times slower seems like an error in the benchmark though, not a
fundamental issue with Java performance.

My suggestion would be to write a program that counts the number of
primes between 1 and 1000000 (or something like that).

Don't forget, there is a start-up cost with Java, so you should start
your timer before the work loop, not before the program starts.
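
A rough sketch of such a benchmark, with the timer started right before the work loop rather than at program start. The class name PrimeBench, the sieve approach and the five untimed warm-up rounds are assumptions made for illustration; the suggestion above only asks for counting primes and for timing the loop itself:

```java
class PrimeBench {
    // Counts primes in [2, limit] with a simple sieve of Eratosthenes.
    static int countPrimes(int limit) {
        if (limit < 2) return 0;
        boolean[] composite = new boolean[limit + 1];
        int count = 0;
        for (int n = 2; n <= limit; n++) {
            if (!composite[n]) {
                count++;
                // long arithmetic so n * n cannot overflow for large limits
                for (long m = (long) n * n; m <= limit; m += n) {
                    composite[(int) m] = true;
                }
            }
        }
        return count;
    }

    public static void main(String[] argv) {
        int limit = 1000000;
        // Assumption: a few untimed rounds first, so class loading and
        // JIT warm-up are not charged to the measurement.
        for (int warm = 0; warm < 5; warm++) {
            countPrimes(limit);
        }
        long start = System.nanoTime();   // timer starts right before the work
        int primes = countPrimes(limit);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.err.println(primes + " primes up to " + limit + " in " + elapsedMs + " ms");
    }
}
```

Because the work is pure computation with no I/O, a test like this says something about the JIT and nothing about PrintStream overhead, which is a different question from the one in the original post.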
 

Daniel Dyer

This "C" program takes about 11.5 seconds to print 50,000,000 lines or
4.3 million lines / sec... output is to /dev/null

Compiled with:
gcc -O4 -o t t.c

======================== t.c
#include <stdio.h>
int main(int argc, char ** argv)
{
int lim = atoi(argv[1]);
int i;
for (i=1; i <= lim; i++)
printf("this is line %d\n", i);
}
======================== t.c


This Java class takes about 15.6 seconds to print 5,000,000 lines or
63,700 lines / sec ... 67 times slower!

======================= t.java
class t
{
static public void main(String[] argv)
{
int lim = new Integer(argv[0]);
int i;
for (i=0; i < lim; i++)
System.out.println("This is line " + i);
}

}
======================= t.java

Try the Java printf so that you are at least comparing something similar
(it should avoid a lot of the String allocations):

<http://java.sun.com/j2se/1.5.0/docs/api/java/io/PrintStream.html#printf(java.lang.String, java.lang.Object...)>

Also, try the -server switch on the JVM. It may well inline a lot of
those method calls that you are seeing.

Finally, disregard your results until you have considered the following
Java micro-benchmarking advice posted here by Chris Uppal a while back:

<http://groups.google.com/group/comp...read/79219945c16fa272/7e96da29cca14efc?lnk=st>

Dan.
 

Lew

Daniel said:
60 times slower seems like an error in the benchmark though, not a
fundamental issue with Java performance.

My suggestion would be to write a program that counts the number of
primes between 1 and 1000000 (or something like)
Don't forget, there is a start-up cost with Java, so you should start
your timer before the work loop, not before the program starts.

Java is not always slower than "native" code. For one thing, significant
amounts of Java programs run as native code. For another, the JVMs have
gotten *very* smart about optimization.

Java takes a long time to start up because you have to load the JVM. As
Daniel pointed out, a valid benchmark will kick off its timing after the
program and all the library classes load.

Java is not designed for quick little utilities. Its greatest strength is in
full-scale programs that run for a while. Under those conditions, some
benchmarks run faster in Java than in statically-compiled languages.

To give a blanket assertion that Java "has to be" slower than "native" code
betrays a lack of knowledge of JVM techniques and current benchmarks.

Even when Java tests slower than C++ (say), as it often does, it's by a factor
of about 2 or 3 to 1, not 60 to 1. Again, this is after the program and JVM
have loaded. There's no denying that load time for the JVM (on most PCs) is a
large overhead for Java programs.
 

Java Performance Expert

On Mon, 19 Nov 2007 19:28:17 -0000, Java Performance Export



Try the Java printf so that you are at least comparing something similar
(it should avoid a lot of the String allocations):


tried it, made it run in over 40 secs instead of 15 seconds.

Also, try the -server switch on the JVM. It may well inline a lot of
those method calls that you are seeing.

had no effect.

Finally, disregard your results until you have considered the following
Java micro-benchmarking advice posted here by Chris Uppal a while back:

<http://groups.google.com/group/comp.lang.java.programmer/browse_frm/t...>

Followed these instructions, moving the "loop" into a separate method,
then called that a number of times. No improvement. Rewritten test
class below. I presume putting the meat of it in a separate method
allows the JIT to recognize it as a bottleneck and use inlining or
native compilation on subsequent invocations? But it did not do
so even after 100 invocations. Can't I just ask Java to compile
the very first one?

W/ the revised class I am also still seeing over 50 methods calling
each other before 80% of the CPU time is accounted for (see original
post). I would have expected there to be some very small handful of
places where most of the work was done.

Mostly what i am wondering is what the heck are these methods
and why is all this necessary, or is it? and how can I get
this simple program to run as quickly as possible in Java.

Also, I saw comments this is "not a valid benchmark." Actually
this script is being used to pipe test input to a database processing
stream, and it was thought the component of the pipe that simply emits
the test data would be so negligible as to be ignored. However the
factor of ~ 70 slowdown w/ Java version is causing that to be an
issue.

The question I don't want to have to answer is "Why, again, can you
not make this program run fast?"

Thanks,

Larry
 

Mark Space

Java said:
Also, I saw comments this is "not a valid benchmark." Actually
this script is being used to pipe test input to a database processing
stream, and it was thought the component of the pipe that simply emits
the test data would be so negligible as to be ignored. However the
factor of ~ 70 slowdown w/ Java version is causing that to be an
issue.

Did you try not flushing the stream, as indicated above?
 

Andreas Leitgeb

Mark Space said:
Did you try not flushing the stream, as indicated above?

If that prog was indeed not just a benchmark, but *the* program
as he needs it, then using the C version seems like a reasonable
answer. Of course, this answer shouldn't be misapplied to other
domains.
Always use the right tool for a job. For that trivial prog, C is
the right tool.
 

Java Performance Expert

On Nov 19, 2:57 pm, "Daniel Dyer" <"You don't need it"> wrote:


Followed these instructions, moving the "loop" into a separate method,
then called that a number of times. no improvement. Rewritten test
class below.

Oops forgot to paste in the class. Also below is the output.


It was pointed out:


Not sure if my test data generator is a "quick utility". It is part of
a larger system, which is a full-scale program. What I would like is to
make a full-scale program that DOESN'T run for a long while; hope that
makes sense.

I suspect what is happening is that I am not gaining the benefits
of JIT native compilation. Is there a way to simply tell the Java
environment to compile it from the get-go? It seems that if I have to wait
'til my method becomes a bottleneck, then my compilation is not
"just in time" -- it is by definition too late.

Am I missing something obvious here?

Thx

Larry



=========================================================================

import java.util.Date;

class t1
{
    static public void main(String[] argv)
    {
        int lim = new Integer(argv[0]);
        int nbench = new Integer(argv[1]);
        int b;
        for (b=0; b < nbench; b++) {
            System.err.println("Bench " + b);
            Date start = new Date();
            mytest(lim);
            Date now = new Date();
            System.err.println("Took " + ((now.getTime() - start.getTime())/1000) + " seconds");
        }
    }

    static public void mytest(int lim)
    {
        int i;
        for (i=0; i < lim; i++)
            System.out.println("This is line " + i);
    }
}

=========================================================================






Bench 0
Took 15 seconds
Bench 1
Took 15 seconds
Bench 2
Took 15 seconds
Bench 3
Took 15 seconds
Bench 4
Took 15 seconds
Bench 5
Took 15 seconds
Bench 6
Took 15 seconds
Bench 7
Took 15 seconds
Bench 8
Took 15 seconds
Bench 9
Took 15 seconds
 

Daniel Dyer

tried it, make it run in over 40 secs instead of 15 seconds.

Yep, my mistake. I should have realised it would create a new formatter
for every call. I guess the other option is to create and reuse your own
message format instance. That is if the String allocations are a
significant part of the overhead.

had no effect.

None at all? I would expect it to make some difference, even if it made
it worse.

Followed these instructions, moving the "loop" into a separate method,
then called that a number of times. no improvement. Rewritten test
class below. I presume putting the meat of it in a separate method
allows the JIT to recognize it as a bottleneck and use inlining or
native compilation on subsequent invocations? But it did not do
so even after 100 invocations. Can't I just ask Java to compile
the very first one?

There is a switch that does do that. It's one of the "unsupported" -XX
options. I forget which one, I've never used it.

W/ the revised class I am also still seeing over 50 methods calling
each other before 80% of the CPU time is accounted for (see original
post). I would have expected there to be some very small handful of
places where most of the work was done.

Mostly what i am wondering is what the heck are these methods
and why is all this necessary, or is it? and how can I get
this simple program to run as quickly as possible in Java.

Looking at your profiling output (which I ignored first time round,
sorry), a lot of the time is spent converting Java's Unicode Strings into
US ASCII. Your C version doesn't do Unicode, so it has a lot less work to
do.
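
One way to take the Unicode-to-ASCII conversion out of the loop entirely, as a sketch only: build each line as raw bytes and hand them to a buffered stream, so no String is created and no charset encoder runs per line. The class name AsciiLines, the helper writeLines and the 64 KB buffer are hypothetical, and the approach assumes the output only ever needs plain ASCII:

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class AsciiLines {
    // Prefix encoded once, up front. Plain ASCII, so one byte per char.
    private static final byte[] PREFIX = {
        'T','h','i','s',' ','i','s',' ','l','i','n','e',' '
    };

    // Writes "This is line <i>\n" for i in [0, lim) as raw bytes, without
    // building a String or invoking a charset encoder inside the loop.
    static void writeLines(OutputStream sink, int lim) throws IOException {
        OutputStream out = new BufferedOutputStream(sink, 1 << 16);
        byte[] digits = new byte[10];   // a non-negative int has at most 10 decimal digits
        for (int i = 0; i < lim; i++) {
            out.write(PREFIX, 0, PREFIX.length);
            int pos = digits.length;
            int n = i;
            do {                        // fill the digit buffer right to left
                digits[--pos] = (byte) ('0' + n % 10);
                n /= 10;
            } while (n > 0);
            out.write(digits, pos, digits.length - pos);
            out.write('\n');
        }
        out.flush();                    // single flush at the end
    }

    public static void main(String[] argv) throws IOException {
        int lim = argv.length > 0 ? Integer.parseInt(argv[0]) : 10;
        // Assumption: write to the stdout file descriptor directly,
        // bypassing the PrintStream behind System.out.
        writeLines(new FileOutputStream(FileDescriptor.out), lim);
    }
}
```

This is much closer to what the C printf version does under the hood, at the cost of giving up everything PrintStream does for you (locale, platform encoding, line separators). Whether it closes the remaining gap would have to be measured.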

Dan.
 

Chronic Philharmonic

Ramon F Herrera said:
Larry:

I am as much of a Java fan and advocate as the next Java developer,
but let's face it: Java is slower than native code. It has to be.

Your short answer is: that's the price we have to pay for portability,
interoperability and freedom.

A lot hinges on good software design. About 10 years ago, I worked for a now
defunct mainframe terminal emulation company. They had some widely-touted
Active-X controls that ran in a browser. They were supposed to be the
smallest, fastest implementation possible. My team was assigned the task of
porting them to Java.

After looking at the Active-X design, and looking at the Java language (for
the first time), we elected not to port the code, since it did not fit the
Java paradigm, and we would have abused the Java language in many respects.
When we finished, we had a terminal emulator that was smaller, faster, and
more correct (relative to real Mainframe terminals) than the Active-X
controls that we started with. This was on Java 1.2 with AWT, before JIT,
before Swing.

Since that time, I have worked on several projects to build massively
scalable server applications in Java. So when someone tells me that Java is
too slow, I just roll my eyes. I'm sure people can find edge cases where
Java just isn't fast enough. One thing's for sure: A direct port from C/C++
is almost guaranteed to fail in nearly every way imaginable.
 

Daniel Pitts

Java said:
tried it, make it run in over 40 secs instead of 15 seconds.


had no effect.


Followed these instructions, moving the "loop" into a separate method,
then called that a number of times. no improvement. Rewritten test
class below. I presume putting the meat of it in a separate method
allows the JIT to recognize it as a bottleneck and use inlining or
native compilation on subsequent invocations? But it did not do
so even after 100 invocations. Can't I just ask Java to compile
the very first one?

W/ the revised class I am also still seeing over 50 methods calling
each other before 80% of the CPU time is accounted for (see original
post). I would have expected there to be some very small handful of
places where most of the work was done.

Mostly what i am wondering is what the heck are these methods
and why is all this necessary, or is it? and how can I get
this simple program to run as quickly as possible in Java.

Also, I saw comments this is "not a valid benchmark." Actually
this script is being used to pipe test input to a database processing
stream, and it was thought the component of the pipe that simply emits
the test data would be so negligible as to be ignored. However the
factor of ~ 70 slowdown w/ Java version is causing that to be an
issue.

The question I don't want to have to answer is "Why, again, can you
not make this program run fast?"

Thanks,

Larry

If you're simply trying to pipe data, use "cat" :)

Like someone else mentioned, Java shines for full applications, not
small utilities, and this is mostly to do with the startup costs.
 

Java Performance Expert

Did you try not flushing the stream, as indicated above?

OK, that helped a lot. I can now generate 50m lines in 28 seconds,
which is only 2.5 X slower than in "C". We can probably live with
that.
I am wondering if I have "taken it to the limit" tho. Revised class
is below.

I still don't think I'm getting the benefits of native compilation.

Another poster remarked:
Your Java example flushes the output buffer on every line while the C
example does not. However even if you fixed that difference it is hard
to see what useful conclusion could be drawn from such a benchmark.

The useful conclusion that can be drawn is that a test generator
written in "C" will run faster, and so be more desirable to use, than
one written in Java, to the extent that it is desirable for each and
every component to run as fast as possible, and to the extent that it
is useful to identify those aspects of our decision making that
produce desirable results.

Let me know if anything is unclear..

Thx

Larry

import java.io.BufferedOutputStream;
import java.util.Date;

class t1
{
    static public void main(String[] argv)
    {
        int lim = new Integer(argv[0]);
        int nbench = new Integer(argv[1]);
        int b;
        for (b=0; b < nbench; b++) {
            System.err.println("Bench " + b);
            Date start = new Date();

            try {
                mytest(lim);
            }
            catch (Exception e) {
                // report the failure instead of silently ignoring it
                e.printStackTrace();
            }

            Date now = new Date();
            System.err.println("Took " + ((now.getTime() - start.getTime())/1000) + " seconds");
        }
    }

    static public void mytest(int lim) throws Exception
    {
        int i;
        // 1,000,000-byte buffer in front of System.out, so output goes out in large chunks
        BufferedOutputStream bos = new BufferedOutputStream(System.out, 1000000);
        for (i=0; i < lim; i++) {
            // "\n" added so each record is a line, as in the println version
            String s = "This is line " + i + "\n";
            byte[] barr = s.getBytes();
            bos.write(barr, 0, barr.length);
        }
        // flush the last partial buffer; without this the tail of the output is lost
        bos.flush();
    }
}
 

Java Performance Expert

Like someone else mentioned, Java shines for full applications, not
small utilities, and this is mostly to do with the startup costs.

I'm calculating and outputting my timing stats
well into the main() routine, so startup costs
do not apply to me.

Also, I am not yet getting around the Unicode conversion, which, once
I eliminate it, may bring me down to the fastest possible time.

thanks
 

Mark Space

Java said:
The question I don't want to have to answer is "Why, again, can you
not make this program run fast?"

I'm doing some testing for you (for free!!) but just switching from
println to print gave me a 28% speed improvement.

Next step: get rid of those calls to encoders.

package microbenchmarks;

public class Main {

    static public void main(String[] argv)
    {
        int lim = new Integer(argv[0]);
        int i;
        for (i=0; i < lim; i++) {
            System.out.print( "This is line " + i + "\n");
        }
    }
}
 

Lew

Java said:
I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

I'm simply trying to print out the first N integers like

"This is line <nnnn>"

as a simple benchmark.

My Java version is over 60 times slower than my "C" version and I
would like to establish a lower bound on how long the very fastest
Java version could take, by applying every possible performance
speedup available in the Java environment.

I've profiled with "-Xrunhprof" and looked at the output (below) and
was surprised by what I saw. Over 50 different methods are involved
before I arrive at the point where 80% of the cumulative CPU usage for
the run is accounted for! What the heck is this stuff?????

Is this really happening, and is there a way to get around it?

My client is threatening to implement in "C" and I am trying to talk
him out of it.

I'd be very curious to see how this equivalent benchmark performs on
others' environments.

I modified the Java benchmark in accordance with others' suggestions and for
five million lines of output came up with:

Java:
$ java -server -cp build/classes testit.TimePrin
Elapsed: 93.966 secs.

C program:
$ ./timepr
Elapsed: 88.000000 secs.

A far cry from 60:1, eh?

AMD-64 ~2 GHz, 1 GB RAM, Linux Fedora 7, the usual mix of other programs
running. 32-bit Java.

Code with my variations follows.
<sscce name="TimePrin.java">
package testit;

import java.util.Date;

public class TimePrin
{
    private static final int LIM = 5000000;

    public static void main( String [] args)
    {
        int lim;
        if ( args.length < 1 )
        {
            lim = LIM;
        }
        else
        {
            try
            {
                lim = Integer.parseInt( args [0] );
            }
            catch ( NumberFormatException ex )
            {
                lim = LIM;   // fall back to the default on bad input
            }
        }

        long start = new Date().getTime();
        for ( int i=0; i < lim; i++)
        {
            System.out.print( "This is line " + i +"\n" );
        }
        long end = new Date().getTime();

        double elapsed = (end - start) / 1000.0;
        System.out.println( "Elapsed: "+ elapsed +" secs." );
    }
}
</sscce>

<sscce name="timepr.c" build="gcc -O4 -o timepr timepr.c" >
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <time.h>

#define LIM 5000000

int main(int argc, char ** argv)
{
    int lim;
    if ( argc < 2 )
    {
        lim = LIM;
    }
    else
    {
        lim = atoi(argv[1]);
        if ( lim <= 1000 )
        {
            lim = LIM;
        }
    }

    int i;
    time_t start = time( NULL );
    for ( i=1; i <= lim; i++ )
    {
        printf("this is line %d\n", i);
    }
    time_t end = time( NULL );

    printf( "Elapsed: %f secs.\n", difftime( end, start ));
    return 0;
}
</sscce>
 
