Performance Q: java hotspot vs. native code

Roedy Green

If the claim is "compile the fastest code you can possibly get in C and
in Java", then yes, you are right, but then you are discussing which
language has come further along in the development of its optimising
compilers. Sort of like comparing a Ferrari to a Koenigsegg.

It is another matter to run a test where you limit what one language
can use but don't limit what the other can use. One car must be a Ford
Mondeo or similar, while the other can be a Ferrari or similar if it
wants. Then you are no longer comparing the speeds of comparable items.

If you introduce handicaps, YOU are rigging the outcome. You are not
really measuring anything objective. You are tricking people into
accepting your test as an objective measure of merit.

What counts is which performs best in the real world. Your job is to
make the test as reflective as possible of the real world, not to make
decisions on which optimisation techniques count as valid, unless for
some reason a technique could not actually be used in the real world.

That is why, for example, you make the tests add and print results so
the optimiser can't discard code in the test, which it could not do in
the real world. You do that by making the test more realistic, not by
disqualifying an optimiser.
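
As a concrete illustration (my sketch, not code from the thread's
actual benchmark): the first loop below computes a sum nothing ever
reads, so an optimiser is free to delete it; printing the result, as
the tests do, forces the work to actually happen.

class DeadCodeDemo {
    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        int[] data = new int[65500];
        for (int c = 0; c < data.length; c++) {
            data[c] = rand.nextInt();
        }

        // Unobserved result: a JIT or C optimiser may legally
        // eliminate this whole loop as dead code.
        int unused = 0;
        for (int c = 0; c < data.length; c++) {
            unused += data[c];
        }

        // Observed result: printing the sum makes the loop's effect
        // visible, so the additions cannot be discarded.
        int total = 0;
        for (int c = 0; c < data.length; c++) {
            total += data[c];
        }
        System.out.println("Total: " + total);
    }
}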

 
Thomas Hawtin

tom said:
java -client 11.85 (java 1.5.0_04)
java -server 11.90
gcj 12.26 (gcc 3.3.2 on linux 2.6.3)
C integer 11.01
C float 7.23

I rewrote the Java version of the microbenchmark to be more realistic
and conventional. My results (times in ms):

1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
(using commented out code: 5849.171000)

So what can we conclude? That after a start-up penalty, Sun's current
Server HotSpot is much faster than C++? No. Microbenchmarks can help
your understanding of how a particular compiler behaves. They are
useless for determining the relative performance of languages.

Tom Hawtin

class Checksum {
    private static int core(int[] data) {
        int count = data.length;
        int total = 0;

        for (int d = 0; d < 50000; d++) {
            for (int c = 0; c < count; c++) {
                total += data[c];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        int count = 65500;
        int[] data = new int[count];

        for (int c = 0; c < count; c++) {
            data[c] = rand.nextInt(2000000000);
        }
        for (int run = 0; run < 4; ++run) {
            long startTime = System.currentTimeMillis();
            int total = core(data);
            long endTime = System.currentTimeMillis();

            System.out.println("Elapsed time (ms): " + (endTime - startTime));
            System.out.println("Total: " + total);
        }
    }
}
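
(For reference: the four times on each line above come from the four
back-to-back calls of core() within a single JVM invocation, which is
why the server VM's later runs speed up once HotSpot has compiled the
hot loop. A session presumably looks like this -- my reconstruction,
not commands posted in the thread:

javac Checksum.java
java -client Checksum
java -server Checksum
)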
 
tom fredriksen

Roedy said:
The result I was talking about was a factor of 4 faster. No fine
detail is going to change that.

Now you are spreading FUD, Microsoft-style. You are making up your
rules on the fly to generate your desired result. You are behaving like
a religious fanatic distorting the evidence to produce a predetermined
conclusion.

Enough with the personal characterisations! It makes you look like a
fanatic desperately trying to convince everybody you are right.

Roedy said:
Look at this from a practical point of view. You don't really care HOW
a compiler gets its speed; all you care about is whether it does the
calculations faster. Therefore I dismiss your talk of the compiler
"cheating".

That's your prerogative. It still does not give you a statistically
sound or objective result, because you are controlling the results. I
am not saying the measurements I am doing are perfect, just that they
are fairer than yours.

But if you are convinced you are right, you can prove it by doing the
following.

- Implement loop unrolling, use a C optimiser, run the tests again, and
post the details of the code and optimiser used.
- Post the measurement numbers of both C tests.

If you cannot do that, you cannot prove fairness. You have nothing to
lose because the Jet version is, according to you, superior anyway.

/tom
 
tom fredriksen

Roedy said:
What counts is which performs best in the real world. Your job is to
make the test as reflective as possible of the real world, not to make
decisions on which optimisation techniques count as valid, unless for
some reason a technique could not actually be used in the real world.

That would have been true if the point of the test was "get the best
performance you can out of these two languages", but it was not; it was
an informal comparison to chart the landscape.
Roedy said:
That is why, for example, you make the tests add and print results so
the optimiser can't discard code in the test, which it could not do in
the real world.

That has nothing to do with rigging the test; it helps set up a
comparable test, and you know it. Stick to the facts, not what suits
your arguments.
Roedy said:
You do that by making the test more realistic, not by disqualifying an
optimiser.

Of course it is entirely possible to implement another test which does
exhibit such behaviour. Please do so, then. I have accomplished what I
wanted. If you want something else, feel free to do so, or not.

/tom
 
Chris Uppal

Thomas said:
So what can we conclude? That after a start-up penalty, Sun's current
Server HotSpot is much faster than C++? No. Microbenchmarks can help
your understanding of how a particular compiler behaves. They are
useless for determining the relative performance of languages.

I got interested enough to reproduce Thomas's tests with a number of C++
compilers.

gcc running with -O3, and no other optimisation settings (life's too short even
to read the man page!).

MS VC6, in "Release" mode, plus telling it to optimise for speed only, and to
generate code targeting the "Pentium Pro" (the most modern target available).

MS VS 2003 in default "Release" mode. Note that this includes array overrun
checking by default (presumably Tom considers this necessary for an
apples-to-apples comparison -- although I don't).

MS VS 2003 in "Release" mode, plus telling it to generate code for a Pentium 4,
and turning on all the other relevant-looking optimisations.

Java -client and -server, in both cases JDK 1.5.0.

Results (times in ms):

gcc -O3        5458   5177   5278   5187
vc6 +opt       7020   6850   6759   6850
vs2003         3555   3385   3465   3385
vs2003 +opt    3635   3385   3385   3465
java -client  13770  13610  13699  13620
java -server  11456   3485   3365   3385

In all cases running on a 1.5 GHz Celeron box. I haven't attempted to explore
what would happen running the same code on different chips (especially AMD).

What can we conclude? Well, provided we remember that this is only one very,
very specific test, and that other apparently similar tests might give very
different results, I think it's obvious...

-- chris
 
Scott Ellsworth

tom fredriksen said:
That would have been true if the point of the test was "get the best
performance you can out of these two languages", but it was not; it was
an informal comparison to chart the landscape.

Right, and part of the landscape is the available tool set.

BEA's JRockit has not been ported to the Mac, so my interest in it is
minimal. GCC is on my platform, so my interest in it, especially with a
reasonable optimization set, is high.

Similarly, someone on a platform where Jet works is going to be
interested in it, while on an unsupported platform, it does them little
good. It is not part of the landscape that they want charted.

So, whether _you_ find JRockit, Jet, or GCC with certain optimizations
turned on useful for your purposes, it is still a valid comparison for
some potential users. It lets them chart their landscape.

Scott
 
Twisted

At this point, it's looking like java -server is comparable to C++ with
reasonably up-to-date stuff and integer math in a tight loop.

What about floating point math (say, a few adds and a couple of mults)
in a similar loop? (A sketch of such a loop follows the list below.)
How does it perform on different chips? Say (and someone in this group
probably has access to each of these)
-- Latest 32-bit Intel offering
-- AMD Athlon at the same clock speed
-- Athlon 64, same speed again
-- dual core? (double the data length if you can make it use both
cores; if you can't, report that fact.)
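
Something like the following, perhaps -- an untested sketch, with the
array size and run count copied from Thomas's integer Checksum:

class FloatChecksum {
    private static double core(double[] data) {
        double total = 0.0;
        for (int d = 0; d < 50000; d++) {
            for (int c = 0; c < data.length; c++) {
                // a few adds and a couple of mults per element
                total += data[c] * 1.000001 + data[c] * 0.5;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        double[] data = new double[65500];
        for (int c = 0; c < data.length; c++) {
            data[c] = rand.nextDouble();
        }
        for (int run = 0; run < 4; ++run) {
            long startTime = System.currentTimeMillis();
            double total = core(data);
            long endTime = System.currentTimeMillis();
            System.out.println("Elapsed time (ms): " + (endTime - startTime));
            System.out.println("Total: " + total); // keeps the loop observable
        }
    }
}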

And what exactly is Jet? I know, I know, google it, but somehow I doubt
page after page of aeronautical Web sites will be enlightening in this
instance.
 
Twisted

Ugh. They want you to pay money? Even for noncommercial/freeware
development, open source, personal use, etc.???

Forget it. Especially as Sun's HotSpot with -server seems to perform
about the same as native C, and will be far more portable.
 
Roedy Green

Thomas said:
1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
(using commented out code: 5849.171000)

Here are my results on Win2K (times in ms):

java 1.6 -client               11016  11046  11032  11047
java jdk1.6.0\bin -server      12781  12766   6516   6500
java jdk1.6.0\jre\bin -server  12391  12453   6500   6500
Jet 4.1                         4656   4656   4656   4657

So Jet is faster than HotSpot by a factor of 2.7 to start and by 1.4
after HotSpot warms up.

Here is the key to Jet's speed: it unrolled the inner loop to handle
an odd/even pair of elements in one iteration.

L10:
    add ebx, 16(eax, esi, 4)  ; bypass 16 bytes of overhead
    add ebx, 20(eax, esi, 4)  ; indexing by 4-byte groups
    add esi, 2
    cmp esi, ecx
    jl  L10

The unrolling likely does more than cut the cmp/jmp overhead in half.
It also gives the pipeline a little extra time to get the second
operand ready.
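
In Java source terms, the transformation amounts to roughly this (my
reconstruction from the assembly above, not Jet's actual output):

// Two elements summed per iteration, halving the compare/branch
// overhead. Assumes count is even, as in the benchmark (65500);
// a real unroller would add a cleanup iteration for odd counts.
for (int c = 0; c < count; c += 2) {
    total += data[c];       // add ebx, 16(eax, esi, 4)
    total += data[c + 1];   // add ebx, 20(eax, esi, 4)
}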
 
