Performance Q: java hotspot vs. native code

Roedy Green

If the claim is "compile the fastest code you can possibly get in C and
in Java", then yes, you are right, but then you are discussing which
language has come further along in the development of its optimising
compilers. Sort of like comparing a Ferrari to a Koenigsegg.

It is another matter to run a test where you limit what one language
can use but don't limit what the other can use. One car must be a Ford
Mondeo or similar, while the other can be a Ferrari or similar if it
wants. Then you are no longer comparing the speeds of comparable items.

If you introduce handicaps, YOU are rigging the outcome. You are not
really measuring anything objective. You are tricking people into
accepting your test as an objective measure of merit.

What counts is which performs best in the real world. Your job is to
make the test as reflective as possible of the real world, not to make
decisions on which optimisation techniques count as valid, unless for
some reason a technique could not actually be used in the real world.

That is why, for example, you make the tests add and print results so
the optimiser can't discard code in the test, which it could not do in
the real world. You do that by making the test more realistic, not by
disqualifying an optimiser.
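
As a concrete illustration (my sketch, not code from the thread's
actual benchmark): the first loop below computes a sum nothing ever
reads, so an optimiser is free to delete it; printing the result, as
the tests do, forces the work to actually happen.

class DeadCodeDemo {
    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        int[] data = new int[65500];
        for (int c = 0; c < data.length; c++) {
            data[c] = rand.nextInt();
        }

        // Unobserved result: a JIT or C optimiser may legally
        // eliminate this whole loop as dead code.
        int unused = 0;
        for (int c = 0; c < data.length; c++) {
            unused += data[c];
        }

        // Observed result: printing the sum makes the loop's effect
        // visible, so the additions cannot be discarded.
        int total = 0;
        for (int c = 0; c < data.length; c++) {
            total += data[c];
        }
        System.out.println("Total: " + total);
    }
}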

 
Thomas Hawtin

tom said:
java -client 11.85 (java 1.5.0_04)
java -server 11.90
gcj 12.26 (gcc 3.3.2 on linux 2.6.3)
C integer 11.01
C float 7.23

I rewrote the Java version of the microbenchmark to be more realistic
and conventional. My results (times in ms):

1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
(using commented out code: 5849.171000)

So what can we conclude? That after a start-up penalty, Sun's current
Server HotSpot is much faster than C++? No. Microbenchmarks can help
your understanding of how a particular compiler behaves. They are
useless for determining the relative performance of languages.

Tom Hawtin

class Checksum {
    private static int core(int[] data) {
        int count = data.length;
        int total = 0;

        for (int d = 0; d < 50000; d++) {
            for (int c = 0; c < count; c++) {
                total += data[c];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        int count = 65500;
        int[] data = new int[count];

        for (int c = 0; c < count; c++) {
            data[c] = rand.nextInt(2000000000);
        }
        for (int run = 0; run < 4; ++run) {
            long startTime = System.currentTimeMillis();
            int total = core(data);
            long endTime = System.currentTimeMillis();

            System.out.println("Elapsed time (ms): " + (endTime - startTime));
            System.out.println("Total: " + total);
        }
    }
}
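
(For reference: the four times on each line above come from the four
back-to-back calls of core() within a single JVM invocation, which is
why the server VM's later runs speed up once HotSpot has compiled the
hot loop. A session presumably looks like this -- my reconstruction,
not commands posted in the thread:

javac Checksum.java
java -client Checksum
java -server Checksum
)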
 
tom fredriksen

Roedy said:
The result I was talking about was a factor of 4 faster. No fine
detail is going to change that.

Now you are spreading FUD, Microsoft-style. You are making up your
rules on the fly to generate your desired result. You are behaving like
a religious fanatic distorting the evidence to produce a predetermined
conclusion.

Enough with the personal characterisations! It makes you look like a
fanatic desperately trying to convince everybody you are right.

Roedy said:
Look at this from a practical point of view. You don't really care HOW
a compiler gets its speed; all you care about is whether it does the
calculations faster. Therefore I dismiss your talk of the compiler
"cheating".

That's your prerogative. It still does not give you a statistically
sound or objective result, because you are controlling the results. I
am not saying the measurements I am doing are perfect, just that they
are fairer than yours.

But if you are convinced you are right, you can prove it by doing the
following.

- Implement loop unrolling, use a C optimiser, run the tests again, and
post the details of the code and optimiser used.
- Post the measurement numbers of both C tests.

If you cannot do that, you cannot prove fairness. You have nothing to
lose because the Jet version is, according to you, superior anyway.

/tom
 
tom fredriksen

Roedy said:
What counts is which performs best in the real world. Your job is to
make the test as reflective as possible of the real world, not to make
decisions on which optimisation techniques count as valid, unless for
some reason a technique could not actually be used in the real world.

That would have been true if the point of the test was "get the best
performance you can out of these two languages", but it was not; it was
an informal comparison to chart the landscape.
Roedy said:
That is why, for example, you make the tests add and print results so
the optimiser can't discard code in the test, which it could not do in
the real world.

That has nothing to do with rigging the test; it helps set up a
comparable test, and you know it. Stick to the facts, not what suits
your arguments.
Roedy said:
You do that by making the test more realistic, not by disqualifying an
optimiser.

Of course it is entirely possible to implement another test which does
exhibit such behaviour. Please do so, then. I have accomplished what I
wanted. If you want something else, feel free to do so, or not.

/tom
 
Chris Uppal

Thomas said:
So what can we conclude? That after a start-up penalty, Sun's current
Server HotSpot is much faster than C++? No. Microbenchmarks can help
your understanding of how a particular compiler behaves. They are
useless for determining the relative performance of languages.

I got interested enough to reproduce Thomas's tests with a number of C++
compilers.

gcc running with -O3, and no other optimisation settings (life's too short even
to read the man page!).

MS VC6, in "Release" mode, plus telling it to optimise for speed only, and to
generate code targeting the "Pentium Pro" (the most modern target available).

MS VS 2003 in default "Release" mode. Note that this includes array overrun
checking by default (presumably Tom considers this necessary for an
apples-to-apples comparison -- although I don't).

MS VS 2003 in "Release" mode, plus telling it to generate code for a Pentium 4,
and turning on all the other relevant-looking optimisations.

Java -client and -server, in both cases JDK 1.5.0.

Results (times in ms):

gcc -O3        5458   5177   5278   5187
vc6 +opt       7020   6850   6759   6850
vs2003         3555   3385   3465   3385
vs2003 +opt    3635   3385   3385   3465
java -client  13770  13610  13699  13620
java -server  11456   3485   3365   3385

In all cases running on a 1.5 GHz Celeron box. I haven't attempted to explore
what would happen running the same code on different chips (especially AMD).

What can we conclude? Well, provided we remember that this is only one very,
very specific test, and that other apparently similar tests might give very
different results, I think it's obvious...

-- chris
 
Scott Ellsworth

tom fredriksen said:
That would have been true if the point of the test was "get the best
performance you can out of these two languages", but it was not; it was
an informal comparison to chart the landscape.

Right, and part of the landscape is the available tool set.

BEA's JRockit has not been ported to the Mac, so my interest in it is
minimal. GCC is on my platform, so my interest in it, especially with a
reasonable optimization set, is high.

Similarly, someone on a platform where Jet works is going to be
interested in it, while on an unsupported platform, it does them little
good. It is not part of the landscape that they want charted.

So, whether _you_ find JRockit, Jet, or GCC with certain optimizations
turned on useful for your purposes, it is still a valid comparison for
some potential users. It lets them chart their landscape.

Scott
 
Twisted

At this point, it's looking like java -server is comparable to C++ with
reasonably up-to-date stuff and integer math in a tight loop.

What about floating point math (say, a few adds and a couple of mults)
in a similar loop? (A sketch of such a loop follows the list below.)
How does it perform on different chips? Say (and someone in this group
probably has access to each of these)
-- Latest 32-bit Intel offering
-- AMD Athlon at the same clock speed
-- Athlon 64, same speed again
-- dual core? (double the data length if you can make it use both
cores; if you can't, report that fact.)
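
Something like the following, perhaps -- an untested sketch, with the
array size and run count copied from Thomas's integer Checksum:

class FloatChecksum {
    private static double core(double[] data) {
        double total = 0.0;
        for (int d = 0; d < 50000; d++) {
            for (int c = 0; c < data.length; c++) {
                // a few adds and a couple of mults per element
                total += data[c] * 1.000001 + data[c] * 0.5;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        double[] data = new double[65500];
        for (int c = 0; c < data.length; c++) {
            data[c] = rand.nextDouble();
        }
        for (int run = 0; run < 4; ++run) {
            long startTime = System.currentTimeMillis();
            double total = core(data);
            long endTime = System.currentTimeMillis();
            System.out.println("Elapsed time (ms): " + (endTime - startTime));
            System.out.println("Total: " + total); // keeps the loop observable
        }
    }
}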

And what exactly is Jet? I know, I know, google it, but somehow I doubt
page after page of aeronautical Web sites will be enlightening in this
instance.
 
Twisted

Ugh. They want you to pay money? Even for noncommercial/freeware
development, open source, personal use, etc.???

Forget it. Especially as Sun's HotSpot with -server seems to perform
about the same as native C, and will be far more portable.
 
Roedy Green

Thomas said:
1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
(using commented out code: 5849.171000)

Here are my results on Win2K (times in ms):

java 1.6 -client               11016  11046  11032  11047
java jdk1.6.0\bin -server      12781  12766   6516   6500
java jdk1.6.0\jre\bin -server  12391  12453   6500   6500
Jet 4.1                         4656   4656   4656   4657

So Jet is faster than HotSpot by a factor of 2.7 to start and by 1.4
after HotSpot warms up.

Here is the key to Jet's speed: it unrolled the inner loop to handle
an odd/even pair of elements in one iteration.

L10:
    add ebx, 16(eax, esi, 4)  ; bypass 16 bytes of overhead
    add ebx, 20(eax, esi, 4)  ; indexing by 4-byte groups
    add esi, 2
    cmp esi, ecx
    jl  L10

The unrolling likely does more than cut the cmp/jmp overhead in half.
It also gives the pipeline a little extra time to get the second
operand ready.
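
In Java source terms, the transformation amounts to roughly this (my
reconstruction from the assembly above, not Jet's actual output):

// Two elements summed per iteration, halving the compare/branch
// overhead. Assumes count is even, as in the benchmark (65500);
// a real unroller would add a cleanup iteration for odd counts.
for (int c = 0; c < count; c += 2) {
    total += data[c];       // add ebx, 16(eax, esi, 4)
    total += data[c + 1];   // add ebx, 20(eax, esi, 4)
}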
 
