Performance Q: java hotspot vs. native code

T

Twisted

In each of these two cases, would optimized C run substantially faster
than Java (hotspot or other JIT VM)?

* A number-crunching algorithm with a tight loop and a large number of
iterations (from
thousands potentially up to millions, or more) using doubles.

* Ditto, but with the C code using arrays of uints and carries to
effect a high precision
fixed-point math, and the Java code using BigDecimals.

* Ditto, but with roll-your-own Java BigDecimal-alikes using arrays and
math.

If there's an even higher performance option (short of compile-to-FPGA
:)) for the high-precision cases, please let me know about that as
well. (I know when it gets up into the 500+ digits it can be faster to
use FFT for the multiplies -- O(n log n) vs O(n^2). I'll cross that one
when I come to it.)
 
T

tom fredriksen

Twisted said:
In each of these two cases, would optimized C run substantially faster
than Java (hotspot or other JIT VM)?

You would be hard pressed to find anything that runs faster than C. The
reason for this is simple, C is a low abstraction language only slightly
more abstracted than assembler. This means it produces platform close
code with few runtime hindrances. While any interpreted language will be
reduced by active runtime checks and interpreter operations. This
particularly applies to languages with higher abstraction levels than C,
which is quite a few these days, e.g. perl, java, ruby, lisp etc.
Even natively compiled java code would still be slower as it still needs
runtime checks and so forth.

Of course algorithm is another of the most important factors, but if its
the same in both implementations then C wins for the aforementioned
reasons. The only way an any other language might win is if the language
has an algorithmic enhancer which changes the code to an algorithm
better than the one you have programmed, but that is not likely to
happen. (The only thing I can think of for this to be true is f.ex. that
javas regexp engine is faster than a similar regexp engine used in c,
but that comes down to algorithm again.)

/tom
 
T

Twisted

This even applies to basic math calcs, without e.g. arrays (and thus
bounds-checking) and objects (dynamic dispatch, null pointer checking)?
 
T

tom fredriksen

Twisted said:
This even applies to basic math calcs, without e.g. arrays (and thus
bounds-checking) and objects (dynamic dispatch, null pointer checking)?

I don have a definitive answer for that, because it depends on some issues.

The first question is if the code is absolutely free of any support
methods or mechanisms in the language that needs f.ex. runtime checks
and controls.

The second is runtime environment.
If its interpreted; then most likely, at least because of the
interpreter operations.
If its compiled to native code then it might happen.

If its only basic calcs with native data types, then I suggest you make
a prototype in both languages and compare them, just make sure the codes
are equal otherwise some language thing might make some difference.

For the sake of it I will try to figure out a prototype test in both
language, and give it a go I will post it here, please do so aswell as
we could have two different operations to compare wrt speed.

/tom
 
T

Thomas Hawtin

Twisted said:
This even applies to basic math calcs, without e.g. arrays (and thus
bounds-checking) and objects (dynamic dispatch, null pointer checking)?

Bounds checking is exceptionally cheap. It's a register-register compare
and an untaken conditional. It can be hoisted out of inner loops, but
because it's such a cheap operation there isn't a enormous benefit.

Dynamic dispatch. A decent performing JVM will inline methods. It's not
as if virtual functions often cause problems in C++ anyway.

Null pointer checking is similarly simple. Mostly it's a case of letting
the memory management unit trap the page fault.

Probably the worst thing is object's memory layout. The inability to
keep one object within the memory allocated to another. Think Complex[].

Tom Hawtin
 
R

Roedy Green

Bounds checking is exceptionally cheap. It's a register-register compare
and an untaken conditional. It can be hoisted out of inner loops, but
because it's such a cheap operation there isn't a enormous benefit.

Since Java's array elements are always powers of two, you can get the
address offset from the index by a simple shift. Some hardware
architectures even give you that shift for free. In languages where
you can have arrays of objects rather than arrays of references, you
have to do a full multiply. There it becomes really important to
convert the multiply to an add each time through the loop, which chews
up some of your precious registers.

As a side effect of this sort of hoisting you can eliminate the bounds
checks. They are built in to the loop termination check.
 
T

tom fredriksen

tom said:
For the sake of it I will try to figure out a prototype test in both
language, and give it a go I will post it here, please do so aswell as
we could have two different operations to compare wrt speed.

I made a test which perform a simplified in_cksum calculation on a 64KB
packet in a loop. It is done in both C and java (with -server and gcj
(gcc java compiler)

The results where the following:

java -client 11.85 (java 1.5.0_04)
java -server 11.90
gcj 12.26 (gcc 3.3.2 on linux 2.6.3)
C integer 11.01
C float 7.23

So my advice would be to use C if you need absolute speed, but if you
can accept a little reduction, then java might be ok if you are sticking
with pure native datatypes and operations.


Since the in_cksum operation is entirely an integer operation the C
float version is not really interesting (but I made a mistake to begin
with so I thought it was a valid result)

The code is as follows:

(The C code has to be sligthly changed otherwise the array
initailisation routine would have used float operations instead leading
to the whole program being float operations, with invalid results.)

/***** JAVA *****/

public class Cksum
{
public static void main(String args[])
{
Random rand = new Random();
int total = 0;
int count = 65500;
int data[] = new int[count];

for(int c=0; c<count; c++) {
data[c] = rand.nextInt(2000000000);
}

long startTime = System.currentTimeMillis();
for(int d=0; d<50000; d++) {
for(int c=0; c<count; c++) {
total += data[c];
}
}
long endTime = System.currentTimeMillis();

System.out.println("Elapsed time (ms): " + (endTime - startTime));
System.out.println("Total: " + total);
}
}



/***** C *****/

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
unsigned int total = 0;
int count = 65500;
unsigned int data[count];
unsigned int data2[count];

for(int c=0; c<count; c++) {
/* data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */
data2[c] = data[c];
}

struct timeval start_time;
struct timeval end_time;

gettimeofday(&start_time, NULL);

for(int d=0; d<50000; d++) {
for(int c=0; c<count; c++) {
total += data[c];
}
}
gettimeofday(&end_time, NULL);

double t1=(start_time.tv_sec*1000)+(start_time.tv_usec/1000.0);
double t2=(end_time.tv_sec*1000)+(end_time.tv_usec/1000.0);

printf("Elapsed time (ms): %.6lf\n", t2-t1);
printf("Total: %u\n", total);

for(int c=0; c<100; c++) {
printf("data2: %u ", data2[c]);
}
printf("\n");

return(0);
}
 
R

Roedy Green

java -client 11.85 (java 1.5.0_04)
java -server 11.90
gcj 12.26 (gcc 3.3.2 on linux 2.6.3)
C integer 11.01
C float 7.23

have yo uposted the code for your benchmark. Iwould like to try it
with Jet.
 
R

Roedy Green

/* data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */

I don't get it. What are you comparing? The algorithms are not even
close. and you have the code commented out.
 
T

tom fredriksen

Roedy said:
/* data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */

I don't get it. What are you comparing? The algorithms are not even
close. and you have the code commented out.

What do you mean? I am comparing the following.

C:

for(int d=0; d<50000; d++) {
for(int c=0; c<count; c++) {
total += data[c];
}
}

Java:

for(int d=0; d<50000; d++) {
for(int c=0; c<count; c++) {
total += data[c];
}
}


All the other stuff appears outside the loop, so it is irrelevant. In
the C code I have to use a different initialisation method to be sure I
get integer operations that are comparable, but it should not affect the
measurement. Thats the only difference. Just so you now, the algorithm
is a simplified internet checksum used in tcp/ip, which does not take
into account overflow, its just a simple test that performs some
"credible" math operation.

/tom
 
R

Roedy Green

public class Cksum

I ran it on my machine:

with Jet
Elapsed time (ms): 4641
Total: 476899872

with Java 1.6 client
Elapsed time (ms): 11297
Total: -1699311728

I would change the benchmark to
Random rand = new Random(149);
to give repeatable results. Then that total would verify the algorithm
worked.

That makes Jet the clear winner, far faster than C.

That is because Jet does loop unravelling.
 
T

tom fredriksen

Roedy said:
I ran it on my machine:

with Jet
Elapsed time (ms): 4641
Total: 476899872

with Java 1.6 client
Elapsed time (ms): 11297
Total: -1699311728

I would change the benchmark to
Random rand = new Random(149);
to give repeatable results. Then that total would verify the algorithm
worked.

That makes Jet the clear winner, far faster than C.

I consider using Jet is cheating, if I had used other techniques to
enhance the C code or had access to some C optimisers I am sure I could
make it go faster as well. But I was trying to make an equal
implementation/compilation comparison.

Some comments
- did you try it with the java 1.5 which is production grade compared to
1.6, with server option as well.
- where are the numbers for the C implementation on your machine
otherwise you need to tell us what kind of machine are you using?

/tom
 
R

Roedy Green

I consider using Jet is cheating, if I had used other techniques to
enhance the C code or had access to some C optimisers I am sure I could
make it go faster as well. But I was trying to make an equal
implementation/compilation comparison.

It is not cheating if the result is the same and there are no
conditions under which the code does not work. It is simply using a
superior compiler. If you want to compare languages it is silly
comparing them with less than the best compilers. You are then
artificially skewing the result by which inept compilers you choose.

The best compilers are limited only by the theoretical constraints of
the language. The not so hot ones have all sorts of limitation
nothing whatever to do with the language.
 
T

tom fredriksen

Roedy said:
It is not cheating if the result is the same and there are no
conditions under which the code does not work. It is simply using a
superior compiler. If you want to compare languages it is silly
comparing them with less than the best compilers. You are then
artificially skewing the result by which inept compilers you choose.

The best compilers are limited only by the theoretical constraints of
the language. The not so hot ones have all sorts of limitation
nothing whatever to do with the language.

But where are your numbers for the C version and what cpu are you
running it on?

/tom
 
T

tom fredriksen

Roedy said:
you don't need them. All you need in the ratio of Jet to Java
-client.

Yes if it had been me running it on my machine, but since it is running
on a different machine with different software than I used to test it it
does matter. F.ex java 6 is not yet as fast as java 5, it might contain
other runtime enhancements than java 5 and so on.

In any case you mentioned Jet is using loop unrolling, which is a
techniques which boils down to algorithm enhancing, so by that if the C
code had been using loop unrolling too, the results would be different.

You have to compare an apple with an apple not with a sugar coated
apple. My point is this, code enhancements can be categorised and you
can not use enhancements from a different category because it
significantly changes the comparison. Because then its not about
comparing languages its about comparing different algorithms.

To perform proper comparison and measurements all test must run under
same or similar conditions, you can not mix and switch as you desire and
then make a claim.

/tom
 
C

Chris Uppal

tom said:
To perform proper comparison and measurements all test must run under
same or similar conditions, you can not mix and switch as you desire and
then make a claim.

This is true. But don't take it too far; if one implementation strategy makes
certain types of automatic optimisation possible which are impossible to apply
with another strategy, then the advantages of using those optimisations are
legitimately part of the comparison. They don't turn it into an apples vs.
oranges comparison.

E.g. if a JITing JVM can detect the availability of Intel SMP instructions, and
dynamically choose to generate code which utilises them, then that is a
legitimate advantage over another implementation (of Java, C, or anything else)
which uses static compilation, and therefore does not generate comparable code.

Another example, also hypothetical. If the semantics of a language such as C
are such that the compiler cannot perform automatic loop unrolling, whereas the
semantics of another language are such that the compiler /can/ spot some
opportunities unaided, then it's perfectly legitimate to compare the two
implementations directly.

-- chris
 
T

tom fredriksen

Chris said:
This is true. But don't take it too far; if one implementation strategy makes
certain types of automatic optimisation possible which are impossible to apply
with another strategy, then the advantages of using those optimisations are
legitimately part of the comparison. They don't turn it into an apples vs.
oranges comparison.

E.g. if a JITing JVM can detect the availability of Intel SMP instructions, and
dynamically choose to generate code which utilises them, then that is a
legitimate advantage over another implementation (of Java, C, or anything else)
which uses static compilation, and therefore does not generate comparable code.

If the claim is "compile the fastest code you possibly get in C and
Java" then yes you are right, but then you are discussing which language
has come further along in their development of optimised code.
Sort of like comparing a Ferrari to a Koenigsegg car.

It is another matter to do a test, but limit what one language can use
and dont limit what another language can use. One car must be a Ford
Mondeo or similar, but the other car can be a Ferrari or similar if it
wants. Then you are not comparing speeds of comparable items.
Another example, also hypothetical. If the semantics of a language such as C
are such that the compiler cannot perform automatic loop unrolling, whereas the
semantics of another language are such that the compiler /can/ spot some
opportunities unaided, then it's perfectly legitimate to compare the two
implementations directly.

Since loop unrolling and smp systems make the test enter an entirely
different class of performance, you can not use those techniques unless
both tests are using them. Otherwise its not a race, its a slaughter:)

When setting out on a project to perform a test, a statement of what the
test is to perform must be decided. After that is done, it must be made
sure that the test is objective and comparative based on the premise of
the test. I am not claiming that my test absolutely adheres to those two
criteria, basically because its an informal test, but I did try to make
it relatively comparable. But there are of course too many variables in
a proper test for me to undertake now. Because you would have to
classify all compiler and programming techniques etc and decide which
are applicable for the test to be objective and so on.

/tom
 
R

Roedy Green

To perform proper comparison and measurements all test must run under
same or similar conditions, you can not mix and switch as you desire and
then make a claim.

The result I was talking about was a factor of 4 faster. No fine
detail is going to change that.

You remind me of a skinny kid named Ritchie Dowrey at whose house we
used to play football. He owned the football. One every play he had
a "new rule" that always favoured his team. Nobody knew enough to
challenge him.

The theme was echoed in the movie Can Hieronymus Merkin Ever Forget
Mercy Humppe and Find True Happiness?

You are making your rules up on the fly to generate your desired
result. You are behaving like a religious fanatic distorting the
evidence to produce a predecided conclusion.

Look at this from a practical point of view. You don't really care
HOW a compiler gets its speed, all you care about is does it do the
calculations faster. Therefore I dismiss your talk of the compiler
"cheating".
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,540
Members
45,025
Latest member
KetoRushACVFitness

Latest Threads

Top