execution speed java vs. C

beliavsky

Chris Uppal said:
Interesting example. I'm pleased to see that the traditional superiority of
Fortran still holds. (Not that I'm a Fortran programmer myself.) Presumably a
32-bit integer version would perform about the same? Or would it be twice as
fast?

The 32-bit integer version also takes 1.0 s. Using 64-bit integers, by changing
(kind=4) to (kind=8) in the program below, increases the run time to 21.4
s.

To optimize performance on Intel hardware one should probably use the Intel
Math Kernel Library http://www.intel.com/software/products/mkl/ , which is
callable from C or Fortran. The matmul function I used is an intrinsic function
of F95.

Here is the Fortran 95 code for multiplying integer matrices.

program xmatmul_int_time
  implicit none
  integer, parameter :: n = 800, icalc = 3, iscale = 10
  real :: x
  real :: t1, t2
  integer(kind=4) :: i, j, ix(n,n), iy(n,n), isum
  real, parameter :: shift_ran = 0.5
  call cpu_time(t1)
  call random_seed()
  do i=1,n
     do j=1,n
        call random_number(x)
        ix(i,j) = iscale*(x - shift_ran)
        call random_number(x)
        iy(i,j) = iscale*(x - shift_ran)
     end do
  end do
  isum = sum(matmul(ix,iy))
  call cpu_time(t2)
  print*,1000*(t2-t1),icalc,isum
end program xmatmul_int_time
 
nicolasbock

Skip said:
after enabling -server on the java command line it takes 5.9 s (you need the
Java SDK for that)

Do you happen to know whether the java that's shipped with OS X
(10.3.6) comes with the SDK? Java claims to know about the "-server"
option, but when I run the matrix multiply with it there is virtually
no improvement in runtime.

nick
 
nicolasbock

Chris said:
Michael Borgwardt wrote:
Bottom line, I suppose, is (insofar as this test is representative):
1) it would appear that java -server /can/ be as fast as well-optimised C.
2) using 1-dimensional arrays /can/ be a big win, but only if everything else
is optimised (e.g. by running -server) to the extent where the array access is
in fact the bottleneck.

Thanks for that extensive reply. That's very interesting. It occurred to
me that there might then also be a, possibly large, platform specific
component to the runtime result, since it depends on the quality of the
runtime engine and how well it translates bytecode into native code.
When I changed the two-dimensional arrays into one-dimensional ones
though, the runtimes changed to

C (-O3): 9.5s
java (-server): 11.5s

In this form the Java code then is basically comparable in performance
to the C code.

nick
 
James Kanze

Skip said:
So I went ahead and wrote a very simple matrix multiplication program
in C and Java and benchmarked them. To my disappointment, C turned out
to be about 1.5 to 2 times faster than Java.
int a [][] = new int [N][M];
int b [][] = new int [N][M];
int c [][] = new int [N][M];
In java every array is a separate object, if you do:
int a [] = new int [N*M];
int b [] = new int [N*M];
int c [] = new int [N*M];
your code will be a LOT faster.

But more difficult to maintain.
further: it seems you benchmark only the first run. The HotSpot JIT seems to
optimize a code block only after it has been run a couple of times
(normally by the 2nd time).

A lot depends on the implementation. I noticed that at least on some
implementations, the first pass through every function was strictly
interpreted -- and at least in his C example, everything was in main.
Putting the actual matrix operations in a separate function, and calling
it a lot, could easily speed up the Java by a factor of 10 or more.
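Both suggestions from this subthread can be sketched together: flatten the matrices into 1-D arrays (one index computation instead of one object dereference per row) and put the multiply in its own method, so the JIT can compile it as a hot method once it has been called a few times. This is a minimal illustration, not code from the thread; the class and method names are made up.

```java
public class FlatMatMul {
    // c = a * b for n x n matrices stored row-major in 1-D arrays.
    static void multiply(int[] a, int[] b, int[] c, int n) {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                int sum = 0;
                for (int k = 0; k < n; ++k) {
                    sum += a[i * n + k] * b[k * n + j];   // i.e. a[i][k] * b[k][j]
                }
                c[i * n + j] = sum;
            }
        }
    }

    public static void main(String[] args) {
        int n = 4;
        int[] a = new int[n * n], b = new int[n * n], c = new int[n * n];
        for (int i = 0; i < n * n; ++i) a[i] = i;       // a = 0..15, row-major
        for (int i = 0; i < n; ++i) b[i * n + i] = 1;   // b = identity
        multiply(a, b, c, n);
        System.out.println(c[5]);                       // c == a, so prints 5
    }
}
```

With `int[][]` each row is a separate heap object, so `a[i][k]` costs an extra load (and potentially a bounds check) per access; the flat layout replaces that with index arithmetic the JIT can strength-reduce.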

He also said he was running on a PowerPC: Mac or AIX, I suppose.
Different implementations of Java have more or less advanced
technologies. For obvious market reasons, the Windows implementations
are the leading edge; Solaris and (I think) Linux come next, and the
rest follows as it can. Still, IBM's support for Java should ensure
reasonable quality on an AIX -- maybe with Jikes instead of JDK?
 
Chris Smith

James Kanze said:
Still, IBM's support for Java should ensure
reasonable quality on an AIX -- maybe with Jikes instead of JDK?

The compiler won't matter much at all. It's the VM that matters.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Skip

Skip said:
Do you happen to know whether the java that's shipped with OS X
(10.3.6) comes with the SDK? Java claims to know about the "-server"
option, but when I run the matrix multiply with it there is virtually
no improvement in runtime.

If you can compile anything (with javac), you have the SDK. the JRE can only
*run* apps.

Sometimes the -server option has no effect, sometimes it's even slower than
the client. Depends on the algorithm, the VM, the JIT, even the RAM (if you
are short -server takes more ram).

In this case however -server should give you a nice performance boost, if
not, it's probably ignored or not supported.

I was running the 1.5.0 SDK on WinXP.
 
Tom Dyess

Here are some timed specs with source code and results for Java vs. C/C++

http://www.scottsarra.org/timer/timerMain.html

which links to this at the end (I haven't read the second one)

http://www.javaworld.com/javaworld/jw-02-1998/jw-02-jperf.html

I write programs at work to do numerical calculations. Those programs
are usually written in C. I recently started looking into Java and
found the language and its features very attractive. I figured that I
should give it a try and see whether it is suitable for numerical work.
So I went ahead and wrote a very simple matrix multiplication program
in C and Java and benchmarked them. To my disappointment, C turned out
to be about 1.5 to 2 times faster than Java.

Below is more detail on what I actually did. I calculated the matrix
product of two random 800x800 matrices for int and double matrices. The
tests were run on a PowerBook G4 1.33GHz. The C compiler was gcc-3.3
and the Java version I used was 1.4.2_05. The C code was compiled with
optimization level 3 (-O3). I couldn't find anything equivalent for
javac, so I compiled the code presumably without optimizations or with
some default level of optimization. The runtimes I found were

int matrices
C (-O3): 15s
C (without -O): 33s
Java: 32s

double matrices
C (-O3): 28s
C (without -O): 47s
Java: 45s

Since the unoptimized C code (without specifying any -O level) runs
about as fast as the Java code I assume that I am looking at badly
optimized Java code.

Can I force the java compiler or the runtime engine to optimize my code
further?

Thanks, nick






The C program was

#include <stdio.h>
#include <stdlib.h>

#define N 800
#define M 800

int a [N][M];
int b [N][M];
int c [N][M];

int main ()
{
    int i, j, k;

    printf ("creating matrices a and b\n");

    /* Initialize the two matrices. */
    for (i = 0; i < N; ++i) {
        for (j = 0; j < M; ++j) {
            a[i][j] = (int) ((rand () / (double) RAND_MAX - 0.5) * 10);
            b[i][j] = (int) ((rand () / (double) RAND_MAX - 0.5) * 10);
        }
    }

    /* Multiply the matrices. */

    printf ("multiplying them\n");

    for (i = 0; i < N; ++i) {
        for (j = 0; j < M; ++j) {
            c[i][j] = 0;
        }
    }

    for (i = 0; i < N; ++i) {
        for (j = 0; j < M; ++j) {
            for (k = 0; k < M; ++k) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

    return 0;
}


The Java code was:

public class MatrixTestDirty
{
    public static void main (String args [])
    {
        int N = 800;
        int M = 800;

        int a [][] = new int [N][M];
        int b [][] = new int [N][M];
        int c [][] = new int [N][M];

        System.out.println ("creating matrices a and b");

        /* Initialize the two matrices. */
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < M; ++j) {
                a[i][j] = (int) ((Math.random () - 0.5) * 10);
                b[i][j] = (int) ((Math.random () - 0.5) * 10);
            }
        }

        /* Multiply the matrices. */

        System.out.println ("multiplying them");

        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < M; ++j) {
                c[i][j] = 0;
            }
        }

        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < M; ++j) {
                for (int k = 0; k < M; ++k) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
    }
}
 
Chris Uppal

The 32-bit integer version also takes 1.0 s. Using 64-bit integers, by
changing (kind=4) to (kind=8) in the program below, increases the run
time to 21.4 s.

The 32-bit integer version takes the same time as the 64-bit float version
(which is odd enough in itself, since there's twice as much data to push
around), but the 64-bit integer version takes >20 times longer?!

There /must/ be something wrong with that...

I'm no Fortran programmer, but looking at the code you posted, it appears that
you are including the time to initialise the input arrays with random numbers
in your timing, and also including the call to sum() (which I guess is there to
prevent the optimiser from removing the multiply?). If you move both of
those outside the section that you time, does the strange result persist?
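The same pitfall applies to benchmarking the Java version. Here is a minimal sketch (class and method names are made up, and System.nanoTime() assumes Java 5+): the inputs are filled before the clock starts, one warm-up pass lets the JIT compile the hot method, only the second multiply is timed, and the result is consumed after the clock stops so the optimiser cannot drop the work.

```java
public class TimedMatMul {
    static void multiply(int[][] a, int[][] b, int[][] c, int n) {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                int sum = 0;
                for (int k = 0; k < n; ++k) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
    }

    public static void main(String[] args) {
        int n = 200;
        int[][] a = new int[n][n], b = new int[n][n], c = new int[n][n];

        // Random fill happens BEFORE the clock starts, so initialisation
        // is excluded from the measurement.
        java.util.Random r = new java.util.Random(42);
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                a[i][j] = r.nextInt(10) - 5;
                b[i][j] = r.nextInt(10) - 5;
            }
        }

        multiply(a, b, c, n);              // warm-up pass for the JIT
        long t0 = System.nanoTime();
        multiply(a, b, c, n);              // only this call is timed
        long t1 = System.nanoTime();

        // Consume the result so the optimiser cannot remove the multiply,
        // but keep this summation outside the timed region (the analogue
        // of the sum() call in the Fortran program).
        long checksum = 0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                checksum += c[i][j];

        System.out.println(((t1 - t0) / 1000000.0) + " ms, checksum " + checksum);
    }
}
```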

-- chris
 
