Is Java 1.6 number crunching slower than 1.5?

Discussion in 'Java' started by Kevin McMurtrie, Aug 13, 2009.

  1. I'm tuning some graphics routines that do pure number crunching.
    There's no AWT, Swing, memory allocation, java.lang.Math, or system
    calls in the main loops. It's just running kernels over RGBA int arrays
    to apply an anti-aliased affine transformation. I've optimized the code
    over time, but one thing always remains the same: Java 1.6 benchmarks about
    4% slower than 1.5.

    Is there anything I should look out for in Java 1.6 HotSpot? Different
    register allocation? Slower array access (pointer math)? Strange
    runtime overhead in loops? Is it just an Apple thing that I shouldn't
    worry about?

    Version info:

    MacOS X 10.5.8

    Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

    java version "1.5.0_20"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)

    java version "1.6.0_15"
    Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)

    --
    I will not see your reply if you use Google.
    Kevin McMurtrie, Aug 13, 2009
    #1

  2. Roedy Green (Guest)

    On Thu, 13 Aug 2009 01:15:01 -0700, Kevin McMurtrie
    <> wrote, quoted or indirectly quoted someone who
    said :

    >Is there anything I should look out for in Java 1.6 HotSpot? Different
    >register allocation?


    I would guess it is simply that the 1.6 library code is fatter. Even with
    the same code generated for your app, 1.6 would run slower in the same
    amount of RAM.

    Even if you don't use methods of a used class, the code for them still
    gets loaded.

    Do an experiment: on a RAM-rich machine, crank up the various RAM
    allocations on the java.exe command line and see if the gap narrows.
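
    (A purely illustrative invocation - the flags are standard HotSpot
    memory options, but the sizes are assumptions rather than figures from
    this thread:

        java -d64 -Xms1G -Xmx2G -verbose:gc Benchmark

    -Xms/-Xmx fix the initial and maximum heap, and -verbose:gc shows
    whether collections are eating into the timed runs.)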
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    "If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur."
    ~ Red Adair (born: 1915-06-18 died: 2004-08-07 at age: 89)
    Roedy Green, Aug 13, 2009
    #2

  3. In article <>,
    Roedy Green <> wrote:

    > On Thu, 13 Aug 2009 01:15:01 -0700, Kevin McMurtrie
    > <> wrote, quoted or indirectly quoted someone who
    > said :
    >
    > >Is there anything I should look out for in Java 1.6 HotSpot? Different
    > >register allocation?

    >
    > I would guess all it is is that 1.6 library code is fatter. Even with
    > the same code generated for your app, 1.6 would run slower in the same
    > amount of RAM.
    >
    > Even if you don't use methods of a used class, the code for them still
    > gets loaded.
    >
    > Do an experiment. In a RAM rich machine crank up the various RAM
    > allocations on the java.exe command line as see if the gap narrows.


    I'm testing on a 5GB machine with a 3GB heap.

    --
    I will not see your reply if you use Google.
    Kevin McMurtrie, Aug 13, 2009
    #3
  4. Roedy Green (Guest)

    On Thu, 13 Aug 2009 08:44:25 -0700, Kevin McMurtrie
    <> wrote, quoted or indirectly quoted someone who
    said :

    >I'm testing on a 5GB machine with a 3GB heap.


    You can disassemble to see the generated class code, but I don't know
    what you might use (without spending money) that would let you peek at
    the generated machine code. Is there an esoteric option or annotation to
    get a look?
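
    (For the bytecode at least, javap from the JDK will disassemble a class,
    and -XX:+PrintCompilation at least shows which methods HotSpot decides
    to compile - a rough sketch, not a way to see the generated machine
    code itself:

        javap -c Benchmark
        java -XX:+PrintCompilation -d64 -mx2G Benchmark

    Both are standard in Sun JDKs of this era; the output format varies by
    build.)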

    Perhaps you could use nanotime to benchmark some code that does not do
    memory allocation. It seems unlikely number crunching code would not
    get faster with each release.

    You need to figure out some way to discount GC overhead and OS
    swapping overhead. I would expect that to be bigger in 1.6.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    "If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur."
    ~ Red Adair (born: 1915-06-18 died: 2004-08-07 at age: 89)
    Roedy Green, Aug 13, 2009
    #4
  5. markspace (Guest)

    Kevin McMurtrie wrote:

    > Is there anything I should look out for in Java 1.6 HotSpot? Different
    > register allocation? Slower array access (pointer math)? Strange
    > runtime overhead in loops? Is it just an Apple thing that I shouldn't
    > worry about?
    >


    4% sounds like a very small number, and it might be hard to find what is
    causing that in the code. I'm hardly an expert, but I'll take a wild
    stab at it.

    The first question would be: what optimizations and/or parameters are you
    passing to each JVM? This is an area where the two JVMs might not be
    exactly equivalent, as the defaults tend to vary from build to build (as
    we were just discussing in another thread).
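
    (One way to compare - assuming the builds in question support the flag,
    which is a guess on my part - is to ask each JVM what it actually
    settled on and diff the output between 1.5 and 1.6:

        java -XX:+PrintCommandLineFlags -version
    )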
    markspace, Aug 13, 2009
    #5
  6. Arne Vajhøj (Guest)

    Kevin McMurtrie wrote:
    > I'm tuning some graphics routines that do pure number crunching.
    > There's no AWT, Swing, memory allocation, java.lang.Math, or system
    > calls in the main loops. It's just running kernels over RGBA int arrays
    > to apply an anti-aliased affine transformation. I've optimized the code
    > over time but one thing always remains the same: Java 1.6 benchmarks at
    > 4% slower than 1.5.
    >
    > Is there anything I should look out for in Java 1.6 HotSpot? Different
    > register allocation? Slower array access (pointer math)? Strange
    > runtime overhead in loops? Is it just an Apple thing that I shouldn't
    > worry about?


    1.6 is slightly different from 1.5 - some things may be faster, some
    things may be slower.

    I have my own little micro benchmark and it shows identical int and
    double performance but significantly improved String performance
    from 1.5 to 1.6.

    But different benchmarks will give different results.

    I obviously assume that you are using -server.

    But you could try experimenting with some of the more exotic -XX options.
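
    (For example - these flags existed in HotSpot builds of roughly this
    vintage, but whether they help this particular code is pure speculation:

        java -server -d64 -Xmx2G -XX:+AggressiveOpts -XX:CompileThreshold=1500 Benchmark

    AggressiveOpts turns on newer optimizations, and CompileThreshold
    changes how soon methods get compiled.)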

    Arne
    Arne Vajhøj, Aug 13, 2009
    #6
  7. Arne Vajhøj (Guest)

    Roedy Green wrote:
    > On Thu, 13 Aug 2009 08:44:25 -0700, Kevin McMurtrie
    > <> wrote, quoted or indirectly quoted someone who
    > said :
    >> I'm testing on a 5GB machine with a 3GB heap.

    >
    > You can disassemble to see the generated class code,


    Since the optimization is done in the JIT compiler, not in javac, there
    is not much point in that.

    > Perhaps you could use nanotime to benchmark some code


    Benchmarks that need nanotime have too much uncertainty on a typical
    multiuser OS.

    > It seems unlikely number crunching code would not
    > get faster with each release.


    It is not unlikely that some number crunching code would get slower.

    It is unlikely that the majority of number crunching code would
    get slower.

    > You need to figure out some way to discount GC overhead and OS
    > swapping overhead. I would expect that to be bigger in 1.6.


    Given that the OP stated:
    There's no ... , memory allocation, ...
    he does not need to.

    Arne
    Arne Vajhøj, Aug 13, 2009
    #7
  8. On Aug 13, 9:45 pm, Roedy Green <>
    wrote:
    >
    > You can disassemble to see the generated class code, but I don't know
    > what you might use (without spending) that would let you peek at the
    > generated machine code. Is there esoteric option or annotation to get
    > a look?


    There is such an option (-XX:+PrintAssembly), but unfortunately I think
    it is disabled in the production build of the VM. If you are interested
    enough, it is possible to write a very simple agent in JVMTI that can
    accomplish this as well.
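
    (For the record, the usual way to try it - assuming a VM where the flag
    is compiled in and the hsdis disassembler plugin is on the library
    path - is:

        java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Benchmark

    On the product builds discussed here it will most likely just report
    that the option is unavailable.)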

    Regards,
    Daniel
    Daniel Sjöblom, Aug 14, 2009
    #8
  9. I pulled out a few bits of code and patched it together so a test case
    does the same kind of math as the real deal. (Don't be a style freak -
    it's a demo fragment squished to fit in a Usenet posting.)

    Machine:
    MacOS X 10.5.8

    Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

    ---------------
    Java 1.5

    Version:
    java version "1.5.0_20"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)

    Options: -d64 -mx2G

    Output:
    Millis: 5979.04
    Millis: 5984.168
    Millis: 5987.027
    Millis: 5979.992
    Millis: 5953.974

    ---------------
    Java 1.6

    Version:
    java version "1.6.0_15"
    Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)

    Options: -d64 -mx2G

    Output:
    Millis: 6943.407
    Millis: 6937.324
    Millis: 6917.524
    Millis: 6931.662
    Millis: 6917.065

    ---------------

    public class Benchmark
    {
        // ARGB channel bit offsets and masks for packed int pixels.
        static final int s_offR= 16, s_offG= 8, s_offB= 0, s_offA= 24;
        static final int s_maskR= 0xff0000, s_maskG= 0xff00,
                         s_maskB= 0xff, s_maskA= 0xff000000;

        public static void main (final String args[])
        {
            final Benchmark b= new Benchmark ();
            b.rasterize();   // warm-up pass before timing

            for (int restest= 0; restest < 5; ++restest)
            {
                final long start= System.nanoTime();
                for (int i= 0; i < 1000; ++i)
                    b.rasterize();
                final long end= System.nanoTime();

                System.out.println("Millis: " + (end - start) / 1000000d);
            }
        }

        final int m_src[], m_dst[];
        final int m_srcYstride, m_dstYstride;
        float m_R, m_G, m_B, m_A;

        public Benchmark ()
        {
            m_src= new int [640 * 480];
            m_dst= new int [320 * 240];
            m_srcYstride= 640;
            m_dstYstride= 320;
        }

        // Apply a 4x4 kernel at every other source pixel and write one
        // destination pixel.
        void rasterize ()
        {
            final short kerns[][][]= new short[][][] {
                {{300, 134, -23, 121}, {234, 45, 12, -18},
                 {37, -86, 7, 0}, {4, 86, -13, 197}},
                {{300, 134, -23, 123}, {45, 234, 12, -20},
                 {37, -54, 7, 0}, {4, 54, -13, 197}}
            };
            final int sum= 1069;

            for (int srcY= 0, dstY= 0, ySrcScan= 0;
                 srcY < (480 - 4);
                 srcY+= 2, ++dstY, ySrcScan+= 2*m_srcYstride)
            {
                for (int srcX= 0, dstX= 0; srcX < (640 - 4); srcX+= 2, ++dstX)
                {
                    m_R= m_G= m_B= m_A= 0;
                    read4x4 (kerns[srcY & 1], m_src[ySrcScan + srcX], 0, 0);
                    writePixel (sum, dstX, dstY);
                }
            }
        }

        // Normalize the accumulated channels by alpha, clamp to 0..255 and
        // pack the result into one ARGB int.
        final void writePixel
            (final float kernelSum, final int dstX, final int dstY)
        {
            final int v;
            if (m_A > 0)
            {
                final int r= (int)(m_R / m_A);
                final int g= (int)(m_G / m_A);
                final int b= (int)(m_B / m_A);
                final int a= (int)(m_A / kernelSum);
                v= (((r < 0) ? 0 : ((r > 255) ? 255 : r) << s_offR) & s_maskR)
                 | (((g < 0) ? 0 : ((g > 255) ? 255 : g) << s_offG) & s_maskG)
                 | (((b < 0) ? 0 : ((b > 255) ? 255 : b) << s_offB) & s_maskB)
                 | (((a < 0) ? 0 : ((a > 255) ? 255 : a) << s_offA) & s_maskA);
            }
            else
                v= 0;

            m_dst[dstX + dstY * m_dstYstride]= v;
        }

        // Accumulate alpha-weighted R, G, B and A over a 4x4 neighbourhood;
        // manually unrolled, one block per kernel tap.
        final void read4x4
            (final short[][] array, final int scan, final int kx, final int ky)
        {
            int R = 0, G = 0, B = 0, A = 0;
            final short k0[]= array[ky];
            final short k1[]= array[ky + 1];
            final short k2[]= array[ky + 2];
            final short k3[]= array[ky + 3];

            {
                final short k = k0[kx + 0];
                final int value = m_src[scan];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R= alphaMult * ((value & s_maskR) >>> s_offR);
                G= alphaMult * ((value & s_maskG) >>> s_offG);
                B= alphaMult * ((value & s_maskB) >>> s_offB);
                A= alphaMult;
            }
            {
                final short k = k0[kx + 1];
                final int value = m_src[scan + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k0[kx + 2];
                final int value = m_src[scan + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k0[kx + 3];
                final int value = m_src[scan + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 0];
                final int value = m_src[scan + m_srcYstride];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 1];
                final int value = m_src[scan + m_srcYstride + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 2];
                final int value = m_src[scan + m_srcYstride + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 3];
                final int value = m_src[scan + m_srcYstride + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 0];
                final int value = m_src[scan + m_srcYstride + m_srcYstride];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 1];
                final int value = m_src[scan + m_srcYstride + m_srcYstride + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 2];
                final int value = m_src[scan + m_srcYstride + m_srcYstride + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 3];
                final int value = m_src[scan + m_srcYstride + m_srcYstride + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 0];
                final int value = m_src[scan + 3 * m_srcYstride];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 1];
                final int value = m_src[scan + 3 * m_srcYstride + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 2];
                final int value = m_src[scan + 3 * m_srcYstride + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 3];
                final int value = m_src[scan + 3 * m_srcYstride + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }

            m_R+= R;
            m_G+= G;
            m_B+= B;
            m_A+= A;
        }
    }

    --
    I will not see your reply if you use Google.
    Kevin McMurtrie, Aug 15, 2009
    #9
  10. Tom Anderson (Guest)

    On Fri, 14 Aug 2009, Kevin McMurtrie wrote:

    > I pulled out a few bits of code and patched it together so a test case
    > does the same kind of math as the real deal. (Don't be a style freak -
    > it's demo fragment squished to fit in a Usenet posting.)
    >
    > Machine:
    > MacOS X 10.5.8
    >
    > Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    > 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >
    > ---------------
    > Java 1.5
    >
    > Version:
    > java version "1.5.0_20"
    > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 5979.04
    > Millis: 5984.168
    > Millis: 5987.027
    > Millis: 5979.992
    > Millis: 5953.974
    >
    > ---------------
    > Java 1.6
    >
    > Version:
    > java version "1.6.0_15"
    > Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    > Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 6943.407
    > Millis: 6937.324
    > Millis: 6917.524
    > Millis: 6931.662
    > Millis: 6917.065


    Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    performance regression.

    tom

    --
    Argumentative and pedantic, oh, yes. Although it's properly called
    "correct" -- Huge
    Tom Anderson, Aug 15, 2009
    #10
  11. On 15.08.2009 13:18, Tom Anderson wrote:
    > On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >
    >> I pulled out a few bits of code and patched it together so a test case
    >> does the same kind of math as the real deal. (Don't be a style freak -
    >> it's demo fragment squished to fit in a Usenet posting.)
    >>
    >> Machine:
    >> MacOS X 10.5.8
    >>
    >> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    >> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>
    >> ---------------
    >> Java 1.5
    >>
    >> Version:
    >> java version "1.5.0_20"
    >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    >> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 5979.04
    >> Millis: 5984.168
    >> Millis: 5987.027
    >> Millis: 5979.992
    >> Millis: 5953.974
    >>
    >> ---------------
    >> Java 1.6
    >>
    >> Version:
    >> java version "1.6.0_15"
    >> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 6943.407
    >> Millis: 6937.324
    >> Millis: 6917.524
    >> Millis: 6931.662
    >> Millis: 6917.065

    >
    > Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    > performance regression.


    Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and
    throughput) which are more important for the average server application
    today. Maybe the optimization kicks in later. I am not convinced that
    what we have seen constitutes a bug.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 15, 2009
    #11
  12. Tom Anderson (Guest)

    On Sat, 15 Aug 2009, Robert Klemme wrote:

    > On 15.08.2009 13:18, Tom Anderson wrote:
    >> On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >>
    >>> I pulled out a few bits of code and patched it together so a test case
    >>> does the same kind of math as the real deal. (Don't be a style freak -
    >>> it's demo fragment squished to fit in a Usenet posting.)
    >>>
    >>> Machine:
    >>> MacOS X 10.5.8
    >>>
    >>> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    >>> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>>
    >>> ---------------
    >>> Java 1.5
    >>>
    >>> Version:
    >>> java version "1.5.0_20"
    >>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    >>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>
    >>> Options: -d64 -mx2G
    >>>
    >>> Output:
    >>> Millis: 5979.04
    >>> Millis: 5984.168
    >>> Millis: 5987.027
    >>> Millis: 5979.992
    >>> Millis: 5953.974
    >>>
    >>> ---------------
    >>> Java 1.6
    >>>
    >>> Version:
    >>> java version "1.6.0_15"
    >>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>
    >>> Options: -d64 -mx2G
    >>>
    >>> Output:
    >>> Millis: 6943.407
    >>> Millis: 6937.324
    >>> Millis: 6917.524
    >>> Millis: 6931.662
    >>> Millis: 6917.065

    >>
    >> Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    >> performance regression.

    >
    > Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and throughput)
    > which are more important for the average server application today.


    Doubtless. But they've still slowed down integer array maths of the kind
    you're doing.

    > Maybe the optimization kicks in later.


    Perhaps - you could try that, right? Just change the top loop to a
    while(true), fire the test off and leave it running overnight.
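
    (A minimal sketch of that change, applied to the Benchmark class posted
    earlier in the thread - the outer timing loop simply loses its upper
    bound so the per-pass times can be watched for as long as the machine
    is left running:

        // In Benchmark.main(), replace the bounded timing loop with an
        // unbounded one; everything else stays the same.
        for (int restest= 0; ; ++restest)
        {
            final long start= System.nanoTime();
            for (int i= 0; i < 1000; ++i)
                b.rasterize();
            final long end= System.nanoTime();
            System.out.println("Pass " + restest + " millis: "
                + (end - start) / 1000000d);
        }

    If late-kicking optimization were the explanation, the per-pass times
    should eventually drop on 1.6.)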

    > I am not convinced that what we have seen constitutes a bug.


    It's not a bug, no, but it *is* a performance regression. How about
    telling Sun and letting them decide if it's a problem?

    That said, they must know about it - I doubt they make a release without
    doing fairly thorough benchmarking.

    Besides, if you report it, they may suggest a fix to make it faster under
    1.6 - a VM flag, or code patterns to avoid or something.

    tom

    --
    Argumentative and pedantic, oh, yes. Although it's properly called
    "correct" -- Huge
    Tom Anderson, Aug 15, 2009
    #12
  13. In article <>,
    Tom Anderson <> wrote:

    > On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >
    > > I pulled out a few bits of code and patched it together so a test case
    > > does the same kind of math as the real deal. (Don't be a style freak -
    > > it's demo fragment squished to fit in a Usenet posting.)
    > >
    > > Machine:
    > > MacOS X 10.5.8
    > >
    > > Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    > > 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    > >
    > > ---------------
    > > Java 1.5
    > >
    > > Version:
    > > java version "1.5.0_20"
    > > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    > > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    > >
    > > Options: -d64 -mx2G
    > >
    > > Output:
    > > Millis: 5979.04
    > > Millis: 5984.168
    > > Millis: 5987.027
    > > Millis: 5979.992
    > > Millis: 5953.974
    > >
    > > ---------------
    > > Java 1.6
    > >
    > > Version:
    > > java version "1.6.0_15"
    > > Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    > > Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    > >
    > > Options: -d64 -mx2G
    > >
    > > Output:
    > > Millis: 6943.407
    > > Millis: 6937.324
    > > Millis: 6917.524
    > > Millis: 6931.662
    > > Millis: 6917.065

    >
    > Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    > performance regression.


    Tom: Sun won't care, but Apple would. Maybe. Unofficially. :)

    Kevin: How'd you get 1.5.0_20 & 1.6.0_15? I thought I was patched up!

    I get a little closer race:

    $ make clean run
    rm -f *.class
    javac Benchmark.java
    java -version
    java version "1.5.0_19"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
    Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)
    java -d64 -mx2G Benchmark
    Millis: 5970.23
    Millis: 5962.926
    Millis: 5961.268
    Millis: 5964.333
    Millis: 5964.136

    $ make clean run
    rm -f *.class
    javac Benchmark.java
    java -version
    java version "1.6.0_13"
    Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
    Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)
    java -d64 -mx2G Benchmark
    Millis: 6308.654
    Millis: 6299.484
    Millis: 6298.147
    Millis: 6296.435
    Millis: 6300.94

    --
    John B. Matthews
    trashgod at gmail dot com
    <http://sites.google.com/site/drjohnbmatthews>
    John B. Matthews, Aug 15, 2009
    #13
  14. Lew (Guest)

    Kevin McMurtrie wrote:
    >>>> ---------------
    >>>> Java 1.5
    >>>>
    >>>> Version:
    >>>> java version "1.5.0_20"
    >>>> Java(TM) 2 Runtime Environment, Standard Edition (build
    >>>> 1.5.0_20-b02-308)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 5979.04
    >>>> Millis: 5984.168
    >>>> Millis: 5987.027
    >>>> Millis: 5979.992
    >>>> Millis: 5953.974
    >>>>
    >>>> ---------------
    >>>> Java 1.6
    >>>>
    >>>> Version:
    >>>> java version "1.6.0_15"
    >>>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 6943.407
    >>>> Millis: 6937.324
    >>>> Millis: 6917.524
    >>>> Millis: 6931.662
    >>>> Millis: 6917.065


    My results on a 1GB RAM 64-bit Linux installation:

    $ /opt/java/jdk1.5.0_20/bin/java -version
    java version "1.5.0_20"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_20-b02, mixed mode)

    $ /opt/java/jdk1.6.0_16/bin/java -version
    java version "1.6.0_16"
    Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

    $ cd ~/projects/testit/src/
    ~/projects/testit/src

    $ /opt/java/jdk1.5.0_20/bin/javac -d ../build/classes/ testit/Benchmark.java

    $ /opt/java/jdk1.5.0_20/bin/java -cp ../build/classes/ testit.Benchmark
    Millis: 11994.786
    Millis: 11238.298
    Millis: 11274.685
    Millis: 11760.477
    Millis: 12342.302

    $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    Millis: 12608.171015
    Millis: 11576.036953
    Millis: 11428.551717
    Millis: 12272.149676
    Millis: 12045.456421

    $ /opt/java/jdk1.6.0_16/bin/javac -d ../build/classes/ testit/Benchmark.java

    $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    Millis: 14205.535646
    Millis: 11449.014148
    Millis: 11421.515997
    Millis: 12346.804192
    Millis: 11967.196742

    $

    As you can see, there is nowhere near the same difference - the timing
    ranges overlap.

    --
    Lew
    Lew, Aug 15, 2009
    #14
  15. On 15.08.2009 13:55, Tom Anderson wrote:
    > On Sat, 15 Aug 2009, Robert Klemme wrote:
    >
    >> On 15.08.2009 13:18, Tom Anderson wrote:
    >>> On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >>>
    >>>> I pulled out a few bits of code and patched it together so a test case
    >>>> does the same kind of math as the real deal. (Don't be a style freak -
    >>>> it's demo fragment squished to fit in a Usenet posting.)
    >>>>
    >>>> Machine:
    >>>> MacOS X 10.5.8
    >>>>
    >>>> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed
    >>>> Jul
    >>>> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>>>
    >>>> ---------------
    >>>> Java 1.5
    >>>>
    >>>> Version:
    >>>> java version "1.5.0_20"
    >>>> Java(TM) 2 Runtime Environment, Standard Edition (build
    >>>> 1.5.0_20-b02-308)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 5979.04
    >>>> Millis: 5984.168
    >>>> Millis: 5987.027
    >>>> Millis: 5979.992
    >>>> Millis: 5953.974
    >>>>
    >>>> ---------------
    >>>> Java 1.6
    >>>>
    >>>> Version:
    >>>> java version "1.6.0_15"
    >>>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 6943.407
    >>>> Millis: 6937.324
    >>>> Millis: 6917.524
    >>>> Millis: 6931.662
    >>>> Millis: 6917.065
    >>>
    >>> Zoinks. I'd suggest filing a bug report with Sun - that is a
    >>> substantial performance regression.

    >>
    >> Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and
    >> throughput) which are more important for the average server
    >> application today.

    >
    > Doubtless. But they've still slowed down integer array maths of the kind
    > you're doing.


    No - I'm not the OP.

    >> Maybe the optimization kicks in later.

    >
    > Perhaps - you could try that, right? Just change the top loop to a
    > while(true), fire the test off and leave it running overnight.
    >
    >> I am not convinced that what we have seen constitutes a bug.

    >
    > It's not a bug, no, but it *is* a performance regression. How about
    > telling Sun and letting them decide if it's a problem?


    I am not even convinced yet that there *is* a performance regression
    (see for example Lew's results).

    > That said, they must know about it - i doubt they make a release without
    > doing fairly thorough benchmarking.


    Exactly.

    > Besides, if you report it, they may suggest a fix to make it faster
    > under 1.6 - a VM flag, or code patterns to avoid or something.


    Certainly I won't report it because I don't have a problem.

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 16, 2009
    #15
  16. Tom Anderson (Guest)

    On Sun, 16 Aug 2009, Robert Klemme wrote:

    > On 15.08.2009 13:55, Tom Anderson wrote:
    >> On Sat, 15 Aug 2009, Robert Klemme wrote:
    >>
    >>> On 15.08.2009 13:18, Tom Anderson wrote:
    >>>> On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >>>>
    >>>>> I pulled out a few bits of code and patched it together so a test case
    >>>>> does the same kind of math as the real deal.
    >>>>
    >>>> Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    >>>> performance regression.
    >>>
    >>> Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and
    >>> throughput) which are more important for the average server application
    >>> today.

    >>
    >> Doubtless. But they've still slowed down integer array maths of the kind
    >> you're doing.

    >
    > No - I'm not the OP.


    Whoops! Apologies.

    tom

    --
    Safety not guaranteed. I have only done this once before.
    Tom Anderson, Aug 16, 2009
    #16
  17. Arne Vajhøj (Guest)

    Tom Anderson wrote:
    > On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >> I pulled out a few bits of code and patched it together so a test case
    >> does the same kind of math as the real deal. (Don't be a style freak -
    >> it's demo fragment squished to fit in a Usenet posting.)
    >>
    >> Machine:
    >> MacOS X 10.5.8
    >>
    >> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    >> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>
    >> ---------------
    >> Java 1.5
    >>
    >> Version:
    >> java version "1.5.0_20"
    >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    >> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 5979.04
    >> Millis: 5984.168
    >> Millis: 5987.027
    >> Millis: 5979.992
    >> Millis: 5953.974
    >>
    >> ---------------
    >> Java 1.6
    >>
    >> Version:
    >> java version "1.6.0_15"
    >> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 6943.407
    >> Millis: 6937.324
    >> Millis: 6917.524
    >> Millis: 6931.662
    >> Millis: 6917.065

    >
    > Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    > performance regression.


    There are two good reasons why that will not accomplish anything:
    * the Java on MacOS X is Apple's responsibility, not SUN's (the fact
      that Apple is buying Java technology from SUN as the basis for
      their Java does not mean that SUN has a responsibility for
      Apple's end users)
    * neither SUN nor Apple has claimed that there will not exist code
      where the newer version performs worse than the old version (I don't
      think any compiler vendor has claimed that - it happens frequently
      with C compilers)

    Arne
    Arne Vajhøj, Aug 16, 2009
    #17
  18. Lew wrote:
    > Kevin McMurtrie wrote:
    >>>>> ---------------
    >>>>> Java 1.5
    >>>>>
    >>>>> Version:
    >>>>> java version "1.5.0_20"
    >>>>> Java(TM) 2 Runtime Environment, Standard Edition (build
    >>>>> 1.5.0_20-b02-308)
    >>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>>>
    >>>>> Options: -d64 -mx2G
    >>>>>
    >>>>> Output:
    >>>>> Millis: 5979.04
    >>>>> Millis: 5984.168
    >>>>> Millis: 5987.027
    >>>>> Millis: 5979.992
    >>>>> Millis: 5953.974
    >>>>>
    >>>>> ---------------
    >>>>> Java 1.6
    >>>>>
    >>>>> Version:
    >>>>> java version "1.6.0_15"
    >>>>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>>>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>>>
    >>>>> Options: -d64 -mx2G
    >>>>>
    >>>>> Output:
    >>>>> Millis: 6943.407
    >>>>> Millis: 6937.324
    >>>>> Millis: 6917.524
    >>>>> Millis: 6931.662
    >>>>> Millis: 6917.065

    >
    > My results on a 1GB RAM 64-bit Linux installation:
    >
    > $ /opt/java/jdk1.5.0_20/bin/java -version
    > java version "1.5.0_20"
    > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02)
    > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_20-b02, mixed mode)
    >
    > $ /opt/java/jdk1.6.0_16/bin/java -version
    > java version "1.6.0_16"
    > Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
    > Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
    >
    > $ cd ~/projects/testit/src/
    > ~/projects/testit/src
    >
    > $ /opt/java/jdk1.5.0_20/bin/javac -d ../build/classes/
    > testit/Benchmark.java
    >
    > $ /opt/java/jdk1.5.0_20/bin/java -cp ../build/classes/ testit.Benchmark
    > Millis: 11994.786
    > Millis: 11238.298
    > Millis: 11274.685
    > Millis: 11760.477
    > Millis: 12342.302
    >
    > $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    > Millis: 12608.171015
    > Millis: 11576.036953
    > Millis: 11428.551717
    > Millis: 12272.149676
    > Millis: 12045.456421
    >
    > $ /opt/java/jdk1.6.0_16/bin/javac -d ../build/classes/
    > testit/Benchmark.java
    >
    > $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    > Millis: 14205.535646
    > Millis: 11449.014148
    > Millis: 11421.515997
    > Millis: 12346.804192
    > Millis: 11967.196742
    >
    > $
    >
    > As you can see, not nearly the extent of difference - the timing ranges
    > overlap.


    You are assuming that Apple is reusing the SUN JIT compiler unchanged?

    (MacOS X Java is from Apple; this Linux Java is from SUN.)

    Arne
    Arne Vajhøj, Aug 16, 2009
    #18
  19. Arne Vajhøj (Guest)

    Kevin McMurtrie wrote:
    > I pulled out a few bits of code and patched it together so a test case
    > does the same kind of math as the real deal. (Don't be a style freak -
    > it's demo fragment squished to fit in a Usenet posting.)
    >
    > Machine:
    > MacOS X 10.5.8
    >
    > Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    > 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >
    > ---------------
    > Java 1.5
    >
    > Version:
    > java version "1.5.0_20"
    > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 5979.04
    > Millis: 5984.168
    > Millis: 5987.027
    > Millis: 5979.992
    > Millis: 5953.974
    >
    > ---------------
    > Java 1.6
    >
    > Version:
    > java version "1.6.0_15"
    > Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    > Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 6943.407
    > Millis: 6937.324
    > Millis: 6917.524
    > Millis: 6931.662
    > Millis: 6917.065


    I did some experimentation on Windows.

    It seems as if the order from fastest to slowest, with no special
    parameters except -server, is:
    IBM 1.5
    SUN 1.7 beta
    SUN 1.5
    IBM 1.6
    SUN 1.6
    BEA 1.5
    BEA 1.6

    My assumption is still that different code (or different
    options) may result in a completely different result.

    Arne
    Arne Vajhøj, Aug 16, 2009
    #19
  20. Lew (Guest)

    Arne Vajhøj wrote:
    > You are assuming that the Apple is reusing the SUN JIT compiler
    > unchanged ?


    No.

    --
    Lew
    Lew, Aug 17, 2009
    #20
