Scott said:
> I created code based on some of the changes, and found that 1.5 had very
> different performance characteristics on one of our quad processor
> 2.8GHz linux machines.
I suspect that all you are seeing is that, because of the way you have written
the benchmark, the multiply routine is never properly optimised. I don't know
whether the server JVM can do on-stack replacement to update running code with
a faster version, but it seems clear that in this case it does not do so. So
what's happening is that the JVM is all fired up and eager to optimise the
inner loop, as soon as the loops have finished, but your program then exits, so
it doesn't bother...
I tried the same code on a 1.4GHz WinXP laptop, and -- like you -- saw no
important difference between -client and -server, but when I recoded it to pull
the matrix multiply out into a separate method, /and/ called that several
times, the optimisation kicked in quite nicely.
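To make the pattern concrete, here is a minimal sketch of the same idea in isolation: put the hot loop in its own method and time several calls, so that a compiled version has a chance to be used on later calls. The class name WarmupSketch, the work() method and the problem size are all hypothetical stand-ins, not part of the original benchmark.

```java
public class WarmupSketch
{
    /* Hypothetical workload standing in for the matrix multiply. */
    public static long work(int n)
    {
        long sum = 0;
        for (int i = 0; i < n; ++i)
        {
            for (int j = 0; j < n; ++j)
            {
                sum += (long) i * j;
            }
        }
        return sum;
    }

    /* Calls work() 'runs' times and returns each elapsed time in milliseconds. */
    public static long[] timeRuns(int runs, int n)
    {
        long[] millis = new long[runs];
        for (int r = 0; r < runs; ++r)
        {
            long start = System.nanoTime();
            long result = work(n);
            long end = System.nanoTime();
            millis[r] = (end - start) / 1000000L;
            /* Use the result so the call cannot be discarded as dead code. */
            if (result < 0)
            {
                throw new IllegalStateException("unexpected overflow");
            }
        }
        return millis;
    }

    public static void main(String[] args)
    {
        /* The first few timings include interpretation and compilation;
           the later ones are the ones worth comparing. */
        for (long ms : timeRuns(10, 2000))
        {
            System.out.println("Result: " + ms);
        }
    }
}
```

Run it with -XX:+PrintCompilation under both -client and -server to see when work() gets compiled relative to the timings. How large the difference is will depend on the JVM version and the machine.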
Here is the output from a -client run (with some trivia deleted for clarity):
==========================================
[java -cp . -XX:+PrintCompilation -client MatrixTestDirty2]
1 b java.lang.String::charAt (33 bytes)
2 b java.lang.Math::max (11 bytes)
creating matrices a and b
3 b java.util.Random::next (47 bytes)
4 b java.lang.Math::random (16 bytes)
1% b MatrixTestDirty2::main @ 19 (182 bytes)
multiplying them
2% b MatrixTestDirty2::multiply @ 11 (125 bytes)
Result: 10965
5 b MatrixTestDirty2::multiply (125 bytes)
Result: 10886
Result: 10956
Result: 10966
Result: 10885
Result: 10956
Result: 10956
Result: 10956
Result: 10875
Result: 10966
==========================================
which stabilises at around 11 seconds. Notice how multiply() gets compiled
twice, and that the second time only happens /after/ it has returned.
(BTW, I have no idea why this laptop is able to perform the benchmark so much
faster than your machine -- a factor of 4 seems very odd to me...).
Now, running -server:
==========================================
[java -cp . -XX:+PrintCompilation -server MatrixTestDirty2]
creating matrices a and b
1 MatrixTestDirty2::main (182 bytes)
2 java.lang.Math::random (16 bytes)
3* sun.misc.Unsafe::compareAndSwapLong (0 bytes)
1% MatrixTestDirty2::main @ 19 (182 bytes)
multiplying them
2% MatrixTestDirty2::multiply @ 11 (125 bytes)
Result: 8272
4 MatrixTestDirty2::multiply (125 bytes)
Result: 5538
Result: 5608
Result: 5528
Result: 5588
Result: 5528
Result: 5508
Result: 5608
Result: 5518
Result: 5508
==========================================
which stabilises at around half the execution time compared to -client. Notice
how the execution time plummets after the second compilation of multiply().
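As a quick check on the "around half" figure, the following sketch averages the timings from the two runs above, leaving out the first result of each because it includes compilation. The class and method names are made up for illustration; the numbers are copied from the output.

```java
public class SteadyStateRatio
{
    /* Timings in milliseconds, copied from the -client and -server runs. */
    public static final long[] CLIENT =
        {10965, 10886, 10956, 10966, 10885, 10956, 10956, 10956, 10875, 10966};
    public static final long[] SERVER =
        {8272, 5538, 5608, 5528, 5588, 5528, 5508, 5608, 5518, 5508};

    /* Mean of every timing except the first one. */
    public static double steadyMean(long[] millis)
    {
        long sum = 0;
        for (int i = 1; i < millis.length; ++i)
        {
            sum += millis[i];
        }
        return (double) sum / (millis.length - 1);
    }

    public static void main(String[] args)
    {
        double ratio = steadyMean(SERVER) / steadyMean(CLIENT);
        System.out.println("server/client steady-state ratio: " + ratio);
    }
}
```

The steady-state means come out at roughly 10934 ms for -client and 5548 ms for -server, a ratio of about 0.51.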
I'll append the code for completeness, though it's only a trivial modification
to your own code.
--- chris
===========================================
public class MatrixTestDirty2
{
    private static final int N = 800;
    private static final int M = 800;

    private static int[] a = new int[N * M];
    private static int[] b = new int[N * M];
    private static int[] c = new int[N * M];

    public static void main(String[] args)
    {
        System.out.println("creating matrices a and b");
        /* Initialize the two matrices. */
        for (int i = 0; i < N; ++i)
        {
            for (int j = 0; j < M; ++j)
            {
                a[i * M + j] = (int)((Math.random() - 0.5) * 10);
                b[i * M + j] = (int)((Math.random() - 0.5) * 10);
            }
        }

        /* Multiply the matrices. */
        System.out.println("multiplying them");
        for (int i = 0; i < 10; i++)
        {
            long startMillis = System.currentTimeMillis();
            multiply();
            long endMillis = System.currentTimeMillis();
            long total = endMillis - startMillis;
            System.out.println("An element: " + c[25 * M + 25]);
            System.out.println("Result: " + total);
        }
    }

    private static void multiply()
    {
        for (int i = 0; i < N; ++i)
        {
            for (int j = 0; j < M; ++j)
            {
                c[i * M + j] = 0;
            }
        }

        for (int i = 0; i < N; ++i)
        {
            for (int j = 0; j < M; ++j)
            {
                for (int k = 0; k < M; ++k)
                {
                    c[i * M + j] += a[i * M + k] * b[k * M + j];
                }
            }
        }
    }
}