Anyone have a 1.5 server vs client vm comparison?

Scott Ellsworth

Hi, all.

I am looking for a very simple class with different performance
characteristics under 1.5.0 server than under 1.5.0 client. I am hoping
for perhaps a factor of two in speed, but I will take what I can get.

I have a few such microbenchmarks that varied under 1.4.x, but Sun seems
to have fixed up the 1.5 client vm, at least for these.

(Why, you ask? Because someone here contended that there are no
differences between the two now. I wanted to measure it myself.)

Scott
 
Kevin McMurtrie

Scott Ellsworth said:
Hi, all.

I am looking for a very simple class with different performance
characteristics under 1.5.0 server than under 1.5.0 client. I am hoping
for perhaps a factor of two in speed, but I will take what I can get.

I have a few such microbenchmarks that varied under 1.4.x, but Sun seems
to have fixed up the 1.5 client vm, at least for these.

(Why, you ask? Because someone here contended that there are no
differences between the two now. I wanted to measure it myself.)

Scott

As far as I've seen, the only _performance_ difference is during HotSpot
compilation. The final performance is the same. Which is better
depends on whether your code has hot spots or a broad distribution of
activity. You can play with these:

-XX:CICompilerCount=n
Sets the maximum concurrent compilations. My extensive testing shows
that 1.4.2 and 1.5.0 HotSpot aren't thread safe so increasing this
number rapidly increases the risk of a HotSpot crash. Java 1.5.0 is
especially prone to crashing on transitions from high CPU load to low
CPU load because it queues up methods to be compiled during idle time.

-XX:CompileThreshold=n
Controls how many times a method is executed before compiling or
re-optimizing. Don't set this too low or you'll waste lots of memory
and CPU time compiling startup code, static initializers, and exception
handlers. Even 50000 isn't too high for very large apps.

-XX:ReservedCodeCacheSize=size
Heap size for compiled code. It's one of Sun's many brain-dead heaps in
that it can't size itself correctly. Do any of them work reliably?

-XX:+PrintCompilation
Prints handy information about HotSpot compilation.

The client/server switch tweaks some other parameter defaults too. I
forget where the page listing all the secret Sun JVM options is.
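
Before comparing numbers, it can help to confirm which VM is actually running and which options it received. The sketch below is not from this thread, and the class name VmInfo is made up for illustration. It relies on the standard java.vm.name system property and on RuntimeMXBean.getInputArguments(), both available in 1.5. I have not verified whether -client/-server themselves show up in the argument list, so the VM name is the thing to look at.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: reports which VM is running and the options it was given. */
final class VmInfo {

    /** Name reported by the running VM; on Sun's JVM it mentions "Client" or "Server". */
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    /** Options passed to the JVM itself (not the arguments given to main). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("VM name:    " + vmName());
        System.out.println("VM version: " + System.getProperty("java.vm.version"));
        System.out.println("CPUs:       " + Runtime.getRuntime().availableProcessors());
        for (String arg : jvmArguments()) {
            System.out.println("JVM option: " + arg);
        }
    }
}
```

Run it once with -client and once with -server, adding whichever -XX options you are experimenting with, and compare the output.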
 
Chris Uppal

Scott said:
I am looking for a very simple class with different performance
characteristics under 1.5.0 server than under 1.5.0 client. I am hoping
for perhaps a factor of two in speed, but I will take what I can get.

This thread:

http://groups.google.co.uk/[email protected]&rnum=1

has a fairly good example of the optimisation in the server JVM being
significantly better than that in the client; even to the extent that hand
optimisation in the code submitted to the client can't claw back the
difference.

I'm not sure whether I still have the code I used for measuring. If you want,
I'll try to find it, but you can probably re-create it from the original
example in that thread.

-- chris
 
Scott Ellsworth

[request]

Thanks, Chris and Kevin. I am busily playing with the sample code, and
with the command line flags. I find I say smarter things if I have done
some experimentation on my own.

Scott
 
Scott Ellsworth

Chris Uppal said:
This thread:

http://groups.google.co.uk/[email protected]&rnum=1

has a fairly good example of the optimisation in the server JVM being
significantly better than that in the client; even to the extent that hand
optimisation in the code submitted to the client can't claw back the
difference.

I created code based on some of the changes, and found that 1.5 had very
different performance characteristics on one of our quad processor
2.8GHz linux machines.

# /mnt/java/jdk15/bin/java -cp . -XX:+PrintCompilation -server MatrixTestDirty
creating matrices a and b
1 MatrixTestDirty::main (330 bytes)
2 java.lang.Math::random (16 bytes)
3* sun.misc.Unsafe::compareAndSwapLong (0 bytes)
1% MatrixTestDirty::main @ 57 (330 bytes)
multiplying them
Var[123] = -535
Result: 46511
# /mnt/java/jdk15/bin/java -cp . -XX:+PrintCompilation -server MatrixTestDirty
creating matrices a and b
1 MatrixTestDirty::main (330 bytes)
2 java.lang.Math::random (16 bytes)
3* sun.misc.Unsafe::compareAndSwapLong (0 bytes)
1% MatrixTestDirty::main @ 57 (330 bytes)
multiplying them
Var[123] = 173
Result: 47814
# /mnt/java/jdk15/bin/java -cp . -XX:+PrintCompilation -client MatrixTestDirty
1 b java.lang.String::hashCode (60 bytes)
creating matrices a and b
2 b java.util.Random::next (47 bytes)
3 b java.lang.Math::random (16 bytes)
1% b MatrixTestDirty::main @ 57 (330 bytes)
multiplying them
Var[123] = -35
4 !b sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (698 bytes)
Result: 46789
# /mnt/java/jdk15/bin/java -cp . -XX:+PrintCompilation -client MatrixTestDirty
1 b java.lang.String::hashCode (60 bytes)
creating matrices a and b
2 b java.util.Random::next (47 bytes)
3 b java.lang.Math::random (16 bytes)
1% b MatrixTestDirty::main @ 57 (330 bytes)
multiplying them
Var[123] = -96
4 !b sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (698 bytes)
Result: 46550

In other words, it appears that the server and client vms took almost
the same time on this particular problem.

Code below.

Scott

public class MatrixTestDirty
{
    private static final int N = 800;
    private static final int M = 800;
    private static int a [] = new int [N*M];
    private static int b [] = new int [N*M];
    private static int c [] = new int [N*M];

    public static void main (String args [])
    {
        System.out.println ("creating matrices a and b");
        long startMillis=System.currentTimeMillis();

        /* Initialize the two matrices. */
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < M; ++j) {
                a[i*M+j] = (int) ((Math.random () - 0.5) * 10);
                b[i*M+j] = (int) ((Math.random () - 0.5) * 10);
            }
        }

        /* Multiply the matrices. */

        System.out.println ("multiplying them");

        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < M; ++j) {
                c[i*M+j] = 0;
            }
        }

        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < M; ++j) {
                for (int k = 0; k < M; ++k) {
                    c[i*M+j] += a[i*M+k] * b[k*M+j];
                }
            }
        }
        long endMillis=System.currentTimeMillis();
        long total=endMillis-startMillis;
        System.out.println("An element: "+c[25*M+25]);
        System.out.println("Result: "+total);
    }
}
 
Chris Uppal

Scott said:
I created code based on some of the changes, and found that 1.5 had very
different performance characteristics on one of our quad processor
2.8GHz linux machines.

I suspect that all you are seeing is that, because of the way you have written
the benchmark, the multiply routine is never properly optimised. I don't know
whether the server JVM can do on-stack replacement to update running code with
a faster version, but it seems clear that in this case it does not do so. So
what's happening is that the JVM is all fired up and eager to optimise the
inner loop, as soon as the loops have finished, but your program then exits, so
it doesn't bother...

I tried the same code on a 1.4GHz WinXP laptop, and -- like you -- saw no
important difference between -client and -server, but when I recoded it to pull
the matrix multiply out into a separate method, /and/ called that several
times, the optimisation kicked in quite nicely.

Here is the output from a -client run (with some trivia deleted for clarity):

==========================================
[java -cp . -XX:+PrintCompilation -client MatrixTestDirty2]
1 b java.lang.String::charAt (33 bytes)
2 b java.lang.Math::max (11 bytes)
creating matrices a and b
3 b java.util.Random::next (47 bytes)
4 b java.lang.Math::random (16 bytes)
1% b MatrixTestDirty2::main @ 19 (182 bytes)
multiplying them
2% b MatrixTestDirty2::multiply @ 11 (125 bytes)
Result: 10965
5 b MatrixTestDirty2::multiply (125 bytes)
Result: 10886
Result: 10956
Result: 10966
Result: 10885
Result: 10956
Result: 10956
Result: 10956
Result: 10875
Result: 10966
==========================================

which stabilises at around 11 seconds. Notice how multiply() gets compiled
twice, and that the second time only happens /after/ it has returned.

(BTW, I have no idea why this laptop is able to perform the benchmark so much
faster than your machine -- a factor of 4 seems very odd to me...).

Now, running -server:

==========================================
[java -cp . -XX:+PrintCompilation -server MatrixTestDirty2]
creating matrices a and b
1 MatrixTestDirty2::main (182 bytes)
2 java.lang.Math::random (16 bytes)
3* sun.misc.Unsafe::compareAndSwapLong (0 bytes)
1% MatrixTestDirty2::main @ 19 (182 bytes)
multiplying them
2% MatrixTestDirty2::multiply @ 11 (125 bytes)
Result: 8272
4 MatrixTestDirty2::multiply (125 bytes)
Result: 5538
Result: 5608
Result: 5528
Result: 5588
Result: 5528
Result: 5508
Result: 5608
Result: 5518
Result: 5508
==========================================

which stabilises at around half the execution time compared to -client. Notice
how the execution time plummets after the second compilation of multiply().
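
To put a number on "around half", here is a small sketch that takes the timings from the two listings above, leaves out the first run of each (it finishes before the second compilation of multiply() is reported), and compares the medians. The class and method names are hypothetical; the figures are copied from the listings.

```java
import java.util.Arrays;

/** Hypothetical helper: compares steady-state timings from the two runs. */
final class SteadyStateRatio {

    /** -client timings in ms, runs 2 to 10 from the listing. */
    static final long[] CLIENT_MS =
        {10886, 10956, 10966, 10885, 10956, 10956, 10956, 10875, 10966};

    /** -server timings in ms, runs 2 to 10 from the listing. */
    static final long[] SERVER_MS =
        {5538, 5608, 5528, 5588, 5528, 5508, 5608, 5518, 5508};

    /** Median of the values; the input array is left unmodified. */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Server median divided by client median; below 1.0 means -server is faster. */
    static double ratio(long[] server, long[] client) {
        return (double) median(server) / median(client);
    }

    public static void main(String[] args) {
        System.out.println("median client (ms): " + median(CLIENT_MS));
        System.out.println("median server (ms): " + median(SERVER_MS));
        System.out.println("server/client:      " + ratio(SERVER_MS, CLIENT_MS));
    }
}
```

With these figures the medians are 10956 ms and 5528 ms, a ratio of about 0.50.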

I'll append the code for completeness, though it's only a trivial modification
to your own code.

--- chris


===========================================
public class MatrixTestDirty2
{
    private static final int N = 800;
    private static final int M = 800;

    private static int a[] = new int[N * M];
    private static int b[] = new int[N * M];
    private static int c[] = new int[N * M];

    public static void main(String args[])
    {
        System.out.println("creating matrices a and b");

        /* Initialize the two matrices. */
        for (int i = 0; i < N; ++i)
        {
            for (int j = 0; j < M; ++j)
            {
                a[i * M + j] = (int)((Math.random() - 0.5) * 10);
                b[i * M + j] = (int)((Math.random() - 0.5) * 10);
            }
        }

        /* Multiply the matrices. */
        System.out.println("multiplying them");
        for (int i = 0; i < 10; i++)
        {
            long startMillis = System.currentTimeMillis();
            multiply();
            long endMillis = System.currentTimeMillis();
            long total = endMillis - startMillis;
            System.out.println("An element: " + c[25 * M + 25]);
            System.out.println("Result: " + total);
        }
    }

    private static void multiply()
    {
        for (int i = 0; i < N; ++i)
        {
            for (int j = 0; j < M; ++j)
            {
                c[i * M + j] = 0;
            }
        }

        for (int i = 0; i < N; ++i)
        {
            for (int j = 0; j < M; ++j)
            {
                for (int k = 0; k < M; ++k)
                {
                    c[i * M + j] += a[i * M + k] * b[k * M + j];
                }
            }
        }
    }
}
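
The same idea (keep the hot loop in its own method, call it repeatedly, and ignore the early runs) can be packaged as a small reusable harness. This is only a sketch under assumptions, not the code used for the measurements above: the class name WarmupBench, the warm-up and run counts, the fixed random seed, and the smaller 200x200 matrix are all chosen for illustration. It uses System.nanoTime(), which was added in 1.5.

```java
import java.util.Random;

/** Hypothetical harness: untimed warm-up calls first, then timed calls. */
final class WarmupBench {

    /** Runs task 'warmups' times untimed, then 'runs' times timed; returns ms per timed run. */
    static long[] measure(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // gives the JIT a chance to compile the method as a whole
        }
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1000000L;
        }
        return millis;
    }

    /** c = a * b for n-by-n matrices stored row-major in flat arrays. */
    static void multiply(int[] a, int[] b, int[] c, int n) {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                int sum = 0;
                for (int k = 0; k < n; ++k) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
    }

    public static void main(String[] args) {
        final int n = 200; // assumed size, smaller than the 800 used above
        final int[] a = new int[n * n];
        final int[] b = new int[n * n];
        final int[] c = new int[n * n];
        Random random = new Random(42); // fixed seed so runs are comparable
        for (int i = 0; i < n * n; ++i) {
            a[i] = random.nextInt(10) - 5;
            b[i] = random.nextInt(10) - 5;
        }
        Runnable task = new Runnable() {
            public void run() {
                multiply(a, b, c, n);
            }
        };
        long[] millis = measure(task, 3, 5);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("Run " + (i + 1) + ": " + millis[i] + " ms");
        }
        System.out.println("An element: " + c[25 * n + 25]); // keeps the result live
    }
}
```

Combining it with -XX:+PrintCompilation should show whether any compilation still happens during the timed runs; if it does, the warm-up count is too low for that VM.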
 
