Jon said:
In absolute terms, the fastest program is still the OCaml running on my
machine (0.7s) even though my machine is significantly slower than yours.
Also, we've seen benchmark results from several different machines and the
one where Java is slightly faster than OCaml is an outlier. In most cases,
Java is ~2-3x slower, in at least one case it was 5x slower.
Also, you haven't normalized by the load average, which is 1.0 for OCaml but
~1.4 for Java here.
I'm not sure what you mean here. When restricted to one core, both
programs used about 100% and I don't think any other processes (or at
least very few) were scheduled on that core.
Incidentally, I suspect memory bandwidth is the problem on most machines. I
think your machine where Java does comparatively well has extremely high
memory bandwidth. My reason for speculating this is that the Java program
is using 100x as much memory as the OCaml.
Memory bandwidth might be one reason, but memory usage was not a
constraint or a measurement point earlier in the discussion. And 100
times the memory? I'm running a longer test with a heap size of 1.5MB;
the JVM as a runtime system takes a bit of memory. The resident size of
the Java process is 11MB while that of the OCaml process is 1MB. I'm
measuring the memory use of the Java process, and during the first 10
minutes no data has survived to the old heap. A few objects temporarily
survived a short while in eden, while almost everything seems to die
within the nurseries (this is of course expected behavior).
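For anyone who wants to repeat the per-generation measurement from inside
the JVM, here is a minimal sketch using the standard java.lang.management
API. The class name HeapPools is made up for illustration, and pool names
(eden, survivor, old gen) vary between JVMs and collectors. Note that this
reports heap pool usage only, not the resident size of the whole process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of the benchmark code.
final class HeapPools {
    // Total bytes currently used across all heap memory pools.
    static long usedHeapBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip code cache, metaspace and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                used += usage.getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        // Print one line per heap pool so young and old generations can be compared.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                System.out.printf("%-30s %12d bytes used%n", pool.getName(), usage.getUsed());
            }
        }
        System.out.println("total heap used: " + usedHeapBytes() + " bytes");
    }
}
```

Sampling this periodically during a run would show whether anything
accumulates in the old generation pool.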
We should try it rather than speculate.
I will try to make a cache for all "Val" objects. Since we have so few
different values, a simple switch should solve that, and then we might
be able to measure how much time a "Val" takes to create/destroy. The
problem is that we then don't stress the nurseries with "Val" objects,
so they might behave differently.
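A rough sketch of what such a switch-based cache could look like. The Val
class below is a hypothetical stand-in (the real class in the benchmark may
hold something other than an int), and the cached values 0, 1 and 2 are
placeholders. Sharing instances like this is only safe if Val is immutable.

```java
// Hypothetical stand-in for the benchmark's "Val" class; the real one may differ.
final class Val {
    final int value;

    private Val(int value) {
        this.value = value;
    }

    // Preallocated instances for the few values assumed to occur (placeholders).
    private static final Val ZERO = new Val(0);
    private static final Val ONE = new Val(1);
    private static final Val TWO = new Val(2);

    // Returns a shared instance for the known values and allocates only otherwise.
    static Val of(int value) {
        switch (value) {
            case 0: return ZERO;
            case 1: return ONE;
            case 2: return TWO;
            default: return new Val(value);
        }
    }

    public static void main(String[] args) {
        System.out.println(Val.of(1) == Val.of(1)); // same cached instance
        System.out.println(Val.of(7) == Val.of(7)); // two fresh allocations
    }
}
```

Timing the benchmark with and without the cache would give an estimate of
the allocation cost, with the caveat already mentioned that the nursery
sees a different load once the objects are no longer allocated.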
Also, the improvement in performance noted by Steve that was attributed to
Hotspot seems to be nothing more than GC interactions that periodically
slow the Java code down. I did 1,000 runs of the benchmark and, when the GC
kicks in, the Java code slows down by almost 2x and the load average
increases to 2.0, i.e. Java is maxing out both of my CPUs and is still
several times slower here.
For me the GC does not "kick in" at all; I have had no full GCs (so far).
I have 800 minor GCs a second, but you would have to fine-tune the
measuring to be able to see the slowdown while they run.
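One way to get collection counts and accumulated GC time without external
tools is the standard GarbageCollectorMXBean interface. The sketch below is
only an illustration: the class name GcCounters is made up, and the loop is
a placeholder workload, not the ray tracer. The counters do not distinguish
minor from full collections unless you look at each collector bean
separately.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the benchmark code.
final class GcCounters {
    // Sum of collection counts over all collectors; a collector reports -1 if undefined.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds over all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        // Placeholder workload: many small, short-lived arrays.
        // The JIT may remove some of these allocations entirely.
        long checksum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            int[] tmp = new int[16];
            tmp[i & 15] = i;
            checksum += tmp[i & 15];
        }

        System.out.println("checksum:    " + checksum);
        System.out.println("collections: " + (totalCollections() - countBefore));
        System.out.println("GC time ms:  " + (totalCollectionMillis() - millisBefore));
    }
}
```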
Exactly, yes. There is no point in avoiding the allocation in OCaml code
because it is extremely fast. In contrast, you can often get huge
performance improvements in Java by manually rewriting code to amortize
allocations.
My experience is the opposite: for small, short-lived objects it has
been better to create-use-forget than to use smart code that reuses
objects. However, once the objects grow larger than a few KB, they are
promoted directly to the old heap, and for those objects it might be
better to handle them differently. But this could vary between
applications. I normally work with web service applications with a lot
of XML processing and generation.
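To make the two styles concrete, here is a small sketch. Vec is a
hypothetical immutable 3-vector, not the class used in the ray tracer. The
first method allocates a temporary on every step (create-use-forget), the
second avoids temporaries by accumulating into primitives. Both compute the
same result; which one is faster is exactly what would have to be measured.

```java
// Hypothetical illustration; not taken from the ray tracer sources.
final class AllocationStyles {
    // Small immutable value object, the kind that normally dies in the nursery.
    static final class Vec {
        final double x, y, z;

        Vec(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }

        Vec add(Vec o) {
            return new Vec(x + o.x, y + o.y, z + o.z);
        }
    }

    // Create-use-forget: two short-lived Vec objects per iteration.
    static double sumAllocating(int n) {
        Vec acc = new Vec(0, 0, 0);
        for (int i = 0; i < n; i++) {
            acc = acc.add(new Vec(i, 2.0 * i, 3.0 * i));
        }
        return acc.x + acc.y + acc.z;
    }

    // Manual reuse: the same arithmetic without any temporaries.
    static double sumReusing(int n) {
        double x = 0, y = 0, z = 0;
        for (int i = 0; i < n; i++) {
            x += i;
            y += 2.0 * i;
            z += 3.0 * i;
        }
        return x + y + z;
    }

    public static void main(String[] args) {
        System.out.println(sumAllocating(1000));
        System.out.println(sumReusing(1000));
    }
}
```

A single timed call of each method says little, since the JIT needs
warm-up and may optimize the two loops very differently.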
Yes. Note how the fastest Java implementation achieves a huge performance
boost by manually working around the allocation of temporaries, which is
very slow in Java.
I'd like to know if the fastest and second fastest implementations of the
Java ray tracer also show a smaller performance gap on your machine (the
one where Java is slightly faster than OCaml).
This is my output:
Java 1
real 0m12.624s
user 0m12.264s
sys 0m0.812s
Ocaml 1
real 1m13.171s
user 1m13.112s
sys 0m0.045s
Java 2
real 0m8.911s
user 0m9.186s
sys 0m0.177s
Ocaml 2
real 0m51.302s
user 0m51.261s
sys 0m0.032s
Java 3
real 0m8.012s
user 0m8.304s
sys 0m0.163s
Ocaml 3
real 0m8.200s
user 0m8.181s
sys 0m0.020s
Java 4
real 0m7.962s
user 0m8.255s
sys 0m0.150s
Ocaml 4
real 0m7.402s
user 0m7.378s
sys 0m0.019s
Java 5
real 0m5.557s
user 0m5.837s
sys 0m0.113s
Ocaml 5
real 0m7.350s
user 0m7.342s
sys 0m0.008s
As far as I can see, the slowest Java code takes about 2.3 times as long
to complete as the fastest code (12.6s versus 5.6s real time).
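The ratios can be recomputed from the "real" times in the output above
with a few lines of code. The class name TimingRatios is made up; the
numbers are copied from the listing, nothing else is assumed.

```java
// Recomputes ratios from the "real" times listed above (seconds, versions 1 to 5).
final class TimingRatios {
    static final double[] JAVA = {12.624, 8.911, 8.012, 7.962, 5.557};
    static final double[] OCAML = {73.171, 51.302, 8.200, 7.402, 7.350};

    // Ratio of the slowest to the fastest time in one series.
    static double slowestOverFastest(double[] times) {
        double min = times[0];
        double max = times[0];
        for (double t : times) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return max / min;
    }

    public static void main(String[] args) {
        System.out.printf("Java  slowest/fastest: %.2f%n", slowestOverFastest(JAVA));
        System.out.printf("OCaml slowest/fastest: %.2f%n", slowestOverFastest(OCAML));
        // Per-version comparison: values above 1 mean Java was slower on this machine.
        for (int i = 0; i < JAVA.length; i++) {
            System.out.printf("version %d Java/OCaml: %.2f%n", i + 1, JAVA[i] / OCAML[i]);
        }
    }
}
```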
Oh, and why don't the OCaml programs save the image to a file instead of
writing to standard out?
//Roger Lindsjö