Optimise my ray tracer

R

Remon van Vliet

It's on that web page.

Just downloaded your app, changed a few lines of code and it's now exactly
twice as fast (-server went from 34.718s to 16.183s), at 512x512, 9 levels
deep, ss 4. I changed it so that the add/sub/scale methods have a version
that reuse a single temporary Vec instance (for things like add(a, sub(b,
c)), the result Vec of the sub has a short lifespan and can be reused). I'm
aware this isnt "neatly OOP" and "shouldnt be necessary" by the way, but i
never claimed Java is perfect for the job, just good enough :). It does
prove object creation is indeed one of the big bottlenecks. The other
languages you mention dont do it faster, just less (it's an obvious
optimization for a native compiler i would think). Btw, i ran it within my
IDE so it's probably a bit faster from the command line.
 
J

Jon Harrop

Remon said:
Just downloaded your app, changed a few lines of code and it's now exactly
twice as fast (-server went from 34.718s to 16.183s), at 512x512, 9 levels
deep, ss 4. I changed it so that the add/sub/scale methods have a version
that reuse a single temporary Vec instance (for things like add(a, sub(b,
c)), the result Vec of the sub has a short lifespan and can be reused).
I'm aware this isnt "neatly OOP" and "shouldnt be necessary" by the way,
but i never claimed Java is perfect for the job, just good enough :). It
does prove object creation is indeed one of the big bottlenecks. The other
languages you mention dont do it faster, just less (it's an obvious
optimization for a native compiler i would think). Btw, i ran it within my
IDE so it's probably a bit faster from the command line.

Wow, fantastic thanks. :)

That is a neat in-between of the current approach and C-style inlining of
all vector operations. It probably isn't much more verbose.

Can I have the code?
 
R

Remon van Vliet

Jon Harrop said:
Wow, fantastic thanks. :)

That is a neat in-between of the current approach and C-style inlining of
all vector operations. It probably isn't much more verbose.
It is a little, few extra methods..
Can I have the code?
But ofcourse sir...i did it at work during my break so i dont have it here,
but the changes were so trivial i can redo em here as well. I'll mail it..
 
J

Jon Harrop

Jon said:
I've written a mini ray tracer for the computer language shootout. The
original version was written in OCaml which I ported to C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

I have since ported the program to several other languages, including
Java. Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the
fastest, and Java trails a long way behind. I am concerned that this is
because I am a much better OCaml/C++ than Java programmer so I'm asking
for advice here.

For anyone who is interested, I'll give a summary of this thread.

My Java implementation of the ray tracer is slow because I represented 3D
vectors as objects and wrote arithmetic functions over these vectors in a
functional style, which leads to intensive allocation and garbage
collection.

This can be optimised away but only by sacrificing two positive aspects of
Java: garbage collection and functional programming style. The garbage
collector is circumvented by maintaining you own "pool" of objects which
are to be reused. In the case of the ray tracer, it is possible to use a
single temporary 3D vector for this purpose. Functional programming style
(e.g. writing vector addition as a function which takes two vectors and
returns a new vector) is sacrificed by resorting to imperative style. This
involves repeatedly altering a temporary variable in order to build up the
desired result.

Two other points are worth noting. Firstly, the equivalent C++ is very
efficient as the overhead of creating an object is relatively low in C++
compared to Java. Secondly, the natural representation of a 3D vector in
OCaml is a record type. However, the OCaml can be written in a similar
style to the Java, constantly creating and destroying objects. This makes
the OCaml program ~8x slower.

Given that much better performance can be obtained from other garbage
collected languages, it is a shame that comparable performance can only be
obtained in Java by circumventing the garbage collector.
 
R

Remon van Vliet

Jon Harrop said:
My ray tracers are on this page:

http://www.ffconsultancy.com/free/ray_tracer/languages.html


My straightforward C implementation is almost as slow as Java on this
test.
Well that's quite odd as well then...hm..
Ok, I was completely wrong there. I just reimplemented the OCaml ray
tracers
using objects instead of records for 3D vectors and the time taken went
from 3.5s up to 26.4s!

No idea why...
Object maintenance is just way more costly than records/tuples. The latter
is little more than a glorified 2d array.
I'm under the impression that there are no decent alternatives in Java: no
tuples, no records, no variant types and certainly nothing more
sophisticated.
This is true, it's primary types or objects, there's no real middle ground.
But in almost all cases you can design things in such a way that the object
overhead is minimal. I showed you my source that was a few times faster than
the original, but you can even go further and, for example, implement 3d
vector stacks. In case you're unfamiliar with those, it's a very uncommon
way to reduce object overhead, you'd do something like :

stack.push(0, 1, 1); // push a vector on the stack
stack.push(3, 2, 2); // push another one
stack.normalize(); // normale last vector we pushed
double d = stack.dot(); // get dot product

as you can see, no object creation is necessary at all both in this snippet
and the actual stack code. Alternatively, you'd have to do :

Vec v1 = new Vec(0, 1, 1); // new object
Vec v2 = new Vec(3, 2, 2); // new object
Vec v3 = v2.normalize(); // new object
double d = Vec.dot(v1, v3);// finally!

Another quite elegant property of the stack is that it doesnt care how many
vector a method needs as parameters, and that it wouldnt even care if you
use a single scalar or a vector :

stack.push(1, 1, 1);
stack.push(5);
stack.scale();

==

stack.push(1, 1, 1);
stack.push(5, 5, 5);
stack.scale();

This is a more efficient way to handle vectors in languages where object
managment slows things down. Obviously it does come at the cost of being a
fair bit less natural to use, especially if the language has operator
overloading, you'd go from :

a = v1 + v2

to

push(v1); push(v2);
a = add();

Anyway, food for thought perhaps?
Well, Java beats OCaml when the OCaml is made to use objects. This is
probably another reason why OCaml programmers rarely use OO.
Fair enough, but OO has a few big advantages, so it's not something most
people are willing to not use for some extra speed.
I really think they should. Although I wouldn't have written a book on it
if
I didn't think so. :)
AHA!, you wrote a book, i was wondering why you were so happy with OCaml :p
Well it is a nice language, i'll just be happy you didnt write a book called
"GW-BASIC for Scientists"..
I'm willing to accept that this benchmark hits weak points of Java. It
hits
weak points of SML/NJ as well.
Finally ;)...and like i showed you, there are some workarounds to most
bottlenecks, all of which at the cost of some code clarity.
I just tried it again and I get 8.3s with -client and 7.8s with -server.
That's only 6% faster. Maybe if I did a longer run...

Just did it again and got 34.7s -> 29.8s. That's 14% faster, which is more
like it.
Perhaps it's also related to the OS, i run on good ol' hardly working
Windows...i'm lazy like that.
All of the implementations use a linked list.
Okay, JCF is still pretty slow in my experience, but it shouldnt matter too
much.
I'm sorry, they have been public the whole time. I gave the link in the
original post that started this thread but I've updated it with the page
at
the top of this post.
I know, my fault, i lost the first few posts where the links are. As you
know i found them yesterday and sent you the modded code.
I have a hunch that Java will be as good as C at handling float arrays
because it doesn't require GC and object creation. Unfortunately I don't
have many applications that require this, even as a computational
scientist. :)
Well, if you use vector stacks you would, or if you move to the realm of
global illumination you would (think particle arrays for photon mapping).
And it's also about simple loops, math etc.

What i was considering is make a program that raytraces a sphereflake on the
fly in such a way that the flake does not have to be pre-generated, but
instead it's sphere hierarchy is dynamically traversed during the
raytracing. I can imagine that especially in computational science not all
data can be generated in memory before calculations (such as raytracing) can
be done, some data sets are simply too large and can be generated on the fly
when needed. The only issue may be that my math may not be up for the task
;)

Remon van Vliet
 
R

Roedy Green

Given that much better performance can be obtained from other garbage
collected languages, it is a shame that comparable performance can only be
obtained in Java by circumventing the garbage collector.

see also http://mindprod.com/jgloss/nativecompiler.html

--
Bush crime family lost/embezzled $3 trillion from Pentagon.
Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video.
http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm

Canadian Mind Products, Roedy Green.
See http://mindprod.com/iraq.html photos of Bush's war crimes
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,053
Latest member
BrodieSola

Latest Threads

Top