# Optimise my ray tracer

J

#### Jon Harrop

I've written a mini ray tracer for the computer language shootout. The
original version was written in OCaml which I ported to C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

I have since ported the program to several other languages, including Java.
Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest,
and Java trails a long way behind. I am concerned that this is because I am
a much better OCaml/C++ than Java programmer so I'm asking for advice here.

These are my main questions:

1. What major optimisations are missing from my program (e.g. in C++, I pass
vectors by reference and try to inline vector operations).

2. Where is the floating-point machine epsilon in the Java libraries?

3. How do you do infix operators in Java?

4. Am I supposed to have a static main function which instantiates a class
and invokes a member function of it in order to start the program?

Here's my Java port:

// The Great Computer Language Shootout
// http://shootout.alioth.debian.org/
// Contributed by Jon Harrop, 2005
// Compile: javac ray.java

import java.io.*;
import java.util.*;
import java.text.*;

public final class ray {
// FIXME: Where is the floating point machine epsilon available in the
// Java standard library?
double delta = 1.49012e-08, infinity = Float.POSITIVE_INFINITY,
pi = Math.PI;

// A 3D vector
class Vec {
public double x, y, z;
public Vec(double x2, double y2, double z2) { x = x2; y = y2; z = z2; }
}

Vec zero = new Vec(0, 0, 0);

// Vector arithmetic (FIXME: Use infix operators if possible)
{ return new Vec(a.x + b.x, a.y + b.y, a.z + b.z); }
Vec sub(Vec a, Vec b)
{ return new Vec(a.x - b.x, a.y - b.y, a.z - b.z); }
Vec scale(double s, Vec a) { return new Vec(s*a.x, s*a.y, s*a.z); }
double dot(Vec a, Vec b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec unitise(Vec a) { return scale(1 / Math.sqrt(dot(a, a)), a); }

// A semi-infinite ray
class Ray {
public Vec orig, dir;
public Ray(Vec o, Vec d) { orig = o; dir = d; }
}

// A parametric intersection point and the normal vector at the point of
// intersection
class Intersect {
public double lambda;
public Vec normal;
public Intersect(double l, Vec n) { lambda = l; normal = n; }
}

// Abstract base class representing a node in the scene tree
abstract class Scene {
abstract public Intersect intersect(Intersect i, Ray ray);
}

// Derived class representing a leaf node in the scene tree
class Sphere extends Scene {
public Vec center;

public Sphere(Vec c, double r) { center = c; radius = r; }

// Find the first intersection of the given ray with this sphere
public double ray_sphere(Ray ray) {
Vec v = sub(center, ray.orig);
double b = dot(v, ray.dir),
if (disc < 0) return infinity;
double d = Math.sqrt(disc), t2 = b + d;
if (t2 < 0) return infinity;
double t1 = b - d;
return (t1 > 0 ? t1 : t2);
}

// Accumulate the first intersection of the given ray with this sphere
public Intersect intersect(Intersect i, Ray ray) {
double l = ray_sphere(ray);
if (l >= i.lambda) return i;
Vec n = add(ray.orig, sub(scale(l, ray.dir), center));
return new Intersect(l, unitise(n));
}
}

// Derived class representing a non-leaf node in the scene tree
class Group extends Scene {
public Sphere bound;

public Group(Sphere b) {
bound = b;
// We must initialise to an empty linked list or the null object
// exception will be thrown at first use.
}

// Accumulate the first intersection of the given ray with this group
// This function is used both for primary and shadow rays.
public Intersect intersect(Intersect i, Ray ray) {
double l = bound.ray_sphere(ray);
if (l >= i.lambda) return i;
// Loop over the list of child nodes, accumulating the result.
ListIterator it = objs.listIterator(0);
while (it.hasNext()) {
Scene scene = (Scene)it.next();
i = scene.intersect(i, ray);
}
return i;
}
}

// Trace a single ray into the scene
double ray_trace(Vec light, Ray ray, Scene scene) {
Intersect i = scene.intersect(new Intersect(infinity,
new Vec(0, 0, 0)), ray);
if (i.lambda == infinity) return 0;
scale(delta, i.normal)));
double g = -dot(i.normal, light);
// If we are on the shadowed side of a sphere then don't bother casting a
// shadow ray as we know it will intersect the same sphere.
if (g <= 0) return 0.;
Ray sray = new Ray(o, sub(new Vec(0, 0, 0), light));
Intersect si =
scene.intersect(new Intersect(infinity, i.normal), sray);
return (si.lambda == infinity ? g : 0);
}

// Recursively build the scene tree
Scene create(int level, double r, Vec c) {
Sphere sphere = new Sphere(c, r);
if (level == 1) return sphere;
Group group = new Group(new Sphere(c, 3*r));
double rn = 3*r/Math.sqrt(12);
for (int dz=-1; dz<=1; dz+=2)
for (int dx=-1; dx<=1; dx+=2) {
Vec c2 = new Vec(c.x-dx*rn, c.y+rn, c.z-dz*rn);
}
return group;
}

// Build a scene and trace many rays into it, outputting a PGM image
void run(int n, int level, int ss) {
// Scene tree
Scene scene = create(level, 1, new Vec(0, -1, 0));

System.out.print("P5\n"+n+" "+n+"\n255\n");
for (int y=n-1; y>=0; --y)
for (int x=0; x<n; ++x) {
double g=0;
for (int dx=0; dx<ss; ++dx)
for (int dy=0; dy<ss; ++dy) {
// We use "dx*1." instead of "double(dx)" to save space
Vec d = new Vec(x+dx*1./ss-n/2., y+dy*1./ss-n/2., n);
Ray ray = new Ray(new Vec(0, 0, -4), unitise(d));
g += ray_trace(unitise(new Vec(-1, -3, 2)),
ray, scene);
}
System.out.print((char)(.5+255*g/(ss*ss)));
}
}

public static void main(String[] args) {
(new ray()).run(Integer.parseInt(args), 6, 4);
}
}

S

#### Skip

Jon Harrop said:
I've written a mini ray tracer for the computer language shootout. The
original version was written in OCaml which I ported to C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

I have since ported the program to several other languages, including Java.
Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest,
and Java trails a long way behind. I am concerned that this is because I am
a much better OCaml/C++ than Java programmer so I'm asking for advice here.

The first run of this code is done in interprete-mode. If you run it a few
times, things get faster. Here is my benchmark (with n=32)

iteration #0: 474.3ms
iteration #1: 198.0ms
iteration #2: 151.1ms
iteration #3: 154.9ms
iteration #4: 153.1ms
iteration #5: 155.9ms
iteration #6: 149.5ms
iteration #7: 149.3ms

P4 2.4GHz (running at 1.8GHz)

What kinds of differences do you see for n=32 in calc-duration, comparing
Java and C++?

L

#### Lasse Reichstein Nielsen

Jon Harrop said:
1. What major optimisations are missing from my program (e.g. in C++, I pass
vectors by reference and try to inline vector operations).

I'll leave that for now (meaning: I haven't read the program yet Generally, there are several tricks that allows the compiler and
runtime to optimize, like defining variables and field as "final" if
they don't change.

links. I know that Peter Sestoft has a file on Java performance on
2. Where is the floating-point machine epsilon in the Java libraries?

I'm not sure what epsilon is to you, but you might want
Double.MIN_VALUE, the smalles positive value representable as a
double.
3. How do you do infix operators in Java?

You don't. The designers of Java decided early on that they wanted the
*syntax* to always mean the same thing, making it easier to read other
people's programs (in particular, they didn't want #define, and I guess
overriding infix operators were ruled out for the same reason).
4. Am I supposed to have a static main function which instantiates a class
and invokes a member function of it in order to start the program?

The static "main" function *does* start the program. You want to call
the "run" function from it, and since "run" is not static, you need
to create an object before calling it. However, since "run" doesn't
use that object, the "run" function could have been made static as
well. Likewise the other methods on the class "ray" (classes are

Anyway, I might have time to look at it some more later Here's my Java port:
[snip]
A link to an online file would take less room here /L

C

#### Chris Uppal

Jon said:
I've written a mini ray tracer for the computer language shootout. The
original version was written in OCaml which I ported to C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

You've managed to squeeze down the C++ to the point where it's almost as short
as the OCaml source ! I wouldn't have thought that was possible. I admit that
I don't like the resulting code, but I do feel that you deserve some sort of
award nevertheless ;-)

1. What major optimisations are missing from my program (e.g. in C++, I
pass vectors by reference and try to inline vector operations).

I think you have a bug -- the value that you parse out of the arguments to the
program is then used as the parameter 'n' to run() (which I think is used to
set the size of the generated image); whereas your C++ and OCaml codes'
equivalents are used as the 'level' parameter.

With that corrected, I'm finding that your Java code runs faster than your C++
code -- by a factor of nearly two.

I'm comparing the Java running on the server JVM from Sun's JDK 1.5 (on WInXP)
compared against with the C++ code compiled using all the optimisations I can
think of with either M\$ VC 6 or MS VS 2003 (which generate significantly
different code sometimes). In my limited experience, the GNU C++ compiler does
not usually produce code that is significantly faster or slower than that pair.

The server JVM (java -server <classname>) uses significantly heavier
optimisation than the client JVM (java -client <classname>), although the
client JVM is the default on most desktop systems.

Some numbers, measured on this 1.5 GHz laptop (with some sort of Intel chip
inside...), for values of your 'level' parameter over the range 1 to 10 (i.e.
from 1 to about 440K spheres)

C++ (M\$ VS 2003)
1: 3.555
2: 9.894
3: 20.32
4: 31.124
5: 41.65
6: 50.623
7: 58.575
8: 66.215
9: 73.475
10: 80.797

The times are in seconds, and were generated from a single run of the program
looping internally over the levels, hence the times do not include program
startup.

For Java (-server)
1: 3.60
2: 6.60
3: 12.88
4: 18.38
5: 22.59
6: 26.04
7: 28.93
8: 31.80
9: 44.75
10: 46.21

Again, this was generated from a single run, looping internally over the values
of 'level'. That might be significant since it means that the Hotspot
optimiser will have had more chance to work on the problem than if the java
program had been invoked separately 10 times.

Your Java code is non-idomatic in several ways, and does not really resemble
how a "normal" OO Java programmer would structure this problem. Since you
mention that you don't consider yourself to be a Java programmer, I won't go
into details. I did, though, want to mention that I tried re-jigging the code
into more a typical OO pattern, and to make more idiomatic use of Java's
features (such as static final fields), the point being that it didn't make a
significant difference to the performance.

2. Where is the floating-point machine epsilon in the Java libraries?

Are you looking for java.lang.Double.MIN_VALUE ? Or for something else ?

3. How do you do infix operators in Java?

You don't.

4. Am I supposed to have a static main function which instantiates a class
and invokes a member function of it in order to start the program?

You /have/ to start with a static main function (if you are using the standard
java loading program -- which you typically will be). If you want to write
reasonably OO programs, then you'll want to get some objects created and
talking to each other as soon as possible, rather than writting a load of
static "functions". So, yes you are.

-- chris

J

#### John M. Gamble

I'm not sure what epsilon is to you, but you might want
Double.MIN_VALUE, the smalles positive value representable as a
double.

Yes, that's commonly known as epsilon. In many other computer
languages, it's calculated in the code and used as a constant.

P

#### Patricia Shanahan

Lasse said:
I'm not sure what epsilon is to you, but you might want
Double.MIN_VALUE, the smalles positive value
representable as a double.

That is one definition of machine epsilon.

The other one I've seen is the gap between 1.0 and the
smallest number greater than 1.0. For Java doubles, it is
Math.ulp(1.0). In general, the gap between a Java double d
and the smallest double greater than d is Math.ulp(d).

Patricia

J

#### Jesper Nordenberg

Jon Harrop said:
I've written a mini ray tracer for the computer language shootout. The
original version was written in OCaml which I ported to C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

I have since ported the program to several other languages, including Java.
Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest,
and Java trails a long way behind. I am concerned that this is because I am
a much better OCaml/C++ than Java programmer so I'm asking for advice here.

These are my main questions:

1. What major optimisations are missing from my program (e.g. in C++, I pass
vectors by reference and try to inline vector operations).

Without studying your code thoroughly I would suggest you reuse Vector
new ones. Even with the latest Hotspot JVM, creating new objects are
still way slower than reusing them. For example:

{ return new Vec(a.x + b.x, a.y + b.y, a.z + b.z); }

becomes:

Vec add(Vec a, Vec b) {
a.x += b.x;
a.y += b.y;
a.z += b.z;
return a;
}

Only create new objects when you need to. This optimization will
require some changes in the code structure and makes it slightly

Another optimization is to replace the LinkedList with an ArrayList
and use .size() and .get() when iterating the list. Iterators are
slow.

Also remember to use the -server option when running the program, it
can improve performance significantly.

One other thing to try is to use float instead of double (if the
accuracy isn't needed). The latest Hotspot uses SSE, and this change
may have some effect on SSE optimizations.

Remember that the Hotspot JVM requires a quite long warmup before
compiling code, so if the program completes in a short time, you will
never reach the performance of C++.

With these optimizations I think the performance will be similar to
the C++ version.
3. How do you do infix operators in Java?

4. Am I supposed to have a static main function which instantiates a class
and invokes a member function of it in order to start the program?

Yes, that works.

/Jesper Nordenberg

S

#### Scott Ellsworth

Jon Harrop said:
I have since ported the program to several other languages, including Java.
Currently, the Mlton-compiled SML, OCaml, C++ and Fortran are the fastest,
and Java trails a long way behind. I am concerned that this is because I am
a much better OCaml/C++ than Java programmer so I'm asking for advice here.

I strongly, strongly suggest finding a profiler. Such a beast can tell
If you are on Windows, the NetBeans folks have the JFluid profiler built
in, on some builds. On the Mac, the Shark profiler is included with the
developer tools. I have had great success with JProfiler, and the
JetBrains folks seem to really like YourKit. (The first two are free,
the second are for pay.)

As always, try to take a high level look at the profiler's results. For
example, if it tells you that one arithmetic operation is the big cost,
try to decide whether you can not do the computation, or change the
algorithm to do that task fewer times before trying to get a ++ or -- to
execute faster.

In Java, the above vague advice usually translates to looking at when
and how you are creating objects, and making sure you do not have
excessive synchronization overhead. Also, a few Sun classes are quite
lame, and worth rewriting. (If, and only if, they are taking
appreciable time...)

Then, once you know your algorithm is appropriate, tune the lines of
code that matter.

Scott

L

#### Lasse Reichstein Nielsen

Jon Harrop said:
1. What major optimisations are missing from my program (e.g. in C++, I pass
vectors by reference and try to inline vector operations).

You create a *lot* of vector objects. That means that you should make
creation as fast as possible. That includes having as few fields as
possible on the object.

All your nested classes are non-static. That means that they all have
a reference to the enclosing *object*. That reference is a field, and
it's not used, so it wastes space and therefore time.

Make all your nested classes static. I'm certain that will give
you a boost.

Fiddling with the code, I see that you only ever use scaling in
an extra scaling argument, avoiding one intermediate vector value.

Here's my Java port:

Here's a modification of it, that is (IMHO) a little prettier and
more Java-like. For a real Java program, I wouldn't have all those
classes as nested classes at all, they would be in separate files.

<URL:http://www.infimum.dk/privat/RayTracing.java>

everything final, that could be made final. The latter probably
doesn't affect performance much in the long run, but it should allow
the JVM to do some optimizations sooner.

I tried not to change the actual algorithm at all.

Hope it works /L

B

#### bugbear

Jesper said:
Another optimization is to replace the LinkedList with an ArrayList
and use .size() and .get() when iterating the list. Iterators are
slow.

You've worried me; my company has a java App
that uses Collections HEAVILY, and
(for convenience) we use Iterators exclusively.

Would you care to quantify the speed difference?

BugBear

J

#### Jon Harrop

Skip said:
iteration #0: 474.3ms
iteration #1: 198.0ms
iteration #2: 151.1ms
iteration #3: 154.9ms
iteration #4: 153.1ms
iteration #5: 155.9ms
iteration #6: 149.5ms
iteration #7: 149.3ms

P4 2.4GHz (running at 1.8GHz)

What kinds of differences do you see for n=32 in calc-duration, comparing
Java and C++?

Java:

real 0m1.053s
real 0m0.992s
real 0m1.012s

C++:

real 0m0.151s
real 0m0.114s
real 0m0.106s

J

#### Jon Harrop

Patricia said:
The other one I've seen is the gap between 1.0 and the
smallest number greater than 1.0. For Java doubles, it is
Math.ulp(1.0). In general, the gap between a Java double d
and the smallest double greater than d is Math.ulp(d).

This is the epsilon that I'm after. However, neither gcj (3.4) nor Sun J2SE
(1.4) seem to implement it?!

Google turns up only 5 pages, one is a discussion of floating point under
Java (which you seem to have posted to!), another implies that ulp is not
implemented in libgcj and the rest are useless.

J

#### Jon Harrop

Chris said:
You've managed to squeeze down the C++ to the point where it's almost as
short
as the OCaml source ! I wouldn't have thought that was possible. I admit
that I don't like the resulting code, but I do feel that you deserve some
sort of award nevertheless ;-)

Thanks. To be honest, it isn't that much different to my usual C++ coding style. I
used to space things out a lot more when using templates, but for vector
arithmetic I find one-liners better. The OCaml code is certainly not that
much different from usual style.
I think you have a bug -- the value that you parse out of the arguments to
the program is then used as the parameter 'n' to run() (which I think is
used to set the size of the generated image); whereas your C++ and OCaml
codes' equivalents are used as the 'level' parameter.

I should explain that there are a few variants of these programs. The first
one is a 222-line OCaml program which is substantially fancier (incremental
OpenGL rendering):

http://www.ffconsultancy.com/free/ray_tracer/index.html

The next is my first attempt at cutting that down for the shootout (<100LOC)
and porting it to C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

Finally, the implementations now on the shootout accept the resolution as a
command-line argument and use 6 levels of spheres:

http://shootout.alioth.debian.org/sandbox/benchmark.php?test=raytracer&lang=all&sort=fullcpu

I have implementations in many other languages but the shootout is being
With that corrected, I'm finding that your Java code runs faster than your
C++ code -- by a factor of nearly two.

I think you must be using the totally unoptimised C++ as Java is 6x slower
on my computer. The shootout has slightly better code:

http://shootout.alioth.debian.org/sandbox/benchmark.php?test=raytracer&lang=all&sort=fullcpu

which takes about 8s to run for n=256 and level=6 (your measurement was 50s
on a 50%-faster computer). I have better optimised code which only takes
6s.

I'd also like to do language comparisons on my AMD64. Is J2SE available for
AMD64?
Are you looking for java.lang.Double.MIN_VALUE ? Or for something else ?

I'm looking for the smallest positive double which when added to one does
not give one. Other people have said Math.ulp(1.0) but this fails on both
GCJ and Sun J2SE. I'd be amazed if the floating point epsilon were not
available in such a popular language...

Incidentally, my "delta" is the sqrt of epsilon (~4e-8).
You don't.

:-(

J

#### Jesper Nordenberg

bugbear said:
You've worried me; my company has a java App
that uses Collections HEAVILY, and
(for convenience) we use Iterators exclusively.

I wouldn't worry to much unless you have really performance critical
code (like a tight inner loop in a game or benchmark test). The
iterators probably takes a negligible amount of your total CPU time.
Would you care to quantify the speed difference?

I wrote a little benchmark (see code below) to compare iteration times
using array access, ArrayList.get() and Iterator. Here's my results on
an Athlon 2GHz using Java 1.5.0 -server:

Array: 1990 ms
ArrayList.get(): 2021 ms
ArrayList.iterator(): 2979 ms

So, it seems ArrayList.get() is about as fast as normal array access,
and using an Iterator is about 50% slower.

/Jesper Nordenberg

Benchmark code used:

import java.util.ArrayList;
import java.util.Iterator;

public class IteratorTest {
private static long getTime() {
return System.nanoTime() / 1000000L;
}

private static long time(Runnable runnable) {
long startTime = getTime(), time;
long minTime = Long.MAX_VALUE;

do {
long t1 = getTime();

for (int i = 0; i < 1000; i++)
runnable.run();

time = getTime();
minTime = Math.min(time - t1, minTime);
} while (time - startTime < 10 * 1000);

return minTime;
}

private static Integer[] a = new Integer;
private static ArrayList list = new ArrayList(a.length);
private static int sum;

private static void sum1() {
for (int i = 0; i < a.length; i++)
sum += a.intValue();
}

private static void sum2() {
for (int i = 0, s = list.size(); i < s; i++)
sum += ((Integer) list.get(i)).intValue();
}

private static void sum3() {
for (Iterator it = list.iterator(); it.hasNext() sum += ((Integer) it.next()).intValue();
}

public static void main(String[] args) {
for (int i = 0; i < a.length; i++) {
a = new Integer(i);
}

System.out.println("Array: " + time(new Runnable() {
public void run() {
sum1();
}
}) + " ms");

System.out.println("ArrayList.get(): " + time(new Runnable() {
public void run() {
sum2();
}
}) + " ms");

System.out.println("ArrayList.iterator(): " + time(new Runnable()
{
public void run() {
sum3();
}
}) + " ms");

System.out.println(sum);
}
}

P

#### Patricia Shanahan

Jon said:
This is the epsilon that I'm after. However, neither gcj (3.4) nor Sun J2SE
(1.4) seem to implement it?!

Sorry, it's a "Since 1.5", so you would need a really up to
date JDK.

Meanwhile, here is a way to calculate it directly, depending
only on very well established features:

public class Epsilon {
public static void main(String[] args) {
System.out.println(Math.ulp(1.0));
System.out.println(Math.sqrt(Math.ulp(1.0)));
double epsilon = Double.longBitsToDouble(
Double.doubleToLongBits(1.0)+1)-1;
System.out.println(epsilon);
}
}

[You can delete the ulp stuff for your testing]

The output is:

2.220446049250313E-16
1.4901161193847656E-8
2.220446049250313E-16

confirming that it gives the same answer as Math.ulp(1.0).

Patricia

P

#### Patricia Shanahan

Jon said:
I'm looking for the smallest positive double which when
added to one does not give one. Other people have said
Math.ulp(1.0) but this fails on both GCJ and Sun J2SE.
I'd be amazed if the floating point epsilon were not
available in such a popular language...

Incidentally, my "delta" is the sqrt of epsilon (~4e-8).

That seems rather large. If sqrt(epsilon) is ~4e-8 then
epsilon is ~1.6e-15.

Most systems, including all Java implementations, use IEEE
754 64 bit binary floating point for double. IEEE 64 bit has
a 52 bit stored mantissa, so the least significant bit has
value 2^-52 times the value of the unstored most signficant
bit. If the most significant bit has value 1, then the least
significant bit has value 2^-52, which matches

[If that paragraph does not make sense to you, I can rewrite
it with more background. I have no idea how much you know

Your epsilon seems to be about 7 times as big as it should be???

Patricia

J

#### Jon Harrop

Patricia said:
Your epsilon seems to be about 7 times as big as it should be???

Yes, that was from my failing memory. I meant 1.4e-8 (actually 1.49012e-08).
Sorry. G

#### googmeister

I'm looking for the smallest positive double which
when added to one does not give one.

It's a property of the IEEE 754 binary floating point
standard. The correct answer is 2^(-53) + 2^(-105).

P

#### Patricia Shanahan

It's a property of the IEEE 754 binary floating point
standard. The correct answer is 2^(-53) + 2^(-105).

Jon agreed, elsewhere in this thread, with my suggestion of
the gap between 1.0 and the smallest number greater than
1.0. That is 2^-52, the value of Math.ulp(1.0).

Googmeister's answer is slightly over half the gap between
1.0 and the smallest double greater than 1.0, so that adding
it to 1.0 will round up. It is correct for the definition
given above.

I've taken another look at it, and I now agree with the
definition above, and Googmeister's answer, but the really
important thing is how is this going to be used? What
assumptions does the program make about the meaning of epsilon?

Patricia

C

#### Chris Uppal

Jon said:
To be honest, it isn't that much different to my usual C++ coding style. I
used to space things out a lot more when using templates, but for vector
arithmetic I find one-liners better. The OCaml code is certainly not that
much different from usual style.

I can understand that if you are essentially thinking in terms of the equations
of the underling geometry, then the sheer sprawl of the usual C-family
layout(s) would feel unnatural. I can't help adding, though, that it makes it
virually unreadable for anyone else -- or at least /I/ find it very difficult

Which, btw, is one of the main reasons why I doubt if I'll follow up:
I think you must be using the totally unoptimised C++ as Java is 6x slower
on my computer. The shootout has slightly better code:
http://shootout.alioth.debian.org/sandbox/benchmark.php?test=raytracer&lang=all&sort=fullcpu

-- chris