Symbolic benchmark

Chris Dollin

Steve Wampler wrote:
one. I didn't bother with parsing an
input string because it looks like people were timing
just the time to walk the parse tree and simplify it.

I've been hesitant to post it because it feels like cheating
(and the code isn't very OO - I actually tend to think in
a different language for these puzzles and then cast the
result into Java).
Why so fast? Because the simplified string is built up as the parse
is constructed. So the simplification loop just outputs the result!
I'll grant that this isn't in the spirit of things, but I couldn't see
where it was prohibited :) After all, if you're allowed to simplify
while building the parse tree, this isn't much more of a stretch...

I thought of that and decided it wasn't in the spirit of the test; it
does show that, without the context the problem originally came from,
one doesn't know whether it's a realistic optimisation or not, and it shows
how algorithmic changes can beat mere [micro] optimisation. The other
trick I thought of, almost but not quite as complete a cheat, is
to give each node a slot to hold its simplified form and store it there the
first time it gets computed. In this case, that would mean that the
first pass through the loop would do all the computation and the remaining
passes would be trivial ...

[I know Jon wanted immutable nodes, but these /would/ be immutable nodes;
there's no externally visible change to the behaviour except performance.
Also, given that Java doesn't even pretend to be functional, disallowing
assignments could be seen as bias ... I do think it's sensible to make
the expression nodes /functionally/ immutable.]
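A minimal Java sketch of that slot-per-node trick (class and method names are hypothetical, not from any posted solution): the memo field is mutable, but nothing outside the node can observe it except as a speed-up, so the nodes stay functionally immutable.

```java
import java.math.BigInteger;

// Sketch of the caching trick: each node memoizes its simplified form
// the first time simplify() is called. The field is mutable, but there
// is no externally visible change beyond performance.
abstract class Expr {
    private Expr simplified; // null until first computed

    public final Expr simplify() {
        if (simplified == null) {
            simplified = doSimplify(); // computed once, reused thereafter
        }
        return simplified;
    }

    protected abstract Expr doSimplify();
}

class Num extends Expr {
    final BigInteger value;
    Num(BigInteger value) { this.value = value; }
    protected Expr doSimplify() { return this; } // numbers are already simple
    public String toString() { return value.toString(); }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    protected Expr doSimplify() {
        Expr l = left.simplify(), r = right.simplify();
        if (l instanceof Num && r instanceof Num) {
            // constant folding: both children reduced to numbers
            return new Num(((Num) l).value.add(((Num) r).value));
        }
        return new Add(l, r);
    }
    public String toString() { return "[+ " + left + " " + right + "]"; }
}
```

On the second and later calls, `simplify()` is just a field read, which is exactly why the remaining timing-loop passes become trivial.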

Steve Wampler

Chris said:
The other
trick I thought of, almost but not quite as complete a cheat, is
to give each node a slot to hold its simplified form and store it there the
first time it gets computed. In this case, that would mean that the
first pass through the loop would do all the computation and the remaining
passes would be trivial ...

Yes, that's essentially what my original (unpublished) solution does (it
includes the parsing of the string inside every iteration of the inner timing
loop). I wrote it with a string-transformation approach instead of thinking
of it as a parse-tree approach, and used caching to avoid reparsing any
previously parsed substring. Since every string (including the
input!) is a substring of itself, this means it is slow on the first
iteration and reduces to a cache lookup on iterations 2-10,000,000
of this 'benchmark'.
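Since the actual code is unpublished, here is only a hypothetical sketch of that kind of string-level caching, using a plain `HashMap` and a placeholder for the real transformation work:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of string-level memoization: the first call for a
// given input pays the full simplification cost, later calls reduce to
// a single map lookup.
class StringSimplifier {
    private final Map<String, String> cache = new HashMap<>();

    public String simplify(String input) {
        return cache.computeIfAbsent(input, this::simplifyUncached);
    }

    // Stand-in for the real (slow) transformation; here it merely
    // normalizes whitespace so the sketch is runnable.
    private String simplifyUncached(String input) {
        return input.trim().replaceAll("\\s+", " ");
    }
}
```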

So the times, where the input string is passed to the simplifier on
each iteration, are roughly 0.12 ms per 10,000,000 iterations. The
parsing code is really a quick and (very) dirty hack, so I decided
not to put it out.

I'm still debating with myself about whether saving the
simplified string is a cheat - it's just a representation of
the parse tree *at that point*, so if you're allowed to simplify
while building the parse tree, why not save its representation
as well? After all, without it, the challenge really reduces to
how fast you can recurse, concatenate strings, and convert big
integers to strings - not really doing symbolic math manipulations.

Incidentally, I noticed I had written

public Expr(int i) {...

where it should be:

public Expr(BigInteger i)

instead. (And the hand-built parse tree would have to use "new BigInteger(...)" instead
of int literals.) It doesn't affect the timings enough to notice.
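For illustration, the corrected signature and a hand-built leaf might look like this (a sketch; `BigInteger.valueOf` is one convenient way to lift the old int literals):

```java
import java.math.BigInteger;

// Sketch of the corrected leaf node: it holds a BigInteger, and
// hand-built trees lift int literals via BigInteger.valueOf.
class Expr {
    final BigInteger value;

    public Expr(BigInteger i) { // was mistakenly declared as Expr(int i)
        this.value = i;
    }
}
```

A hand-built node then reads `new Expr(BigInteger.valueOf(12))` instead of `new Expr(12)`.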

Also, the code could be cleaned up...and made more OO. Then it would be pretty much the
same as your solution, of course.

Chris Dollin

Steve said:
Chris Dollin wrote:

(stuff about caching, and at the end of Steve's response he wrote:)
Also, the code could be cleaned up...and made more OO. Then it would be
pretty much the same as your solution, of course.

Could easily be better. I don't like using `instanceof` -- it doesn't
seem to me to be in the spirit of an OO language. Type decisions should
be made by polymorphism. But, as I said, my attempt at using polymorphism
via multiple-dispatch was slower than my instanceof code. Perhaps I gave
up too early ...

Roger Lindsjö

I have done some more tests, using the same Ocaml compiler (3.09.3)
and the same Java (1.6 update 3). The times posted are averages over a
few iterations. The processes were locked to one CPU so the GC could not
run on a separate CPU, but this didn't make any noticeable
difference. Both machines run recent (but not the same) GNU/Linux kernels
(Fedora Core and Fedora systems).

Machine 1 is a 3.2GHz Pentium with 2 cores (not hyperthreading). A 64
bit machine (were they called Pentium D?). 2GB memory.
Java: 5.123 s
Ocaml: 1.142 s

Machine 2 is a 2.4GHz Core 2 Quad also with 2GB memory.
Java: 1.074 s
Ocaml: 1.515 s

The compilers must be very different in how they utilize the cpu
architecture. Sitting on machine 1 (not my primary computer) I'd say
Ocaml is faster, but sitting on machine 2 (my primary machine ;-) ) I'd
say Java is faster.

Perhaps Java is better at utilizing the large cache on machine 2? I
don't know.

//Roger Lindsjö

Steve Wampler

Roger said:
The compilers must be very different in how they utilize the cpu
architecture. Sitting on machine 1 (not my primary computer) I'd say
Ocaml is faster, but sitting on machine 2 (my primary machine ;-) ) I'd
say Java is faster.

Thanks for testing this - I've been trying to figure out why Jon and I
have been getting such different results. This may well explain it.

Lew

Steve said:
Thanks for testing this - I've been trying to figure out why Jon and I
have been getting such different results. This may well explain it.

This supports the points of those who say, "Watch out for benchmarks," and,
"Java can be as fast as C++," and, "Java is slower than C++," and, "It depends."

Jon Harrop

Steve said:
Why so fast? Because the simplified string is built up as the parse
is constructed. So the simplification loop just outputs the result!
I'll grant that this isn't in the spirit of things, but I couldn't see
where it was prohibited :) After all, if you're allowed to simplify
while building the parse tree, this isn't much more of a stretch...

To be honest, I'm not really sure how this relates to the original challenge
or to my OCaml. Specifically, I haven't used any strings or done any
parsing.

The previous Java implementations weren't benchmarking parsing, were they?

Jon Harrop

Chris said:
Could easily be better. I don't like using `instanceof` -- it doesn't
seem to me to be in the spirit of an OO language. Type decisions should
be made by polymorphism. But, as I said, my attempt at using polymorphism
via multiple-dispatch was slower than my instanceof code. Perhaps I gave
up too early ...

Yes, this is exactly where OOP ceases to be useful. I believe the idiomatic
OOP solution is to implement lots of different member functions reflecting
pieces of pattern match. For example, you would have an add_int member
function to add an int and specialize the add_int member of the Int class
to create a new Int.

I might have a go but I've no idea what would be efficient in Java.
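The member-function pattern described above might be sketched like this in Java (all names, such as `addInt`, are hypothetical): `add` dispatches on the left operand and `addInt` on the right, so `Int + Int` constant-folds without any `instanceof`:

```java
import java.math.BigInteger;

// Hypothetical double-dispatch sketch: add() dispatches on the left
// operand, then addInt() on the right, so Int + Int constant-folds
// with no instanceof tests anywhere.
abstract class Expr {
    public abstract Expr add(Expr other);
    // Default: the left operand is a known Int but we are not, so the
    // sum stays symbolic.
    public Expr addInt(Int left) { return new Sym(left, this); }
}

class Int extends Expr {
    final BigInteger value;
    Int(BigInteger value) { this.value = value; }
    public Expr add(Expr other) { return other.addInt(this); } // second dispatch
    public Expr addInt(Int left) { return new Int(left.value.add(value)); } // fold
    public String toString() { return value.toString(); }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    public Expr add(Expr other) { return new Sym(this, other); }
    public String toString() { return name; }
}

// An unreduced symbolic sum.
class Sym extends Expr {
    final Expr left, right;
    Sym(Expr left, Expr right) { this.left = left; this.right = right; }
    public Expr add(Expr other) { return new Sym(this, other); }
    public String toString() { return "[+ " + left + " " + right + "]"; }
}
```

Whether this beats the `instanceof` version is an open question; the extra virtual call per operation is exactly the cost Chris reported with multiple dispatch.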

Roedy Green

Jon said:
high allocation rates for short-lived objects when
written in an idiomatic functional style and Java should be uniquely bad at
this.

Not at all. Java can allocate objects very quickly; it just peels
them off a giant contiguous block. What takes the time is when you
have a large number of long lived objects since they have to be
enumerated each GC pass. Short lived objects are recovered in zero
time.

Roedy Green

Roger said:
Machine 2 is a 2.4GHz Core 2 Quad also with 2GB memory.
Java: 1.074 s
Ocaml: 1.515 s

If you pass me the Java code, I can run it both with java.exe and with
Jet. That ratio would give you another datapoint, compiled Java.

I have a dual core Athlon. I think there are no threads in your
implementation (are there in the Ocaml?), so you would expect dual core
to buy you nothing.

Steve Wampler

Jon said:
To be honest, I'm not really sure how this relates to the original challenge
or to my OCaml. Specifically, I haven't used any strings or done any
parsing.

The previous Java implementations weren't benchmarking parsing, were they?

That's correct. I had originally assumed that the program should work
for different expressions and not have expression trees hard-coded. So
I took that to mean that some form of string input was needed so that
expressions could be parsed into the expression tree. That led me to
a string transformation solution as a first try. When I saw the C++
solution to the derivative problem you offered as an example of something
similar, I realized that there was no need to read a string and parse it.
But my earlier misinterpretation keeps me coming back to referring to the
hard-coded internal representation as a parse tree (that, and my previous
life writing compilers).

Since the simplification can be done when that hard-coded parse tree
is built, the timing loop to simplify the constructed tree actually
doesn't have to do anything to get the simplified expression at all -
it's already there. (My statement about the loop just outputting the
result isn't true, that's not needed either, after all!) Colin's solution,
for example, just walks the tree and builds a copy of it. But I'm
not convinced that's necessary - you could just return the tree without
a copy. (I went ahead and returned that tree in a string representation,
because that made it easier for me to test things, and I could do that
step quickly.)
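A sketch of that simplify-while-building idea, with hypothetical "smart" factory methods that fold constants and drop `* 0` / `* 1` as nodes are constructed, so the finished tree is already in simplified form:

```java
import java.math.BigInteger;

// Sketch of simplification at construction time: the factory methods
// fold constant+constant, x*0 -> 0 and x*1 -> x as the tree is built,
// so no separate simplification pass is needed afterwards.
abstract class Expr {
    static Expr num(long v) { return new Num(BigInteger.valueOf(v)); }

    static Expr add(Expr a, Expr b) {
        if (a instanceof Num && b instanceof Num) {
            return new Num(((Num) a).value.add(((Num) b).value));
        }
        return new Bin('+', a, b);
    }

    static Expr mul(Expr a, Expr b) {
        if (a instanceof Num) {
            BigInteger v = ((Num) a).value;
            if (v.signum() == 0) return a;           // 0 * b -> 0
            if (v.equals(BigInteger.ONE)) return b;  // 1 * b -> b
        }
        if (b instanceof Num) {
            BigInteger v = ((Num) b).value;
            if (v.signum() == 0) return b;           // a * 0 -> 0
            if (v.equals(BigInteger.ONE)) return a;  // a * 1 -> a
        }
        if (a instanceof Num && b instanceof Num) {
            return new Num(((Num) a).value.multiply(((Num) b).value));
        }
        return new Bin('*', a, b);
    }
}

class Num extends Expr {
    final BigInteger value;
    Num(BigInteger value) { this.value = value; }
    public String toString() { return value.toString(); }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    public String toString() { return name; }
}

class Bin extends Expr {
    final char op; final Expr left, right;
    Bin(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    public String toString() { return "[" + op + " " + left + " " + right + "]"; }
}
```

Building the thread's example `[* x [+ [+ [* 12 0][+ 23 8]] y]]` through these factories yields `[* x [+ 31 y]]` directly, which is why the timing loop then has nothing left to do.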

[The version I posted had an error in the string representation of the
result of converting right associativity to left (teaches me to code
while battling a mild case of insomnia...). I found and fixed that
while cleaning up the code.]

It would probably be more interesting to include the code that
constructs the expression tree in the timing loop. You may very
well have had that in mind, but that's not what I'd seen tested.

Steve Wampler

Roedy said:
I have a dual core Athlon. I think there are no threads in your
implementation (are there in the Ocaml?), so you would expect dual core
to buy you nothing.

I don't know how Jet handles the execution environment, but Sun's
JDK runs lots of administrative threads while a program is running.
I would expect some benefit from dual cores with that, but apparently
(from Roger's note) it's not much.

Steve Wampler

Steve said:
Since the simplification can be done when that hard-coded parse tree
is built, the timing loop to simplify the constructed tree actually
doesn't have to do anything to get the simplified expression at all -
it's already there.

Incidentally, I think Java figured that out also. Here are the times
I get for 10 runs of 2,000,000,000:

->time java -server Simplify 2000000000 10
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.01
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0
2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0010
java -server Simplify 2000000000 10 0.11s user 0.02s system 102% cpu 0.127 total
->

I suspect that means it's figured out I have a loop invariant. Apparently
it has more trouble figuring that out when you actually do something.
Probably because new objects are built while 'cloning' the tree.
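A standard precaution (not from the posted code) is to make each iteration's result feed a value that is used after the loop, so the JIT cannot hoist or discard the body; a hypothetical sketch:

```java
// Sketch of a common guard against loop-invariant elimination: fold
// each iteration's result into a value the program later uses, so the
// JIT cannot prove the loop body is dead.
class LoopGuard {
    static int runs(int iterations) {
        int sink = 0;
        for (int i = 0; i < iterations; i++) {
            String result = simplifyOnce(i); // stand-in for the real work
            sink += result.length();         // consume the result
        }
        return sink; // returning (or printing) sink keeps the loop live
    }

    // Placeholder for the actual simplification; it varies with i so
    // the call is not itself a loop invariant.
    static String simplifyOnce(int i) {
        return Integer.toString(i);
    }
}
```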

Roger Lindsjö

Roedy said:
If you pass me the Java code, I can run it both with java.exe and with
Jet. That ratio would give you another datapoint, compiled Java.

I have a dual core Athlon. I think there are no threads in your
implementation (are there in the Ocaml?), so you would expect dual core
to buy you nothing.

No threading other than what the JVM does internally for housekeeping
such as GC. I even ran restricted to one core (using taskset) since I
was told that Ocaml was single threaded.
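For reference, pinning a run to one core with taskset looks like this (the class name and arguments follow the Simplify timings posted earlier in the thread):

```shell
# Pin the JVM (including its housekeeping threads) to CPU 0 so nothing
# runs on a second core during the measurement.
taskset -c 0 java -server Simplify 2000000000 10
```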

The code is three posts up, the one I was replying to.
http://groups.google.com/group/comp.lang.java.programmer/msg/b5185f597ba62103

//Roger Lindsjö

Jon Harrop

Roedy said:
Not at all. Java can allocate objects very quickly; it just peels
them off a giant contiguous block. What takes the time is when you
have a large number of long lived objects since they have to be
enumerated each GC pass. Short lived objects are recovered in zero
time.

Then you'll be able to improve the Java implementations of these benchmarks
so that they aren't up to 5x slower than an unoptimized alternative.

Lew

Roger said:
No threading other than what the JVM does internally for housekeeping
such as GC. I even ran restricted to one core (using taskset) since I
was told that Ocaml was single threaded.

If it's true that Ocaml is single-threaded, then that kills it for a wide
range of applications for which Java is extremely suitable. Performance would
be moot, even if the numbers weren't so close.

Daniel Dyer

Lew said:
If it's true that Ocaml is single-threaded, then that kills it for a
wide range of applications for which Java is extremely suitable.
Performance would be moot, even if the numbers weren't so close.

http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html

I briefly looked at OCaml because I'd heard a lot of good things about it.
The threading thing did put me off. It seems there are issues with its
garbage collector implementation in this respect.

Dan.

Jon Harrop

I think there are now two problems:

1. We're measuring bigint arithmetic but we don't really care about it.

2. The input should be such that the evaluation is irreducible. In
particular, the input must not be hardcoded.

We can address these problems quite easily by using machine-precision ints
(whatever they are, we'll just make sure the problem doesn't overflow them)
and doing something juicier with symbolic computation like a big derivative
or multiplying out polynomials.
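A minimal sketch of that direction, using plain machine ints and a symbolic derivative (all names hypothetical; no simplification, just the sum and product rules):

```java
// Hypothetical sketch of the suggested benchmark shape: machine-precision
// ints and a symbolic derivative, so the result depends on a runtime
// parameter and cannot be precomputed from a hardcoded tree.
abstract class Expr {
    abstract Expr d(String x); // derivative with respect to variable x
}

class Num extends Expr {
    final long v;
    Num(long v) { this.v = v; }
    Expr d(String x) { return new Num(0); }          // c' = 0
    public String toString() { return Long.toString(v); }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    Expr d(String x) { return new Num(name.equals(x) ? 1 : 0); } // x' = 1
    public String toString() { return name; }
}

class Add extends Expr {
    final Expr a, b;
    Add(Expr a, Expr b) { this.a = a; this.b = b; }
    Expr d(String x) { return new Add(a.d(x), b.d(x)); } // (a+b)' = a'+b'
    public String toString() { return "[+ " + a + " " + b + "]"; }
}

class Mul extends Expr {
    final Expr a, b;
    Mul(Expr a, Expr b) { this.a = a; this.b = b; }
    // product rule: (ab)' = a'b + ab'
    Expr d(String x) { return new Add(new Mul(a.d(x), b), new Mul(a, b.d(x))); }
    public String toString() { return "[* " + a + " " + b + "]"; }
}
```

Taking a big nested derivative like this each iteration keeps the evaluation irreducible, since the answer grows with the input rather than collapsing to a constant.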

Jon Harrop

Lew said:
If it's true that Ocaml is single-threaded, then that kills it for a wide
range of applications for which Java is extremely suitable. Performance
would be moot, even if the numbers weren't so close.

OCaml simply uses processes for concurrency rather than threads. In terms of
capabilities, there is no difference.