For performance, write it in C - Part 3, Source code now available

I

Isaac Gouy

Jon said:
Other people have posted not entirely dissimilar timings for Java and I'm
doing exactly the same thing on x86_64 where is runs much faster.

Has anyone managed to get Java close to OCaml or C?

Peter Hickman's measurements were displayed on his website before you
asked for the Java source code - they show the current Java program
taking ~2.15x the time of the current OCaml program, on his machine.
 
J

Jon Harrop

Jon said:
Here's a C++ implementation:

C++ 0.652s 0.608s g++ -O3 -Wall

For Java2 from the site and using Sun's Java 1.5 instead of GIJ 1.4.2
(d'oh!), I get:

Java 0.543s 0.711s javac

That's a much more respectable running time although the compile time is
comparatively poor.
 
C

Chad Perrin

It's 800MHz. I ran each program 4 times and took the average.

I guess that means we should be testing everything on the laptop I'm
currently using as a remote terminal for my server. It's a 366MHz
dinosaur.

. . except I'm doing real work right now, and don't need to be bogging
stuff down.
 
P

Peter Hickman

Time for another update.

Isaac Gouy provided a Java implementation based on mine (ie still pre
computes the tables in Perl) that brought the times down to sub 9 seconds.

real 0m8.966s
user 0m5.815s
sys 0m1.488s

But the big news is that William James' revision of his previous Ocaml
version is now the fastest.

real 0m3.660s
user 0m1.958s
sys 0m1.421s

The source code for both are available on the web site for you to examine.
 
P

Peter Hickman

Thanks for that fix. We now have an even faster Ocaml version

real 0m2.260s
user 0m1.867s
sys 0m0.221s
 
W

William James

Peter said:
Time for another update.

Isaac Gouy provided a Java implementation based on mine (ie still pre
computes the tables in Perl) that brought the times down to sub 9 seconds.

real 0m8.966s
user 0m5.815s
sys 0m1.488s

But the big news is that William James' revision of his previous Ocaml
version is now the fastest.

real 0m3.660s
user 0m1.958s
sys 0m1.421s

The source code for both are available on the web site for you to examine.

I stole code and ideas from Jon Harrop, so this should be called
the James/Harrop version.
 
W

William James

William said:
I stole code and ideas from Jon Harrop, so this should be called
the James/Harrop version.

I just noticed that your site doesn't have my last version (the 4th).
Here it is again:

(* Thanks to Jon Harrup for code and ideas. *)
(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe
*)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
(hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ]

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms

(* Boolean array used to determine if one row is
compatible with another. *)
let compatible = Array.make_matrix n n true ;;
Array.iteri (fun x ex ->
Array.iteri (fun y ey ->
compatible.(x).(y) <- List.for_all2 (<>) ex ey) perms ) perms

let join list = String.concat "" (List.map string_of_int list)
let output_strings = Array.map join perms

(* For speed, create a string that's the length of the lines
that we'll print; the :'s that aren't needed as separators
will later be overwritten. *)
let output_line = String.make (size*(size+1)-1) ':' ^ "\n"
let board = Array.make size 0

(* A recursive function. *)
let rec add_a_row row =
if row = size then
( for i=0 to size-1 do
String.blit
output_strings.(board.(i)) 0 (* source *)
output_line (i*(size+1)) (* dest *)
size
done;
print_string output_line
)
else
for latest = 0 to n - 1 do
let compatible_slice = compatible.(latest) in
(* Create a changeable thing (variable). *)
let prev_row = ref 0 in
(* The ! below fetches the variable's value. *)
while !prev_row < row &&
compatible_slice.(board.(!prev_row)) do
incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0
 
P

Peter Hickman

When I get home I'll time it and add it.

I'm beginning to wonder if my computer is slow enough to give meaningful
timings as the Ocaml versions seem to be pushing for sub second timings :)
 
W

William James

Peter said:
When I get home I'll time it and add it.

I'm beginning to wonder if my computer is slow enough to give meaningful
timings as the Ocaml versions seem to be pushing for sub second timings :)

In order to be fair to gcc, I'd like to point out one thing.
Although I used your algorithm, I tweaked it in an attempt to make
it faster. So I think that you should take my add_a_row function
and translate it back to C.
 
P

Peter Hickman

I changed the use of the bool data type (which is really an int) to char
and shrunk the size of the program and sped it up.

real 0m1.885s
user 0m1.649s
sys 0m0.168s

I think this is because the top most loop, when processing the top row,
basically just zooms down a given row of the compared array. With four
chars being packed in where one bool would be more of the tests will be
using data that is available in the CPUs cache.
 
P

Peter Hickman

Peter said:
I changed the use of the bool data type (which is really an int) to
char and shrunk the size of the program and sped it up.

real 0m1.885s
user 0m1.649s
sys 0m0.168s

I think this is because the top most loop, when processing the top
row, basically just zooms down a given row of the compared array. With
four chars being packed in where one bool would be more of the tests
will be using data that is available in the CPUs cache.
Sorry but I have to retract these timings. They were from the tests to
find the best gcc compiler settings and do not include the Perl pre
calculation for each run so they cannot be compared to any of the other
timings. Sorry about that.
 
I

Isaac Gouy

Peter said:
Time for another update.

Isaac Gouy provided a Java implementation based on mine (ie still pre
computes the tables in Perl) that brought the times down to sub 9 seconds.

real 0m8.966s
user 0m5.815s
sys 0m1.488s

sub 9 seconds?
7.3s
"real" is elapsed time, which includes all those other processes that
grabbed CPU after you gave the time command.
 
P

Peter Hickman

Isaac said:
sub 9 seconds?
7.3s
"real" is elapsed time, which includes all those other processes that
grabbed CPU after you gave the time command.
This is a good point but in all the posts where I have mentioned the
time taken I have used the real time (which is the best case from 10
runs - except for the first Perl version) all the way back to the first
and fourth versions in Perl (473 and 12 minutes). However it does not
affect the ordering except to push my improved C version ahead of
William James' revised Ocaml version by just 0.029 of a second.

This late in the proceedings it would only confuse the issue to start
saying that the first Perl version ran in 251 minutes when 473 minutes
has been mentioned several times. Besides now that we are getting into
such short times for the Ocaml and C versions the difference between
real and user + sys is sub second and it is becoming harder to say that
one is significantly faster then the other. One more run of William
James' program and it might make up that 0.029 of a second difference.

Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?
 
A

ara.t.howard

Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?

showing how it performs on a 2, 4, and 8 grid would show how it scaled...

-a
 
P

Peter Hickman

showing how it performs on a 2, 4, and 8 grid would show how it scaled...

-a
I'll test the 6 x 6 in Ocaml, C and Java but I'm not sure that I will be
going for the best of 10 runs on this. 2 and 4 are simply too fast to
measure accurately except for the slowest code. Someone calculated that
some values higher than 6 would take months to run and 8 might be a
little too far into that ballpark.

I do have a very noisy and hot AMD 64 4400 with 2 Gb ram running Ubuntu.
Perhaps I will set this as the new benchmark machine.
 
W

William James

Peter said:
Sorry but I have to retract these timings. They were from the tests to
find the best gcc compiler settings and do not include the Perl pre
calculation for each run so they cannot be compared to any of the other
timings. Sorry about that.

I wish that you would dump Perl and do everything in C.
 
J

Jon Harrop

William said:
I wish that you would dump Perl and do everything in C.

I posted an OCaml implementation that used an imperative, array-based
permuter. We could just port that back to C (or find the site with the C
code that I translated).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top