For performance, write it in C - Part 3, Source code now available

W

William James

William said:
Here's an OCaml version that runs in about 1.5 seconds when
output is redirected to a file on my faster computer (3.2 GHz).
It uses the same algorithm as the C program.
The C version takes 1.961 seconds when writing to /dev/null on
the o.p.'s computer. Since this is my first OCaml program, I'm
certain an expert could improve it.

An expert did suggest replacing a couple of lists with arrays.
This should be a bit faster.

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe *)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
| (hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ] ;;

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms
let incompat = Array.make_matrix n n true ;;

for x = 0 to n - 1 do
for y = 0 to n - 1 do
incompat.(x).(y) <-
List.exists2 ( = ) perms.(x) perms.(y)
done
done;;

let join list = String.concat "" (List.map string_of_int list)
let output_strings = Array.map join perms ;;
let board = ( Array.make size 0 ) ;;

(* recursive function *)
let rec add_a_row row =
if row = size then
print_endline
( String.concat ":"
(List.map (fun i -> output_strings.(i) )
(Array.to_list board)) )
else
for latest = 0 to n - 1 do
let prev_row = ref 0 in
while (!prev_row < row) &&
(not incompat.(latest).(board.(!prev_row))) do
incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0 ;;
 
S

sross

M. Edward (Ed) Borasky said:
I don't recall which of the "big four" open source CLs have virtual
machines, if any. IIRC the "standard Lisp benchmarks" are pretty much a
dead heat on GCL vs. CMUCL, with Clisp coming in third. When SBCL "grows
up", it will probably be as fast as GCL or CMUCL. My recollection is
that the CMUCL *compiler* is on a par with C in terms of generating x86
machine code.

Actually SBCL is a fork from CMUCL (with a focus on maintanability) and
as such uses the same compiler.
Compiler improvements are often ported between the two implementations
and when it comes to actual performance there is not much between
the two.
 
M

M. Edward (Ed) Borasky

William said:
Here's an OCaml version that runs in about 1.5 seconds when
output is redirected to a file on my faster computer (3.2 GHz).
It uses the same algorithm as the C program.
The C version takes 1.961 seconds when writing to /dev/null on
the o.p.'s computer. Since this is my first OCaml program, I'm
certain an expert could improve it.


An expert did suggest replacing a couple of lists with arrays.
This should be a bit faster.
*This* expert suggests comparing performance between the C version and
the Ocaml version on the *same* machine! :)
 
J

Jon Harrop

M. Edward (Ed) Borasky said:
*This* expert suggests comparing performance between the C version and
the Ocaml version on the *same* machine! :)

On my machine (Athlon X2 4400+, Debian Linux, gcc 4.0.4, OCaml 3.09.2, Sun
JDK 1.5):

Compile Run
C 0.137s 0.386s gcc -O3 -Wall
OCaml 0.171s 0.668s ocamlopt
OCaml 0.161s 0.626s ocamlopt -inline 100 -unsafe
OCaml2 0.165s 0.415s ocamlopt
OCaml2 0.165s 0.401s ocamlopt -unsafe
Java 1.565s 1.688s javac

Some niggles:

1. A lot of precalculation has been done in the C and Java, to the extent
that we're breaking Java compilers (always a bad sign).

2. Java is nothing like as slow as the OP claimed.

3. OCaml is only 4% slower than C and only 10% slower with bounds checking.

4. Whether Ruby is fast enough or not, you should be programming in OCaml
and not in Ruby or C. ;-)

Here's my OCaml2 (based upon William's):

let rec fact n = if n=0 then 1 else n*fact(n-1)

let size = 5
let n = fact size

(* Permutation generator *)
let p = Array.make size 0
let xx = Array.init size (fun i -> if i<size-1 then 0 else -1)
let gen_perm() =
let i = ref (size - 1) in
while !i > 0 && xx.(!i) = !i do
xx.(!i) <- 0;
decr i
done;
if !i = 0 then 1 else begin
xx.(!i) <- xx.(!i) + 1;
p.(0) <- 1;
for i=0 to size - 1 do
p.(i) <- p.(i - xx.(i));
p.(i - xx.(i)) <- i+1
done;
0
end

(* Permutations *)
let perms = Array.init n (fun _ -> ignore(gen_perm()); Array.copy p)

let incompat =
let aux x y =
Array.iteri (fun i px -> if px = perms.(y).(i) then raise Exit) perms
(x);
false in
Array.init n (fun x -> Array.init n (fun y -> try aux x y with Exit ->
true))

let join list = String.concat "" (Array.to_list (Array.map string_of_int
list))
let output_strings = Array.map join perms
let board = Array.make size 0

let op = String.make (size*(size+1)-1) ':'

let rec add_a_row row =
if row = size then begin
for i=0 to size-1 do
String.blit output_strings.(board.(i)) 0 op (i*(size+1)) size
done;
print_string op;
print_string "\n"
end else
for latest = 0 to n - 1 do
let prev_row = ref 0 in
let incompat = incompat.(latest) in
while !prev_row < row && not incompat.(board.(!prev_row)) do
incr prev_row
done;
if !prev_row = row then begin
board.(row) <- latest;
add_a_row (row + 1)
end
done

let () = add_a_row 0
 
P

Peter Hickman

I have run the first Ocaml version on the same box as all the other
timings have come from and it sits between the C version with default
compiler settings and the C version optimised for speed. The web page
has been updated to include the Ocaml source and Kristof Bastiaensen's
faster Ruby version. Plus a simple table so that you can view all the
results.

I will take the revised Ocaml version and time it and check the output
tonight.
 
P

Peter Hickman

I only have this on my box.

[peterhickman]$ ocaml -version
The Objective Caml toplevel, version 3.09.2

Is this good enough for the version you wrote?
 
P

Peter Hickman

Charles said:
I'd be interested in seeing all of these run without IO as well...that
was
the big bottleneck for Java.

Can we just clear something up here. The Latin squares mini project was
not plucked out of the air just to be a poster child for my first post.
It was part of a project that I am working on and as such the program
has to produce some output! It does not make any sense to optimise the
program in any language if we are not going to produce any output.
Otherwise I would go with this heavily optimised C version:

[peterhickman]$ cat latin.c
int
main(int argc, char *argv[])
{
return 0;
}

This is an application, it has to produce output. Cease this willy
waving otherwise you are saying "For performance, remove the output".
 
C

Chad Perrin

4. Whether Ruby is fast enough or not, you should be programming in OCaml
and not in Ruby or C. ;-)

Actually, OCaml is one of the languages in which I've been meaning to
get competent at some point in the hopefully-near future (for some
definition of "near"). I find that I pick up a language faster if
there's a good community for it whose means of communication suits the
way I think, though. What sort of online OCaml community resources are
available? Perl's biggest strength, in terms of interactive community,
seems (for my tastes) to be the PerlMonks site, and this list fills the
same need for Ruby. What comparable community interaction mechanism
could I use for OCaml?

I'm often disappointed by the options for a given language. For
instance, I recently (earlier this year) added another language to my
list of loves: the UCBLogo implementation of the Logo language (more a
"Lisp without parentheses" than any other Logo of which I'm aware, as it
provides macro support). It has no online community at all, except for
a bunch of half-baked educators who want to use Turtle Graphics to teach
elementary and middle school kids that computers aren't scary, which is
definitely not the community I want for my own purposes. My issue with
the Python community is far less drastic -- it actually has one of the
better programming language online communities, all things considered,
of course -- but the fact that the cultural rigidity of the Python
community seems so pervasive drives me away (as does that
fork-in-the-eye sensation I get when reading Python source). Et cetera.
Both Ruby and Perl are stand-out examples of excellence for online
communities, and while I don't need the perfection of either
necessarily, I hope OCaml provides something close.

So. Does it?

Wow, that was a lot of rambling for that question. Just to keep it
on-topic:

Thanks for being a good community, guys.
 
J

Jon Harrop

Peter said:
I only have this on my box.

[peterhickman]$ ocaml -version
The Objective Caml toplevel, version 3.09.2

Is this good enough for the version you wrote?

You probably also have:

$ ocamlopt -v
The Objective Caml native-code compiler, version 3.09.2
Standard library directory: /usr/lib/ocaml/3.09.2

In which case you just do:

$ ocamlopt latin.ml -o latin
$ time ./latin >/dev/null
 
P

Peter Hickman

Jon said:
Yeah. I didn't bother doing that. ;-)
Not wanting to name and shame but there was one submission (off list)
that was faster for completely the wrong reasons. Besides I get a warm
fuzzy feeling when the results match because the pain of checking which
program is actually wrong is too great. As my Perl, Java and C versions
all use the same Perl to pre compute their values they all produced the
same results, consistent yes but are they correct? I'm pretty confident
now, especially now that I have independently written programs produce
the same results.

The next stage, which is to stop it reporting rotational and flipped
versions of a grid, is going to be quite a grind to test.

Any views on Practical Ocaml by Joshua B. Smith, does it look like it
might be a good book to learn from?
 
K

Kristof Bastiaensen

Here's how it goes on my box:

[Latin]$ time ruby latin2.rb > r5

real 0m16.283s
user 0m15.380s
sys 0m0.498s

Which means that your computer is about as crap as mine :)

Yes. I was really suprised to see that your timings where longer than
mine :) But I am happy with the speed of my computer, though emacs
startup is a bit slow... :)
I will add it
to the pages and make it a download. I think I need a table summarising
the timings.

I just realised that it is better to move the permutations on the
row string (the ones that use String#tr) outside the row permutations,
to avoid recalculating the rows each time. The only method I changed is
print_solution. The program runs almost twice as fast now! (0m7.8s on my
computer).

Regards,
Kristof

----------------- latin.rb, version 2 -----------------------

require 'permutation'

$size = (ARGV.shift || 5).to_i
MaxVal = $size-1

RowPermutations = Permutation.new($size).map{|p| p.value}
BaseStr = (2..$size).to_a.join
StrPermutations = Permutation.new($size-1).map{|p| p.project(BaseStr)}

StartColumns = (1..MaxVal).to_a
def init_columns(el)
a = StartColumns.dup
a.delete_at(el-1)
return a
end

def insert(sqr, num, row, columns)
insert(sqr, num, row+1, columns) if (row == num)
columns.each do |col|
next if sqr[row][col] != ?.
sqr[row][col] = num + ?1
if (row == MaxVal)
insert(sqr, num+1, 1, init_columns(num+1))
elsif (num == MaxVal && row == MaxVal - 1)
print_solution(sqr)
else
insert(sqr, num, row+1, columns - [col])
end
sqr[row][col] = ?.
end
end

def print_solution(sqr)
StrPermutations.each do |sp|
newsqr = sqr.map { |r| r.tr(BaseStr, sp) }
RowPermutations.each do |rp|
rp.each do |r|
print newsqr[r]
print ":"
end
puts
end
end
end

$square = [("1" + BaseStr)] +
Array.new($size-1) { |i| (i+2).to_s + "." * ($size - 1) }

insert($square, 0, 1, StartColumns)
 
K

Kristof Bastiaensen

Not wanting to name and shame but there was one submission (off list)
that was faster for completely the wrong reasons. Besides I get a warm
fuzzy feeling when the results match because the pain of checking which
program is actually wrong is too great. As my Perl, Java and C versions
all use the same Perl to pre compute their values they all produced the
same results, consistent yes but are they correct? I'm pretty confident
now, especially now that I have independently written programs produce
the same results.

<snip>

I used the following tests:

"ruby latin.rb | wc -l" to test if the number of squares matches the
number that is written in the wikipedia article.
Then "ruby latin.rb | sort | uniq | wc -l" shows if all the squares that
are output by the program are different.
All that remains to be done is to check that each latin square is
actually a latin square. That wouldn't be so much work, but I just looked
at some of the output, and trusted that the rest of the squares are
correct too :)

Regards,
Kristof
 
P

Peter Hickman

The thing that was bothering me was "have I missed any?" Sure the
results I had were correct but I was unsure if they were complete. For
example the Perl permutation module and the Ruby one could perhaps share
some libpermute.so that I was unaware of and therefore harbour the same
error - this is not the case however. Or even be both developed from the
same source. Paranoia I know but many libraries and modules are based on
existing code rather than developed from scratch so bring subtle bugs
with them. Like the one recently found in the binary search, see
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html.

But given that there is no relationship between the Perl, Ruby and Ocaml
versions I am sure that there will be no surprises.
 
W

wneumann

Chad said:
I find that I pick up a language faster if
there's a good community for it whose means of communication suits the
way I think, though. What sort of online OCaml community resources are
available?

Primarily, the online resources come from one of the two mailing lists,
OCaml-Beginners[1] for introductory to intermediate questions, and the
main OCaml list[2] for intermediate to advanced questions, or questions
on the OCaml internals and whatnot. There are a few other places where
info moves, like Richard Jones' cocan wiki [3], but at the moment
things are primarily handled through the lists (though it might be
about time for something like perlmonks to arrive as the community is
growing).

[1] http://groups.yahoo.com/group/ocaml_beginners/
[2] http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
[3] http://wiki.cocan.org/
 
W

wneumann

Peter said:
Any views on Practical Ocaml by Joshua B. Smith, does it look like it
might be a good book to learn from?

Hard to say, as it won't be available for another three months, and I
can't even find a sample table of contents...

However, in the meantime, some of the best resources are freely
available. In addition to parts I, III, and IV of the OCaml manual, I
recommend anyone starting with OCaml to begin with Jason Hickey's book
[1], and then look at the translation of the older French O'Reilly book
[2] (note however, that the book is a bit out of date, but this only
crops up as a real issue a few times -- feel free to hit the
ocaml-beginners mailing list [3] with questions). Richard Jones also
has a pretty nice tutorial available online [4]. Once you get going,
there are more specific resources for tools like OCamllex, ocamlyacc,
and camlp4.

[1] http://files.metaprl.org/doc/ocaml-book.pdf
[2] http://caml.inria.fr/pub/docs/oreilly-book/
[3] http://groups.yahoo.com/group/ocaml_beginners/
[4] http://www.ocaml-tutorial.org/
 
I

Isaac Gouy

Peter said:
Charles said:
I'd be interested in seeing all of these run without IO as well...that
was
the big bottleneck for Java.

Can we just clear something up here. The Latin squares mini project was
not plucked out of the air just to be a poster child for my first post.
It was part of a project that I am working on and as such the program
has to produce some output! It does not make any sense to optimise the
program in any language if we are not going to produce any output.
Otherwise I would go with this heavily optimised C version:

[peterhickman]$ cat latin.c
int
main(int argc, char *argv[])
{
return 0;
}

This is an application, it has to produce output. Cease this willy
waving otherwise you are saying "For performance, remove the output".


1) Over the past 3 or 4 days, people have said what's wrong with the
way you're producing output in Java, and suggested what you should do
instead. Is it that you don't know enough Java to follow their
suggestions and fix the program?

A couple of days ago single measurements on my machine gave
0.820s user+sys, gcc -pipe -Wall -O3 -fomit-frame-pointer
-funroll-loops -march=pentium4
4.444s user+sys, sun-jdk-1.5.0.07

Simply writing Java that converts the double-byte Unicode Java
outputStrings to single-byte /once/ instead of everytime there's a
print statement, and following the usual Java pattern of wrapping
output with a BufferedOutputStream, reduces that to

1.396s user+sys, sun-jdk-1.5.0.07

Yesterday I emailed you another version of your Java generating Perl
script that fixes those print problems, the output matches the C
program output.


2) If this is an application, and it has to produce output, and it has
to produce output for something larger than 6x6, then I think you need
a better algorithm.

You've mentioned the 5x5 C program took 2.473s, and the 6x6 took 9900s
- at that rate might we expect 7x7 in 15 months?
 
P

Peter Hickman

Isaac said:
1) Over the past 3 or 4 days, people have said what's wrong with the
way you're producing output in Java, and suggested what you should do
instead. Is it that you don't know enough Java to follow their
suggestions and fix the program?
So far the only language that has come close to running as fast as the C
is the Ocaml version and since then I have found other ways to speed up
the C version. Just as I have posted the Ruby and Ocaml version if any
certified Java expert cares to show me how it should be done (*cough*
Charles O Nutter *cough*) then I will be happy to time it and check the
output and report the results. That way we can get away from all this
stupidity like the fact that I didn't use the correct naming convention.
Which we all know hugely affects performance. But remember that the
timings will be based on a program that produces output.
A couple of days ago single measurements on my machine gave
0.820s user+sys, gcc -pipe -Wall -O3 -fomit-frame-pointer
-funroll-loops -march=pentium4
4.444s user+sys, sun-jdk-1.5.0.07

Simply writing Java that converts the double-byte Unicode Java
outputStrings to single-byte /once/ instead of everytime there's a
print statement, and following the usual Java pattern of wrapping
output with a BufferedOutputStream, reduces that to

1.396s user+sys, sun-jdk-1.5.0.07

Yesterday I emailed you another version of your Java generating Perl
script that fixes those print problems, the output matches the C
program output.
If it was the email I replied to yesterday then reason that it got close
to the speed of the C version was because it didn't do any output. Once
I put the output back in it was only around 5 seconds faster than my own
Java version and around 15 seconds slower than the C version. Hardly
encouragement to drop C and pursue Java. Remember I am running the
program to get the output.
2) If this is an application, and it has to produce output, and it has
to produce output for something larger than 6x6, then I think you need
a better algorithm.

You've mentioned the 5x5 C program took 2.473s, and the 6x6 took 9900s
- at that rate might we expect 7x7 in 15 months?
Actually the real problem is the amount of disk space I need to store
the output. 6 x 6 resulted in 32Gb of output. The 7 x 7 will probably
eat my hard disk well before it completes. I'm not sure that the 1Tb I
have will be enough so I am looking at removing the rotational and
mirrored versions of the grids from the output.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top