For performance, write it in C - Part 3, Source code now available

P

Peter Hickman

Isaac said:
The OP seems surprised by the order of magnitude differences between
Ruby and C performance - if he looked at regex-dna on the Computer
Language Shootout he might be surprised that the Ruby program is
relatively fast.
Come on. I would hardly be telling people to write something in C if I
thought that the improvement would only be few percentage points. It was
no surprise to me that C was faster than Ruby, neither was it a surprise
as to how much of an improvement it was. I have a habit of converting
things from Perl and Ruby to C for this very reason. Have you really
misread this whole set of threads for the past week?

The only surprise for me was Ocaml.
 
A

ara.t.howard

I guess the user I worked with would have demanded you be replaced by
someone willing to work with the requirements rather than produce a more
convenient wrong answer :)

you are right of course. at my present job this has been brought up in the
past by others. however, it's a powerful argument to assert (and then follow
through) that you can be 2-3 times more productive using the most efficient
problem solving approach available. in fact, i've gone through two rounds of
this but, instead of getting laid off, the other programers were - nothing is
as convincing to manangement and funding sources as finished, working, and
correct programs produced ahead of schedule. as it stands now our past group
of three is now just me and we are still making deadlines. so, sometimes the
wrong answer is later found to be right.

in any case i understand that there are sometimes requirements and that they
are almost always arbitrary, it's just that in the case of the test requiring
a specific random generator the requirement is bordering on arcane since the
benchmarks claim to test specific programs but this requirement makes fasta
actually test two and, from a logical standpoint, the concept is flawed since
requiring 'specific random numbers' is an oxymoron. i agree with the poster
that said the answer should have been tested to be within a tolerance to
neatly sidestep this ultimately unimportant (wrt to the overall shootout
which, unlike some others i find useful) issue.

regards.

-a
 
C

Caleb Clausen

If you're interested enough to speculate and question, please be
interested enough to click the URL that was provided and find out
something about it before you speculate and question ;-)

Here it is again
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all#about

All right. I'm looking at it. Based on what I'm seeing on your
website, I'll make the following statements. Correct me if I'm wrong
about any of these:

1) Your fasta benchmark is a random number (err.. DNA sequence)
generator. I feel this is a perfectly legitimate benchmark, by itself.
(Provided we undestand that this benchmark is testing the performance
of an arbitrary algorithm, not the performance that a real program
using random numbers in a given language would experience.)

2) 2 subsequent benchmarks, k-nucleotide and reverse-complement, use
the data generated by fasta. It appears that both are reading that
data from a canned file, and not calling the fasta random number
generator.

3) Therefore, Eric's statement:
I have to agree with Austin here. There are benchmarks inside that
specify "this benchmark must use the implementation of the random
number generator from benchmark X".

is incorrect. Ruby's (admittedly poor) performance on the fasta
benchmark should not be further artifically be dragging down its
(admittedly poor) performance on the subsequent 2 benchmarks.

4) Going back and looking again, I see in your reply to Eric that you
seem to be making this point. This was not appearent to me on first
reading. You could have been a little clearer.... on the other hand, I
could have done a little more digging before I posted. Still and all,
it looks like Ara came to the same misunderstanding.
 
I

Isaac Gouy

in any case i understand that there are sometimes requirements and that they
are almost always arbitrary, it's just that in the case of the test requiring
a specific random generator the requirement is bordering on arcane since the
benchmarks claim to test specific programs but this requirement makes fasta
actually test two

LOL!
I have no interest in quarreling about the definition of "program", but
following that reasoning perhaps you should have counted the non-random
sequence as a separate program, and the selection of cumulative
frequency as a separate program, and ...

and, from a logical standpoint, the concept is flawed since
requiring 'specific random numbers' is an oxymoron.

Oh, maybe that's why people talk about pseudorandom numbers.

i agree with the poster that said the answer should have been tested to be within a
tolerance to neatly sidestep this ultimately unimportant (wrt to the overall shootout
which, unlike some others i find useful) issue.

Have you actually looked at what the fasta programs are doing?!

Every year there seem to be people who miss out parts of the NYC
marathon course in an attempt to improve their race position - I find
it fascinating that people can both understand that a marathon race has
an arbitrary standard distance 26 miles 385 yards along an arbitrary
route, and still try to avoid doing that standard distance.

In contrast, it isn't surprising that programmers try to miss out parts
of a computation - it's what we do. The most unnatural thing about
writing a program for the Computer Language Shootout is having to write
a program constrained to an arbitrary distance and arbitrary route -
there's some latitude but not as much as we're accustomed to.
 
I

Isaac Gouy

Peter said:
Come on. I would hardly be telling people to write something in C if I
thought that the improvement would only be few percentage points. It was
no surprise to me that C was faster than Ruby, neither was it a surprise
as to how much of an improvement it was...

You're right - I was completely wrong to say you were surprised.

Did you think others would be surprised?
You seemed to think they were overlooking the "the degree of speed up"
- why would you think that?
 
E

Eric Hodel

3) Therefore, Eric's statement:


is incorrect. Ruby's (admittedly poor) performance on the fasta
benchmark should not be further artifically be dragging down its
(admittedly poor) performance on the subsequent 2 benchmarks.

My assertion is correct. The fasta benchmark says:
Each program should:
[...]
* generate DNA sequences, by weighted random selection from the
alphabets (using the pseudo-random number generator from the random
benchmark)

And the implemenation:

http://shootout.alioth.debian.org/gp4/benchmark.php?
test=fasta&lang=ruby&id=0

Contains the random number generator from the random benchmark:

http://shootout.alioth.debian.org/old/benchmark.php?
test=random&lang=ruby&id=0
 
I

Isaac Gouy

Caleb Clausen wrote:
-snip-
All right. I'm looking at it. Based on what I'm seeing on your
website, I'll make the following statements.

Thank you for taking the trouble.

-snip-
(Provided we undestand that this benchmark is testing the performance
of an arbitrary algorithm, not the performance that a real program
using random numbers in a given language would experience.)

Maybe you could explain a little more - if someone was using one of
those programs to generate Fasta files for program testing and
algorithm development in their lab, would that make it a real program
rather than an arbitrary algorithm? What do you say the difference is
between a real program and an arbitrary algorithm?
 
A

ara.t.howard

In contrast, it isn't surprising that programmers try to miss out parts of a
computation - it's what we do. The most unnatural thing about writing a
program for the Computer Language Shootout is having to write a program
constrained to an arbitrary distance and arbitrary route - there's some
latitude but not as much as we're accustomed to.

then it is a brick moving contest - the kind i'm happy to lose ;-)

-a
 
C

Caleb Clausen

Maybe you could explain a little more - if someone was using one of
those programs to generate Fasta files for program testing and
algorithm development in their lab, would that make it a real program
rather than an arbitrary algorithm? What do you say the difference is
between a real program and an arbitrary algorithm?

My understanding of the goal of the fasta program(s) (aside from being
a benchmark) is to generate a random dna sequence, possibly with some
biases about which symbols are more likely. If I were really using
such a program in my work, and speed in that program was important to
me, (and I was writing it in Ruby,) I would not use the version you
give on your website. I would make use of Ruby's already existing
random number generator. (Among other things, ruby uses (AFAIK) the
well-regarded Mersenne twister algorithm, whereas you have specified a
linear congruential generator, which is considered inadequate for
hard-core random number applications.)

For benchmarking purposes, what you have already is just fine,
provided you understand the limitations. I understand the need to have
all the languages implement the same algorithm so that you can have an
apple-to-apples comparison. I can also sympathise with your desire to
keep the testing simpler by requiring a specific output. It's just
that this isn't the way a serious programmer would actually use random
numbers in Ruby (and probably most other languages).

What you are really testing is not the speed at which ruby generates
random numbers, as they would be used in the real world, but the speed
at which it can execute the linear congruential algorithm, which
ultimately depends on the speed at which it can execute integer math
operations.
 
C

Caleb Clausen

My assertion is correct. The fasta benchmark says:
Each program should:
[...]
* generate DNA sequences, by weighted random selection from the
alphabets (using the pseudo-random number generator from the random
benchmark)

And the implemenation:

http://shootout.alioth.debian.org/gp4/benchmark.php?
test=fasta&lang=ruby&id=0

Contains the random number generator from the random benchmark:

http://shootout.alioth.debian.org/old/benchmark.php?
test=random&lang=ruby&id=0

I did not see this test named 'random' when I looked through the
alioth site earlier today... and I still don't see any links to it
from that site. (I did find it by following your link, however.)

I notice that your second link contains the word 'old'. Perhaps this
benchmark has been withdrawn? If so, I submit that your criticism was
true in the past, but no longer is.
 
I

Isaac Gouy

Caleb said:
My understanding of the goal of the fasta program(s) (aside from being
a benchmark) is to generate a random dna sequence, possibly with some
biases about which symbols are more likely. If I were really using
such a program in my work, and speed in that program was important to
me, (and I was writing it in Ruby,) I would not use the version you
give on your website. I would make use of Ruby's already existing
random number generator. (Among other things, ruby uses (AFAIK) the
well-regarded Mersenne twister algorithm, whereas you have specified a
linear congruential generator, which is considered inadequate for
hard-core random number applications.)

I don't think you gave a direct answer to - What do you say the
difference is between a real program and an arbitrary algorithm? - so
please correct what I get wrong.

You're saying if this was a "real program" it would use whichever
library function the language provides for generating random numbers -
if that's all there is to the distinction, when there are no
appropriate library functions are "real programs" and "arbitrary
algorithms" the same thing?

For benchmarking purposes, what you have already is just fine,
provided you understand the limitations. I understand the need to have
all the languages implement the same algorithm so that you can have an
apple-to-apples comparison. I can also sympathise with your desire to
keep the testing simpler by requiring a specific output. It's just
that this isn't the way a serious programmer would actually use random
numbers in Ruby (and probably most other languages).

What you are really testing is not the speed at which ruby generates
random numbers, as they would be used in the real world, but the speed
at which it can execute the linear congruential algorithm, which
ultimately depends on the speed at which it can execute integer math
operations.

We're not testing the speed at which Ruby generates random numbers,
period. (Are those 6 word summary descriptions that confusing?)

We're testing the speed at which it can execute a linear congruential
algorithm, do binary-chop on a tiny array, slice and build strings, and
print strings in FASTA format. (Replace the linear congruential
algorithm by calls to Ruby rand and the time will be reduced to ~400s)
 
P

Peter Hickman

Isaac said:
Did you think others would be surprised?
You seemed to think they were overlooking the "the degree of speed up"
- why would you think that?

Because when people state that performance is their priority and are
told that C will give them the best performance they start to change
their story. I realise that C is harder to program in than any scripting
language but the stated requirement was performance so anything less is,
well, less. Perhaps the problem is that people don't realise just how
much of a boost C can give you and therefore see the effort involved
being out of proportion to the expected results, this comes down to not
knowing what performance boost you can get.
 
I

Isaac Gouy

Peter said:
Because when people state that performance is their priority and are
told that C will give them the best performance they start to change
their story.

I don't know what you mean by they start to change their story.
I realise that C is harder to program in than any scripting
language but the stated requirement was performance so anything less is,
well, less. Perhaps the problem is that people don't realise just how
much of a boost C can give you and therefore see the effort involved
being out of proportion to the expected results, this comes down to not
knowing what performance boost you can get.

Make a different assumption - assume that people do realize just how
much faster C is at moving bytes, can you find another explanation?

Look at your own webpage, it talks about comparing apples to apples and
yet Kristof Bastiaensen's Ruby program uses a completely different
algorithm. Would you have thought of Kristof's algorithm if you were
coding in C?

We have to ask 2 questions
1) How do I find a fast solution?
2) How do I make that solution fast?

In the first "For performance, write it in C" post you talked about
writing in Ruby or Perl or Python and then converting the program to C.
How much effort to you spend re-writing the Ruby or Perl or Python
exploring different algorithms and re-thinking the solution? How much
effort to find a fast solution before trying to make that solution
fast?
 
J

Jacob Fugal

My assertion is correct. The fasta benchmark says:
Each program should:
[...]
* generate DNA sequences, by weighted random selection from the
alphabets (using the pseudo-random number generator from the random
benchmark)

And the implemenation:

http://shootout.alioth.debian.org/gp4/benchmark.php?
test=fasta&lang=ruby&id=0

Contains the random number generator from the random benchmark:

http://shootout.alioth.debian.org/old/benchmark.php?
test=random&lang=ruby&id=0

I did not see this test named 'random' when I looked through the
alioth site earlier today... and I still don't see any links to it
from that site. (I did find it by following your link, however.)

I notice that your second link contains the word 'old'. Perhaps this
benchmark has been withdrawn? If so, I submit that your criticism was
true in the past, but no longer is.

I agree that the presence of "/old/" (vs. "/gc4/") in the URL probably
indicates that the "random" benchmark has been removed from most
locations. There was an actual link to it, however, from the FASTA
benchmark description page. Specifically, on this page:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all

At the bottom, under "about the fasta benchmark", in the second bullet
point. That's the *current* page for the FASTA benchmark, so maybe
that page just needs to be updated to remove that constraint?

Jacob Fugal
 
I

Isaac Gouy

Jacob said:
My assertion is correct. The fasta benchmark says:

Each program should:
[...]
* generate DNA sequences, by weighted random selection from the
alphabets (using the pseudo-random number generator from the random
benchmark)

And the implemenation:

http://shootout.alioth.debian.org/gp4/benchmark.php?
test=fasta&lang=ruby&id=0

Contains the random number generator from the random benchmark:

http://shootout.alioth.debian.org/old/benchmark.php?
test=random&lang=ruby&id=0

I did not see this test named 'random' when I looked through the
alioth site earlier today... and I still don't see any links to it
from that site. (I did find it by following your link, however.)

I notice that your second link contains the word 'old'. Perhaps this
benchmark has been withdrawn? If so, I submit that your criticism was
true in the past, but no longer is.

I agree that the presence of "/old/" (vs. "/gc4/") in the URL probably
indicates that the "random" benchmark has been removed from most
locations. There was an actual link to it, however, from the FASTA
benchmark description page. Specifically, on this page:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all

At the bottom, under "about the fasta benchmark", in the second bullet
point. That's the *current* page for the FASTA benchmark, so maybe
that page just needs to be updated to remove that constraint?

Jacob Fugal

There was and is an actual link to it from the FASTA benchmark
description
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all#about

Jacob, why would we want to remove that constraint?
 
J

Jacob Fugal

Jacob said:
My assertion is correct. The fasta benchmark says:

Each program should:
[...]
* generate DNA sequences, by weighted random selection from the
alphabets (using the pseudo-random number generator from the random
benchmark)

And the implemenation:

http://shootout.alioth.debian.org/gp4/benchmark.php?
test=fasta&lang=ruby&id=0

Contains the random number generator from the random benchmark:

http://shootout.alioth.debian.org/old/benchmark.php?
test=random&lang=ruby&id=0

I did not see this test named 'random' when I looked through the
alioth site earlier today... and I still don't see any links to it
from that site. (I did find it by following your link, however.)

I notice that your second link contains the word 'old'. Perhaps this
benchmark has been withdrawn? If so, I submit that your criticism was
true in the past, but no longer is.

I agree that the presence of "/old/" (vs. "/gc4/") in the URL probably
indicates that the "random" benchmark has been removed from most
locations. There was an actual link to it, however, from the FASTA
benchmark description page. Specifically, on this page:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all

At the bottom, under "about the fasta benchmark", in the second bullet
point. That's the *current* page for the FASTA benchmark, so maybe
that page just needs to be updated to remove that constraint?

Jacob Fugal

There was and is an actual link to it from the FASTA benchmark
description
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=all#about

Jacob, why would we want to remove that constraint?

Either remove or update to point to a current (not "/old/") benchmark.
I don't care either way...

Jacob Fugal
 
W

William Grosso

This is just a note to clarify things in my mind and then ask a
few questions about the language.

Here's what I think I know:

-- The current version of Ruby is 1.8.5.
-- There is a 1.9 out there, but it's a work in progress and not released
yet.
-- There is a 2.0 (YARV) out there, but it's also a work in progress.
-- 1.8.5, 1.9, and 2.0 all support slightly different languages.
-- 1.9 is the place where most language experimentation is going on
-- 2.0 will support the same language as 1.8, but with a different underlying
implementation.
-- There is no formal language specification (and most of us are working off
the pickaxe book)

Here's what I don't know:

-- Is any of the above wrong?
-- Are there estimated release dates for 1.9 and 2.0?
-- Are there places where changes to the language between 1.8.5, 1.9, and 2.0
are listed (note: http://wiki.rubygarden.org/Ruby/page/show/Rite doesn't
really count) ?


Best,


Bill
 
G

Guest

William Grosso ([email protected])
21/8/2006 18:26:41
-- 2.0 will support the same language as 1.8, but with a different
underlying implementation.

Wrong, AFAIK.
The 1.9 (experimental) would once became 2.0 (stable).
The language WOULD be differ from 1.8, hence the major version.

V.
 
W

William Grosso

Oops. Sorry. Meant to type

-- 2.0 will support the same language as 1.9, but with a different
underlying implementation.


Bill
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,053
Latest member
BrodieSola

Latest Threads

Top