C in Science and Engineering...


Seebs

The scientist is interested in *doing the calculation*. The fact that
it takes him less time to write and debug the code -- because he
*doesn't* have to fuss with manual memory management, because he
*doesn't* have to think about the nuts and bolts of string parsing,
because he *doesn't* have to think about whether to roll his own
hash/dictionary/map library or to decide which publicly available one to
use -- is far more important than the fact that he could get results 8%
faster if he wrote the code in C instead of Perl, Python, or Ruby.
Because the first time his genetic sequence analysis code crashes 36
hours into a 48-hour run because of a dangling pointer, that 8% time
savings means very little.

A couple of points:

1. 8% is not a reasonable estimate. My experience is it's usually a factor
of two or three, although that can vary widely between scripted languages.
2. Back in the day, one of the posters here, whose name I've forgotten
(Tanmoy, last name started with a B?), pointed something out, in a debate
between C and Ada users. It was to the effect of (paraphrased):

I pay for computer time, and I am paid for programming time. If
I spend twelve months writing something that will complete execution
in five months, that is better than if I spend ten months writing
something that will complete execution in six months.

Consider the case of tasks being run on supercomputers. There was a press
release a while back about some people who were building a supercomputer
cluster based on the Cell microprocessor. The *power savings* of running
the more-efficient CPU instead of a conventional CPU were in the millions
of dollars. At that point, a 10% reduction in processing time to complete
a task could be enough to pay for a couple of years' development effort...

Also, I should point out: It is not at all impossible for scripted languages
to crash due to crazy bugs. I found a beautiful and very hard to reproduce
bug in the Ruby<->PostgreSQL bindings once, the net result of which was that
you could VERY occasionally get data corruption under circumstances where
you bound a large number of variables into a query and at least a few of
them were not strings to begin with, but objects of other sorts which had a
possible string representation. You may rest assured, this was harder to
debug in Ruby than it would have been in C...

-s
 

Charlton Wilbur

S> 1. 8% is not a reasonable estimate. My experience is it's
S> usually a factor of two or three, although that can vary widely
S> between scripted languages.

Perl has the overhead of having to be compiled each time it is run. For
short scripts on trivial data, you might be looking at 100% to 200%
overhead. The longer the processing time, the less the overhead of
compilation matters.

My experience is that once you factor out that cost of compilation,
idiomatic Perl rarely carries more than 10% overhead compared with idiomatic C.

S> Also, I should point out: It is not at all impossible for
S> scripted languages to crash due to crazy bugs. I found a
S> beautiful and very hard to reproduce bug in the Ruby<->PostgreSQL
S> bindings once, the net result of which was that you could VERY
S> occasionally get data corruption under circumstances where you
S> bound a large number of variables into a query and at least a few
S> of them were not strings to begin with, but objects of other
S> sorts which had a possible string representation. You may rest
S> assured, this was harder to debug in Ruby than it would have been
S> in C...

Of course. But how many bugs were avoided entirely by using Ruby rather
than C in the first place?

Charlton
 

Charlton Wilbur

M> If the file fits into memory, I would probably mmap() it. If
M> mmap() is not available, there is the possibility of malloc()ing
M> one big array, read the entire file into it, and then do the
M> parsing and processing.

Why should a scientist *care* about the details of implementation on
that level? You're just offering more potential failure cases that
result from programmer error, and generally scientists are *not*
programmers.

Charlton
 

jacob navia

Charlton Wilbur wrote:
Of course. But how many bugs were avoided entirely by using Ruby rather
than C in the first place?

None.

A programming language doesn't avoid bugs. Mistakes can be made
in any programming language.

malloc/free bugs?

Use a GC in C.

Zero-terminated string problems?

Use a counted string library.

Etc.
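
To make that concrete, a minimal sketch using the Boehm collector (one
possible GC for C; this assumes libgc is installed and the program is
linked with -lgc):

#include <stdio.h>
#include <gc.h>   /* Boehm-Demers-Weiser collector; <gc/gc.h> on some systems */

int main(void)
{
    GC_INIT();    /* initialise the collector once at startup */

    for (int i = 0; i < 1000000; i++) {
        /* GC_MALLOC instead of malloc(); no free() is ever written,
           unreachable blocks are reclaimed automatically */
        char *s = GC_MALLOC(64);
        if (s != NULL)
            snprintf(s, 64, "sequence %d", i);
    }

    printf("done, and not a single allocation to track by hand\n");
    return 0;
}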

C is as good as anything else, and remember:

Ruby is written in C.
 

jacob navia

Charlton Wilbur wrote:
M> If the file fits into memory, I would probably mmap() it. If
M> mmap() is not available, there is the possibility of malloc()ing
M> one big array, read the entire file into it, and then do the
M> parsing and processing.

Why should a scientist *care* about the details of implementation on
that level? You're just offering more potential failure cases that
result from programmer error, and generally scientists are *not*
programmers.


(1) If a scientist starts programming he/she is a programmer.
(2) The allocation errors and low level accounting can be avoided
completely using a garbage collector in C. You are not forced to use
malloc/free.

You are just spreading nonsense, as always. You do not like C. OK.
It is your "choice", but please take into account the facts.
 

Charlton Wilbur

JN> A programming language doesn't avoid bugs. Mistakes can be made
JN> in any programming language.

JN> malloc/free bugs?

JN> Use a GC in C.

JN> Zero-terminated string problems?

JN> Use a counted string library.

What is so magical about C that means that it's preferable to use a
custom string library, a custom hash/map/dictionary library, a custom
counted string library, a custom regular expression library, and
probably a half-dozen other custom libraries than to use a language
that's better suited to the problem?

Charlton
 

Charlton Wilbur

JN> (2) The allocation errors and low level accounting can be
JN> avoided completely using a garbage collector in C. You are not
JN> forced to use malloc/free.

The allocation errors and low level accounting can *also* be avoided
completely by using a language with dynamic memory management. You are
not forced to use C any more than you are forced to use malloc/free.

JN> You are just spreading nonsense, as always. You do not like
JN> C. OK. It is your "choice", but please take into account the
JN> facts.

Actually, no, I like C a great deal. I just don't think it's the
perfect language for all problems and all situations, because I *do*
take the facts into account. Your view of C being the perfect universal
language is, as far as I'm concerned, the nonsense that's being spread.

Charlton
 

Rui Maciel

Charlton said:
Suppose I have to read an arbitrarily long list of arbitrarily long
genetic sequences from a text file. In C, I have to either allocate
more space than I will possibly need for it (and possibly find out I'm
wrong -- or, worse, that the person who wrote the code was wrong, and
the source code has gone missing),

I don't see how this can be seen as the C way of doing a list. You will be hard-pressed to find a
language/library that provides dynamic arrays that doesn't do that, either "under the hood" or
explicitly.

or I have to handle dynamic
allocation and keep track of pointers.

Linked lists are terribly basic data structures that don't pose any challenge to anyone remotely
invested in writing software.
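
To illustrate, a bare-bones sketch of such a list of sequences in C (error
handling trimmed; strdup() assumes a POSIX-ish libc):

#include <stdlib.h>
#include <string.h>

struct node {
    char        *sequence;   /* owned copy of the sequence text */
    struct node *next;
};

/* push a copy of seq onto the front of the list; returns the new head,
   or the old head unchanged if allocation fails */
static struct node *push(struct node *head, const char *seq)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->sequence = strdup(seq);
    n->next = head;
    return n;
}

static void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head->sequence);
        free(head);
        head = next;
    }
}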

In Perl, Python, or Ruby, I just
make a list, and the language worries about dynamic memory allocation.

And what leads you to believe that you are forced to "worry about dynamic memory allocation" with
C? If you opt to use a library that, as those interpreted languages do, pushes those details
under the hood then you can also claim that in C you don't have to "worry about" that. It's just
a matter of choosing what component you wish to use. What forces you to write everything from
scratch if you happen to write something in C?

And there are several features in the dynamic languages just like this:
the task can be accomplished in C, and probably more efficiently, but
requires a lot more fiddly bits and programmer attention.

Based on this example, this isn't quite true. If the perceived problem regarding C is the
apparent lack of generic data structures then all you need to do is pick up one of the countless C
libraries providing generic data structures. Yet, I don't see what's so frightening about writing
a linked list.

Dictionaries,
hashes, or maps - whatever they're called in your local dialect.
Regular expressions and data parsing. String manipulation.

You mean, stuff like this?

http://library.gnome.org/devel/glib/2.24/

I mean, here's a task. I'm going to give you a file of arbitrary
length. In that file will be words, separated by whitespace. I want
you to give me a list of all the words in that file, together with a
count of how many times they're used. How many lines of C code is that?

It depends, really. If the language you need to parse is a well-established one, then chances are
someone has already written a parser for it. In that case, half a dozen lines of code would
suffice, including the code to open the file. On the other hand, if it's some sort of custom
language whose productions nobody really knows, then it would take a bit of work, whether you are
using C or any other language.
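
For your concrete example of counting whitespace-separated words, a rough sketch on top of GLib's
hash table would look something like this (untested, and roughly two dozen lines including the
file handling; built against glib-2.0 via pkg-config):

#include <stdio.h>
#include <glib.h>

static void print_count(gpointer key, gpointer value, gpointer unused)
{
    printf("%s\t%d\n", (const char *) key, GPOINTER_TO_INT(value));
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    /* keys are copied words (freed by the table), values are the counts */
    GHashTable *counts = g_hash_table_new_full(g_str_hash, g_str_equal,
                                               g_free, NULL);
    char word[256];
    while (fscanf(fp, "%255s", word) == 1) {
        int n = GPOINTER_TO_INT(g_hash_table_lookup(counts, word));
        g_hash_table_insert(counts, g_strdup(word), GINT_TO_POINTER(n + 1));
    }
    g_hash_table_foreach(counts, print_count, NULL);

    g_hash_table_destroy(counts);
    fclose(fp);
    return 0;
}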

In essence, it appears that your criticism is directed towards what libraries/routines are
available for a specific language, not towards the language itself.

I can accomplish it with one line of Perl. An inexperienced Perl
programmer can probably do it in less than two dozen.

Perl's conciseness isn't exactly seen as a feature. In fact, it's one of those things that is
constantly pointed out as a defect rather than a quality. People don't call Perl a "write once,
read never" language for nothing.

The scientist is interested in *doing the calculation*. The fact that
it takes him less time to write and debug the code -- because he
*doesn't* have to fuss with manual memory management, because he
*doesn't* have to think about the nuts and bolts of string parsing,
because he *doesn't* have to think about whether to roll his own
hash/dictionary/map library or to decide which publicly available one to
use -- is far more important than the fact that he could get results 8%
faster if he wrote the code in C instead of Perl, Python, or Ruby.

You would have a point if we were talking about an 8% performance drop. Yet, with Perl, at best
we are talking about a 2x performance penalty and, at worst, a 198x performance penalty. That
performance penalty isn't easily sold to anyone who needs to run demanding applications whose run
time is measured in hours instead of seconds.

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=gcc

Because the first time his genetic sequence analysis code crashes 36
hours into a 48-hour run because of a dangling pointer, that 8% time
savings means very little.

And what would your reaction be if you were able to go from those 48-hour runs to a run time
anywhere between 24 hours and less than 2 minutes, just by picking the right language for the job?


Rui Maciel
 

Rui Maciel

Charlton said:
Perl has the overhead of having to be compiled each time it is run. For
short scripts on trivial data, you might be looking at 100% to 200%
overhead. The longer the processing time, the less the overhead of
compilation matters.

My experience is that once you factor out that cost of compilation,
idiomatic Perl rarely carries more than 10% overhead compared with idiomatic C.

Can you provide any objective, tangible proof of that?

Of course. But how many bugs were avoided entirely by using Ruby rather
than C in the first place?

On the other hand, how many bank robberies were avoided entirely by using C rather than Ruby?


Rui Maciel
 

Rui Maciel

Charlton said:
M> If the file fits into memory, I would probably mmap() it. If
M> mmap() is not available, there is the possibility of malloc()ing
M> one big array, read the entire file into it, and then do the
M> parsing and processing.

Why should a scientist *care* about the details of implementation on
that level? You're just offering more potential failure cases that
result from programmer error, and generally scientists are *not*
programmers.

Then again, scientists don't tend to be dumb slobs. Quite the opposite, actually. And they do tend
to pay attention to details; in fact, that's essentially what a scientist does for a living.
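
Incidentally, the approach quoted above is hardly rocket science either. A bare-bones sketch,
assuming a POSIX system and a non-empty regular file:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return 1;
    }

    /* map the whole file read-only; data[0 .. st.st_size - 1] can now be
       parsed as one big in-memory array, with no read loop at all */
    const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* ... parse and process the data here ... */

    munmap((void *) data, st.st_size);
    close(fd);
    return 0;
}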


Rui Maciel
 

bart.c

jacob navia said:
Charlton Wilbur wrote:

None.

A programming language doesn't avoid bugs. Mistakes can be made
in any programming language.

malloc/free bugs?

Use a GC in C.

Which takes care of half of it.
Zero-terminated string problems?

Use a counted string library.

Etc.

C is as good as anything else, and remember:

Ruby is written in C.

One program written in C, and thousands in the easy language.

Versus thousands in C, which *is* more error-prone, so requires more skills
from programmers who might be more interested in getting their tasks
accomplished quickly and painlessly.
 

Rui Maciel

Charlton said:
What is so magical about C that means that it's preferable to use a
custom string library, a custom hash/map/dictionary library, a custom
counted string library, a custom regular expression library, and
probably a half-dozen other custom libraries than to use a language
that's better suited to the problem?

I don't know what's so special about C. Yet, following that point of view, Perl and Ruby must be
terribly magical, as their communities felt the pressing need to build specialized repositories
dedicated to this sort of stuff. Have you ever heard of CPAN and rubyforge?


Rui Maciel
 

chutsu

    JN> (2) The allocation errors and low level accounting can be
    JN> avoided completely using a garbage collector in C. You are not
    JN> forced to use malloc/free.

The allocation errors and low level accounting can *also* be avoided
completely by using a language with dynamic memory management.  You are
not forced to use C any more than you are forced to use malloc/free.

    JN> You are just spreading nonsense, as always. You do not like
    JN> C. OK.  It is your "choice", but please take into account the
    JN> facts.

Actually, no, I like C a great deal.  I just don't think it's the
perfect language for all problems and all situations, because I *do*
take the facts into account.  Your view of C being the perfect universal
language is, as far as I'm concerned, the nonsense that's being spread.

Charlton

I think I've started a flame war. Getting back to the point: as an
undergrad physicist I understand (or try to) that there are many tools
for the job, but when I look at the stuff available on the internet,
being the "New Generation" we tend to like newer things like scripting
languages such as Python and Ruby. My reason for posting the question
was to see whether C will cease to be used in the future, and to
determine the "General Language" scientists and engineers alike use.

Obviously one cannot predict the future, and I accept that. What I have
learnt from scanning through the replies is that Fortran is used.
However, I personally don't like Fortran all that much, because:
1.) It seems only old men aged 40~50+ use it
2.) It's not very widely used in anything other than science
3.) GNU seems to only have a compiler for the Fortran 95 dialect (am I
right?)
4.) Fortran keeps changing, trying to be something it's not by adding
object-oriented features...

Note, I think the last point is quite important, because, thinking in
the long term, when one creates a piece of code I would like to keep it
for a loooonnnnggg time. I want to be able to write reusable code, so
10-20 years down the line I'll still be able to use the functions or
libraries I've created. Scripting languages like Python and Ruby change
so much that they aren't backwards compatible. Look at Python 3000, or
Ruby 1.8 to 1.9: one has to change one's old code to work with the new,
but really? How stupid is that?

Sometimes I wish there were a decent scripting language, mature and
stable, that doesn't change every 5 or 10 years...
Chris Choi
 

Seebs

Perl has the overhead of having to be compiled each time it is run. For
short scripts on trivial data, you might be looking at 100% to 200%
overhead. The longer the processing time, the less the overhead of
compilation matters.
My experience is that once you factor out that cost of compilation,
idiomatic Perl rarely carries more than 10% overhead compared with idiomatic C.

Interesting. I guess it depends a lot on various factors. For instance,
multi-layered loops tend to be very expensive, while string operations are
usually not noticeably more expensive in perl than in C. For mathematical
operations, though, I would expect to see a pretty significant cost to
the bytecode interpreter.
Of course. But how many bugs were avoided entirely by using Ruby rather
than C in the first place?

I don't know.

-s
 

Seebs

The allocation errors and low level accounting can *also* be avoided
completely by using a language with dynamic memory management. You are
not forced to use C any more than you are forced to use malloc/free.

I have one program in my collection for which I feel genuinely comfortable
saying that I am forced to use C. One. Everything else I've done in C
has been done because I was comfortable with C, or because I had reason
to believe that the overhead of a scripted language would be unacceptable,
or because I felt C mapped nicely onto the problem space.

But it is *possible* to find yourself "forced" to use C -- in that it would
be unreasonable to use anything else. For the example in question, have
a look at the program "pseudo", hosted at:
http://github.com/wrpseudo/pseudo

It's not a portable program by any means, but I genuinely feel that there was
no reasonable alternative to writing it in C. :)

-s
 

Charlton Wilbur

RM> Linked lists are terribly basic data structures that don't pose
RM> any challenge to anyone remotely invested in writing software.

But most of the people involved in scientific computation are *not*
interested in writing software. They're interested in getting the
answer to a complex question that involves significant computation.

RM> And would your reaction be if you were able to go from those
RM> 48-hour runs to some other run time between 24h and less than 2
RM> minutes just by picking up the right language for the job?

If you can actually cut the run time *in half*, and you don't double the
development time by doing so, then go for it.

That does not match anything I've ever observed in practice, but hey,
it's your fantasy.

Charlton
 

jacob navia

Charlton Wilbur wrote:
JN> (2) The allocation errors and low level accounting can be
JN> avoided completely using a garbage collector in C. You are not
JN> forced to use malloc/free.

The allocation errors and low level accounting can *also* be avoided
completely by using a language with dynamic memory management. You are
not forced to use C any more than you are forced to use malloc/free.

JN> You are just spreading nonsense, as always. You do not like
JN> C. OK. It is your "choice", but please take into account the
JN> facts.

Actually, no, I like C a great deal. I just don't think it's the
perfect language for all problems and all situations, because I *do*
take the facts into account. Your view of C being the perfect universal
language is, as far as I'm concerned, the nonsense that's being spread.

You have a way of discussing that is typical of people with no arguments.

(1) I said that the problems YOU mentioned about C memory allocation
can be avoided with a GC.

(2) YOU say that I think that "C is the perfect universal language".
I never said that. I just pointed out that you CAN avoid low level
accounting problems in memory management in C without any pain by
using a GC. Going from that sentence to "C is the perfect universal
language" is just BAD FAITH and putting words in other people's
mouths.

Get a clue man.
 

jacob navia

Charlton Wilbur wrote:
JN> A programming language doesn't avoid bugs. Mistakes can be made
JN> in any programming language.

JN> malloc/free bugs?

JN> Use a GC in C.

JN> Zero-terminated string problems?

JN> Use a counted string library.

What is so magical about C that means that it's preferable to use a
custom string library, a custom hash/map/dictionary library, a custom
counted string library, a custom regular expression library, and
probably a half-dozen other custom libraries than to use a language
that's better suited to the problem?

Charlton

The advantage of C is its performance, its simplicity, its absence of
a preconceived and pre-imposed way of doing your computation.

C has been used in scientific computing for several decades, and there
is a wealth of libraries ready to use. I mentioned a few, but there are
many more important ones for scientific and technical applications.
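
And to make the counted-string suggestion from earlier concrete, the core
of the idea is nothing more than this (a bare-bones sketch; a real library
would add growth, concatenation, slicing and so on):

#include <stdlib.h>
#include <string.h>

/* a length-prefixed string: no scanning for a terminator, and embedded
   zero bytes are perfectly legal */
typedef struct {
    size_t len;
    char  *data;
} String;

static String string_from(const char *src, size_t len)
{
    String s = { 0, NULL };
    s.data = malloc(len);
    if (s.data != NULL) {
        memcpy(s.data, src, len);
        s.len = len;
    }
    return s;
}

static int string_equal(String a, String b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}

static void string_free(String *s)
{
    free(s->data);
    s->data = NULL;
    s->len  = 0;
}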
 

jacob navia

Charlton Wilbur wrote:
I mean, here's a task. I'm going to give you a file of arbitrary
length. In that file will be words, separated by whitespace. I want
you to give me a list of all the words in that file, together with a
count of how many times they're used. How many lines of C code is that?
I can accomplish it with one line of Perl. An inexperienced Perl
programmer can probably do it in less than two dozen.

Exactly. That is why Perl is GREAT for throwaway software.

You want a program that will do something, then be thrown away?

Use Perl. One incomprehensible line will give good results, and under
perl 5.1.2 on RedHat 7.5 it will run perfectly.

You got perl 4.xx or perl 6.xx?

Bad luck.

You want it to run under Windows?

Bad luck.

Perl is write-only software. Write it, use it once or twice,
then forget it...

Nobody will be able to debug it or understand it later.
 

jacob navia

Charlton Wilbur wrote:
RM> I haven't found a single person using python or ruby for serious
RM> science and engineering applications.

Me neither. Most of the people I know in scientific computing are using
Perl to do their bioinformatics data crunching.

Charlton

Can you please tell me a publicly available bioinformatics
library in Perl?


Thanks
 
