Ben Finney said:
This seems to make the dangerous assumption that the programmer has
the correct program in mind, and needs only to transfer it correctly
to the computer.
Well, I hope the programmer can at least state some clear
specficiations that a correct program should implement, e.g. it should
not freeze or crash.
I would warrant that anyone who understands exactly how a program
should work before writing it, and makes no design mistakes before
coming up with a program that works, is not designing a program of any
interest.
I'm not sure what the relevance of this is. Programmers generally
have an idea of what it is that they're trying to write. They then
have to make a bunch of design and implementation decisions as they go
along. Soemtimes those decisions are subtly incorrect, and the more
errors the computer can identify automatically, the fewer errors are
likely to remain in the program.
Certainly. Which is why the trend continues toward developing programs
such that mistakes of all kinds cause early, obvious failure -- where
they have a much better chance of being caught and fixed *before*
innocent hands get ahold of them.
Here is what Edward Lee wrote about developing a multi-threaded Java
app (
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf):
A part of the Ptolemy Project experiment was to see whether
effective software engineering practices could be developed for an
academic research setting. We developed a process that included a
code maturity rating system (with four levels, red, yellow, green,
and blue), design reviews, code reviews, nightly builds,
regression tests, and automated code coverage metrics [43]. The
portion of the kernel that ensured a consistent view of the
program structure was written in early 2000, design reviewed to
yellow, and code reviewed to green. The reviewers included
concurrency experts, not just inexperienced graduate students
(Christopher Hylands (now Brooks), Bart Kienhuis, John Reekie, and
myself were all reviewers). We wrote regression tests that
achieved 100 percent code coverage. The nightly build and
regression tests ran on a two processor SMP machine, which
exhibited different thread behavior than the development machines,
which all had a single processor. The Ptolemy II system itself
began to be widely used, and every use of the system exercised
this code. No problems were observed until the code deadlocked on
April 26, 2004, four years later.
It is certainly true that our relatively rigorous software
engineering practice identified and fixed many concurrency
bugs. But the fact that a problem as serious as a deadlock that
locked up the system could go undetected for four years despite
this practice is alarming. How many more such problems remain? How
long do we need test before we can be sure to have discovered all
such problems? Regrettably, I have to conclude that testing may
never reveal all the problems in nontrivial Multithreaded code.
Now consider that the code you test today on a 2-core processor may be
running on a 100-core processor in 4 years, making any concurrency
errors that much worse. You can't test on the 100-core cpu today
because it doesn't exist yet, but your code still has to run on it
correctly once it's out there. Heck, if you're writing in CPython (no
parallelism at all, due to the GIL) you might be in for a surprise as
soon as real parallelism arrives on even 2 cores. It gets worse,
maybe you're writing operating system or language runtime code
(e.g. parallel garbage collector for Java). That means there will be
hostile code on the same machine, written by diabolical maniacs who
know every detail of your implementation, and who deliberately do
stuff (e.g. write code to hit specific cache lines or cause page
faults at fiendishly precisely chosen moments) specifically to cause
weird timing conditions and trigger bugs. Testing is not enough to
ensure that such attacks won't succeed.
I remember an old programming book (might even have been Knuth vol. 3)
recommending using the Shell sorting algorithm (a very simple exchange
sort that runs in about O(n**1.4) time) instead of an O(n log n)
algorithm because Shell's algorithm (being simpler) was less likely to
end up with implementation errors. Today I think an experienced
programmer needing to code a sort routine wouldn't hesitate to code a
good algorithm, because our languages and coding/testing methods are
better than they were in the Fortran and assembly language era when
that book was written. But there are still more complex problems
(like shared memory multiprocessing) that we're advised to stay away
from because they're so fraught with hazards. We instead use
approaches that are simpler but take perforamnce penalties. (As a
non-concurrency example, we might implement geometric search through
some kind of list scanning algorithm instead of using more efficient
but complicated k-d trees). With higher reliability languages we'll
hopefully be in better shape to tackle those complex problems instead
of retreating from them, without expecting to spend eternity hunting
down mysterious bugs.