in praise of type checking

Tom Anderson · Oct 11, 2011

I like what Hoare said about it: "The story of the Mariner space
rocket to Venus, lost because of the lack of compulsory declarations in
FORTRAN, was not to be published until later."

I love Tony Hoare.

But anyway, it occurred to me that there are a remarkable number of good
stories about space probes, and space rockets, which are specifically
relevant to getting things right (or rather, wrong) when programming. The
ones i know:

- Mariner 1 and "The most expensive hyphen in history"
- Mars Climate Orbiter and metric vs imperial units
- Ariane 5 and exception handling, data typing, scope creep, and unit testing
- Eagle and the 1201 alarms [1]
- I vaguely remember a story about a computer controlling a landing
(during a test) which followed a descent trajectory defined by
interpolating a polynomial; the points defining the polynomial made a nice
smooth line, but the thing about high-order polynomials is that the
interpolation between them can be pretty wild - in this case, the minimum
altitude reached by the curve was below ground level
- Er ...
- That's it.
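
The polynomial story above is easy to reproduce on a small scale. A minimal sketch, assuming invented waypoints (the times and altitudes are illustrative, not data from any real flight) and plain Lagrange interpolation, which may well not be what the system in the story used:

```java
// Hypothetical illustration: the waypoints are invented, not real flight data.
final class DescentInterpolation {
    // Times (s) and altitudes (m) of the waypoints; every altitude is >= 0.
    static final double[] TIMES = {0, 1, 2, 3, 4, 5, 6};
    static final double[] ALTITUDES = {100, 50, 20, 5, 1, 0, 0};

    // Evaluate the single degree-6 polynomial through all waypoints (Lagrange form).
    static double altitudeAt(double t) {
        double sum = 0.0;
        for (int i = 0; i < TIMES.length; i++) {
            double basis = 1.0;
            for (int j = 0; j < TIMES.length; j++) {
                if (j != i) {
                    basis *= (t - TIMES[j]) / (TIMES[i] - TIMES[j]);
                }
            }
            sum += ALTITUDES[i] * basis;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Sample the curve between the last few waypoints.
        for (double t = 4.0; t <= 6.0; t += 0.25) {
            System.out.printf("t=%.2f altitude=%.3f%n", t, altitudeAt(t));
        }
    }
}
```

No waypoint is below zero, but halfway between the last two (t = 5.5) the curve comes out at roughly 0.8 m below ground level.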

Any others?

tom

[1] On which subject:

http://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html
http://www.hq.nasa.gov/alsj/a11/a11.1201-fm.html
http://www.doneyles.com/LM/Tales.html

Gene Wirchenko · Oct 11, 2011

On Tue, 11 Oct 2011 07:48:07 +0200, Robert Klemme wrote:

[snip]

Which conclusion? Here are my two points with a bit more explanation:

1. It is interesting that the discussion dynamic vs. static typing comes
up again and again. This indicates that people are not able or do not
want to settle it.

Or that there is no one optimum.

2. Static typing is not automatically superior to dynamic typing in
every case. (Neither is the opposite true but because of the thread's
subject I am of course arguing in favor of dynamic typing to balance
other positions).

Which explains why it comes up again and again.

[snip]

Sincerely,

Gene Wirchenko

Arved Sandstrom · Oct 11, 2011

Which conclusion? Here are my two points with a bit more explanation:

1. It is interesting that the discussion dynamic vs. static typing comes
up again and again. This indicates that people are not able or do not
want to settle it.

2. Static typing is not automatically superior to dynamic typing in
every case. (Neither is the opposite true but because of the thread's
subject I am of course arguing in favor of dynamic typing to balance
other positions).

To add a bit more explanation: I believe the reason for 1 is the former:
people /cannot/ settle this because "dynamic vs. static typing" leaves
out too many aspects which are important for success and cost of
software projects: these are at least nature of the software (and size),
people (skills and number) and process used. Yet people often search
for simple and catchy rules which explains the popularity of the topic.

As Gene said (and I happen to agree with), there is no one optimum. One
guy in one situation will see that dynamic typing suits his purposes
better than static typing (or at least he thinks it does), and another
guy in another situation thinks that static typing is better than
dynamic typing (and he in turn may have good reasons, adequate reasons,
or lousy reasons for believing so).

Let's also keep in mind that since there is no hard and fast answer to
this one, it's an appealing subject for debate, and let's also keep in
mind that since the bar to discussing this is apparently low, every new
generation of novices thinks they can dive right in too. After all,
you'll have guys with 6 months of Ruby and no other language under their
belts (this is *not* a subtle dig at you) and they think _they_ can
claim that dynamic typing is better than static typing.

Certainly. But the cost of bugs found during testing depends on the
process used: that the compiler does not catch a bug does not
automatically mean that it's late into the project that it is caught.
And "lateness" is the most driving factor for cost because the more time
passes the more code can be written which depends on the faulty code.

Type information for e.g. method arguments certainly helps make code
readable but it is easy to write spaghetti code in statically and
dynamically typed languages. Maintainability also depends on quality of
documentation and the overall design.

Static typing (a la Java) and dynamic typing (a la Python) are a small
part of the puzzle here. A strongly-typed dynamically-typed language is
arguably much better than a weakly-typed statically-typed language.

Furthermore, if we look at the entire spectrum, where does a Java
variable lie? Without any further constraints, it looks pretty weak
compared to a functional language variable, which really *is* a variable
- single-assignment...and depending on the language you'd probably be
using immutable values as well. If you're following good practice by
coding to an interface, that Java variable's static type may be as wide
ranging as a List or Map (and generifying may not pin it down much more
either), and if it's not a final, and if the value is mutable (which is
probable), you're not gaining nearly as much over a dynamically typed
variable as you think.
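
To make the point about the Java variable concrete, here is a small sketch (class and method names are made up for illustration). The static type `List<String>` says nothing about whether the list can be changed, or who else holds a reference to it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example; names are invented for illustration.
final class WideStaticType {
    // Returns true if the list accepted the new element, false if it refused.
    static boolean tryAdd(List<String> list, String item) {
        try {
            list.add(item);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();   // coded to the interface
        List<String> alias = names;               // second reference, same object
        alias.add("changed through the alias");
        System.out.println(names.size());         // 'names' sees the change too

        // Same static type as 'names', but add() is refused at run time,
        // not at compile time.
        List<String> readOnly = Collections.unmodifiableList(names);
        System.out.println(tryAdd(names, "x"));
        System.out.println(tryAdd(readOnly, "y"));
    }
}
```

The compiler accepts both `tryAdd` calls; the difference between them only shows up when the program runs.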

In other words, I agree wholeheartedly that design and other practices
outside of simple reliance on the type system are major factors. I'd go
so far as to say that, regardless of your competency, the type system in
use will not matter that much, at least not until you use a really strict
type system like Haskell's.

Two more points to consider

1. Static typing won't detect design and architecture flaws which are
generally considered to be the most expensive to remedy.
Agreed.

2. Static typing is only of limited help in detecting concurrency issues
which are often hard to track down and thus expensive.
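
Point 2 can be seen in a few lines. In this sketch (names are made up), both counters are perfectly well typed, yet only one of them is safe to increment from several threads, and the compiler has no opinion on which:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example; names are invented for illustration.
final class CounterRace {
    static int plain = 0;                                     // type-checks, not thread-safe
    static final AtomicInteger atomic = new AtomicInteger();  // safe by library contract

    // Runs 'threads' threads, each incrementing both counters 'perThread' times.
    static void run(int threads, int perThread) {
        plain = 0;
        atomic.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    plain++;                    // unsynchronised read-modify-write
                    atomic.incrementAndGet();   // atomic update
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run(4, 100_000);
        System.out.println("plain  = " + plain + " (may be less than 400000)");
        System.out.println("atomic = " + atomic.get());
    }
}
```

The plain counter may or may not lose updates on any given run, which is exactly what makes this kind of bug expensive to track down.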

Kind regards

robert

AHS

Lew · Oct 12, 2011

...

Your points are cogent and your conclusions make sense.

Each tool in the toolshed has a purpose. I am not against dynamically-typed languages at all. One has to use the qualities of the tool to enhance one's purpose.

No question that both sides offer value to the systems development process.

Gene Wirchenko · Oct 12, 2011

Arved Sandstrom wrote:
...
...

It does not even have to be different people. Some tasks seem easier to
me in a dynamically typed language, others with static typing.

Quite. I prefer static, because I like to catch the trouble up
front. This is at a cost of flexibility, and sometimes, I want/need
that flexibility.

Sincerely,

Gene Wirchenko

Travers Naran · Oct 12, 2011

The Ariane 5 incident doesn't tell us anything about exception
handling, data typing, scope creep or unit testing. Neither of those
were the culprit. It _does_ tell us a few things about requirements /
specification (mis-)management.

I thought it told us the importance of not dumping your exceptions to
stdout when stdout is fed into the rocket gimbal controller?

Martin Gregorie · Oct 12, 2011

The Ariane 5 incident doesn't tell us anything about exception handling,
data typing, scope creep or unit testing. Neither of those were the
culprit. It _does_ tell us a few things about requirements /
specification (mis-)management.

True enough, but it speaks volumes about skimping on integration testing
and not bothering to test edge conditions.

Gene Wirchenko · Oct 12, 2011

The Ariane 5 incident doesn't tell us anything about exception
handling, data typing, scope creep or unit testing. Neither of those
were the culprit. It _does_ tell us a few things about requirements /
specification (mis-)management.

AIUI, it had to do with a type conversion that lost precision.
Either the exception was not thrown or was not handled. (My sources
did not go into that detail.)
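
In Java terms, a conversion like this is a range problem more than a precision one: a narrowing cast never throws, it just wraps. A minimal sketch, where the helper `toShortChecked` and the sample value are hypothetical and only there for illustration:

```java
// Hypothetical example; the helper and the sample value are invented.
final class NarrowingCast {
    // Convert only if the value fits in a short; otherwise fail loudly.
    static short toShortChecked(double value) {
        if (Double.isNaN(value) || value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("value out of short range: " + value);
        }
        return (short) value;
    }

    public static void main(String[] args) {
        double reading = 40000.0;          // larger than Short.MAX_VALUE (32767)
        short wrapped = (short) reading;   // compiles, throws nothing, wrong value
        System.out.println(wrapped);       // prints -25536
        try {
            toShortChecked(reading);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether a thrown exception is any better than a wrapped value depends entirely on what handles it, which is where the rest of this thread picks up.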

Sincerely,

Gene Wirchenko

Gene Wirchenko · Oct 12, 2011

On Wed, 12 Oct 2011 16:55:39 -0500, Leif Roar Moldskred wrote:

[snip]

It was the wrong code, but there was nothing wrong _with_ the code:
like a traffic cop trying to measure the speed of a passing car with a
hair dryer -- there's nothing wrong _with_ the hair dryer, but the
hair dryer is clearly the wrong choice.

Sure there was. It had an assumption about how much oomph the
rocket had. The Ariane 5 had way more than the 4 did. With the 4,
the overflow was not possible. With the 5, it was.

Sincerely,

Gene Wirchenko

Martin Gregorie · Oct 12, 2011

Yes ... but not really. The error manifested as an arithmetic overflow
exception in the hardware, but that is an immaterial detail. The
_actual_ problem didn't have anything to do with exception handling or
even with programming errors.

What had happened was that they took a piece of code that had been
written for the Ariane 4 rocket -- and which was correct and without
defects -- and put it onto the much larger and more powerful Ariane 5
rocket without considering if the code was fit for its new purpose and
without testing the way the code interacted with the rest of the system.

It was the wrong code, but there was nothing wrong _with_ the code: like
a traffic cop trying to measure the speed of a passing car with a hair
dryer -- there's nothing wrong _with_ the hair dryer, but the hair dryer
is clearly the wrong choice.

The full report, which makes interesting reading, is here:
http://esamultimedia.esa.int/docs/esa-x-1819eng.pdf

Arved Sandstrom · Oct 13, 2011

Yes, but as the code wasn't written to be used on the Ariane 5, that
was a valid assumption and not an error. The code was correct and fit
for its intended use. That someone later took this code and tried to
use it for something it was never meant to be used for is not an error
in the code: A wrench makes a poor hammer, but that doesn't mean the
wrench is constructed badly.

When I read the report that Martin pointed us at, particularly pages 4-6
(pages 8-10 of the PDF), I sure don't get the impression that the code
was "correct and fit for its intended use". This includes in a wider
sense the documentation that describes the assumptions and decisions
related to the code.

How do you know that the code was not meant to be used for the Ariane 5?
They _did_ use it for the Ariane 5; that's enough evidence for me that
they intended for it to be used not just for the Ariane 4 but also for
the Ariane 5. The report clearly states that the reasoning related to
the horizontal bias variable BH was *faulty* - they just happened to be
fortunate with Ariane 4. It was not, as you suggest, a "valid
assumption". And the follow-up exception-handling was described as a
systematic software design error.

In a wider sense, if you've got a codebase that was intended for
situation A (or more precisely, since "intended" is a strong purposeful
word that implies knowing what you're about, "used with"), and now you
adopt it for situation B, that codebase _belongs_ to situation B. You
can't say it's fit and correct just because it still works in situation
A - who cares, actually? If it's unfit for situation B it's unfit for
situation B. Period.

AHS

John B. Matthews · Oct 14, 2011

Arved Sandstrom said:
When I read the report that Martin pointed us at, particularly pages
4-6 (pages 8-10 of the PDF), I sure don't get the impression that the
code was "correct and fit for its intended use". This includes in a
wider sense the documentation that describes the assumptions and
decisions related to the code.

How do you know that the code was not meant to be used for the Ariane
5? They _did_ use it for the Ariane 5; that's enough evidence for me
that they intended for it to be used not just for the Ariane 4 but
also for the Ariane 5. The report clearly states that the reasoning
related to the horizontal bias variable BH was *faulty* - they just
happened to be fortunate with Ariane 4. It was not, as you suggest, a
"valid assumption". And the follow-up exception-handling was
described as a systematic software design error.

I inferred that BH was left unprotected to reduce delay in the "event of
a hold in the count-down," a feature used in Ariane 4. "The same
requirement does not apply to Ariane 5." The "systematic software design
error" was a culture of "only addressing random hardware failures." The
management error was in not thoroughly testing the reused software.

In a wider sense, if you've got a codebase that was intended for
situation A (or more precisely, since "intended" is a strong
purposeful word that implies knowing what you're about, "used with"),
and now you adopt it for situation B, that codebase _belongs_ to
situation B. You can't say it's fit and correct just because it still
works in situation A - who cares, actually? If it's unfit for
situation B it's unfit for situation B. Period.

A Java analogy might be adopting a 10 year old external dependency
without running _all_ unit tests.

"No reference to justification of [the BH] decision was found directly
in the source code." I can't help but think that generated documentation
(e.g. javadoc, adahtml) is one way to mitigate this kind of risk.

Tom Anderson · Oct 14, 2011

The Ariane 5 incident doesn't tell us anything about exception handling,
data typing, scope creep or unit testing. Neither of those were the
culprit. It _does_ tell us a few things about requirements /
specification (mis-)management.

It tells us about all those things, which is why i mentioned them. And
more - i should also have mentioned process management.

The fundamental failure was about requirements, absolutely. That's what i
referred to as scope creep - the scope of the inertial navigation system
was originally defined as being Ariane 4, but crept to include Ariane 5,
without this being properly addressed.

But that was not the only failure. There were several points at which
something could have been done differently which would have saved the
rocket. Off the top of my head:

1. The module that failed was a pre-launch calibration daemon in the
inertial navigation system; it had no use at all after launch. If it had
been shut down at launch, the failure would not have occurred.

2. IIRC, the pre-launch procedure had changed such that the daemon was not
needed anyway. If it had been removed, the failure would not have
occurred.

3. The failure involved a cast from (in Java terms) a double (used to
capture an instrument reading) to a short (used for calculations) which
overflowed. If doubles had been used for calculation, the failure would
not have occurred.

4. The cast was not protected by a suitable exception handler. If it had
been (although i'm not sure what the handler would actually do), the
failure would not have occurred.

5. The inertial navigation system's top-level exception handling handled a
crash by writing diagnostic information to the same data bus used for
output, without any metadata indicating that it was diagnostics rather
than data; the guidance computer interpreted it as data, and went wild. If
the diagnostic information had been written elsewhere, or had been marked
and subsequently recognised by the guidance computer as being such rather
than data, the failure would not have occurred.

6. The combination of a real inertial navigation system and a real
guidance computer was never tested with real sensor inputs. The guidance
computer was tested with a mock inertial navigation system, which did not
accurately reproduce the real system's faulty behaviour. It was a unit
test rather than an integration test. If the test had been an integration
test, the fault would have been detected long before launch, and the
failure would not have occurred.
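
Point 5 is the one where a type system could most obviously have helped. A minimal sketch, assuming an invented frame format (the kinds, fields and class names below are hypothetical and have nothing to do with the real bus protocol), in which the consumer can tell diagnostics from data:

```java
// Hypothetical sketch: frame kinds and field names are invented for illustration.
final class BusFrames {
    enum Kind { NAVIGATION_DATA, DIAGNOSTIC }

    static final class Frame {
        final Kind kind;
        final double[] payload;

        Frame(Kind kind, double... payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    // The consumer acts only on frames explicitly marked as navigation data.
    static final class Guidance {
        double lastAttitude = 0.0;
        int ignoredFrames = 0;

        void receive(Frame frame) {
            if (frame.kind == Kind.NAVIGATION_DATA) {
                lastAttitude = frame.payload[0];
            } else {
                ignoredFrames++;   // a real system would log it or fall back to a backup
            }
        }
    }

    public static void main(String[] args) {
        Guidance guidance = new Guidance();
        guidance.receive(new Frame(Kind.NAVIGATION_DATA, 1.5));
        guidance.receive(new Frame(Kind.DIAGNOSTIC, 9999.0));   // crash dump, not flight data
        System.out.println(guidance.lastAttitude + " " + guidance.ignoredFrames);
    }
}
```

With the tag in place, the diagnostic frame leaves the last good attitude value untouched instead of being steered on.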

Yes, you can identify a root cause, in the form of a mistake in the
requirements process. But you can also identify a series of other mistakes
which enabled that mistake to cause the failure. To pay attention only to
the root cause and discard the other mistakes is foolish.

tom

Arne Vajhøj · Nov 6, 2011

So I presume you didn't have anything like this in your code:

{
    // Call site: which foo() runs depends on the static return type below.
    foo(yourMethodToChangeToInt());
}
void foo(int i) { ... }
void foo(boolean b) { ... }
boolean yourMethodToChangeToInt() { ... }  // the method whose return type changes

After the change, the other foo will be called.
The point is that the compiler won't necessarily present
you with *all* call sites of your method, not even all those
where the result is actually used.

Which is another good reason not to change the return
type but add a new method.
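
A runnable version of the same trap, as a sketch: the "before" and "after" versions of the method are kept side by side under made-up names so that both outcomes can be seen in one run.

```java
// Hypothetical names; shows how overload resolution follows the static return type.
final class OverloadSwitch {
    static String foo(int i)     { return "foo(int)"; }
    static String foo(boolean b) { return "foo(boolean)"; }

    // The same method before and after its return type is changed.
    static boolean statusBefore() { return true; }
    static int     statusAfter()  { return 1; }

    public static void main(String[] args) {
        System.out.println(foo(statusBefore()));   // prints foo(boolean)
        System.out.println(foo(statusAfter()));    // prints foo(int)
    }
}
```

Both calls compile without a warning, so nothing points you at the call site whose behaviour has just changed.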

Arne
