Re: "Strong typing vs. strong testing"

RG

ImpalerCore said:
[ron@mighty:~]$ cat foo.c
#include <stdio.h>

int maximum(int a, int b) { return a > b ? a : b; }

int main() {
  long x = 8589934592;
  printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
  return 0;}

[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out
Max of 8589934592 and 1 is 1

In the context of procedural programming, there is always an implicit
contract between the function and its client. If you're going to fool
around sending cleverly crafted garbage into the input of 'maximum'
due to C conversion rules, why do you expect the 'maximum' function to
be responsible for producing the correct response to an ill-formed
question? What alternative behavior of 'maximum' would you prefer to
see, that the C language auto-promote the function arguments and
return type to type long based on the type of arguments provided to
the 'maximum' function?
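For concreteness, here is a minimal sketch of what such auto-promotion
could look like using C11's _Generic (the MAXIMUM macro and helper names
are illustrative, not part of the original code; this assumes a 64-bit
long):

#include <stdio.h>

/* One helper per promoted type; _Generic dispatches on the argument types. */
static int  maximum_int(int a, int b)    { return a > b ? a : b; }
static long maximum_long(long a, long b) { return a > b ? a : b; }

#define MAXIMUM(a, b) _Generic((a) + (b), \
    int:  maximum_int,                    \
    long: maximum_long)(a, b)

int main(void) {
  long x = 8589934592;  /* 2^33: truncates to 0 if converted to a 32-bit int */
  printf("Max of %ld and 1 is %ld\n", x, (long)MAXIMUM(x, 1));
  return 0;
}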

You either learn to play in the sandbox that C provides, splinters and
all, or you find another sandbox.

Best regards,
John D.

You're missing a lot of context. I'm not trying to criticize C, just to
refute a false claim that was made about it.

rg
 
RG

Seebs said:
No, we won't. It's the *definition* of static typing. Static typing
is there to give you some guarantees at the expense of not being able
to express some things without special extra effort. That's why it's
static.

I don't want to quibble over terminology. Whatever label you choose to
put on it ("false positive", "not being able to express some things
without special extra effort") I consider it a deficiency. The costs
are greater than the benefits. Reasonable people can (and obviously do)
disagree.

rg
 
Gene

There are only two possibilities: either you have a finite-state
machine, or you have a Turing machine.  (Well, OK, you could have a
pushdown automaton, but there are no programming languages that model a
PDA.  Well, OK, there's Forth, but AFAIK there are no static type
checkers for Forth.  Besides, who uses Forth? ;-)

If you have a finite state machine everything is trivial.  If you have a
Turing machine everything is generally impossible.  This is an
oversimplification but not far from the fundamental underlying truth.

My favorite practical example is the square root function.  The standard
C library defines a square root function on floats (actually on
doubles), which is to say, over a finite-state model with 2^64 states.  
The model is not well defined over half of that range (negative
numbers), which the static type checker cannot catch because there is no
such thing as an unsigned double.

But the fun doesn't stop there.  Doubles >= 0.0 are not the only thing
one might reasonably want to take a square root of, and indeed C++
overloads sqrt to work on complex and valarray types in addition to
floats of various lengths (though you still have to manually keep track
of whether or not the argument to sqrt might be a negative real).  But
what if I want an exact integer square root?  Or a square root of a data
type that represents irrational numbers not as floating point
approximations but as exact symbolic representations?  I haven't worked
out the details, but I'd be surprised if none of these variations turned
out to be Turing complete.
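As an aside, the exact integer square root itself is easy to sketch in C
with no floating point at all (a minimal sketch; isqrt is my name, not a
standard function):

#include <stdio.h>

/* floor(sqrt(n)) via Newton's method in pure integer arithmetic. */
unsigned long long isqrt(unsigned long long n) {
    if (n < 2) return n;
    unsigned long long x = n, y = (x + 1) / 2;
    while (y < x) {          /* each step strictly decreases x */
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}

int main(void) {
    printf("%llu\n", isqrt(10000000000ULL));  /* prints 100000 */
    return 0;
}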

The Turner paper is right on point: there's a fundamental distinction
between the (known) finite and the (potentially) infinite.  In my
experience most of the cool interesting stuff has been found in the
latter domain, and trying to shoehorn the latter into the former is more
trouble than it's worth.

The FA or TM dichotomy is more painful to contemplate than you say.
Making appropriate simplifications for input, any modern computer is a
FA with 2^(a few trillion) states. Consequently, the gestalt of
computer science seems to be to take it on faith that at some very
large number of states, the FA behavior makes a transition to TM
behavior for all possible practical purposes (and I mean all). So
what is it--really--that's trivial to analyze? And what is
impossible? I'm sorry this is drifting OT and will stop here.
 
RG

Keith Thompson said:
RG said:
My code compiles with no warnings under gcc -Wall.

Conclusion: "-Wall" is not "a decent set of warning levels" in this
context, in spite of the name. (If you want to complain that the
name "-Wall" is misleading, I won't disagree, but it's a gcc issue,
not a C issue.)

With "-Wconversion", I get:

c.c: In function 'main':
c.c:7: warning: passing argument 1 of 'maximum' with different width due to
prototype

[...]

That gives (what I would consider to be) false positives, e.g.:

[ron@mighty:~]$ cat foo.c

void foo(long x) {}

int main() { foo(1); }
[ron@mighty:~]$ gcc -Wconversion foo.c
foo.c: In function ‘main’:
foo.c:4: warning: passing argument 1 of ‘foo’ with different width due
to prototype

And yet you have not managed to do it.

Actually I have. Twice now. Remember, the claim is not that the
compiler will fail to warn. It's trivial to never fail to warn simply
by always warning. The claim is that the compiler will *either* fail to
warn *or* generate false positives. So you have to choose your compiler
(and flags) first, and then I get to construct my example. If my
example has *either* an error that the compiler doesn't catch *or* a
non-error that it does catch then I win.

Want to try another round?

rg
 
Pascal J. Bourguignon

RG said:
There are only two possibilities: either you have a finite-state
machine, or you have a Turing machine. (Well, OK, you could have a
pushdown automaton, but there are no programming languages that model a
PDA. Well, OK, there's Forth, but AFAIK there are no static type
checkers for Forth. Besides, who uses Forth? ;-)

If you have a finite state machine everything is trivial. If you have a
Turing machine everything is generally impossible. This is an
oversimplification but not far from the fundamental underlying truth.


All our computers are FSA: they don't have infinite memory.

Even if we developed a robot able to crunch stellar matter and use it
as memory, it would still NOT be a Turing Machine, since the universe is
finite, has finite matter and energy.

Therefore it's trivial. (Indeed, the answer is 42).



Now the problem is rather our minds. We use the notion of mathematical
infinity, and of Turing Machines vs. DFAs, because beyond a certain size
a DFA is not manageable any more by our little minds, nor
computationally: while it's finite, computing most of its
properties would require more time than the universe (which is also
limited in time). So we need another conceptualization to reduce the
complexity of big DFAs. Turing Machines are much simpler; they just
slip the complexity under the infinite tape (which conceptually is
rather simple, as long as you don't try to reach the end of the tape, or
to use it to weave an infinite rug...).
 
Paul Rubin

RG said:
There are only two possibilities: either you have a finite-state
machine, or you have a Turing machine. (Well, OK, you could have a
pushdown automaton, but there are no programming languages that model a
PDA.

The point is that the halting problem for general Turing machines is
undecidable, but there is a subset of the Turing machines, in which the
halting problem is decidable. And it turns out that the decidable
subset is substantial enough to write almost every program anyone
usually wants to write in practice.
Well, OK, there's Forth, but AFAIK there are no static type
checkers for Forth.

http://home.vrweb.de/stephan.becher/forth/

but anyway, Forth is Turing-complete.
If you have a finite state machine everything is trivial. If you have a
Turing machine everything is generally impossible. This is an
oversimplification but not far from the fundamental underlying truth.

I'm sorry, but I don't think you understand the actual situation enough
to be making pronouncements like that. The stuff about finite-state
machines isn't even slightly relevant.
The Turner paper is right on point: there's a fundamental distinction
between the (known) finite and the (potentially) infinite. In my
experience most of the cool interesting stuff has been found in the
latter domain, and trying to shoehorn the latter into the former is more
trouble than it's worth.

The point of the Turner paper is that you can design a language with
separate types for finite data (like arrays) and "infinite" data (like
an endless stream of requests going into a server). Such a language can
restrict you to writing provably halting programs on the finite data,
and provably non-halting programs on the infinite data. That is, the
language doesn't let you write infinite loops on finite data or break
out of infinite loops on infinite data (you can only shut down your
server by "unplugging the computer", e.g. by killing the process
externally).

The claim is that these two classes of programs are enough for most
purposes. The third possible class, the programs like the Collatz
searcher where you can't tell by static analysis whether the program
halts, just isn't that important. The paper argues that by giving up
the ability to express those undecidable programs, a language can gain
other advantages that make up for the loss of Turing-completeness most
of the time.
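In C terms (C itself enforces neither property; this is just an
illustrative sketch of the two classes), the distinction looks like:

#include <stdio.h>

/* Finite data: structural recursion on a shrinking length, so
   termination is evident by inspection. */
int sum(const int *xs, int n) {
    return n == 0 ? 0 : xs[0] + sum(xs + 1, n - 1);
}

/* "Infinite" data: a server loop over an endless request stream,
   non-halting by design; you stop it only by killing the process
   externally. */
void serve(void) {
    for (;;) {
        int c = getchar();
        if (c != EOF)
            putchar(c);   /* handle one request */
    }
}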

I don't think anyone has suggested languages like that are a great idea
for everyday programming, but obviously there can be methodologies that
use such approaches for special purposes.
 
Pascal J. Bourguignon

Keith Thompson said:
const double pi = 2.71828182845904523526;

To a human reader, it's obviously either a mistake or deliberate
obfuscation, but I'm not sure I'd *want* my compiler to warn me
about it just because I named the object "pi" rather than "e".
(And if I called it "x", even an intelligent human wouldn't know
that it's wrong.)

Well, you see, I think it would be perfectly nice of a compiler to
provide a warning if you gave that value to a variable named pi. On the
other hand, I'd expect an error where gcc only gives a warning (I usually
compile with -Werror, but that should be the default).
 
Keith Thompson

Seebs said:
Hmm. You have a point. It's clearly a conversion from one type
to another.

If I'm reading 6.7.5.3p15 correctly, the types
int (*)()
and
int (*)(int, int)
are compatible, so the declaration and initialization of barf is
perfectly legal, and a call
barf(3, 4)
would also be legal and would return 4.

I actually didn't notice on my initial reading that the call is passing
the wrong number of arguments. Since the type of barf doesn't specify
the number or types of the arguments, no diagnostic is required, but
the behavior is undefined.
Yes. It's clearly undefined behavior to call a function through a
pointer to a different type, or to call a function with the wrong number
of arguments. I am pretty sure at least one compiler catches this.

The former is not a problem here; the type of barf is compatible with
the type of a pointer to maximum. The latter is the problem, and a
sufficiently clever compiler can warn about it.

Note that you could do something silly like this:

int one_param(int a);
int two_params(int a, int b);
int (*barf)();

if (some_condition) {
    barf = one_param;
}
else {
    barf = two_params;
}

if (some_other_condition) {
    barf(1);
}
else {
    barf(2, 3);
}

No constraint violations, and no undefined behavior as long as
some_condition and some_other_condition have the same value.
The best a compiler can do (unless it knows about the conditions)
is warn you that something *might* go wrong.

For programmers, the solution is simple: *Don't do that!*.
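A sketch of the "don't do that" alternative: give barf a full prototype,
and the compiler is required to diagnose a call with the wrong number of
arguments:

int one_param(int a);
int two_params(int a, int b);

int (*barf)(int, int) = two_params;  /* full prototype on the pointer   */
/* barf = one_param;    constraint violation: incompatible pointer type */
/* barf(1);             constraint violation: too few arguments         */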
 
Paul Rubin

RG said:
I don't want to quibble over terminology. Whatever label you choose to
put on it ("false positive", "not being able to express some things
without special extra effort") I consider it a deficiency. The costs
are greater than the benefits. Reasonable people can (and obviously do)
disagree.

Chris Smith's essay "What To Know Before Debating Type Systems"
discusses the question basically that way, without taking either side.
It's well worth reading:

http://web.archive.org/web/20080822101209/http://www.pphsg.org/cdsmith/types.html
 
Keith Thompson

RG said:
You're missing a lot of context. I'm not trying to criticize C, just to
refute a false claim that was made about it.

Can you cite the article that made this false claim, and exactly what
the false claim was?
 
Seebs

I don't want to quibble over terminology.

May I suggest that, if you don't want to use words and terminology
precisely, perhaps computer programming is not for you?
Whatever label you choose to
put on it ("false positive", "not being able to express some things
without special extra effort") I consider it a deficiency.

Oh, it definitely is a deficiency.

However, the deficiency is that you have to do extra work to get data
safely in, and you have to figure out for yourself how you want to
handle things if, say, you have a value which is not of the right type,
and you want to make it be of the right type, but it may not be possible
to do so.

It is, however, entirely possible to write a function which returns
the maximum of its two inputs, 100% reliably, in C.
The costs are greater than the benefits. Reasonable people can
(and obviously do) disagree.

I tend to think it depends on what I'm trying to do, and why I'm trying
to do it. There are plenty of tasks for which I prefer C's semantics to
those of most scripting languages -- and others for which I prefer scripting
language semantics.

In practice, I have dozens to hundreds of times more problems with
scripting languages where something ends up being of the wrong type (or more
precisely, not of any of the types which can be used in that context) than I
have, say, overflow problems in C. On the other hand, it's usually much
easier to catch them and deal with them.

-s
 
Paul Rubin

TheFlyingDutchman said:
With Tiny C on my system, your code does not cause maximum to give an
incorrect value, or to blow up:

int maximum(int a, int b)
{
    printf("entering maximum %d %d\n", a, b);
    if (a > b)
        return a;
    else
        return b;
}

What did printf show as "b" when you ran it? If that code worked at
all, it was by accident. Try changing the name "maximum" to "minimum"
and change the ">" to "<", compile the same program, and see if you
still get a correct value.
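Spelled out, that experiment looks like this (a sketch assuming a
prototype is in scope and a typical LP64 system, where the long argument
2^33 converts to int 0):

#include <stdio.h>

int minimum(int a, int b) { return a < b ? a : b; }

int main(void) {
    long x = 8589934592;  /* 2^33 */
    /* minimum receives (0, 1) after the implementation-defined
       conversion, so this prints 0 -- visibly not the minimum of
       8589934592 and 1. */
    printf("Min of %ld and 1 is %d\n", x, minimum(x, 1));
    return 0;
}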
 
Seebs

Which is why agile practices such as TDD have an edge. If it compiles
*and* passes all its tests, it must be right.

So far as I know, that actually just means that the test suite is
insufficient. :)

Based on my experience thus far, anyway, it is essentially never the case
that both the tests and the code are correct; usually either the tests
fail, or there are not enough tests.

-s
 
Seebs

That gives (what I would consider to be) false positives, e.g.:
[ron@mighty:~]$ cat foo.c
void foo(long x) {}
int main() { foo(1); }
[ron@mighty:~]$ gcc -Wconversion foo.c
foo.c: In function 'main':
foo.c:4: warning: passing argument 1 of 'foo' with different width due
to prototype

But it's *not* a false positive. The compiler is not claiming that the
conversion couldn't be done -- it's just telling you that there is a
change of type going on.

If you don't want that message, it is certainly possible to write code
which won't get it, and which will reliably work everywhere.
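For instance (a minimal sketch), passing a long-typed argument avoids
the width-change warning entirely:

void foo(long x) { (void)x; }

int main(void) {
    foo(1L);   /* a long literal: no -Wconversion warning */
    return 0;
}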
So you have to choose your compiler
(and flags) first, and then I get to construct my example. If my
example has *either* an error that the compiler doesn't catch *or* a
non-error that it does catch then I win.

Those goal posts are sorta red shifted at this point.

You're redefining "error" and "non-error" so as to demand that a statically
typed language offer you the same semantics of errors and non-errors that
dynamically typed languages have. That's cheating, though. The claim
is about C, not about what people who are expecting a dynamically typed
language would like C to be like.

-s
 
Ian Collins

So far as I know, that actually just means that the test suite is
insufficient. :)

Based on my experience thus far, anyway, it is essentially never the case
that both the tests and the code are correct; usually either the tests
fail, or there are not enough tests.

Which is why we write the tests first. The only code written is written
to pass a test.

Reviewing tests is a lot easier than reviewing the code while working
out what it is supposed to do.
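A minimal test-first sketch in C (assert-based; the test exists before
the implementation, and the only code written is what makes it pass):

#include <assert.h>

int maximum(int a, int b);   /* declared first; the test drives it */

static void test_maximum(void) {
    assert(maximum(1, 2) == 2);
    assert(maximum(-5, -7) == -5);
    assert(maximum(3, 3) == 3);
}

int maximum(int a, int b) { return a > b ? a : b; }

int main(void) {
    test_maximum();   /* aborts if any expectation fails */
    return 0;
}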
 
Don Geddis

Keith Thompson said:
Can you cite the article that made this false claim, and exactly what
the false claim was?

http://groups.google.com/group/comp.lang.lisp/msg/431925448da59481

Message-ID: <0497e39d-6bd1-429d-a86f-f4c89babe1a4@u31g2000pru.googlegroups.com>
From: TheFlyingDutchman <[email protected]>
Newsgroups: comp.lang.lisp

[...]
in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer. If someone
tries to call it incorrectly it is a compile error.
[...]

_______________________________________________________________________________
Don Geddis http://don.geddis.org/ (e-mail address removed)
 
Seebs

in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer. If someone
tries to call it incorrectly it is a compile error.

I would agree that the third sentence is arguably wrong, simply
because there's no such thing (outside #error) as a mandate to stop
compiling. However, my understanding was that the dispute was over
the second sentence, and that's certainly correct.

The obvious simple maximum() in C will not raise an exception nor return
something which isn't an int in any program which is not on its face
invalid in the call. This is by definite contrast with several of the
interpreted languages, where a function or subroutine like that cannot
specify that its argument must be some kind of integer.

-s
 
RG

Don Geddis said:
Keith Thompson said:
Can you cite the article that made this false claim, and exactly what
the false claim was?

http://groups.google.com/group/comp.lang.lisp/msg/431925448da59481

Message-ID:
<0497e39d-6bd1-429d-a86f-f4c89babe1a4@u31g2000pru.googlegroups.com>
From: TheFlyingDutchman <[email protected]>
Newsgroups: comp.lang.lisp

[...]
in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer. If someone
tries to call it incorrectly it is a compile error.
[...]

_______________________________________________________________________________
Don Geddis http://don.geddis.org/ (e-mail address removed)

Thanks, Don.

rg
 
rustom

Some points that seem to be missed (or I've missed them?)

1. A dichotomy is being made between 'static' languages like C and
'dynamic' languages like python/lisp. This dichotomy was valid 30
years ago, not today. In Haskell for example

- static checking is stronger than in C/C++ -- it's very hard, if not
impossible, to make Haskell dump core except through memory exhaustion

- dynamic-ness is almost that of python/lisp -- one can write
significant Haskell programs without type-declaring a single variable or
function

Much more mainstream, C# is almost as 'managed' as dynamic languages
and has efficiency comparable to C.

2. The dichotomy above misses a more pervasive dichotomy -- hardware
vs software -- as real today as 30 years ago.

To see this let us lift the discussion from that of *languages* C vs
Python/Lisp to philosophies:
-- C-philosophy: the purpose of type-checking is to maximize (runtime)
efficiency
-- Lisp-philosophy: the purpose of type-checking is zero errors (e.g. no
seg-faults) via continuous checks at all levels.

If one is honest (and not polemical :) ) one would admit that both
sides are needed in different contexts.

Now Dijkstra pointed out (40 years ago) in A Discipline of Programming
that this unfortunate dilemma arises due to lack of hardware support. I
am unable to reproduce the elegance and succinctness of his language but
the argument is as follows:

Let us say that in a typical execution profile, for every one
instruction of the pathological kind typified by the maximum function,
there are a trillion 'normal' instructions. This is what he calls a
very skew test -- an if-then-else guarding against it would go the
if-way a trillion times for every one else-way. It is natural for a
programmer to feel the pinch of these trillion checks and be inclined
to throw them away.

If however the check was put into hardware there would be no such
dilemma. If every arithmetic operation was always checked for overflow
*by hardware*, even languages committed to efficiency like C could trap
on errors with no extra cost.
Likewise Lisp/python-like languages could easily be made more
efficient.
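On today's compilers the software-side version of that check can at
least reuse the hardware overflow flag; a sketch using the GCC/Clang
builtin (not standard C):

#include <stdio.h>
#include <limits.h>

/* __builtin_add_overflow compiles down to an add plus a test of the
   CPU's overflow flag: it returns nonzero if the sum overflowed. */
int main(void) {
    int s;
    if (__builtin_add_overflow(INT_MAX, 1, &s))
        printf("overflow trapped\n");
    else
        printf("sum = %d\n", s);
    return 0;
}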

The difference arises from the fact that software costs per use whereas
hardware costs per installation -- a transistor, unlike an if, does
not cost any more whether it's used once or a trillion times.

In short the problem is not C vs Lisp/Python but architectures like
Intel wherein:

1. an overflow bit harmlessly set by a compare operation is
indistinguishable from one set by a signed arithmetic operation --
almost certainly a problem

2. An INTO instruction (interrupt on overflow) must be inserted into
the software stream rather than raised as a hardware interrupt.
 
RG

Seebs said:
That gives (what I would consider to be) false positives, e.g.:
[ron@mighty:~]$ cat foo.c
void foo(long x) {}
int main() { foo(1); }
[ron@mighty:~]$ gcc -Wconversion foo.c
foo.c: In function 'main':
foo.c:4: warning: passing argument 1 of 'foo' with different width due
to prototype

But it's *not* a false positive. The compiler is not claiming that the
conversion couldn't be done -- it's just telling you that there is a
change of type going on.

Again, you can't have it both ways. Either a warning is a "compiler
error" according to the claim at issue (see below) or it is not. If it
is, then this is a false positive. If it is not, then my previous
example refutes the claim.
If you don't want that message, it is certainly possible to write code
which won't get it, and which will reliably work everywhere.

Of course it's possible. It's *possible*, with enough effort, to write
correct code in any programming language. What does that have to do
with anything?
Those goal posts are sorta red shifted at this point.

At this point I would like to quote a wise man who once said:
May I suggest that, if you don't want to use words and terminology
precisely, perhaps computer programming is not for you?

Red shifted?
You're redefining "error" and "non-error" so as to demand that a statically
typed language offer you the same semantics of errors and non-errors that
dynamically typed languages have. That's cheating, though. The claim
is about C, not about what people who are expecting a dynamically typed
language would like C to be like.

The claim is:

http://groups.google.com/group/comp.lang.lisp/msg/431925448da59481

Message-ID:
<0497e39d-6bd1-429d-a86f-f4c89babe1a4@u31g2000pru.googlegroups.com>
From: TheFlyingDutchman <[email protected]>
Newsgroups: comp.lang.lisp

[...]
in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer. If someone
tries to call it incorrectly it is a compile error.
[...]

The truth of this claim hinges on the definitions of "work", "never blow
up", "invalid", "call incorrectly" and "compile error." Define these
however you like, the result will be that the claim is either false or
vacuous.

rg
 
