Is it time for secure C ?

Harti Brandt · Jul 8, 2004

On Wed, 7 Jul 2004, Keith Thompson wrote:

KT>[...]
KT>> What no language and no compiler can solve is the logic error. If I am
KT>> writing control software for an aircraft, and I accidentally use a sine
KT>> rather than a cosine in some vital calculation, it will not be picked up
KT>> except through testing, or when the aeroplane crashes.
KT>
KT>Or code review. If you're writing airplane control software without
KT>doing code review, remind me not to fly in your airplanes. (That's a
KT>general comment, not directed at Malcolm.)

Even code review or using a 'safe' language doesn't help when the problem
specification is already broken. Remember the Ariane 5 crash (software
written in Ada)? Or the Airbus crash in Warsaw?

What you need is a correct problem specification, a correct design, a
correct implementation done by good programmers in a language carefully
chosen for the problem. Easy, isn't it?

harti

Richard Bos · Jul 8, 2004

Malcolm said:
C makes it very easy to address memory illegally. This problem can be solved
by using another language, at the cost of some runtime inefficiency and loss
of simplicity.

It can be solved even more simply by doing what real professionals do:
think before you code, create a solid design, and _do your bloody
bookkeeping_. _You_ (general, not Malcolm specifically) are the one who
writes the program, secure or insecure. If you can't keep tabs on
something as simple as the length of an array, how can you be trusted
with something as complicated as, ooh, a file system?

What no language and no compiler can solve is the logic error. If I am
writing control software for an aircraft, and I accidentally use a sine
rather than a cosine in some vital calculation, it will not be picked up
except through testing, or when the aeroplane crashes.

Or (shock! horror!) proof-reading your code, or perhaps even (gnashing
of teeth! rain of fire!) having someone else proof-read it...

Richard

Dan Pop · Jul 8, 2004

Keith Thompson said:
Or code review. If you're writing airplane control software without
doing code review, remind me not to fly in your airplanes. (That's a
general comment, not directed at Malcolm.)

How do you know who wrote the programs of the airplanes you're flying
with? ;-)

Dan

Dan Pop · Jul 8, 2004

kyle york said:
Greetings,

No, implementations of C make it very easy to address memory illegally.
I've not read anything in the standard that prohibits an implementation
from actually enforcing the rules.

What rules? You can convert any integer to a pointer value and the
language cannot tell whether the result is a valid pointer value or not.

I've given this a lot of thought of late & don't think it would be that
terribly difficult to add proper bounds checking to a good compiler.

Think harder. Review the answer I gave to Jacob Navia, on this topic,
in this very newsgroup, several months ago.

Dan

Richard Bos · Jul 8, 2004

Roman Ziak said:
Microsoft is the world's second biggest software developer.

Oh? So who's the first? IBM are a large computer manufacturer, but they
don't sell as much software as Microsoft.

Yes, WinNT used to crash a lot, but I have seen WinXP crash probably once
or twice in my career, and that was because of a third-party driver.

My mileage varies. The greatest culprits seem to be Word and Internet
Exploder, both M$ products.

What makes you think MS does not mean anything in C world ?

Their own behaviour, and their attitude towards anything Standard -
_any_ standard, not just C.

Do you live in the cave ?

No, they do. Unfortunately, it's a great honking big cave, and there's a
lot of sheep in there. They don't yet know that Microsoft is a
reincarnation of Polyphemus.

Everybody learns from mistakes and so does this company.

Erm... no. They learn to cover up better, but if they _really_ learned,
they wouldn't keep overrunning buffers.

Richard

Dan Pop · Jul 8, 2004

Richard Bos said:
Oh? So who's the first? IBM are a large computer manufacturer, but they
don't sell as much software as Microsoft.

^^^^
Developing software and selling software is not exactly the same thing.

I've no idea what metric could be used to rank software developers,
so Roman's statement is rather vacuous to me.

Dan

P.J. Plauger · Jul 8, 2004

Erm... no. They learn to cover up better, but if they _really_ learned,
they wouldn't keep overrunning buffers.

The Secure C Library proposed to WG14 by Microsoft does indeed
stem from what they *learned* in eliminating buffer overruns
(among other security lapses) from their code.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

BruceS · Jul 8, 2004

Malcolm said:
Fortunately all my aeroplanes are virtual video game ones. However we don't
do code reviews, we just play it and if it plays for a reasonable length of
time without falling over, release it.

Cool. That's exactly how we handled code reviews at various places I've
worked. Writing applications for government accounting, utility (e.g.
electric, gas) mapping, building design, etc. The major difference is that
the developers didn't try "playing" much at all. We left that for the
support guy and the customers. Who needs code reviews when there are
customers, who are quite willing to tell you about any problems they find?
Besides, we were generally too busy, trying to work around fundamental
design flaws. And find the causes of those bugs the customers reported. We
had no time for code reviews.

kyle york · Jul 8, 2004

Greetings,

Dan said:
What rules? You can convert any integer to a pointer value and the
language cannot tell whether the result is a valid pointer value or not.

So you're saying undefined behaviour is undefined. What's new? Nothing
prevents the compiler from emitting code that will trap/crash/burn in
this case.

This is assuming I understand 6.3.2.3 paragraphs 6 & 7. The way I read
this the implementation is allowed to say this results in an invalid
pointer. Simple enough.

Think harder. Review the answer I gave to Jacob Navia, on this topic,
in this very newsgroup, several months ago.

You and Jacob have had many threads in the past few months & I remember
many of them.

Please give an example of how it would be impossible to implement bounds
checking. I've yet to come up with a scenario that is insurmountable. As
I said before there's nothing in the language that prevents an
implementation that includes bounds checking. If I'm wrong, please point
me to chapter & verse.

Dave Vandervies · Jul 8, 2004

Roman Ziak said:
Almost every more sophisticated piece of software contains bugs.

This need not be the case.

There's a guy by the name of Donald Knuth who's written at least one
major software package that I know of and use regularly, and I think a
few others as well. I don't know of any bugs in any of them. Any bugs
that do exist are definitely NOT ones that a mere mortal would come
across in normal use.

There's no reason, other than unwillingness or inability to do the
job right, that implies that software *should* have bugs. (Note that
the inability to do the job right may not actually be the fault of the
programmers; this doesn't mean it becomes excusable. Especially when
it *is* the fault of the programmers.)

ObC: There's no reason why C code needs to have bugs, either. You just
need to be a little bit more careful, the same way you need to be more
careful with a chainsaw than with a screwdriver.

dave

Guillaume · Jul 8, 2004

Cool. That's exactly how we handled code reviews at various places I've
worked. Writing applications for government accounting, utility (e.g.
electric, gas) mapping, building design, etc. The major difference is that
the developers didn't try "playing" much at all. We left that for the
support guy and the customers. Who needs code reviews when there are
customers, who are quite willing to tell you about any problems they find?
Besides, we were generally too busy, trying to work around fundamental
design flaws. And find the causes of those bugs the customers reported. We
had no time for code reviews.

;-)

Dan Pop · Jul 8, 2004

kyle york said:
Greetings,

So you're saying undefined behaviour is undefined. What's new? Nothing
prevents the compiler from emitting code that will trap/crash/burn in
this case.

This is assuming I understand 6.3.2.3 paragraphs 6 & 7. The way I read
this the implementation is allowed to say this results in an invalid
pointer. Simple enough.

Unless the address resulting from the conversion is the address of an
object. This adds a bit of complication to the issue.

You and Jacob have had many threads in the past few months & I remember
many of them.

Please give an example of how it would be impossible to implement bounds
checking. I've yet to come up with a scenario that is insurmountable. As
I said before there's nothing in the language that prevents an
implementation that includes bounds checking. If I'm wrong, please point
me to chapter & verse.

What part of "Review the answer I gave to Jacob Navia, on this topic,
in this very newsgroup, several months ago" was too difficult for you
to understand?

I'm not saying that it is impossible, I'm saying that you're way too
optimistic when you say that it's not terribly difficult. BTW, Jacob
gave up the idea after reading my answer ;-)

Dan

Keith Thompson · Jul 8, 2004

kyle york said:
Please give an example of how it would be impossible to implement
bounds checking. I've yet to come up with a scenario that is
insurmountable. As I said before there's nothing in the language that
prevents an implementation that includes bounds checking. If I'm
wrong, please point me to chapter & verse.

I think (but I'm not certain) that reliable bounds checking could be
provided by a C implementation, but there would be a significant cost.
The simplest way to do it would be to use "fat pointers".

For example, a char* might consist of three elements:

The base address of an object, created either by an object
definition or by a call to an allocation function like malloc();

The size of the object, in bytes; and

An offset, in bytes.

(For pointers to larger types, the size and offset could be measured
either in bytes or in larger units, whichever turns out to be more
efficient.)

Pointer arithmetic (including array indexing) would operate on the
offset, and would trap if the result is outside the known bounds of
the base object. Any operation on a pointer would check whether the
base address is non-null, and whether the offset is within the bounds
of the base object.

The drawbacks are that the resulting code would be slower, pointers
would take up more space, and many useful instances of undefined
behavior (in non-portable code) would cause traps.
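
A minimal sketch of the three-element pointer described above may make the
cost easier to judge. Everything named here (fat_ptr, fat_make, fat_add,
fat_load, fat_store) is hypothetical and written as ordinary library code;
a real implementation would generate these checks inside the compiler and
would trap, whereas this sketch returns false so the failure can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: base address, object size, current offset. */
typedef struct {
    unsigned char *base;   /* start of the underlying object, or NULL */
    size_t size;           /* size of the object, in bytes */
    size_t offset;         /* current position, in bytes from base */
} fat_ptr;

/* Wrap an existing object of `size` bytes; the offset starts at 0. */
static fat_ptr fat_make(void *object, size_t size)
{
    fat_ptr p = { object, size, 0 };
    return p;
}

/* Pointer arithmetic: adjusts the offset but refuses to leave [0, size].
   offset == size is the one-past-the-end position, which may be formed
   but not dereferenced. */
static bool fat_add(fat_ptr *p, ptrdiff_t delta)
{
    if (p->base == NULL)
        return false;
    if (delta < 0) {
        /* Magnitude of delta, computed without overflowing ptrdiff_t. */
        size_t back = (size_t)(-(delta + 1)) + 1;
        if (back > p->offset)
            return false;
        p->offset -= back;
    } else {
        if ((size_t)delta > p->size - p->offset)
            return false;
        p->offset += (size_t)delta;
    }
    return true;
}

/* Dereference for reading: checks non-null base and in-bounds offset. */
static bool fat_load(fat_ptr p, unsigned char *out)
{
    if (p.base == NULL || p.offset >= p.size)
        return false;
    *out = p.base[p.offset];
    return true;
}

/* Dereference for writing: same checks as fat_load. */
static bool fat_store(fat_ptr p, unsigned char value)
{
    if (p.base == NULL || p.offset >= p.size)
        return false;
    p.base[p.offset] = value;
    return true;
}
```

Each pointer is three words instead of one, and every access pays for one
or two comparisons, which is the size and speed cost mentioned above.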

Arthur J. O'Dwyer · Jul 8, 2004

I think (but I'm not certain) that reliable bounds checking could be
provided by a C implementation, but there would be a significant cost.
The simplest way to do it would be to use "fat pointers".

I seem to recall objections centering around the way pointer
representations interact with, say, arrays of unsigned char and
a few ill-advised memcpys. Consider

int *p;
unsigned char foo[sizeof p];
p = malloc(42);
*p = 1;                      /* PERFECTLY_FINE */
memcpy(foo, &p, sizeof p);   /* save the pointer's bytes */
free(p);
memcpy(&p, foo, sizeof p);   /* restore the very same bytes */
*p = 2;                      /* SAME_BITS_INVOLVED, BUT_INCORRECT */

Now, this is the kind of thing that can be handled perfectly well
by a clever malloc package... but I think there *were* other examples
that really couldn't be handled correctly 100% of the time.

The drawbacks are that the resulting code would be slower, pointers
would take up more space, and many useful instances of undefined
behavior (in non-portable code) would cause traps.

If it causes a trap, it's not very useful, is it now? That last
objection is just saying that non-portable code is not portable to
some implementations --- and that's true by definition! The first
two objections are true enough, though.

Of course, you *could* use the Hypothetical Nice Implementation to
test and debug your code, and then move to the Real-World Dangerous
Implementation for release. It would just be one step more advanced
than the widespread "Debug Version/Release Version" paradigm.

-Arthur

jacob navia · Jul 8, 2004

Keith Thompson said:
I think (but I'm not certain) that reliable bounds checking could be
provided by a C implementation, but there would be a significant cost.
The simplest way to do it would be to use "fat pointers".

This is the solution I have used in my string library

For example, a char* might consist of three elements:

The base address of an object, created either by an object
definition or by a call to an allocation function like malloc();

The size of the object, in bytes; and

An offset, in bytes.

My "fat" pointers consist of a length, a pointer, and a pointer to the base
object.
Each time the pointer is moved, the implementation checks that it stays
within the bounds of the original string object.

This is done dynamically, i.e. at run time.

Pointer arithmetic (including array indexing) would operate on the
offset, and would trap if the result is outside the known bounds of
the base object. Any operation on a pointer would check whether the
base address is non-null, and whether the offset is within the bounds
of the base object.

This is what my string library does

The drawbacks are that the resulting code would be slower, pointers
would take up more space, and many useful instances of undefined
behavior (in non-portable code) would cause traps.

I have never really measured since the first implementation of the
library is designed as a proof of concept not as the final version.

The one measurement I did was the cost of function calls. In a
1.5 GHz P4 it would take several million calls to slow down the
program just one second.

The speed penalty is quite small, and for most purposes negligible.

kyle york · Jul 8, 2004

Greetings,

Keith said:
[...]

For example, a char* might consist of three elements:

The base address of an object, created either by an object
definition or by a call to an allocation function like malloc();

The size of the object, in bytes; and

An offset, in bytes.

I was thinking one more level of indirection -- a pointer has a
descriptor + offset. The descriptor has reference count, base, size, and
flags. The biggest problem at the moment is how to handle pointers
embedded in structures & unions, specifically if a structure is freed
while an embedded pointer is still valid.

Yes, this does lead to a code size & performance hit but I suspect it
would still be incredibly useful, especially for people learning C and
arguably even for most user applications considering the number of hacks
out there trying to prevent things like buffer overflow. If there's a 10%
performance hit but a guarantee of safety I'd buy it.

An added benefit would be garbage collection for free.

Anyway, my original point was simply that there's nothing *in the
language* that forbids safe pointers, it's just no one has bothered to
implement them.
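
The descriptor-plus-offset idea can be sketched roughly as follows. All
names (descriptor, safe_ptr, safe_alloc, safe_copy, safe_free,
safe_release, safe_store, DESC_FREED) are hypothetical, and the reference
counting is done by hand here, where a real implementation would have the
compiler insert it. It does not attempt the embedded-pointer problem
mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { DESC_FREED = 1 };   /* flag: storage has been released */

/* Hypothetical descriptor shared by every pointer into one object. */
typedef struct {
    unsigned refcount;     /* number of safe_ptr values referring to it */
    unsigned char *base;   /* start of the storage, NULL once freed */
    size_t size;           /* size of the storage, in bytes */
    unsigned flags;
} descriptor;

/* A pointer is a descriptor plus an offset. */
typedef struct {
    descriptor *desc;
    size_t offset;
} safe_ptr;

/* Allocate zeroed storage and a descriptor; desc is NULL on failure. */
static safe_ptr safe_alloc(size_t size)
{
    safe_ptr p = { NULL, 0 };
    descriptor *d = malloc(sizeof *d);
    if (d == NULL)
        return p;
    d->base = calloc(size ? size : 1, 1);
    if (d->base == NULL) {
        free(d);
        return p;
    }
    d->refcount = 1;
    d->size = size;
    d->flags = 0;
    p.desc = d;
    return p;
}

/* Copying a pointer adds a reference to the shared descriptor. */
static safe_ptr safe_copy(safe_ptr p)
{
    if (p.desc != NULL)
        p.desc->refcount++;
    return p;
}

/* Explicit free: the storage goes away, but the descriptor survives so
   that other pointers can still discover the object is gone. */
static void safe_free(safe_ptr p)
{
    if (p.desc == NULL || (p.desc->flags & DESC_FREED))
        return;
    free(p.desc->base);
    p.desc->base = NULL;
    p.desc->flags |= DESC_FREED;
}

/* Drop one pointer; the last one to go reclaims whatever is left. */
static void safe_release(safe_ptr *p)
{
    if (p->desc != NULL && --p->desc->refcount == 0) {
        free(p->desc->base);   /* no-op if already freed */
        free(p->desc);
    }
    p->desc = NULL;
}

/* Checked write: fails on a freed object or an out-of-range offset. */
static bool safe_store(safe_ptr p, unsigned char value)
{
    if (p.desc == NULL || (p.desc->flags & DESC_FREED) ||
        p.offset >= p.desc->size)
        return false;
    p.desc->base[p.offset] = value;
    return true;
}
```

Because safe_release reclaims the storage when the last reference goes,
this is also where the "garbage collection for free" would come from,
at least for data without reference cycles.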

Minti · Jul 8, 2004

kyle york said:
Greetings,

No, implementations of C make it very easy to address memory illegally.
I've not read anything in the standard that prohibits an implementation
from actually enforcing the rules.

I've given this a lot of thought of late & don't think it would be that
terribly difficult to add proper bounds checking to a good compiler.

Depends. If you ask for static checking, it's quite easy[1]; at run time
it's difficult, but yes, possible. But then again it can be made part of
std C only if there is universal appeal for it. And IMO we aren't
getting there anytime soon.

Actually many compilers _do_ indeed support runtime checking, but only
in debug builds; release builds don't have much runtime checking.

But then again you already knew this.

[1] Will it be able to detect this?

int arr[10];

int x = 10;
for ( int i = 0; i <= x; ++i ) { arr[i] = 0xCAFE; }
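
For comparison, a run-time check on the loop from the footnote could look
like the sketch below. The helper names (checked_store, run_loop,
ARRAY_LEN) are hypothetical, and the check is written out by hand, whereas
a checking compiler would insert something equivalent at every array
access.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of elements in an array (not usable on pointers). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Store `value` at `index` only if the index is within `length`. */
static bool checked_store(int *array, size_t length, size_t index, int value)
{
    if (index >= length)
        return false;
    array[index] = value;
    return true;
}

/* Runs the footnote's loop with checked stores and returns how many
   stores were rejected. */
static int run_loop(void)
{
    int arr[10];
    int x = 10;
    int rejected = 0;

    for (int i = 0; i <= x; ++i) {
        if (!checked_store(arr, ARRAY_LEN(arr), (size_t)i, 0xCAFE))
            ++rejected;
    }
    return rejected;
}
```

The off-by-one store at i == 10 is the only one rejected, so the array is
never written out of bounds, at the price of one comparison per store.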

P.J. Plauger · Jul 8, 2004

I think (but I'm not certain) that reliable bounds checking could be
provided by a C implementation, but there would be a significant cost.
The simplest way to do it would be to use "fat pointers".

For example, a char* might consist of three elements:
.....

Actually, Microsoft's Secure C proposal is even simpler. It provides
augmented library functions for every case where a buffer size is
not explicitly spelled out in the existing calling sequence. That
permits code reviewers, the compiler, and the library code itself
to do considerably more checking -- without the need for widespread
use of fat pointers. Couple that with the inexpensive but effective
stack fences now generated by VC++ and you get quite a bit more
reliability for a remarkably small cost in performance and code
complexity.
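
As an illustration of the general idea only, here is a size-aware string
copy. The function name copy_string_checked and its return convention are
hypothetical and are not taken from the proposal itself; the point is just
that the destination size travels with the call, so the library can refuse
an oversized copy instead of overrunning the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size-aware copy: returns 0 on success, -1 on failure.
   On failure the destination, if usable, is left as an empty string. */
static int copy_string_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || dst_size == 0)
        return -1;
    if (src == NULL) {
        dst[0] = '\0';
        return -1;
    }
    len = strlen(src);
    if (len >= dst_size) {      /* no room for the terminating '\0' */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* copy the string and its terminator */
    return 0;
}
```

A call such as copy_string_checked(buf, sizeof buf, input) also gives a
reviewer something concrete to verify: the second argument should be the
size of the first.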

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Roman Ziak · Jul 8, 2004

P.J. Plauger said:
Actually, Microsoft's Secure C proposal is even simpler. It provides
augmented library functions for every case where a buffer size is
not explicitly spelled out in the existing calling sequence. That
permits code reviewers, the compiler, and the library code itself
to do considerably more checking -- without the need for widespread
use of fat pointers. Couple that with the inexpensive but effective
stack fences now generated by VC++ and you get quite a bit more
reliability for a remarkably small cost in performance and code
complexity.

What is a "stack fence"? Would it be the variable swapping described in

http://blogs.msdn.com/tims/archive/2003/10/30/57439.aspx

I noticed in VC++ that it sometimes moves the stack pointer down by approx.
1k when calling certain functions, and also swaps the order of arguments.
I was not able to follow this even when stepping through single
instructions; the stack just changed all of a sudden when entering the
function.

Roman

P.J. Plauger · Jul 8, 2004

What is a "stack fence"? Would it be the variable swapping described in

http://blogs.msdn.com/tims/archive/2003/10/30/57439.aspx

I noticed in VC++ that it sometimes moves the stack pointer down by approx.
1k when calling certain functions, and also swaps the order of arguments.
I was not able to follow this even when stepping through single
instructions; the stack just changed all of a sudden when entering the
function.

The article describes part of the machinery; there's a bit more.
Basically, the stack frames are organized so that it's much harder
for a buffer overrun to subvert a program and go unnoticed.
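
One ingredient commonly used in such schemes is a guard value placed just
past a buffer and checked before the function returns. The sketch below
only simulates that idea inside an ordinary byte array; it is not a
description of what VC++ actually emits, and every name in it (fake_frame,
frame_init, frame_careless_write, frame_guard_intact, GUARD) is
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 8, GUARD_SIZE = 4 };

/* Arbitrary guard pattern; a real scheme would pick it unpredictably. */
static const unsigned char GUARD[GUARD_SIZE] = { 0xDE, 0xAD, 0xC0, 0xDE };

/* Simulated frame: BUF_SIZE bytes of buffer followed by the guard. */
typedef struct {
    unsigned char bytes[BUF_SIZE + GUARD_SIZE];
} fake_frame;

/* "Function entry": clear the buffer and plant the guard after it. */
static void frame_init(fake_frame *f)
{
    memset(f->bytes, 0, BUF_SIZE);
    memcpy(f->bytes + BUF_SIZE, GUARD, GUARD_SIZE);
}

/* A careless copy that never compares len with BUF_SIZE. It is capped
   at the size of the whole simulated frame only so that this demo
   stays within defined behaviour. */
static void frame_careless_write(fake_frame *f, const void *data, size_t len)
{
    if (len > sizeof f->bytes)
        len = sizeof f->bytes;
    memcpy(f->bytes, data, len);
}

/* "Function exit": an overrun of the buffer will have changed the guard. */
static bool frame_guard_intact(const fake_frame *f)
{
    return memcmp(f->bytes + BUF_SIZE, GUARD, GUARD_SIZE) == 0;
}
```

The check does not prevent the overrun; it only makes it hard for one to
go unnoticed, which matches the modest claim made above.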

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
