Bounds checking and safety in C

Keith Thompson

I've quoted jacob's entire article for a reason. Skip down to see my
comments.

jacob navia said:
We hear very often in this discussion group that
bounds checking, or safety tests are too expensive
to be used in C.

Several researchers at UCSD have published an interesting
paper about this problem.

http://www.jilp.org/vol9/v9paper10.pdf

Specifically, they measured the overhead of a bounds-checking
implementation compared to a normal one, and
found that in some cases the overhead can be reduced
to a mere 8.3%...

I quote from that paper

< quote >
To summarize, our meta-data layout coupled with meta-check instruction
reduce the average overhead of bounds checking to 21% slowdown which is
a significant reduction when compared to 81% incurred by current
software implementations when providing complete bounds checking.
< end quote>

This 21% slowdown is the overhead of checking EACH POINTER
access, and each (possible) dangling pointer dereference.

If we extrapolate to the alleged overhead of using some extra
arguments to strcpy to allow for safer functions (the "evil
empire" proposal) the overhead should be practically ZERO.

Somehow, we are not realizing that with the extreme power of the
CPUs now at our disposal, it is a very good idea to try to
minimize the time we stay behind the debugger when developing
software. A balance should be sought for improving the safety
of the language without overly compromising the speed of the
generated code.

I quote again from that paper:

< quote >
As high GHZ processors become prevalent, adding hardware support to
ensure the correctness and security of programs will be just as
important, for the average user, as further increases in processor
performance. The goal of our research is to focus on developing
compiler and hardware support for efficiently performing software checks
that can be left on all of the time, even in production code releases,
to provide a significant increase in the correctness and security of
software.

< end quote >

The C language, as it is perceived by many people here, seems
frozen in the past without any desire to incorporate the changing
hardware/software relationship into the language itself.

When these issues are raised, the "argument" most often presented is
"Efficiency" or just "it is like that".

This has led to the language being perceived as backward and
error-prone, good only for outdated software or "legacy" systems.

This again pleases the C++ people, who insist on seeing their language
as the "better C"; and obviously, C++ is much better than C in some
ways, especially string handling, the common algorithms in the STL, and
many other advances.

What strikes me is that this need not be, since C could with minimal
improvements be a much safer and general purpose language than it is
now.

Discussion about this possibility is nearly impossible, since a widely
read forum about C (besides this newsgroup) is nonexistent.

Hence this message.

To summarize:

o Bounds checking and safer, language-supported constructs are NOT
impossible because of too much overhead
o Constructs like a better run-time library could be implemented in a
much safer manner if we redesigned the library from scratch,
without any effective run-time cost.


jacob

P.S. If you think this article is off topic, please just ignore it.
I am tired of these stupid polemics.

jacob, the paper does look interesting. It's 26 pages long, and I
haven't yet been able to set aside the time to read the whole thing,
but it's on my to-do list.

This is *in spite of* your article recommending it. You spent most of
your article attempting to refute arguments that nobody has actually
made. And in the ensuing discussion you have ignored comments
demonstrating that bounds-checking can already be implemented without
changing the language at all.

A great many of the responses, including some from me and from Richard
Heathfield, have been *in support of* the idea of bounds checking in C.
We could have had an interesting and useful discussion if you hadn't
assumed throughout that we're all out to get you. If you attempt to
conduct both sides of a flame war yourself, it shouldn't be surprising
that the result is a flame war.
 
Ian Collins

jacob said:
In most cases this is a recipe for disaster.

Not if you know what you are doing and understand the limitations of the
simulation.
Only the most perfect emulators can REALLY
reproduce 100% of the features of the target, either
because the emulator is too slow or too fast for the same real-time
conditions, or because the simulated input stream
doesn't correspond 100% with the actual input stream,
or for a thousand other reasons.

The emulator is never the REAL THING; it is an emulator!
That doesn't matter when you are unit testing code. The unit test
framework is as good a place as any to run any bounds checking.
OK. Then you would agree with me that this feature

#pragma STDC_BOUNDS_CHECK(ON)

would be much better since it wouldn't be constrained
to just dbx...
Not really, you lose the ability to select what you want to check at
run time.
 
Richard Heathfield

Keith Thompson said:

A great many of the responses, including some from me and from Richard
Heathfield, have been *in support of* the idea of bounds checking in
C.

Well, to be strictly accurate, my own response was merely intended to
point out that it is not necessary for C's definition to change in
order to allow bounds-checking implementations. It was not intended to
convey either support for or opposition to bounds-checking itself.

My own view is that bounds-checking can be a useful aid during
development. I would not, however, choose to use an implementation that
*enforced* bounds-checking for production code.

<snip>
 
Ian Collins

Kelsey said:
[snips]

It's nowhere near that bad. Yes, there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.

Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds-check every library every run.
 
Keith Thompson

Richard Heathfield said:
Keith Thompson said:



Well, to be strictly accurate, my own response was merely intended to
point out that it is not necessary for C's definition to change in
order to allow bounds-checking implementations. It was not intended to
convey either support for or opposition to bounds-checking itself.
[...]

Fair enough.
 
Keith Thompson

Ian Collins said:
Kelsey said:
[snips]
Impossible to use because the program will slow down by a factor
of 1,000 at least...

It's nowhere near that bad. Yes, there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.

Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds-check every library every run.

Why not? Well, I agree that you don't *have* to, but there could be
some benefit in doing so. Testing can never (well, hardly ever) be
100% exhaustive. The fact that you've thoroughly tested your
application and/or library with bounds checking doesn't necessarily
mean that no bounds errors are possible during production runs.

The usefulness of bounds checking in production code depends on what
happens when a check fails. If a failed check causes the application
to terminate immediately, that might or might not be better than
allowing the application to continue running; it depends very much on
the context in which the application is used. If it allows the
application to catch the error, perhaps via some sort of exception
handling mechanism, then it could be advantageous *if* the
exception-handling code is correct.

Also, an application with bounds checking and the same application
without bounds checking are, in a sense, two different applications,
and *both* should be tested just as thoroughly.

If bounds checking were completely free of cost, I might advocate
requiring it in the language. If it always caused code to be slower
by a factor of 10, I wouldn't suggest it except during testing or in
safety-critical code. The truth is somewhere in between.
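
Keith's two failure policies (terminate immediately, or let the application catch and handle the error) can be sketched as a small checked accessor. This is only an illustration of the trade-off, not anything proposed in the thread; the names `bc_policy` and `checked_get` are invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of the two failure policies discussed above:
   abort on a bounds violation, or report it and let the caller
   recover. */
typedef enum { BC_ABORT, BC_REPORT } bc_policy;

/* Returns 1 and stores a[i] in *out on success; on a bounds
   violation either aborts or returns 0, depending on the policy. */
int checked_get(const int *a, size_t len, size_t i,
                bc_policy policy, int *out)
{
    if (i >= len) {
        if (policy == BC_ABORT) {
            fprintf(stderr, "bounds violation: index %zu, length %zu\n",
                    i, len);
            abort();            /* terminate immediately */
        }
        return 0;               /* report; the caller decides what to do */
    }
    *out = a[i];
    return 1;
}
```

With BC_REPORT the caller's recovery path becomes, as Keith notes, code that itself must be correct and tested.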
 
CBFalconer

.... snip much about addition of bounds checking ...
The second is the "spirit of C". C is for macho programmers
who do not need bounds checking because they never make mistakes.

And there are others.

With my post I wanted to address the first one. Those researchers
prove that a FAST implementation of bounds checking is
feasible even without language support.

I would say that with language support, the task would be much
easier AND much faster, so fast that it could be done at run time
without any crushing overhead.

I suggest you go back and read about the tests on complete bounds
checking in Pascal, performed roughly 30 years ago. The conclusion
was that routine enabling of such would slow most code down by
something like 2 or 3 percent. No more. The compiler can detect
which cases require checking. All that is required from the
programmer is proper typing.

Of course, Pascal is a sanely designed language, without violent
bandying of pointers, with convenient sub-ranges, etc. The net
effective result is that C cannot be thoroughly checked at compile
or run time.
 
Serve Lau

jacob navia said:
The most obvious example is the development of
length delimited strings.

strlen becomes just a memory read with those strings.
Much faster *and safer* than an UNBOUNDED memory scan!

Other functions like strcat that implicitly call
strlen are FASTER.

I have been promoting this change (without obsoleting
zero terminated strings of course for legacy code)
for several years. Maybe you are new here and did not see
my other posts.
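
The length-delimited scheme jacob describes can be sketched in portable C. This is a minimal, hypothetical layout for illustration (the names `LString`, `lstr_new`, and `lstr_len` are invented here, not taken from jacob's implementation):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal sketch of a length-delimited string: the length is stored
   next to the data, so "strlen" becomes a field read instead of a
   scan for '\0'. */
typedef struct {
    size_t len;     /* character count, maintained by the API */
    char   data[];  /* flexible array member, kept '\0'-terminated
                       for interoperability with legacy code */
} LString;

LString *lstr_new(const char *s)
{
    size_t n = strlen(s);   /* one last O(n) scan at the boundary */
    LString *p = malloc(sizeof *p + n + 1);
    if (p != NULL) {
        p->len = n;
        memcpy(p->data, s, n + 1);
    }
    return p;
}

/* O(1): just read the stored length -- no unbounded memory scan. */
size_t lstr_len(const LString *s)
{
    return s->len;
}
```

Functions like strcat built on such a type can seek directly to the end of the destination instead of rescanning it.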



The annotations chapter is a huge issue in itself.
But I can't say everything in one post, please.


Thanks for the advice but what makes you think I haven't done it?

Why not generalise it more and add an "array" datatype instead of a string
datatype? Strings are only useful for, well, strings. And you'd have to add
wchar_t * strings too, with their casts and support functions. Much better to
have an array type that knows its size; strings can then be implemented
on top of the array type.
 
jacob navia

Serve said:
Why not generalise it more and add an "array" datatype instead of a string
datatype? Strings are only useful for, well, strings. And you'd have to add
wchar_t * strings too, with their casts and support functions. Much better to
have an array type that knows its size; strings can then be implemented
on top of the array type.

Obvious, but that is *much* more complicated.
I programmed a generalized array that knows the size of the stored
elements too.

But it needs finishing. I am planning a general array package,
with optimized array operations.
jacob
 
Ian Collins

Keith said:
Why not? Well, I agree that you don't *have* to, but there could be
some benefit in doing so. Testing can never (well, hardly ever) be
100% exhaustive. The fact that you've thoroughly tested your
application and/or library with bounds checking doesn't necessarily
mean that no bounds errors are possible during production runs.
I agree, there is nothing to stop something from passing a bad pointer
to a tested library, or even the standard library.

My comment and practice is based on past experience: bounds errors tend
to show up in the user code that originates them, so selective testing
during development is useful. One of the reasons I build applications
from a set of dynamic libraries is to make the access checking easier
(there used to be problems on Sparc with modules over a certain size)
and faster; a case of the tools shaping the process. I would use the
feature more if it had less of a performance hit (bear in mind that this
tool also performs access checking).
The usefulness of bounds checking in production code depends on what
happens when a check fails. If a failed check causes the application
to terminate immediately, that might or might not be better than
allowing the application to continue running; it depends very much on
the context in which the application is used. If it allows the
application to catch the error, perhaps via some sort of exception
handling mechanism, then it could be advantageous *if* the
exception-handling code is correct.
The only time I have used bounds checking in production code was in an
embedded product based on the 386; I used a local descriptor table
entry for each allocation. This deferred all of the checking to the
hardware: any out-of-bounds access (including use of a freed pointer)
resulted in a trap and reboot. We decided that any out-of-bounds access
would leave the system in an unsafe state.
 
Alan Curry

Consider this program:
int fn(int *p, int c)
{
    return p[c];
}

int main(void)
{
    int tab[3];

    int s = fn(tab, 3);
}

Please tell me a compiler system where this program generates an
exception.

gcc -fmudflap

(If optimizing, the whole thing becomes a noop, but adding "return s;" at the
end of main takes care of that.)
 
William Hughes

And often that it must execute correctly. For example:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("INT_MAX = %d\n", INT_MAX);
    return 0;
}

This example isn't relevant to bounds checking, but it is an example
of a non-strictly-conforming program that must compile and execute
correctly under any conforming implementation.

See C99 4p3:

A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a correct
program and act in accordance with 5.1.2.3.






That wasn't my claim. A program that can be shown at compile time to
violate bounds invokes undefined behavior, so it's neither strictly
conforming (C99 4p5) nor "correct" (C99 4p3). An implementation can do
anything it likes with such a program.

My argument is that a bounds-checking implementation that doesn't
affect any strictly conforming program (except perhaps for
performance), but that does break some "correct" programs (i.e.,
programs that do not invoke UB but that depend on unspecified
behavior), is not a conforming implementation. In other words, it's
not the effect on strictly conforming programs we have to worry about;
it's the effect on the much larger set of "correct" programs.

Yes, I concede the point.

Do you have an example of a "correct" program that has a
bounds violation?

The examples you gave, a two-dimensional array accessed as a
one-dimensional array and the struct hack, are examples of undefined
behaviour, so they are not part of a "correct" program.

- William Hughes
 
Keith Thompson

William Hughes said:
Yes, I concede the point.

Do you have an example of a "correct" program that has a
bounds violation?

By definition, no. A program that has a bounds violation invokes
undefined behavior, and is therefore not "correct".

Imagine, though, a hypothetical bounds-checking implementation that
operates on the *assumption* that the code being processed is strictly
conforming. This could be an unintentional implicit assumption;
perhaps the folks who wrote the bounds-checking subsystem didn't think
to deal with unspecified behavior, and were too aggressive in their
assumptions. Such an implementation would correctly handle any
strictly conforming program, but could break some correct programs.

I didn't mean to suggest that this kind of thing is likely to be a
concern in the real world (and perhaps I haven't been sufficiently
clear on that point). I merely meant to point out that a
bounds-checking implementation must work correctly for all *correct*
programs, not just for all strictly conforming programs.
 
William Hughes

By definition, no. A program that has a bounds violation invokes
undefined behavior, and is therefore not "correct".

Imagine, though, a hypothetical bounds-checking implementation that
operates on the *assumption* that the code being processed is strictly
conforming. This could be an unintentional implicit assumption;
perhaps the folks who wrote the bounds-checking subsystem didn't think
to deal with unspecified behavior, and were too aggressive in their
assumptions. Such an implementation would correctly handle any
strictly conforming program, but could break some correct programs.

I didn't mean to suggest that this kind of thing is likely to be a
concern in the real world (and perhaps I haven't been sufficiently
clear on that point). I merely meant to point out that a
bounds-checking implementation must work correctly for all *correct*
programs, not just for all strictly conforming programs.

Indeed. Correctly handling strictly conforming programs is
not the only condition that the standard puts on an implementation.
So a putative bounds-checking implementation cannot be shown to
be conforming by showing that it does not break any strictly
conforming program.

Still, I think that the knowledge that no strictly conforming
program can be used to show that a bounds checking implementation
is possible. The proof must, however, involve more than noting that
correctly detecting bounds violations is possible in theory, and
no strictly conforming program has bounds violations.

- William Hughes
 
Keith Thompson

William Hughes said:
Indeed. Correctly handling strictly conforming programs is
not the only condition that the standard puts on an implementation.
So a putative bounds-checking implementation cannot be shown to
be conforming by showing that it does not break any strictly
conforming program.

Still, I think that the knowledge that no strictly conforming
program can be used to show that a bounds checking implementation
is possible. The proof must, however, involve more than noting that
correctly detecting bounds violations is possible in theory, and
no strictly conforming program has bounds violations.

I'm afraid I don't see how any argument involving strictly conforming
programs is useful in determining whether a bounds-checking
implementation can be conforming. Can you expand on your reasoning?

Also, I can't quite parse your sentence above starting "Still, I
think". Was there a typo?
 
William Hughes

[...]
Indeed. Correctly handling strictly conforming programs is
not the only condition that the standard puts on an implementation.
So a putative bounds-checking implementation cannot be shown to
be conforming by showing that it does not break any strictly
conforming program.
Still, I think that the knowledge that no strictly conforming
program can be used to show that a bounds checking implementation
is possible. The proof must, however, involve more than noting that
correctly detecting bounds violations is possible in theory, and
no strictly conforming program has bounds violations.

I'm afraid I don't see how any argument involving strictly conforming
programs is useful in determining whether a bounds-checking
implementation can be conforming. Can you expand on your reasoning?


Something like this:

Bounds checking is not required, so a non-bounds checking
implementation can be conforming.

Bounds checking can be done in theory.

Strictly conforming programs do not have bounds violations, so
any problem must occur with non-strictly conforming programs that
have bounds violations which produce implementation-defined behaviour.
[I know that such programs do not exist, but I am trying
to construct a proof that does not need this fact.]

The implementation-defined behaviour is to turn off
bounds checking if a program that might be such
a program is detected. (From the above we know
that an implementation without bounds checking can
be conforming.)

The gap here is that it is not clear to me that the set of programs
that can be shown to be free of implementation-defined bounds-violation
behaviour is large enough to allow the reasonable use of the term
"bounds checking implementation".
Also, I can't quite parse your sentence above starting "Still, I
think". Was there a typo?

Yes. One should always proofread carefully to make sure that no
get left out.

The sentence should read:


Still, I think that the knowledge that no strictly conforming
program can have a bounds violation
can be used to show that a bounds checking implementation
is possible.


- William Hughes
 
websnarf

I am not sure what you are saying here. Are you claiming
that among the existing implementations there is an
implementation of another language that gives you
performance and better memory safety than any existing
implementation of C, or are you claiming that there is
another language which gives performance and better
memory safety than any possible implementation of C (in
this case, is the claim that the performance is comparable
to that of C)? Or do you mean something else?

Well, whichever of those he means, he's just wrong. C (and C++) enjoy
a very solitary place as fast practical low level languages, which are
really not generally matched in performance by any widely usable
language except sometimes in narrow applications.

See, the problem with the c.l.c regulars and the C standards folks is
that they not only don't know what programmers want, they don't
even know what they already have.

Putting bounds checking into the core language would give it a more
abstract interface to memory. But that would make C lose its
low-level nature. I don't see this as being in keeping with the spirit
of C, especially as other languages typically go to a lot of trouble to
*hoist* bounds checking (at compile/translation time) in order to
reduce its cost. The C language doesn't present abstract enough
primitives to allow similar hoisting optimizations to be detected
easily.
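
The hoisting idea can be illustrated in plain C: a single range check before a loop stands in for a per-iteration test on every access inside it. A minimal sketch with invented names, assuming the caller supplies the true array length:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a hoisted bounds check: the one assertion before the loop
   covers every a[i] access inside it, which is the kind of
   optimization other languages' compilers perform automatically. */
long sum_range(const int *a, size_t len, size_t lo, size_t hi)
{
    assert(lo <= hi && hi <= len);  /* hoisted check, done once */
    long s = 0;
    for (size_t i = lo; i < hi; i++)
        s += a[i];                  /* no per-iteration bounds test */
    return s;
}
```

The difficulty for C is that the compiler rarely knows `len` for a raw pointer, which is exactly why such hoisting is hard to detect automatically.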

The one exception to this, of course, is strings, where a complete
(and purely library side) reformulation can get you drastically
improved safety and improved performance at the same time (see the
link at the bottom).

The right answer, for C and C++, is to present more generic structures
that have bounds checking built in, with the hoisting done for you, as
additional libraries. In this way nothing is taken away from the
language, while safety mechanisms still exist and are available to
those who want to use them. This is really a C++ STL kind of thing.

The greatest potential for C as a language in terms of pursuing safety
is to *change* the standard library. However, this is generally not
open for discussion. Certain non-reentrant functions, and others
which just have really bad abstractions should be removed and
corrected. And the role and functionality of malloc, realloc, calloc,
free appear to have been wildly under-treated. The IO functions are
pathetically non-abstract, and the WG14 proposal falls short of the
mark.

If one simply delivers good functionality through safe interfaces, the
level of safety will go up. If the only choices people have are to
use crummy unsafe interfaces then the level of safety won't go up.
 
Ian Collins

Well, whichever of those he means, he's just wrong. C (and C++) enjoy
a very solitary place as fast practical low level languages, which are
really not generally matched in performance by any widely usable
language except sometimes in narrow applications.
I didn't spot William's post, but I was referring to C++ as the
alternative. There you have the option of the raw, seat-of-your-pants
C style of programming or, through another layer of indirection, the
Pascal bounds-checking style. The choice is up to the developer.
 
Peter J. Holzer

Bounds checking is nice and all, but it certainly is no panacea.
It may even not be *that* useful IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception.

This IS bounds checking.
2. Bounds checking. You read or write data outside bounds. It generates
an 'out of bounds' exception.

Not that much different.

Right. There is much difference between bounds checking and bounds
checking.
(All implementations where it doesn't always generate an exception, or
worse, where it can lead to code execution, are brain-dead IMO, but
that's another story. Thus, it's not a problem of bounds checking or
not.)

But it is. If a bounds violation doesn't generate an exception, the
implementation obviously doesn't do bounds checking. If a bounds
violation does generate an exception the implementation does check
bounds (at least in some cases).

hp
 
Kelsey Bjarnason

[snips]

A factor of ten slowdown is no problem, since I use it only when
debugging.

The objective is to use it at runtime since the speed penalty is not
great.

An impact of almost 10% would be intolerable in many situations, thank you
very much.
In my implementation:

char *str = (char *)String;

Oh goody - modifiable, directly in the object. Now you have to trap every
single pointer operation I might ever choose to do to ensure I don't
modify, say, the length. Or free the buffer. Or whatever.
 
