Bounds checking and safety in C


jacob navia

We hear very often in this discussion group that
bounds checking, or safety tests in general, are too
expensive to be used in C.

Several researchers at UCSD have published an interesting
paper about this problem.

http://www.jilp.org/vol9/v9paper10.pdf

Specifically, they measured the overhead of a bounds-checking
implementation compared to a normal one, and found that the
overhead can be reduced to a mere 8.3% in some cases...

I quote from that paper:

< quote >
To summarize, our meta-data layout coupled with meta-check instruction
reduce the average overhead of bounds checking to 21% slowdown which is
a significant reduction when compared to 81% incurred by current
software implementations when providing complete bounds checking.
< end quote>

This 21% slowdown is the overhead of checking EACH POINTER
access, and each (possible) dangling pointer dereference.

If we extrapolate to the alleged overhead of using some extra
arguments to strcpy to allow for safer functions (the "evil
empire" proposal) the overhead should be practically ZERO.
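To make the "extra size argument" idea concrete, here is a minimal sketch of what such a safer copy function could look like. The name `bounded_strcpy` and its interface are my illustration, not an existing library call:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical safer copy: the caller passes the destination's size,
   and the function refuses to write past it.
   Returns 0 on success, -1 if src (plus its terminator) would not fit. */
int bounded_strcpy(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);
    if (dstsize == 0 || len >= dstsize)
        return -1;                 /* would overflow: copy nothing */
    memcpy(dst, src, len + 1);     /* +1 copies the terminating '\0' */
    return 0;
}
```

The only overhead over plain strcpy is the length computation and a single comparison, which is negligible next to the copy itself.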

Somehow, we are not realizing that with the extreme power of the
CPUs now at our disposal, it is a very good idea to try to
minimize the time we spend in the debugger when developing
software. A balance should be sought that improves the safety
of the language without overly compromising the speed of the
generated code.

I quote again from that paper:

< quote >
As high GHZ processors become prevalent, adding hardware support to
ensure the correctness and security of programs will be just as
important, for the average user, as further increases in processor
performance. The goal of our research is to focus on developing
compiler and hardware support for efficiently performing software checks
that can be left on all of the time, even in production code releases,
to provide a significant increase in the correctness and security of
software.

< end quote >

The C language, as it is perceived by many people here, seems
frozen in the past without any desire to incorporate the changing
hardware/software relationship into the language itself.

When these issues are raised, the "argument" most often presented is
"efficiency", or just "it is like that".

This has led to the language being perceived as backward and error
prone, good only for outdated software or "legacy" systems.

This again pleases the C++ people, who insist on seeing their language
as the "better C"; and obviously C++ is much better than C in some
ways, especially string handling, the common algorithms in the STL, and
many other advances.

What strikes me is that this need not be so, since C could, with
minimal improvements, be a much safer and more general-purpose language
than it is now.

Discussion of this possibility is nearly impossible, since no widely
read forum about C (besides this newsgroup) exists.

Hence this message.

To summarize:

o Bounds checking and safer, language-supported constructs are NOT
made impossible by excessive overhead.
o Constructs like a better run-time library could be implemented in a
much safer manner if we redesigned the library from scratch, without
any effective run-time cost.


jacob

P.S. If you think this article is off topic, please just ignore it.
I am tired of these stupid polemics.
 

Richard

Richard Heathfield said:
jacob navia said:
<snip>

The C Standard neither requires nor forbids bounds checking. A strictly
conforming program will violate no bounds, and so presumably will not
be able to detect the existence of a bounds checker. Therefore, it's
perfectly acceptable for an implementation to incorporate this feature.
And indeed some do, although typically only in debug mode, for what I
hope are obvious reasons. This is entirely a QoI issue.

<snip>

A conforming program can still have bugs. Or?

--
 

Richard Heathfield

jacob navia said:
We hear very often in this discussion group that
bounds checking, or safety tests are too expensive
to be used in C.

The C Standard neither requires nor forbids bounds checking. A strictly
conforming program will violate no bounds, and so presumably will not
be able to detect the existence of a bounds checker. Therefore, it's
perfectly acceptable for an implementation to incorporate this feature.
And indeed some do, although typically only in debug mode, for what I
hope are obvious reasons. This is entirely a QoI issue.

<snip>
 

Richard Heathfield

Richard said:
A conforming program can still have bugs. Or?

I actually said "strictly conforming program". A strictly conforming
program does not contain any instances of undefined behaviour. (If it
did, it would not be strictly conforming.) Therefore, it cannot violate
any bounds.
 

Richard

Richard Heathfield said:
Richard said:


I actually said "strictly conforming program". A strictly conforming
program does not contain any instances of undefined behaviour. (If it
did, it would not be strictly conforming.) Therefore, it cannot violate
any bounds.

How does a program get so certified?
 

Bjoern Vian

Richard said:
I actually said "strictly conforming program". A strictly conforming
program does not contain any instances of undefined behaviour. (If it
did, it would not be strictly conforming.) Therefore, it cannot violate
any bounds.

Ok, but that is completely irrelevant for programming practice;
it's pure theory.
 

Ian Collins

jacob said:
Somehow, we are not realizing that with the extreme power of the
CPUs now at our disposal, it is a very good idea to try to
minimize the time we stay behind the debugger when developing
software. A balance should be sought for improving the safety
of the language without overly compromising the speed of the
generated code.

As Richard H. pointed out, this is a QoI issue. For many years I have
been using a development environment that supports run-time bounds and
leak checking, and I probably wouldn't use one that didn't.

There are alternatives to C if you want performance and better memory
safety.
 

Guillaume

Bounds checking is nice and all, but it certainly is no panacea.
It may not even be *that* useful IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception. (All implementations where it doesn't always
generate an exception, or worse, where it can lead to code execution, is
brain-dead IMO, but that's another story. Thus, it's not a problem of
bounds checking or not.)

2. Bounds checking. You read or write data outside bounds. It generates
an 'out of bounds' exception.

Not that much different. In both cases, you need to handle the
exception. How you must handle it in its particular context really *is*
the main issue here, and the main difficulty.

"Manual" bounds checking here and there in your code can be useful -
mostly because you know why you want to check that at this point and how
you're gonna handle the occasional out-of-bounds cases.

But systematic bounds checking? I don't believe in that.
Your opinion may vary. But I'm waiting for strong arguments.
 

jacob navia

Bjoern said:
Ok, but that is completely irrelevant for programming practice;
it's pure theory.

It is not even theory.

What he is saying is:

"A strictly conforming program does not contain any instances of
undefined behavior."

What this abstraction brings us in useful consequences is
zero, since nowhere is it specified how to prove or disprove
that a given program is strictly conforming!

But let's close this parenthesis. Heathfield posted that
message 9 minutes after I posted mine, with some "bla bla"
without substance. He did not read the article by those
researchers, and he addresses NONE of the issues I raised.

Please let's return to those issues!

jacob
 

Richard Heathfield

Bjoern Vian said:
Ok, but that is completely irrelevant for programming practice;
it's pure theory.

Well, I disagree. The practical relevance is immediate and vital:
*because* a strictly conforming program does not violate any bounds, a
bounds checker that does not disturb such a program is *necessarily*
tolerated by the Standard. That is, an implementation may offer
bounds-checking as part of its attempt to secure competitive advantage,
without abrogating its claim to ISO conformance.

So if the question is "should C be changed to allow bounds checking",
the answer is no, because C already allows bounds checking. It just
doesn't require it.

If the question is "should C be changed to /require/ bounds checking", I
would suggest that again the answer should be no, but this time from
the perspective of free market choice. Let those who wish to have
bounds-checking use implementations that make it possible, and let
those who do not wish to have bounds-checking use implementations that,
at the very least, don't make it impossible to switch it off.
 

Keith Thompson

Bjoern Vian said:
Ok, but that is completely irrelevant for programming practice;
it's pure theory.

I agree that the category of "strictly conforming programs" is too
narrow to be particularly useful to programmers. (It's useful
primarily in defining conforming implementations, I think.)

However, bounds checking affects behavior only for programs that
already exhibit undefined behavior. For non-buggy programs, bounds
checking should have no effect other than performance. For buggy
programs, bounds checking can reveal the bugs (that's the whole
point). I suppose the most sensible thing to do if a check fails is
to abort the program, given C's lack of exception handling.

On the other hand, there are some presumably valid C constructs that
could break in the presence of bounds checking, such as the classic
"struct hack" and code that assumes two-dimensional arrays can be
accessed as one-dimensional arrays.
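The two constructs Keith mentions can be illustrated side by side. The C90 "struct hack" deliberately indexes past a declared one-element array, which is exactly what a bounds checker would flag; its C99 codification, the flexible array member, needs no over-indexing (the `msg` types and `msg_new` helper here are my illustration):

```c
#include <stdlib.h>

/* C90 "struct hack": data is declared with one element, but the
   allocation provides room for more; indexing past data[0] is what
   a bounds checker would (per the letter of C90, correctly) flag. */
struct msg_hack {
    size_t len;
    char data[1];
};

/* C99 codification: a flexible array member, well defined without
   any over-indexing trick. */
struct msg {
    size_t len;
    char data[];        /* C99 flexible array member */
};

struct msg *msg_new(size_t len)
{
    /* allocate the struct plus len bytes for the flexible member */
    struct msg *m = malloc(sizeof *m + len);
    if (m != NULL)
        m->len = len;
    return m;
}
```

As Richard notes later in the thread, the C99 form required a syntax change precisely because the C90 hack was not valid C.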
 

jacob navia

Ian said:
There are alternatives to C if you want performance and better memory
safety.

Well, that is precisely the issue here.

I think that C should address those problems instead of
telling people:

"C is an obsolete language. Please use another one".
 

Richard Heathfield

Richard said:
How does a program get so certified?

"Certified", I wouldn't know about. But the definition of "strictly
conforming" is not a secret:

"A strictly conforming program shall use only those features of
the language and library specified in this Standard. It shall
not produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit."
 

Ian Collins

jacob said:
Well, that is precisely the issue here.

I think that C should address those problems instead of
telling people:

"C is an obsolete language. Please use another one".

No, it doesn't say that. There is nothing to stop a tool vendor
providing some form of access and leak checking tool. From my
perspective, such a feature is an indication of the quality of their tools.
 

jacob navia

Guillaume said:
Bounds checking is nice and all, but it certainly is no panacea.
It may even not be *that* useful IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception. (All implementations where it doesn't always
generate an exception, or worse, where it can lead to code execution, is
brain-dead IMO, but that's another story. Thus, it's not a problem of
bounds checking or not.)

NO, in most cases writing beyond a variable's specified length doesn't
produce any exception. You are dreaming! Most C implementations will
not do any bounds checking whatsoever.

Consider this program:

int fn(int *p, int c)
{
    return p[c];
}

int main(void)
{
    int tab[3];

    int s = fn(tab, 3); /* reads one past the end of tab */
    return s;
}

Please tell me a compiler system where this program generates an
exception.

Guillaume said:
2. Bounds checking. You read or write data outside bounds. It generates
an 'out of bounds' exception.

Not that much different. In both cases, you need to handle the
exception. How you must handle it in its particular context really *is*
the main issue here, and the main difficulty.

Yes. The consequence is that we need an exception handling
capability!
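For what it's worth, a checking implementation catches exactly this case by carrying the array's extent along with the pointer. A hand-rolled sketch of the idea (the `int_span` type and `fn_checked` are my illustration; real bounds checkers do this bookkeeping in the compiler):

```c
#include <stdio.h>
#include <stdlib.h>

/* A "fat pointer": the pointer travels together with its element
   count, so the callee can validate every index before using it. */
struct int_span {
    int *p;
    size_t n;
};

/* Checked version of fn(): aborts on an out-of-bounds index,
   since C has no exception mechanism to throw. */
int fn_checked(struct int_span s, size_t c)
{
    if (c >= s.n) {
        fprintf(stderr, "index %zu out of bounds (size %zu)\n", c, s.n);
        abort();
    }
    return s.p[c];
}
```

With this, calling `fn_checked` with index 3 on a 3-element span reliably aborts instead of silently reading past the array.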
 

dan

jacob navia said:
We hear very often in this discussion group that
bounds checking, or safety tests are too expensive
to be used in C.

<snip>

Is it reasonable to suggest that if you don't think C is safe, or
needs bounds checking, then develop or use another language that is
safer? Or use a tool / development environment that does bounds
checking. I've used Java and C# a lot and really like them. But C is a
different kind of language. One thing I really like about C is that
it does NOT impose extra overhead, such as mandatory bounds checking,
that the programmer is stuck with. I don't think it makes sense to
make suggestions that would radically change the nature of an existing
programming language.

Daniel Goldman
 

jacob navia

Ian said:
No, it doesn't say that. There is nothing to stop a tool vendor
providing some form of access and leak checking tool. From my
perspective, such a feature is an indication of the quality of their tools.

Yes, there are a lot of tools, and their existence PROVES the gaping
hole in the language. The problem is that developing such tools
is very system specific, and in most cases they are not available.

o They are NOT available on many embedded platforms.
o They are NOT available on many respected systems like
(for instance) AIX or Sun, as far as I know. Under Linux
you have valgrind, but it is quite heavy to use.

What I am aiming at is a general language construct that would
allow easier compiler checking with the existing toolset,
i.e. the compiler and the linker.
 

Richard Heathfield

jacob navia said:
It is not even theory

What he is saying is:

"A strictly conforming program does not contain any instances of
undefined behavior."

What this abstraction brings us in useful consequences is
zero, since nowhere is it specified how to prove or disprove
that a given program is strictly conforming!

Please learn to read for comprehension. The reason I mentioned strictly
conforming programs at all was to make it clear just why C
implementations are already allowed to offer bounds-checking, even
though the Standard doesn't mention it.
But let's close this parenthesis. Heathfield posted that
message 9 minutes after I posted mine,

Yes, that's right. It's not difficult to see what you're trying to say:
"wouldn't bounds checking in C be great?" And my answer is equally
simple: "if you want it, you can already have it *today*, and if you
don't want it, you don't have to have it". From a C language
perspective, what else is there to say?
 

Richard Heathfield

Keith Thompson said:
I agree that the category of "strictly conforming programs" is too
narrow to be particularly useful to programmers. (It's useful
primarily in defining conforming implementations, I think.)

And that's how I was using it in this thread - to illustrate that you
can have bounds-checking conforming implementations right now if you
want.

On the other hand, there are some presumably valid C constructs that
could break in the presence of bounds checking, such as the classic
"struct hack" and code that assumes two-dimensional arrays can be
accessed as one-dimensional arrays.

Do you have any examples of valid C constructs that are actually valid?
Both the struct hack and the 2d/1d hack are actually invalid C. And
before you start: the codification of the struct hack in C99 involved a
syntax change, so you can put /that/ Get Out Of Jail Free card back in
the pack! :)
 

William Hughes

Ian Collins said:
There are alternatives to C if you want performance and better memory
safety.

I am not sure what you are saying here. Are you claiming that
among the existing implementations there is an implementation
of another language that gives you performance and better
memory safety than any existing implementation of C,
or are you claiming that there is another language which gives
performance and better memory safety than any possible
implementation of C (and in that case, is the claim that the
performance is comparable to that of C)? Or do you mean something
else?

- William Hughes
 
