Bounds checking and safety in C

Ian Collins

jacob said:
Yes, there are a lot of tools, and their existence PROVES the gaping
hole in the language. The problem is that such tools are very
system-specific and in most cases not available.

o They are NOT available in many embedded platforms

You can test the code off-target.
o They are NOT available in many respected systems like,
(for instance) AIX, or SUN, as far as I know.

News to me, I've been using Sun's dbx with its access/leak checking for
many years.
What I am aiming at is a general language construct that would
allow easier compiler checking in the existing toolset,
i.e. the compiler and the linker.

Why impose the extra burden on the vendor when they can provide optional
tools?
 
jacob navia

dan said:
Is it reasonable to suggest that if you don't think C is safe, or
needs bounds checking, then develop or use another language that is
safer?

Excuse me but I do not see why I would need another computer
language to write:

int Strcmp(String s1,String s2);

Where String could be defined as:

typedef struct tagString {
size_t len;
char *chars;
};

Why can't we use this in C instead of being stuck with zero
terminated strings forever?

Why can't arrays be first class objects like structures that do NOT
decay into unbounded pointers eliminating all size information and
making size checks impossible?
Or use a tool / development environment that does bounds
checking. I've used Java and C# a lot and really like them.

Nice, but I think that C should give an answer to these problems,
should address these problems instead of saying to people

"Go away and use a real computer language"
But C is a different kind of language.

Really?

What kind of language then?

One thing I really like about C is that
it does NOT impose extra overhead, such as mandatory bounds checking,
that the programmer is stuck with.

Pascal has optional bounds checking that can be turned on/off with
specific programmer's instructions.

I don't think it makes sense to
make suggestions that would radically change the nature of an existing
programming language.

int Strcmp(String s1,String s2);

That radically changes the language?

Maybe, I do not know.

In any case if we go on like this the language is doomed.

I remember what happened when I suggested C last time at work.

"It would be inconvenient" people told me.

And that was it.
 
jacob navia

Ian said:
You can test the code off-target.

No. The code will run on target and you would have to
simulate the conditions on target, not always an easy task,
mind you.
News to me, I've been using Sun's dbx with its access/leak checking for
many years.

Impossible to use because the program will slow down by a factor
of 1,000 at least...
Why impose the extra burden on the vendor when they can provide optional
tools?

Because not every vendor can provide such tools, and small language
modifications would suffice for most bounds-checking
applications.
 
Keith Thompson

Guillaume said:
Bounds checking is nice and all, but it certainly is no panacea.
It may even not be *that* useful IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception. (All implementations where it doesn't always
generate an exception, or worse, where it can lead to code execution,
are brain-dead IMO, but that's another story. Thus, it's not a problem
of bounds checking or not.)
[...]

Are you under the impression that attempts to violate bounds in C
typically trigger an exception? In my experience, attempting to
access memory just beyond the bounds of an array *usually* results in
the program silently accessing some other memory, perhaps part of
another object.

Implementations typically catch attempts to access memory outside
what's allocated to a program, but accesses within that memory usually
aren't caught, even if they violate the bounds of the intended object.
 
Keith Thompson

Richard Heathfield said:
Bjoern Vian said:


Well, I disagree. The practical relevance is immediate and vital:
*because* a strictly conforming program does not violate any bounds, a
bounds checker that does not disturb such a program is *necessarily*
tolerated by the Standard. That is, an implementation may offer
bounds-checking as part of its attempt to secure competitive advantage,
without abrogating its claim to ISO conformance.

I agree with your conclusion, but not necessarily with your line of
reasoning. A bounds-checking implementation that doesn't disturb any
strictly conforming program might still be non-conforming if it breaks
some programs that are valid without being strictly conforming.

Implementations are required to accept strictly conforming programs,
but those aren't the *only* programs they're required to accept.

On the other hand, I suspect that a non-perverse bounds-checking
implementation that doesn't break any strictly conforming programs
would be valid anyway. I haven't thought it through, though.

[...]
 
Keith Thompson

jacob navia said:
It is not even theory

What he is saying is:

"A strictly conforming program does not contain any instances of
undefined behavior."

Of course; that's part of the definition of the term.
What this abstraction brings us in useful consequences is
zero, since nowhere is it specified how to prove or disprove
whether program "a" is strictly conforming!

No, that's not specified. What's the point?
But let's close this parenthesis. Heathfield posted that
message 9 minutes after I posted mine, with some "bla bla"
without substance. He did not read the article of those
researchers, and he addresses NONE of the issues I raised.

Please let's return to those issues!

Huh? You raised the issue of bounds checking in C. He confirmed that
it's already legal to implement bounds checking in C. That seems
relevant to me.
 
Richard Heathfield

Keith Thompson said:
I agree with your conclusion, but not necessarily with your line of
reasoning. A bounds-checking implementation that doesn't disturb any
strictly conforming program might still be non-conforming if it breaks
some programs that are valid without being strictly conforming.

That is certainly true, and I don't wish to give the impression that it
is not. I am only arguing that a bounds-checking implementation does
not, a priori, become a non-conforming implementation. That does not
mean that a bounds-checking implementation is necessarily conforming!
Implementations are required to accept strictly conforming programs,
but those aren't the *only* programs they're required to accept.
Right.

On the other hand, I suspect that a non-perverse bounds-checking
implementation that doesn't break any strictly conforming programs
would be valid anyway. I haven't thought it through, though.

"Non-perverse" being the operative term here, yes. Those implementors
who wish to gain competitive advantage by offering bounds-checking
would be well advised not to throw that advantage away by making their
implementations non-conforming in the process.
 
Ian Collins

jacob said:
No. The code will run on target and you would have to
simulate the conditions on target, not always an easy task,
mind you.
Well I still support a couple of embedded projects and I can't remember
the last time I did any debugging on the targets. Everything is
developed and unit tested on the host, even the acceptance test suite
can run against both the target and the host simulation.
Impossible to use because the program will slow down by a factor
of 1,000 at least...
It's nowhere near that bad. Yes, there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.
Because not every vendor can provide such tools, and small language
modifications would suffice for most bounds-checking
applications.

But they are not required everywhere. On many embedded targets, there
simply would not be space for the extra code.
 
dan

Excuse me but I do not see why I would need another computer
language to write:

int Strcmp(String s1,String s2);

Where String could be defined as:

typedef struct tagString {
size_t len;
char *chars;

};

Why can't we use this in C instead of being stuck with zero
terminated strings forever?

Why can't arrays be first class objects like structures that do NOT
decay into unbounded pointers eliminating all size information and
making size checks impossible?



Nice, but I think that C should give an answer to this problems,
should address this problems instead of saying to people

"Go away and use a real computer language"


Really?

What kind of language then?


Pascal has optional bounds checking that can be turned on/off with
specific programmer's instructions.


int Strcmp(String s1,String s2);

That radically changes the language?

Maybe, I do not know.

In any case if we go on like this the language is doomed.

I remember what happened when I suggested C last time at work.

"It would be inconvenient" people told me.

And that was it.

It would be great if bounds checking could be turned on or off in C.
But by your example, it's already possible to accomplish that within
the current C language. Instead of calling strcmp directly, just call
it within another function that does the bounds checking. If this is
important to you or someone else, why don't you write a version of the
C library that does bounds checking? My guess is someone else has
already done this. From your example, it seems like your argument is
really to get rid of 0 as the convention for ending a string. You
don't think that would be a radical change? Sounds pretty radical to
me.

real languages??? I never said that C# and Java were "real computer
languages", and that C is not. I just said they are different types of
languages. I've never heard anyone else on this group say that either.
Anyone who would say C is not a "real computer language" is just plain
stupid.

your place of work??? Maybe it would be inconvenient for them to use C
at your place of work. I have no idea. I don't see what that has to do
with the discussion.

what kind of language??? You ask "What kind of language then?" to my
statement that C is a different kind of language. If you're proposing
fundamental changes to C, and don't understand the huge differences
between C and Java/C# (I assume you really do know the differences),
you need to go back to the basics.

Doomed??? You say that "if we go on like this the language is doomed".
Have you looked at the TIOBE programming index lately? The last time
I looked, C was #2 or #3 in popularity! I don't think C is doomed.
And what if it is doomed, whatever that means? After all, it's just a
language. If C dies out (no indication of that at present), there
would be another (hopefully better) language to take its place. Maybe
someone will write a new language like C that will not use 0 to
terminate strings!

Daniel Goldman
 
Keith Thompson

Richard Heathfield said:
Keith Thompson said: [...]
On the other hand, there are some presumably valid C constructs that
could break in the presence of bounds checking, such as the classic
"struct hack" and code that assumes two-dimensional arrays can be
accessed as one-dimensional arrays.

Do you have any examples of valid C constructs that are actually valid?

I don't think that's *quite* what you meant.
Both the struct hack and the 2d/1d hack are actually invalid C. And
before you start: the codification of the struct hack in C99 involved a
syntax change, so you can put /that/ Get Out Of Jail Free card back in
the pack! :)

I actually can't think of any realistic examples off the top of my
head. But there are plenty of programs that aren't strictly
conforming but that a conforming implementation must accept. For
example, a program that prints the value of INT_MAX is not strictly
conforming.

What I have in mind in general is that a bounds-checking
implementation might make incorrect assumptions about when checks can
be removed or, more relevantly, when they can be proven during
compilation to fail. With certain sets of assumptions, no strictly
conforming programs would be affected, but some correct programs that
depend on implementation-defined behavior could be.

It seems fairly clear to me that such examples are theoretically
possible. If you're still not convinced, I can try to come up with
something more concrete.
 
Keith Thompson

jacob navia said:
Excuse me but I do not see why I would need another computer
language to write:

int Strcmp(String s1,String s2);

Where String could be defined as:

typedef struct tagString {
size_t len;
char *chars;
};

Quibble: you forgot the typedef name.
Why can't we use this in C instead of being stuck with zero
terminated strings forever?

We can, of course; who ever said we can't? A number of string
packages are available.

If you're suggesting making radical changes to the language, that
won't do any good until and unless such changes are incorporated into
the standard *and* widely implemented. In the best case, this will
take many years. In the worst case, it will never happen; witness the
lack of adoption of C99.

If you're suggesting dropping C's current support for zero-terminated
strings, that will absolutely never happen; it would break existing
code.
Why can't arrays be first class objects like structures that do NOT
decay into unbounded pointers eliminating all size information and
making size checks impossible?

Because making such a change would break existing code.

If you want a new array-like construct, either you can already
implement it in C, or it requires unrealistically drastic changes to the
language.

[...]

int Strcmp(String s1,String s2);

That radically changes the language?

Maybe, I do not know.

Of course not. That declaration, given an appropriate declaration of
String, is already perfectly legal C.

[...]
 
Ian Collins

Keith said:
Because making such a change would break existing code.

If you want a new array-like construct, either you can already
implement it in C, or it requires unrealistically drastic changes to the
language.
Those drastic changes have already been made, they're called C++ :)
 
Keith Thompson

dan said:
It would be great if bounds checking could be turned on or off in C.
But by your example, it's already possible to accomplish that within
the current C language. Instead of calling strcmp directly, just call
it within another function that does the bounds checking. If this is
important to you or someone else, why don't you write a version of the
C library that does bounds checking? My guess is someone else has
already done this. From your example, it seems like your argument is
really to get rid of 0 as the convention for ending a string. You
don't think that would be a radical change? Sounds pretty radical to
me.
[...]

A bounds checking implementation of the standard C library is not
possible, at least not without considerable compiler support.

For example, strcmp() just takes two pointers as arguments. Neither
strcmp() nor any theoretical wrapper function has any way of knowing
the sizes of the actual arrays (assuming that each pointer points to
the first element of an array).

Now a wrapper function that takes additional arguments that specify
the sizes of the arrays could check the bounds, but then the caller
would have to pass correct values for those arguments.

And of course changes to the runtime library don't help user code.

A bounds-checking *compiler* can keep track of the sizes of objects by
enhancing pointers with information about what they point to. Using
such a compiler to compile the library (at least a version of it)
(assuming it's written in C) should give you all the bounds checking
you need.
 
Chris Torek

We hear very often in this discussion group that
bounds checking, or safety tests are too expensive
to be used in C.

Some people think it is "too expensive", some do not.

Here are some references to existing bounds-checking implementations
for C compilers.

http://llvm.org/pubs/2006-05-24-SAFECode-BoundsCheck.pdf

http://gcc.gnu.org/ml/gcc/1998-05/msg00073.html

http://www.pgroup.com/

Of these, I think only the Portland Group's compilers are actually
available today (and they do bounds-checking by default, though you
can turn it off with a pragma; see the user's guide).

The relative un-common-ness of bounds checking implementations
does, I think, say something about the relative demand for such
implementations -- especially since it is so easy to put in a C
compiler, if one is designing one "from the ground" as it were.
 
dan

We hear very often in this discussion group that
bounds checking, or safety tests are too expensive
to be used in C.

Several researchers at UCSD have published an interesting
paper about this problem.

http://www.jilp.org/vol9/v9paper10.pdf

Specifically, they measured the overhead of a bounds
checking implementation compared to a normal one, and
found that the overhead can be reduced
to a mere 8.3% in some cases...

I quote from that paper

< quote >
To summarize, our meta-data layout coupled with meta-check instruction
reduce the average overhead of bounds checking to 21% slowdown which is
a significant reduction when compared to 81% incurred by current
software implementations when providing complete bounds checking.
< end quote>

This 21% slowdown is the overhead of checking EACH POINTER
access, and each (possible) dangling pointer dereference.

If we extrapolate to the alleged overhead of using some extra
arguments to strcpy to allow for safer functions (the "evil
empire" proposal) the overhead should be practically ZERO.

Somehow, we are not realizing that with the extreme power of the
CPUs now at our disposal, it is a very good idea to try to
minimize the time we stay behind the debugger when developing
software. A balance should be sought for improving the safety
of the language without overly compromising the speed of the
generated code.

I quote again from that paper:

< quote >
As high GHz processors become prevalent, adding hardware support to
ensure the correctness and security of programs will be just as
important, for the average user, as further increases in processor
performance. The goal of our research is to focus on developing
compiler and hardware support for efficiently performing software checks
that can be left on all of the time, even in production code releases,
to provide a significant increase in the correctness and security of
software.

< end quote >

The C language, as it is perceived by many people here, seems
frozen in the past without any desire to incorporate the changing
hardware/software relationship into the language itself.

When these issues are raised, the "argument" most often presented is
"Efficiency" or just "it is like that".

This has led to the language being perceived as backward and
error-prone, only good for outdated software or "legacy" systems.

This again pleases the C++ people, who insist on seeing their language
as the "better C", and obviously C++ is much better than C in some ways,
especially string handling, the common algorithms in the STL, and
many other advances.

What strikes me is that this need not be, since C could with minimal
improvements be a much safer and general purpose language than it is
now.

Discussion about this possibility is nearly impossible, since a widely
read forum about C (besides this newsgroup) does not exist.

Hence this message.

To summarize:

o Bounds checking and safer, language-supported constructs are NOT
impossible because of too much overhead
o Constructs like a better run time library could be implemented in a
much safer manner if we would redesign the library from scratch,
without any effective run time cost.

jacob

P.S. If you think this article is off topic, please just ignore it.
I am tired of these stupid polemics.

I read (or tried to read) the article. It does not suggest changing
the C language. I don't think you understood much of the article, if
you read their conclusions carefully. You totally missed the point
that the authors said hardware changes would be needed to make the
bounds checking not use excessive resources. Maybe it would have been
better if your original post was ignored, since it was so poorly
thought out. Nobody disagrees that messing up pointer and array
operations in C produces very nasty bugs, and that there is a need to
reduce the risk. But your idea about changing the C language to get
rid of strings ending with 0 is totally unworkable, because it would
break tons of existing code, as others have pointed out.

Daniel Goldman
 
CBFalconer

jacob said:
Excuse me but I do not see why I would need another computer
language to write:

int Strcmp(String s1,String s2);

Where String could be defined as:

typedef struct tagString {
size_t len;
char *chars;
};

Why can't we use this in C instead of being stuck with zero
terminated strings forever?

But you can. That thing is then of type String, which is quite
different from the type string. What's the problem?

Of course you may have problems with users, who don't know what a
String is, and can't just go to their reference books and look it
up.
 
Richard Bos

Bjoern Vian said:
Ok, but that is completely irrelevant for programming practice;
it's pure theory.

No, it isn't. If it _were_ possible for a strictly conforming program to
violate object bounds, a bounds checking implementation would be legal.
Since it is _not_ possible, a bounds checking implementation is legal
and, get this, on occasion very _practical_ to discover where your
program is not strictly conforming in a bounds-violating way.

Richard
 
jacob navia

Richard said:
No, it isn't. If it _were_ possible for a strictly conforming program to
violate object bounds, a bounds checking implementation would be legal.
Since it is _not_ possible, a bounds checking implementation is legal
and, get this, on occasion very _practical_ to discover where your
program is not strictly conforming in a bounds-violating way.

Richard

Are you a lawyer?

It looks like it.

Tell me, then, how can I know whether a program is
"strictly conforming"?

That is worse than the halting problem!

This whole rubbish is just to destroy any technical
discussion about the issues I raised, using this
pseudo technical language legalese.
 
Richard Heathfield

Richard Bos said:

If it _were_ possible for a strictly conforming program
to violate object bounds, a bounds checking implementation would be
legal.

ITYM "illegal". A mere typo, of course, but it completely reverses the
meaning of the sentence! (We've all done it.)
 
jacob navia

dan said:
I read (or tried to read) the article. It does not suggest changing
the C language.

I did not say it did. I cited some of their conclusions, specifically
those that proved that the overhead of bounds checking and dangling
pointer testing could be substantially reduced.
I don't think you understood much of the article,
???

if
you read their conclusions carefully. You totally missed the point
that the authors said hardware changes would be needed to make the
bounds checking not use excessive resources.

They do not say that. They say that hardware support would be
better to have, but that other techniques, such as storing the metadata
with the object instead of in the pointer, would speed things up
according to their simulations.
Maybe it would have been
better if your original post was ignored, since it was so poorly
thought out.

Yes, I see that you disagree with the post.
Nobody disagrees that messing up pointer and array
operations in C produces very nasty bugs, and that there is a need to
reduce the risk.

This contradicts your earlier sentence...

But your idea about changing the C language to get
rid of strings ending with 0 is totally unworkable, because it would
break tons of existing code, as others have pointed out.

I never said that I wanted to make zero terminated strings illegal.
I just propose that OTHER types of strings could be supported by
the language as well, nothing else.
 
