Bounds-checked arrays


jacob navia

As everybody knows, the C language lacks
a way of specifying bounds-checked arrays.

This situation is intolerable for people who know
that errors are easy to make, and asking today's
powerful microprocessors to execute a few more
instructions at each array access will make no
noticeable difference in speed.

Not all C applications are real-time apps.

Besides, there are the viruses
and other malicious programs that exploit
this weakness in the C language to do their dirty
work.

Security means that we contain the consequences
of mistakes and expose them as soon as possible.

It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests each index
for well-formedness. Each index runs from zero to
n-1, i.e. it must be greater than or equal to zero
and less than "n".

For an array of "n" dimensions, the compiler would
emit code that tests all "n" indices before using
them.

Obviously, optimizations are possible, and
good compilers will optimize away many tests,
especially in loops (a sketch follows). This is
left unspecified.
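
For instance, a compiler that can prove the range of the index
up front might hoist the check out of the loop entirely. A rough
sketch of that transformation (illustrative only, not something
the proposal mandates):

#define N 100

/* Naive form: the proposal would have the compiler check every
   access, i.e. emit "if (i < 0 || i >= N) goto indexerror;". */
void clear_naive(int a[N], int len)
{
    for (int i = 0; i < len; i++)
        a[i] = 0;               /* one check per iteration */
}

/* Hoisted form: validate the whole range once, then run the
   loop without any per-iteration test. */
void clear_hoisted(int a[N], int len)
{
    if (len < 0 || len > N)
        return;                 /* stands in for "goto indexerror" */
    for (int i = 0; i < len; i++)
        a[i] = 0;               /* provably within bounds */
}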

What matters is the guarantee that array updates
can't overflow into neighboring memory areas.

How many machine instructions does this cost?

Each test is a comparison of an index with a
constant value, and a conditional jump. If the
compiler only emits forward branches, the
branch predictor can correctly predict that in
most cases the branch will NOT be taken.

In abstract assembly this is 4 instructions:

    test if index >= 0
    jump if not to "indexerror"
    test if index < "n"
    jump if not to "indexerror"

where "n" is a compile-time constant.

We have something like 4 cycles then, which
a 2 GHz machine executes in about 2 nanoseconds
(0.000 000 002 seconds).

Yes, table access is a common operation, but
it would take millions of those to slow the program
down by even a negligible amount of time. We are not
on the PDP-11 any more.

This would make C a little bit easier to program in,
and the resulting programs of better quality.
Buffer overflows would still happen, of course, but the
language would limit their consequences by enforcing
the bounds.

By default the behavior is to stop the program.
The user can override this, and different schemes
can be specified by him/her to take action when
a buffer overflow happens.

A simple strategy is to just do nothing.

#include <stdio.h>    /* BUFSIZ */
#include <stdbool.h>  /* bool */

int fn(char *input)
{
    char tmpbuf[BUFSIZ];
    int i = 0;
    bool result = false;

    while (*input) {
        tmpbuf[i++] = *input++; /* checked access: overflow jumps to indexerror */
    }
    // Do things with the input
    // set result
    return result;
indexerror:                     /* reached only via the compiler-emitted check */
    return false;
}

This function uses the built-in error checking
to avoid any bad consequences of an overflow.
If the input data is too long, it is malformed
input that should be discarded.

This frees the programmer from the tedious task
of writing

if (i >= sizeof(tmpbuf)) goto indexerror;

at EACH array access. This is done better by the
machine, i.e. the compiler.

Because a program like that today
***assumes*** the input length
can't be bigger than BUFSIZ.

This is always *implicitly* assumed and
nowhere *enforced*, by the way. The current
state of affairs means that catastrophic errors can
happen when the index starts overwriting separate
memory areas, like the return address...

Everyone knows this. Let's do something to
stop it. Something simple, without too much
fuss.

In this case the compiler generates code that,
in case of an index error,
jumps to this label and does what the programmer
specifies.

The motto of C is: trust the programmer.

We just have to allow him/her to specify what to do
in case of overflow.

Trusting the programmer doesn't mean that we trust
that he/she never makes a mistake, of course. It means
that the programmer can specify what actions
to take in case of error, with sensible defaults
provided.

The default, then, is to terminate the program, like
the assert() macro, another useful construct.

Note that this proposal doesn't change anything
in the language itself. No new constructs are needed,
even if compilers could provide arrangements like the
one proposed above.

I propose then:

#pragma STDC bounds_checking(ON/OFF)

which should be written outside of any function scope.

That's all.

This proposal is an invitation to
brainstorming. :)

I know that anyone using C is aware of this.
So, let's fix it.

jacob
 

osmium

jacob said:
In arrays of dimension "n", the compiler would
emit code that tests "n" indices, before using
them.

n is, in general, unknown in functions that have arrays passed to them.
This presents serious problems to your scheme.
 

jacob navia

osmium said:
n is, in general, unknown in functions that have arrays passed to them.
This presents serious problems to your scheme.

This must be changed. You write:

int fn(int array[2][3]);

meaning that this function accepts only arrays of
two rows by three columns. Inside the
function the dimensions are known.
Or you use *bounded* pointers.
You write:

int fn(size_t lines, size_t cols, int array[lines][cols]);

This information must be passed around; a sketch follows.
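
A minimal sketch of the second form in standard C99 (the explicit
test in get() stands in for the check the proposed pragma would
make the compiler emit; the function names are mine):

#include <stddef.h>
#include <stdlib.h>

/* The dimensions travel with the array, so they are known inside. */
int sum(size_t lines, size_t cols, int array[lines][cols])
{
    int total = 0;
    for (size_t i = 0; i < lines; i++)
        for (size_t j = 0; j < cols; j++)
            total += array[i][j];   /* i < lines and j < cols hold here */
    return total;
}

/* Hand-written equivalent of the injected check: */
int get(size_t lines, size_t cols, int array[lines][cols],
        size_t i, size_t j)
{
    if (i >= lines || j >= cols)
        abort();                    /* default behavior: stop the program */
    return array[i][j];
}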
 

Martin Dickopp

jacob navia said:
As everybody knows, the C language lacks a way of specifying bounds
checked arrays.

Yes, but I don't think anything in the C standard /forbids/ an
implementation from checking array bounds. From the point of view
of the standard, accessing an out-of-bounds array element causes
undefined behavior, so the implementation is free to (e.g.)
terminate the program.

<OT>
FWIW, there is or was an attempt to implement bounds checking in the
GNU C compiler. I don't know what its current state is.
</OT>
It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

I disagree. While I would find bounds checking very useful, I consider
it a quality of implementation issue, not something which should be
standardized. Remember that there are also C compilers for embedded
devices, which often operate under severe memory constraints (the
devices, not the compilers) and cannot (and need not) afford the
additional bounds checking overhead. Mandatory bounds checking in the
standard would force these compilers to implement it nevertheless.

Martin
 

jacob navia

Martin Dickopp said:
Yes, but I don't think anything in the C standard /forbits/ an
implementation to check array bounds. From the point of view of the
standard, accessing an out of bounds array element causes undefined
behavior, so the implementation is free to (e.g.) terminate the program.

Yes. That's precisely my proposal.
<OT>
FWIW, there is or was an attempt to implement bounds checking in the
GNU C compiler. I don't know what the current state is.


I disagree. While I would find bounds checking very useful, I consider
it a quality of implementation issue, not something which should be
standardized. Remember that there are also C compilers for embedded
devices, which often operate under severe memory constraints (the
devices, not the compilers) and cannot (and need not) afford the
additional bounds checking overhead. Mandatory bounds checking in the
standard would force these compilers to implement it nevertheless.

Very easy. In that case, the user simply does NOT write

#pragma ...

in the code. That is all.

Normally, in embedded devices the code is in flash/EPROM, and
memory is constrained for data (RAM) rather than for code.

This means that the 4 extra assembler instructions at each access
do not produce a memory-starvation situation.

But if code size is critical, or performance constraints
make this #pragma impossible, those implementations simply
do not accept the pragma.

As I said before, not ALL applications in C run
in 2K of RAM. Let's not impose the lowest common
denominator on all programs.

Programs that need this feature should be able to use it
in *standard* C.
 

Malcolm

jacob navia said:
As everybody knows, the C language lacks
a way of specifying bounds checked arrays.
There's a sub-thread on this (size of a sizeof(pointer)).

An implementation is allowed to create safe pointers that contain
bounds information. For complete safety it also has to put memory
accesses through extra gyrations to prevent corruption of the
pointer itself. However, very few implementations do so, presumably
for efficiency reasons.
 

Martin Dickopp

jacob navia said:
Very easy. In that case, the user simply does NOT write

#pragma ...

in the code. That is all.

Yes, but the /compiler/ would still have to implement it.
Programs that need this feature should be able to use it
in *standard* C.

You can always try to lobby the committee members. However, according to
my understanding of the standardization process, it is unlikely that the
committee will consider something unless at least some implementations
already offer it as an extension.

Martin
 

Nick Landsberg

jacob said:
As everybody knows, the C language lacks
a way of specifying bounds checked arrays.

This situation is intolerable for people who know
that errors are easy to make, and asking today's
powerful microprocessors to execute a few more
instructions at each array access will make no
noticeable difference in speed.

Not all C applications are real-time apps.

But for those applications which are real-time
apps, the overhead for the bounds checking may
well be intolerable.
Besides, there are the viruses
and other malicious programs that exploit
this weakness in the C language to do their dirty
work.

Actually, they are exploiting the undisciplined coding
practices of dilettante C coders to do their
dirty work.
Security means that we avoid the consequences
of mistakes and expose them as soon as possible.

Yes, if possible at compile time. Otherwise, institute
real coding practices (rather than bogus ones which
say "use fixed arrays rather than malloc") and have
the code inspected by experts. A flag in the compiler
to do as much strict bounds checking as possible at
compile time would go part of the way to this end.
It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests each index
for well-formedness. Each index runs from zero to
n-1, i.e. it must be greater than or equal to zero
and less than "n".

For an array of "n" dimensions, the compiler would
emit code that tests all "n" indices before using
them.

Obviously, optimizations are possible, and
good compilers will optimize away many tests,
especially in loops. This is left unspecified.

What matters is the guarantee that array updates
can't overflow into neighboring memory areas.

As someone else pointed out, the calling sequences
for all subroutine calls may have to change
to pass the limits. If not, then the implementation
may have to pass these without the programmer knowing
about it. The C language has a long-standing tradition
that there is a well-known "sentinel", i.e. the NUL
character, which indicates the end of the array for
character types.

Of more importance, what should the behaviour be
when array bounds are exceeded? You are proposing
a new standard behaviour which includes bounds checking.
(This, as someone else pointed out, would be an extension
to the language and would probably not be considered
unless there was at least one existing implementation,
an "existence proof.")

What should be the "standard behaviour" if the bounds
checks fail? Proposing a solution to a perceived
problem without proposing the appropriate behaviour
when something like that happens is, IMO, half
a solution.

There are many languages out there which perform
bounds checking. Elsethread it was noted that there
are many languages which do not have pointers. There
is a need for such languages, otherwise they would
not exist, but they are NOT C.
How many machine instructions does this cost?

Each test is a comparison of an index with a
constant value, and a conditional jump. If the
compiler only emits forward branches, the
branch predictor can correctly predict that in
most cases the branch will NOT be taken.

In abstract assembly this is 4 instructions:

    test if index >= 0
    jump if not to "indexerror"
    test if index < "n"
    jump if not to "indexerror"

where "n" is a compile-time constant.

We have something like 4 cycles then, which
a 2 GHz machine executes in about 2 nanoseconds
(0.000 000 002 seconds).

Yes, table access is a common operation, but
it would take millions of those to slow the program
down by even a negligible amount of time. We are not
on the PDP-11 any more.

I differ with your analysis of the number of assembly
instructions, but that's a nit-pick. I work on systems
which need to do upwards of 10,000 database lookups
per second and the same order of magnitude of string
parsing, etc. They involve copying information from one
memory space to another. For those applications we
use C. For applications with less stringent requirements,
e.g. "only" 1,000 database accesses per second, we use Java.

C has its place, Java has its place, other languages
have their place.

On the C applications, we take great care NOT to use
dubious constructs, and code reviews by the lead
developers are required before the code even gets
to system test. (No, this does not catch all the
problems.) There is a discipline involved.
If you don't have that discipline, use another
language. (This last is not meant as a flame,
rather as a simple statement of fact.)

This would make C a little bit easier to program,
and the resulting programs of better quality.

[Much Snipped]
 

Nick Landsberg

Apology below.

What should be the "standard behaviour" if the bounds
checks fail? Proposing a solution to a perceived
problem without proposing the appropriate behaviour
when something like that happens is, IMO, half
a solution.

Whoops ... my apologies ... I just checked your original
post, and you do propose a solution: "stop the
program" when this check fails (or "do nothing",
whatever that means).

On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot error,
software upgrade, is not to exceed 53 minutes a year,
which is colloquially quoted as "4-9's" or 99.99%. For
installations which require higher availability,
we duplex everything.)

Thus, given the option of "stopping" the program
when array bounds are exceeded
as the proposed behaviour, it would not help me
in my projects. I would still have to enforce the
coding discipline a priori in order to ensure
that no bounds checks were ever exceeded.
This is what we do now.

 

nrk

jacob said:
osmium said:
n is, in general, unknown in functions that have arrays passed to them.
This presents serious problems to your scheme.

This must be changed. You write:

int fn(int array[2][3]);

meaning that this function accepts only arrays of
two rows by three columns. Inside the
function the dimensions are known.
Or you use *bounded* pointers.
You write:

int fn(size_t lines, size_t cols, int array[lines][cols]);

This information must be passed around.

The proposal is worthy, and I am sure it has been considered by wise
folks in the past. IMHO, the only realistic way of doing this is to
ensure that pointer types somehow implicitly contain all the relevant
information needed for bounds checking. However, this automatically
means that all pointer accesses will slow down. In a naive
implementation, all memory access through pointers might slow down by
approximately 50% on average (neglecting cache effects). Even if you
optimize your checks smartly, I doubt the performance hit will come
down by much. C has traditionally been the language that's close to
the "metal", and more and more it is becoming the language you use
when performance is critical. IMO, if you want that kind of protected
environment, you might be better off using languages such as Java
that provide it implicitly.

If you want some degree of fast+safe, C++ is turning out to be a
better alternative here. If one uses the STL wisely and extensively,
the need for manual dynamic memory management comes down drastically.
Additionally, the provision of a good built-in string type reduces a
lot of common errors that you find in C programs to non-issues.

I think it is always a bad idea to try to enforce safety (or other
non-functional properties) by relying on what the user must do (for
instance, your suggestion that the size must be passed around is a
monstrosity in my view). The best solutions are those that are
invisible and omnipresent.

Or, to paraphrase the Isha Upanishad to fit the bottom line :)

It moves, and it does not move
It is far, and it is near
It is within all this, and it is also outside all this.

Just my $0.02.

-nrk.
 

Sidney Cadot

nrk said:
The proposal is worthy and am sure has been considered by wise folks in the
past. IMHO, the only realistic way of doing this is to ensure that pointer
types somehow implicitly contain all the relevant information needed for
bounds checking. However, this automatically means that all pointer
accesses will slow down. In the naive implementation, all memory access
through pointers might slow down by approximately 50% on average
(neglecting cache effects).

<OT>

I did some benchmarking of the Java/HotSpot JIT compiler against
equivalent C code, and the amazing thing was that the bounds-checked
Java code (which compiled to quite decent assembly on the x86
platform, anyway) was only marginally slower than the C code (5-10%)
for tight inner loops.

It appeared that this was for two reasons. First, these
array-accessing loops are bound by memory access throughput (the
index check is very cheap, as the bound resides in a register);
second, on superscalar processors such as the x86, the bounds check
and the actual operation can be performed basically in parallel.

In short, I think we should be more concerned about semantics than about
performance impact.

Best regards,

Sidney
 

Nick Landsberg

Sidney Cadot wrote:

[snip]
<OT>

I did some benchmarking of the Java/HotSpot JIT compiler against
equivalent C code, and the amazing thing was that the bounds-checked
Java code (which compiled to quite decent assembly on the x86
platform, anyway) was only marginally slower than the C code (5-10%)
for tight inner loops.

It appeared that this was for two reasons. First, these
array-accessing loops are bound by memory access throughput (the
index check is very cheap, as the bound resides in a register);
second, on superscalar processors such as the x86, the bounds check
and the actual operation can be performed basically in parallel.

Please explain further, Sidney, possibly by email. I fail to
grasp the concept of bounds checks and actual operations
being performed in parallel. It's late at night and I need
sleep to function well. :)

Something must be linear, somewhere?

(yes, it's OT, that's why the request for email)
In short, I think we should be more concerned about semantics than about
performance impact.

Disagree, but it's almost a religious argument. See my posts regarding
high-volume systems elsethread. Even 10% may price me out of the
market. Then again, if everyone used the same implementation, we'd
be on a level playing field again (but the customers would not like
it! :)

Resident skeptic and professional paranoid.
 

Malcolm

Nick Landsberg said:
Whoops ... my apologies ... I just checked your original
post and you do propose a solution to "stop the
program" when this check fails (or "do nothing",
whatever that means).
"do nothing" would be replacing an attempt to write outside an array with a
nop. An attempt to read outside of bounds would return a set value.
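
A rough sketch of that policy in plain C (my illustration; 0 serves
as an arbitrary "set value"):

#include <stddef.h>

/* Out-of-bounds write: silently dropped (the "nop"). */
void checked_store(int *a, size_t n, size_t i, int v)
{
    if (i < n)
        a[i] = v;
}

/* Out-of-bounds read: yields a fixed value instead. */
int checked_load(const int *a, size_t n, size_t i)
{
    return (i < n) ? a[i] : 0;
}
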
On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot error,
software upgrade, is not to exceed 53 minutes a year,
which is colloquially quoted as "4-9's" or 99.99%.)
For most apps I don't see that you've got a choice. If there is a
software error causing an array overflow, you've got a choice between
no results or wrong results.
My own field of games is an exception: since the worst thing that is
likely to happen is that you spoil someone's enjoyment of a video
game, you can plough on and hope the program recovers.
 

Mark McIntyre

"do nothing" would be replacing an attempt to write outside an array with a
nop. An attempt to read outside of bounds would return a set value.

Wow!! That would be disastrous. Imagine trying to find someone's bank
balance, the height to fly an aircraft at, or the dosage of a
medicine that they needed, and getting some compiler-defined default
value....

"Your balance is 10,000, so we can pay that bad cheque "
"Fly at 10,000 feet. Don't worry about the Andes just in front of you"
"give em 10,000 ml of morphine, that ought to do"
For most apps I don't see that you've got a choice. If there is a software
error causing an array overflow you've got a choice between no results or
wrong results.

Sure, but the discussion here is between the compiler doing these
checks magically for you, and you doing them yourself. If your system
is mission-critical then frankly you should do them yourself, not
rely on the compiler implementor's quality control.
My own field of games is an exception, since the worst thing that is likely
to happen is that you spoil someone's enjoyment of a video game, you can
plough on and hope the program recovers.

ah, *that* explains a lot ! <gd&r>
 

nrk

Nick said:
Apology below.

Whoops ... my apologies ... I just checked your original
post, and you do propose a solution: "stop the
program" when this check fails (or "do nothing",
whatever that means).

On the systems that I mentioned in my original
reply, this is NOT an option. (Maximum down-time
from ALL causes, hardware, software, pilot error,
software upgrade, is not to exceed 53 minutes a year,
which is colloquially quoted as "4-9's" or 99.99%. For
installations which require higher availability,
we duplex everything.)

Thus, given the option of "stopping" the program
when array bounds are exceeded
as the proposed behaviour, it would not help me
in my projects. I would still have to enforce the
coding discipline a priori in order to ensure
that no bounds checks were ever exceeded.
This is what we do now.

Yes, the array bounds check is not going to be useful for what you
describe in your *production* environment. But in all your testing
phases, I would think it would add significant value. After thinking
about it a bit, I feel the proposal is indeed worthy even if you can
never use it in production environments for performance or other
reasons. The simple fact that I wouldn't have to spend ages tracking
down an off-by-one error somewhere in my code during the testing
phase is enough of an incentive to support such an initiative. As
Jacob suggests, it might be possible to build such a feature so that
it can be toggled at compile time.

Those who suggest that overflowing array bounds happens only to "bad"
programmers (actually, most suggestions of that kind are in the other
"Bounds checked string library" thread)... well, what can I say?

"He that is without sin among you, let him first cast a stone at her."

-nrk.

<snip>
 

Nick Landsberg

nrk said:
Yes, the array bounds check is not going to be useful for what you
describe in your *production* environment. But in all your testing
phases, I would think it would add significant value.
<snip>

What you say has merit. Even with all the discipline in the world,
some "bounds exceeded" bugs will get through. Upon further thought,
I would be in favor of a "debugging" mode where bounds checking
would be performed but which could be completely turned off
for the "production compile." This would help during individual
module development and the initial system-test phase. Note that by
"completely turned off" I mean that the compiler does not
emit the bounds-checking code, not just that it turns the
diagnostics off. This is for the performance reasons stated
elsethread.

We'd still run a full regression test of the "production compile"
anyway because, by our rather paranoid definition, this is
a different executable than the one which had bounds
checking turned on; thus a full regression test is still
necessary.
 

Nick Landsberg

A couple of points I forgot in the previous followup
(hit send too soon):

Nick said:
What you say has merit. Even with all the discipline in the world,
some "bounds exceeded" bugs will get through.
<snip>

One could even argue that compiling differently (with and
without bounds checks) makes it a different language
being compiled, with bounds checks being "not C"
(or is that !C? ... is !C == D?, ...
or should that be P instead of D?, but I digress. :)

I wouldn't go quite as far as saying it's a different
language, but I am sure there are people who would.

If it (bounds checking) is the default setting, then it would
benefit the uninitiated by finding their bugs earlier.
If it is not, then those same uninitiated (uninitialized? :)
programmers would never use it and there's no benefit for them.

Additional developer discipline would still have to be applied
to compile "with" for their functional tests and "without"
for their performance tests. This could be done with "make"
parameters, but that's OT here, and I won't get into that.
 

jacob navia

What you say has merit. Even with all the discipline in the world,
some "bounds exceeded" bugs will get through.

Nobody is perfect. Let's accept this. Nobody is perfect, and
only the machine has the patience to verify everything.
Upon further thought,
I would be in favor of a "debugging" mode where bounds checking
would be performed but which could be completely turned off
for the "production compile."

If you do not write

#pragma STDC bounds_checking(on)

no bounds checking is injected into the code.

In assert.h we could add that line when NDEBUG
is not defined, etc.; a sketch follows.
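
A sketch of that idea (hypothetical, of course, since the pragma
itself is only a proposal):

/* Hypothetical lines added to <assert.h> under this proposal: */
#ifdef NDEBUG
#pragma STDC bounds_checking(off)
#else
#pragma STDC bounds_checking(on)
#endif
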
This would help during individual
module development and the initial system test phase.

I wouldn't turn it off in many programs. Normal PC
programs are completely bound by OS calls and other
stuff. The index-checking overhead is so small that
nobody would notice.

Note that by
"completely turned off" I mean that the compiler does not
emit the bounds-checking code, not just that it turns the
diagnostics off. This is for the performance reasons stated
elsethread.

The code generator doesn't emit anything new if
no bounds check is specified.

Even where bounds checking is ON, there is no need to
check more than once an input array that is not "resized"
within the body of the function.
We'd still run a full regression test of the "production compile"
anyway because, by our rather paranoid definition, this is
a different executable than the one which had bounds
checking turned on; thus a full regression test is still
necessary.

Obviously. That is not paranoid but necessary. Four more
instructions increase the size and lower the speed of the program.
It is another executable.

A finer-grained decision can be made if you turn
bounds checking off only in the time-critical parts of the
program and leave the rest checked.

The big problem is the management of size information.

An obstacle to that is the lack of array assignment, and
array "decay", in C.
 

Nick Landsberg

jacob said:
"Nick Landsberg" <[email protected]> a écrit dans le message de




Nobody is perfect. Let's accept this. Nobody is perfect, and
only the machine has the patience to verify everything.

But the "machine" is also programmed by humans. In this
case, the compiler is programmed by humans.
If you do not write

#pragma STDC bounds_checking(on)

no bounds checking is injected into the code.

In assert.h we could add that line when NDEBUG
is not defined, etc.

I would not support this proposal with the #pragma,
because this means modifying the file which contains
the code. I would be more willing to support it if
it were a flag to the compiler. The reasons have
to do with code maintenance and paranoia on my part.
If the source file was modified, how does one guarantee
that the #pragma directive was the ONLY thing modified?
How does one guarantee that every developer properly
modified the #pragma directive prior to the "production
build"? One could write various scripts to do this,
I suppose, but that's outside of the language.
This issue transcends the language, but it is legitimate
to address when dealing with a software system having
several thousand directories and several tens of
thousands of source files. From my perspective,
if this is adopted, it should be a compile-time
flag rather than a pre-processor or compiler
directive in the source file.

Trial implementations may well decide to use the
#pragma, but from a code-maintenance standpoint
I feel it needs to be a compile-time flag.
I wouldn't turn it off in many programs. Normal PC
programs are completely bound by OS calls and other
stuff. The index checking overhead is so small that
nobody will notice.

Agreed. I am just viewing it from the rather
parochial standpoint of the kinds of projects
I have been working on for the past 10 years or so.
The code generator doesn't emit anything new if
no bounds check is specified.

Even where bounds checking is ON, there is no need to
check more than once an input array that is not "resized"
within the body of the function.

That's tricky code within the compiler. A braindead
implementation could choose to apply the bounds checks
for all instances. Since the standard has NEVER provided
performance guidelines, there is nothing to prevent
a compiler vendor from emitting that code for every array
access, other than a better compiler vendor doing
the very optimization you are speaking of. If the
subscript is computed "on the fly", I presume a bounds
check would be necessary even if the size of the array
did not change.
Obviously. That is not paranoid but necessary. Four more
instructions increase the size and lower the speed of the program.
It is another executable.

It's actually 6 or more instructions in most assemblers.

A finer-grained decision can be made if you turn
bounds checking off only in the time-critical parts of the
program and leave the rest checked.

The big problem is the management of size information.

An obstacle to that is the lack of array assignment, and
array "decay", in C.

And "C" has always let programmers shoot themselves in
the foot. Once upon a time, someone remarked to me
that "Pascal is like a nanny. If you want to go out and
play in the street, she insists that you put your galoshes
on in case it should rain. C, on the other hand, has the
attitude of 'don't blame me if you catch your death of cold!'"

Later.
 

Keith Thompson

jacob navia said:
It would be useful then, if we introduced into C

#pragma STDC bounds_checking(ON/OFF)

When the state of this toggle is ON, the compiler
would accept declarations (like now)

int array[2][3];

The compiler would emit code that tests each index
for well-formedness. Each index runs from zero to
n-1, i.e. it must be greater than or equal to zero
and less than "n".
[...]

To be useful, such bounds checking cannot be limited to explicitly
declared arrays. You've mentioned requiring explicit bounds on
function arguments, but that's only one special case. For example:

int arr[10];
int *ptr = arr;
...
int x = ptr[15];

The declaration of arr and ptr, the initialization of ptr, and the
evaluation of ptr[15] can be almost arbitrarily complex and can be
widely separated, even in separate source files. If you want to do
reliable bounds checking on ptr[15], you have to carry the bounds
information along with the pointer.

In other words, you need fat pointers.

The type of a pointer is known at compilation time, so you don't need
to store, for example, the fact that ptr points to, say, a 4-byte
type. You do need to store the lower and upper bounds of the object
into which it points. For example (let's call the raw address
returned by malloc() M):

char *ptr = malloc(100); /* ptr contains (M, 0, 100) */
ptr += 20; /* ptr contains (M+20, -20, 80) */
char c1 = ptr[-10]; /* ok, -10 is within the bounds */
char c2 = ptr[50]; /* ok, 50 is within the bounds */
char c3 = ptr[80]; /* failure, 80 is just beyond the upper bound */

Every pointer arithmetic operation has to update the bounds
information as well as the address, and can fail if the resulting
address points outside the original object. Every pointer dereference
operation has to check the bounds. The bounds information could be
stored either as byte offsets or as a count of the pointed-to type; I
have a hunch byte offsets would work better, but I'm not certain.

Finally, you still need to have a way to represent a pointer with no
associated bounds information, on which no checking can be done.
Unless you're designing a whole new system from scratch, some code in
C-with-bounds-checking ("C[]"?) will still have to deal with pointers
from system interfaces and from code written in C-without-bounds-checking.

Perhaps, rather than a "#pragma", bounds-checked pointers should be
specified with a new type qualifier.

Note that the presence or absence of bounds checking, however it's
specified, will change the sizes of pointer objects, which will
affect memory layouts, which will inevitably reveal or hide subtle
bugs. There will be programs that work correctly (and don't trigger
any checks) with bounds checking enabled, but fail mysteriously when
it's disabled.

Another possibility might be to store the bounds information
elsewhere, rather than in each pointer object. The system could keep
a table of all known objects, including base address and size, that's
updated whenever an object is created or destroyed. But then each
pointer operation would have to do a (hash?) lookup on the table,
which would slow things down even more.
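
To make the bookkeeping concrete, here is a rough sketch of such a
fat pointer in plain C. The struct, the names, and the
abort-on-violation policy are all illustrative; the offsets mirror
the (M, lo, hi) triples in the example above:

#include <stdio.h>
#include <stdlib.h>

/* A "fat" char pointer: an address plus the byte offsets of the
   object's bounds, relative to the current address. */
typedef struct {
    char *addr;
    long  lo;   /* lowest valid offset (inclusive)   */
    long  hi;   /* lowest invalid offset (exclusive) */
} fat_char_ptr;

static fat_char_ptr fat_malloc(size_t n)
{
    fat_char_ptr p = { malloc(n), 0, (long)n };
    return p;
}

/* Pointer arithmetic shifts the bounds in the opposite direction. */
static fat_char_ptr fat_add(fat_char_ptr p, long k)
{
    fat_char_ptr q = { p.addr + k, p.lo - k, p.hi - k };
    return q;
}

/* Every dereference checks the stored bounds. */
static char fat_index(fat_char_ptr p, long i)
{
    if (i < p.lo || i >= p.hi) {
        fprintf(stderr, "bounds violation: index %ld\n", i);
        abort();
    }
    return p.addr[i];
}

int main(void)
{
    fat_char_ptr ptr = fat_malloc(100); /* (M,    0, 100) */
    ptr = fat_add(ptr, 20);             /* (M+20, -20, 80) */
    fat_index(ptr, -10);                /* ok */
    fat_index(ptr, 50);                 /* ok */
    fat_index(ptr, 80);                 /* aborts: just past the end */
    return 0;
}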
 
