Bounds checking and safety in C

Kelsey Bjarnason

Guillaume said:
Bounds checking is nice and all, but it certainly is no panacea.
It may not even be *that* useful, IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception.

No. In most cases, writing beyond a variable's declared length doesn't
produce any exception.

Consider this program:
int fn(int *p,int c)
{
return p[c];
}

int main(void)
{
int tab[3];

int s = fn(tab,3);
}

Please name a compiler system where this program generates an
exception.

$gcc -fmudflap -lmudflap test.c
$./a.out
*******
mudflap violation 1 (check/read): time=1185910311.158143 ptr=0xbfe5f050 size=4
pc=0xb7e9f20d location=`test.c:3 (fn)'
/usr/lib/libmudflap.so.0(__mf_check+0x3d) [0xb7e9f20d]
./a.out(fn+0x80) [0x80487d4]
./a.out(main+0x47) [0x8048826]
Nearby object 1: checked region begins 1B after and ends 4B after
mudflap object 0x80cb110: name=`test.c:8 (main) tab'
bounds=[0xbfe5f044,0xbfe5f04f] size=12 area=stack check=0r/0w liveness=0
alloc time=1185910311.158136 pc=0xb7e9ec4d
number of nearby objects: 1
$
 
Kelsey Bjarnason

Kelsey said:
[snips]

Impossible to use because the program will slow down by a factor
of 1,000 at least...

It's nowhere near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.

Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.

You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?
 
Peter J. Holzer

They did not say that. They say that hardware support would be
better to have, but that other techniques, like storing the meta-data
with the object instead of in the pointer, would speed things up
according to their simulations.

From the conclusion:

| storing the required meta-data with the object scales better in terms of
| performance. Incorporating both bounds and dangling pointer checks using
| this approach results in an average slowdown of 63.9%.
| This slowdown is still too large for the checks to be used in released
| software. We therefore propose an ISA and architecture extension using
| the meta-check instruction.

What is unclear about "This slowdown is still too large for the checks
to be used in released software"? The authors are clearly of the opinion
that the speedup from using OMD instead of PMD is not enough. (I don't
share that opinion: For most software an overhead of 63.9% won't matter
at all).

hp
 
Keith Thompson

Kelsey Bjarnason said:
[snips]
In my implementation:

char *str = (char *)String;

Oh goody - modifiable, directly in the object. Now you have to trap every
single pointer operation I might ever choose to do to ensure I don't
modify, say, the length. Or free the buffer. Or whatever.

I suspect that jacob has overloaded the cast operator to allow this;
the cast probably extracts the information from the String object
(which might be a struct) rather than doing an actual pointer
conversion.

This kind of overloading is non-standard, of course, but it's a
permitted extension as long as it doesn't change the behavior of any
strictly conforming program.
 
Ian Collins

Kelsey said:
[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:

Impossible to use because the program will slow down by a factor
of 1,000 at least...

It's nowhere near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.
Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?
I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.

You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?

No, where did I say that?
 
Peter J. Holzer

That is the justification I hear most often.

The second is the "spirit of C". C is for macho programmers
that do not need bounds checking because they never make mistakes.

And there are others.

Indeed. You mentioned one of these others in a different post yourself:

It breaks the ABI.

C compilers aren't developed in a vacuum. For almost any platform, there
are already C compilers, and there are libraries compiled with these C
compilers. To be able to use these libraries, the compiler must use the
same sizes and alignments for all types or know that it needs to
interface with a different ABI. The first is clearly incompatible with
fat pointers: If a library function expects to get 4 byte pointers, but
is passed 12 byte pointers instead, it will produce garbage. The second
is possible, but requires a lot of housekeeping, especially if bounds
checking is optional.

I think lack of interoperability with existing compilers and libraries
has a lot more to do with the scarcity of bounds checking C compilers
than performance, machismo, or the spirit of C.
With my post I wanted to address the first one. Those researchers
proved that a FAST implementation of bounds checking is
feasible even without language support.

Right. I'm not surprised, though. I believe that the overhead figures
I've seen more than 10 years ago weren't much worse (but I don't have
any papers at hand, so don't ask me for references).
I would say that with language support, the task would be much easier
AND much faster, so fast that it could be done at run time without
any crushing overhead.

Possible. But none of the proposed language changes I've seen from
you has that effect. But since you are the author of a compiler, you can
just implement that change in your compiler and then publish benchmark
results.

hp
 
Dave Vandervies

Guillaume said:
Bounds checking is nice and all, but it certainly is no panacea.
It may not even be *that* useful, IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception. (All implementations where it doesn't always
generate an exception, or worse, where it can lead to code execution,
are brain-dead IMO, but that's another story. Thus, it's not a problem
of bounds checking or not.)
[...]

Are you under the impression that attempts to violate bounds in C
typically trigger an exception? In my experience, attempting to
access memory just beyond the bounds of an array *usually* results in
the program silently accessing some other memory, perhaps part of
another object.

This is true, and I have numbers to back it up. Vaguely remembered
numbers, but numbers based on direct experience nonetheless.
<[email protected]> describes an out-of-bounds read
bug that triggered an exception something like three or four times in
the equivalent of a year of running time for the program it was in.
This is definitely not "always".

A bounds-checked build is a Rather Useful debugging tool (if we'd been
able to build with something like GCC's mudflap checking mentioned
elsethread we'd've found this bug the first time we ran the checked
build), but if you write code that works by design, it's only an
effort-saving tool, and won't catch anything that careful reviews and
testing wouldn't.


dave

--
Dave Vandervies (e-mail address removed)
Sadly, the books-i've-never-read pile is already at height that would turn
a health and safety nazi white. --Geoff Lane and Howard
Fix your priorities. This one is important. S Shubs in the SDM
 
Peter J. Holzer

Bounds checking is nice and all, but it certainly is no panacea.
It may not even be *that* useful, IMO. Here is why:

1. No bounds checking. You read or write data outside bounds. It
generates an exception.

This IS bounds checking.
2. Bounds checking. You read or write data outside bounds. It generates
an 'out of bounds' exception.

Not that much different.

Right. There is not much difference between bounds checking and bounds
checking.
(All implementations where it doesn't always generate an exception, or
worse, where it can lead to code execution, are brain-dead IMO, but
that's another story. Thus, it's not a problem of bounds checking or
not.)

But it is. If a bounds violation doesn't generate an exception, the
implementation obviously doesn't do bounds checking. If a bounds
violation does generate an exception the implementation does check
bounds (at least in some cases).

hp
 
santosh

Peter said:
This IS bounds checking.

This is the case where the operating system traps access to addresses not
owned by the process.
Right. There is not much difference between bounds checking and bounds
checking.

Bounds checking would also trap access to memory not a part of the concerned
object, but still within the process's writable address space.

[snip]
 
Richard Tobin

This IS bounds checking.
This is the case where the operating system traps access to addresses not
owned by the process.

It might be or it might not be. On a processor that can use a
different segment for each object (including the x86 in theory, though
as far as I'm aware it's never done) you can get hardware bounds
checking on individual objects.

-- Richard
 
Ian Collins

This is the case where the operating system traps access to addresses not
owned by the process.

It might be or it might not be. On a processor that can use a
different segment for each object (including the x86 in theory, though
as far as I'm aware it's never done) you can get hardware bounds
checking on individual objects.
It can be done; I have used this feature in an embedded allocator. The
problem for the general case is that there is a finite number of
descriptor table entries (8192 per table on the 386).
 
Kelsey Bjarnason

Kelsey said:
Kelsey Bjarnason wrote:
[snips]

On Mon, 30 Jul 2007 13:19:10 +1200, Ian Collins wrote:

Impossible to use because the program will slow down by a factor
of 1,000 at least...

It's nowhere near that bad. Yes there is a performance penalty, but
this can be mitigated by only applying the full set of checks to
selected parts of the application.
Ah, so the parts where you used strcpy safely you can skip, but the parts
where you didn't use it safely, you should bounds-check?

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?

I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.

You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?

No, where did I say that?

The subject is whether to check everything or not. I maintain that if
you're not checking everything, why check anything, as by definition
you've determined which parts of the code are unsafe - so fix 'em.

The alternative is to treat all memory manipulation as suspect, shy of
proving that some function or library is incapable of such flaws.

The response to this was if you build from a set of libraries, you don't
need to check every library every run - and the obvious question is, why
not? If you've proven it safe, you *never* need to check it; if you
haven't, then the argument for bounds checking at all applies and it must
(by the logic behind the checking) be checked every run, every time -
other than the degenerate cases where the input state and system state are
identical on subsequent runs.
 
Ian Collins

Kelsey said:
On Tue, 31 Jul 2007 09:16:15 +1200, Ian Collins wrote:

Kelsey Bjarnason wrote:
[snips]

Seems silly to me; if you suspect a particular piece of code of needing
such hand-holding, fix the code. If you simply don't know, then why do
you assume some parts are safe and others not?

I think you miss the point.

Assuming you build your application from a set of libraries, you don't
have to bounds check every library every run.
You don't? So one can safely assume that a library which has worked in
three cases will continue to work in 300 more, never exposing a bug of
this sort?
No, where did I say that?

The subject is whether to check everything or not. I maintain that if
you're not checking everything, why check anything, as by definition
you've determined which parts of the code are unsafe - so fix 'em.
See my response to Keith.
 
websnarf

[...]
I never said that I wanted to make zero terminated strings illegal.
Good.

I just propose that OTHER types of strings could be supported by the
language as well, nothing else.

Other types of strings can already be well supported by the language.
The only current limitation is that some operations require a bit
more syntax than you might like. For example, if you have a type
String that's really a structure, you can't use a cast to 'char*' to
convert a String to a classic C string -- but if one of the members
is a char* pointing to a C string, you can just use something like
"obj.str". You can't use string literals for String values, but you
can use a function call.

For example:

...
String s = Str("hello");
s = append(s, Str(", world"));

How do you implement Str as something that supports the above, such that
it 1) works in C89 compilers, 2) does not leak memory, and 3) works on a
multi-threaded system?

As a macro you can't pass a naked structure as a parameter in C89
compilers, and as the return value of a function it's presumably going
to allocate resources that you are not keeping track of externally in
the second line. If you keep track of them in some hidden structure
somewhere then, besides being weird, you lose multi-threading
capabilities.

In Bstrlib the above is solved:

bstring s = bfromcstr ("hello");

printf("%s\n", (char *) s->data);
...

If you want the convenience of using string literals and so forth, I
think that only a few minor changes to the language would be required.
I haven't thought this through, but I suspect that most or all of
these changes could be implemented as conforming extensions (i.e.,
extensions that don't alter the behavior of any strictly conforming
code; see C99 4p6). Any programs that depend on such extensions would
of course be restricted to implementations that support them, but it
could be the first step in establishing existing practice and possibly
getting the extensions adopted in a future C standard.

Incidentally, depending on how this hypothetical String type is
implemented, aliasing could be an issue. For example;

String s1, s2;
s1 = "hello";
s2 = s1;
s2.str[0] = 'j';

s2 is now equal to Str("jello"). Is s1 equal to Str("jello"), or to
Str("hello")? In other words, does assignment of Strings copy the
entire string value, or does it just create a new reference to the
same string value?

Certainly classic C strings have the same issue, but there's bound to
be considerable work to be done in deciding how a new (standard?)
String type will deal with it.

--
Keith Thompson (The_Other_Keith) (e-mail address removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
websnarf

Google Groups is a piece of crap, BTW. As I was saying:

How do you implement Str as something that supports the above, such that
it 1) works in C89 compilers, 2) does not leak memory, and 3) works on a
multi-threaded system?

As a macro you can't pass a naked structure as a parameter in C89
compilers, and as the return value of a function it's presumably going
to allocate resources that you are not keeping track of externally in
the second line. If you keep track of them in some hidden structure
somewhere then, besides being weird, you lose multi-threading
capabilities.

In Bstrlib the above is solved as:

bstring s = bfromcstr ("hello");
bcatblk (s, bsStaticBlkParms (", world"));

The bfromcstr function allocates resources. The bsStaticBlkParms
macro just plays a simplistic trick which makes a pair of parameters
(the char * string and its length) that can be fed straight to
bcatblk's last two parameters.


I have found that the alternatives supported in Bstrlib take mostly
horizontal source code space. So you could create something like:

S"This is a tagbstring"

but it's not a big enough deal to care too much about it.

Which is the main reason why it's a *BAD* idea.

Irrelevant. For what are now plainly obvious reasons. (Future C
standards clearly no longer have any influence on real world
programming practice.)
Incidentally, depending on how this hypothetical String type is
implemented, aliasing could be an issue. For example;
String s1, s2;
s1 = "hello";
s2 = s1;
s2.str[0] = 'j';
s2 is now equal to Str("jello"). Is s1 equal to Str("jello"), or
to Str("hello")? In other words, does assignment of Strings copy
the entire string value, or does it just create a new reference to
the same string value?

C is a language that supports pointers. I don't know what your
question is. Bstrlib is very aliasing sensitive and always does
things as correctly as possible (which almost always is "just make it
work").

You have a complete model and tested implementation that is used in
the real world that you can peruse at your leisure. Certainly nobody
has bugged me about how I deal with aliasing in Bstrlib.
 
jacob navia

CBFalconer said:
jacob navia wrote:
... snip much about addition of bounds checking ...

I suggest you go back and read about the tests on complete bounds
checking in Pascal, performed roughly 30 years ago. The conclusion
was that routine enabling of such would slow most code down by
something like 2 or 3 percent. No more. The compiler can detect
which cases require checking. All that is required from the
programmer is proper typing.

Of course, Pascal is a sanely designed language, without violent
bandying of pointers, with convenient sub-ranges, etc. The net
effective result is that C cannot be thoroughly checked at compile
or run time.

I agree with that. 2-3% would be feasible now with optimizing
compilers that would, for instance, take the bounds check out of a loop.

But C is much harder than Pascal in this area.
 
Richard Bos

As has been correctly observed by several people, I meant illegal here.
Are you a lawyer?

It looks like.

Typical - even your insults are half-baked and have all the strength of
a wet sponge.
Tell me then, how can I know if a program is
"strictly conforming"?

My argument doesn't require that you do so. It merely requires that the
implementation - bounds-checking or not - make sure that all strictly
conforming programs are allowed.
That is worse than the halting problem!

I suggest a bit of reading about the halting problem may be in order for
you.
This whole rubbish is just to destroy any technical discussion about the
issues I raised, using this pseudo technical language legalese.

Jacob, you really need to get a grip on reality, stop being so
defensive, and look at what people _actually_ write. Here, for once,
surprisingly enough, Richard H. and I were writing _in defence_ of your
beloved bounds checkers, by saying that the C Standard is allowing
enough that those who want bounds-checking implementations - zis is you,
meester navia - _can have them_, and at the very same time those who
don't want them need not be burdened by them.

So basically what we're writing here is: Fine. You want a bounds
checker, you can have a bounds checker. All you have to do is make sure
that correct programs, which do not cross any bounds they should not,
are not affected (except, of course, in terms of speed and efficiency,
which the Standard intentionally does not address). Programs which run
over the edge of an object, those your lcc-win-plus-plus can deal with
in any, read that: _any_ way it sees fit. You want all that? Stick to
the caveats, and you have our blessing, and what's much more important,
you have the Standard's blessing, as well.
And you manage to see _that_ as a personal insult. Sheesh.

Richard
 
Kelsey Bjarnason

The most obvious example is the development of
length delimited strings [strlen] [strcat]
I have been promoting this change [..] for several years.

First of all, it was thought of long ago [by OTHER people]
Second of all, its overall efficiency is still debated [*]
Third of all, companies such as Microsoft already REQUIRE it
internally

You're taking recommendations from a company with one of the worst
security/safety records in computing history?
 
Ben Pfaff

JT said:
* I'll leave the literature search and other
examples for others to cite. One quick example is
the tokenization and manipulation of an input text block.
With NUL-terminated strings, you can tokenize
the input buffer in-place (by replacing whitespace
with \0), but with length-denoted strings you need
to always malloc a new space since there's no room
for the length field.

It depends. I can define a string as a structure that contains a
length and a pointer. I can then point into the middle of an
existing string with no need to do any copying. This approach
has its own downsides, of course.
 
