Bounds Checking as Undefined Behaviour?

S

Shao Miller

Does the following code imply undefined behaviour?

int main(void) {
union {
char bar[1];
char baz[2];
} foo;

foo.baz[0] = 'z';
foo.baz[1] = 'z';
foo.bar[0] = 'r';
return foo.bar[1]; /* Undefined behaviour for O.O.B. */
}

I don't believe it does, since I believe that 'foo.bar' "decays" to a
'char *' whose bounds are checked, if at all, against the substrate of
'foo' versus a 'char[1]' within the object "space" of 'foo'.

I am not at all interested in arguing or debating about it. When in
doubt, treat as undefined, perhaps.

I'd simply be interested in what others have to say about it.

Ben Bacarisse brought up something [I believe is] related in another
thread and Richard Heathfield and Peter "Seebs" Seebach both
additionally shared interesting thoughts on the matter.

So what are your thoughts, if you please?

A good brain-teaser, Ben!
 
E

Eric Sosman

Does the following code imply undefined behaviour?

int main(void) {
union {
char bar[1];
char baz[2];
} foo;

foo.baz[0] = 'z';
foo.baz[1] = 'z';
foo.bar[0] = 'r';
return foo.bar[1]; /* Undefined behaviour for O.O.B. */
}

U.B. for O.O.B., as you say.
I don't believe it does, since I believe that 'foo.bar' "decays" to a
'char *' whose bounds are checked, if at all, against the substrate of
'foo' versus a 'char[1]' within the object "space" of 'foo'.

No: An array has its own bounds, not imputed bounds from some
larger entity of which it might be part. Consider:

struct { char c[1]; double d; )
*p = malloc(1000000); // assume success
p->c[999999] = 'x';
I am not at all interested in arguing or debating about it. When in
doubt, treat as undefined, perhaps.

I'd simply be interested in what others have to say about it.

Okay: That's what I say. No debate, no argument, just opinion.
 
S

Shao Miller

Does the following code imply undefined behaviour?
int main(void) {
   union {
     char bar[1];
     char baz[2];
   } foo;
   foo.baz[0] = 'z';
   foo.baz[1] = 'z';
   foo.bar[0] = 'r';
   return foo.bar[1];    /* Undefined behaviour for O.O.B. */
}

     U.B. for O.O.B., as you say.
I don't believe it does, since I believe that 'foo.bar' "decays" to a
'char *' whose bounds are checked, if at all, against the substrate of
'foo' versus a 'char[1]' within the object "space" of 'foo'.

     No: An array has its own bounds, not imputed bounds from some
larger entity of which it might be part.  Consider:

        struct { char c[1]; double d; )
            *p = malloc(1000000);  // assume success
        p->c[999999] = 'x';
I am not at all interested in arguing or debating about it.  When in
doubt, treat as undefined, perhaps.
I'd simply be interested in what others have to say about it.

     Okay: That's what I say.  No debate, no argument, just opinion.
Thanks, Eric! Just to clarify: Was the intent of your example to
demonstrate a case where a bounds-checking implementation would
diagnose out-of-bounds or not? I think that's what you said above,
but would like to be sure.
 
S

Shao Miller

It is not clear to me whether the behaviour is undefined, but I have two
observations to make, one wrt the Standard, and one that is more general.

Firstly:
6.2.6.1(7): "When a value is stored in a member of an object of union
type, the bytes of the object representation that do not correspond to
that member but do correspond to other members take unspecified values."

When you store a value, then, in foo.bar[0], the value of foo.baz[1]
becomes unspecified. Therefore, *at best*, the value (if such there be)
of foo.bar[1] is unspecified, so the implementation is free to make it
anything that can be represented in the type, and does not have to
document its behaviour. (This is not quite the same thing as undefined
behaviour, since the implementation is /not/ free to, say, repeal the
law of gravity as it can in principle do for ++i+i++.)
Thanks, Richard!
The second observation is this: the code makes no sense as written. No
matter what it is supposed to achieve, there is certainly a better way
of achieving it. Although clc loves to tilt at windmills, we do tend to
prefer windmills that are actually capable of grinding some genuine
flour. If you can come up with a reason why someone might want to write
the above code, people might take more interest in your question.
Point taken. This question of UB really had no purpose other than a
simple test case, so is not very interesting at all. :)
 
S

Shao Miller

Eric Sosman wrote:



Another five seconds of thinking about it before posting, and I'd have
reached the same conclusion. Ah well!
Forgiveness is yours. :) The question was in regards to out-of-
bounds, not unspecified values. Every response has its value (one
hopes).
 
I

Ian Collins

Eric Sosman wrote:



Another five seconds of thinking about it before posting, and I'd have
reached the same conclusion. Ah well!

Unless the array encompasses all available memory, it is always part of
some larger entity!
 
E

Eric Sosman

Does the following code imply undefined behaviour?
int main(void) {
union {
char bar[1];
char baz[2];
} foo;
foo.baz[0] = 'z';
foo.baz[1] = 'z';
foo.bar[0] = 'r';
return foo.bar[1]; /* Undefined behaviour for O.O.B. */
}

U.B. for O.O.B., as you say.
I don't believe it does, since I believe that 'foo.bar' "decays" to a
'char *' whose bounds are checked, if at all, against the substrate of
'foo' versus a 'char[1]' within the object "space" of 'foo'.

No: An array has its own bounds, not imputed bounds from some
larger entity of which it might be part. Consider:

struct { char c[1]; double d; )
*p = malloc(1000000); // assume success
p->c[999999] = 'x';
I am not at all interested in arguing or debating about it. When in
doubt, treat as undefined, perhaps.
I'd simply be interested in what others have to say about it.

Okay: That's what I say. No debate, no argument, just opinion.
Thanks, Eric! Just to clarify: Was the intent of your example to
demonstrate a case where a bounds-checking implementation would
diagnose out-of-bounds or not? I think that's what you said above,
but would like to be sure.

My thesis is less stringent: All I'm saying is that the behavior
is undefined, regardless of whether there's explicit bounds checking.
For example, the compiler is entitled to assume that p->c and p->d do
not overlap, so that storing to an element of p->c does not invalidate
a p->d that may already have been cached in a register. No bounds
checking in sight, yet the outcome of the program is in doubt anyhow.
 
E

Eric Sosman

Unless the array encompasses all available memory, it is always part of
some larger entity!

Yes (quite likely) from the machine's point of view, but not from
C's standpoint. Separate objects are -- well, separate, not embedded
in some kind of luminiferous aether. That's why it's U.B. to subtract
pointers to disparate objects: There's no "unifying substrate" in which
the address calculation can be defined. In principle, each object can
inhabit its very own private address space, incommensurate with the
spaces that contain other objects. (Granted, implementations usually
take shortcuts.)
 
S

Shao Miller

     My thesis is less stringent: All I'm saying is that the behavior
is undefined, regardless of whether there's explicit bounds checking.
For example, the compiler is entitled to assume that p->c and p->d do
not overlap, so that storing to an element of p->c does not invalidate
a p->d that may already have been cached in a register.  No bounds
checking in sight, yet the outcome of the program is in doubt anyhow.
Aha! So for the same reason, we cannot define the output to be a
return code of '3' below:

#include <stddef.h>

int main(void) {
struct {
char c[1];
int v;
} s1, s2;
size_t i;

s1.v = 3;
s2.v = 5;
for (i = 0; i < sizeof s1; i++) {
s2.c = s1.c;
}
return s2.v;
}

Because 's2.v' might be cached. Is that right?
 
S

Shao Miller

Aha!  So for the same reason, we cannot define the output to be a
return code of '3' below:
.... ... ...

Because 's2.v' might be cached.  Is that right?

Or rather, this code, I should have used:

#include <stddef.h>

int main(void) {
struct {
char c[1];
int v;
} s1, s2;
int x;
size_t i;

s1.v = 3;
s2.v = 5;
x = s2.v;
for (i = 0; i < sizeof s1; i++) {
s2.c = s1.c;
}
return s2.v;
}
 
S

Shao Miller

Shao said:
Aha!  So for the same reason, we cannot define the output to be a
return code of '3' below:

It's certainly true that s2.v might not be 3. Since the last write to s2
was through s2.c[], the value of s2.v is unspecified, for the same
reason as before (6.2.6.1(7)). "Unspecified" does not necessarily mean
the same as "3".
Aha! I sensed the same reason here as your mention of using a
differently named member for setting values, but didn't wish to draw a
conclusion too quickly. Thanks for confirming! Is that why
'memcpy()' is used for copying, so that the implementation has a
chance to invalidate its caches? Or might we still have a cached
's2.v' after copying over '&s2' with 'memcpy()'?
 
T

Tim Rentsch

Richard Heathfield said:
Shao said:
My thesis is less stringent: All I'm saying is that the behavior
is undefined, regardless of whether there's explicit bounds checking.
For example, the compiler is entitled to assume that p->c and p->d do
not overlap, so that storing to an element of p->c does not invalidate
a p->d that may already have been cached in a register. No bounds
checking in sight, yet the outcome of the program is in doubt anyhow.
Aha! So for the same reason, we cannot define the output to be a
return code of '3' below:

#include <stddef.h>

int main(void) {
struct {
char c[1];
int v;
} s1, s2;
size_t i;

s1.v = 3;
s2.v = 5;
for (i = 0; i < sizeof s1; i++) {
s2.c = s1.c;
}
return s2.v;
}

Because 's2.v' might be cached. Is that right?


It's certainly true that s2.v might not be 3. Since the last write to
s2 was through s2.c[], the value of s2.v is unspecified, for the same
reason as before (6.2.6.1(7)). "Unspecified" does not necessarily mean
the same as "3".


Of course what you meant was that reading s1.c[1] is already
undefined behavior, so the store into s2.c[1] may not even
occur (but will also be undefined behavior if it does).

Also, 6.2.6.1p7 is about unions, and the example code uses only
structs.
 
S

Shao Miller

Of course what you meant was that reading s1.c[1] is already
undefined behavior, so the store into s2.c[1] may not even
occur (but will also be undefined behavior if it does).

... ... ...
Thanks for that, Tim.

int main(void) {
/* Pretty simple structure */
struct {
char c[1];
int i;
int j;
} s1, s2;
/* Copy iterator and dummy */
int cpi;

/* Assign some members */
s1.i = 5;
/* s1.j not assigned */
s2.i = 0;
/* s2.j not assigned */

/* Attempt to copy s1 to s2. s2.i might be cached. #1 */
for (cpi = s2.i; cpi < sizeof s1; cpi++)
/* Why might reading s1.c[cpi] be undefined? */
s2.c[cpi] = s1.c[cpi];
/* Attempt to fetch s2.i again. Could it be cached from #1? */
cpi = s2.i;

/* Is memcpy() well-defined for copying s1 to s2? #2 */
memcpy(&s2, &s1, sizeof s1);
/* Could s2.i still be cached from #1? */
cpi = s2.i;

/* Attempt to copy s1 to s2 using approach #3 */
for (cpi = 0; cpi < sizeof s1; cpi++)
/* Use a char * to access each */
((char *)&s2)[cpi] = ((char *)&s1)[cpi];
/* Could s2.i still be cached from #1? */
cpi = s2.i;

/* Attempt to copy s1 to s2 using approach #4 */
for (cpi = 0; cpi < sizeof s1; cpi++)
/* Use casts of the c member, which is at the start */
((char *)s2.c)[cpi] = ((char *)s1.c)[cpi];
/* Could s2.i still be cached from #1? */
cpi = s2.i;

return cpi;
}
 
S

Shao Miller

Please forgive the fact that the code mixes the 'size_t' result from
'sizeof' with 'int'. That's not really the meat and potatoes of the
inline code-questions. :)
 
S

Shao Miller

CLC at its finest!
Heheh, isn't that something.

I'm not interested in argument or debating it, because I agree with
Richard W. M. Jones and Paul H. J. Kelly in their their paper[1]
that[2]:

"Casts can properly be used to change the type of the object to which
a pointer refers, but cannot be used to turn a pointer to one object
into a pointer to another. A corollary is that bounds checking is not
type checking: it does not prevent storage from being declared with
one data structure and used with another."

That is, by my understanding, that one must consider the entirety of
the object as the bounds; not some bounds based on type. A contiguous
region of storage valid for storing values.

If you treat all of memory (as per Ian Collins' quip) as a largest
object, then the bounds are the bounds of memory.

If you consider Eric Sosman's observation regarding "disparate
objects," we can imagine an object space with holes between objects.
These holes are rather like the unspecified padding in a 'struct'
object. So you imagine the bounds of objects in all of memory, and
you can recursively treat the bounds of sub-objects within a 'struct'
object similarly.

An array of an array of an array etc. of an object type has no holes.
Thus for the 2D array example originally from another thread
regarding:

int two_d_array[10][10];
two_d_array[0][20] = 5;

When 'two_d_array[0]' "decays" into an 'int *', some perceived bound
of '10' for the latter dimension implies a treatment of bounds based
on type, not based on the object.

This is why I offered the original post. The 'union' object likewise
has no holes between its start and the "sub-object" designated by
'foo.bar[1]'. Same thing as an array to me, so no undefined
behaviour, in my opinion.

With a 'struct' object, we might have holes (padding) between member
"sub-objects." It would appear that 'memcpy()' and otherwise treating
a 'struct' object as an array of 'char' allows for even such holes to
be perhaps temporarily "walked across." Quite fortunately, we are
protected by the restriction of character type, along with never
attempting to use such a hole as anything other than an "unspecified
value." Not undefined behaviour, but an unspecified value.

A 'struct' object should not span across memory segments or other
possible trap boundaries, so walking the holes in such an object is
also a little different than walking all of memory as an array of
'char', so the recursive treatment suggested above is not 100% the
same... It's safer on a 'struct' object.

Argument or debate are obviously valuable processes for understanding
a subject matter, but not my interest in this case.

[1] Jones, R. W. M. and Kelly, P. H. J. "Backwards-compatible bounds
checking for arrays and pointers in C programs". in ``Third
International Workshop on Automated Debugging'', M. Kamkar and D.
Byers, eds (Linkoping University Electronic Press), 1997. See also:
PostScript[3] and online proceedings[4]. Jones and Kelly provided
object bounds-checking for GCC in 1995.
[2] See [1], pp. 2, section 2, paragraph 3.
[3] http://www.doc.ic.ac.uk/~phjk/Publications/BoundsCheckingForC.ps.gz
[4] http://www.ep.liu.se/ea/cis/1997/009/
 
S

Shao Miller

The undefined behaviour for the multi-dimensional array business is so
interesting. Which of the "Like this?" lines (if any) would allow for
the "Ok?" line to be well-behaved?

#include <stddef.h>
#include <stdlib.h>

#define UB_ALLOWED 1

int main(void) {
typedef int arrten[10];
typedef arrten arrarr[10];
typedef int rawints[sizeof (arrarr)];
void *vp;
size_t s = sizeof (arrarr);
arrarr *foo;
int *ip;

vp = malloc(s);
if (!vp) return 1;

foo = vp;
ip = vp;

(*foo)[1][0] = 2; /* Ok */
ip[10] = 3; /* Ok */

if ((*foo)[0] != ip) return 2;

ip = (*foo)[0]; /* Ok. Make dirty */
if ((*foo)[0] != ip) return 3;

#if UB_ALLOWED
ip[10] = 5; /* UB */
(*foo)[0][10] = 7; /* UB */
#endif

/* How do we wash it off? */
ip = vp; /* Like this? */
ip = (void *)vp; /* Like this? */
ip = (void *)foo; /* Like this? */
ip = *(rawints *)vp; /* Like this? */

ip[10] = 11; /* Ok? */

free(vp);
return 0;
}

Please and thank-you. :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,013
Latest member
KatriceSwa

Latest Threads

Top