Out-of-bounds Restrictions?

Frederick Gotham · Oct 29, 2006

In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size:
sizeof(int)*5*2.
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

This seems strange to me, especially given that we should be able to access
memory however we like in C. For example, look at the free reign we have
with the following:

int *const p = malloc(128 * sizeof*p);

p[56] = 4;

double *const pd = (double*)p;

pd[0] = 56.334;

What about the following snippet; is the behaviour undefined?

int (*p)[5][6] = malloc(5*6*sizeof*p);

(*p)[0][8] = 7;

Can the second snippet be remedied by simply adding a pointer? e.g.:

int arr[2][5];

/* arr[9] = 4; */

int *const p = (int*)&**arr;

p[9] = 4;

Eric Sosman · Oct 29, 2006

Frederick said:
In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size:
sizeof(int)*5*2.
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

Correct. Formally, arr[0][9] is identical to *(arr[0] + 9),
and the sub-expression `arr[0] + 9' is invalid: it does not produce
a pointer to one of the elements of arr[0] nor to the fictitious
element just past its end (which would be arr[0][5]). That is, it
is invalid for the same reason your first example is invalid.
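To reach the same int without forming an out-of-range pointer, one option is to split a flat position into a row and a column so that both subscripts stay in range. The following is only a minimal sketch: the helper name set_flat and the constants ROWS and COLS are hypothetical and chosen to match the int arr[2][5] example.

```c
#include <stddef.h>

enum { ROWS = 2, COLS = 5 };   /* shape of the int arr[2][5] example */

/* Hypothetical helper: store value at flat position k using only
   in-range subscripts. Returns 0 on success, -1 if k is out of range. */
int set_flat(int arr[ROWS][COLS], size_t k, int value)
{
    if (k >= (size_t)ROWS * COLS)
        return -1;
    arr[k / COLS][k % COLS] = value;   /* k == 9 selects arr[1][4] */
    return 0;
}
```

With this helper, set_flat(arr, 9, 4) writes the element that arr[0][9] was aiming at, but through arr[1][4].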

As a practical matter, few C implementations will catch this
error: few C implementations perform (or even can perform) bounds-
checking on array indices. However, the Standard does not forbid
bounds-checking; it does not take the step of actually sanctioning
an improper reference to perfectly good memory. (This is also the
downfall of the classic form of the "struct hack.")

This seems strange to me, especially given that we should be able to access
memory however we like in C. For example, look at the free reign we have
with the following:

"Free rein." Also, I'm not too sure what you mean by "should"
here: Is there a moral imperative lurking?

int *const p = malloc(128 * sizeof*p);

p[56] = 4;

double *const pd = (double*)p;

pd[0] = 56.334;

I'm not sure what this is supposed to illustrate. It's valid
if malloc() succeeds and if sizeof(double) <= 128 * sizeof(int).

What about the following snippet; is the behaviour undefined?

int (*p)[5][6] = malloc(5*6*sizeof*p);

(*p)[0][8] = 7;

Work it through: p has the type "pointer to int[5][6]," so
(*p) has the type "int[5][6]" and (*p)[0] has the type "int[6]"
and (*p)[0][8] is an attempt to reference outside the extent of
that six-int array. Undefined, and almost certainly uncaught.

(By the way, the malloc() requests enough memory for thirty
such int[5][6] arrays, 900 ints altogether.)
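The arithmetic can be checked with sizeof. This is a sketch only; the typedef name grid and both function names are hypothetical, introduced here for illustration.

```c
#include <stdlib.h>

typedef int grid[5][6];   /* hypothetical name for the int[5][6] type */

/* Bytes requested by malloc(5*6*sizeof*p) when p has type int (*)[5][6]:
   sizeof *p is already the size of a whole int[5][6]. */
size_t bytes_in_original_call(void)
{
    return 5 * 6 * sizeof(grid);
}

/* Allocate room for exactly one int[5][6]; returns NULL on failure. */
grid *alloc_one_grid(void)
{
    grid *p = malloc(sizeof *p);
    return p;
}
```

Writing the request as sizeof *p, with no extra multipliers, keeps the size tied to the pointed-to type if the dimensions change later.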

Can the second snippet be remedied by simply adding a pointer? e.g.:

int arr[2][5];

/* arr[9] = 4; */

int *const p = (int*)&**arr;

p[9] = 4;

As far as I can see, this is all right. The cast is useless
clutter, though.

Kenny McCormack · Oct 29, 2006

In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size:
sizeof(int)*5*2.
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

Freddy, Freddy, Freddy, whose side are you on?

You know perfectly well that the above is "undefined" because it's not
strictly kosher. As would anyone who has read this group for more than
a day or two. The basic rule about "undefined behavior" is that if you
have to ask (more precisely, if it even occurs to you to ask), then it
almost certainly is.

The fact that it works as you expect on every implementation known to
man is, of course, completely irrelevant (in the eyes of the religious
zealots).

Barry Schwarz · Oct 29, 2006

In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size:
sizeof(int)*5*2.
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

This seems strange to me, especially given that we should be able to access
memory however we like in C. For example, look at the free reign we have
with the following:

int *const p = malloc(128 * sizeof*p);

p[56] = 4;

double *const pd = (double*)p;

pd[0] = 56.334;

Only in the very likely event that sizeof(double) <= 128*sizeof(int)
but still not guaranteed.

What about the following snippet; is the behaviour undefined?

int (*p)[5][6] = malloc(5*6*sizeof*p);

You probably meant sizeof(int) here, since this allocates 30x as much
space as the point you are trying to make requires.

(*p)[0][8] = 7;

Consider a very specialized processor where the compiler recognizes
that the memory allocated to p straddles a "memory hardware boundary".
All of (*p)[0] is before the boundary and the remainder is after. When
generating code for an expression where the first subscript of (*p) is
a constant, the compiler knows to generate code that references the
correct memory "segment". In the case where the first subscript is
not constant, the compiler knows to generate code to reference both
"segments" and execute only the code that references the correct
segment based on the result of a run-time test of the subscript.

You want (*p)[0][8] to mean (*p)[1][2] but the code will access the
wrong segment since the first subscript is a constant.

Can the second snippet be remedied by simply adding a pointer? e.g.:

int arr[2][5];

/* arr[9] = 4; */

int *const p = (int*)&**arr;

Since **arr is already an int, the cast is unnecessary. Or you could
simplify to (int*)arr.

p[9] = 4;

The compiler is allowed to infer that p points into arr[0]. Not a
change from the previous.

Frederick Gotham · Oct 29, 2006

Barry Schwarz:

p[9] = 4;

The compiler is allowed to infer that p points into arr[0]. Not a
change from the previous.

I'm still not so sure that the compiler can decide that the memory access
is bad.

Every chunk of memory can be treated as an array of unsigned char's, as
follows:

#include <stdio.h>

int main(void)
{
    int arr[5][6][7] = {0};
    /* Is that initialisation OK with
       just the single 0 between the
       braces? */

    char unsigned const *p = (char unsigned*)arr;
    char unsigned const *const pover =
        (char unsigned*)(arr+sizeof arr/sizeof*arr);

    do printf("%u",(unsigned)*p++);
    while (pover != p);

    return 0;
}
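The same byte-by-byte walk can be written once for any object by taking its address and its size. This is a minimal sketch; byte_sum is a hypothetical helper name, and it sums the bytes rather than printing them so that the result is easy to check.

```c
#include <stddef.h>

/* Hypothetical helper: add up every byte of an object's representation,
   reading it as an array of unsigned char of length size. */
unsigned long byte_sum(const void *obj, size_t size)
{
    const unsigned char *p = obj;   /* points at the object's first byte */
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}
```

For an int arr[5][6][7] initialised with = {0}, byte_sum(arr, sizeof arr) is 0 on implementations where a zero int is all zero bytes, since the single 0 leaves every remaining element zero-initialised.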

Peter Nilsson · Oct 29, 2006

Frederick said:
...
Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size:
sizeof(int)*5*2.
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

That's because the behaviour _is_ undefined.

This seems strange to me, especially given that we should be able to
access memory however we like in C.

Why should we be able to do that? Do you have some buffer overflow
attack virus that needs to be strictly conforming?

There are cases where walking over the end of an array can be
useful (e.g. the struct hack in C90). But it's not clear that being
able to do so is more beneficial in terms of efficiency gains. Note
that languages like Fortran have stricter rules than C. The stricter
rules allow for significantly _greater_ optimisation, not less.

The freedom that pointers have in C is actually a language weakness
more than a strength. Time has proven that.

Dik T. Winter · Oct 30, 2006

> > ...
> > Take the following snippet however:
> >
> > int arr[2][5];
> >
> > arr[0][9] = 4;
> >
> > The definition of "arr" results in a chunk of memory of size: sizeof(int)*5
> > *2
> > Therefore, I would have thought that there was nothing wrong with the
> > accessing of the 10th element (if we look at it as a one-dimensional array
> > of int's). I've been told, however, that the behaviour is undefined.

>
> That's because the behaviour _is_ undefined.

Can you provide a quotation from the standard? There was a thread
about this subject a bit earlier. My conclusion was that it is permitted.
Note that 'a[i][j]' is equivalent to '*(&(*(a + i)) + j)', where the '&'
yields a pointer of type 'pointer to array 5 of int'.
In the standard indexing is allowed as long as the indexed pointer points
within an 'object'. There are two objects involved here: the complete
object and the element object. Now it could be argued that arr[0] is
a 'pointer to array 5 of int', and so the index must remain in the
range [0,5] (where the last can not be used in dereferencing). But
consider the following snippet:
int *p = &(a[0][0]);
now p is a pointer to int (and by the standard an int is, with indexing,
considered to be an array of a single element), so in that case indexing
of p should be restricted to 0 and 1.

Guest · Oct 30, 2006

Dik said:
Frederick said:

...
Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size:
sizeof(int)*5*2.
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

That's because the behaviour _is_ undefined.

Can you provide a quotation from the standard?

Non-normative, but:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

Dik T. Winter · Oct 31, 2006

> Dik T. Winter wrote: ....

> >
> > Can you provide a quotation from the standard?

>
> Non-normative, but:
>
> J.2 Undefined behavior:
> The behavior is undefined in the following circumstances:
> [...]
> - An array subscript is out of range, even if an object is apparently
> accessible with the given subscript (as in the lvalue expression
> a[1][7] given the declaration int a[4][5]) (6.5.6).

Indeed, I see it now also in 6.5.6 (it is different from the previous
discussion). So formally also the following:
int a[4][3];
int *p = &(a[0][0]);
p[6] = 1;
is also undefined behaviour. On the other hand, the following:
int a[4][3];
int *p = (int *)a;
p[6] = 1;
apparently is valid. Or is it not?

Kavya · Oct 31, 2006

I found one reference. This might be useful.

http://c-faq.com/aryptr/ary2dfunc2.html

"... according to an official interpretation, the behavior of accessing
(&array[0][0])[x] is not defined for x >= NCOLUMNS."

Guest · Oct 31, 2006

Dik said:
Dik T. Winter wrote: ...

That's because the behaviour _is_ undefined.

Can you provide a quotation from the standard?

Non-normative, but:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

Indeed, I see it now also in 6.5.6 (it is different from the previous
discussion). So formally also the following:
int a[4][3];
int *p = &(a[0][0]);
p[6] = 1;
is also undefined behaviour. On the other hand, the following:
int a[4][3];
int *p = (int *)a;
p[6] = 1;
apparently is valid. Or is it not?

I don't think the standard guarantees that (int *) a points to a[0][0].

(There are a lot of cases where "everybody knows" what the behaviour of
pointer conversions should be, but where the standard doesn't spell it
out. offsetof() is close to useless without relying on such behaviour.
This may or may not be one of those cases.)

Frederick Gotham · Nov 1, 2006

Dik T. Winter:

Note that 'a[i][j]' is equivalent to '*(&(*(a + i)) + j)'

Incorrect.

a[i][j]

is equivalent to:

( a[i] ) [j]

which is equivalent to:

*( a[i] + j )

which is equivalent to:

*( *(a + i) + j )

The addressof operator plays no part.
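For in-range subscripts, the equivalence between a[i][j] and *( *(a + i) + j ) can be checked directly. This is only a sketch: same_element is a hypothetical name, and the row length of 3 is borrowed from the int a[4][3] example earlier in the thread.

```c
/* Returns 1 when a[i][j] and *( *(a + i) + j ) designate the same int.
   The caller must pass subscripts that are in range for the array. */
int same_element(int (*a)[3], int i, int j)
{
    return &a[i][j] == *(a + i) + j;
}
```

Calling same_element(a, i, j) for every valid i and j of an int a[4][3] yields 1 each time.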
