Out-of-bounds Restrictions?

Frederick Gotham

In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size: sizeof(int)*5*2
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

This seems strange to me, especially given that we should be able to access
memory however we like in C. For example, look at the free reign we have
with the following:

int *const p = malloc(128 * sizeof*p);

p[56] = 4;

double *const pd = (double*)p;

pd[0] = 56.334;

What about the following snippet; is the behaviour undefined?

int (*p)[5][6] = malloc(5*6*sizeof*p);

(*p)[0][8] = 7;

Can the second snippet be remedied by simply adding a pointer? e.g.:

int arr[2][5];

/* arr[9] = 4; */

int *const p = (int*)&**arr;

p[9] = 4;
 
Eric Sosman

Frederick said:
In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size: sizeof(int)*5*2
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

Correct. Formally, arr[0][9] is identical to *(arr[0] + 9),
and the sub-expression `arr[0] + 9' is invalid: it does not produce
a pointer to one of the elements of arr[0] nor to the fictitious
element just past its end (which would be arr[0][5]). That is, it
is invalid for the same reason your first example is invalid.
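
To make the point concrete, here is a minimal sketch of how the tenth int can be reached with subscripts that stay in range. The names ROWS, COLS and set_flat are made up for illustration; they do not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 5 };   /* same shape as int arr[2][5] */

/* Store v at flat position k (0 .. ROWS*COLS-1) using only
   subscripts that are in range for each dimension. */
static void set_flat(int arr[ROWS][COLS], size_t k, int v)
{
    assert(k < (size_t)ROWS * COLS);
    arr[k / COLS][k % COLS] = v;   /* k == 9 selects arr[1][4] */
}
```

Calling set_flat(arr, 9, 4) stores into arr[1][4], so no pointer is ever formed beyond the one-past-the-end position of a row.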

As a practical matter, few C implementations will catch this
error: few C implementations perform (or even can perform) bounds-
checking on array indices. However, the Standard does not forbid
bounds-checking; it does not take the step of actually sanctioning
an improper reference to perfectly good memory. (This is also the
downfall of the classic form of the "struct hack.")

This seems strange to me, especially given that we should be able to access
memory however we like in C. For example, look at the free reign we have
with the following:

"Free rein." Also, I'm not too sure what you mean by "should"
here: Is there a moral imperative lurking?

int *const p = malloc(128 * sizeof*p);

p[56] = 4;

double *const pd = (double*)p;

pd[0] = 56.334;

I'm not sure what this is supposed to illustrate. It's valid
if malloc() succeeds and if sizeof(double) <= 128 * sizeof(int).
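
A minimal sketch of the same snippet with both conditions spelled out, in case it helps. The function name demo_reuse is hypothetical, and the assert is only one possible way of expressing the size requirement.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 0 on success, -1 if malloc() failed. */
static int demo_reuse(void)
{
    int *const p = malloc(128 * sizeof *p);
    if (p == NULL)
        return -1;              /* first condition: malloc() succeeded */

    p[56] = 4;

    /* second condition: the block is large enough to hold a double */
    assert(sizeof(double) <= 128 * sizeof *p);

    /* memory from malloc() is suitably aligned for any object type */
    double *const pd = (double *)p;
    pd[0] = 56.334;

    free(p);
    return 0;
}
```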

What about the following snippet; is the behaviour undefined?

int (*p)[5][6] = malloc(5*6*sizeof*p);

(*p)[0][8] = 7;

Work it through: p has the type "pointer to int[5][6]," so
(*p) has the type "int[5][6]" and (*p)[0] has the type "int[6]"
and (*p)[0][8] is an attempt to reference outside the extent of
that six-int array. Undefined, and almost certainly uncaught.

(By the way, the malloc() requests enough memory for thirty
such int[5][6] arrays, 900 ints altogether.)
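
For comparison, a sketch of an allocation that requests exactly one int[5][6], assuming that is what was intended. The typedef name grid and the function alloc_grid are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef int grid[5][6];         /* hypothetical name for int[5][6] */

/* Allocate room for exactly one grid (30 ints); NULL on failure. */
static grid *alloc_grid(void)
{
    grid *p = malloc(sizeof *p);   /* sizeof *p == 5*6*sizeof(int) */
    return p;
}
```

With such a pointer, the element that (*p)[0][8] was aiming at can be written with in-range subscripts as (*p)[1][2].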

Can the second snippet be remedied by simply adding a pointer? e.g.:

int arr[2][5];

/* arr[9] = 4; */

int *const p = (int*)&**arr;

p[9] = 4;

As far as I can see, this is all right. The cast is useless
clutter, though.
 
Kenny McCormack

In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size: sizeof(int)*5*2
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

Freddy, Freddy, Freddy, whose side are you on?

You know perfectly well that the above is "undefined" because it's not
strictly kosher. As would anyone who has read this group for more than
a day or two. The basic rule about "undefined behavior" is that if you
have to ask (more precisely, if it even occurs to you to ask), then it
almost certainly is.

The fact that it works as you expect on every implementation known to
man is, of course, completely irrelevant (in the eyes of the religious
zealots).
 
Barry Schwarz

In another thread recently, there was discussed the accessing of array
indices which were out of bounds. Obviously, the following code is bogus:

int arr[5];

arr[9] = 4;

Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size: sizeof(int)*5*2
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

This seems strange to me, especially given that we should be able to access
memory however we like in C. For example, look at the free reign we have
with the following:

int *const p = malloc(128 * sizeof*p);

p[56] = 4;

double *const pd = (double*)p;

pd[0] = 56.334;

That is valid only if sizeof(double) <= 128*sizeof(int), which is very
likely but still not guaranteed.

What about the following snippet; is the behaviour undefined?

int (*p)[5][6] = malloc(5*6*sizeof*p);

You probably meant sizeof(int) here, since this allocates 30 times as
much space as the point you are trying to make requires.

(*p)[0][8] = 7;

Consider a very specialized processor where the compiler recognizes
that the memory allocated to p straddles a "memory hardware boundary".
All of (*p)[0] is before the boundary and the remainder is after. When
generating code for an expression where the first subscript of (*p) is
a constant, the compiler knows to generate code that references the
correct memory "segment". In the case where the first subscript is
not constant, the compiler knows to generate code to reference both
"segments" and execute only the code that references the correct
segment based on the result of a run-time test of the subscript.

You want (*p)[0][8] to mean (*p)[1][2] but the code will access the
wrong segment since the first subscript is a constant.

Can the second snippet be remedied by simply adding a pointer? e.g.:

int arr[2][5];

/* arr[9] = 4; */

int *const p = (int*)&**arr;

Since **arr is already an int, the cast is unnecessary. Or you could
simplify to (int*)arr.

p[9] = 4;

The compiler is allowed to infer that p points into arr[0]. Not a
change from the previous.


 
Frederick Gotham

Barry Schwarz:
p[9] = 4;

The compiler is allowed to infer that p points into arr[0]. Not a
change from the previous.


I'm still not so sure that the compiler can decide that the memory access
is bad.

Every chunk of memory can be treated as an array of unsigned char's, as
follows:

#include <stdio.h>

int main(void)
{
    int arr[5][6][7] = {0};
    /* Is that initialisation OK with
       just the single 0 between the
       braces? */

    char unsigned const *p = (char unsigned*)arr;
    char unsigned const *const pover =
        (char unsigned*)(arr+sizeof arr/sizeof*arr);

    do printf("%u",(unsigned)*p++);
    while (pover != p);

    return 0;
}
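
Continuing the byte-wise view, here is a sketch that stores an int at a flat position by copying its bytes with memcpy. It assumes, as the program above does, that the whole array may be addressed as an array of unsigned char; the names ROWS, COLS and set_flat_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ROWS = 2, COLS = 5 };

/* Copy the bytes of v into flat position k of the whole array object. */
static void set_flat_bytes(int (*arr)[ROWS][COLS], size_t k, int v)
{
    unsigned char *bytes = (unsigned char *)arr;   /* start of the whole object */
    assert(k < (size_t)ROWS * COLS);
    memcpy(bytes + k * sizeof(int), &v, sizeof v);
}
```

Because arrays contain no padding between elements, set_flat_bytes(&arr, 9, 4) lands on the bytes of arr[1][4].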
 
Peter Nilsson

Frederick said:
...
Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size: sizeof(int)*5*2
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

That's because the behaviour _is_ undefined.

This seems strange to me, especially given that we should be able to
access memory however we like in C.

Why should we be able to do that? Do you have some buffer overflow
attack virus that needs to be strictly conforming? ;)

There are cases where walking over the end of an array can be
useful (e.g. the struct hack in C90). But it's not clear that being
able to do so is more beneficial in terms of efficiency. Note that
languages like Fortran have stricter rules than C. The stricter rules
allow for significantly _greater_ optimisation, not less.

The freedom that pointers have in C is actually a language weakness
more than a strength. Time has proven that.
 
Dik T. Winter

> > ...
> > Take the following snippet however:
> >
> > int arr[2][5];
> >
> > arr[0][9] = 4;
> >
> > The definition of "arr" results in a chunk of memory of size: sizeof(int)*5
> > *2
> > Therefore, I would have thought that there was nothing wrong with the
> > accessing of the 10th element (if we look at it as a one-dimensional array
> > of int's). I've been told, however, that the behaviour is undefined.
>
> That's because the behaviour _is_ undefined.

Can you provide a quotation from the standard? There was a thread
about this subject a bit earlier. My conclusion was that it is permitted.
Note that 'a[i][j]' is equivalent to '*(&(*(a + i)) + j)', where the '&'
yields a pointer of type 'pointer to array 5 of int'.
In the standard indexing is allowed as long as the indexed pointer points
within an 'object'. There are two objects involved here: the complete
object and the element object. Now it could be argued that arr[0] is
a 'pointer to array 5 of int', and so the index must remain in the
range [0,5] (where the last can not be used in dereferencing). But
consider the following snippet:
int *p = &(a[0][0]);
now p is a pointer to int (and by the standard an int is, with indexing,
considered to be an array of a single element), so in that case indexing
of p should be restricted to 0 and 1.
 
Guest

Dik said:
Frederick said:
...
Take the following snippet however:

int arr[2][5];

arr[0][9] = 4;

The definition of "arr" results in a chunk of memory of size: sizeof(int)*5*2
Therefore, I would have thought that there was nothing wrong with the
accessing of the 10th element (if we look at it as a one-dimensional array
of int's). I've been told, however, that the behaviour is undefined.

That's because the behaviour _is_ undefined.

Can you provide a quotation from the standard?

Non-normative, but:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
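
As an illustration of the J.2 example, here is a sketch that turns an out-of-range column such as the 7 in a[1][7] into an in-range (row, column) pair for int a[4][5]. The names NROWS, NCOLS and normalise are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum { NROWS = 4, NCOLS = 5 };   /* shape of int a[4][5] */

/* Carry any excess in *col over into *row so that *col < NCOLS.
   Returns 0 if the result names an element of the array, -1 if not. */
static int normalise(size_t *row, size_t *col)
{
    *row += *col / NCOLS;
    *col %= NCOLS;
    return *row < NROWS ? 0 : -1;
}
```

Starting from row 1, column 7, this yields row 2, column 2, so the defined spelling of that element is a[2][2].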
 
Dik T. Winter

> Dik T. Winter wrote: ....
> >
> > Can you provide a quotation from the standard?
>
> Non-normative, but:
>
> J.2 Undefined behavior:
> The behavior is undefined in the following circumstances:
> [...]
> - An array subscript is out of range, even if an object is apparently
> accessible with the given subscript (as in the lvalue expression
> a[1][7] given the declaration int a[4][5]) (6.5.6).

Indeed, I see it now also in 6.5.6 (it is different from the previous
discussion). So formally also the following:
int a[4][3];
int *p = &(a[0][0]);
p[6] = 1;
is also undefined behaviour. On the other hand, the following:
int a[4][3];
int *p = (int *)a;
p[6] = 1;
apparently is valid. Or is it not?
 
Guest

Dik said:
Dik T. Winter wrote: ...
That's because the behaviour _is_ undefined.

Can you provide a quotation from the standard?

Non-normative, but:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

Indeed, I see it now also in 6.5.6 (it is different from the previous
discussion). So formally also the following:
int a[4][3];
int *p = &(a[0][0]);
p[6] = 1;
is also undefined behaviour. On the other hand, the following:
int a[4][3];
int *p = (int *)a;
p[6] = 1;
apparently is valid. Or is it not?

I don't think the standard guarantees that (int *) a points to a[0][0].

(There are a lot of cases where "everybody knows" what the behaviour of
pointer conversions should be, but where the standard doesn't spell it
out. offsetof() is close to useless without relying on such behaviour.
This may or may not be one of those cases.)
 
Frederick Gotham

Dik T. Winter:
Note that 'a[i][j]' is equivalent to '*(&(*(a + i)) + j)'


Incorrect.

a[i][j]

is equivalent to:

( a[i] ) [j]

which is equivalent to:

*( a[i] + j )

which is equivalent to:

*( *(a + i) + j )

The addressof operator plays no part.
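
The two expansions can be compared with a small sketch. It assumes an array with rows of 3 ints and a caller that keeps j < 3 and i + j within the number of rows; the function name check is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if both identities hold for the given subscripts.
   The caller must keep j < 3 and i + j less than the row count. */
static int check(int a[][3], size_t i, size_t j)
{
    int ok = 1;
    /* the expansion without '&': the same element as a[i][j] */
    ok = ok && (&a[i][j] == *(a + i) + j);
    /* with '&', the addition steps over j whole rows, giving row a[i + j] */
    ok = ok && (*(&(*(a + i)) + j) == a[i + j]);
    return ok;
}
```

In other words, &(*(a + i)) has type "pointer to array 3 of int" here, so adding j to it moves by rows rather than by ints.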
 
