Frederick Gotham
[ This post deals with both C and C++ and favours neither, because the
language feature being discussed is common to both languages. ]
Over on comp.lang.c, we've been discussing access to array elements through
subscripts that appear to be out of range. In particular, accesses similar to
the following:
int arr[2][2];
arr[0][3] = 7;
Both the C Standard and the C++ Standard require that the four ints be laid
out in memory in ascending order with no padding in between, i.e.:
(best viewed with a monowidth font; addresses are in units of sizeof(int))
--------------------------------
| Memory Address |   Object    |
--------------------------------
|       0        |  arr[0][0]  |
|       1        |  arr[0][1]  |
|       2        |  arr[1][0]  |
|       3        |  arr[1][1]  |
--------------------------------
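As a rough sanity check of the layout in the table, the sketch below measures the byte offset of each element from the start of the array. The names offset_of_element and layout_is_contiguous are hypothetical helpers made up for illustration. Only in-range subscripts are used, so the disputed access is not involved.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of (*whole)[i][j] from the start of the whole array.
   Hypothetical helper; i and j must each be 0 or 1. */
size_t offset_of_element(int (*whole)[2][2], size_t i, size_t j)
{
    const unsigned char *base = (const unsigned char *)whole;
    const unsigned char *elem = (const unsigned char *)&(*whole)[i][j];
    return (size_t)(elem - base);
}

/* Returns 1 if the four ints occupy consecutive int-sized slots, 0 otherwise. */
int layout_is_contiguous(void)
{
    int arr[2][2];  /* never read, only its addresses are taken */
    return sizeof arr == 4 * sizeof(int)
        && offset_of_element(&arr, 0, 0) == 0 * sizeof(int)
        && offset_of_element(&arr, 0, 1) == 1 * sizeof(int)
        && offset_of_element(&arr, 1, 0) == 2 * sizeof(int)
        && offset_of_element(&arr, 1, 1) == 3 * sizeof(int);
}
```

A call such as assert(layout_is_contiguous()); would be expected to pass, since an array's elements are contiguous and sizeof(int[2][2]) is 4 * sizeof(int).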
One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1], but I've had people
over on comp.lang.c telling me that the behaviour of the snippet is undefined
because of an "out of bounds" array access. They've even backed this up with
a quote from the C Standard:
J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
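To make the Standard's own example concrete, here is a small sketch of the arithmetic behind "apparently accessible". It only computes indices and never touches an array, so it says nothing about whether the access itself is defined. The names ROWS, COLS, flat_index, row_of and col_of are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Dimensions from the quoted declaration int a[4][5]. */
enum { ROWS = 4, COLS = 5 };

/* Position, counted in ints from the start of the array, that the
   subscript pair (i, j) corresponds to under row-major layout. */
size_t flat_index(size_t i, size_t j)
{
    return i * COLS + j;
}

/* In-range row and column that share a given flat position. */
size_t row_of(size_t flat) { return flat / COLS; }
size_t col_of(size_t flat) { return flat % COLS; }
```

With these, flat_index(1, 7) is 1 * 5 + 7 = 12, and position 12 is also row 2, column 2, so a[1][7] would land on the same storage as a[2][2]. That is the sense in which an object is "apparently accessible" with the out-of-range subscript.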
Does anyone make the same claim of undefined behaviour for C++?
If it is claimed that the snippet's behaviour is undefined because the second
subscript index is out of range of the dimension, then this rationale can be
brought into doubt by the following breakdown. First let's look at the
expression statement:
arr[0][3] = 9;
The compiler, both in C and in C++, must interpret this as:
*( *(arr+0) + 3 ) = 9;
In the inner-most set of parentheses, "arr" decays to a pointer to its first
element, i.e. an R-value of the type int(*)[2]. The value 0 is then added to
this address, which has no effect. The address is then dereferenced, yielding
an L-value of the type int[2]. This expression then decays to a pointer to
its first element, yielding an R-value of the type int*. The value 3 is then
added to this address. (In terms of bytes, the address advances by
3 * sizeof(int).) This address is then dereferenced, yielding an L-value of
the type int. That int L-value is then assigned to.
The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because the
L-value decays to a simple R-value int pointer prior to the accessing of the
int object, so any dimension info should be lost by then.
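The decomposition described above can be spelled out one step per line. This is only a sketch: store_stepwise and stepwise_matches_subscript are hypothetical names, and the demonstration deliberately uses in-range subscripts so that it does not rely on the disputed access.

```c
#include <assert.h>

/* Stores value in arr[i][j] by writing out each step of
   *( *(arr + i) + j ). Callers are assumed to pass i and j in 0..1. */
void store_stepwise(int (*arr)[2], int i, int j, int value)
{
    int (*row_ptr)[2] = arr + i;  /* pointer to the i-th int[2] row */
    int *first = *row_ptr;        /* the int[2] L-value decays to int* */
    int *target = first + j;      /* pointer arithmetic in units of int */
    *target = value;              /* dereference yields an int L-value */
}

/* Returns 1 if the stepwise store hits the same element as arr[1][1]
   and leaves the other three elements alone, 0 otherwise. */
int stepwise_matches_subscript(void)
{
    int arr[2][2] = { {0, 0}, {0, 0} };
    store_stepwise(arr, 1, 1, 9);  /* arr decays to int(*)[2] here */
    return arr[1][1] == 9
        && arr[0][0] == 0 && arr[0][1] == 0 && arr[1][0] == 0;
}
```

The point of contention is what happens at the "first + j" step when j is larger than 1: the arithmetic is the same, but the question is whether the result may be dereferenced.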
To the C++ programmers: Is the snippet viewed as invoking undefined
behaviour? If so, why?
To the C programmers: How can you rationalise the assertion that it actually
does invoke undefined behaviour?
I'd like to remind both camps that, in other places, we're free to use our
memory however we please (given that it's suitably aligned, of course). For
instance, look at the following. The code is an absolute dog's dinner, but it
should work perfectly on all implementations:
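#include <assert.h>
#include <stdlib.h>

void Output(int); /* Defined elsewhere */

int main(void)
{
    assert( sizeof(double) > sizeof(int) );

    { /* Start */
        double *p;
        int *q;
        char unsigned const *pover;
        char unsigned const *ptr;

        p = malloc(5 * sizeof*p);
        if (!p) return EXIT_FAILURE;    /* malloc may fail */

        q = (int*)p++;                  /* q -> first double's storage; p -> second double */
        pover = (char unsigned*)(p+4);  /* one past the end of the allocation */
        ptr = (char unsigned*)p;

        p[3] = 2423.234;                /* the last of the five doubles */
        *q++ = -9;                      /* reuse the first double's storage as an int */

        do Output(*ptr++);              /* dump the last four doubles byte by byte */
        while (pover != ptr);

        return 0;
    } /* End */
}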
Another thing I would remind both camps of is that we can access any memory
as if it were simply an array of unsigned chars. That means we can access an
"int[2][2]" as if it were simply an object of the type
"char unsigned[sizeof(int[2][2])]".
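A minimal sketch of that byte-wise view follows: it copies one int[2][2] onto another purely through unsigned char pointers. The names copy_bytes and bytewise_copy_preserves_values are hypothetical, and the casts are there so that the code also compiles as C++.

```c
#include <assert.h>
#include <stddef.h>

/* Copies n bytes from src to dst, one unsigned char at a time. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (n--)
        *d++ = *s++;
}

/* Returns 1 if a byte-wise copy of an int[2][2] reproduces all four
   values, 0 otherwise. */
int bytewise_copy_preserves_values(void)
{
    int src[2][2] = { {1, 2}, {3, 4} };
    int dst[2][2];
    copy_bytes(&dst, &src, sizeof src);  /* walks all sizeof(int[2][2]) bytes */
    return dst[0][0] == 1 && dst[0][1] == 2
        && dst[1][0] == 3 && dst[1][1] == 4;
}
```

Here the loop walks straight across the boundary between src[0] and src[1] without any subscript on the int arrays at all, which is the kind of access the paragraph above has in mind.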
The reason I'm writing this is that, at the moment, it sounds like absolute
nonsense to me that the original snippet's behaviour is undefined, and so I
challenge those who support its alleged undefinedness.
I leave you with this:
int arr[2][2];
void *const pv = &arr;
int *const pi = (int*)pv; /* Cast used for C++ programmers! */
pi[3] = 8;