F
Frederick Gotham
[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]
Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:
int arr[2][2];
arr[0][3] = 7;
Both the C Standard and the C++ Standard necessitate that the four int's be
lain out in memory in ascending order with no padding in between, i.e.:
(best viewed with a monowidth font)
--------------------------------
| Memory Address | Object |
--------------------------------
| 0 | arr[0][0] |
| 1 | arr[0][1] |
| 2 | arr[1][0] |
| 3 | arr[1][1] |
--------------------------------
One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1], but I've had
people over on comp.lang.c telling me that the behaviour of the snippet is
undefined because of an "out of bounds" array access. They've even backed
this up with a quote from the C Standard:
J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
Are the same claims of undefined behaviour existing in C++ made by anyone?
If it is claimed that the snippet's behaviour is undefined because the
second subscript index is out of range of the dimension, then this
rationale can be brought into doubt by the following breakdown. First let's
look at the expression statement:
arr[0][3] = 9;
The compiler, both in C and in C++, must interpret this as:
*( *(arr+0) + 3 ) = 9;
In the inner-most set of parentheses, "arr" decays to a pointer to its
first element, i.e. an R-value of the type int(*)[2]. The value 0 is then
added to this address, which has no effect. The address is then
dereferenced, yielding an L-value of the type int[2]. This expression then
decays to a pointer to its first element, yielding an R-value of the type
int*. The value 3 is then added to this address. (In terms of bytes, it's p
+= 3 * sizeof(int)). This address is then dereferenced, yielding an L-value
of the type int. The L-value int is then assigned to.
The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because
the L-value decays to a simple R-value int pointer prior to the accessing
of the int object, so any dimension info should be lost by then.
To the C++ programmers: Is the snippet viewed as invoking undefined
behaviour? If so, why?
To the C programmers: How can you rationalise the assertion that it
actually does invoke undefined behaviour?
I'd like to remind both camps that, in other places, we're free to use our
memory however we please (given that it's suitably aligned, of course). For
instance, look at the following. The code is an absolute dog's dinner, but
it should work perfectly on all implementations:
/* Assume the inclusion of all necessary headers */
void Output(int); /* Defined elsewhere */
int main(void)
{
assert( sizeof(double) > sizeof(int) );
{ /* Start */
double *p;
int *q;
char unsigned const *pover;
char unsigned const *ptr;
p = malloc(5 * sizeof*p);
q = (int*)p++;
pover = (char unsigned*)(p+4);
ptr = (char unsigned*)p;
p[3] = 2423.234;
*q++ = -9;
do Output(*ptr++);
while (pover != ptr);
return 0;
} /* End */
}
Another thing I would remind both camps of, is that we can access any
memory as if it were simply an array of unsigned char's. That means we can
access an "int[2][2]" as if it were simply an object of the type "char
unsigned[sizeof(int[2][2])]".
The reason I'm writing this is that, at the moment, it sounds like absolute
nonsense to me that the original snippet's behaviour is undefined, and so I
challenge those who support its alleged undefinedness.
I leave you with this:
int arr[2][2];
void *const pv = &arr;
int *const pi = (int*)pv; /* Cast used for C++ programmers! */
pi[3] = 8;
language because the language feature being discussed is common to both
languages. ]
Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:
int arr[2][2];
arr[0][3] = 7;
Both the C Standard and the C++ Standard necessitate that the four int's be
lain out in memory in ascending order with no padding in between, i.e.:
(best viewed with a monowidth font)
--------------------------------
| Memory Address | Object |
--------------------------------
| 0 | arr[0][0] |
| 1 | arr[0][1] |
| 2 | arr[1][0] |
| 3 | arr[1][1] |
--------------------------------
One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1], but I've had
people over on comp.lang.c telling me that the behaviour of the snippet is
undefined because of an "out of bounds" array access. They've even backed
this up with a quote from the C Standard:
J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
Are the same claims of undefined behaviour existing in C++ made by anyone?
If it is claimed that the snippet's behaviour is undefined because the
second subscript index is out of range of the dimension, then this
rationale can be brought into doubt by the following breakdown. First let's
look at the expression statement:
arr[0][3] = 9;
The compiler, both in C and in C++, must interpret this as:
*( *(arr+0) + 3 ) = 9;
In the inner-most set of parentheses, "arr" decays to a pointer to its
first element, i.e. an R-value of the type int(*)[2]. The value 0 is then
added to this address, which has no effect. The address is then
dereferenced, yielding an L-value of the type int[2]. This expression then
decays to a pointer to its first element, yielding an R-value of the type
int*. The value 3 is then added to this address. (In terms of bytes, it's p
+= 3 * sizeof(int)). This address is then dereferenced, yielding an L-value
of the type int. The L-value int is then assigned to.
The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because
the L-value decays to a simple R-value int pointer prior to the accessing
of the int object, so any dimension info should be lost by then.
To the C++ programmers: Is the snippet viewed as invoking undefined
behaviour? If so, why?
To the C programmers: How can you rationalise the assertion that it
actually does invoke undefined behaviour?
I'd like to remind both camps that, in other places, we're free to use our
memory however we please (given that it's suitably aligned, of course). For
instance, look at the following. The code is an absolute dog's dinner, but
it should work perfectly on all implementations:
/* Assume the inclusion of all necessary headers */
void Output(int); /* Defined elsewhere */
int main(void)
{
assert( sizeof(double) > sizeof(int) );
{ /* Start */
double *p;
int *q;
char unsigned const *pover;
char unsigned const *ptr;
p = malloc(5 * sizeof*p);
q = (int*)p++;
pover = (char unsigned*)(p+4);
ptr = (char unsigned*)p;
p[3] = 2423.234;
*q++ = -9;
do Output(*ptr++);
while (pover != ptr);
return 0;
} /* End */
}
Another thing I would remind both camps of, is that we can access any
memory as if it were simply an array of unsigned char's. That means we can
access an "int[2][2]" as if it were simply an object of the type "char
unsigned[sizeof(int[2][2])]".
The reason I'm writing this is that, at the moment, it sounds like absolute
nonsense to me that the original snippet's behaviour is undefined, and so I
challenge those who support its alleged undefinedness.
I leave you with this:
int arr[2][2];
void *const pv = &arr;
int *const pi = (int*)pv; /* Cast used for C++ programmers! */
pi[3] = 8;