Out-of-bounds nonsense

  • Thread starter Frederick Gotham

Frederick Gotham

[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;

Both the C Standard and the C++ Standard require that the four ints be
laid out in memory in ascending order with no padding in between, i.e.:

(best viewed with a monowidth font)

--------------------------------
| Memory Address | Object      |
--------------------------------
|       0        | arr[0][0]   |
|       1        | arr[0][1]   |
|       2        | arr[1][0]   |
|       3        | arr[1][1]   |
--------------------------------
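
The layout in the table can be checked without any out-of-range subscript. The following is only a sketch: `byte_offset` is a hypothetical helper, not something from the thread, and it measures each element's distance from the start of the array through `unsigned char` pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of arr[row][col] from the start of a 2x2 int array,
   measured through unsigned char pointers (hypothetical helper). */
size_t byte_offset(size_t row, size_t col)
{
    int arr[2][2];
    const unsigned char *base = (const unsigned char *)arr;
    const unsigned char *elem;

    assert(row < 2 && col < 2);   /* stay inside both dimensions */
    elem = (const unsigned char *)&arr[row][col];
    return (size_t)(elem - base);
}
```

With in-range indices, `byte_offset(1, 1)` should equal `3 * sizeof(int)`, which matches the last row of the table. This confirms only the layout; it says nothing about whether `arr[0][3]` is a permitted way to reach that element.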

One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1], but I've had
people over on comp.lang.c telling me that the behaviour of the snippet is
undefined because of an "out of bounds" array access. They've even backed
this up with a quote from the C Standard:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

Does anyone make the same claim of undefined behaviour about C++?

If it is claimed that the snippet's behaviour is undefined because the
second subscript index is out of range of the dimension, then this
rationale can be brought into doubt by the following breakdown. First let's
look at the expression statement:

arr[0][3] = 9;

The compiler, both in C and in C++, must interpret this as:

*( *(arr+0) + 3 ) = 9;

In the inner-most set of parentheses, "arr" decays to a pointer to its
first element, i.e. an R-value of the type int(*)[2]. The value 0 is then
added to this address, which has no effect. The address is then
dereferenced, yielding an L-value of the type int[2]. This expression then
decays to a pointer to its first element, yielding an R-value of the type
int*. The value 3 is then added to this address. (In terms of bytes, it's p
+= 3 * sizeof(int)). This address is then dereferenced, yielding an L-value
of the type int. The L-value int is then assigned to.
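
The decay steps described above can be made visible with `sizeof`, which does not trigger array-to-pointer decay on its operand. This is a minimal sketch that uses in-range indices only; `store_expanded` and `step_sizes_match` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Writes value to arr[row][col] using the expanded pointer form
   *(*(arr + row) + col). Hypothetical helper for illustration. */
void store_expanded(int (*arr)[2], size_t row, size_t col, int value)
{
    assert(row < 2 && col < 2);   /* in-range indices only */

    /* arr + row     : int (*)[2], points at a whole row        */
    /* *(arr + row)  : int[2] lvalue, which decays to int *     */
    /* ... + col     : int *, points at one element of the row  */
    *(*(arr + row) + col) = value;
}

/* Returns 1 if the intermediate expressions have the sizes that the
   types named in the comments imply. */
int step_sizes_match(void)
{
    int arr[2][2];

    return sizeof *arr == sizeof(int[2])            /* *(arr+0) is int[2]  */
        && sizeof **arr == sizeof(int)              /* **arr is int        */
        && sizeof (arr + 0) == sizeof(int (*)[2]);  /* arr+0 is a pointer  */
}
```

The sketch shows that the intermediate `int[2]` lvalue really exists in the expression. Whether its bound still matters after it decays to `int *` is the point in dispute.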

The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because
the L-value decays to a simple R-value int pointer prior to the accessing
of the int object, so any dimension info should be lost by then.

To the C++ programmers: Is the snippet viewed as invoking undefined
behaviour? If so, why?

To the C programmers: How can you rationalise the assertion that it
actually does invoke undefined behaviour?

I'd like to remind both camps that, in other places, we're free to use our
memory however we please (given that it's suitably aligned, of course). For
instance, look at the following. The code is an absolute dog's dinner, but
it should work perfectly on all implementations:

/* Assume the inclusion of all necessary headers */

void Output(int); /* Defined elsewhere */

int main(void)
{
    assert( sizeof(double) > sizeof(int) );

    { /* Start */

        double *p;
        int *q;
        char unsigned const *pover;
        char unsigned const *ptr;

        p = malloc(5 * sizeof*p);
        q = (int*)p++;
        pover = (char unsigned*)(p+4);
        ptr = (char unsigned*)p;
        p[3] = 2423.234;
        *q++ = -9;

        do Output(*ptr++);
        while (pover != ptr);

        return 0;

    } /* End */
}

Another thing I would remind both camps of is that we can access any
memory as if it were simply an array of unsigned chars. That means we can
access an "int[2][2]" as if it were simply an object of the type "char
unsigned[sizeof(int[2][2])]".
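
As an illustration of the byte-wise view, here is a minimal sketch that stores into the n-th int-sized slot of the array by copying bytes. `store_by_bytes` is a hypothetical helper. It takes a pointer to the whole `int[2][2]`, on the assumption that a character pointer derived from the complete object may range over all of its bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stores value into the n-th int-sized slot of a 2x2 array by copying
   bytes, treating the array as unsigned char[sizeof(int[2][2])]. */
void store_by_bytes(int (*whole)[2][2], size_t n, int value)
{
    unsigned char *bytes = (unsigned char *)whole;

    assert(n < 4);   /* four ints in total */
    memcpy(bytes + n * sizeof(int), &value, sizeof value);
}
```

Calling `store_by_bytes(&arr, 3, 8)` is intended to have the same effect as `arr[1][1] = 8`, without ever forming an `int *` that steps across a row boundary.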

The reason I'm writing this is that, at the moment, it sounds like absolute
nonsense to me that the original snippet's behaviour is undefined, and so I
challenge those who support its alleged undefinedness.

I leave you with this:

int arr[2][2];

void *const pv = &arr;

int *const pi = (int*)pv; /* Cast used for C++ programmers! */

pi[3] = 8;
 

Keith Thompson

Frederick Gotham said:
[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range.
[snip]

This was multi-posted to at least two newsgroups, comp.std.c and
comp.lang.c. (Given the content, it may have been posted to one or
more C++ newsgroups as well, but I haven't checked.)

I mention this so that readers will be aware of it when deciding
whether and where to post a followup.
 

Frederick Gotham

Keith Thompson:
This was multi-posted to at least two newsgroups, comp.std.c and
comp.lang.c. (Given the content, it may have been posted to one or
more C++ newsgroups as well, but I haven't checked.)

I mention this so that readers will be aware of it when deciding
whether and where to post a followup.


I wasn't sure how preferable it was over cross-posting, although I know that
my own newsreader makes a mess of cross-posts (...not to mention I don't
quite understand how they're supposed to work).

I have indeed posted to both C newsgroups and C++ newsgroups.
 

Eric Sosman

Frederick said:
[...] but I've had
people over on comp.lang.c telling me that the behaviour of the snippet is
undefined because of an "out of bounds" array access. They've even backed
this up with a quote from the C Standard:
[...]

Frederick, you are under no obligation to believe. But
if you choose to disbelieve, do the believers the courtesy of
leaving the temple quietly. Any door you like, just stop
the mewling. Please?

"The man convinced against his will
Is of the same opinion still."
 

Richard Heathfield

Frederick Gotham said:
[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

The C++ parts are irrelevant here.

Over on comp.lang.c,

Huh? This *is* comp.lang.c.

we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range.

Yes, and it's undefined behaviour, as has been explained more than ad
nauseam.

One can see plainly that there should be no problem with the little
snippet above

No, one can plainly see that the behaviour is undefined, and that's a
problem.

Does anyone make the same claim of undefined behaviour about C++?

Questions about C++ are off-topic here.

<snip>
 

Frederick Gotham

Richard Heathfield:
Yes, and it's undefined behaviour, as has been explained more than ad
nauseam.


Not ad nauseam enough.

If the following is well-defined:

int *const p = malloc(5 * sizeof *p);

p[2] = 6;

, then I don't see how my original snippet cannot be.
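
For comparison, a block obtained from an allocation function can be used as one flat array of ints, so an index computed across the whole block stays inside a single array. A minimal sketch, assuming hypothetical helpers `grid_alloc` and `grid_at` that are not part of the thread:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates rows * cols zero-initialised ints as one flat block.
   Returns NULL on failure (hypothetical helper). */
int *grid_alloc(size_t rows, size_t cols)
{
    return calloc(rows * cols, sizeof(int));
}

/* Address of logical element (r, c) in a flat block with cols columns.
   The caller must keep r below the number of rows allocated. */
int *grid_at(int *grid, size_t cols, size_t r, size_t c)
{
    assert(grid != NULL && c < cols);
    return &grid[r * cols + c];
}
```

With a 2x2 grid, `*grid_at(g, 2, 1, 1) = 8` writes the same slot as `g[3] = 8`. The difference from `int arr[2][2]` is that no inner `int[2]` type is involved here.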
 

Flash Gordon

Frederick said:
Richard Heathfield:
Yes, and it's undefined behaviour, as has been explained more than ad
nauseam.

Not ad nauseam enough.

If the following is well-defined:

int *const p = malloc(5 * sizeof *p);

p[2] = 6;

It's allowed because the standard says it is allowed.

, then I don't see how my original snippet cannot be.

It is undefined behaviour because that is what the standards committee
decided. They even made it clear in one of the annexes (as someone else
pointed out to you), so even if you can't follow the reasoning from the
normative text you can see that it is what the committee intended. If
you cannot accept what the committee clearly state then perhaps you
should write your own language which is defined as you think it should
be and use that instead of C.
 

Richard Heathfield

Frederick Gotham said:
Richard Heathfield:



Not ad nauseam enough.

Maybe you have a higher nausea threshold than many of us.

If the following is well-defined:

int *const p = malloc(5 * sizeof *p);

p[2] = 6;

, then I don't see how my original snippet cannot be.

That's your problem, not ours. The Standard forbids access outside the
bounds of an array. If you wish to violate that prohibition, that's your
choice but, if you do so, the behaviour of the program is undefined. You
may not like the fact, but the ISO C Standard is not concerned with your
(or my) likes or dislikes.
 

Pierre Asselin

Frederick Gotham said:
int arr[2][2];
arr[0][3] = 7;

Yep, undefined behavior indeed. Surprising, but that's what the
standard says. Your code may break at the next compiler upgrade.

Both the C Standard and the C++ Standard require that the four ints be
laid out in memory in ascending order with no padding in between, i.e.:

That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7; /* ugly */
or
{
int * const tmp= arr[0]; /* wordy */
tmp[3]= 7;
}
 

Frederick Gotham

Pierre Asselin:
That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7;

Firstly, all casts yield an R-value, so the const is redudant. That would
leave us with:

((int*)arr[0])[3] = 7;

Secondly, the cast is redundant, because "arr[0]" decays to a pointer to its
first element, and no cast is required.

Still though, people seem to think it invokes undefined behaviour.
 

Chris Dollin

Frederick said:
Pierre Asselin:
That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7;

Firstly, all casts yield an R-value, so the const is redudant. That would
leave us with:

((int*)arr[0])[3] = 7;

Secondly, the cast is redundant, because "arr[0]" decays to a pointer to its
first element, and no cast is required.

Still though, people seem to think it invokes undefined behaviour.

int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`. Clearly such an object has no
element at index 3. BOOM.

(It doesn't matter that `arr[0]` then decays into a pointer-to-int.
That pointer only points to /2/ ints. That there are more ints
afterward, even that there are /surely/ more ints afterward, doesn't
stop it being undefined. Think of it as the Standard permitting an
implementation to do bounds-checking.)
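
To make the bounds-checking reading concrete, here is a sketch of how a checking implementation might model the pointer that `arr[0]` decays to. The `checked_ptr` struct and both functions are hypothetical; no real compiler is claimed to work this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": an int pointer that remembers how many
   elements the array it was derived from contains. */
struct checked_ptr {
    int   *base;
    size_t count;
};

/* Models the decay of one row of an int[2][2] into a pointer to 2 ints. */
struct checked_ptr row_of(int (*arr)[2], size_t row)
{
    struct checked_ptr p;

    assert(row < 2);
    p.base  = arr[row];
    p.count = 2;
    return p;
}

/* Returns 1 and stores value if index is in range, 0 otherwise. */
int checked_store(struct checked_ptr p, size_t index, int value)
{
    if (index >= p.count)
        return 0;   /* a checking implementation could trap here instead */
    p.base[index] = value;
    return 1;
}
```

Under this model, `checked_store(row_of(arr, 0), 3, 7)` is rejected even though an int does exist at that address. That is the freedom the undefined behaviour leaves to an implementation.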

(Similarly, if the Standard were to say that use of any identifier
ending in `kers` yielded undefined behaviour, then using
`bonkers` or `blinkers` in your code would yield undefined
behaviour, even if the implementation were unchanged from whatever
it now is. Implementations don't have to go out of their way to
make undefined constructs have bizarre behaviour. Of course the
Standard would never make such a generic constraint on names,
so you don't have to avoid `inkers` or `thankers` or `streakers`
as names in your code ...)

(fx:BOOM)
 

Fred Kleinschmidt

Frederick Gotham said:
[ This post deals with both C and C++, but does not alienate either
language because the language feature being discussed is common to both
languages. ]

Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;
Frederick Gotham

Consider what happens when you pass this to a function

foo( arr, 2, 2 );

and foo is defined as:

void foo( int **arr, int dim1, int dim2 ) {
/*
* you think this is OK as long as [0][3]
* is inside the bounds of [2][2] ?
*/
arr[0][3] = 7;
}

Now foo can't determine whether you passed a 2D array to it,
or a pointer to a pointer to int.

Now suppose somewhere else I write this code:
int **arr2;
arr2 = malloc( 2 * sizeof (*arr2) );
for ( i=0; i < 2; i++ ) {
arr2[i] = malloc( 2 * sizeof(*arr2[i]) );
}
foo( arr2, 2, 2 ) ;

What will happen in foo() ?
 

Frederick Gotham

Chris Dollin:
int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`.


The type in question is written as: int[2]

Clearly such an object has no
element at index 3. BOOM.


No, but it's part of a contiguous sequence of memory.
 

Frederick Gotham

Whatever happened to the idea of contiguous memory? When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

It consists of four int objects which are laid out contiguously in memory.

Therefore, if we take the address of the first int, why can't we add to that
address to yield the addresses of the ints which are directly after it in
contiguous memory? Isn't that one of the fundamental faculties of pointers?
 

Frederick Gotham

Do you think there's anything wrong with the following?

int arr[2][2];

int *p = *arr;

*p++ = 1;
*p++ = 2;
*p++ = 3;
*p++ = 4;
 

Frederick Gotham

Fred Kleinschmidt:
Consider what happens when you pass this to a function

foo( arr, 2, 2 );

and foo is defined as:

void foo( int **arr, int dim1, int dim2 ) {
/*
* you think this is OK as long as [0][3]
* is inside the bounds of [2][2] ?
*/
arr[0][3] = 7;
}


Thankfully, there's no implicit conversion from int[2][2] to int**.

It would appear you have confused a multi-dimensional array with an array
of pointers to arrays... ?

Now suppose somewhere else I write this code:
int **arr2;


Here you define a pointer to a pointer to an int.

arr2 = malloc( 2 * sizeof (*arr2) );


Here you allocate enough memory for two int pointers.

for ( i=0; i < 2; i++ ) {
arr2[i] = malloc( 2 * sizeof(*arr2[i]) );
}
foo( arr2, 2, 2 ) ;



I think this confirms my suspicion that you're thinking of arrays of
pointers to arrays, rather than multi-dimensional arrays.

Oh, by the way, a multi-dimensional array is merely an array of arrays.
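
The distinction between the two layouts can be sketched with two functions whose parameter types differ. Both names are hypothetical and introduced only for illustration: `fill_2d` accepts what an `int[2][2]` decays to, while `fill_rows` accepts an array of separately stored row pointers.

```c
#include <assert.h>
#include <stddef.h>

/* A true 2D array decays to int (*)[2], so that is the parameter type. */
void fill_2d(int (*arr)[2], size_t rows, int value)
{
    size_t r, c;

    assert(arr != NULL);
    for (r = 0; r < rows; r++)
        for (c = 0; c < 2; c++)
            arr[r][c] = value;
}

/* An array of row pointers is a different layout and needs int **. */
void fill_rows(int **rows, size_t nrows, size_t ncols, int value)
{
    size_t r, c;

    assert(rows != NULL);
    for (r = 0; r < nrows; r++)
        for (c = 0; c < ncols; c++)
            rows[r][c] = value;
}
```

Passing an `int[2][2]` to `fill_rows` requires a cast and should draw a diagnostic, because there are no stored row pointers for `rows[r]` to load. In the `int **` case the rows need not be adjacent in memory at all.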
 

Clever Monkey

Frederick said:
Pierre Asselin:
That sounds right, but to write portable code you will need to
express your intent with an explicit cast.

((int * const) arr[0])[3]= 7;

Firstly, all casts yield an R-value, so the const is redudant. That would
leave us with:

((int*)arr[0])[3] = 7;

I totally love this word you just created:

redudant
adj 1. More dude than is needed or required; "being that cool is
just redudant, dude"
 

Richard Heathfield

Frederick Gotham said:
Do you think there's anything wrong with the following?

int arr[2][2];

int *p = *arr;

*arr is equivalent to arr[0], which is an array of two int. It is acceptable
for p to point to the first element in this array, so the assignment is
fine.

*p++ = 1;

No problem. Now arr[0][0] has the value 1, and p points to arr[0][1].

*p++ = 2;

No problem. Now arr[0][1] has the value 2, and p points one past the end of
the arr[0] array.

*p++ = 3;

Illegal dereference of p. The behaviour is undefined.

And it will remain undefined, no matter which way you cut it.
 

Richard Heathfield

Frederick Gotham said:
Whatever happened to the idea of contiguous memory? When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

It consists of four int objects which are laid out contiguously in memory.

Therefore, if we take the address of the first int, why can't we add to
that address to yield the addresses of the ints which are directly after
it in contiguous memory?

You can, as long as you don't exceed the bounds of any array.
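
One way to visit all four ints in memory order while respecting that rule is to re-derive the int pointer from each row. This is a minimal sketch; `fill_in_order` is a hypothetical name, not something proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stores 1, 2, 3, 4 into a 2x2 array in memory order. The int pointer is
   re-derived from each row, so it is never dereferenced outside that
   row's two ints. */
void fill_in_order(int (*arr)[2])
{
    int next = 1;
    size_t r;

    assert(arr != NULL);
    for (r = 0; r < 2; r++) {
        int *p   = arr[r];       /* first element of row r */
        int *end = arr[r] + 2;   /* one past row r: formed, never dereferenced */

        while (p != end)
            *p++ = next++;
    }
}
```

The stores land at the same four addresses as a single running `int *` would reach, but each pointer stays inside the array it was derived from.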
 

Jordan Abel

On 2006-11-01, Richard Heathfield said:
Frederick Gotham said:
Do you think there's anything wrong with the following?

int arr[2][2];

int *p = *arr;

*arr is equivalent to arr[0], which is an array of two int. It is acceptable
for p to point to the first element in this array, so the assignment is
fine.

*p++ = 1;

No problem. Now arr[0][0] has the value 1, and p points to arr[0][1].

*p++ = 2;

No problem. Now arr[0][1] has the value 2, and p points one past the end of
the arr[0] array.

*p++ = 3;

Illegal dereference of p. The behaviour is undefined.

And it will remain undefined, no matter which way you cut it.

OK. So how about if, instead of int *p = *arr;, you use this:
int *p;

p = (int *)(unsigned char *)arr;
p[0]=0; p[1]=1; /* no problems */
p[2]=2; p[3]=3; /* is this legal? */

/* assuming the above wasn't wrong, or if it was wrong, wasn't executed */
p = (int *)((unsigned char *)arr+2*sizeof(int));
p[0]=2; p[1]=3; /* is this legal? */
 
