Out-of-bounds nonsense

  • Thread starter Frederick Gotham

Rafael Almeida

int arr[2][2];

arr[0][3] = 7;

One can see plainly that there should be no problem with the little
snippet above because arr[0][3] should be the same as arr[1][1], but
I've had people over on comp.lang.c telling me that the behaviour of
the snippet is undefined because of an "out of bounds" array access.
They've even backed this up with a quote from the C Standard:

J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
[...]
- An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

As far as I know, it's not guaranteed that

*(*(arr+0)+3)

should work. After all, *(arr+0) has only two elements, and there's
no guarantee that the position just past the end of
*(arr+0) is the first position of *(arr+1).

Moreover, if someone wrote a compiler on which *(*(arr+0)+3) happened
to work, but on which arr[0][3] deleted all your files, that compiler
would still be standard-conforming, and the behaviour of arr[0][3]
would be very different from what you wanted. That's why it's undefined
behaviour.
 

Default User

Jordan Abel wrote:

OK. So how about if instead of int *p = *arr; you instead use this:
int *p;

p = (int *)(unsigned char *)arr;
p[0]=0; p[1]=1; /* no problems */
p[2]=2; p[3]=3; /* is this legal? */

/* assuming the above wasn't wrong, or if it was wrong, wasn't
   executed */
p = (int *)((unsigned char *)arr + 2*sizeof(int));
p[0]=2; p[1]=3; /* is this legal? */


You mentioned this before, and I'm not sure. The best I can find in the
standard (C99 draft, 6.3.2.3p7) is:


[#7] A pointer to an object or incomplete type may be
converted to a pointer to a different object or incomplete
type. If the resulting pointer is not correctly aligned(50)
for the pointed-to type, the behavior is undefined.
Otherwise, when converted back again, the result shall
compare equal to the original pointer. When a pointer to an
object is converted to a pointer to a character type, the
result points to the lowest addressed byte of the object.
Successive increments of the result, up to the size of the
object, yield pointers to the remaining bytes of the object.


(50) In general, the concept ``correctly aligned'' is
transitive: if a pointer to type A is correctly aligned
for a pointer to type B, which in turn is correctly
aligned for a pointer to type C, then a pointer to type A
is correctly aligned for a pointer to type C.


So the question becomes one of alignment, I think. I'm fairly sure that
it would have to be aligned properly.




Brian
 

Jordan Abel

2006-11-01 said:
So the question becomes one of alignment, I think. I'm fairly sure that
it would have to be aligned properly.

Well, it's guaranteed that it points at a place where an integer is
actually stored, so it would certainly have to be aligned properly.

Now, the second half of my question is: what if I skip the explicit
cast to (unsigned char *)? The point being I'm still starting with
a pointer that is to the first member of an array, correctly aligned for
an int, that is 4*sizeof(int) bytes wide, so I'm absolutely sure there's
no _real_ issue here; the question becomes one of whether an
implementation can decide to reject it just to be contrary.

Starting with *arr is different because your pointer is then to the
first member of an array of two ints, regardless of the fact that
another identical array follows it in memory. arr[0][2] is right out;
I've been thoroughly convinced of this in comp.std.c.

That is, in any "heavy pointer" [as discussed earlier in this thread, or
perhaps in comp.std.c - this is why multiposting is incorrect, by the
way] that you get from 'arr' instead of '*arr', the "how far can it go
before reaching the end" has to be 2*sizeof(int[2]), that is, 2*(2*
sizeof(int)) because it's the pointer-to-the-first-element of an array
of two elements of type int[2], whereas it's conceivable that *arr is
realized as the pointer-to-the-first-element of an array of two ints
(and therefore its "heavy pointer" parameter will be 2*sizeof(int)
instead)

I think that actually, the conclusion that must be reached is that this
is legal: int a[2][2]; int *p=(int *)a; p[2]=2; and this is not:
int a[2][2]; int *p=*a; p[2]=2; despite the fact that it makes more apparent
intuitive sense for a and *a to be equivalent pointers in all but type.
 

Peter Nilsson

Frederick said:
Over on comp.lang.c, we've been discussing the accessing of array elements
via subscript indices which may appear to be out of range. In particular,
accesses similar to the following:

int arr[2][2];

arr[0][3] = 7;

Both the C Standard and the C++ Standard necessitate that the four ints be
laid out in memory in ascending order with no padding in between.
One can see plainly that there should be no problem with the little snippet
above because arr[0][3] should be the same as arr[1][1],

Frederick, in this and other related threads, you appear to be running
two arguments simultaneously:

1) The behaviour is defined; and
2) The behaviour should be defined.

On point 1, you are wrong, and you can look up the cited chapter and
verse any time you like.

On point 2, no one is arguing that the concept cannot be made rigorous
and consistent. It's just that no one can see clear benefit to doing
so.
[The struct hack problems were fixed through amended syntax and
semantics that didn't involve legalising out of bounds access.]

In contrast, the _disadvantages_ to the technique are well known and
documented, in particular, buffer overflow problems and optimisation
crimping.

What you suggest is unnecessary. For the programmers who want to do
that kind of thing, the blanket of undefined behaviour covers them as
it does for many other implementation-specific techniques.
 

Mark McIntyre

Chris Dollin:
int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`.


The type in question is written as: int[2]

That's how you write it in C, not how you explain what it is.
No, but it's part of a contiguous sequence of memory.

So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

What's so hard to understand here?
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 

Andrey Tarasevich

Frederick said:
...
The only thing that sounds a little dodgy in the above paragraph is that an
L-value of the type int[2] is used as a stepping stone to access an element
whose index is greater than 1 -- but this shouldn't be a problem, because
the L-value decays to a simple R-value int pointer prior to the accessing
of the int object, so any dimension info should be lost by then.
...

That's exactly the point where you are both right and wrong at the same time. As
I said before, attempts to explain this from the committee's point of view have
already been made in the "struct hack" thread and in "struct hack"-related defect
reports.

You are saying that "any dimension info should be lost by then". That's not
true. It has been stated here that the intended meaning of the pointer
arithmetic rules given in the standard allows for dimension info to be retained
inside the pointer itself. In other words, when initializing a pointer the
implementation is allowed to store the accessible memory range inside that
pointer. For example, a pointer value produced in this case

int a[100];
int* p = a;

can be internally represented as a range-and-address combination

<0> <100> <address of a>

and an attempt to perform index arithmetic on this pointer might intentionally
verify the range limitations and fail (produce UB) in out-of-range situations.
The range values can be inherited from pointer to pointer during address
arithmetic operations

int* q = p + 5; /* q is <-5> <95> <address of a + 5> */

and so on. Needless to say, the very same thing might apply to the implicit
pointer resulting from the implicit array-to-pointer conversion. That's the
reason why your example might fail.

Now, while the above is definitely implementable, the real question is whether
the standard actually allows this kind of (overly restrictive?) implementation.

Some posters insisted that this is immediately allowed by pointer arithmetic
rules described in the standard. Formally, this is not true. Standard pointer
arithmetic rules are indeed formulated in terms of "array size", but they do not
say that the aforementioned "size" is the _declared_ size of the array object
(as opposed to the actual size of the underlying memory block). In other words,
formally, as follows from the standard _document_ (disregarding any informal
additions distributed by word-of-mouth), the implementation that restricts this
kind of "out of bounds" access is non-conforming. This also means that neither
C89/90 nor C99 _documents_ really outlaw the infamous "struct hack".

At the same time it is important to note that it is well-known that the
committee's position is that the real intent behind the current version of the
pointer arithmetic rules was to interpret the notion of "array size" as the
_declared_ size of the array object. _This_ is the reason why "struct hack" and
the out-of-bounds access from your example are considered illegal in C. They are
outlawed "semi-informally" by known authoritative word-of-mouth comments, which
nevertheless are not included in the standard document.

Also it is worth noting that the big intuitive problem with this kind of access
being illegal is that the "ranged-pointer" feature I described above is
definitely out of place in C. In stronger words, an artificial restriction like
this is completely unacceptable in C. Moreover, it is completely unacceptable in
C++ as well (it is something a 'std::vector<>' might do, but not a raw pointer).
And, as one would expect in case of such a random restriction, there's no
rationale behind it at all.
 

Mark McIntyre

What ever happened to the idea of contiguous memory? When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

No, that's its C declaration. Its type is array[2] of array[2] of
int.
It consists of four int objects which are laid out contiguously in memory.

It consists of four int objects, yup. They may even be contiguous.
That's irrelevant.
Therefore, if we take the address of the first int, why can't we add to that
address to yield the addresses of the ints which are directly after it in
contiguous memory?

Because it says so in the Standard. This doesn't mean it's impossible,
or even impractical. Just that it's not allowed. If you really don't
like that, raise it as a DR with the committee.
Isn't that one of the fundamental faculties of pointers?

Yes, but it's still not relevant.
--
Mark McIntyre

 

Keith Thompson

Mark McIntyre said:
On Wed, 01 Nov 2006 15:34:46 GMT, in comp.lang.c, Frederick Gotham wrote:
No, but it's part of a contiguous sequence of memory.

So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

x and y aren't *necessarily* contiguous; there could be a gap between
them. In the array case being discussed, the representation is
specified by the standard, and there can be no gap.
What's so hard to understand here?

The behavior is undefined because the standard says so. Anyone who's
unwilling or unable to accept that simple fact isn't going to be
persuaded by anything we say here.
 

Joe Wright

Mark said:
Chris Dollin:
int arr[2][2];
arr[0][3] = 7;

`arr[0]` has type `array[2]int`.

The type in question is written as: int[2]

That's how you write it in C, not how you explain what it is.
No, but it's part of a contiguous sequence of memory.

So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

What's so hard to understand here?

If I had..
int a[2][2];
... and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
...and then treat p[0]..p[3]. That's legal isn't it?
 

Frederick Gotham

Mark McIntyre:
So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

No, it wouldn't. It's possible that you'd be writing to padding bytes, but
nonetheless it's perfectly OK:

struct { int x[2], y[2]; } obj;

int *p = (int*)&obj;
int const *const pover = p + sizeof obj / sizeof *p; /* sizeof obj counts bytes; divide to count ints */

do *p++ = 0;
while (pover != p);

This code in and of itself is perfectly OK, but you'd want to watch out for
trapping if you go on to read from "x" or "y" within "obj".

If there were no padding, then everything would be fine and dandy.
 

Frederick Gotham

Mark McIntyre:
, the type of the object "arr" is: int[2][2]

No, that's its C declaration. Its type is array[2] of array[2] of
int.


Its type is int[2][2]. _You_ can call it:

Chocolate factory of arrays of length two with squidgy marshmallow bits
in between to two ints.

, but I think it's best that _we_ call it by its C name.

It consists of four int objects, yup. They may even be contiguous.


They _are_ contiguous, despite your (misplaced) sarcasm.

That's irrelevant.


If it were irrelevant, I wouldn't be ranting on about this so much.
 

Frederick Gotham

Thanks Andrey, I've finally gotten the response I was looking for.

Andrey Tarasevich:
Also it is worth noting that the big intuitive problem with this kind of
access being illegal is that the "ranged-pointer" feature I described
above is definitely out of place in C.


Yes. I like control. I _love_ control. That's why I opt for _proper_
programming languages like C and C++, and not mickey-mouse languages like
Java.

In stronger words, an artificial restriction like this is completely
unacceptable in C.


So how do we get our hands on a "Range-liberal pointer", a pointer without
armbands? Must we have an intermediate cast to something like a void* or a
char* in order to liberate the pointer from its range restriction?
Something like:

int arr[2][2];

int *const p = (int*)(char unsigned*)&arr;

p[3] = 5;

(I chose "char unsigned*" because the Standard explicitly allows us to
treat any object as though it were simply a sequence of bytes -- which they
are!)

Note that I took the address of the entire array (i.e. &arr). I wonder
whether the code would be any less proper if I took the address of the
first element instead, i.e.:

int *const p = (int*)(char unsigned*)&**arr;
 

Eric Sosman

Frederick said:
What ever happened to the idea of contiguous memory?

Nothing "happened to" it. It's still there. It's still
a useful notion for describing representations. But that's
all it is.
When I define the
following object:

int arr[2][2];

, the type of the object "arr" is: int[2][2]

It consists of four int objects which are lain out contiguously in memory.

"Laid." But so do all of

int brr[1][4];
int crr[4][1];
int drr[4];

If you cannot see that arr, brr, crr, and drr have four different
types despite their single common representation, you have not
grasped the notion of "type." Part of that notion is that different
types behave differently even if their representations are the same:

int i = 0;
unsigned int u = 0;
// Claim: i and u have identical representations
--i;
--u;
// Claim: i and u have behaved differently despite
// their identical representations, because they
// are of different types

Now: arr, brr, crr, drr have the same representation but different
types, therefore they can behave differently. In particular, they
can behave differently w.r.t. the [] operator, and there's an end on't.
Therefore, if we take the address of the first int, why can't we add to that
address to yield the addresses of the int's which are directly after it in
contiguous memory? Isn't that one of the fundamental faculties of pointers?

You are confusing representation with value, and ignoring type.
 

Jordan Abel

2006-11-01 said:
Chris Dollin:
`arr[0]` has type `array[2]int`.

The type in question is written as: int[2]

That's how you write it in C, not how you explain what it is.

Well, if you want to explain what it is, I'd do so in english rather
than some hitherto-unknown moon syntax. However, types have names, so why
not use them?
 

Jordan Abel

2006-11-02 said:
If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

My conclusion, which I've articulated, is that it is. No-one else has
weighed in yet.
 

Jordan Abel

2006-11-01 said:
For example, a pointer value produced in this case

int a[100];
int* p = a;

can be internally represented as a range-and-address combination

<0> <100> <address of a>

nit: 200 or 400 would be more likely, unless we have word-addressed
memory, in which case void*/char* would have a very different
representation.
Also it is worth noting that the big intuitive problem with this kind
of access being illegal is that the "ranged-pointer" feature
I described above is definitely out of place in C. In stronger words,
an artificial restriction like this is completely unacceptable in C.
Moreover, it is completely unacceptable in C++ as well (it is something
a 'std::vector<>' might do, but not a raw pointer). And, as one would
expect in case of such a random restriction, there's no rationale
behind it at all.

What if ranged pointers are provided in hardware, and accessing out of
bounds incurs a trap that must be handled? (incidentally,
base/max/offset would be an equivalent implementation, and may actually
exist in some hardware)

And there's a very good rationale for providing this in debug mode even
if it's not provided in production mode.
 

Old Wolf

Joe said:
If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

No, but:
int *p = (int *)&a;
would be fine.

&a points to an object containing 4 ints.
&a[0] points to an object containing 2 ints.

Writing "a" by itself has the same effect as &a[0],
in your snippet.
 

Jordan Abel

2006-11-02 said:
Joe said:
If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

No, but:
int *p = (int *)&a;
would be fine.

&a points to an object containing 4 ints.
&a[0] points to an object containing 2 ints.

actually... &a[0] points to *two* objects containing 2 ints.

if we had
int a[2];
int *p = a; /* &a[0] */

no-one would be claiming we can't use p[1].
Writing "a" by itself has the same effect as &a[0],
in your snippet.
 

Flash Gordon

Joe Wright wrote:

If I had..
int a[2][2];
.. and wanted to treat it cavalierly, I might use..
int *p = (int*)a;
..and then treat p[0]..p[3]. That's legal isn't it?

I believe that would be perfectly legal because a (as opposed to a[0])
decays to a pointer to the start of the entire array of arrays, and you are
not going outside the region a defines. The previous example using a[0],
on the other hand, does go outside the region a[0] is defined as referring to.
 

Flash Gordon

Frederick said:
Mark McIntyre:
So is
struct
{
int x[2];
int y[2];
}bar;

and I'm sure you'd agree that writing to bar.x[3] would be UB.

No, it wouldn't. It's possible that you'd be writing to padding bytes, but
nonetheless it's perfectly OK:

<snip>

Wrong. As has been pointed out already, look at the discussions on the
struct hack; also look at the defect report about it and the
justification for C99 including an officially sanctioned method for
solving the problem the struct hack is used to deal with.
 
