Pointer Equality for Different Array Objects

Shao Miller

(More bounds-checking.)

Wait a minute... N1256 6.5.9p6:

"Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow the
first array object in the address space.94)"

So if we have:

union u_test {
int a[2][2];
int b[4];
};
union u_test test = { { { 0 } } };
int x = &test.a[1][1] == &test.b[3];

Is 'x' zero or one?

If we can claim, for the purposes of supporting bounds-checking, that
'&test.a[1][1]' points into an 'int[2]' and nothing more, and that
'&test.b[3]' points into an 'int[4]' and nothing more, then aren't these
pointers pointing to distinct objects?

If they're pointing to the same object, why might they not be for the
purposes of 6.5.6p8?

Just to illustrate an identical expression statement:

int x = (*(test.a + 1) + 1) == (test.b + 3);
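
For anyone who wants to try this, here is a minimal sketch that wraps the comparison in functions. The function names are made up for illustration. Mainstream implementations yield 1 from both in practice; whether 6.5.9p6 actually guarantees that is exactly the question being asked.

```c
#include <stddef.h>

union u_test {
    int a[2][2];
    int b[4];
};

/* Byte offset of a pointer from the start of the union object.
   'unsigned char' pointers may walk the whole object representation. */
static size_t byte_offset(const union u_test *base, const int *p)
{
    return (size_t)((const unsigned char *)p - (const unsigned char *)base);
}

/* Returns the result of the comparison under discussion. */
int last_elements_compare_equal(void)
{
    union u_test test = { { { 0 } } };
    return &test.a[1][1] == &test.b[3];
}

/* Returns 1 if both elements sit at the same byte offset in the union. */
int last_elements_share_offset(void)
{
    union u_test test = { { { 0 } } };
    return byte_offset(&test, &test.a[1][1]) == byte_offset(&test, &test.b[3]);
}
```

The second function only shows that the two elements occupy the same bytes; it says nothing about which array object each pointer is considered to point into.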
 
Kaz Kylheku

(More bounds-checking.)

Wait a minute... N1256 6.5.9p6:

"Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow the
first array object in the address space.94)"

This can only happen in a correct program if the two array-like objects
are themselves members of a larger array.
So if we have:

union u_test {
int a[2][2];
int b[4];
};
union u_test test = { { { 0 } } };
int x = &test.a[1][1] == &test.b[3];

Is 'x' zero or one?

One. These are the same address because the elements of an array are
contiguous. A simulated two dimensional array has the storage layout
of a one-dimensional array, and so a[1][1] corresponds to b[3].
If we can claim, for the purposes of supporting bounds-checking, that

For the purposes of bounds checking, you have two pointers here,
neither of which is out of bounds of the object from which they are derived.

A pointer comparison (for exact equality, at that) does not create
a bounds-checking issue.
'&test.a[1][1]' points into an 'int[2]' and nothing more, and that
'&test.b[3]' points into an 'int[4]' and nothing more, then aren't these
pointers pointing to distinct objects?

They are not distinct objects because they are overlapped in a union.

Even these have to compare equal:

union { int x; double y; } u;

&u.y == (double *) &u.x;

The members of a union all start at the same address. They are not
distinct objects.
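
A minimal sketch of that claim, with a hypothetical union tag and function name. It relies on N1570 6.7.2.1p16, which says a pointer to a union object, suitably converted, points to each of its members and vice versa.

```c
union int_or_double {
    int x;
    double y;
};

/* Returns 1 if both members and the union itself share one address. */
int members_share_address(void)
{
    union int_or_double u = { 0 };
    void *whole = &u;
    void *px = &u.x;
    void *py = &u.y;

    return px == whole && py == whole && &u.y == (double *)&u.x;
}
```

Note that this covers only the start of each member, which is the "subobject at its beginning" case of 6.5.9p6.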
 
Shao Miller

(More bounds-checking.)

Wait a minute... N1256 6.5.9p6:

"Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow the
first array object in the address space.94)"

This can only happen in a correct program if the two array-like objects
are themselves members of a larger array.
So if we have:

union u_test {
int a[2][2];
int b[4];
};
union u_test test = { { { 0 } } };
int x = &test.a[1][1] == &test.b[3];

Is 'x' zero or one?

One. These are the same address because the elements of an array are
contiguous. A simulated two dimensional array has the storage layout
of a one-dimensional array, and so a[1][1] corresponds to b[3].

Thank you for your response. I agree that they point to the same
location. But 6.5.9p6 doesn't state that pointing to the same location
is sufficient to yield an equality.
For the purposes of bounds checking, you have two pointers here,
neither of which is out of bounds of the object from which they are derived.

A pointer comparison (for exact equality, at that) does not create
a bounds-checking issue.

Agreed. But does the result create an issue for the subject of
"bounds-checking?"
'&test.a[1][1]' points into an 'int[2]' and nothing more, and that
'&test.b[3]' points into an 'int[4]' and nothing more, then aren't these
pointers pointing to distinct objects?

They are not distinct objects because they are overlapped in a union.

If they are not distinct objects, then they are the same object, right?
Well what type is the object? Is it an 'int'? If so, is it treated
as an 'int[1]' for the purposes of pointer arithmetic? Or is it an
array object, instead? If so, how many elements in the array? Could
you do:

&test.a[1][1] - 2;

?
Even these have to compare equal:

union { int x; double y; } u;

&u.y == (double *)&u.x;

The members of a union all start at the same address. They are not
distinct objects.

That example is different because "both are pointers to the same object
(including a pointer to an object and a subobject at its beginning)".
In the original example, whatever each pointer pointed to was not at the
beginning of any containing object... Was it? (Or were they?)
 
Shao Miller

N1570
6.3.2.1 Lvalues, arrays, and function designators
1
When an object is said to have a particular type,
the type is specified by the lvalue used to designate the object.

Thanks! But does that help to determine the count of elements in some
containing array object?
 
Kaz Kylheku

Thank you for your response. I agree that they point to the same
location. But 6.5.9p6 doesn't state that pointing to the same location
is sufficient to yield an equality.

It does if you interpret "same object" as being "same location".

What is an object? "A region of data storage in the execution environment,
the contents of which can represent values".

Pointers to the same object are pointers to the same region of data storage.
Agreed. But does the result create an issue for the subject of
"bounds-checking?"

I don't think there can be any expectation of bounds checking between two views
of the same storage through a union. There is just no sense there of a pointer
from one straying out of bounds and into the other.
If they are not distinct objects, then they are the same object, right?
Well what type is the object? Is it an 'int'? If so, is it treated
as an 'int[1]' for the purposes of pointer arithmetic? Or is it an
array object, instead? If so, how many elements in the array? Could
you do:

It is all these things simultaneously. You could ask these same questions
of a non-union object. Given int x[5], what is x[1]? Is it just an int?
Or is it a portion of the whole array? So what type is it?

ISO C uses the term "subobject" for embedded objects. x[1] is an int object
which is a subobject of the int[5] array.

Both arrays involved in the union contain subobjects of type int which
correspond to each other.

If two views of a different type are aliased through a union, then
the type is whichever one is the last through which a value is stored.
This is from the special rules about unions.

I think it should hold for this kind of array aliasing. (But you're not even
asking about if it's okay to store to one array and read the other; just
whether certain pointers are reliably equal.)
&test.a[1][1] - 2;

Also consider a[0][3]. This kind of "wrong geometry" access is in a kind of
gray area, where the standard isn't of a lot of help.

Right or wrong, some C programs are going to do this and find it to be quite
portable. So if you're doing bounds-checking, it's not entirely realistic to
insist on diagnosing such things. A programmer who doesn't consider that to be a
bounds error will be irked by the diagnostics, perceived to be a nuisance.

On the other hand, the "wrong geometry" access to a[0][3] could be unintended,
and indicate a bug. Another programmer might be thankful to have that diagnosed
(maybe even the same programmer, in a different programming situation).

There have been c.l.c discussions about this once in a while; I remember some
from many years ago. Some of the arguments hinged on whether such an access is
done blatantly with array indexing, or displacement of a pointer directly obtained
from "array" decay. Under that kind of hair-splitting &a[0][0] + 3 seems less
wrong than a[0][3] because the former has a pointer to just an int, which
is then being displaced, in a way that is disconnected from the geometry of the
array.
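
Code that wants to stay out of the gray area altogether can map a flat index back onto the declared geometry. A minimal sketch follows; `ROWS`, `COLS` and `flat_get` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 2 };

/* Read element k (0 .. ROWS*COLS-1) in row-major order, always
   indexing each dimension within its own declared bound. */
int flat_get(int (*a)[COLS], size_t k)
{
    assert(k < (size_t)ROWS * COLS);
    return a[k / COLS][k % COLS];
}
```

With `int m[2][2]`, the call `flat_get(m, 3)` reads `m[1][1]` rather than `m[0][3]`, so no subscript ever exceeds its own dimension.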
 
Kaz Kylheku

N1570
6.3.2.1 Lvalues, arrays, and function designators
1
When an object is said to have a particular type,
the type is specified by the lvalue used to designate the object.

This is a simplification if taken by itself. Some objects have a declared
type, and if you form some other lvalue to access them, you're in
undefined behavior land.
 
Shao Miller

It does if you interpret "same object" as being "same location".

What is an object? "A region of data storage in the execution environment,
the contents of which can represent values".

Pointers to the same object are pointers to the same region of data storage.

A nice conclusion. There is a fellow in another C-devoted forum who
insists on using the word "into" when discussing pointers. Pointers
always point "into" something, and that something would make sense as
"region of data storage."

Surely in:

int i = 42;
char * p = (char *) &i;

the pointer 'p' doesn't point "to" the 'int'-typed object designated by
'i'. After all, a 'char *' points "to" a 'char'. But it certainly
points "into" the object designated by 'i'.
I don't think there can be any expectation of bounds checking between two views
of the same storage through a union. There is just no sense there of a pointer
from one straying out of bounds and into the other.

I'm sorry that the demonstration failed. I intended to highlight that
if we are talking about what a pointer points to (or into), the notion
ought to be consistent throughout an interpretation of the definitions
of Standard C.

If we are going to say that the pointers compare as equal, it's because
they point to the same object (or subobject). But then with pointer
arithmetic, we have the vague ("if the array is large enough"). Well
what array? All [non-bit-field] objects are an array of bytes. That
array? For single-dimensional arrays, that array? For
multi-dimensional arrays, which dimension do we pick for our notion of
"the array" in order to determine if it's large enough for the pointer
arithmetic to be defined?

Essentially, given:

int a[2][2] = { { 0 } };

if we say that 'a[1] + 1' yields a pointer value X and that pointer
arithmetic is only defined for { X - 1, X, X + 1 }, we can potentially
justify that conclusion by saying "the array object" is not the larger
containing array, but the second 'int[2]' of the larger containing
array, only.

But if _that's_ the object being pointed into, I think we ought to stick
to that for pointer equality. So then the two pointers in the original
code do _not_ point to the same object or a sub-object at its beginning.

On the other hand, if we justify the pointer equality by saying that the
objects occupy the same location or that the pointers point into the
same region of data storage, well then pointer arithmetic should be
defined across that region of data storage, not just a particular
partition of it.

That is, it seems odd if two pointers with the same type, and pointing
into the same region of data storage, and pointing at the same byte in
that region of data storage, and comparing as equal, have different
defined boundaries for "the array" when under consideration for pointer
arithmetic.

And the pointer equality definition is so specific with its "if and only
if."

Add to that the definition of all object representations being
accessible via 'unsigned char' type and it seems that any bounds are at
the beginning of the contiguous region of memory and at one byte past
the end.
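
To illustrate the byte-level view, here is a minimal sketch (the function name is made up) that walks the whole 'int[2][2]' through 'unsigned char' pointers, taking pointers to the complete array objects so that the byte range covers all of them:

```c
#include <stddef.h>

/* Copy the object representation of a 2x2 array into a flat
   array, one byte at a time, through 'unsigned char' pointers. */
void copy_bytes_to_flat(int (*src)[2][2], int (*dst)[4])
{
    const unsigned char *from = (const unsigned char *)src;
    unsigned char *to = (unsigned char *)dst;
    size_t i;

    /* Arrays have no padding, so both objects have the same size. */
    for (i = 0; i < sizeof *src; ++i)
        to[i] = from[i];
}
```

After `copy_bytes_to_flat(&a, &flat)`, `flat[3]` holds the value of `a[1][1]`, and no 'int *' was ever displaced across a row boundary to get there.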
If they are not distinct objects, then they are the same object, right?
Well what type is the object? Is it an 'int'? If so, is it treated
as an 'int[1]' for the purposes of pointer arithmetic? Or is it an
array object, instead? If so, how many elements in the array? Could
you do:

It is all these things simultaneously. You could ask these same questions
of a non-union object. Given int x[5], what is x[1]? Is it just an int?
Or is it a portion of the whole array? So what type is it?

Agreed. I'm glad you find them analogous.
ISO C uses the term "subobject" for embedded objects. x[1] is an int object
which is a subobject of the int[5] array.

Both arrays involved in the union contain subobjects of type int which
correspond to each other.

Agreed. And it seems that an 'int[4][5]' array has 20 sub-objects of
type 'int' that correspond to each of the combinations of index that can
be used with the 'int[4][5]' and that point to 'int' objects (not one past).
If two views of a different type are aliased through a union, then
the type is whichever one is the last through which a value is stored.
This is from the special rules about unions.

And of course there's "type punning." In your example below, is
'a[0][3]' not similarly type punning the 'int[2][2]' as an 'int[4]' and
designating/accessing the fourth element?
I think it should hold for this kind of array aliasing. (But you're not even
asking about if it's okay to store to one array and read the other; just
whether certain pointers are reliably equal.)
&test.a[1][1] - 2;

Also consider a[0][3]. This kind of "wrong geometry" access is in a kind of
gray area, where the standard isn't of a lot of help.

Well actually, your example there was just given in another thread,
which is why this thread was "more bounds-checking." The gray area is
what I'm trying to explore... Definitions, consequences, consistency, etc.
Right or wrong, some C programs are going to do this and find it to be quite
portable. So if you're doing bounds-checking, it's not entirely realistic to
insist on diagnosing such things. A programmer who doesn't consider that to be a
bounds error will be irked by the diagnostics, perceived to be a nuisance.

On the other hand, the "wrong geometry" access to a[0][3] could be unintended,
and indicate a bug. Another programmer might be thankful to have that diagnosed
(maybe even the same programmer, in a different programming situation).

The clearest case I can think of is where a 'for' loop can be known at
translation-time to allow for an array index to go out-of-bounds without
any fancy business happening to the index. This seems worth warning about!

A run-time check might set up traps at one byte before and one byte
after a range of data storage. That seems sensible, too!

But for any stricter run-time bounds-checking, such as catching
'a[i][j]' where the ranges for 'i' and 'j' aren't known at
translation-time and go out-of-bounds, there could be checks for each
dimension of a multi-dimensional array, but is that consistent with C?
There have been c.l.c discussions about this once in a while; I remember some
from many years ago. Some of the arguments hinged on whether such an access is
done blatantly with array indexing, or displacement of a pointer directly obtained
from "array" decay. Under that kind of hair-splitting &a[0][0] + 3 seems less
wrong than a[0][3] because the former has a pointer to just an int, which
is then being displaced, in a way that is disconnected from the geometry of the
array.

Yeah, but I wouldn't call it hair-splitting... The array subscripting
operator is defined with an identity given in terms of the binary
addition operator and the unary indirection operator (and parentheses).
I think that's rather important and worth discussion if it's at all a
"gray area."

It would seem unfair to give the array subscripting notation 'a[0][3]'
some kind of bounds-preferential treatment versus the identical '*(*(a +
0) + 3)' notation.

But we do see references to "provenance" in at least one defect report,
and this seems related to the "provenance" of a pointer. If it "came
from" an array with certain boundaries, then pointer arithmetic is only
defined for its use with those boundaries, despite the fact that another
pointer with different boundaries is identical in every other way.

Of course "provenance" seems like a "gray area," since you can combine
things (such as via bit-wise operators). Then whence did they come? An
example would be combining two objects into a destination such that the
effective type of the destination cannot be determined.

a[0][3]
*( *( a + 0 ) + 3 )
*( *( 'int[2][2]' + 0 ) + 3 )
*( *( 'int (*)[2]' + 0 ) + 3 )
*( *( 'int (*)[2]' ) + 3)
*( 'int[2]' + 3 )
*( 'int *' + 3 )
*( 'int *' )
'int'
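
The types in that derivation can be checked mechanically with C11 '_Generic'. This is only a sketch; the macro and function names are made up, and it confirms the types, not whether the arithmetic is defined.

```c
/* Each macro use yields 1 when the expression has the named type.
   The controlling expression of _Generic is not evaluated, so
   nothing here actually performs the out-of-row arithmetic. */
#define IS_PTR_TO_ROW(e) _Generic((e), int (*)[2]: 1, default: 0)
#define IS_PTR_TO_INT(e) _Generic((e), int *: 1, default: 0)
#define IS_INT(e)        _Generic((e), int: 1, default: 0)

int derivation_types_match(void)
{
    int a[2][2] = { { 0 } };

    return IS_PTR_TO_ROW(a + 0)        /* 'int (*)[2]' */
        && IS_PTR_TO_INT(*(a + 0) + 3) /* 'int *'      */
        && IS_INT(*(*(a + 0) + 3));    /* 'int'        */
}
```

By the time the '+ 3' is applied, the static type is plain 'int *', so the bound of 2 is no longer visible in the type.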
 
Kaz Kylheku

But for any stricter run-time bounds-checking, such as catching
'a[i][j]' where the ranges for 'i' and 'j' aren't known at
translation-time and go out-of-bounds, there could be checks for each
dimension of a multi-dimensional array, but is that consistent with C?


You know, who cares? All that matters is: is this check valuable to the user?
Checks can be stronger or weaker than the standard language. Suppose that the
C standard explicitly said that an array is just flat memory that can be
aliased with any multi-dimensional array geometry. Well, someone might still
want some of their code to pass array dimension bounds checks.
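
As a rough sketch of what such a dimension check could look like when written by hand (the names `N_ROWS`, `N_COLS` and `checked_get` are invented here, and this is not how any particular checker works):

```c
#include <stddef.h>

enum { N_ROWS = 2, N_COLS = 2 };

/* Stores a[i][j] in *out and returns 1 when both indices are
   within their own dimension; returns 0 otherwise. This is
   stricter than a whole-object check: (0, 3) is rejected even
   though it would land inside the 2x2 array's storage. */
int checked_get(int (*a)[N_COLS], size_t i, size_t j, int *out)
{
    if (i >= N_ROWS || j >= N_COLS)
        return 0;
    *out = a[i][j];
    return 1;
}
```

A whole-object check would instead test `i * N_COLS + j < N_ROWS * N_COLS`, which accepts (0, 3); the choice between the two is the policy question, independent of what the standard says.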
It would seem unfair to give the array subscripting notation 'a[0][3]'
some kind of bounds-preferential treatment versus the identical '*(*(a +
0) + 3)' notation.

No, but the preferential treatment could actually stem from the
"a + displacement" where a is not a pointer, but an array (that converts to a
pointer on evaluation) and not from the choice of notation.

In such an expression, there is enough info to know that the displacement is in
bounds with respect to the static type of a, regardless of the larger
container in which that array finds itself.
But we do see references to "provenance" in at least one defect report,
and this seems related to the "provenance" of a pointer. If it "came
from" an array with certain boundaries, then pointer arithmetic is only
defined for its use with those boundaries, despite the fact that another
pointer with different boundaries is identical in every other way.

Of course "provenance" seems like a "gray area," since you can combine
things (such as via bit-wise operators). Then whence did they come?

Clearly, at some point provenance has to be severed. A good rule of thumb (if
provenance were to stop being a gray area) might be that a pointer value
that is derived from a conversion from array, plus any combination of
displacements, has provenance from that array. As soon as &..*.. is involved,
it should be lost: e.g. &a[0] or &*(a + 0) ought to drop provenance.
 
Shao Miller

But for any stricter run-time bounds-checking, such as catching
'a[i][j]' where the ranges for 'i' and 'j' aren't known at
translation-time and go out-of-bounds, there could be checks for each
dimension of a multi-dimensional array, but is that consistent with C?


You know, who cares? All that matters is: is this check valuable to the user?
Checks can be stronger or weaker than the standard language. Suppose that the
C standard explicitly said that an array is just flat memory that can be
aliased with any multi-dimensional array geometry. Well, someone might still
want some of their code to pass array dimension bounds checks.


Absolutely it can be valuable. But if I write a program which I expect
to be strictly conforming and either translating it or running it
behaves differently for different implementations, that's upsetting.
It would seem unfair to give the array subscripting notation 'a[0][3]'
some kind of bounds-preferential treatment versus the identical '*(*(a +
0) + 3)' notation.

No, but the preferential treatment could actually stem from the
"a + displacement" where a is not a pointer, but an array (that converts to a
pointer on evaluation) and not from the choice of notation.

If that works, would you encourage and support a proposal to incorporate
that into Standard C? Maybe it's pretty reasonable to add something to
pointer arithmetic along the lines of "if the pointer was the immediate
result of the evaluation of an lvalue having an array type, then the
number of elements of the array object is the number of elements in that
array type."
In such an expression, there is enough info to know that the displacement is in
bounds with respect to the static type of a, regardless of the larger
container in which that array finds itself.

And if we wish to "forget" about the bounds of array, would it be
reasonable for us to simply do:

int a[2][2] = { { 0 } };
int x;
x = ((int *) a[0])[3];

? Or maybe 'x = (1 ? a[0] : 0)[3];'? That seems easy.
But we do see references to "provenance" in at least one defect report,
and this seems related to the "provenance" of a pointer. If it "came
from" an array with certain boundaries, then pointer arithmetic is only
defined for its use with those boundaries, despite the fact that another
pointer with different boundaries is identical in every other way.

Of course "provenance" seems like a "gray area," since you can combine
things (such as via bit-wise operators). Then whence did they come?

Clearly, at some point provenance has to be severed. A good rule of thumb (if
provenance were to stop being a gray area) might be that a pointer value
that is derived from a conversion from array, plus any combination of
displacements, has provenance from that array. As soon as &..*.. is involved,
it should be lost: e.g. &a[0] or &*(a + 0) ought to drop provenance.

Oh, ok. So when the evaluation of 'a' would yield a pointer value, the
bounds of the array object would be defined by the type of 'a', then you
could apply '*' and designate the object, then apply '&' and point to
that object without the previous bounds. Interesting! I guess that'd
require some changes to the text of '&' and '*'. Would you encourage
and support a proposal to incorporate that into Standard C?

I appreciate your discussion. :) While you're here, could you clarify
if the following suggest undefined behaviour?

#1:

int a[2][2] = { { 0 } };
int * p = a[0] + 0;
int * q = a[1] + 1;
ptrdiff_t diff = q - p;

#2:

union u_test {
int a[2][2];
int b[4];
};
union u_test test = { { { 0 } } };

int * p = test.a[1] + 1;
int * q = test.b + 3;
ptrdiff_t diff = q - p;

q = test.b + 0;
diff = q - p;
 
Shao Miller

(More bounds-checking.)

Wait a minute... N1256 6.5.9p6:

"Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and a
subobject at its beginning) or function, both are pointers to one past
the last element of the same array object, or one is a pointer to one
past the end of one array object and the other is a pointer to the start
of a different array object that happens to immediately follow the first
array object in the address space.94)"

So if we have:

union u_test {
int a[2][2];
int b[4];
};
union u_test test = { { { 0 } } };
int x = &test.a[1][1] == &test.b[3];

Is 'x' zero or one?

If we can claim, for the purposes of supporting bounds-checking, that
'&test.a[1][1]' points into an 'int[2]' and nothing more, and that
'&test.b[3]' points into an 'int[4]' and nothing more, then aren't these
pointers pointing to distinct objects?

If they're pointing to the same object, why might they not be for the
purposes of 6.5.6p8?

Just to illustrate an identical [declaration]:

int x = (*(test.a + 1) + 1) == (test.b + 3);

Maybe using a 'typedef' makes it look different?

typedef int array_of_two_ints[2];
union u_test {
array_of_two_ints a[2];
int b[4];
};
union u_test test = { { { 0 } } };
int x;
array_of_two_ints * ptr_into_array_of_two_ints;
int * ptr_into_int;

/*
* Point into the second element of 'a', so the
* element type is 'array_of_two_ints'
*/
ptr_into_array_of_two_ints = &test.a[1];


/*
* Point into the second element of the array object
* with type 'array_of_two_ints', so the element
* type is 'int'
*/
ptr_into_int = &(*ptr_into_array_of_two_ints)[1];

/*
* 'ptr_into_int' does _not_ point to:
* - A subobject at the beginning of the 'test.b'
* array object
* - The array object 'test.b'
* - A function
* - No object (it's non-null)
* - One past the array object 'test.b'
* - An array object that happens to immediately
* follow 'test.b' in the address space
* - An object that is not an element of an array
* object
* '&test.b[3]' points to:
* - The fourth element of the 'test.b' array
* object
*/
x = ptr_into_int == &test.b[3];
x = ptr_into_int - &test.b[3];

Perhaps similarly:

struct s_common {
double d;
int i[2];
};
struct s_foo {
struct s_common common;
short s;
};
struct s_bar {
struct s_common common;
long l;
};
union u_foobar {
struct s_foo foo;
struct s_bar bar;
};
union u_foobar test;
int * p;
int * q;
int x;

p = &test.foo.common.i[1];
q = &test.bar.common.i[1];

/*
* 'p' does _not_ point to:
* - A subobject at the beginning of the
* 'test.bar.common.i' array object
* - The array object 'test.bar.common.i'
* - A function
* - No object (it's non-null)
* - One past the array object 'test.bar.common.i'
* - An array object that happens to immediately
* follow 'test.bar.common.i' in the
* address space
* - An object that is not an element of an array
* object
* 'q' points to:
* - The second element of the 'test.bar.common.i'
* array object
*/
x = p == q;
x = p - q;

Fortunately, of course, they point to the same byte in the same
contiguous region of data storage.
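
A minimal sketch of that last observation, reusing the declarations above; the two function names are made up. The result of the second function is what mainstream implementations give in practice, which is not the same as saying 6.5.9p6 requires it.

```c
#include <stddef.h>

struct s_common { double d; int i[2]; };
struct s_foo { struct s_common common; short s; };
struct s_bar { struct s_common common; long l; };
union u_foobar { struct s_foo foo; struct s_bar bar; };

/* Returns 1 if 'common' is the first member of both structs,
   so 'i[1]' has the same byte offset along either path. */
int common_offsets_match(void)
{
    return offsetof(struct s_foo, common) == 0
        && offsetof(struct s_bar, common) == 0;
}

/* Returns the result of the 'p == q' comparison from the post. */
int common_pointers_compare_equal(void)
{
    union u_foobar test = { { { 0.0, { 0, 0 } }, 0 } };
    int *p = &test.foo.common.i[1];
    int *q = &test.bar.common.i[1];

    return p == q;
}
```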
 
