Dereference an array pointer... UB?

  • Thread starter Tomás Ó hÉilidhe

Tomás Ó hÉilidhe

Do you think we can reach any kind of consensus on whether the
following code's behaviour is undefined by the Standard?

int my_array[5];

int const *const pend = *(&my_array + 1);

Considering the syntax of the language, then we definitely do
dereference an invalid pointer... but if we consider the mechanics of the
language, then we know that nothing "happens" when we dereference a pointer
to an array, because arrays are dealt with in terms of pointers.
 
Malcolm McLean

Tomás Ó hÉilidhe said:
Do you think we can reach any kind of consensus on whether the
following code's behaviour is undefined by the Standard?

int my_array[5];

int const *const pend = *(&my_array + 1);

Considering the syntax of the language, then we definitely do
dereference an invalid pointer... but if we consider the mechanics of the
language, then we know that nothing "happens" when we dereference a
pointer to an array, because arrays are dealt with in terms of pointers.

my_array and &my_array resolve to the same thing. It's a quirk of the
language.
 
Ben Bacarisse

Malcolm McLean said:
Tomás Ó hÉilidhe said:
Do you think we can reach any kind of consensus on whether the
following code's behaviour is undefined by the Standard?

int my_array[5];

int const *const pend = *(&my_array + 1);

Considering the syntax of the language, then we definitely do
dereference an invalid pointer... but if we consider the mechanics of the
language, then we know that nothing "happens" when we dereference a
pointer to an array, because arrays are dealt with in terms of pointers.
my_array and &my_array resolve to the same thing. It's a quirk of the
language.

But my_array + 1 and &my_array + 1 don't. The word "resolve" allows you
to be right (since you can mean what you like by it) but it hides the
important difference between the two expressions -- their type.
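
A minimal sketch of that type difference, assuming the 5-element int array from the original post. The function names are made up for illustration. Both pointers are only computed and converted to char *, never dereferenced, so nothing here depends on the question being argued in this thread.

```c
#include <stddef.h>

/* Bytes skipped by "my_array + 1": my_array decays to int *, so the
   step is one int. */
size_t step_of_decayed_array(void)
{
    int my_array[5];
    return (size_t)((char *)(my_array + 1) - (char *)my_array);
}

/* Bytes skipped by "&my_array + 1": &my_array is int (*)[5], so the
   step is the whole array. The result is a one-past-the-end pointer
   that is computed but not dereferenced. */
size_t step_of_array_address(void)
{
    int my_array[5];
    return (size_t)((char *)(&my_array + 1) - (char *)&my_array);
}
```

The first function should give sizeof(int) and the second 5 * sizeof(int), which is the difference the word "resolve" hides.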
 
vippstar

Tomás Ó hÉilidhe said:
Do you think we can reach any kind of consensus on whether the
following code's behaviour is undefined by the Standard?
int my_array[5];
int const *const pend = *(&my_array + 1);
Considering the syntax of the language, then we definitely do
dereference an invalid pointer... but if we consider the mechanics of the
language, then we know that nothing "happens" when we dereference a
pointer to an array, because arrays are dealt with in terms of pointers.

my_array and &my_array resolve to the same thing. It's a quirk of the
language.

Only in value context.
I believe it's undefined behavior.
You dereference a pointer past the end of an object.
It is essentially the same as
--
int *foo;
int *bar = *(&foo+1);
--
Which is invalid.
foo is an object, which can be treated as an array with 1 element.
Therefore, &foo+1 is a valid pointer, which cannot be dereferenced,
however you do dereference it.

It is invalid.

I am, however, not 100% sure about this, but it appears to be logical
and correct.
 
Tomás Ó hÉilidhe

Malcolm McLean:
my_array and &my_array resolve to the same thing. It's a quirk of the
language.


I'm not sure what you mean by that.

my_array is an int[X] (and it decays to an int*)

&my_array is an int(*)[X] (and it DOESN'T decay to an int*)
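
For anyone who wants the compiler to confirm those two types, here is a rough sketch. It assumes a C11 compiler (for _Generic) and that the controlling expression undergoes array-to-pointer conversion, which is what current GCC and Clang do. KIND_OF and the function names are hypothetical, and X is taken to be 5.

```c
/* Hypothetical helper: 1 for int *, 2 for int (*)[5], 0 for anything else. */
#define KIND_OF(expr) _Generic((expr), int *: 1, int (*)[5]: 2, default: 0)

int kind_of_array_name(void)
{
    int my_array[5];
    return KIND_OF(my_array);     /* decays to int * in this context */
}

int kind_of_array_address(void)
{
    int my_array[5];
    return KIND_OF(&my_array);    /* operand of &: no decay, int (*)[5] */
}

/* The two pointers still compare equal once converted to void *. */
int same_address(void)
{
    int my_array[5];
    return (void *)my_array == (void *)&my_array;
}
```

So the address is the same, but the types (and therefore the pointer arithmetic) are not.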
 
Malcolm McLean

Tomás Ó hÉilidhe said:
Malcolm McLean:
my_array and &my_array resolve to the same thing. It's a quirk of the
language.


I'm not sure what you mean by that.

my_array is an int[X] (and it decays to an int*)

&my_array is an int(*)[X] (and it DOESN'T decay to an int*)

That was an error on my part.
 
Tomás Ó hÉilidhe

vippstar:

It is essentially the same as


No no no, they're not the same. Syntactically, yes, they're the same,
but mechanically they're not. The difference is that *(&foo+1) is an
actual value: evaluating it results in a value being read from memory.

foo is an object, which can be treated as an array with 1 element.
Therefore, &foo+1 is a valid pointer, which cannot be dereferenced,
however you do dereference it.


You're correct.

It is invalid.


I'm not sure I agree, because an array doesn't have a value. Its elements
do, but not the array itself.
 
vippstar

vippstar:


No no no, they're not the same. Syntactically, yes, they're the same,
but mechanically they're not. The difference is that *(&foo+1) is an
actual value: evaluating it results in a value being read from memory.

I am not sure what you are talking about; however, both &foo and
&your_array are pointers:
int ** and int (*)[X] respectively.
Each can be made to point one past the end of what it points to, which is
valid, but the result cannot be dereferenced.
*(&foo+1) is not valid.
You're correct.
And the same applies for &your_array. They are both pointers that
point to 1 valid thing. (foo and your_array respectively)
I'm not sure I agree, because an array doesn't have a value. Its elements
do, but not the array itself.

We are, however, not talking about arrays, but pointers.
I insist that my example is the same as what you are trying to do,
and they are both invalid.
I suggest to think of another solution for your problem, and if that
is not possible, consider if that is the _only_ way.
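
One such alternative, as a sketch only: get the end pointer from the decayed array plus the element count, so no pointer to the whole array is ever dereferenced. ARRAY_LEN and sum_my_array are hypothetical names, and the array contents are made up for the example.

```c
#include <stddef.h>

/* Hypothetical helper: element count of a real array (not a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* Sums my_array by walking from the first element up to, but not
   including, the one-past-the-end pointer. pend is only compared
   against, never dereferenced. */
int sum_my_array(void)
{
    int my_array[5] = { 1, 2, 3, 4, 5 };
    int const *const pend = my_array + ARRAY_LEN(my_array);
    int total = 0;

    for (int const *p = my_array; p != pend; ++p)
        total += *p;
    return total;
}
```

Here pend has the same type as in the original post, and it is obtained by plain pointer arithmetic on the first element, which the Standard explicitly allows up to one past the end.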
 
Tomás Ó hÉilidhe

vippstar:

int ** and int (*)[X] respectively.
Each can be made to point one past the end of what it points to, which is
valid, but the result cannot be dereferenced.


Dereference an int(*)[X] and you get an int[X], which doesn't have a
value, and so it couldn't result in an out-of-bounds memory access because
there shouldn't be any memory access at all if arrays don't have values.
 
Marc Boyer

Tomás Ó hÉilidhe said:
Do you think we can reach any kind of consensus on whether the
following code's behaviour is undefined by the Standard?

int my_array[5];

int const *const pend = *(&my_array + 1);

Considering the syntax of the language, then we definitely do
dereference an invalid pointer... but if we consider the mechanics of the
language, then we know that nothing "happens" when we dereference a
pointer to an array, because arrays are dealt with in terms of pointers.
my_array and &my_array resolve to the same thing. It's a quirk of the
language.

No.
6.3.2.1/3
"Except when it is the operand of the sizeof operator /or the
unary & operator/ [...] an expression that has type "array of type"
is converted to an expression with type "pointer to type" that
points to the initial element of the array object".
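
The two exceptions in that paragraph can be observed with sizeof. This is only a sketch with made-up function names, assuming the 5-element int array from the original post.

```c
#include <stddef.h>

/* Operand of sizeof: no conversion, so this is the size of int[5]. */
size_t size_of_array(void)
{
    int my_array[5];
    return sizeof my_array;
}

/* In "my_array + 0" the array is an operand of +, so it is converted
   to int * first; sizeof then sees a pointer. */
size_t size_after_conversion(void)
{
    int my_array[5];
    return sizeof (my_array + 0);
}

/* Operand of unary &: no conversion, so *&my_array has type int[5]. */
size_t size_through_address(void)
{
    int my_array[5];
    return sizeof *&my_array;
}
```

The first and third functions should give 5 * sizeof(int), the second sizeof(int *).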

Marc Boyer
 
Old Wolf

Do you think we can reach any kind of consensus on whether the
following code's behaviour is undefined by the Standard?

int my_array[5];

int const *const pend = *(&my_array + 1);

&X + 1 is a pointer to one-past-the-end.
Dereferencing such a pointer causes UB.
Doesn't matter what data type the pointer is.
 
Tomás Ó hÉilidhe

Old Wolf:
&X + 1 is a pointer to one-past-the-end.
Dereferencing such a pointer causes UB.
Doesn't matter what data type the pointer is.


That's a very superficial way of looking at it.

The REASON why it's UB to dereference a pointer to one-past-the-last is
because it could result in an out-of-bounds memory access.

With a pointer to an array, nothing happens when you dereference it -- all
that happens is that you've got an expression of type int[X] rather than
int(*)[X].
 
Tomás Ó hÉilidhe

Tomás Ó hÉilidhe:
With a pointer to an array, nothing happens when you dereference it --
all that happens is that you've got an expression of type int[X] rather
than int(*)[X].


In fact, I'd go one step further to say that the following should be legal:


int (*parr)[X] = (int(*)[X])798797; /* Some random address (but which
                                       doesn't cause a trap) */

*parr;
 
Thad Smith

Tomás Ó hÉilidhe said:
Old Wolf:


That's a very superficial way of looking at it.

The REASON why it's UB to dereference a pointer to one-past-the-last is
because it could result in an out-of-bounds memory access.

Perhaps your point is that the Standard /should/ have defined a behavior,
but didn't. I agree with that.

My reading is that a unary * applied to a function pointer is defined. A
unary * applied to a pointer to an object is defined. There are no other
cases defined for the unary * operator. Since &X+1 technically isn't a
pointer to an object, *(&X+1) is undefined by omission.
 
Keith Thompson

Tomás Ó hÉilidhe said:
Old Wolf:

That's a very superficial way of looking at it.

The REASON why it's UB to dereference a pointer to one-past-the-last is
because it could result in an out-of-bounds memory access.

The reason why it's UB is that the standard doesn't define the
behavior. (Though you've correctly described the rationale for what
the standard says.)

With a pointer to an array, nothing happens when you dereference it -- all
that happens is that you've got an expression of type int[X] rather than
int(*)[X].

An expression of array type is converted to a pointer. There has to
be something to convert in the first place.
 
Keith Thompson

Tomás Ó hÉilidhe said:
Tomás Ó hÉilidhe:
With a pointer to an array, nothing happens when you dereference it --
all that happens is that you've got an expression of type int[X] rather
than int(*)[X].

In fact, I'd go one step further to say that the following should be legal:

int (*parr)[X] = (int(*)[X])798797; /* Some random address (but which
                                       doesn't cause a trap) */
*parr;

You're certainly free to argue that it *should* be legal.

Actually, "legal" isn't the right word. It's not a syntax error or a
constraint violation, so it's "legal" in the sense that no diagnostic
is required. The question is whether the standard defines the
behavior.

*parr is an lvalue. If it doesn't designate an object, then the
behavior of evaluating *parr is undefined. As always, the consequence
of undefined behavior can include doing nothing, or doing just what
you wanted it to do.
 
Tomás Ó hÉilidhe

Keith Thompson:
An expression of array type is converted to a pointer. There has to
be something to convert in the first place.


Yes but an array type isn't a value -- which is the very reason why
arrays decay to a pointer to their first element, so that we can actually
get a value out of them.
 
Keith Thompson

Tomás Ó hÉilidhe said:
Keith Thompson:

Yes but an array type isn't a value -- which is the very reason why
arrays decay to a pointer to their first element, so that we can actually
get a value out of them.

To quibble over your choice of words, of course an array type isn't a
value; an array type is a type. (I'm not picking on you, but
precision is important.)

Presumably what you meant is that there's no such thing as an array
value. I think the standard is vague on this point, but I disagree;
there *is* such a thing as an array value. The language just provides
very few contexts in which array values become visible.

C99 3.17 defines a "value" as the "precise meaning of the contents of
an object when interpreted as having a specific type". I don't see
how that excludes arrays. (It does seem to exclude the result of
evaluating a non-lvalue expression, but that's a separate issue.)

There clearly are struct values. Structs can be assigned, passed as
function arguments, and returned as function results, all by copying
the value. A struct value consists of the values of its members;
for example, given:
struct { int x; int y; } obj = { 10, 20 };
the value of obj consists of the int values 10 and 20. A struct with
a member of array type has a value that includes the value of the
array member; that value consists of the values of the array's
elements.
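
A small sketch of that point, using a named version of the struct and made-up function names: assigning one struct to another copies the array member element by element, which is one of the few places where an array's value is visibly handled as a unit.

```c
struct s {
    int x;
    int y[2];
};

/* Copies a struct, changes the copy's array member, and reports what
   the original's array member still holds. */
int original_after_copy_is_modified(void)
{
    struct s a = { 10, { 20, 30 } };
    struct s b = a;      /* copies a.x and both elements of a.y */

    b.y[1] = 99;         /* only the copy changes */
    return a.y[1];
}

/* Reads an array element back out of the copy. */
int copied_element(void)
{
    struct s a = { 10, { 20, 30 } };
    struct s b = a;

    return b.y[0];
}
```

The original keeps 30 in y[1] and the copy holds 20 in y[0], so the assignment really did carry the array's contents across.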

Here's something to chew on. It probably says something about the
original question, but I'm not sure what.

int main(void)
{
    struct s {
        int x;
        int y[2];
    };
    volatile struct s obj = { 10, { 20, 30 } };

    obj;   /* Computes and discards the value of obj.
              Must access obj.x, obj.y[0], and obj.y[1]. */

    obj.x; /* Computes and discards the value of obj.x.
              Must access obj.x. */

    obj.y; /* Computes and discards the address of obj.y[0].
              Must this access obj.y[0] and obj.y[1]?
              *May* it do so?
              C&V? */

    return 0;
}
 
Thad Smith

Tomás Ó hÉilidhe said:
Old Wolf:



That's a very superficial way of looking at it.

The REASON why it's UB to dereference a pointer to one-past-the-last is
because it could result in an out-of-bounds memory access.

I would say that the reason that the behavior is undefined is that the
committee didn't realize (or appreciate) the potential utility of defining
the meaning of the unary * operator on pointer values derived from pointers
to objects, but not themselves a pointer to an object.
 
Kaz Kylheku

    Do you think we can reach any kind of consensus on whether the
following code's behaviour is undefined by the Standard?

    int my_array[5];

    int const *const pend = *(&my_array + 1);

You may have a pointer one element past the last element of an array
object. However, my_array as whole is not an element of an array. So
&my_array + 1 is invalid.

What you are doing is similar to computing p below:

int i, j[1];
int *p = &i + 1; // not right, i is not an array object
int *q = j + 1; // okay, since j is an array object

We can fix this in your example, similarly to the trick with j above:
use a one-element array.

But the dereference conundrum is still there:

int my_array[1][5];
int *p = my_array[1];

The problem is clearer now: you're trying to create pointer-based
access to a nonexistent array. The expression my_array[0] refers to a
valid array element, which is an array of 5 ints. But there is no such
array as my_array[1]. This my_array[1] expression has the /type/
``array of 5 int'', but it's not an object. You're allowed to point to
it as a unit, but that's it.

We can show the problem in these two steps:

int my_array[1][5];
int (*q)[5] = my_array + 1;

Now q is a ``pointer to an array of 5 int'', correctly aimed one
element past the end of an array object. So far so good.

What we're trying to do next is effectively the same as:

int *p = q[0];

We've been given a finger, and want to take the hand. Not happy with
having a pointer one element past the end of an array object, we want
a pointer to the first element of that nonexistent element. :)

In fact the pointer we're trying to compute points to the same
location as &my_array[0][5], which is allowed, and has the same type.
One element past the end of my_array[0] would appear to be the same
nonexistent thing as the first element of my_array[1] (indeed it has
the same type and address) but the semantics is subtly different.

But if q[0] is okay, why not &q[0][0]? If decay cancels out bad
dereferencing, then address-of can also cancel out more bad
dereferencing. And now you open the door to &q[0][1]. If we can point
to the first element of a nonexistent array of 5 int, why not the
second? It's because we know that the justification for the first
element is that it's really one element past the end of something.
However, we didn't arrive at it that way.

/How/ we arrive at a value can determine whether or not it is correct,
not just the final value itself. If I have two int objects i, and j,
and perform arithmetic on &i so that the result points to j, that's
not correct, even though the result is indistinguishable from the
correct value &j.
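
To illustrate arriving at such pointers by routes that are not in dispute, here is a sketch with a hypothetical 3x5 grid. The row sentinel is a one-past-the-end ``pointer to array of 5 int'' that is compared against but never dereferenced, and each row's end pointer is derived from a row that actually exists.

```c
/* Sums a 3x5 grid, using a pointer to "array of 5 int" as the row cursor. */
int sum_grid(void)
{
    int grid[3][5] = {
        { 1, 1, 1, 1, 1 },
        { 2, 2, 2, 2, 2 },
        { 3, 3, 3, 3, 3 }
    };
    int (*const row_end)[5] = grid + 3;   /* one row past the end */
    int total = 0;

    for (int (*row)[5] = grid; row != row_end; ++row) {
        /* *row is a valid array here, so decay and "+ 5" are fine. */
        int const *const col_end = *row + 5;

        for (int const *p = *row; p != col_end; ++p)
            total += *p;
    }
    return total;
}
```

The loop never evaluates *row_end, so it does not depend on whether decay "cancels out" a dereference of a nonexistent row.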

Fact is, a bounds checking compiler could be designed to enforce the
semantic rule that dereferencing an out-of-bounds pointer is not
allowed under any circumstances, and consequently that array-to-pointer
decay can only happen over a valid array object.

    Considering the syntax of the language, then we definitely do
dereference an invalid pointer... but if we consider the mechanics of the
language, then we know that nothing "happens" when we dereference a pointer
to an array, because arrays are dealt with in terms of pointers.

We could also argue that ``nothing'' happens when you merely increment
a pointer out of bounds.
 
