Dereference an array pointer... UB?

Discussion in 'C Programming' started by Tomás Ó hÉilidhe, Feb 11, 2008.

  1. Do you think we can reach any kind of consensus on whether the
    following code's behaviour is undefined by the Standard?

    int my_array[5];

    int const *const pend = *(&my_array + 1);

    Considering the syntax of the language, then we definitely do
    dereference an invalid pointer... but if we consider the mechanics of the
    language, then we know that nothing "happens" when we dereference a pointer
    to an array, because arrays are dealt with in terms of pointers.

    --
    Tomás Ó hÉilidhe
    Tomás Ó hÉilidhe, Feb 11, 2008
    #1

  2. "Tomás Ó hÉilidhe" <> wrote in message
    >
    > Do you think we can reach any kind of consensus on whether the
    > following code's behaviour is undefined by the Standard?
    >
    > int my_array[5];
    >
    > int const *const pend = *(&my_array + 1);
    >
    > Considering the syntax of the language, then we definitely do
    > dereference an invalid pointer... but if we consider the mechanics of the
    > language, then we know that nothing "happens" when we dereference a
    > pointer to an array, because arrays are dealt with in terms of pointers.
    >

    my_array and &my_array resolve to the same thing. It's a quirk of the
    language.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 11, 2008
    #2

  3. "Malcolm McLean" <> writes:

    > "Tomás Ó hÉilidhe" <> wrote in message
    >>
    >> Do you think we can reach any kind of consensus on whether the
    >> following code's behaviour is undefined by the Standard?
    >>
    >> int my_array[5];
    >>
    >> int const *const pend = *(&my_array + 1);
    >>
    >> Considering the syntax of the language, then we definitely do
    >> dereference an invalid pointer... but if we consider the mechanics of the
    >> language, then we know that nothing "happens" when we dereference a
    >> pointer to an array, because arrays are dealt with in terms of pointers.
    >>

    > my_array and &my_array resolve to the same thing. It's a quirk of the
    > language.


    But my_array + 1 and &my_array + 1 don't. The word "resolve" allows you
    to be right (since you can mean what you like by it) but it hides the
    important difference between the two expressions -- their type.

    --
    Ben.
    Ben Bacarisse, Feb 11, 2008
    #3
  4. Guest

    On Feb 11, 8:36 pm, "Malcolm McLean" <> wrote:
    > "Tomás Ó hÉilidhe" <> wrote in message
    >
    > > Do you think we can reach any kind of consensus on whether the
    > > following code's behaviour is undefined by the Standard?

    >
    > > int my_array[5];

    >
    > > int const *const pend = *(&my_array + 1);

    >
    > > Considering the syntax of the language, then we definitely do
    > > dereference an invalid pointer... but if we consider the mechanics of the
    > > language, then we know that nothing "happens" when we dereference a
    > > pointer to an array, because arrays are dealt with in terms of pointers.

    >
    > my_array and &my_array resolve to the same thing. It's a quirk of the
    > language.

    Only in value context.
    I believe it's undefined behavior.
    You dereference a pointer past the end of an object.
    It is essentially the same with
    --
    int *foo;
    int *bar = *(&foo+1);
    --
    Which is invalid.
    &foo is an object, which can be treated as an array with 1 element.
    Therefore, &foo+1 is a valid pointer, which cannot be dereferenced,
    however you do dereference it.

    It is invalid.

    I am, however, not 100% sure about this, but it appears to be logical
    and correct.
    , Feb 11, 2008
    #4
  5. Malcolm McLean:

    > my_array and &my_array resolve to the same thing. It's a quirk of the
    > language.



    I'm not sure what you mean by that.

    my_array is an int[X] (and it decays to an int*)

    &my_array is an int(*)[X] (and it DOESN'T decay to an int*)

    --
    Tomás Ó hÉilidhe
    Tomás Ó hÉilidhe, Feb 11, 2008
    #5
  6. "Tomás Ó hÉilidhe" <> wrote in message
    > Malcolm McLean:
    >
    >> my_array and &my_array resolve to the same thing. It's a quirk of the
    >> language.

    >
    >
    > I'm not sure what you mean by that.
    >
    > my_array is a int[X] (and it decays to an int*)
    >
    > &my_array is a int(*)[X] (and it DOESN'T decay to an int*)
    >

    That was an error on my part.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 11, 2008
    #6
  7. vippstar:


    > It is essentially the same with
    > --
    > int *foo;
    > int *bar = *(&foo+1);
    > --
    > Which is invalid.



    No no no, they're not the same. Syntactically, yes they're the same,
    but mechanically, they're not. The difference is that *(&foo+1) is an
    actual value, it results in a value being read from memory.

    > &foo is an object, which can be treated as an array with 1 element.
    > Therefore, &foo+1 is a valid pointer, which cannot be dereferenced,
    > however you do dereference it.



    You're correct.


    > It is invalid.



    I'm not sure I agree, because an array doesn't have a value. Its elements
    do, but not the array itself.

    --
    Tomás Ó hÉilidhe
    Tomás Ó hÉilidhe, Feb 11, 2008
    #7
  8. Guest

    On Feb 11, 9:52 pm, "Tomás Ó hÉilidhe" <> wrote:
    > vippstar:
    >
    > > It is essentially the same with
    > > --
    > > int *foo;
    > > int *bar = *(&foo+1);
    > > --
    > > Which is invalid.

    >
    > No no no, they're not the same. Syntactically, yes they're the same,
    > but mechanically, they're not. The difference is that *(&foo+1) is an
    > actual value, it results in a value being read from memory.

    I am not sure what you are talking about, however, both &foo and
    &your_array are pointers.
    int * and int (*)[X] respectively.
    You point one past the end of what.. they point to, which is valid but
    cannot be dereferenced.
    *(&foo+1) is not valid.

    > > &foo is an object, which can be treated as an array with 1 element.
    > > Therefore, &foo+1 is a valid pointer, which cannot be dereferenced,
    > > however you do dereference it.

    >
    > You're correct.

    And the same applies for &your_array. They are both pointers that
    point to 1 valid thing. (foo and your_array respectively)

    > > It is invalid.

    >
    > I'm not sure I agree, because an array doesn't have a value. Its elements
    > do, but not the array itself.

    We are, however not talking about arrays, but pointers.
    I insist that my example is the same with what you are trying to do,
    and they are both invalid.
    I suggest to think of another solution for your problem, and if that
    is not possible, consider if that is the _only_ way.
    , Feb 11, 2008
    #8
  9. vippstar:


    > int * and int (*)[X] respectively.
    > You point one past the end of what.. they point to, which is valid but
    > cannot be dereferenced.



    Dereference an int(*)[X] and you get an int[X], which doesn't have a
    value, and so it couldn't result in an out-of-bounds memory access because
    there shouldn't be any memory access at all if arrays don't have values.

    --
    Tomás Ó hÉilidhe
    Tomás Ó hÉilidhe, Feb 11, 2008
    #9
  10. Marc Boyer

    On 2008-02-11, Malcolm McLean <> wrote:
    >
    > "Tomás Ó hÉilidhe" <> wrote in message
    >>
    >> Do you think we can reach any kind of consensus on whether the
    >> following code's behaviour is undefined by the Standard?
    >>
    >> int my_array[5];
    >>
    >> int const *const pend = *(&my_array + 1);
    >>
    >> Considering the syntax of the language, then we definitely do
    >> dereference an invalid pointer... but if we consider the mechanics of the
    >> language, then we know that nothing "happens" when we dereference a
    >> pointer to an array, because arrays are dealt with in terms of pointers.
    >>

    > my_array and &my_array resolve to the same thing. It's a quirk of the
    > language.


    No.
    6.3.2.1/3
    "Except when it is the operand of the sizeof operator /or the
    unary & operator/ [...] an expression that has type "array of type"
    is converted to an expression with type "pointer to type" that
    points to the initial element of the array object".

    Marc Boyer
    Marc Boyer, Feb 12, 2008
    #10
  11. Old Wolf

    On Feb 12, 7:21 am, "Tomás Ó hÉilidhe" <> wrote:
    > Do you think we can reach any kind of consensus on whether the
    > following code's behaviour is undefined by the Standard?
    >
    > int my_array[5];
    >
    > int const *const pend = *(&my_array + 1);


    &X + 1 is a pointer to one-past-the-end.
    Dereferencing such a pointer causes UB.
    Doesn't matter what data type the pointer is.
    Old Wolf, Feb 12, 2008
    #11
  12. Old Wolf:

    > &X + 1 is a pointer to one-past-the-end.
    > Dereferencing such a pointer causes UB.
    > Doesn't matter what data type the pointer is.



    That's a very superficial way of looking at it.

    The REASON why it's UB to dereference a pointer to one-past-the-last is
    because it could result in an out-of-bounds memory access.

    With a pointer to an array, nothing happens when you dereference it -- all
    that happens is that you've got an expression of int[X] rather than int(*)
    [X].

    --
    Tomás Ó hÉilidhe
    Tomás Ó hÉilidhe, Feb 12, 2008
    #12
  13. Tomás Ó hÉilidhe:

    > With a pointer to an array, nothing happens when you dereference it --
    > all that happens is that you've got an expression of int[X] rather
    > than int(*) [X].



    In fact, I'd go one step further to say that the following should be legal:


    int (*parr)[X] = (int(*)[X])798797; /* Some random address (but which
                                           doesn't cause a trap) */

    *parr;


    --
    Tomás Ó hÉilidhe
    Tomás Ó hÉilidhe, Feb 12, 2008
    #13
  14. Thad Smith

    Tomás Ó hÉilidhe wrote:
    > Old Wolf:
    >
    >> &X + 1 is a pointer to one-past-the-end.
    >> Dereferencing such a pointer causes UB.
    >> Doesn't matter what data type the pointer is.

    >
    > That's a very superficial way of looking at it.
    >
    > The REASON why it's UB to dereference a pointer to one-past-the-last is
    > because it could result in an out-of-bounds memory access.


    Perhaps your point is that the Standard /should/ have defined a behavior,
    but didn't. I agree with that.

    My reading is that a unary * applied to a function pointer is defined. A
    unary * applied to a pointer to an object is defined. There are no other
    cases defined for the unary * operator. Since &X+1 technically isn't a
    pointer to an object, *(&X+1) is undefined by omission.

    --
    Thad
    Thad Smith, Feb 12, 2008
    #14
  15. "Tomás Ó hÉilidhe" <> writes:
    > Old Wolf:
    >> &X + 1 is a pointer to one-past-the-end.
    >> Dereferencing such a pointer causes UB.
    >> Doesn't matter what data type the pointer is.

    >
    > That's a very superficial way of looking at it.
    >
    > The REASON why it's UB to dereference a pointer to one-past-the-last is
    > because it could result in an out-of-bounds memory access.


    The reason why it's UB is that the standard doesn't define the
    behavior. (Though you've correctly described the rationale for what
    the standard says.)

    > With a pointer to an array, nothing happens when you dereference it -- all
    > that happens is that you've got an expression of int[X] rather than int(*)
    > [X].


    An expression of array type is converted to a pointer. There has to
    be something to convert in the first place.

    --
    Keith Thompson (The_Other_Keith) <>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 12, 2008
    #15
  16. "Tomás Ó hÉilidhe" <> writes:
    > Tomás Ó hÉilidhe:
    >> With a pointer to an array, nothing happens when you dereference it --
    >> all that happens is that you've got an expression of int[X] rather
    >> than int(*) [X].

    >
    > In fact, I'd go one step further to say that the following should be legal:
    >
    > int (*parr)[X] = (int(*)[X])798797; /* Some random address (but which
    > doesn't cause a trap) */
    > *parr;


    You're certainly free to argue that it *should* be legal.

    Actually, "legal" isn't the right word. It's not a syntax error or a
    constraint violation, so it's "legal" in the sense that no diagnostic
    is required. The question is whether the standard defines the
    behavior.

    *parr is an lvalue. If it doesn't designate an object, then the
    behavior of evaluating *parr is undefined. As always, the consequence
    of undefined behavior can include doing nothing, or doing just what
    you wanted it to do.

    --
    Keith Thompson (The_Other_Keith) <>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 12, 2008
    #16
  17. Keith Thompson:

    > An expression of array type is converted to a pointer. There has to
    > be something to convert in the first place.



    Yes but an array type isn't a value -- which is the very reason why
    arrays decay to a pointer to their first element, so that we can actually
    get a value out of them.

    --
    Tomás Ó hÉilidhe
    Tomás Ó hÉilidhe, Feb 12, 2008
    #17
  18. "Tomás Ó hÉilidhe" <> writes:
    > Keith Thompson:
    >> An expression of array type is converted to a pointer. There has to
    >> be something to convert in the first place.

    >
    > Yes but an array type isn't a value -- which is the very reason why
    > arrays decay to a pointer to their first element, so that we can actually
    > get a value out of them.


    To quibble over your choice of words, of course an array type isn't a
    value; an array type is a type. (I'm not picking on you, but
    precision is important.)

    Presumably what you meant is that there's no such thing as an array
    value. I think the standard is vague on this point, but I disagree;
    there *is* such a thing as an array value. The language just provides
    very few contexts in which array values become visible.

    C99 3.17 defines a "value" as the "precise meaning of the contents of
    an object when interpreted as having a specific type". I don't see
    how that excludes arrays. (It does seem to exclude the result of
    evaluating a non-lvalue expression, but that's a separate issue.)

    There clearly are struct values. Structs can be assigned, passed as
    function arguments, and returned as function results, all by copying
    the value. A struct value consists of the values of its members;
    for example, given:
    struct { int x; int y; } obj = { 10, 20 };
    the value of obj consists of the int values 10 and 20. A struct with
    a member of array type has a value that includes the value of the
    array member; that value consists of the values of the array's
    elements.

    Here's something to chew on. It probably says something about the
    original question, but I'm not sure what.

    int main(void)
    {
        struct s {
            int x;
            int y[2];
        };
        volatile struct s obj = { 10, { 20, 30 } };

        obj;   /* Computes and discards the value of obj.
                  Must access obj.x, obj.y[0], and obj.y[1]. */

        obj.x; /* Computes and discards the value of obj.x.
                  Must access obj.x. */

        obj.y; /* Computes and discards the address of obj.y[0].
                  Must this access obj.y[0] and obj.y[1]?
                  *May* it do so?
                  C&V? */

        return 0;
    }

    --
    Keith Thompson (The_Other_Keith) <>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 12, 2008
    #18
  19. Thad Smith

    Tomás Ó hÉilidhe wrote:
    > Old Wolf:
    >
    >> &X + 1 is a pointer to one-past-the-end.
    >> Dereferencing such a pointer causes UB.
    >> Doesn't matter what data type the pointer is.

    >
    >
    > That's a very superficial way of looking at it.
    >
    > The REASON why it's UB to dereference a pointer to one-past-the-last is
    > because it could result in an out-of-bounds memory access.


    I would say that the reason that the behavior is undefined is that the
    committee didn't realize (or appreciate) the potential utility of defining
    the meaning of the unary * operator on pointer values derived from pointers
    to objects, but not themselves a pointer to an object.

    --
    Thad
    Thad Smith, Feb 13, 2008
    #19
  20. Kaz Kylheku

    On Feb 11, 10:21 am, "Tomás Ó hÉilidhe" <> wrote:
    >     Do you think we can reach any kind of consensus on whether the
    > following code's behaviour is undefined by the Standard?
    >
    >     int my_array[5];
    >
    >     int const *const pend = *(&my_array + 1);


    You may have a pointer one element past the last element of an array
    object. However, my_array as a whole is not an element of an array. So
    &my_array + 1 is invalid.

    What you are doing is similar to computing p below:

    int i, j[1];
    int *p = &i + 1; // not right, i is not an array object
    int *q = j + 1; // okay, since j is an array object

    We can fix this in your example, similarly to the trick with j above:
    use a one-element array.

    But the dereference conundrum is still there:

    int my_array[1][5];
    int *p = my_array[1];

    The problem is clearer now: you're trying to create pointer-based
    access to a nonexistent array. The expression my_array[0] refers to a
    valid array element, which is an array of 5 ints. But there is no such
    array as my_array[1]. This my_array[1] expression has the /type/
    ``array of 5 int'', but it's not an object. You're allowed to point to
    it as a unit, but that's it.

    We can show the problem in these two steps:

    int my_array[1][5];
    int (*q)[5] = my_array + 1;

    Now q is a ``pointer to an array of 5 int'', correctly aimed one
    element past the end of an array object. So far so good.

    What we're trying to do next is effectively the same as:

    int *p = q[0];

    We've been given a finger, and want to take the hand. Not happy with
    having a pointer one element past the end of an array object, we want
    a pointer to the first element of that nonexistent element. :)

    In fact the pointer we're trying to compute points to the same
    location as &my_array[0][5], which is allowed, and has the same type.
    One element past the end of my_array[0] would appear to be the same
    nonexistent thing as the first element of my_array[1] (indeed it has
    the same type and address) but the semantics is subtly different.

    But if q[0] is okay, why not &q[0][0]? If decay cancels out bad
    dereferencing, then address-of can also cancel out more bad
    dereferencing. And now you open the door to &q[0][1]. If we can point
    to the first element of a nonexistent array of 5 int, why not the
    second? It's because we know that the justification for the first
    element is that it's really one element past the end of something.
    However, we didn't arrive at it that way.

    /How/ we arrive at a value can determine whether or not it is correct,
    not just the final value itself. If I have two int objects i, and j,
    and perform arithmetic on &i so that the result points to j, that's
    not correct, even though the result is indistinguishable from the
    correct value &j.

    Fact is, a bounds checking compiler could be designed to enforce the
    semantic rule that dereferencing an out-of-bounds pointer is not
    allowed under any circumstances, and consequently that array-to-
    pointer decay can only happen over a valid array object.

    >     Considering the syntax of the language, then we definitely do
    > dereference an invalid pointer... but if we consider the mechanics of the
    > language, then we know that nothing "happens" when we dereference a pointer
    > to an array, because arrays are dealt with in terms of pointers.


    We could also argue that ``nothing'' happens when you merely increment
    a pointer out of bounds.
    Kaz Kylheku, Feb 13, 2008
    #20
