Out-of-bounds nonsense

Discussion in 'C Programming' started by Frederick Gotham, Nov 1, 2006.

  1. [ This post deals with both C and C++, but does not alienate either
    language because the language feature being discussed is common to both
    languages. ]

    Over on comp.lang.c, we've been discussing the accessing of array elements
    via subscript indices which may appear to be out of range. In particular,
    accesses similar to the following:

    int arr[2][2];

    arr[0][3] = 7;

    Both the C Standard and the C++ Standard require that the four ints be
    laid out in memory in ascending order with no padding in between, i.e.:

    (best viewed with a monospaced font)

    ------------------------------
    | Memory Address | Object    |
    ------------------------------
    |       0        | arr[0][0] |
    |       1        | arr[0][1] |
    |       2        | arr[1][0] |
    |       3        | arr[1][1] |
    ------------------------------
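
    The layout claim itself can be checked without any out-of-range
    subscript. The following is only a sketch: elem_offset and
    layout_is_contiguous are hypothetical names, and the offsets are measured
    through unsigned char pointers into the one object arr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of (*whole)[i][j] from the start of the
   whole 2x2 object, measured through its unsigned char representation. */
static size_t elem_offset(int (*whole)[2][2], size_t i, size_t j)
{
    const unsigned char *base = (const unsigned char *)whole;
    const unsigned char *elem = (const unsigned char *)&(*whole)[i][j];
    return (size_t)(elem - base);
}

/* Returns 1 if the four ints sit back to back in row-major order. */
static int layout_is_contiguous(void)
{
    int arr[2][2];                          /* only addresses are used */
    assert(sizeof arr == 4 * sizeof(int));  /* no padding anywhere     */
    return elem_offset(&arr, 0, 0) == 0 * sizeof(int)
        && elem_offset(&arr, 0, 1) == 1 * sizeof(int)
        && elem_offset(&arr, 1, 0) == 2 * sizeof(int)
        && elem_offset(&arr, 1, 1) == 3 * sizeof(int);
}
```

    This only confirms where the objects live. It says nothing about whether
    arr[0][3] is a permitted way to reach the last one.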

    One can see plainly that there should be no problem with the little snippet
    above because arr[0][3] should be the same as arr[1][1], but I've had
    people over on comp.lang.c telling me that the behaviour of the snippet is
    undefined because of an "out of bounds" array access. They've even backed
    this up with a quote from the C Standard:

    J.2 Undefined behavior:
    The behavior is undefined in the following circumstances:
    [...]
    - An array subscript is out of range, even if an object is apparently
    accessible with the given subscript (as in the lvalue expression
    a[1][7] given the declaration int a[4][5]) (6.5.6).

    Does anyone make the same claim of undefined behaviour for C++?

    If it is claimed that the snippet's behaviour is undefined because the
    second subscript index is out of range of the dimension, then this
    rationale can be brought into doubt by the following breakdown. First let's
    look at the expression statement:

    arr[0][3] = 9;

    The compiler, both in C and in C++, must interpret this as:

    *( *(arr+0) + 3 ) = 9;

    In the inner-most set of parentheses, "arr" decays to a pointer to its
    first element, i.e. an R-value of the type int(*)[2]. The value 0 is then
    added to this address, which has no effect. The address is then
    dereferenced, yielding an L-value of the type int[2]. This expression then
    decays to a pointer to its first element, yielding an R-value of the type
    int*. The value 3 is then added to this address. (In terms of bytes, it's p
    += 3 * sizeof(int)). This address is then dereferenced, yielding an L-value
    of the type int. The L-value int is then assigned to.

    The only thing that sounds a little dodgy in the above paragraph is that an
    L-value of the type int[2] is used as a stepping stone to access an element
    whose index is greater than 1 -- but this shouldn't be a problem, because
    the L-value decays to a simple R-value int pointer prior to the accessing
    of the int object, so any dimension info should be lost by then.
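
    The decay steps described above can be spelled out one at a time. This is
    a sketch with in-range indices only (subscript_steps_agree is a
    hypothetical name), so it shows the mechanics without touching the
    disputed out-of-range case.

```c
#include <assert.h>

/* Hypothetical demo: perform arr[1][1] step by step, as *(*(arr + 1) + 1). */
static int subscript_steps_agree(void)
{
    int arr[2][2] = { {10, 11}, {20, 21} };

    int (*row_ptr)[2] = arr;        /* arr decays to a pointer to its first row */
    int (*row1)[2] = row_ptr + 1;   /* steps by whole rows of two ints          */
    int *elem_ptr = *row1;          /* the int[2] lvalue decays to int *        */
    int *target = elem_ptr + 1;     /* steps by ints, still inside row 1        */

    assert(sizeof *row_ptr == 2 * sizeof(int));
    return target == &arr[1][1]
        && *target == 21
        && *(*(arr + 1) + 1) == arr[1][1];
}
```

    Note that elem_ptr is derived from a row of exactly two ints, which is
    the detail the rest of this thread argues about.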

    To the C++ programmers: Is the snippet viewed as invoking undefined
    behaviour? If so, why?

    To the C programmers: How can you rationalise the assertion that it
    actually does invoke undefined behaviour?

    I'd like to remind both camps that, in other places, we're free to use our
    memory however we please (given that it's suitably aligned, of course). For
    instance, look at the following. The code is an absolute dog's dinner, but
    it should work perfectly on all implementations:

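    #include <assert.h>   /* assert */
    #include <stdlib.h>   /* malloc, EXIT_FAILURE */

    void Output(int); /* Defined elsewhere */

    int main(void)
    {
        assert( sizeof(double) > sizeof(int) );

        { /* Start */

            double *p;
            int *q;
            char unsigned const *pover;
            char unsigned const *ptr;

            p = malloc(5 * sizeof*p);
            if (!p) return EXIT_FAILURE;   /* allocation failed */
            q = (int*)p++;                 /* q -> 1st double's storage; p -> 2nd double */
            pover = (char unsigned*)(p+4); /* one past the end of the allocation */
            ptr = (char unsigned*)p;       /* first byte of the 2nd double */
            p[3] = 2423.234;               /* the 5th (last) double */
            *q++ = -9;                     /* an int stored in the 1st double's storage */

            do Output(*ptr++);             /* dump the bytes of doubles 2 to 5 */
            while (pover != ptr);

            return 0;

        } /* End */
    }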

    Another thing I would remind both camps of, is that we can access any
    memory as if it were simply an array of unsigned char's. That means we can
    access an "int[2][2]" as if it were simply an object of the type "char
    unsigned[sizeof(int[2][2])]".
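
    As a sketch of that byte-wise view (store_nth_int is a hypothetical
    helper, not something from the Standard), the fourth int can be written
    through the object representation with memcpy:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: store value into the n-th int of a 2x2 array by
   addressing the array's bytes and copying the int's representation in. */
static void store_nth_int(int (*whole)[2][2], size_t n, int value)
{
    unsigned char *bytes = (unsigned char *)whole;   /* byte view of the object */
    assert(n < sizeof *whole / sizeof(int));         /* n must be 0..3          */
    memcpy(bytes + n * sizeof(int), &value, sizeof value);
}
```

    Called as store_nth_int(&arr, 3, 7), this sets arr[1][1] to 7. The
    pointer arithmetic is done on unsigned char and is bounded by the size of
    the whole int[2][2], not by the size of one row.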

    The reason I'm writing this is that, at the moment, it sounds like absolute
    nonsense to me that the original snippet's behaviour is undefined, and so I
    challenge those who support its alleged undefinedness.

    I leave you with this:

    int arr[2][2];

    void *const pv = &arr;

    int *const pi = (int*)pv; /* Cast used for C++ programmers! */

    pi[3] = 8;

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #1

  2. Frederick Gotham writes:
    > [ This post deals with both C and C++, but does not alienate either
    > language because the language feature being discussed is common to both
    > languages. ]
    >
    > Over on comp.lang.c, we've been discussing the accessing of array elements
    > via subscript indices which may appear to be out of range.

    [snip]

    This was multi-posted to at least two newsgroups, comp.std.c and
    comp.lang.c. (Given the content, it may have been posted to one or
    more C++ newsgroups as well, but I haven't checked.)

    I mention this so that readers will be aware of it when deciding
    whether and where to post a followup.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Nov 1, 2006
    #2

  3. Keith Thompson:

    > This was multi-posted to at least two newsgroups, comp.std.c and
    > comp.lang.c. (Given the content, it may have been posted to one or
    > more C++ newsgroups as well, but I haven't checked.)
    >
    > I mention this so that readers will be aware of it when deciding
    > whether and where to post a followup.



    I wasn't sure how preferable it was over cross-posting, although I know that
    my own newsreader makes a mess of cross-posts (...not to mention I don't
    quite understand how they're supposed to work).

    I have indeed posted to both C newsgroups and C++ newsgroups.

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #3
  4. Frederick Gotham wrote:

    > [...] but I've had
    > people over on comp.lang.c telling me that the behaviour of the snippet is
    > undefined because of an "out of bounds" array access. They've even backed
    > this up with a quote from the C Standard:
    > [...]


    Frederick, you are under no obligation to believe. But
    if you choose to disbelieve, do the believers the courtesy of
    leaving the temple quietly. Any door you like, just stop
    the mewling. Please?

    "The man convinced against his will
    Is of the same opinion still."

    --
    Eric Sosman
    Eric Sosman, Nov 1, 2006
    #4
  5. Frederick Gotham said:

    >
    > [ This post deals with both C and C++, but does not alienate either
    > language because the language feature being discussed is common to both
    > languages. ]


    The C++ parts are irrelevant here.

    > Over on comp.lang.c,


    Huh? This *is* comp.lang.c.

    > we've been discussing the accessing of array elements
    > via subscript indices which may appear to be out of range.


    Yes, and it's undefined behaviour, as has been explained more than ad
    nauseam.

    <snip>

    > One can see plainly that there should be no problem with the little
    > snippet above


    No, one can plainly see that the behaviour is undefined, and that's a
    problem.

    > Are the same claims of undefined behaviour existing in C++ made by anyone?


    Questions about C++ are off-topic here.

    <snip>

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
    Richard Heathfield, Nov 1, 2006
    #5
  6. Richard Heathfield:

    > Yes, and it's undefined behaviour, as has been explained more than ad
    > nauseam.



    Not ad nauseam enough.

    If the following is well-defined:

    int *const p = malloc(5 * sizeof *p);

    p[2] = 6;

    , then I don't see how my original snippet cannot be.

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #6
  7. Frederick Gotham wrote:
    > Richard Heathfield:
    >
    >> Yes, and it's undefined behaviour, as has been explained more than ad
    >> nauseam.

    >
    > Not ad nauseam enough.
    >
    > If the following is well-defined:
    >
    > int *const p = malloc(5 * sizeof *p);
    >
    > p[2] = 6;


    It's allowed because the standard says it is allowed.

    > , then I don't see how my original snippet cannot be.


    It is undefined behaviour because that is what the standards committee
    decided. They even made it clear in one of the annexes (as someone else
    pointed out to you), so even if you can't follow the reasoning from the
    normative text you can see that it is what the committee intended. If
    you cannot accept what the committee clearly state then perhaps you
    should write your own language which is defined as you think it should
    be and use that instead of C.
    --
    Flash Gordon
    Flash Gordon, Nov 1, 2006
    #7
  8. Frederick Gotham said:

    > Richard Heathfield:
    >
    >> Yes, and it's undefined behaviour, as has been explained more than ad
    >> nauseam.

    >
    >
    > Not ad nauseam enough.


    Maybe you have a higher nausea threshold than many of us.

    > If the following is well-defined:
    >
    > int *const p = malloc(5 * sizeof *p);
    >
    > p[2] = 6;
    >
    > , then I don't see how my original snippet cannot be.


    That's your problem, not ours. The Standard forbids access outside the
    bounds of an array. If you wish to violate that prohibition, that's your
    choice but, if you do so, the behaviour of the program is undefined. You
    may not like the fact, but the ISO C Standard is not concerned with your
    (or my) likes or dislikes.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
    Richard Heathfield, Nov 1, 2006
    #8
  9. Frederick Gotham wrote:

    > int arr[2][2];
    > arr[0][3] = 7;


    Yep, undefined behavior indeed. Surprising, but that's what the
    standard says. Your code may break at the next compiler upgrade.


    > Both the C Standard and the C++ Standard require that the four ints be
    > laid out in memory in ascending order with no padding in between, i.e.:


    That sounds right, but to write portable code you will need to
    express your intent with an explicit cast.

    ((int * const) arr[0])[3]= 7; /* ugly */
    or
    {
    int * const tmp= arr[0]; /* wordy */
    tmp[3]= 7;
    }


    --
    pa at panix dot com
    Pierre Asselin, Nov 1, 2006
    #9
  10. Pierre Asselin:

    > That sounds right, but to write portable code you will need to
    > express your intent with an explicit cast.
    >
    > ((int * const) arr[0])[3]= 7;


    Firstly, all casts yield an R-value, so the const is redudant. That would
    leave us with:

    ((int*)arr[0])[3] = 7;

    Secondly, the cast is redundant, because "arr[0]" decays to a pointer to its
    first element, and no cast is required.

    Still though, people seem to think it invokes undefined behaviour.

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #10
  11. Frederick Gotham wrote:

    > Pierre Asselin:
    >
    >> That sounds right, but to write portable code you will need to
    >> express your intent with an explicit cast.
    >>
    >> ((int * const) arr[0])[3]= 7;

    >
    > Firstly, all casts yield an R-value, so the const is redudant. That would
    > leave us with:
    >
    > ((int*)arr[0])[3] = 7;
    >
    > Secondly, the cast is redundant, because "arr[0]" decays to a pointer to its
    > first element, and no cast is required.
    >
    > Still though, people seem to think it invokes undefined behaviour.
    >


    int arr[2][2];
    arr[0][3] = 7;

    `arr[0]` has type `array[2]int`. Clearly such an object has no
    element at index 3. BOOM.

    (It doesn't matter that `arr[0]` then decays into a pointer-to-int.
    That pointer only points to /2/ ints. That there are more ints
    afterward, even that there are /surely/ more ints afterward, doesn't
    stop it being undefined. Think of it as the Standard permitting an
    implementation to do bounds-checking.)
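
    To make the bounds-checking remark concrete, here is a sketch of what
    such an implementation could model internally. Everything in it
    (struct checked_ptr, checked_at, demo_checked_row) is hypothetical and
    only illustrates the idea of a pointer that remembers the array it came
    from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": an int * that remembers how many elements
   the array it was derived from has. */
struct checked_ptr {
    int   *base;
    size_t len;
};

/* Returns a pointer to element i, or NULL if i is out of range for the
   array the pointer was derived from. */
static int *checked_at(struct checked_ptr p, size_t i)
{
    assert(p.base != NULL);
    return i < p.len ? p.base + i : NULL;
}

/* Returns 1 if index 1 is accepted and index 3 is rejected for row 0. */
static int demo_checked_row(void)
{
    int arr[2][2] = { {0, 0}, {0, 0} };
    struct checked_ptr row0 = { arr[0], 2 };    /* arr[0] has exactly 2 ints */
    return checked_at(row0, 1) == &arr[0][1]    /* in range                  */
        && checked_at(row0, 3) == NULL;         /* the arr[0][3] case        */
}
```

    A pointer obtained from arr[0] carries a length of 2 in this model, so
    index 3 is rejected even though more ints follow in memory.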

    (Similarly, if the Standard were to say that use of any identifier
    ending in `kers` yielded undefined behaviour, then using
    `bonkers` or `blinkers` in your code would yield undefined
    behaviour, even if the implementation were unchanged from whatever
    it now is. Implementations don't have to go out of their way to
    make undefined constructs have bizarre behaviour. Of course the
    Standard would never make such a generic constraint on names,
    so you don't have to avoid `inkers` or `thankers` or `streakers`
    as names in your code ...)

    (fx:BOOM)

    --
    Chris "everyone knows it's flat" Dollin
    "We did not have time to find out everything we wanted to know."
    - James Blish, /A Clash of Cymbals/
    Chris Dollin, Nov 1, 2006
    #11
  12. "Frederick Gotham" wrote in message
    news:2qT1h.15440$...
    >
    > [ This post deals with both C and C++, but does not alienate either
    > language because the language feature being discussed is common to both
    > languages. ]
    >
    > Over on comp.lang.c, we've been discussing the accessing of array elements
    > via subscript indices which may appear to be out of range. In particular,
    > accesses similar to the following:
    >
    > int arr[2][2];
    >
    > arr[0][3] = 7;
    >

    <snip>
    > Frederick Gotham


    Consider what happens when you pass this to a function

    foo( arr, 2, 2 );

    and foo is defined as:

    void foo( int **arr, int dim1, int dim2 ) {
    /*
    * you think this is OK as long as [0][3]
    * is inside the bounds of [2][2] ?
    */
    arr[0][3] = 7;
    }

    Now foo can't determine whether you passed a 2D array to it,
    or a pointer to a pointer to int.

    Now suppose somewhere else I write this code:
    int **arr2;
    arr2 = malloc( 2 * sizeof(*arr2) );
    for ( i=0; i < 2; i++ ) {
        arr2[i] = malloc( 2 * sizeof(*arr2[i]) );
    }
    foo( arr2, 2, 2 );

    What will happen in foo() ?
    --
    Fred L. Kleinschmidt
    Boeing Associate Technical Fellow
    Technical Architect, Software Reuse Project
    Fred Kleinschmidt, Nov 1, 2006
    #12
  13. Chris Dollin:

    > int arr[2][2];
    > arr[0][3] = 7;
    >
    > `arr[0]` has type `array[2]int`.



    The type in question is written as: int[2]


    > Clearly such an object has no
    > element at index 3. BOOM.



    No, but it's part of a contiguous sequence of memory.

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #13
  14. What ever happened to the idea of contiguous memory? When I define the
    following object:

    int arr[2][2];

    , the type of the object "arr" is: int[2][2]

    It consists of four int objects which are laid out contiguously in memory.

    Therefore, if we take the address of the first int, why can't we add to that
    address to yield the addresses of the ints which are directly after it in
    contiguous memory? Isn't that one of the fundamental faculties of pointers?

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #14
  15. Do you think there's anything wrong with the following?

    int arr[2][2];

    int *p = *arr;

    *p++ = 1;
    *p++ = 2;
    *p++ = 3;
    *p++ = 4;

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #15
  16. Fred Kleinschmidt:

    > Consider what happens when you pass this to a function
    >
    > foo( arr, 2, 2 );
    >
    > and foo is defined as:
    >
    > void foo( int **arr, int dim1, int dim2 ) {
    > /*
    > * you think this is OK as long as [0][3]
    > * is inside the bounds of [2][2] ?
    > */
    > arr[0][3] = 7;
    > }



    Thankfully, there's no implicit conversion from int[2][2] to int**.

    It would appear you have confused a multi-dimensional array with an array
    of pointers to arrays... ?


    > Now suppose somewhere else I write this code:
    > int **arr2;



    Here you define a pointer to a pointer to an int.


    > arr2 = malloc( 2 * sizeof(*arr2) );



    Here you allocate enough memory for two int pointers.


    > for ( i=0; i < 2; i++ ) {
    >     arr2[i] = malloc( 2 * sizeof(*arr2[i]) );
    > }
    > foo( arr2, 2, 2 );



    I think this confirms my suspicion that you're thinking of arrays of
    pointers to arrays, rather than multi-dimensional arrays.

    Oh, by the way, a multi-dimensional array is merely an array of arrays.
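
    The distinction can be sketched as follows. The function names fill_2d,
    fill_jagged and demo_both_shapes are hypothetical; the point is only the
    two different parameter types.

```c
#include <assert.h>
#include <stdlib.h>

/* Takes a true 2D array: a pointer to rows of two ints each. */
static void fill_2d(int (*rows)[2], size_t nrows, int value)
{
    assert(rows != NULL);
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < 2; j++)
            rows[i][j] = value;
}

/* Takes an array of pointers, each pointing to its own block of ints. */
static void fill_jagged(int **rows, size_t nrows, size_t ncols, int value)
{
    assert(rows != NULL);
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            rows[i][j] = value;
}

/* Builds one object of each shape and fills it; returns 1 on success. */
static int demo_both_shapes(void)
{
    int arr[2][2];
    int *jagged[2];
    int ok;

    jagged[0] = malloc(2 * sizeof *jagged[0]);
    jagged[1] = malloc(2 * sizeof *jagged[1]);
    if (jagged[0] == NULL || jagged[1] == NULL) {
        free(jagged[0]);
        free(jagged[1]);
        return 0;
    }

    fill_2d(arr, 2, 7);             /* arr decays to int (*)[2]      */
    fill_jagged(jagged, 2, 2, 9);   /* jagged decays to int **       */
    /* Passing arr to fill_jagged is a type mismatch:
       int (*)[2] is not compatible with int **. */

    ok = arr[1][1] == 7 && jagged[1][1] == 9;
    free(jagged[0]);
    free(jagged[1]);
    return ok;
}
```

    In the jagged case the two rows are separate allocations, so nothing
    guarantees that one follows the other in memory.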

    --

    Frederick Gotham
    Frederick Gotham, Nov 1, 2006
    #16
  17. [OT] Re: Out-of-bounds nonsense

    Frederick Gotham wrote:
    > Pierre Asselin:
    >
    >> That sounds right, but to write portable code you will need to
    >> express your intent with an explicit cast.
    >>
    >> ((int * const) arr[0])[3]= 7;

    >
    > Firstly, all casts yield an R-value, so the const is redudant. That would
    > leave us with:
    >
    > ((int*)arr[0])[3] = 7;
    >

    I totally love this word you just created:

    redudant
    adj 1. More dude than is needed or required; "being that cool is
    just redudant, dude"
    Clever Monkey, Nov 1, 2006
    #17
  18. Frederick Gotham said:

    >
    > Do you think there's anything wrong with the following?
    >
    > int arr[2][2];
    >
    > int *p = *arr;


    *arr is equivalent to arr[0], which is an array of two int. It is acceptable
    for p to point to the first element in this array, so the assignment is
    fine.

    > *p++ = 1;


    No problem. Now arr[0][0] has the value 1, and p points to arr[0][1].

    > *p++ = 2;


    No problem. Now arr[0][1] has the value 2, and p points one past the end of
    the arr[0] array.

    > *p


    Illegal dereference of p. The behaviour is undefined.

    And it will remain undefined, no matter which way you cut it.
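
    For comparison, a sketch of a version that stores the same four values
    without ever dereferencing a pointer past the row it came from.
    fill_rowwise is a hypothetical name; the pointer is re-derived from each
    row in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rewrite of the four-assignment snippet: stores first,
   first+1, first+2, first+3 into a 2x2 array, one row at a time. */
static void fill_rowwise(int (*rows)[2], int first)
{
    int next = first;
    assert(rows != NULL);
    for (int i = 0; i < 2; i++) {
        int *p = rows[i];   /* points at rows[i][0]                       */
        *p++ = next++;      /* writes rows[i][0]                          */
        *p++ = next++;      /* writes rows[i][1]; p is now one past row i */
        /* p is not dereferenced again; the next pass re-derives it */
    }
}
```

    Called as fill_rowwise(arr, 1), this leaves 1, 2, 3, 4 in arr[0][0],
    arr[0][1], arr[1][0] and arr[1][1].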

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
    Richard Heathfield, Nov 1, 2006
    #18
  19. Frederick Gotham said:

    >
    > What ever happened to the idea of contiguous memory? When I define the
    > following object:
    >
    > int arr[2][2];
    >
    > , the type of the object "arr" is: int[2][2]
    >
    > It consists of four int objects which are laid out contiguously in memory.
    >
    > Therefore, if we take the address of the first int, why can't we add to
    > that address to yield the addresses of the ints which are directly after
    > it in contiguous memory?


    You can, as long as you don't exceed the bounds of any array.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
    Richard Heathfield, Nov 1, 2006
    #19
  20. On 2006-11-01, Richard Heathfield wrote:
    > Frederick Gotham said:
    >
    >>
    >> Do you think there's anything wrong with the following?
    >>
    >> int arr[2][2];
    >>
    >> int *p = *arr;

    >
    > *arr is equivalent to arr[0], which is an array of two int. It is acceptable
    > for p to point to the first element in this array, so the assignment is
    > fine.
    >
    >> *p++ = 1;

    >
    > No problem. Now arr[0][0] has the value 1, and p points to arr[0][1].
    >
    >> *p++ = 2;

    >
    > No problem. Now arr[0][1] has the value 2, and p points one past the end of
    > the arr[0] array.
    >
    >> *p

    >
    > Illegal dereference of p. The behaviour is undefined.
    >
    > And it will remain undefined, no matter which way you cut it.


    OK. So what if, instead of int *p = *arr; you use this:
    int *p;

    p = (int *)(unsigned char *)arr;
    p[0]=0; p[1]=1; /* no problems */
    p[2]=2; p[3]=3; /* is this legal? */

    /* assuming the above wasn't wrong, or if it was wrong, wasn't executed */
    p = (int *)((unsigned char *)arr+2*sizeof(int));
    p[0]=2; p[1]=3; /* is this legal? */
    Jordan Abel, Nov 1, 2006
    #20