out of range array subscript

Discussion in 'C Programming' started by Richard Delorme, May 3, 2004.

  1. The n869 draft says:

    J.2 Undefined behavior

    [#1] The behavior is undefined in the following
    circumstances:

    -- An array subscript is out of range, even if an object
    is apparently accessible with the given subscript (as
    in the lvalue expression a[1][7] given the declaration
    int a[4][5]) (6.5.6).

    I am wondering if a cast can change this behaviour to something well
    defined, i.e. whether the last of the following lines is OK:

    int a[4][5];

    a[1][7] = 3; /* <- undefined behaviour */

    ((int*)a[1])[7] = 42; /* defined or not? */

    --
    Richard
    Richard Delorme, May 3, 2004
    #1

  2. Richard Delorme wrote:
    > The n869 draft says:
    >
    > J.2 Undefined behavior
    >
    > [#1] The behavior is undefined in the following
    > circumstances:
    >
    > -- An array subscript is out of range, even if an object
    > is apparently accessible with the given subscript (as
    > in the lvalue expression a[1][7] given the declaration
    > int a[4][5]) (6.5.6).
    >
    > I am wondering if a cast can change this behaviour to something well
    > defined, i.e. whether the last of the following lines is OK:
    >
    > int a[4][5];
    >
    > a[1][7] = 3; /* <- undefined behaviour */
    >
    > ((int*)a[1])[7] = 42; /* defined or not? */


    Still undefined. The cast does not change the type of the object
    itself, only the type of the expression you are using to access the
    object.

    For example, imagine an array bounds checking implementation which
    represents a[1] as the triple (pointer to a[1][], offset 0, size 5).
    The cast may preserve this representation, just as it would with a
    pointer to a malloced area. The lvalue (a[1][], offset 0, size 5)[7]
    translates into *(a[1][], offset 7, size 5), and fails on the array
    bounds check that offset < size. (Or the check offset <= size, if you
    had not dereferenced the pointer.)
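
    The bounds-checking scheme described above can be sketched roughly as
    follows. This is only an illustration of the idea, not how any particular
    compiler works; the names checked_ptr, checked_add, checked_store and
    store_at are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a base address plus bounds information. */
struct checked_ptr {
    int   *base;    /* start of the array the pointer was derived from */
    size_t offset;  /* current position, counted in elements */
    size_t size;    /* number of elements in that array */
};

/* Pointer arithmetic: the result must satisfy offset <= size,
 * so pointing one past the end is still allowed. */
bool checked_add(struct checked_ptr *p, size_t n)
{
    if (n > p->size - p->offset)
        return false;
    p->offset += n;
    return true;
}

/* Dereference for writing: the stricter check offset < size applies. */
bool checked_store(struct checked_ptr p, int value)
{
    if (p.offset >= p.size)
        return false;
    p.base[p.offset] = value;
    return true;
}

/* Models row[index] = value, where row is known to have row_len elements.
 * A cast of the pointer would leave base, offset and size unchanged. */
bool store_at(int *row, size_t row_len, size_t index, int value)
{
    struct checked_ptr p = { row, 0, row_len };
    return checked_add(&p, index) && checked_store(p, value);
}
```

    With int a[4][5], store_at(a[1], 5, 7, 42) is rejected even though the
    memory at that position belongs to a, while store_at(a[1], 5, 3, 42)
    succeeds.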

    --
    Hallvard
    Hallvard B Furuseth, May 3, 2004
    #2

  3. On 3 May 2004, Hallvard B Furuseth wrote:

    > Richard Delorme wrote:
    > > The n869 draft says:
    > >
    > > J.2 Undefined behavior
    > >
    > > [#1] The behavior is undefined in the following
    > > circumstances:
    > >
    > > -- An array subscript is out of range, even if an object
    > > is apparently accessible with the given subscript (as
    > > in the lvalue expression a[1][7] given the declaration
    > > int a[4][5]) (6.5.6).
    > >
    > > I am wondering if a cast can change this behaviour to something well
    > > defined, i.e. whether the last of the following lines is OK:
    > >
    > > int a[4][5];
    > >
    > > a[1][7] = 3; /* <- undefined behaviour */
    > >
    > > ((int*)a[1])[7] = 42; /* defined or not? */
    >
    > Still undefined. The cast does not change the type of the object
    > itself, only the type of the expression you are using to access the
    > object.


    So let's alter it slightly just to make things unclear:

    ((int*)(a+1))[7] = 42; /* still UB? */
    Jarno A Wuolijoki, May 3, 2004
    #3
  4. Jarno A Wuolijoki wrote:
    > On 3 May 2004, Hallvard B Furuseth wrote:
    >>> int a[4][5];
    >>> ((int*)a[1])[7] = 42; /* defined or not? */
    >>
    >> Still undefined. The cast does not change the type of the object
    >> itself, only the type of the expression you are using to access the
    >> object.

    >
    > So let's alter it slightly just to make things unclear:
    >
    > ((int*)(a+1))[7] = 42; /* still UB? */


    Sure, still UB. `a+1' degenerates to `&a[0] + 1' = `&a[1]', so it still
    refers to an object where the [7] index goes above the array bound.

    OTOH, this is OK:

    ((int*)&a)[1*5 + 7] = 42;

    because `&a' gives the address of the entire object `a', which consists
    of 20 `int's, and 1*5 + 7 < 20.
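
    The arithmetic here is that flat index 1*5 + 7 = 12 names the same int
    as a[2][2]. A minimal sketch that reaches that element using only
    in-range subscripts, so it does not depend on any cast, might look like
    this (flat_element, ROWS and COLS are hypothetical names):

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Return a pointer to the element with row-major index i,
 * or NULL if i is outside the ROWS * COLS elements of a.
 * Both subscripts used here are always within their bounds. */
int *flat_element(int a[ROWS][COLS], size_t i)
{
    if (i >= (size_t)ROWS * COLS)
        return NULL;
    return &a[i / COLS][i % COLS];
}
```

    For example, *flat_element(a, 1*5 + 7) = 42; stores into a[2][2].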

    --
    Hallvard
    Hallvard B Furuseth, May 3, 2004
    #4
  5. On 3 May 2004, Hallvard B Furuseth wrote:

    > Jarno A Wuolijoki wrote:
    > > On 3 May 2004, Hallvard B Furuseth wrote:
    > >>> int a[4][5];
    > >>> ((int*)a[1])[7] = 42; /* defined or not? */
    > >>
    > >> Still undefined. The cast does not change the type of the object
    > >> itself, only the type of the expression you are using to access the
    > >> object.
    > >
    > > So let's alter it slightly just to make things unclear:
    > > ((int*)(a+1))[7] = 42; /* still UB? */
    >
    > Sure, still UB. `a+1' degenerates to `&a[0] + 1' = `&a[1]', so it still
    > refers to an object where the [7] index goes above the array bound.


    Wouldn't, say, a+1+2 be one as well by that logic?

    a+1 becomes &a[1] and &a[1]+2 would be an out of bounds access of a[1].

    Or does (int*)&a[1] somehow descend to the "subobject" in a way
    mere &a[1] doesn't?
    Jarno A Wuolijoki, May 3, 2004
    #5
  6. Chris Torek wrote:

    [I *think* I have all the attributions correct...]

    Someone wrote:
    >>>>> int a[4][5];
    >>>>> ((int*)a[1])[7] = 42; /* defined or not? */


    >>> On 3 May 2004, Hallvard B Furuseth wrote:
    >>>> Still undefined. The cast does not change the type of the object
    >>>> itself, only the type of the expression you are using to access the
    >>>> object.


    >> Jarno A Wuolijoki wrote:
    >>> So let's alter it slightly just to make things unclear:
    >>> ((int*)(a+1))[7] = 42; /* still UB? */


    >On 3 May 2004, Hallvard B Furuseth wrote:
    >>Sure, still UB. `a+1' degenerates to `&a[0] + 1' = `&a[1]', so it still
    >>refers to an object where the [7] index goes above the array bound.


    Jarno A Wuolijoki writes:
    >Wouldn't, say, a+1+2 be one as well by that logic?


    Assuming "be one" means "be an instance of undefined behavior": no.
    Let me give the intermediate expressions names here:

    >a+1 becomes &a[1] and &a[1]+2 would be an out of bounds access of a[1].


    a+1 and &a[1] denote the same thing, a value of type "pointer to
    array 5 of int" pointing to the row of 5 "int"s in a[1]:

    int (*p)[5] = &a[1];
    /*
    * Now we have, e.g.:
    *
    * a[0]: { 0, 1, 2, 3, 4}
    * a[1]: { 5, 6, 7, 8, 9}
    * a[2]: {10,11,12,13,14}
    * a[3]: {15,16,17,18,19}
    *
    * and p points to all of a[1].
    */

    Adding 2 to this value steps forward by two of the objects to
    which this points:

    int (*q)[5] = p + 2;

    Since "p" points to one complete row of 5 "int"s, q steps forward
    by two complete rows of 5 "int"s, and now points to all of a[3].

    Note that *p is an array -- it names all 5 elements of a[1] --
    and *q is also an array (all 5 elements of a[3]). Because *p
    and *q are arrays, they are subject to The Rule about arrays
    and pointers in C, namely:

    In a value context, an object of type "array N of T"
    becomes a value of type "pointer to T", pointing to the
    first element of that array, i.e., the one with subscript 0.

    So if we write (*p)[2], that puts *p -- an array object --
    into a value context to subscript it with "[2]", and thus converts
    from "the entire row {5,6,7,8,9} as an object" into "a pointer
    to the array's first element, i.e., a pointer to the int 5".
    Subscripting by 2 is really pointer arithmetic, where to add
    2 we move forward by 2 of whatever it is that this new pointer
    points to -- in this case, 2 "int"s. This takes us from pointing
    to the 5 to pointing to the 7, and then the last step is to
    indirect again, giving the "int" 7 as an object.

    Hence:

    (*p)[2] += 70;

    makes a[1][2] change from 7 to 77. We can write (*p) as p[0]
    of course:

    p[0][2] += 70; /* same thing */

    and moreover, we can use pointer arithmetic on p -- as we did
    to get q -- as part of the subscripting operation:

    p[1][2] -= 20;

    Since p[1] "means" *(p + 1), we move forward by one of the things
    that "p" points to, i.e., one "array 5 of int" row in the array
    named "a". Now we point to the entire row {10,11,12,13,14}. The
    indirection gets us the entire array object, which "decays" to a
    pointer per The Rule for the [2] step. Subscripting this pointer
    works just as before, and p[1][2] names the same single "int" as
    a[2][2], which in the example above is "12" initially. Subtracting
    20 from it changes a[2][2] from 12 to -8.
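
    Put together, the steps so far can be checked with a small sketch like
    the one below. The helper names fill_rows and apply_updates and the
    typedef row5 are made up for this illustration; the array contents
    follow the 0..19 layout shown in the comment earlier.

```c
#include <stddef.h>

typedef int row5[5];   /* "array 5 of int", i.e. one row of a */

/* Fill a with 0..19 in row-major order. */
void fill_rows(int a[4][5])
{
    int n = 0;
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 5; j++)
            a[i][j] = n++;
}

/* Apply the two updates from the walk-through and return q = p + 2. */
row5 *apply_updates(int a[4][5])
{
    int (*p)[5] = &a[1];   /* points to the whole row a[1] */
    int (*q)[5] = p + 2;   /* two rows further on: the whole row a[3] */

    (*p)[2] += 70;         /* a[1][2]: 7 becomes 77 */
    p[1][2] -= 20;         /* a[2][2]: 12 becomes -8 */
    return q;
}
```

    Every subscript here stays inside the row it is applied to, and the row
    pointers stay inside a, so nothing in this sketch goes out of range.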

    >Or does (int*)&a[1] somehow descend to the "subobject" in a way
    >mere &a[1] doesn't?


    Here we have &a[1] -- a pointer to the entire row {5,6,7,8,9} in
    the example above -- and convert the pointer through pointer casting
    into some other pointer. What exactly does this do? The rules
    here are at least to some extent up to the implementation. The
    "input" type (before the cast) is "int (*)[5]" or "pointer to array
    5 of int". The "output" type is "int *", or "pointer to int".
    The implementor gets to decide how to achieve the conversion and
    what the result should be.

    Suppose a hypothetical Evil Implementor decides that the rule is
    "conversion of `pointer to array N of T' to `pointer to T' finds
    the N-1th element of the array and gives you that pointer". (This
    would be quite *surprising* but I do not believe the C standards
    forbid it. It might even be the "natural" implementation on certain
    ancient IBM mainframes, the ones where Fortran array subscripting
    worked from the top down, as it were.) In this case, the result
    of (int *)p -- where p points to the entire row {5,6,7,8,9} --
    would be a pointer to the element a[1][4], currently holding 9.

    Of course, the "usual" implementation is to take the *lowest*
    machine byte or word address, so that (int *)p is just a pointer
    to the element a[1][0], currently holding 5.
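
    Whatever a cast does, The Rule already gives a cast-free way to get
    from a row pointer to a particular int in that row. A small sketch,
    with the hypothetical helpers first_of_row and last_of_row:

```c
/* *p is the array object itself; in a value context it becomes
 * a pointer to its element with subscript 0. No cast is involved. */
int *first_of_row(int (*p)[5])
{
    return *p;
}

/* The element with subscript 4, the last one in a row of 5. */
int *last_of_row(int (*p)[5])
{
    return &(*p)[4];
}
```

    With int a[4][5], first_of_row(&a[1]) is &a[1][0] and
    last_of_row(&a[1]) is &a[1][4].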

    This pointer cast, like all pointer casts, should be viewed with
    suspicion. ("Cast a jaundiced eye upon the pointer"? :) See
    <http://www.phrases.org.uk/bulletin_board/19/messages/133.html>)
    C's history, particularly with "const", makes some pointer
    casts inevitable in some code, but the more you "evit" :) them
    the better, in general. (Aside: www.m-w.com claims "evitable"
    *is* a word. I thought it was a "lost positive" myself.)
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, May 15, 2004
    #6