What does the standard say about array access wraparound?

Discussion in 'C Programming' started by David Mathog, May 27, 2004.

  1. David Mathog

    David Mathog Guest

    If this:

    int i,sum;
    int *array;
    for(sum=0, i=0; i<len; i++){
    sum += array;
    }

    is converted to this (never mind why for the moment):

    int i,sum;
    int *array;
    int *arrl;
    arl=&array[-len];
    for(sum=0,i=len; i<2*len; i++){
    sum += arrl;
    }

    it should give the same result. But there are some funny
    things that can happen. For instance, if &array is 1000 and
    len is 100000. In that case arrl will hold an address
    (1000-100000) which presumably wraps around since the
    pointer should be an unsigned int (whatever size int is).
    The address it points to will be MAX_POINTER - 100000 + 1000.
    When the second form loop loop begins i=len (100000) so
    arrl[100000] will wrap back around and point to the same
    place as array[0].

    Or will it?

    It seems possible that this sort of array access "off the top of
    memory" could trigger a fault.

    What does the C standard say about this (if anything)?

    Thanks,

    David Mathog

    Manager, Sequence Analysis Facility, Biology Division, Caltech
     
    David Mathog, May 27, 2004
    #1
    1. Advertisements

  2. David Mathog

    Mike Wahler Guest

    Note that the object you've named 'array' is
    *not* an array, it's a pointer. Also note that
    you've not given this pointer a valid value,
    so evaluating it will produce undefined behavior.
    You haven't defined 'len'.
    Also note that if you want to use them to index
    (or record the size of) an array, the objects 'i'
    and 'len' should be of type 'size_t', not 'int'.


    Undefined behavior.
    Two pointers. No arrays.
    Undefined behavior. Even if the pointer named 'array'
    had a valid value, the behavior is still undefined.
    The only valid indices are those which refer within
    the same object.


    Well, yes, undefined == undefined.
    That's essentially what 'undefined behavior' is.
    Another 'funny thing' that might happen is that
    your keyboard could emit 100,000 volts.
    There is no such thing as 'wrapping around' of arrays in C.
    A pointer is *not* an integer. It's a pointer. Its
    representation and structure is implementation-specific.

    There's no such thing as 'MAX_POINTER' in C.
    Yes. No. Maybe. Sometimes. Never. Only on Wednesdays
    when it rains in Tokyo. Your code produces undefined behavior.
    The fault is with your code.
    It says your code is not valid C.

    Which C book(s) are you reading?

    -Mike
     
    Mike Wahler, May 27, 2004
    #2
    1. Advertisements

  3. David Mathog

    Dan Pop Guest



    Do yourself a favour and read the FAQ. Don't come back until you've
    finished it!

    Dan
     
    Dan Pop, May 27, 2004
    #3
  4. David Mathog

    Case Guest



    Huh, why should a variable used as array index be of
    type 'size_t'? K&R-2nd/ANSI uses 'int' quite often
    to index arrays.

    Case
     
    Case, May 27, 2004
    #4
  5. David Mathog

    Default User Guest

    David Mathog wrote:

    [a bunch of crazy bad code with undefined behavior]


    I suggest to thoroughly learn the language before trying this sort of
    whacky stuff. Once you become familiar with using arrays and pointers,
    then you'll also find you don't need to ask these questions, you'll
    already have the answers.




    Brian Rodenborn
     
    Default User, May 27, 2004
    #5
  6. David Mathog

    Stephen L. Guest




    This is a little bit better of a starting point...

    int
    main (void)
    {
    int the_array[100]; /* Note, we're not initializing the
    array... */
    int i,sum;
    int *array = the_array; /* same as ... = &the_array[ 0 ]; */
    int len = sizeof (the_array) / sizeof (the_array[ 0 ]);

    for(sum=0, i=0; i<len; i++){
    sum += array;
    }

    return (0);
    }

    Now, the C language does _not_ keep track of
    array bounds; arrays decay to a pointer (to
    the 1st element of the object). As such, "indexing"
    through an array in C will not "wrap-around"
    to the beginning/end when either the end/beginning
    is reached. It is the programmer's responsibility
    to keep track of his array(s) and their bounds.

    I believe standard (I can't cite the exact paragraph
    - para-phrased) says that if the result of any
    pointer calculation is not within the object,
    the results are undefined.

    For example, using the above -

    array[ -1 ]

    would be undefined. There are some languages
    that would produce a nice run-time diagnostic
    for such a reference; C is not one of them.
    A reference like the above may do nothing
    meaningful (it may access memory not part of the
    object), or produce a "memory violation" error.
    It could do both in the same program, at different
    times. Those are just a couple of examples of
    "undefined behavior".

    Having said all of that, you can do something like -


    int
    main (void)
    {
    int the_array[100]; /* Note, we're not initializing the
    array... */
    int i,sum;
    int len = sizeof (the_array) / sizeof (the_array[ 0 ]);
    int *array = &the_array[ len - 1 ];


    for(sum=0, i=0; i < len; i++){
    sum += array[ -i ];
    }

    return (0);
    }

    Personally, I avoid constructs like the above
    because it "looks" like I'm advancing through the
    array instead of going from the end to the beginning.

    I'm not sure how you arrived at the "translation"
    you provided, though.


    HTH,

    Stephen
     
    Stephen L., May 27, 2004
    #6
  7. David Mathog

    Dan Pop Guest



    For indexing purposes, any integer type does the job:

    6.5.2.1 Array subscripting

    Constraints

    1 One of the expressions shall have type ``pointer to object type'',
    the other expression shall have integer type, and the result
    has type ``type''.

    Dan
     
    Dan Pop, May 27, 2004
    #7
  8. David Mathog

    Mike Wahler Guest



    If the range is sufficient (and usage is valid), 'int'
    will work. But 'size_t' is specifically guaranteed to
    be able to represent the largest possible size object
    (and by corollary, the largest possible number of (byte-sized)
    objects.) Also, since 'size_t' is an unsigned type, sometimes
    one might need a signed type like 'int', if one needs to
    index 'backward' from a point other than the beginning of
    an array. (I've never needed to do this, but I suppose
    certain applications might).

    I always use 'size_t', then I need not be concerned about
    whether 'int' is (or will be) sufficient if my code changes
    later.

    -Mike
     
    Mike Wahler, May 27, 2004
    #8
  9. David Mathog

    David Mathog Guest

    On 27 May 2004 16:25:23 GMT
    Good advice. The answer is in 6.17 where it says:

    (snip)
    Although this technique is attractive (and was used in old editions of the book Numerical Recipes in C), it does not conform to the C standards. Pointer arithmetic is defined only as long as the pointer points within the same allocated block of memory, or to the imaginary ``terminating'' element one past it; otherwise, the behavior is undefined, even if the pointer is not dereferenced. The code above could fail if, while subtracting the offset, an illegal address were generated (perhaps because the address tried to ``wrap around'' past the beginning of some memory segment).

    References: K&R2 Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6
    ANSI Sec. 3.3.6
    ISO Sec. 6.3.6
    Rationale Sec. 3.2.2.3

    By "one past it" is the FAQ referring to both ends of the
    memory block or just the "high" end? Either way this would be
    ok (calculates an address "one after it"):

    int *p;
    int *plast;
    int sum;
    p=malloc(100*sizeof(int));
    plast=&(p[99]);
    /* code which stores values into those 100 positions */
    for(sum=0; p<=plast; p++){ sum += *p; }

    but this might not be ok (calculates an address "one before it"),
    change last line only of previous to:

    for(sum=0; p<=plast; plast--){ sum += *plast; }

    Thanks,

    David Mathog

    Manager, Sequence Analysis Facility, Biology Division, Caltech
     
    David Mathog, May 27, 2004
    #9
  10. because size_t is designated to be large enough to hold the maximum size of
    an object, hence you can guarantee to access every member of any
    concievable array. An int may be smaller than size_t (and is, on some
    popular implementations).
    K&R != ISO Standard.
     
    Mark McIntyre, May 27, 2004
    #10
  11. one past is (IMHO) unambiguous. If it included one before, it would say
    "one past or before"...
     
    Mark McIntyre, May 27, 2004
    #11
  12. David Mathog

    Old Wolf Guest

    The standard permits implementations to bounds-check their arrays.
    I have heard of some which offer this as a debug option.
     
    Old Wolf, May 28, 2004
    #12
  13. David Mathog

    Mike Wahler Guest

    the book Numerical Recipes in C), it does not conform to the C standards.
    Pointer arithmetic is defined only as long as the pointer points within
    the same allocated block of memory, or to the imaginary ``terminating''
    element one past it; otherwise, the behavior is undefined, even if the
    pointer is not dereferenced. The code above could fail if, while
    subtracting the offset, an illegal address were generated (perhaps because
    the address tried to ``wrap around'' past the beginning of some memory
    segment).
    Just the 'high' end. (That is, the highest address).
    Nit:
    More idiomatic is:

    p = malloc(100 * sizeof *p);

    If you later change the type of 'p', this line
    need not be changed. Also don't forget to
    check return value of 'malloc()'.
    Is definitely not "OK".
    Not only is pointing before the start of the array
    invalid, even if it were valid, your loop would
    be 'infinite' ( p<=plast would never go false (barring
    some platform-specific 'wrapping').

    -Mike
     
    Mike Wahler, May 28, 2004
    #13
  14. David Mathog

    Dan Pop Guest

    1. Using size_t may be wasteful. Precisely because it is supposed to be
    large enough to cover the largest object supported by the
    implementation.

    2. As you also pointed out, using an unsigned type may be ocasionally
    inconvenient. I prefer to use them exclusively for bit manipulation
    purposes, unless I have a *real* need for the extended range.

    3. Using an unknown type may also be ocasionally inconvenient. You don't
    even know whether size_t gets promoted to int by the integral
    promotions ;-)

    Far too often, it is possible to tell whether int will do the job or not.
    If it does the job, there is NO point in using any other type.

    Dan
     
    Dan Pop, May 28, 2004
    #14
  15. David Mathog

    David Mathog Guest

    Interesting.

    Can anybody explain the reason why the standard
    makes 1 above allocated memory special but 1 below not?
    Doing that destroys the symmetry of loops controlled by pointer
    comparisons (as shown above). What is gained by this
    restriction? Does this have something to do with the extra
    data that (at least on some platforms) malloc stores just
    before the allocated memory?

    To make a legal decrementing loop controlled by pointer
    comparison something like this must be used instead:

    for(sum=0; ; plast--){
    sum += *plast;
    if(p==plast)break;
    }

    The incrementing case can be rewritten in this format too,
    so the format is symmetric as long as the pointer test is
    not inside the for().

    Conversely, if an index is used instead the simple for() form
    is symmetric, ie:

    for(i=0; i<=99; i++){ sum += p; };

    and

    for(i=99; i>=0; i--){ sum += p; };

    Thanks,

    David Mathog

    Manager, Sequence Analysis Facility, Biology Division, Caltech
     
    David Mathog, May 28, 2004
    #15
  16. David Mathog

    CBFalconer Guest

    Because, for an array of large items, one before could be a
    considerable distance (rather than one byte). Also, if the array
    is the first item in a segment, any distance before can easily be
    an illegal address and cause traps.
    Assuming array a to be scanned, try:

    p = &a[0] + sizeof(a)/sizeof(*p); /* one past */
    sum = 0;
    do {
    sum += *(--tp);
    } while (tp > &a[0]);
     
    CBFalconer, May 28, 2004
    #16
  17. David Mathog

    David Mathog Guest

    IMHO it is ambiguous. We can probably agree that "one above" and "one below" are unambiguous since these refer to specific memory locations.

    However "one past" is ambiguous since the definition
    of "past" is vector in nature and the direction of that vector
    is not otherwise specified in the FAQ. A pointer could
    be either incrementing up through a memory block
    or decrementing down through it. In the latter case "one past"
    may still be applied (at least grammatically, if not in C code)
    and in this case "one past" would generally be understood to
    refer to the position immediately below the memory block being traversed.

    Ie, if you were directed to "the house adjacent to and one past
    the blue Victorian on Elm street" your direction of travel on that
    street would determine which of the two possible houses fitting
    this description is your destination.

    Regards,

    David Mathog

    Manager, Sequence Analysis Facility, Biology Division, Caltech
     
    David Mathog, Jun 1, 2004
    #17
  18. David Mathog

    Eric Sosman Guest

    The direction *is* specified in the FAQ, Question 6.17.
    The specification is implicit, true: 6.17 says that trying
    to form a pointer to an imaginary [-1] element is unreliable,
    so negative-going "past" is ruled out. What directions remain?
    Sideways, at right angles to the progression of memory addresses?
    (Anybody want to submit a C09 proposal for pointer-plus-complex?
    Or pointer-times-pointer, yielding the cross product? ;-)
     
    Eric Sosman, Jun 1, 2004
    #18
  19. The definition of past is not ambiguous. Check a dictionary - the adverbial
    and prepositional meanings of "past" all relate to after, beyond etc.
    and an array is a vector. It points upwards. Hence its easy to define
    "past"
    true, but irrelevant, as in either case it means the one beyond. Only a
    pathological use would attempt to refer to the point before.
     
    Mark McIntyre, Jun 1, 2004
    #19
  20. David Mathog

    David Mathog Guest

    You're right, it does define the direction implicitly, both by
    stating that the example, containing a [-1], violates the standard and later in "while subtracting the offset". Which still doesn't
    negate the point that using "one above" in that sentence would
    make it unambiguous without reference to anything else,
    but "one past" is ambiguous without the context of
    the implicit information elsewhere in the question.

    The rest of that sentence ("perhaps because the address
    tried to ``wrap around'' past the beginning of some memory segment")
    raises another question. If a block of memory extends up to the
    highest possible address in the system (for instance, memory location 65535 on a system with 16 bit memory space and 16 bit unsigned
    pointers) then the pointer value "one past" the allocated block
    would be 0, and would "wrap around" exactly as described in the
    FAQ 6.17 for the other direction.

    in ANSI C address 0 (NULL) is special, is address -1 (top of memory)
    also special?

    This must come up on microcontrollers and other similar small
    computing devices. (Yes, those are usually programmed in assembler
    but there are C compilers for them too.)

    Regards,

    David Mathog

    Manager, Sequence Analysis Facility, Biology Division, Caltech
     
    David Mathog, Jun 2, 2004
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.