What does the standard say about array access wraparound?

D

David Mathog

If this:

int i,sum;
int *array;
for(sum=0, i=0; i<len; i++){
sum += array;
}

is converted to this (never mind why for the moment):

int i,sum;
int *array;
int *arrl;
arl=&array[-len];
for(sum=0,i=len; i<2*len; i++){
sum += arrl;
}

it should give the same result. But there are some funny
things that can happen. For instance, if &array is 1000 and
len is 100000. In that case arrl will hold an address
(1000-100000) which presumably wraps around since the
pointer should be an unsigned int (whatever size int is).
The address it points to will be MAX_POINTER - 100000 + 1000.
When the second form loop loop begins i=len (100000) so
arrl[100000] will wrap back around and point to the same
place as array[0].

Or will it?

It seems possible that this sort of array access "off the top of
memory" could trigger a fault.

What does the C standard say about this (if anything)?

Thanks,

David Mathog
(e-mail address removed)
Manager, Sequence Analysis Facility, Biology Division, Caltech
 
M

Mike Wahler

David Mathog said:
If this:

int i,sum;
int *array;

Note that the object you've named 'array' is
*not* an array, it's a pointer. Also note that
you've not given this pointer a valid value,
so evaluating it will produce undefined behavior.
for(sum=0, i=0; i<len; i++){

You haven't defined 'len'.
Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.
sum += array;


Undefined behavior.
}

is converted to this (never mind why for the moment):

int i,sum;
int *array;
int *arrl;

Two pointers. No arrays.
arl=&array[-len];

Undefined behavior. Even if the pointer named 'array'
had a valid value, the behavior is still undefined.
The only valid indices are those which refer within
the same object.
for(sum=0,i=len; i<2*len; i++){
sum += arrl;
}

it should give the same result.


Well, yes, undefined == undefined.
But there are some funny
things that can happen.

That's essentially what 'undefined behavior' is.
Another 'funny thing' that might happen is that
your keyboard could emit 100,000 volts.
For instance, if &array is 1000 and
len is 100000. In that case arrl will hold an address
(1000-100000) which presumably wraps around

There is no such thing as 'wrapping around' of arrays in C.
since the
pointer should be an unsigned int (whatever size int is).

A pointer is *not* an integer. It's a pointer. Its
representation and structure is implementation-specific.

The address it points to will be MAX_POINTER

There's no such thing as 'MAX_POINTER' in C.
- 100000 + 1000.
When the second form loop loop begins i=len (100000) so
arrl[100000] will wrap back around and point to the same
place as array[0].

Or will it?

Yes. No. Maybe. Sometimes. Never. Only on Wednesdays
when it rains in Tokyo. Your code produces undefined behavior.
It seems possible that this sort of array access "off the top of
memory" could trigger a fault.

The fault is with your code.
What does the C standard say about this (if anything)?

It says your code is not valid C.

Which C book(s) are you reading?

-Mike
 
D

Dan Pop

In said:
If this:

int i,sum;
int *array;
for(sum=0, i=0; i<len; i++){
sum += array;
}

is converted to this (never mind why for the moment):

int i,sum;
int *array;
int *arrl;
arl=&array[-len];
for(sum=0,i=len; i<2*len; i++){
sum += arrl;
}

it should give the same result.


Do yourself a favour and read the FAQ. Don't come back until you've
finished it!

Dan
 
C

Case

Mike said:
David Mathog said:
for(sum=0, i=0; i<len; i++){

Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.
sum += array;



Huh, why should a variable used as array index be of
type 'size_t'? K&R-2nd/ANSI uses 'int' quite often
to index arrays.

Case
 
D

Default User

David Mathog wrote:

[a bunch of crazy bad code with undefined behavior]


I suggest to thoroughly learn the language before trying this sort of
whacky stuff. Once you become familiar with using arrays and pointers,
then you'll also find you don't need to ask these questions, you'll
already have the answers.




Brian Rodenborn
 
S

Stephen L.

David said:
If this:

int i,sum;
int *array;
for(sum=0, i=0; i<len; i++){
sum += array;
}

is converted to this (never mind why for the moment):

int i,sum;
int *array;
int *arrl;
arl=&array[-len];
for(sum=0,i=len; i<2*len; i++){
sum += arrl;
}

it should give the same result. But there are some funny
things that can happen. For instance, if &array is 1000 and
len is 100000. In that case arrl will hold an address
(1000-100000) which presumably wraps around since the
pointer should be an unsigned int (whatever size int is).
The address it points to will be MAX_POINTER - 100000 + 1000.
When the second form loop loop begins i=len (100000) so
arrl[100000] will wrap back around and point to the same
place as array[0].

Or will it?

It seems possible that this sort of array access "off the top of
memory" could trigger a fault.

What does the C standard say about this (if anything)?

Thanks,

David Mathog
(e-mail address removed)
Manager, Sequence Analysis Facility, Biology Division, Caltech



This is a little bit better of a starting point...

int
main (void)
{
int the_array[100]; /* Note, we're not initializing the
array... */
int i,sum;
int *array = the_array; /* same as ... = &the_array[ 0 ]; */
int len = sizeof (the_array) / sizeof (the_array[ 0 ]);

for(sum=0, i=0; i<len; i++){
sum += array;
}

return (0);
}

Now, the C language does _not_ keep track of
array bounds; arrays decay to a pointer (to
the 1st element of the object). As such, "indexing"
through an array in C will not "wrap-around"
to the beginning/end when either the end/beginning
is reached. It is the programmer's responsibility
to keep track of his array(s) and their bounds.

I believe standard (I can't cite the exact paragraph
- para-phrased) says that if the result of any
pointer calculation is not within the object,
the results are undefined.

For example, using the above -

array[ -1 ]

would be undefined. There are some languages
that would produce a nice run-time diagnostic
for such a reference; C is not one of them.
A reference like the above may do nothing
meaningful (it may access memory not part of the
object), or produce a "memory violation" error.
It could do both in the same program, at different
times. Those are just a couple of examples of
"undefined behavior".

Having said all of that, you can do something like -


int
main (void)
{
int the_array[100]; /* Note, we're not initializing the
array... */
int i,sum;
int len = sizeof (the_array) / sizeof (the_array[ 0 ]);
int *array = &the_array[ len - 1 ];


for(sum=0, i=0; i < len; i++){
sum += array[ -i ];
}

return (0);
}

Personally, I avoid constructs like the above
because it "looks" like I'm advancing through the
array instead of going from the end to the beginning.

I'm not sure how you arrived at the "translation"
you provided, though.


HTH,

Stephen
 
D

Dan Pop

In said:
Mike said:
David Mathog said:
for(sum=0, i=0; i<len; i++){

Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.
sum += array;



Huh, why should a variable used as array index be of
type 'size_t'? K&R-2nd/ANSI uses 'int' quite often
to index arrays.


For indexing purposes, any integer type does the job:

6.5.2.1 Array subscripting

Constraints

1 One of the expressions shall have type ``pointer to object type'',
the other expression shall have integer type, and the result
has type ``type''.

Dan
 
M

Mike Wahler

Case said:
Mike said:
David Mathog said:
for(sum=0, i=0; i<len; i++){

Also note that if you want to use them to index
(or record the size of) an array, the objects 'i'
and 'len' should be of type 'size_t', not 'int'.
sum += array;



Huh, why should a variable used as array index be of
type 'size_t'? K&R-2nd/ANSI uses 'int' quite often
to index arrays.


If the range is sufficient (and usage is valid), 'int'
will work. But 'size_t' is specifically guaranteed to
be able to represent the largest possible size object
(and by corollary, the largest possible number of (byte-sized)
objects.) Also, since 'size_t' is an unsigned type, sometimes
one might need a signed type like 'int', if one needs to
index 'backward' from a point other than the beginning of
an array. (I've never needed to do this, but I suppose
certain applications might).

I always use 'size_t', then I need not be concerned about
whether 'int' is (or will be) sufficient if my code changes
later.

-Mike
 
D

David Mathog

On 27 May 2004 16:25:23 GMT
Do yourself a favour and read the FAQ. Don't come back until you've
finished it!

Good advice. The answer is in 6.17 where it says:

(snip)
Although this technique is attractive (and was used in old editions of the book Numerical Recipes in C), it does not conform to the C standards. Pointer arithmetic is defined only as long as the pointer points within the same allocated block of memory, or to the imaginary ``terminating'' element one past it; otherwise, the behavior is undefined, even if the pointer is not dereferenced. The code above could fail if, while subtracting the offset, an illegal address were generated (perhaps because the address tried to ``wrap around'' past the beginning of some memory segment).

References: K&R2 Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6
ANSI Sec. 3.3.6
ISO Sec. 6.3.6
Rationale Sec. 3.2.2.3

By "one past it" is the FAQ referring to both ends of the
memory block or just the "high" end? Either way this would be
ok (calculates an address "one after it"):

int *p;
int *plast;
int sum;
p=malloc(100*sizeof(int));
plast=&(p[99]);
/* code which stores values into those 100 positions */
for(sum=0; p<=plast; p++){ sum += *p; }

but this might not be ok (calculates an address "one before it"),
change last line only of previous to:

for(sum=0; p<=plast; plast--){ sum += *plast; }

Thanks,

David Mathog
(e-mail address removed)
Manager, Sequence Analysis Facility, Biology Division, Caltech
 
M

Mark McIntyre

Huh, why should a variable used as array index be of
type 'size_t'?

because size_t is designated to be large enough to hold the maximum size of
an object, hence you can guarantee to access every member of any
concievable array. An int may be smaller than size_t (and is, on some
popular implementations).
K&R-2nd/ANSI uses 'int' quite often to index arrays.

K&R != ISO Standard.
 
M

Mark McIntyre

By "one past it" is the FAQ referring to both ends of the
memory block or just the "high" end?

one past is (IMHO) unambiguous. If it included one before, it would say
"one past or before"...
 
O

Old Wolf

Stephen L. said:
Now, the C language does _not_ keep track of
array bounds;

For example, using the above -

array[ -1 ]

would be undefined. There are some languages
that would produce a nice run-time diagnostic
for such a reference; C is not one of them.

The standard permits implementations to bounds-check their arrays.
I have heard of some which offer this as a debug option.
 
M

Mike Wahler

David Mathog said:
On 27 May 2004 16:25:23 GMT


Good advice. The answer is in 6.17 where it says:

(snip)
Although this technique is attractive (and was used in old editions of
the book Numerical Recipes in C), it does not conform to the C standards.
Pointer arithmetic is defined only as long as the pointer points within
the same allocated block of memory, or to the imaginary ``terminating''
element one past it; otherwise, the behavior is undefined, even if the
pointer is not dereferenced. The code above could fail if, while
subtracting the offset, an illegal address were generated (perhaps because
the address tried to ``wrap around'' past the beginning of some memory
segment).
References: K&R2 Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6
ANSI Sec. 3.3.6
ISO Sec. 6.3.6
Rationale Sec. 3.2.2.3

By "one past it" is the FAQ referring to both ends of the
memory block or just the "high" end?

Just the 'high' end. (That is, the highest address).
Either way this would be
ok (calculates an address "one after it"):

int *p;
int *plast;
int sum;
p=malloc(100*sizeof(int));

Nit:
More idiomatic is:

p = malloc(100 * sizeof *p);

If you later change the type of 'p', this line
need not be changed. Also don't forget to
check return value of 'malloc()'.
plast=&(p[99]);
/* code which stores values into those 100 positions */
for(sum=0; p<=plast; p++){ sum += *p; }

but this might not be ok

Is definitely not "OK".
(calculates an address "one before it"),
change last line only of previous to:

for(sum=0; p<=plast; plast--){ sum += *plast; }

Not only is pointing before the start of the array
invalid, even if it were valid, your loop would
be 'infinite' ( p<=plast would never go false (barring
some platform-specific 'wrapping').

-Mike
 
D

Dan Pop

In said:
I always use 'size_t', then I need not be concerned about
whether 'int' is (or will be) sufficient if my code changes
later.

1. Using size_t may be wasteful. Precisely because it is supposed to be
large enough to cover the largest object supported by the
implementation.

2. As you also pointed out, using an unsigned type may be ocasionally
inconvenient. I prefer to use them exclusively for bit manipulation
purposes, unless I have a *real* need for the extended range.

3. Using an unknown type may also be ocasionally inconvenient. You don't
even know whether size_t gets promoted to int by the integral
promotions ;-)

Far too often, it is possible to tell whether int will do the job or not.
If it does the job, there is NO point in using any other type.

Dan
 
D

David Mathog

David Mathog said:
On 27 May 2004 16:25:23 GMT
plast=&(p[99]);
/* code which stores values into those 100 positions */
for(sum=0; p<=plast; p++){ sum += *p; }

but this might not be ok

Is definitely not "OK".
(calculates an address "one before it"),
change last line only of previous to:

for(sum=0; p<=plast; plast--){ sum += *plast; }

Interesting.

Can anybody explain the reason why the standard
makes 1 above allocated memory special but 1 below not?
Doing that destroys the symmetry of loops controlled by pointer
comparisons (as shown above). What is gained by this
restriction? Does this have something to do with the extra
data that (at least on some platforms) malloc stores just
before the allocated memory?

To make a legal decrementing loop controlled by pointer
comparison something like this must be used instead:

for(sum=0; ; plast--){
sum += *plast;
if(p==plast)break;
}

The incrementing case can be rewritten in this format too,
so the format is symmetric as long as the pointer test is
not inside the for().

Conversely, if an index is used instead the simple for() form
is symmetric, ie:

for(i=0; i<=99; i++){ sum += p; };

and

for(i=99; i>=0; i--){ sum += p; };

Thanks,

David Mathog
(e-mail address removed)
Manager, Sequence Analysis Facility, Biology Division, Caltech
 
C

CBFalconer

David said:
.... snip ...

Can anybody explain the reason why the standard makes 1 above
allocated memory special but 1 below not? Doing that destroys
the symmetry of loops controlled by pointer comparisons (as
shown above). What is gained by this restriction? Does this
have something to do with the extra data that (at least on
some platforms) malloc stores just before the allocated memory?

Because, for an array of large items, one before could be a
considerable distance (rather than one byte). Also, if the array
is the first item in a segment, any distance before can easily be
an illegal address and cause traps.
To make a legal decrementing loop controlled by pointer
comparison something like this must be used instead:

for(sum=0; ; plast--){
sum += *plast;
if(p==plast)break;
}

Assuming array a to be scanned, try:

p = &a[0] + sizeof(a)/sizeof(*p); /* one past */
sum = 0;
do {
sum += *(--tp);
} while (tp > &a[0]);
 
D

David Mathog

one past is (IMHO) unambiguous. If it included one before, it would say
"one past or before"...

IMHO it is ambiguous. We can probably agree that "one above" and "one below" are unambiguous since these refer to specific memory locations.

However "one past" is ambiguous since the definition
of "past" is vector in nature and the direction of that vector
is not otherwise specified in the FAQ. A pointer could
be either incrementing up through a memory block
or decrementing down through it. In the latter case "one past"
may still be applied (at least grammatically, if not in C code)
and in this case "one past" would generally be understood to
refer to the position immediately below the memory block being traversed.

Ie, if you were directed to "the house adjacent to and one past
the blue Victorian on Elm street" your direction of travel on that
street would determine which of the two possible houses fitting
this description is your destination.

Regards,

David Mathog
(e-mail address removed)
Manager, Sequence Analysis Facility, Biology Division, Caltech
 
E

Eric Sosman

David said:
However "one past" is ambiguous since the definition
of "past" is vector in nature and the direction of that vector
is not otherwise specified in the FAQ. [...]

The direction *is* specified in the FAQ, Question 6.17.
The specification is implicit, true: 6.17 says that trying
to form a pointer to an imaginary [-1] element is unreliable,
so negative-going "past" is ruled out. What directions remain?
Sideways, at right angles to the progression of memory addresses?
(Anybody want to submit a C09 proposal for pointer-plus-complex?
Or pointer-times-pointer, yielding the cross product? ;-)
 
M

Mark McIntyre

IMHO it is ambiguous. ....
"one past" is ambiguous since the definition

The definition of past is not ambiguous. Check a dictionary - the adverbial
and prepositional meanings of "past" all relate to after, beyond etc.
of "past" is vector in nature

and an array is a vector. It points upwards. Hence its easy to define
"past"
Ie, if you were directed to "the house adjacent to and one past
the blue Victorian on Elm street" your direction of travel on that
street would determine which of the two possible houses fitting
this description is your destination.

true, but irrelevant, as in either case it means the one beyond. Only a
pathological use would attempt to refer to the point before.
 
D

David Mathog

David said:
However "one past" is ambiguous since the definition
of "past" is vector in nature and the direction of that vector
is not otherwise specified in the FAQ. [...]

The direction *is* specified in the FAQ, Question 6.17.
The specification is implicit, true: 6.17 says that trying
to form a pointer to an imaginary [-1] element is unreliable,
so negative-going "past" is ruled out.

You're right, it does define the direction implicitly, both by
stating that the example, containing a [-1], violates the standard and later in "while subtracting the offset". Which still doesn't
negate the point that using "one above" in that sentence would
make it unambiguous without reference to anything else,
but "one past" is ambiguous without the context of
the implicit information elsewhere in the question.

The rest of that sentence ("perhaps because the address
tried to ``wrap around'' past the beginning of some memory segment")
raises another question. If a block of memory extends up to the
highest possible address in the system (for instance, memory location 65535 on a system with 16 bit memory space and 16 bit unsigned
pointers) then the pointer value "one past" the allocated block
would be 0, and would "wrap around" exactly as described in the
FAQ 6.17 for the other direction.

in ANSI C address 0 (NULL) is special, is address -1 (top of memory)
also special?

This must come up on microcontrollers and other similar small
computing devices. (Yes, those are usually programmed in assembler
but there are C compilers for them too.)

Regards,

David Mathog
(e-mail address removed)
Manager, Sequence Analysis Facility, Biology Division, Caltech
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,043
Latest member
CannalabsCBDReview

Latest Threads

Top