pointers and beginnings of arrays

Tim Smith

I seem to have lost my copy of the ANSI C standard. Is this code legal?

/* p points somewhere in a string that is known to contain
an 'a' at or before where p points. We want to change
this 'a' to a 'b'. */
while ( *p-- != 'a' )
    ;
*++p = 'b';

If the only 'a' at or before the initial place p points to is the first
character of the string, the last p-- will be doing -- on a pointer to
the first character of the string, but it is then incremented with ++
before being used.

Another example. Again, p points to a string, this time known to
contain an 'a', but this time at or after where p points. To change it
to a 'b':

--p;
while ( *++p != 'a' )
    ;
*p = 'b';

If p started at the beginning of the string, this would --p to before
the string, then immediately ++p back into the string.

I seem to remember that you can't take the address of elements of an
array before the 0th element, but I don't recall if using -- on a
pointer counts as taking the address of something.
 
Flash Gordon

Tim Smith wrote, On 23/05/07 07:37:
I seem to have lost my copy of the ANSI C standard. Is this code legal?

I refer the honourable gentleman to http://clc-wiki.net/ where, in the
obvious place, you will find pointers to free downloads of various
drafts of the C standard.

I seem to remember that you can't take the address of elements of an
array before the 0th element, but I don't recall if using -- on a
pointer counts as taking the address of something.

You are not allowed to calculate an address before the start of the
array. You are allowed to calculate the address one past the end
because the standard gives you special dispensation, although you
obviously cannot dereference it.

"Not allowed" and "cannot" are informal terms here: they mean that if
you try it, you invoke undefined behaviour.
 
Chris Torek

I seem to have lost my copy of the ANSI C standard. Is this code legal?

Indeed, in the case you describe, it is not:
/* p points somewhere in a string that is known to contain
an 'a' at or before where p points. We want to change
this 'a' to a 'b'. */
while ( *p-- != 'a' )
    ;
*++p = 'b';

If the only 'a' at or before the initial place p points to is the first
character of the string, the last p-- will be doing -- on a pointer to
the first character of the string, but it is then incremented with ++
before being used.

This is technically undefined, although it tends to work in practice
on "real systems".

You can re-code the above as:

while (*p != 'a')
    p--;
*p = 'b';

to remove the bug.
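For anyone who wants to try the corrected loop, here is a minimal sketch wrapped in a function. The name replace_a_backward and the assert on the argument are my own additions for illustration, not something from the original post, and the function still assumes (without checking) that an 'a' really exists at or before p. The point is that p is tested before it is decremented, so it never moves below the element holding the 'a'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: change the nearest 'a' at or before p to 'b'.
   Precondition (not checked): some 'a' exists at or before p within
   the same array.  Returns a pointer to the changed character. */
char *replace_a_backward(char *p)
{
    assert(p != NULL);
    while (*p != 'a')   /* test first ...                         */
        p--;            /* ... and step back only if it was not 'a' */
    *p = 'b';           /* p still points at a valid element here  */
    return p;
}
```

Called with a pointer to the last character of an array holding "abc", this stops at the first element, changes it, and never forms a pointer to before the array.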
Another example. Again, p points to a string, this time known to
contain an 'a', but this time at or after where p points. To change it
to a 'b':

--p;
while ( *++p != 'a' )
    ;
*p = 'b';

If p started at the beginning of the string, this would --p to before
the string, then immediately ++p back into the string.

Again, technically illegal.

I seem to remember that you can't take the address of elements of an
array before the 0th element, but I don't recall if using -- on a
pointer counts as taking the address of something.

It does.

The second example can be recoded as the obvious:

while (*p != 'a')
    p++;
*p = 'b';

or the less-obvious (but still legal, provided there is some 'a' in
range):

while (*p++ != 'a')
    continue;
*--p = 'b';

This second one is legal, despite going "one past the end" in some
cases, because "going one past the end" is explicitly allowed, so
that loops like:

for (p = arr, i = 0; i < n; p++, i++)
    ... operate on *p ...

remained defined in C89. On those rare "real systems" on which
out of bounds pointer arithmetic is actually trapped, the cost of
making "one past the end" work is generally one byte (or one machine
word) of extra storage at the end of a segment, while the cost of
making "one before the start" work would have been "as many bytes
(or machine words) as needed based on sizeof *p".
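To make the "one past the end" case concrete, here is a small sketch of the post-increment version as a function. The name replace_a_forward is hypothetical, and as before the function assumes, without checking, that an 'a' exists at or after p. If the 'a' happens to be the very last element of the array, the loop leaves p one past the end, which is legal to compute as long as it is not dereferenced; the pre-decrement then brings it back onto the 'a' before the store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: change the first 'a' at or after p to 'b'.
   Precondition (not checked): some 'a' exists at or after p within
   the same array.  Returns a pointer to the changed character. */
char *replace_a_forward(char *p)
{
    assert(p != NULL);
    while (*p++ != 'a')
        continue;
    /* p is now one past the 'a', possibly one past the end of the
       array; that pointer value may be computed but not dereferenced */
    *--p = 'b';         /* step back onto the 'a' before storing */
    return p;
}
```

With a three-element array { 'x', 'y', 'a' } and no terminating null, p reaches the one-past-the-end position when the loop exits, and the result is still well defined.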

(This falls out naturally since, if sizeof *p is 1000, "p++" turns
into "add #1000,reg" and "p--" turns into "sub #1000,reg", all
assuming the machine works in C-bytes natively of course. When
reg is near the end of the segment, pointing to the last valid
object, adding 1000 puts it one byte past the last valid object --
so the implementation has to sneak in an extra "pad" byte at the
end of the segment -- but when reg is right at the beginning,
pointing to the first valid object, subtracting 1000 puts it 1000
bytes before the start, so the implementation would have had to
insert 1000 pad bytes at the front of the segment.)
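The stride argument can be checked with a few lines of C. This is only an illustration under my own assumptions: struct big and step_in_bytes are made-up names, and the struct is typically, though not guaranteed to be, exactly 1000 bytes. It measures how many bytes a single p + 1 covers, using only the one-past-the-end pointer, which is legal to compute.

```c
#include <assert.h>
#include <stddef.h>

struct big { char data[1000]; };    /* hypothetical large element */

/* Number of bytes that one "p++" step covers for a struct big pointer. */
ptrdiff_t step_in_bytes(void)
{
    static struct big arr[1];
    struct big *p = arr;
    struct big *q = p + 1;          /* one past the end: fine to compute */

    assert(sizeof(struct big) >= 1000);
    return (char *)q - (char *)p;   /* always equals sizeof(struct big) */
}
```

The mirror-image computation, p - 1 from the start of the array, would cover the same number of bytes in the other direction, which is why supporting it would need that much padding in front of the segment. It is deliberately not written out here, since evaluating it is exactly the undefined case under discussion.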
 
