Why is a pointer to "one past" allowed but a pointer to "one before" is not ?

spibou

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?

Spiros Bousbouras
 
Nils O. Selåsdal

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?
Consider code such as

char *str = somestring;
while(*str++) {
...
}

str might end up one place past somestring - nice to allow that.
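
A minimal sketch of the point above, assuming a hypothetical helper name steps_taken (not from the thread). After the loop, str has moved one position past the terminating null character; for an array that exactly holds the string, that is one past the end of the array. Computing and comparing that pointer is fine, dereferencing it is not.

```c
#include <assert.h>
#include <stddef.h>

/* Walks a string the way the loop above does and reports how far the
 * pointer ended up from the start. The name is hypothetical. */
size_t steps_taken(const char *start)
{
    const char *str = start;

    assert(start != NULL);
    while (*str++) {
        /* loop body elided, as in the post */
    }
    /* The post-increment also runs on the final (failing) test, so str
     * now points one past the terminating '\0'. */
    return (size_t)(str - start);
}
```

For char s[] = "abc", steps_taken(s) equals sizeof s, so the pointer stopped at s + 4, one past the last element.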
 
spibou

Nils said:
Consider code such as

char *str = somestring;
while(*str++) {
...
}

str might end up one place past somestring - nice to allow that.

Yes it is. My question was why the "opposite" is not allowed too.
One could have just as easily something like
while (source >= beg_of_string) *dest++ = *source-- ;
to copy a string in reverse for example.

Does my example invoke undefined behaviour by the way ?

Spiros Bousbouras
 
Marc Boyer

On 23-06-2006, spibou said:
Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?

More useful, yes, if you agree that there are more
increasing loops than decreasing ones.

But I believe the real reason is that it is easier to
implement on hardware: you just have to waste one
memory address. That is to say, if your processor
can address from 0 up to 2^N-1 and all data are
stored between 0 and 2^N-2, then 'one position past'
is at worst 2^N-1, which is a valid address for your
processor, and pointer arithmetic still applies.

But 'one position before' is harder. You cannot
put any bound on the size of the memory that would
have to be reserved at the beginning. If an object S
is stored at address A, then &S+1 is just one char
after the space used to store S, but &S-1 is
sizeof(S) chars before it...

It's a bit hard to explain without any blackboard,
and I am not very good at ASCII art.

Marc Boyer
 
Richard Bos

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ?

Because a pointer one past any array need only take a single byte (since
only the address of the _first byte_ of the virtual member need be
valid, not any further ones), but a pointer one before the beginning
requires the assignment of memory space the size of an entire array
member. Given that the array member can be a humungous struct containing
arrays of structs of arrays of long doubles, this can cost a lot of
address space that could otherwise be gainfully employed.

Richard
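
A rough sketch of the asymmetry described above. The struct big type and both function names are hypothetical, chosen only to stand in for a "humungous" array member. The one-past pointer sits exactly one byte after the last byte of the array, whereas a one-before pointer would have to sit a whole element before the first byte. The second distance is only computed as a number here; the pointer itself is never formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large element type, for illustration only. */
struct big { long double samples[1024]; };

/* Distance in bytes from the last byte of the array to the
 * one-past-the-end pointer. Both pointers are valid to form. */
ptrdiff_t gap_after_last_byte(void)
{
    static struct big arr[4];
    unsigned char *last_byte = (unsigned char *)arr + sizeof arr - 1;
    unsigned char *one_past  = (unsigned char *)(arr + 4);

    assert(one_past > last_byte);
    return one_past - last_byte;
}

/* Distance in bytes that a one-before pointer would lie ahead of the
 * first byte of the array: a whole element. Computed from the type
 * only, without forming the invalid pointer. */
size_t gap_before_first_byte(void)
{
    return sizeof(struct big);
}
```

Here gap_after_last_byte() is 1 regardless of the element type, while gap_before_first_byte() grows with the element size.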
 
Richard Tobin

Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ?
Because a pointer one past any array need only take a single byte (since
only the address of the _first byte_ of the virtual member need be
valid, not any further ones), but a pointer one before the beginning
requires the assignment of memory space the size of an entire array
member.

That's one reason, but I think a much more compelling one was that
there was lots of existing code that did things like

for(p=proc; p<proc+NPROC; p++)

and very little that did the reverse.

-- Richard
 
Nils O. Selåsdal

Yes it is. My question was why the "opposite" is not allowed too.

It's much, *much* more common to iterate this way than the other way,
and it probably was when the spec was made :)
 
Andrey Tarasevich

...
Why is a pointer allowed to point to one position past
the end of an array but not to one position before the
beginning of an array ? Is there any reason why the
former is more useful than the latter ?
...

There are several different reasons for that. One of them is described
below.

The storage is normally filled with the objects from smaller addresses
to larger addresses (i.e. in the same direction in which array indices
grow). For this reason, it is not unusual to have an object that resides
close to the beginning of the storage. To create a "before" pointer (and
properly support all pointer operations) for such an object might be
either impossible or unjustifiably difficult (since such a pointer would
have to point somewhere before the beginning of the storage). "Beginning
of the storage" in this case does not necessarily stand for the
beginning of physical memory. On a hardware platform with
segmented-memory the beginning of a segment has similar properties.
 
Andrey Tarasevich

...
Yes it is. My question was why the "opposite" is not allowed too.
One could have just as easily something like
while (source >= beg_of_string) *dest++ = *source-- ;
to copy a string in reverse for example.

Does my example invoke undefined behaviour by the way ?
...

Formally, it does lead to UB, since it attempts to create a "one before"
pointer.

The problem with your code in its nature is similar to the problem with
the following code

unsigned i;
...
while (i >= 0) *dest++ = source[i--];

Note that an unsigned value will never be negative and the loop will
never end.

Essentially the same thing can happen in the case of a pointer. If we
consider pointers (addresses) as arithmetic values, they are unsigned.
Imagine that your 'beg_of_string' pointer points to address 0. How do
you expect your loop to end in this case? How do you expect to represent
a pointer that is less than '0'?
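
One way to copy a string in reverse without ever forming a "one before" pointer is sketched here. The function name reverse_copy and its signature are assumptions made for illustration. The idea is to start from the one-past-the-end position, which is allowed, and to decrement before dereferencing, so the loop can stop when the pointer equals the start of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the characters of src into dest in reverse order and
 * null-terminates dest. dest must have room for strlen(src) + 1
 * characters. The name and signature are hypothetical. */
void reverse_copy(char *dest, const char *src)
{
    const char *p;

    assert(dest != NULL && src != NULL);
    p = src + strlen(src);      /* points at the '\0': a valid position */
    while (p > src) {
        *dest++ = *--p;         /* decrement first, so p never goes below src */
    }
    *dest = '\0';
}
```

The comparison p > src only ever involves pointers into the string or at its terminator, so it stays well defined even if the string starts at the lowest usable address.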
 
William Ahern

Yes it is. My question was why the "opposite" is not allowed too. One
could have just as easily something like while (source >= beg_of_string)
*dest++ = *source-- ; to copy a string in reverse for example.

Does my example invoke undefined behaviour by the way ?

Yes. And to see a real world example of such a program failing because of
this (not that undefined means it must fail), try this compiler

http://fabrice.bellard.free.fr/tcc/

w/ your code, using the -b switch (bounds checker). I never heeded the
standard on this point until I started using TCC to improve my code
portability.

I must say there are some circumstances where it is indeed desirable to
iterate backwards. For example, I had to tweak many places in a memory
pool library because I would iterate backwards from a given pointer
reading bookkeeping information until I hit a terminator bit. Took me hours to
figure out why my program would crash using TCC:

/*
 * Beginning from *p, work backwards reconstructing the value of an
 * rbitsint_t integer. Stop when the highest order bit of *p is set, which
 * should have been previously preserved as a marker. Return the
 * reconstructed value, setting *end to the last position used of p.
 */
static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
    rbitsint_t i = 0; /* currently typedef to size_t */
    int n = 0;

    do {
        i |= (*p & ~(1 << (CHAR_BIT - 1))) << (n++ * (CHAR_BIT - 1));
    } while (!(*(p--) & (1 << (CHAR_BIT - 1))));

    *end = p + 1;

    return i;
} /* rbits_get() */
 
Mark McIntyre

Yes it is. My question was why the "opposite" is not allowed too.

Imagine your object was at the *very start* of memory. one-before
would be nowhere.

Going the other way is no problem - the abstract machine has infinite
memory so there is no 'very end'...

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Keith Thompson

Mark McIntyre said:
Imagine your object was at the *very start* of memory. one-before
would be nowhere.

Going the other way is no problem - the abstract machine has infinite
memory so there is no 'very end'...

The abstract machine has no "very start" of memory either, and its
memory is limited to 2**(CHAR_BIT * sizeof(void*)) bytes.

The purpose of the rule is to avoid problems on real-world machines
with finite address spaces. To allow a pointer just past the end of
an array, an implementation, at most, has to allocate one extra byte.
To allow a pointer just before the beginning of an array, it might
have to allocate enough space for an entire array element, which can
be almost arbitrarily large. (It may not have to allocate any memory
if it can form an address for memory that doesn't exist, but the
standard allows for the possibility that it does have to do so.)
 
Skarmander

William said:
Yes. And to see a real world example of such a program failing because of
this (not that undefined means it must fail), try this compiler

http://fabrice.bellard.free.fr/tcc/

w/ your code, using the -b switch (bounds checker). I never heeded the
standard on this point until I started using TCC to improve my code
portability.

I must say there are some circumstances where it is indeed desirable to
iterate backwards. For example, I had to tweak many places in a memory
pool library because I would iterate backwards from a given pointer
reading bookkeeping information until I hit a terminator bit. Took me hours to
figure out why my program would crash using TCC:

/*
* Beginning from *p, work backwards reconstructing the value of an
* rbitsint_t integer. Stop when the highest order bit of *p is set, which
* should have been previously preserved as a marker. Return the
* reconstructed value, setting *end to the last position used of p.
*/
static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
rbitsint_t i = 0; /* currently typedef to size_t */
int n = 0;

do {
i |= (*p & ~(1 << (CHAR_BIT - 1))) << (n++ * (CHAR_BIT - 1));

Two problems with this line.

First, ~(1 << (CHAR_BIT - 1)) is a questionable expression. 1 << (CHAR_BIT -
1) is a signed integer, to which ~ is applied. The value of the result is
implementation-defined. This will happen to work here because *p is at most
as wide as the bitmask, so the irrelevant bits will be masked out and the
actual value of the expression doesn't matter. Still, this is a habit best
unlearned. Use U<FOO>_MAX >> 1 for the all-ones-except-the-MSB bitmask,
where U<FOO>_MAX can be defined if necessary as ((unsigned foo) -1).

Second, (*p & ~(1 << (CHAR_BIT - 1))) is an int. You then shift this int by
(n++ * (CHAR_BIT - 1)), which has potential for undefined behavior since it
can exceed the width of an int. What you want is to shift an rbitsint_t, not
an int. (Unlike the previous issue, this can be a real problem -- try
typedef'ing rbitsint_t as a 64-bit type on a 32-bit architecture and reading
more than 32 value bits into it to see what I mean.)

This should fix both issues:
i |= (rbitsint_t) (*p & (UCHAR_MAX >> 1)) << (n++ * (CHAR_BIT - 1));
} while (!(*(p--) & (1 << (CHAR_BIT - 1))));

*end = p + 1;

return i;
} /* rbits_get() */
Not inspired by the overflow observation, but here's my take:

#define LOMASK (UCHAR_MAX >> 1)
#define MSB (LOMASK + 1)

static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
    rbitsint_t i;

    unsigned char *q = p;
    while (!(*q & MSB)) --q;
    *end = q;

    i = *q & LOMASK;
    while (++q <= p) {
        i = i << (CHAR_BIT - 1) | *q;
    }
    return i;
}

Yes, it's more lines; yes, I go backwards just to go forwards again. If the
operations involved are expensive, this is not a good idea, but here the
operations aren't expensive. I personally find this easier to read, and for
my compiler and my machine (YMMV) it's actually faster.

Of course, I assume you've rewritten your code since discovering the
one-before-the-beginning bug, possibly even along these lines, so my point
may be moot.

S.
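
The shift-width issue raised above can be illustrated in isolation. In this sketch uint64_t stands in for rbitsint_t, and place_group is a hypothetical helper name; neither comes from the original library. The masked byte is converted to the wide type before shifting, so a shift count of 32 or more is applied to a 64-bit operand and not to an int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns the low CHAR_BIT-1 bits of byte, moved up to group position n
 * (group 0 is the least significant). uint64_t is an assumed stand-in
 * for rbitsint_t. */
uint64_t place_group(unsigned char byte, unsigned n)
{
    unsigned shift = n * (CHAR_BIT - 1);

    assert(shift < 64);  /* shifting by the full width would be undefined */
    /* Convert first, then shift: the shift happens in 64 bits. */
    return (uint64_t)(byte & (UCHAR_MAX >> 1)) << shift;
}
```

With 8-bit bytes, place_group(0x01, 5) shifts by 35 bits, which would be undefined if the left operand were still a 32-bit int.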
 
