Why a pointer to "one past" is allowed but a pointer to "one before" is not?

Discussion in 'C Programming' started by spibou@gmail.com, Jun 23, 2006.

  1. Guest

    Why is a pointer allowed to point to one position past
    the end of an array but not to one position before the
    beginning of an array? Is there any reason why the
    former is more useful than the latter?

    Spiros Bousbouras
     
    , Jun 23, 2006
    #1

  2. Re: Why a pointer to "one past" is allowed but a pointer to "one before" is not?

    wrote:
    > Why is a pointer allowed to point to one position past
    > the end of an array but not to one position before the
    > beginning of an array? Is there any reason why the
    > former is more useful than the latter?

    Consider code such as

    char *str = somestring;
    while(*str++) {
    ...
    }

    str might end up one place past somestring - nice to allow that.
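A minimal sketch of the idiom, assuming a hypothetical helper `walk_length` whose loop body does nothing. When the loop stops, `str` has already stepped past the '\0'. If the string fills its array exactly, that position is one past the end of the array: it may be compared and subtracted, but not dereferenced.

```c
#include <stddef.h>

/* Hypothetical helper built on the post-increment loop above. */
static size_t walk_length(const char *somestring)
{
    const char *str = somestring;

    while (*str++)
        ;   /* loop body elided, as in the example */

    /* str now points one past the '\0'. For char a[] = "abc" that is
       a + 4, one past the end of the array: valid to form and subtract. */
    return (size_t)(str - somestring - 1);  /* -1 excludes the terminator */
}
```

For `char a[] = "abc";`, `walk_length(a)` yields 3, and the final value of `str` inside it is `a + 4`.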
     
    Nils O. Selåsdal, Jun 23, 2006
    #2

  3. Guest

    Nils O. Selåsdal wrote:

    > wrote:
    > > Why is a pointer allowed to point to one position past
    > > the end of an array but not to one position before the
    > > beginning of an array? Is there any reason why the
    > > former is more useful than the latter?

    > Consider code such as
    >
    > char *str = somestring;
    > while(*str++) {
    > ...
    > }
    >
    > str might end up one place past somestring - nice to allow that.


    Yes it is. My question was why the "opposite" is not allowed too.
    One could just as easily have something like
    while (source >= beg_of_string) *dest++ = *source--;
    to copy a string in reverse, for example.

    Does my example invoke undefined behaviour, by the way?

    Spiros Bousbouras
     
    , Jun 23, 2006
    #3
  4. Marc Boyer Guest

    On 23-06-2006, <> wrote:
    > Why is a pointer allowed to point to one position past
    > the end of an array but not to one position before the
    > beginning of an array? Is there any reason why the
    > former is more useful than the latter?


    More useful, yes, if you agree that there are more
    increasing loops than decreasing ones.

    But I believe the real reason is that it is easier to
    implement on hardware: you just have to waste one
    memory address. That is to say, if your processor
    can address from 0 up to 2^N-1, and all data are
    stored between 0 and 2^N-2, then 'one position past'
    is at worst 2^N-1, which is a valid address for your
    processor, and pointer arithmetic still applies.

    But 'one position before' is harder. You cannot put
    any bound on the size of the memory that must be
    reserved at the beginning, because if an object S
    is stored at address N, then &S+1 is just one char
    after the space used to store S, but &S-1 is
    sizeof(S) chars before it...

    It's a bit hard to explain without any blackboard,
    and I am not very good at ASCII art.
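The blackboard version of this argument can be written as two inequalities, assuming a flat address space 0 .. 2^N - 1 and an array of n elements of s bytes each starting at address a (the symbols a, n, s are introduced here for illustration). The first condition costs only the single address 2^N - 1, whatever s is; the second requires s free addresses below the lowest array, and s has no upper bound.

```latex
% Data occupy addresses a .. a + ns - 1.
\begin{aligned}
\text{one past:}   &\quad a + ns \le 2^N - 1 \iff a + ns - 1 \le 2^N - 2 \\
\text{one before:} &\quad a - s \ge 0 \iff a \ge s
\end{aligned}
```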

    Marc Boyer
     
    Marc Boyer, Jun 23, 2006
    #4
  5. Richard Bos Guest

    wrote:

    > Why is a pointer allowed to point to one position past
    > the end of an array but not to one position before the
    > beginning of an array?


    Because a pointer one past any array need only take a single byte (since
    only the address of the _first byte_ of the virtual member need be
    valid, not any further ones), but a pointer one before the beginning
    requires the assignment of memory space the size of an entire array
    member. Given that the array member can be a humungous struct containing
    arrays of structs of arrays of long doubles, this can cost a lot of
    address space that could otherwise be gainfully employed.
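A small check of the first half of this, assuming a hypothetical element type `struct big` standing in for the humungous struct: the one-past pointer is just the address of the first byte after the last element, so forming it needs nothing beyond that byte. The one-before pointer `a - 1` is deliberately not computed, since that is the undefined case; it would sit `sizeof(struct big)` bytes below the array.

```c
#include <stddef.h>

/* Hypothetical large element type. */
struct big {
    long double vals[64];
};

/* Distance in bytes from the start of the array to its one-past-the-end
   pointer. Forming a + n is valid even though a[n] does not exist.
   (a - 1 is not formed here: that would be undefined behaviour.) */
static ptrdiff_t one_past_offset(struct big *a, size_t n)
{
    return (char *)(a + n) - (char *)a;
}
```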

    Richard
     
    Richard Bos, Jun 23, 2006
    #5
  6. In article <4all.nl>,
    Richard Bos <> wrote:

    >> Why is a pointer allowed to point to one position past
    >> the end of an array but not to one position before the
    >> beginning of an array?


    >Because a pointer one past any array need only take a single byte (since
    >only the address of the _first byte_ of the virtual member need be
    >valid, not any further ones), but a pointer one before the beginning
    >requires the assignment of memory space the size of an entire array
    >member.


    That's one reason, but I think a much more compelling one was that
    there was lots of existing code that did things like

    for(p=proc; p<procNPROC; p++)

    and very little that did the reverse.
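A minimal sketch of that forward idiom, assuming a hypothetical `struct proc` table of `NPROC` slots (the names `proc` and `procNPROC` follow the post; the fields and size are invented). The loop stops when `p` reaches the one-past-the-end address, which is why that address has to be representable.

```c
#include <stddef.h>

#define NPROC 8                  /* hypothetical table size */

struct proc {                    /* hypothetical fields */
    int pid;
    int active;
};

static struct proc proc[NPROC];

/* One past the last slot: may be formed and compared, never dereferenced. */
#define procNPROC (proc + NPROC)

static int count_active(void)
{
    int n = 0;
    struct proc *p;

    for (p = proc; p < procNPROC; p++)
        if (p->active)
            n++;
    return n;
}
```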

    -- Richard
     
    Richard Tobin, Jun 23, 2006
    #6
  7. Re: Why a pointer to "one past" is allowed but a pointer to "one before" is not?

    wrote:
    > Nils O. Selåsdal wrote:
    >
    >> wrote:
    >>> Why is a pointer allowed to point to one position past
    >>> the end of an array but not to one position before the
    >>> beginning of an array? Is there any reason why the
    >>> former is more useful than the latter?

    >> Consider code such as
    >>
    >> char *str = somestring;
    >> while(*str++) {
    >> ...
    >> }
    >>
    >> str might end up one place past somestring - nice to allow that.

    >
    > Yes it is. My question was why the "opposite" is not allowed too.


    It's much, *much* more common to iterate this way than the other way,
    and probably was when the spec was written :)
     
    Nils O. Selåsdal, Jun 23, 2006
    #7
  8. Re: Why a pointer to "one past" is allowed but a pointer to "one before" is not?

    wrote:
    > ...
    > Why is a pointer allowed to point to one position past
    > the end of an array but not to one position before the
    > beginning of an array? Is there any reason why the
    > former is more useful than the latter?
    > ...


    There are several different reasons for that. One of them is described
    below.

    The storage is normally filled with the objects from smaller addresses
    to larger addresses (i.e. in the same direction in which array indices
    grow). For this reason, it is not unusual to have an object that resides
    close to the beginning of the storage. To create a "before" pointer (and
    properly support all pointer operations) for such an object might be
    either impossible or unjustifiably difficult (since such a pointer would
    have to point somewhere before the beginning of the storage). "Beginning
    of the storage" in this case does not necessarily stand for the
    beginning of physical memory. On a hardware platform with
    segmented memory, the beginning of a segment has similar properties.

    --
    Best regards,
    Andrey Tarasevich
     
    Andrey Tarasevich, Jun 23, 2006
    #8
  9. Re: Why a pointer to "one past" is allowed but a pointer to "one before" is not?

    wrote:
    > ...
    > Yes it is. My question was why the "opposite" is not allowed too.
    > One could just as easily have something like
    > while (source >= beg_of_string) *dest++ = *source--;
    > to copy a string in reverse, for example.
    >
    > Does my example invoke undefined behaviour, by the way?
    > ...


    Formally, it does lead to UB, since it attempts to create a "one before"
    pointer.

    The problem with your code is similar in nature to the problem with
    the following code:

    unsigned i;
    ...
    while (i >= 0) *dest++ = source[i--];

    Note that an unsigned value will never be negative and the loop will
    never end.
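A common way to count an unsigned index down to zero without this trap is to test before decrementing. The sketch below (the function name `reverse_copy_idx` is hypothetical) copies n characters in reverse; the comparison never needs the index to be below zero.

```c
#include <stddef.h>

/* Copy source[0..n-1] into dest in reverse order (hypothetical helper). */
static void reverse_copy_idx(char *dest, const char *source, size_t n)
{
    size_t i = n;

    /* "i-- > 0" compares first, then decrements, so the body sees
       i = n-1, ..., 0. On the final test i is 0, the loop exits, and
       the wrap of i to SIZE_MAX (well defined for unsigned types) is
       never used as an index. */
    while (i-- > 0)
        *dest++ = source[i];
}
```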

    Essentially the same thing can happen with a pointer. If we
    consider pointers (addresses) as arithmetic values, they are unsigned.
    Imagine that your 'beg_of_string' pointer points to address 0. How do
    you expect your loop to end in this case? How do you expect to represent
    a pointer that is less than 0?
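The pointer counterpart of the same trick, applied to the reverse string copy from the question: start `source` at the terminating '\0' (one past the last character) and pre-decrement only after checking `source > beg_of_string`, so no pointer before the array is ever formed. A sketch, with a hypothetical function name:

```c
#include <string.h>

/* Write the characters of beg_of_string into dest in reverse order and
   terminate dest. dest must have room for strlen(beg_of_string) + 1 chars. */
static void reverse_copy_ptr(char *dest, const char *beg_of_string)
{
    const char *source = beg_of_string + strlen(beg_of_string);

    while (source > beg_of_string)  /* compare first ...          */
        *dest++ = *--source;        /* ... then step back one char */
    *dest = '\0';
}
```

The lowest value `source` ever takes is `beg_of_string` itself, so the loop is well defined even if the string sits at the very start of a segment.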

    --
    Best regards,
    Andrey Tarasevich
     
    Andrey Tarasevich, Jun 23, 2006
    #9
  10. On Fri, 23 Jun 2006 05:40:43 -0700, spibou wrote:

    >
    > Nils O. Selåsdal wrote:
    >
    >> wrote:
    >> > Why is a pointer allowed to point to one position past the end of an
    >> > array but not to one position before the beginning of an array? Is
    >> > there any reason why the former is more useful than the latter?

    >> Consider code such as
    >>
    >> char *str = somestring;
    >> while(*str++) {
    >> ...
    >> }
    >> str might end up one place past somestring - nice to allow that.

    >
    > Yes it is. My question was why the "opposite" is not allowed too. One
    > could just as easily have something like while (source >= beg_of_string)
    > *dest++ = *source--; to copy a string in reverse, for example.
    >
    > Does my example invoke undefined behaviour, by the way?
    >


    Yes. And to see a real world example of such a program failing because of
    this (not that undefined means it must fail), try this compiler

    http://fabrice.bellard.free.fr/tcc/

    w/ your code, using the -b switch (bounds checker). I never heeded the
    standard on this point until I started using TCC to improve my code
    portability.

    I must say there are some circumstances where it is indeed desirable to
    iterate backwards. For example, I had to tweak many places in a memory
    pool library because I would iterate backwards from a given pointer
    reading bookkeeping information until I hit a terminator bit. Took me hours to
    figure out why my program would crash using TCC:

    /*
     * Beginning from *p, work backwards reconstructing the value of an
     * rbitsint_t integer. Stop when the highest order bit of *p is set, which
     * should have been previously preserved as a marker. Return the
     * reconstructed value, setting *end to the last position used of p.
     */
    static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
        rbitsint_t i = 0; /* currently typedef to size_t */
        int n = 0;

        do {
            i |= (*p & ~(1 << (CHAR_BIT - 1))) << (n++ * (CHAR_BIT - 1));
        } while (!(*(p--) & (1 << (CHAR_BIT - 1))));

        *end = p + 1;

        return i;
    } /* rbits_get() */
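One way to keep such a backward scan inside the buffer, sketched under the assumption that the caller can also pass the first byte of the buffer (`start`, a hypothetical extra parameter that the original function does not have): stop at the marker byte or at `start`, whichever comes first, and never decrement past `start`. This is an illustration, not the library's actual fix; it also does the shift in `rbitsint_t` rather than `int`.

```c
#include <limits.h>
#include <stddef.h>

typedef size_t rbitsint_t;   /* as in the post: "currently typedef to size_t" */

#define RBITS_LOMASK (UCHAR_MAX >> 1)      /* all bits except the MSB */
#define RBITS_MSB    (RBITS_LOMASK + 1)    /* marker bit              */

/* Hypothetical bounded variant of rbits_get(). Requires start <= p.
   Sets *end to the marker byte, or to NULL if start was reached without
   finding a marker. The caller must ensure the value fits in rbitsint_t. */
static rbitsint_t rbits_get_bounded(const unsigned char *start,
                                    const unsigned char *p,
                                    const unsigned char **end)
{
    rbitsint_t i = 0;
    unsigned n = 0;

    for (;;) {
        i |= (rbitsint_t)(*p & RBITS_LOMASK) << (n++ * (CHAR_BIT - 1));
        if (*p & RBITS_MSB) {       /* marker found */
            *end = p;
            return i;
        }
        if (p == start) {           /* no room left to step back */
            *end = NULL;
            return i;
        }
        --p;                        /* p > start, so p - 1 is still in the buffer */
    }
}
```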
     
    William Ahern, Jun 23, 2006
    #10
  11. On 23 Jun 2006 05:40:43 -0700, in comp.lang.c ,
    wrote:

    >
    >Yes it is. My question was why the "opposite" is not allowed too.


    Imagine your object was at the *very start* of memory. One-before
    would be nowhere.

    Going the other way is no problem - the abstract machine has infinite
    memory so there is no 'very end'...

    --
    Mark McIntyre

    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are,
    by definition, not smart enough to debug it."
    --Brian Kernighan
     
    Mark McIntyre, Jun 23, 2006
    #11
  12. Re: Why a pointer to "one past" is allowed but a pointer to "one before" is not?

    Mark McIntyre <> writes:
    > On 23 Jun 2006 05:40:43 -0700, in comp.lang.c ,
    > wrote:
    >>Yes it is. My question was why the "opposite" is not allowed too.

    >
    > Imagine your object was at the *very start* of memory. One-before
    > would be nowhere.
    >
    > Going the other way is no problem - the abstract machine has infinite
    > memory so there is no 'very end'...


    The abstract machine has no "very start" of memory either, and its
    memory is limited to 2**(CHAR_BIT * sizeof(void*)) bytes.

    The purpose of the rule is to avoid problems on real-world machines
    with finite address spaces. To allow a pointer just past the end of
    an array, an implementation, at most, has to allocate one extra byte.
    To allow a pointer just before the beginning of an array, it might
    have to allocate enough space for an entire array element, which can
    be almost arbitrarily large. (It may not have to allocate any memory
    if it can form an address for memory that doesn't exist, but the
    standard allows for the possibility that it does have to do so.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Jun 24, 2006
    #12
  13. Skarmander Guest

    Re: Why a pointer to "one past" is allowed but a pointer to "one before" is not?

    William Ahern wrote:
    > On Fri, 23 Jun 2006 05:40:43 -0700, spibou wrote:
    >
    >> Nils O. Selåsdal wrote:
    >>
    >>> wrote:
    >>>> Why is a pointer allowed to point to one position past the end of an
    >>>> array but not to one position before the beginning of an array? Is
    >>>> there any reason why the former is more useful than the latter?
    >>> Consider code such as
    >>>
    >>> char *str = somestring;
    >>> while(*str++) {
    >>> ...
    >>> }
    >>> str might end up one place past somestring - nice to allow that.

    >> Yes it is. My question was why the "opposite" is not allowed too. One
    >> could just as easily have something like while (source >= beg_of_string)
    >> *dest++ = *source--; to copy a string in reverse, for example.
    >>
    >> Does my example invoke undefined behaviour, by the way?
    >>

    >
    > Yes. And to see a real world example of such a program failing because of
    > this (not that undefined means it must fail), try this compiler
    >
    > http://fabrice.bellard.free.fr/tcc/
    >
    > w/ your code, using the -b switch (bounds checker). I never heeded the
    > standard on this point until I started using TCC to improve my code
    > portability.
    >
    > I must say there are some circumstances where it is indeed desirable to
    > iterate backwards. For example, I had to tweak many places in a memory
    > pool library because I would iterate backwards from a given pointer
    > reading bookkeeping information until I hit a terminator bit. Took me hours to
    > figure out why my program would crash using TCC:
    >
    > /*
    >  * Beginning from *p, work backwards reconstructing the value of an
    >  * rbitsint_t integer. Stop when the highest order bit of *p is set, which
    >  * should have been previously preserved as a marker. Return the
    >  * reconstructed value, setting *end to the last position used of p.
    >  */
    > static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
    >     rbitsint_t i = 0; /* currently typedef to size_t */
    >     int n = 0;
    >
    >     do {
    >         i |= (*p & ~(1 << (CHAR_BIT - 1))) << (n++ * (CHAR_BIT - 1));


    Two problems with this line.

    First, ~(1 << (CHAR_BIT - 1)) is a questionable expression. 1 << (CHAR_BIT -
    1) is a signed integer, to which ~ is applied. The value of the result is
    implementation-defined. This will happen to work here because *p is at most
    as wide as the bitmask, so the irrelevant bits will be masked out and the
    actual value of the expression doesn't matter. Still, this is a habit best
    unlearned. Use U<FOO>_MAX >> 1 for the all-ones-except-the-MSB bitmask,
    where U<FOO>_MAX can be defined if necessary as ((unsigned foo) -1).
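The suggested masks written out, as a sketch: one pair for `unsigned char` using `UCHAR_MAX`, and one for a typedef'd type with the `((unsigned foo) -1)` form, assuming `rbitsint_t` is `size_t` as in the quoted code. Every operand involved is non-negative, so no implementation-defined result arises.

```c
#include <limits.h>
#include <stddef.h>

typedef size_t rbitsint_t;   /* assumption, as in the quoted code */

/* unsigned char: all ones except the MSB, and the MSB alone. */
#define UC_LOMASK  (UCHAR_MAX >> 1)
#define UC_MSB     (UC_LOMASK + 1)

/* For a typedef'd unsigned type with no *_MAX macro, (type)-1 is its
   maximum value, because conversion to an unsigned type is modular. */
#define RBITS_MAX     ((rbitsint_t)-1)
#define RBITS_LOMASK  (RBITS_MAX >> 1)
```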

    Second, (*p & ~(1 << (CHAR_BIT - 1))) is an int. You then shift this int by
    (n++ * (CHAR_BIT - 1)), which has potential for undefined behavior since it
    can exceed the width of an int. What you want is to shift an rbitsint_t, not
    an int. (Unlike the previous issue, this can be a real problem -- try
    typedef'ing rbitsint_t as a 64-bit type on a 32-bit architecture and reading
    more than 32 value bits into it to see what I mean.)

    This should fix both issues:
    i |= (rbitsint_t) (*p & (UCHAR_MAX >> 1)) << (n++ * (CHAR_BIT - 1));
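A small illustration of why the cast has to come before the shift, assuming a hypothetical 64-bit accumulator type `wide_t` (mapped to `uint64_t`). With 8-bit chars, the sixth 7-bit group (n = 5) lands at bit 35, which exists only in the wide type.

```c
#include <limits.h>
#include <stdint.h>

typedef uint64_t wide_t;     /* hypothetical 64-bit accumulator type */

/* Place the low CHAR_BIT-1 bits of b at group position n.
   The cast makes the shift happen in wide_t; without it the shift would be
   done in int, and a count >= the width of int is undefined.
   The caller must keep n * (CHAR_BIT - 1) below 64. */
static wide_t place_group(unsigned char b, unsigned n)
{
    return (wide_t)(b & (UCHAR_MAX >> 1)) << (n * (CHAR_BIT - 1));
}
```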

    >     } while (!(*(p--) & (1 << (CHAR_BIT - 1))));
    >
    >     *end = p + 1;
    >
    >     return i;
    > } /* rbits_get() */
    >

    Not inspired by the overflow observation, but here's my take:

    #define LOMASK (UCHAR_MAX >> 1)
    #define MSB (LOMASK + 1)

    static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
        rbitsint_t i;

        unsigned char *q = p;
        while (!(*q & MSB)) --q;
        *end = q;

        i = *q & LOMASK;
        while (++q <= p) {
            i = i << (CHAR_BIT - 1) | *q;
        }
        return i;
    }

    Yes, it's more lines; yes, I go backwards just to go forwards again. If the
    operations involved are expensive, this is not a good idea, but here the
    operations aren't expensive. I personally find this easier to read, and for
    my compiler and my machine (YMMV) it's actually faster.

    Of course, I assume you've rewritten your code since discovering the
    one-before-the-beginning bug, possibly even along these lines, so my point
    may be moot.

    S.
     
    Skarmander, Jun 24, 2006
    #13
