UDB and pointer increments and decrements

Discussion in 'C Programming' started by Richard, Sep 23, 2008.

  1. Richard

    Richard Guest

    I'm still battling with this causing UDB:

    while(e-- > s);

    If s points to the start of a string and e becomes less than s, then e is
    not really pointing to a defined char. Fine.

    But UDB?

    Yes, e has a UDV (undefined value), but would this really cause a
    program to misbehave? On any platform? Remember, this value of e is never
    used again in this case.

    I ask because theoretically s can be pointing to the middle of a bigger
    string. We then call a function with s as a parameter.

    The function called can have no idea that s is a pointer into the middle
    of a string. Therefore it can have no idea how to "do undefined things" when
    e is decremented past the start of s. e and s are strictly char *s. It
    would be so "not C" if the compiler generated code to check the contents
    pointed to in order to determine the range of the object into the middle of
    which s points. I mean, then we may as well have array limits and exceptions
    built into the language.

    I'm not being difficult here. Explain how this works. My problem (and I
    admit it's a problem) is that I feel too much of C is being elevated to
    an almost ADA type status and (in this group) C is losing that "down and
    dirty and efficient" feeling which it is famous for.
     
    Richard, Sep 23, 2008
    #1

  2. Richard<> writes:

    > I'm still battling with this causing UDB:
    >
    > while(e-- > s);
    >
    > if s points to the start of a string and e becomes less than s then e is
    > not really pointing to defined char. Fine.
    >
    > But UDB?
    >
    > Yes, e has an UDV (undefined value) but would this really cause a
    > program to misbehave? On any platform? Remember this value of e is never
    > used again in this case.


    1/ C has 4 levels of definition (defined behavior, implementation-defined
    behavior (which includes locale-specific behavior), unspecified behavior,
    and undefined behavior), no more. Spending effort to try to classify
    undefined behavior more finely is probably not worthwhile. And it seems to
    me that's what you want: different rules for the undefined value created by
    decrementing a pointer than for all the others. There is a precedent (the
    similar one-past-the-end-of-an-array pointer comes immediately to mind), but
    yours would be more limited than that one (or you'd have got opposition
    from DOS folk, as allowing such pointers in comparisons would have
    constrained them a lot, probably limiting the size of an object to 32767
    instead of the 65535 they have got).

    2/ Optimizers tend to use undefined behavior in creative ways. For example,
    things like value propagation can optimize out the then part of the if in
    this code:

    if (i == INT_MAX) {
        /* do something that does not modify i */
    }
    ++i;

    (reasoning: incrementing i overflows if i is INT_MAX, so the behavior would
    be undefined if that were the case; the optimizer can therefore assume i
    isn't INT_MAX, and the result of the comparison is false). Optimizations
    like this are one of the reasons why undefined behavior can be non-causal:
    its effects can show up before the offending operation (you just have to be
    sure that the code causing the undefined behavior would have been executed).
    And note that optimizers do such propagation beyond the current function;
    they can potentially even do it for the whole program, and that's the way
    they are heading.

    Yours,

    --
    Jean-Marc
     
    Jean-Marc Bourguet, Sep 23, 2008
    #2

  3. Richard

    Guest

    Richard wrote:
    > I'm still battling with this causing UDB:
    >
    > while(e-- > s);
    >
    > if s points to the start of a string and e becomes less than s then e is
    > not really pointing to defined char. Fine.
    >
    > But UDB?
    >
    > Yes, e has an UDV (undefined value) but would this really cause a
    > program to misbehave? On any platform? Remember this value of e is never
    > used again in this case.
    >
    > I ask because theoretically s can be pointing to the middle of a bigger
    > string. We then call a function with s as a parameter.
    > The function called can have no idea that s is the pointer to a middle
    > string. therefore it can have no idea how to "do undefined things" when
    > e is decremented past the start of s. e and s are strictly char *s. It
    > would be so "not C" if the compiler generated code to check the contents
    > pointed to to determine the range of the object to the middle of which s
    > points. I mean then we may as well have array limits and exceptions
    > built into the language.


    It's too late - the language that makes the behavior undefined was
    inserted into the standard precisely for the purpose of allowing (but
    not mandating) array limit checks. In order to make array limit
    checks mandatory, the behavior could not be undefined - it would have
    to be either standard-defined or implementation-defined. Because the
    behavior is undefined, an implementation is currently free to deal
    with array limits by ignoring them.

    Permitting array limit checks was done, in part, because there were
    (and are) real implementations that perform them. On some machines,
    such checks are built into the hardware; avoiding them would require
    software emulation. In other cases, the checks are performed in
    software.

    I know of at least two ways of implementing pointers that make array
    limit checks feasible: fat pointers, and segmented memory. Of course,
    in both cases this means that pointer values cannot be correctly
    understood by treating them as simple numbers, which might be a
    conceptual hurdle for you. If you need an explanation of how those
    techniques work, I can provide it.
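
    As a rough illustration of the fat-pointer idea, here is a minimal sketch
    in plain C. The names fat_ptr, fat_make, fat_dec and fat_inc are made up
    for this example; a real checking implementation would do this inside the
    compiler or the hardware, and might trap rather than return false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the current position plus the bounds of the
   array it was derived from. This is an illustration only, not how any
   particular compiler represents pointers. */
struct fat_ptr {
    char *base;   /* first element of the array */
    char *limit;  /* one past the last element */
    char *cur;    /* current position, base <= cur <= limit */
};

/* Build a fat pointer to the start of an array of n chars. */
struct fat_ptr fat_make(char *array, size_t n)
{
    struct fat_ptr p = { array, array + n, array };
    return p;
}

/* Checked decrement: refuses to step before the first element, so the
   pointer base - 1 is never formed. */
bool fat_dec(struct fat_ptr *p)
{
    assert(p->base <= p->cur && p->cur <= p->limit);
    if (p->cur == p->base)
        return false;
    p->cur--;
    return true;
}

/* Checked increment: allows reaching one past the end, but no further. */
bool fat_inc(struct fat_ptr *p)
{
    assert(p->base <= p->cur && p->cur <= p->limit);
    if (p->cur == p->limit)
        return false;
    p->cur++;
    return true;
}
```

    With this representation the check needs no knowledge of the contents
    pointed to; the bounds travel with the pointer value itself.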

    More subtly, an implementation can use the existence of code that
    might, under certain circumstances, have undefined behavior, to
    justify optimizations of related code that will fail only under
    exactly those same circumstances. As a result, the actual catastrophic
    failure might occur while executing code other than the code whose
    execution makes the behavior undefined. But I've already explained
    that possibility in more detail in another message.
     
    , Sep 23, 2008
    #3
  4. Richard

    Richard Guest

    writes:

    > Richard wrote:
    >> I'm still battling with this causing UDB:
    >>
    >> while(e-- > s);
    >>
    >> if s points to the start of a string and e becomes less than s then e is
    >> not really pointing to defined char. Fine.
    >>
    >> But UDB?
    >>
    >> Yes, e has an UDV (undefined value) but would this really cause a
    >> program to misbehave? On any platform? Remember this value of e is never
    >> used again in this case.
    >>
    >> I ask because theoretically s can be pointing to the middle of a bigger
    >> string. We then call a function with s as a parameter.
    >> The function called can have no idea that s is the pointer to a middle
    >> string. therefore it can have no idea how to "do undefined things" when
    >> e is decremented past the start of s. e and s are strictly char *s. It
    >> would be so "not C" if the compiler generated code to check the contents
    >> pointed to to determine the range of the object to the middle of which s
    >> points. I mean then we may as well have array limits and exceptions
    >> built into the language.

    >
    > It's too late - the language that makes the behavior undefined was
    > inserted into the standard precisely for the purpose of allowing (but
    > not mandating) array limit checks. In order to make array limit


    That makes sense. Thanks.

    > checks mandatory, the behavior could not be undefined - it would have
    > to be either standard-defined or implementation-defined. Because the
    > behavior is undefined, an implementation is currently free to deal
    > with array limits by ignoring them.


    And them remaining undefined? Unspecified would have been better surely?

    >
    > Permitting array limit checks was done, in part, because there were
    > (and are) real implementations that perform them. On some machines,
    > such checks are built into the hardware; avoiding them would require
    > software emulation. In other cases, the checks are performed in
    > software.
    >
    > I know of at least two ways of implementing pointers that make array
    > limit checks feasible: fat pointers, and segmented memory. Of course,
    > in both cases this means that pointer values cannot be correctly
    > understood by treating them as simple numbers, which might be a
    > conceptual hurdle for you. If you need an explanation of how those
    > techniques work, I can provide it.


    I know about segmented memory. I have written oodles of VGA libraries
    in x86 using the various addressing modes. The point that has
    been totally taken out of context is that these segmented
    representations are STILL represented as numbers in my debugger. Nothing
    more, nothing less. Yes, I call them numbers. Addresses. Numbers.

    >
    > More subtly, an implementation can use the existence of code that
    > might, under certain circumstances, have undefined behavior, to
    > justify optimizations of related code that will fail only under
    > exactly those same circumstances. As a result, the actual catastrophic
    > failure might occur while executing code other than the code whose
    > execution makes the behavior undefined. But I've already explained
    > that possibility in more detail in another message.
    >


    I appreciate the time you have taken to explain. I would still love
    someone to explain the case I asked about above though. The one where s
    is pointing into the middle of an array. Or did you and I didn't
    understand?
     
    Richard, Sep 23, 2008
    #4
  5. Richard<> writes:
    > I'm still battling with this causing UDB:
    >
    > while(e-- > s);
    >
    > if s points to the start of a string and e becomes less than s then e is
    > not really pointing to defined char. Fine.
    >
    > But UDB?


    A small note: You're the only person I've ever seen refer to undefined
    behavior as "UDB". Most posters here (at least those who choose to
    abbreviate it) refer to it as "UB". Why do you feel the need to
    invent your own abbreviation when there's already a perfectly good one
    in widespread use? (One could argue that "UB" could also mean
    unspecified behavior, but I've never seen it used that way, and it's
    generally clear enough from the context.)

    Yes, the behavior is undefined, simply because the standard doesn't
    define the behavior. That's all "undefined behavior" means.

    > Yes, e has an UDV (undefined value) but would this really cause a
    > program to misbehave? On any platform? Remember this value of e is never
    > used again in this case.


    Yes. I don't have a real-world example, but if the containing object
    happens to be allocated at the beginning of a memory segment, it could
    easily blow up. And, as has been mentioned elsethread, a compiler is
    allowed to *assume* that undefined behavior does not occur, and
    perform code transformations based on that assumption (after all, if
    the behavior is already undefined, it can't make things worse); that
    may be a more realistic risk for most modern systems.

    > I ask because theoretically s can be pointing to the middle of a bigger
    > string. We then call a function with s as a parameter.


    Undefined behavior occurs if a pointer is decremented past the
    beginning of an array object, not if it's decremented past the initial
    value of a function parameter. Given this:

    char s[100];

    char *func(char *ptr) { return ptr - 1; }

    calling func(s+10) has well-defined behavior, but calling func(s) has
    undefined behavior. (I haven't compiled the above, so there may be
    some dumb mistakes.)
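
    A guarded variant of func, as a sketch only: the helper name prev_in and
    the extra base parameter are assumptions made for illustration, and the
    comparison is only meaningful when both pointers refer to the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: step back one element only when the result stays
   inside the array that starts at base; otherwise return NULL.
   The subtraction ptr - 1 is evaluated only when ptr > base, so a pointer
   before the start of the array is never computed. */
char *prev_in(char *base, char *ptr)
{
    assert(base != NULL && ptr != NULL);
    return (ptr > base) ? ptr - 1 : NULL;
}
```

    Given char s[100], prev_in(s, s + 10) yields s + 9, while prev_in(s, s)
    yields NULL instead of evaluating s - 1.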

    > The function called can have no idea that s is the pointer to a middle
    > string.


    Right.

    > therefore it can have no idea how to "do undefined things" when
    > e is decremented past the start of s. e and s are strictly char *s.


    It doesn't deliberately "do undefined things"; that's not the point.
    The point is that the standard doesn't define what it does. In my
    example above, I'm thinking of a hypothetical system on which
    constructing the pointer value s-1 causes a hardware trap (because s
    is allocated at the beginning of a segment, and the hardware
    "decrement address" instruction traps in this case). The code
    generated for the body of the function has no awareness of this.

    For example, assume an implementation on which signed integer overflow
    causes a trap.

    int func(int n) { return n + 1; }

    func(42) has well-defined behavior, and returns 43. func(INT_MAX) has
    undefined behavior, and (on this particular implementation) causes a
    trap (or does something arbitrarily strange if an optimizing compiler
    rearranges code based on the assumption that no UB occurs). The
    function has no awareness of this; it just returns the result of n +
    1.
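
    If the caller cannot rule out INT_MAX, the test has to come before the
    addition rather than after it. A minimal sketch, with checked_inc as a
    hypothetical name:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked variant of func: stores n + 1 in *out and returns
   true, or returns false without evaluating n + 1 when n is INT_MAX.
   *out is left untouched on failure. */
bool checked_inc(int n, int *out)
{
    assert(out != NULL);
    if (n == INT_MAX)
        return false;
    *out = n + 1;
    return true;
}
```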

    > It
    > would be so "not C" if the compiler generated code to check the contents
    > pointed to to determine the range of the object to the middle of which s
    > points. I mean then we may as well have array limits and exceptions
    > built into the language.


    The compiler is *allowed* to perform such checks, but it's not
    required to. That's why the behavior is undefined, rather than being
    defined to do whatever a failing check would do.

    > I'm not being difficult here. Explain how this works. My problem (and I
    > admit it's a problem) is that I feel too much of C is being elevated to
    > an almost ADA type status and (in this group) C is losing that "down and
    > dirty and efficient" feeling which it is famous for.


    (It's "Ada", not "ADA".)

    C loses none of its "down and dirty and efficient" feeling because of
    this. In fact, the generated code can gain in efficiency because the
    compiler is allowed to trust the user to avoid undefined behavior and
    to perform aggressive optimization based on that assumption.

    A C implementation that does exactly what you seem to expect it to do
    (treat addresses as simple integers, allow arbitrary addresses to be
    computed, etc.) would be conforming. An implementation that performs
    aggressive bounds checking can also be conforming.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Sep 23, 2008
    #5
  6. Richard<> writes:
    > writes:

    [...]
    >> checks mandatory, the behavior could not be undefined - it would have
    >> to be either standard-defined or implementation-defined. Because the
    >> behavior is undefined, an implementation is currently free to deal
    >> with array limits by ignoring them.

    >
    > And them remaining undefined? Unspecified would have been better surely?


    Better how?

    Unspecified behavior is "use of an unspecified value, or other
    behavior where this International Standard provides two or more
    possibilities and imposes no further requirements on which is chosen
    in any instance".

    For the behavior of, for example, attempting to access an array
    outside its bounds to be unspecified rather than undefined, the
    standard would have to provide a number of possible behaviors, and
    anything other than one of those behaviors would be non-conforming.

    Suppose I have an array object declared within a function, and I write
    to element -1 of that array. I could clobber nearly anything,
    including the function's stored return address or some other vital
    piece of information. How would you restrict the possible
    consequences of that to "two or more possibilities"?

    [snip]

    > I appreciate the time you have taken to explain. I would still love
    > someone to explain the case I asked about above though. The one where s
    > is pointing into the middle of an array. Or did you and I didn't
    > understand?


    See my other recent response in this thread.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Sep 23, 2008
    #6
  7. Richard

    Flash Gordon Guest

    Richard wrote, On 23/09/08 16:44:
    > I'm still battling with this causing UDB:
    >
    > while(e-- > s);
    >
    > if s points to the start of a string and e becomes less than s then e is
    > not really pointing to defined char. Fine.
    >
    > But UDB?


    <snip>

    > I'm not being difficult here. Explain how this works. My problem (and I
    > admit it's a problem) is that I feel too much of C is being elevated to
    > an almost ADA type status and (in this group) C is losing that "down and
    > dirty and efficient" feeling which it is famous for.


    Another poster and I suggested an object starting at the beginning
    of a page or segment and *hardware* that traps on trying to decrement to
    before the start of the page/segment. No software checks need be involved!
    --
    Flash Gordon
    If spamming me sent it to
    If emailing me use my reply-to address
    See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/
     
    Flash Gordon, Sep 23, 2008
    #7
  8. Richard

    Guest

    Richard wrote:
    > writes:
    >
    > > Richard wrote:
    > >> I'm still battling with this causing UDB:
    > >>
    > >> while(e-- > s);
    > >>
    > >> if s points to the start of a string and e becomes less than s then e is
    > >> not really pointing to defined char. Fine.
    > >>
    > >> But UDB?
    > >>
    > >> Yes, e has an UDV (undefined value) but would this really cause a
    > >> program to misbehave? On any platform? Remember this value of e is never
    > >> used again in this case.
    > >>
    > >> I ask because theoretically s can be pointing to the middle of a bigger
    > >> string. We then call a function with s as a parameter.


    The behavior is only undefined if e points at the beginning of an
    array, or points at a different array than s points at. For the code
    you posted on the "Highly efficient string reversal code" thread, it's
    quite likely that s does point at the beginning of an array; if it
    does, and that array contains a zero-length string, then e will also
    end up pointing at the start of the array.

    However, in that same code, if s starts out pointing in the middle of
    a bigger string, then e won't end up pointing at the beginning of the
    array. And no one said anything to suggest to you that the behavior
    would be undefined when that was the case.
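
    For comparison, a reversal loop can be written so that it never forms a
    pointer before s, wherever s points. This is only a sketch with a
    hypothetical function name, not the code from the other thread:

```c
#include <assert.h>
#include <string.h>

/* Reverse the string at s in place. Strings shorter than two characters
   are left alone, so e is only ever computed for a non-empty string and
   is only decremented while it is still above b (and therefore above s). */
void reverse_in_place(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len < 2)
        return;

    char *b = s;            /* walks forward from the first character */
    char *e = s + len - 1;  /* walks backward from the last character */
    while (b < e) {
        char tmp = *b;
        *b++ = *e;
        *e-- = tmp;
    }
}
```

    The same function works whether s is the start of an array or points
    into the middle of a larger one, because it never depends on what lies
    before s.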

    In this thread, you chose to start a new discussion without cross
    referencing the old one, and without giving any context for
    "while(e-- > s)". Using only the information you've provided on this
    thread, it's not possible to derive the fact that a) e points at the same
    array as s and b) that e does not point at the beginning of an array.

    > >> The function called can have no idea that s is the pointer to a middle
    > >> string. therefore it can have no idea how to "do undefined things" when
    > >> e is decremented past the start of s. e and s are strictly char *s. It
    > >> would be so "not C" if the compiler generated code to check the contents
    > >> pointed to to determine the range of the object to the middle of which s
    > >> points. I mean then we may as well have array limits and exceptions
    > >> built into the language.

    > >
    > > It's too late - the language that makes the behavior undefined was
    > > inserted into the standard precisely for the purpose of allowing (but
    > > not mandating) array limit checks. In order to make array limit

    >
    > That makes sense. Thanks.
    >
    > > checks mandatory, the behavior could not be undefined - it would have
    > > to be either standard-defined or implementation-defined. Because the
    > > behavior is undefined, an implementation is currently free to deal
    > > with array limits by ignoring them.

    >
    > And them remaining undefined? Unspecified would have been better surely?


    I'm not quite sure whether you're talking about prohibiting array
    limit checks, or mandating them. Making the behavior of such pointer
    arithmetic undefined, as is currently the case, neither mandates nor
    prohibits array limit checks.

    If the behavior is unspecified, the standard must, at least
    implicitly, provide a range of permitted behaviors. Depending upon
    what that range is, it could mandate array limit checks, for instance,
    by requiring that an unspecified signal be raise()d.

    Alternatively, it could also prohibit array limit checks, by saying that
    moving a pointer beyond its valid range produces a pointer to an
    unspecified but valid location, and that writing through such a
    pointer value has no effect. Note that this would prohibit array limit
    checks only in the sense that they would be invisible to the user; I
    see no way of implementing such a requirement without the
    implementation performing array limit checks to determine whether or
    not a write is required to have no effect. Notice that if the location
    is unspecified, and writes were actually permitted to have an effect,
    then the consequences would be pretty much indistinguishable from
    undefined behavior. Being allowed to write to an arbitrary memory
    location can have arbitrarily bad consequences on many
    implementations.

    > I appreciate the time you have taken to explain. I would still love
    > someone to explain the case I asked about above though. The one where s
    > is pointing into the middle of an array. Or did you and I didn't
    > understand?


    I'll use a segmented architecture as an example, since you're familiar
    with the concepts. Consider the possibility that e points at the
    beginning of an array. That array might have been allocated at the
    beginning of a memory segment. As a built-in feature of the hardware,
    or as the result of code generated by the compiler, any attempt to
    decrement a pointer that already points at the beginning of a segment
    could cause the program to abort. Consider the possibility that e and
    s point into different memory segments. As a built-in feature of the
    hardware, or as the result of code generated by the compiler, any
    attempt to compare pointers into different memory segments for order
    (<, >, <=, >=) might cause the program to abort. An implementation that
    produces such behavior can be perfectly conforming so long as it makes
    sure to never allocate different parts of the same object in different
    memory segments.
     
    , Sep 23, 2008
    #8
  9. Richard

    Old Wolf Guest

    On Sep 24, 3:44 am, Richard<> wrote:
    > I'm still battling with this causing UDB:
    >
    > while(e-- > s);
    >
    > if s points to the start of a string and e becomes less than s then e is
    > not really pointing to defined char. Fine.


    What if the string is at the very start of
    the address space? Where does 'e' point after
    decrementing it?

    There are CPUs or MMUs that will trap upon
    loading of an obviously bogus pointer such
    as this one that doesn't even describe a
    memory location that exists.
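
    To make the question concrete, here is a toy model (an assumption for
    illustration, not a description of any real CPU) in which an address is
    a 16-bit offset that wraps around on decrement. In this model the
    address "one before" offset 0 is 0xFFFF, which compares greater than the
    start, so a loop such as while (--e >= s) would not stop there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: an address is just a 16-bit offset that wraps. */
typedef uint16_t offset16;

/* Decrement an offset with wraparound (unsigned arithmetic, so this is
   well defined in C, unlike decrementing a real pointer past its array). */
offset16 offset_dec(offset16 off)
{
    offset16 r = (offset16)(off - 1u);
    /* Sanity check of the model: 0 wraps to 0xFFFF, anything else drops by 1. */
    assert(off == 0 ? r == 0xFFFFu : r == off - 1);
    return r;
}

/* Does the offset just below s compare less than s in this model?
   It does not when s is 0, which is the case asked about above. */
bool below_start_compares_less(offset16 s)
{
    offset16 e = offset_dec(s);
    return e < s;
}
```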
     
    Old Wolf, Sep 23, 2008
    #9
  10. Richard

    James Kuyper Guest

    Rosario wrote:
    ....
    > From what I can see in this group, talking about the various "UB"
    > (undefinite behaviours) takes more time than programming


    That's because undefined (not "undefinite") behavior is the single most
    serious kind of problem C code can have. It's also because most code
    that people bring to this group because they're having problems with it,
    has undefined behavior. That's a selection effect; syntax errors and
    constraint violations are easily caught by the compiler; the programs
    that actually compile and fail tend to have subtler problems, usually
    involving undefined behavior.
     
    James Kuyper, Sep 26, 2008
    #10
  11. Richard

    Tim Rentsch Guest

    writes:

    > Richard wrote:
    > > I'm still battling with this causing UDB:
    > >
    > > while(e-- > s);
    > >
    > > if s points to the start of a string and e becomes less than s then e is
    > > not really pointing to defined char. Fine.
    > >
    > > But UDB?
    > >
    > > Yes, e has an UDV (undefined value) but would this really cause a
    > > program to misbehave? On any platform? Remember this value of e is never
    > > used again in this case.
    > >
    > > I ask because theoretically s can be pointing to the middle of a bigger
    > > string. We then call a function with s as a parameter.
    > > The function called can have no idea that s is the pointer to a middle
    > > string. therefore it can have no idea how to "do undefined things" when
    > > e is decremented past the start of s. e and s are strictly char *s. It
    > > would be so "not C" if the compiler generated code to check the contents
    > > pointed to to determine the range of the object to the middle of which s
    > > points. I mean then we may as well have array limits and exceptions
    > > built into the language.

    >
    > It's too late - the language that makes the behavior undefined was
    > inserted into the standard precisely for the purpose of allowing (but
    > not mandating) array limit checks. [...]


    Nonsense. Allowing a pointer to be decremented to before the
    start of an array is still compatible with doing array limit
    checks, just as allowing a pointer to be incremented past the end
    of an array is compatible with doing array limit checks.
    The rationale document makes clear that decrementing a pointer
    to before the start of an array was rejected because it would
    impose overly burdensome requirements on implementations.
    Array limit checks are equally possible whether e-- is allowed
    or not.
     
    Tim Rentsch, Oct 9, 2008
    #11