possibly undefined behavior

Discussion in 'C Programming' started by Mark, Jun 18, 2009.

  1. Mark

    Mark Guest

    Hello

    does this code invoke UB?

    int func(int i)
    {
    return (i * i);
    }

    int main(void)
    {
    int i = 2;

    i = func(i);
    return 0;
    }

    GCC with "-pedantic -W -Wall -Wextra" says nothing though.

    --
    Mark
    Mark, Jun 18, 2009
    #1

  2. Mark

    James Kuyper Guest

    Mark wrote:
    > Hello
    >
    > does this code invoke UB?
    >
    > int func(int i)
    > {
    > return (i * i);
    > }
    >
    > int main(void)
    > {
    > int i = 2;
    >
    > i = func(i);
    > return 0;
    > }


    Not as far as I can see, though I may have missed something (that would
    be hard to do on code this simple, but it's unfortunately always a
    possibility). Do you have any particular reason for thinking otherwise?
    James Kuyper, Jun 18, 2009
    #2

  3. Mark

    Mark Guest

    "James Kuyper" <> wrote in message
    news:U2h_l.1652$...
    >> int func(int i)
    >> {
    >> return (i * i);
    >> }
    >>
    >> int main(void)
    >> {
    >> int i = 2;
    >>
    >> i = func(i);
    >> return 0;
    >> }

    >
    > Not as far as I can see, though I may have missed something (that would be
    > hard to do on code this simple, but it's unfortunately always a
    > possibility). Do you have any particular reason for thinking otherwise?


    I was thinking that it may invoke UB, because 'i' is passed as an
    argument and then a value is written to it. I didn't find any evidence
    in the C standard, but thought it might be implementation-defined how
    the parameter and return value are handled in such a case.

    --
    Mark
    Mark, Jun 18, 2009
    #3
  4. Mark

    Thad Smith Guest

    Mark wrote:
    > "James Kuyper" <> wrote in message
    > news:U2h_l.1652$...
    >>> int func(int i)
    >>> {
    >>> return (i * i);
    >>> }
    >>>
    >>> int main(void)
    >>> {
    >>> int i = 2;
    >>>
    >>> i = func(i);
    >>> return 0;
    >>> }

    >>
    >> Not as far as I can see, though I may have missed something (that
    >> would be hard to do on code this simple, but it's unfortunately always
    >> a possibility). Do you have any particular reason for thinking otherwise?

    >
    > I was thinking that it may invoke UB, because a parameter 'i' is being
    > passed and then a value is written in it.


    This is well defined.

    > I didn't find any C standard's
    > evidence, but thought it'd be implementation defined how to handle
    > parameter and return value in such case.


    The implementation is not required by Standard C to define the
    mechanisms used for passing parameters and return values, although many
    do this to facilitate interfacing assembly code.

    Even though the specific mechanism of parameter passing varies with
    different implementations, they all have the required effect.

    --
    Thad
    Thad Smith, Jun 18, 2009
    #4
  5. On Thu, 18 Jun 2009 11:08:39 +0900, "Mark"
    <> wrote:

    > "James Kuyper" <> wrote in message
    > news:U2h_l.1652$...
    >>> int func(int i)
    >>> {
    >>> return (i * i);
    >>> }
    >>>
    >>> int main(void)
    >>> {
    >>> int i = 2;
    >>>
    >>> i = func(i);
    >>> return 0;
    >>> }

    >>
    >> Not as far as I can see, though I may have missed something (that would
    >> be hard to do on code this simple, but it's unfortunately always a
    >> possibility). Do you have any particular reason for thinking otherwise?

    >
    > I was thinking that it may invoke UB, because a parameter 'i' is being
    > passed and then a value is written in it. I didn't find any C standard's
    > evidence, but thought it'd be implementation defined how to handle
    > parameter and return value in such case.


    A function call is a sequence point. It's okay to read from 'i' and
    write to 'i' so long as there is an intervening sequence point.

    - Anand
    Anand Hariharan, Jun 18, 2009
    #5
  6. Anand Hariharan <> writes:
    > On Thu, 18 Jun 2009 11:08:39 +0900, "Mark"
    > <> wrote:
    >> "James Kuyper" <> wrote in message
    >> news:U2h_l.1652$...
    >>>> int func(int i)
    >>>> {
    >>>> return (i * i);
    >>>> }
    >>>>
    >>>> int main(void)
    >>>> {
    >>>> int i = 2;
    >>>>
    >>>> i = func(i);
    >>>> return 0;
    >>>> }
    >>>
    >>> Not as far as I can see, though I may have missed something (that would
    >>> be hard to do on code this simple, but it's unfortunately always a
    >>> possibility). Do you have any particular reason for thinking otherwise?

    >>
    >> I was thinking that it may invoke UB, because a parameter 'i' is being
    >> passed and then a value is written in it. I didn't find any C standard's
    >> evidence, but thought it'd be implementation defined how to handle
    >> parameter and return value in such case.

    >
    > A function call is a sequence point. It's okay to read from 'i' and
    > write to 'i' so long as there is an intervening sequence point.


    It's even ok to do so without an intervening sequence point.
    i = i + 1 is perfectly valid, because the value of i is read (on the
    RHS) to determine the value to be stored in i (on the LHS).

    You only get UB if the same object is modified twice between sequence
    points, or if it's read and written with the result not being used to
    determine the value to be stored (as in i = i++). The latter rule may
    seem confusing (it confused me for a long time), but the point is that
    if the value read is used to determine the value to be stored, that
    imposes an ordering. If it's read and written with no imposed
    ordering, the behavior is undefined.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jun 18, 2009
    #6
  7. Mark

    luserXtrog Guest

    On Jun 18, 12:50 am, Keith Thompson <> wrote:
    > Anand Hariharan <> writes:
    > > On Thu, 18 Jun 2009 11:08:39 +0900, "Mark"
    > > <> wrote:
    > >> "James Kuyper" <> wrote in message
    > >>news:U2h_l.1652$...
    > >>>> int func(int i)
    > >>>> {
    > >>>>    return (i * i);
    > >>>> }

    >
    > >>>> int main(void)
    > >>>> {
    > >>>>    int i = 2;

    >
    > >>>>    i = func(i);
    > >>>>    return 0;
    > >>>> }

    >
    > >>> Not as far as I can see, though I may have missed something (that would
    > >>> be hard to do on code this simple, but it's unfortunately always a
    > >>> possibility). Do you have any particular reason for thinking otherwise?

    >
    > >> I was thinking that it may invoke UB, because a parameter 'i' is being
    > >> passed and then a value is written in it. I didn't find any C standard's
    > >> evidence, but thought it'd be implementation defined how to handle
    > >> parameter and return value in such case.

    >
    > > A function call is a sequence point.  It's okay to read from 'i' and
    > > write to 'i' so long as there is an intervening sequence point.

    >
    > It's even ok to do so without an intervening sequence point.
    > i = i + 1 is perfectly valid, because the value of i is read (on the
    > RHS) to determine the value to be stored in i (on the LHS).
    >
    > You only get UB if the same object is modified twice between sequence
    > points, or if it's read and written with the result not being used to
    > determine the value to be stored (as in i = i++).  The latter rule may
    > seem confusing (it confused me for a long time), but the point is that
    > if the value read is used to determine the value to be stored, that
    > imposes an ordering.  If it's read and written with no imposed
    > ordering, the behavior is undefined.


    eg.
    i=i; /* ok */
    i=i+i; /* ok */
    i=i+i+i; /* ok */
    i=i=i; /* NOT OK */
    i=i,i=i; /* ok, comma is a sequence point */
    i=sin(i); /* ok, one read one write */
    (i=i)&&(i=i); /* probably ok, but don't tell 'em I told you */
    (i=i)*(i=i); /* NOT OK, 2 writes */
    (i=i)?(i==i):(i+=i-(i=i)); /* NOT OK 2 writes (that cause the prob)*/

    This is a variant of the popular
    i=i++;
    which is equivalent to
    i=(i=i+1);
    see 'em now? 2 '='s == BAD.

    What's not forbidden is allowed!

    --
    lxt
    luserXtrog, Jun 18, 2009
    #7
  8. pete <> writes:

    > Keith Thompson wrote:

    <snip>
    >> You only get UB if the same object is modified twice between sequence
    >> points, or if it's read and written with the result not being used to
    >> determine the value to be stored (as in i = i++). The latter rule may
    >> seem confusing (it confused me for a long time), but the point is that
    >> if the value read is used to determine the value to be stored, that
    >> imposes an ordering. If it's read and written with no imposed
    >> ordering, the behavior is undefined.
    >>

    >
    > I think of
    >
    > p = p -> next = q;
    >
    > as my favorite example of undefined behavior resulting
    > from the value of p being read with the result not being used
    > to determine the value to be stored.


    and mine is:

    a[i] = i++; /* UB */

    i is used to determine where the value is stored rather than what
    value to store.

    --
    Ben.
    Ben Bacarisse, Jun 18, 2009
    #8
  9. Mark

    Richard Bos Guest

    Jack Klein <> wrote:

    > On Thu, 18 Jun 2009 11:08:39 +0900, "Mark"
    > > "James Kuyper" <> wrote in message


    > > >> int func(int i)
    > > >> {
    > > >> return (i * i);
    > > >> }
    > > >>
    > > >> int main(void)
    > > >> {
    > > >> int i = 2;
    > > >>
    > > >> i = func(i);
    > > >> return 0;
    > > >> }
    > > >
    > > > Not as far as I can see, though I may have missed something (that would be
    > > > hard to do on code this simple, but it's unfortunately always a
    > > > possibility). Do you have any particular reason for thinking otherwise?

    > >
    > > I was thinking that it may invoke UB, because a parameter 'i' is being
    > > passed and then a value is written in it. I didn't find any C standard's
    > > evidence, but thought it'd be implementation defined how to handle parameter
    > > and return value in such case.

    >
    > It would be UB if there were not a sequence point involved in the
    > function call.


    Nope. Take this code:

    #define MACRO(i) ((i)*(i))

    int main(void)
    {
    int i = 2;

    i = MACRO(i);
    return 0;
    }

    This has no more undefined behaviour than the original code.

    It's only UB if the object assigned to (i.e., i) is also read for other
    purposes than determining the assigned value. In both these cases, i is
    read only to determine i*i, which is then assigned back to i. This is
    legal.

    You're right in so far that _if_ func() had been more involved, and did
    things to i other than computing a value from it (for which it'd have
    needed to be passed &i rather than i's value, in a sanely written
    program), _then_ the function call sequence point would have prevented
    that case of UB.

    Richard
    Richard Bos, Jun 18, 2009
    #9
  10. On Jun 18, 12:50 am, Keith Thompson <> wrote:
    > Anand Hariharan <> writes:
    > > On Thu, 18 Jun 2009 11:08:39 +0900, "Mark"
    > > <> wrote:
    > >> "James Kuyper" <> wrote in message
    > >>news:U2h_l.1652$...
    > >>>> int func(int i)
    > >>>> {
    > >>>>    return (i * i);
    > >>>> }

    >
    > >>>> int main(void)
    > >>>> {
    > >>>>    int i = 2;

    >
    > >>>>    i = func(i);
    > >>>>    return 0;
    > >>>> }

    >
    > >>> Not as far as I can see, though I may have missed something (that would
    > >>> be hard to do on code this simple, but it's unfortunately always a
    > >>> possibility). Do you have any particular reason for thinking otherwise?

    >
    > >> I was thinking that it may invoke UB, because a parameter 'i' is being
    > >> passed and then a value is written in it. I didn't find any C standard's
    > >> evidence, but thought it'd be implementation defined how to handle
    > >> parameter and return value in such case.

    >
    > > A function call is a sequence point.  It's okay to read from 'i' and
    > > write to 'i' so long as there is an intervening sequence point.

    >
    > It's even ok to do so without an intervening sequence point.
    > i = i + 1 is perfectly valid, because the value of i is read (on the
    > RHS) to determine the value to be stored in i (on the LHS).
    >
    > You only get UB if the same object is modified twice between sequence
    > points, or if it's read and written with the result not being used to
    > determine the value to be stored (as in i = i++).  The latter rule may
    > seem confusing (it confused me for a long time), but the point is that
    > if the value read is used to determine the value to be stored, that
    > imposes an ordering.  If it's read and written with no imposed
    > ordering, the behavior is undefined.
    >


    Not playing devil's advocate here, but does not

    i = ++i;

    impose an ordering, and hence should have well-defined behaviour?

    - Anand
    Anand Hariharan, Jun 18, 2009
    #10
  11. Anand Hariharan <> writes:
    > On Jun 18, 12:50 am, Keith Thompson <> wrote:
    >> Anand Hariharan <> writes:

    [...]
    >> > A function call is a sequence point.  It's okay to read from 'i' and
    >> > write to 'i' so long as there is an intervening sequence point.

    >>
    >> It's even ok to do so without an intervening sequence point.
    >> i = i + 1 is perfectly valid, because the value of i is read (on the
    >> RHS) to determine the value to be stored in i (on the LHS).
    >>
    >> You only get UB if the same object is modified twice between sequence
    >> points, or if it's read and written with the result not being used to
    >> determine the value to be stored (as in i = i++).  The latter rule may
    >> seem confusing (it confused me for a long time), but the point is that
    >> if the value read is used to determine the value to be stored, that
    >> imposes an ordering.  If it's read and written with no imposed
    >> ordering, the behavior is undefined.
    >>

    >
    > Not playing devil's advocate here, but does not
    >
    > i = ++i;
    >
    > impose an ordering, and hence should have well-defined behaviour?


    Yes and no.

    The assignment evaluates the expression "++i" and stores the result in
    i, so the result of the expression must be determined before the value
    is stored. But the side effect of "++i" is to modify i; that side
    effect doesn't need to occur before the assignment modifies i, since
    the side effect isn't necessary for determining what the result of
    "++i" is going to be.

    Using a well-defined example:

    j = ++i;

    There are several things that must happen here:

    (a) Evaluate "j" as an lvalue (i.e., determine its address).
    (b) Evaluate "i" to determine its current value.
    (c) Determine the result of "++i".
    (d) Store the result of "++i" in j (side effect of "=").
    (e) Increment i (side effect of "++").
    (f) Determine (and discard) the result of the assignment expression.

    Some of these things must occur before other things can happen. For
    example, (c) must precede (d). But (e) can occur either before or
    after (d); you don't need to modify i to determine what the result of
    "++i" is going to be.

    In this case, since i and j are separate objects, there's no problem.
    In the case of "i = ++i", the two modifications to i are unordered,
    and so the behavior is undefined.

    The pre-C201X draft:
    <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1362.pdf>
    has a very interesting re-statement of the rules in 6.5 (it helped me
    understand what the C90/C99 wording really means):

    An _expression_ is a sequence of operators and operands that
    specifies computation of a value, or that designates an object or
    a function, or that generates side effects, or that performs a
    combination thereof. The value computations of the operands of an
    operator are sequenced before the value computation of the result
    of the operator.

    If a side effect on a scalar object is unsequenced relative to
    either a different side effect on the same scalar object or a
    value computation using the value of the same scalar object, the
    behavior is undefined. If there are multiple allowable orderings
    of the subexpressions of an expression, the behavior is undefined
    if such an unsequenced side effect occurs in any of the orderings.

    The grouping of operators and operands is indicated by the
    syntax. Except as specified later, side effects and value
    computations of subexpressions are unsequenced.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jun 18, 2009
    #11
  12. On Jun 18, 11:43 pm, Ben Bacarisse <> wrote:
    > pete <> writes:
    > > I think of
    > >
    > >     p = p -> next = q;
    > >
    > > as my favorite example of undefined behavior...

    >
    > and mine is:
    >
    >   a[i] = i++; /* UB */


    a[a[i]] = 0; /* when a[i] == i */

    --
    Peter
    Peter Nilsson, Jun 19, 2009
    #12
  13. Peter Nilsson <> writes:

    > On Jun 18, 11:43 pm, Ben Bacarisse <> wrote:
    >> pete <> writes:
    >> > I think of
    >> >
    >> >     p = p -> next = q;
    >> >
    >> > as my favorite example of undefined behavior...

    >>
    >> and mine is:
    >>
    >>   a[i] = i++; /* UB */

    >
    > a[a[i]] = 0; /* when a[i] == i */


    I have a new favourite example :)

    --
    Ben.
    Ben Bacarisse, Jun 19, 2009
    #13
  14. On Thu, 18 Jun 2009 13:24:57 -0700, Keith Thompson <> wrote:
    > Anand Hariharan <> writes:
    >> Not playing devil's advocate here, but does not
    >>
    >> i = ++i;
    >>
    >> impose an ordering, and hence should have well-defined behaviour?

    >
    > Yes and no.
    >
    > The assignment evaluates the expression "++i" and stores the result in
    > i, so the result of the expression must be determined before the value
    > is stored. But the side effect of "++i" is to modify i; that side
    > effect doesn't need to occur before the assignment modifies i, since the
    > side effect isn't necessary for determining what the result of "++i" is
    > going to be.
    >
    > Using a well-defined example:
    >
    > j = ++i;
    >
    > There are several things that must happen here:
    >
    > (a) Evaluate "j" as an lvalue (i.e., determine its address). (b)
    > Evaluate "i" to determine its current value. (c) Determine the result of
    > "++i".
    > (d) Store the result of "++i" in j (side effect of "="). (e) Increment i
    > (side effect of "++"). (f) Determine (and discard) the result of the
    > assignment expression.
    >
    > Some of these things must occur before other things can happen. For
    > example, (c) must precede (d). But (e) can occur either before or after
    > (d); you don't need to modify i to determine what the result of "++i" is
    > going to be.
    >
    > In this case, since i and j are separate objects, there's no problem. In
    > the case of "i = ++i", the two modifications to i are unordered, and so
    > the behavior is undefined.
    >

    [snip explanation from Standard]

    I understand what you are saying (thank you for the patient analysis), so
    this is more a rant than anything else:

    I understand that several kinds of UB cannot be avoided (e.g., "char
    *c=NULL; *c;") or are even useful (e.g., "int *p = (int *)0x1234;"), but when
    statements such as "i = i++;" -

    * have no useful value,
    * mean the code containing them is broken (and its author most likely
    doesn't know about it),
    * can be detected by the compiler

    - why cannot the standard require a diagnostic?

    How different is this rant from "Why is gets() still in the standard?"?

    - Anand
    Anand Hariharan, Jun 19, 2009
    #14
  15. Anand Hariharan <> writes:
    > On Thu, 18 Jun 2009 13:24:57 -0700, Keith Thompson <> wrote:

    [...]
    >> In this case, since i and j are separate objects, there's no problem. In
    >> the case of "i = ++i", the two modifications to i are unordered, and so
    >> the behavior is undefined.
    >>

    > [snip explanation from Standard]
    >
    > I understand what you are saying (thank you for the patient analysis), so
    > this is more a rant than anything else:
    >
    > I understand several definitions of UB cannot be avoided (e.g., "char
    > *c=NULL; *c;") or even useful (e.g., "int *p = (int *)0x1234;"), but when
    > statements such as "i = i++;" -
    >
    > * have no useful value,
    > * code that has such statements is broken (and most likely don't know
    > about it),
    > * can be detected by the compiler
    >
    > - why cannot the standard require a diagnostic?
    >
    > How different is this rant from "Why is gets() still in the standard?"?


    Because these situations can't always be detected by the compiler.

    Consider (untested code):

    int arr[10] = {0};
    int i = rand() * 10;
    int j = rand() * 10;
    arr[i] = arr[j] ++;

    This is well defined if i != j, but it invokes undefined behavior if
    i == j.

    Or, similarly:

    *p1 = (*p2) ++;

    where p1 and p2 may or may not be equal.

    Some cases, such as "i = i++", can be detected fairly easily (and
    compiler writers are certainly free to spend as much effort as they
    like detecting such cases). Other cases can be detected with some
    data-flow analysis:

    int *p0 = /* ... */;
    int *p1 = p0 + 1;
    *p0 = (*(p1 - 1)) ++;

    In my first example, a stunningly clever compiler might issue a
    message: "Warning: 10% chance of nasal demons".

    It would be interesting to try to define rigorously a set of rules for
    which cases a compiler is required to detect, but I'm skeptical of the
    success of such an effort. And any such formulation could render some
    compilers non-conforming. Under the current rules, a compiler author
    could make a legitimate choice to concentrate on fast and simple code
    generation and de-emphasize detection of potential problems.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jun 19, 2009
    #15
  16. On Thu, 18 Jun 2009 18:26:11 -0700, Keith Thompson <> wrote:
    > Anand Hariharan <> writes:

    (...)
    >> I understand several definitions of UB cannot be avoided (e.g., "char
    >> *c=NULL; *c;") or even useful (e.g., "int *p = (int *)0x1234;"), but
    >> when statements such as "i = i++;" -
    >>
    >> * have no useful value,
    >> * code that has such statements is broken (and most likely don't know
    >> about it),
    >> * can be detected by the compiler
    >>
    >> - why cannot the standard require a diagnostic?

    (...)
    >
    > Because these situations can't always be detected by the compiler.
    >
    > Consider (untested code):
    >
    > int arr[10] = {0};
    > int i = rand() * 10;
    > int j = rand() * 10;
    > arr[i] = arr[j] ++;
    >
    > This is well defined if i != j, but it invokes undefined behavior if i
    > == j.
    >

    [snip other examples]

    I assume you meant rand() to be a non standard function that returns a
    random floating point value in [0.0, 1.0).

    Thank you for the explanation,
    - Anand
    Anand Hariharan, Jun 19, 2009
    #16
  17. Anand Hariharan <> writes:
    > On Thu, 18 Jun 2009 18:26:11 -0700, Keith Thompson <> wrote:

    [...]
    >> Consider (untested code):
    >>
    >> int arr[10] = {0};
    >> int i = rand() * 10;
    >> int j = rand() * 10;
    >> arr[i] = arr[j] ++;
    >>
    >> This is well defined if i != j, but it invokes undefined behavior if i
    >> == j.
    >>

    > [snip other examples]
    >
    > I assume you meant rand() to be a non standard function that returns a
    > random floating point value in [0.0, 1.0).


    No, I meant rand() to be a *standard* function that returns a random
    floating point value in [0.0, 1.0).

    Unfortunately, my intentions were inconsistent with reality.

    (I *told* you it was untested code!)

    > Thank you for the explanation,


    You're welcome.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jun 19, 2009
    #17
  18. Mark

    Kaz Kylheku Guest

    On 2009-06-18, pete <> wrote:
    > I think of
    >
    > p = p -> next = q;
    >
    > as my favorite example of undefined behavior resulting
    > from the value of p being read with the result not being used
    > to determine the value to be stored.


    Where is p being read with the result not being used to determine
    the value to be stored?

    The standard makes it clear that the value of an assignment expression
    is that of the left operand, after the assignment.
    (See C99, 6.5.16, paragraph 3).

    So above, what is assigned to p? The value of the assignment
    expression (p->next = q).

    What is the value of that expression? It is the value of p->next after the
    assignment.

    You seem to be assuming that the value of the assignment expression is that of
    the right operand (converted to the type of the left), with no ordering
    dependency with respect to the assignment. I.e. that the value of q can flow
    out of p->next = q expression right into the p assignment, independently
    of the completion of the p->next assignment.

    But as you can see, that is not true. Since the standard says that
    the value is that of the left operand after assignment, we must take this to be
    literally true. The standard describes abstract semantics, which must be obeyed
    in accordance with the ``as if'' principle, regardless of any optimizations.

    In this case, the abstract semantics says that, literally, the value is stored
    into p->next, and then the assignment expression's value is derived by
    accessing the value of p->next. That's what it means to get the value
    of p->next ``after the assignment''. And of course p cannot be modified until
    that value is available.

    If such an ordering were not required, then the standard wouldn't use the
    phrase ``after the assignment'', but only something like ``the value of the
    assignment expression is the same as that which is stored in the left operand,
    of the same type''.
    Kaz Kylheku, Jun 19, 2009
    #18
  19. Kaz Kylheku <> writes:
    > On 2009-06-18, pete <> wrote:
    >> I think of
    >>
    >> p = p -> next = q;
    >>
    >> as my favorite example of undefined behavior resulting
    >> from the value of p being read with the result not being used
    >> to determine the value to be stored.

    >
    > Where is p being read with the result not being used to determine
    > the value to be stored?


    In "p -> next = q", where p is read to determine where to store the
    value of q.

    > The standard makes it clear that the value of an assignment expression
    > is that of the left operand, after the assignment.
    > (See C99, 6.5.16, paragraph 3).
    >
    > So above, what is assigned to p? The value of the assignment
    > expression (p->next = q).
    >
    > What is the value of that expression? It is the value of p->next after the
    > assignment.
    >
    > You seem to be assuming that the value of the assignment expression
    > is that of the right operand (converted to the type of the left),
    > with no ordering dependency with respect to the assignment.
    > I.e. that the value of q can flow out of p->next = q expression
    > right into the p assignment, independently of the completion of the
    > p->next assignment.
    >
    > But as you can see, that is not true. Since the standard says that
    > the value is that of the left operand after assignment, we must take
    > this to be literally true. The standard describes abstract
    > semantics, which must be obeyed in accordance with the ``as if''
    > principle, regardless of any optimizations.
    >
    > In this case, the abstract semantics says that, literally, the value
    > is stored into p->next, and then the assignment expression's value
    > is derived by accessing the value of p->next. That's what it means
    > to get the value of p->next ``after the assignment''. And of course
    > p cannot be modified until that value is available.
    >
    > If such an ordering were not required, then the standard wouldn't
    > use the phrase ``after the assignment'', but only something like
    > ``the value of the assignment expression is the same as that which
    > is stored in the left operand, of the same type''.


    That's not a bad argument, but it implies that the side effect of
    storing the value in the target must occur before the result of the
    assignment is used. But then why does the standard say, "The side
    effect of updating the stored value of the left operand shall occur
    between the previous and the next sequence point."? Yes, that's
    strictly consistent with what you say, but if that's the intent it's
    an odd way to express it.

    Consider:

    int x, y;
    x = y = 3;

    Both x and y have the value 3 stored in them, but this can occur in
    either order. The result of "y = 3", and therefore the value stored
    in x, is "the value of [y] after the assignment", which seems to imply
    an ordering constraint, but I don't think it's intended to. The value
    stored in x is 3; the value of y after the assignment is 3. 3 is 3,
    which satisfies the requirement.

    N1362, the pre-C201x draft, re-words the section, but I don't think it
    resolves the issue:

    An assignment operator stores a value in the object designated by
    the left operand. An assignment expression has the value of the
    left operand after the assignment, but is not an lvalue. The type
    of an assignment expression is the type of the left operand unless
    the left operand has qualified type, in which case it is the
    unqualified version of the type of the left operand. The side
    effect of updating the stored value of the left operand is
    sequenced after the value computations of the left and right
    operands. The evaluations of the operands are unsequenced.

    But note that it doesn't say that any use of the result is sequenced
    after the side effect of updating the stored value.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jun 19, 2009
    #19
  20. Mark

    Kaz Kylheku Guest

    On 2009-06-19, Anand Hariharan <> wrote:
    > I understand several definitions of UB cannot be avoided (e.g., "char
    > *c=NULL; *c;") or even useful (e.g., "int *p = (int *)0x1234;"), but when
    > statements such as "i = i++;" -
    >
    > * have no useful value,
    > * code that has such statements is broken (and most likely don't know
    > about it),
    > * can be detected by the compiler


    Not all instances of this can be detected statically. Think about:

    (*p) = (*q)++;

    This can be well-defined if p and q point to different objects,
    but is undefined if they point to the same object.

    The values of p and q can vary at run-time, and can be made to depend
    on input to the program.

    So to issue the diagnostic at translation time, you have to have the input to
    the program available, and be prepared to solve the halting problem. :)

    But I agree with you. Unspecified orders of evaluation, in a primarily
    imperative language, are complete nonsense, and atrociously irresponsible
    engineering that partially keeps us in the dark ages.

    Rather than inventing misfeatures and then trying to diagnose them,
    we should specify the order of everything, so that there is no ambiguity.

    There is a religious belief, completely unsubstantiated, that unspecified
    evaluation orders are required for the generation of good code.

    This is pure bunk because:

    - actual evaluation can be considerably rearranged in the face of
    required orders.

    informal proof 1: there are already sequence points in C programs. If
    optimizers could not move effects across abstract sequence points,
    most optimizations would not be possible. Optimizations like
    function inlining and loop unrolling ``obliterate semicolons''.

    informal proof 2: programmers are encouraged to rewrite ambiguous-looking
    code into multiple statements, with sequence points.
    But wait, aren't we supposed to stuff everything into one expression
    with lots of side effects to get the benefit of speed?
    Maybe, if you're working with a PDP-11 C compiler from a 1979 Unix box.

    - the few cases where this is true are now addressed with restrict
    pointers.

    suppose that side effects are nicely ordered left to right
    (they aren't, of course, but consider an imaginary C dialect)
    and you have this expression:

    (*p) = (*q)++;

    because this is well-defined, the compiler for our imaginary
    dialect has to make it work properly. The problem is that p and q may or may
    not point to the same object, and it has to work regardless. The compiler
    for this strictly evaluated dialect could generate better code if it could
    assume that p and q do not point to the same object, just like it does for
    code like:

    i = j++;

    where i and j are known not to be aliases since they are separately
    defined variables.

    In the C99 language, we can make p and q restrict-qualified
    pointers. By doing so, we promise to the language implementation
    that these objects are not aliased.

    So we have a way to tell the compiler: ``Please assume these objects
    accessed through pointers are different objects, so that updating
    one has no effect on the value of the other, or else I will eat my
    unsigned shorts.''

    But in the C language being what it is, with its unspecified
    evaluation orders, we don't actually need to indicate
    that p and q are different objects. The (*p) = (*q)++ expression
    encodes the assumption that they are!

    In other words, ambiguity in expressions is also a way of promising to the
    compiler that there is no aliasing. With it you can express ``since I am
    updating several things here without a sequence point, or accessing some
    things while modifying others, I am hereby promising that they are all
    distinct things.''

    Using a declared attribute of the pointer (restrict qualifier) is
    a better way of achieving this. It can't hurt you if you don't use it,
    and you don't have to jam multiple operations into one evaluation between two
    sequence points to get the optimization benefit.

    If p and q are declared as pointing to distinct objects, then this assumption
    still helps optimization even if there are sequence points:

    *p = *q;
    (*q)++;

    In spite of the sequence point, the compiler can assume that the
    assignment to *p has no effect on *q. We are free to restructure
    the code; we don't lose the no-aliasing assumption just because
    we added a semicolon.
    Kaz Kylheku, Jun 19, 2009
    #20