sequence points in subexpressions

amit.codename13 · Dec 13, 2009

Does the statement given below invoke undefined behavior?
i = (i, i++, i) + 1;

I am almost convinced that it does not because of the following
reasons

1> the RHS must be evaluated before a value can be stored in i
2> evaluation of RHS does not invoke UB due to the sequence points
introduced by comma operator

Correct me if I am wrong!

Thanks

amit.codename13 · Dec 13, 2009

What do you think the answer is?

i=i+2 ?

I am sure that it's well defined.

I am quoting someone's comment/view on that statement.

Yes, the sequence point after i++ completes all side effects before it, but
there is nothing that stops the assignment side effect overlapping with the
side effect of i++. The underlying problem is that the side effect of an
assignment is not specified to happen after or before the evaluation of both
operands of the assignment, and so sequence points cannot do anything with
regard to protecting this: Sequence points induce a partial order: Just
because there is a sequence point after and before i++ doesn't mean all side
effects are sequenced with regard to it.

I was trying to comprehend it.
I am not *100%* sure that what he said is wrong.
So, any comments on it (the comment)?

mohangupta13 · Dec 13, 2009

What side effects would you expect from the right hand side, keeping in
mind the sequence points?

With due regard to everyone's comments, can anyone please explicitly
say whether the above statement is actually UB or not? I am not able
to get this from the above comments.

Thanks
Mohan

Seebs · Dec 13, 2009

With due regard to everyone's comments, can anyone please explicitly
say whether the above statement is actually UB or not? I am not able
to get this from the above comments.

I am honestly not sure.

I think it is, but... I don't know. Here's the case that convinced me:

a[i] = (1, i++, 1);

It seems clear to me that there's a real-world risk that the evaluation of
i on the left is at risk of occurring during the evaluation of the RHS.
So I don't think there's a sequence point between the sides.

More importantly:
1. You don't need to do that.
2. Even if it's not undefined behavior, it's the kind of case where compilers
might have unexpected bugs.
3. So don't do that, then. Just write out what you mean.

In most of the "interesting" edge cases, the right answer is not to go there.

-s

Nick · Dec 13, 2009

mohangupta13 said:
With due regard to everyone's comments, can anyone please explicitly
say whether the above statement is actually UB or not? I am not able
to get this from the above comments.

I often wonder why anyone cares about this sort of statement. It
doesn't look the sort of thing that might be produced by computer
generated code, and it doesn't look the thing you'd actually want to
write into a program. If it's that borderline and you'd never need it,
why does it actually matter, other than as a sort of C language sudoku.

Kenny McCormack · Dec 13, 2009

Nick said:
I often wonder why anyone cares about this sort of statement. It
doesn't look the sort of thing that might be produced by computer
generated code, and it doesn't look the thing you'd actually want to
write into a program. If it's that borderline and you'd never need it,
why does it actually matter, other than as a sort of C language sudoku.

Indeed. Well put. ITA.

But the point is that that is precisely the stock-in-trade of this
newsgroup. As I have shown many times, it is not possible to post
anything to this newsgroup, that will meet the generally accepted
standards of "appropriateness", that is not either a) a topicality flame
(gotta love them!) or b) language lawyering, of the type that is of no
interest to the vast majority of working C programmers.

Your post hits the nail on the head as to the sort of thing that people
like Kiki, et al, just salivate over, but the rest of us find dull and
uninteresting at best, and downright offensive at worst.

Flash Gordon · Dec 13, 2009

For context, the statement was
i = (i, i++, i) + 1;

I am honestly not sure.

I think it is, but... I don't know.

I think it isn't UB.

Here's the case that convinced me:

a[i] = (1, i++, 1);

It seems clear to me that there's a real-world risk that the evaluation of
i on the left is at risk of occurring during the evaluation of the RHS.
So I don't think there's a sequence point between the sides.

That example is different because i is used on the left to determine the
object to be stored, whereas in the original it is merely the object in
which the result will be stored.

More importantly:
1. You don't need to do that.
2. Even if it's not undefined behavior, it's the kind of case where compilers
might have unexpected bugs.
3. So don't do that, then. Just write out what you mean.

In most of the "interesting" edge cases, the right answer is not to go there.

That I definitely agree with. I would reject any code like this I came
across in a code review.

Chad · Dec 13, 2009

For context, the statement was
i = (i, i++, i) + 1;

I am honestly not sure.

I think it is, but... I don't know.

I think it isn't UB.

Here's the case that convinced me:

a[i] = (1, i++, 1);

It seems clear to me that there's a real-world risk that the evaluation of
i on the left is at risk of occurring during the evaluation of the RHS.
So I don't think there's a sequence point between the sides.

That example is different because i is used on the left to determine the
object to be stored, where as in the original it is merely the object in
which the result will be stored.

Why wouldn't the result get stored in a[i] = (1, i++, 1);?

Beej Jorgensen · Dec 13, 2009

Does the statement given below invoke undefined behavior?
i = (i, i++, i) + 1;

I think these are the sequence points:

i = (i , i++ , i) + 1 ;
       1     2        3

(1 and 2 at the end of the first operand to a comma operator, and 3 at
the end of a full expression.) So I believe what we have is this:

    i              [i == 0, let's say]
    [sequence point 1]
    i++            [i == 1 by the next sequence point]
    [sequence point 2]
    i = i + 1      [i == 2 by the next sequence point]
    [sequence point 3]

I don't see anything here that violates C99 6.5p2, so I'm going to bet
that it's OK. gcc offers no applicable complaints* at full-warnings,
and outputs 2 in the above example.

(And I'd like to buy insurance on that bet--I'm pushing the bounds of my
knowledge, here.)

-Beej

* gcc does warn that the leftmost i in (i, i++, i) has no effect.

James Dow Allen · Dec 13, 2009

I think it isn't UB.

I'm not sure. In the simple case:
i = (1, i++, i) + 1;
It may be hard to imagine how the C system
could go wrong, but one might be able to imagine
some cache-speeding trick that assumes it
won't encounter this code (or can do what it wants
with it, if marked UB in The Standard).

For those who think commas are permitted, what about:
*(p += i, ++i, p += i) = j++, ++j, j;
No problem right?
The commas at left separate left-side sequence points,
and commas at right separate (order) a different
set of sequence points. We end up, in effect with
i += 1, *(p += i+i-1) = j += 2;

What do we know about *which* sequence points are
reached first, right-side vs left-side, or can they
be interleaved?

Now what about:
*(p += i, ++i, p += i) = i++, ++i, i;
Definitely UB-lookingish.

That I definitely agree with. I would reject any code like this I came
across in a code review.

While certainly this code would be rejected,
it *is* good to look at border cases.

I often wonder why anyone cares about this sort of statement. It
doesn't look the sort of thing that might be produced by computer
generated code, and it doesn't look the thing you'd actually want to
write into a program. If it's that borderline and you'd never need it,
why does it actually matter, other than as a sort of C language sudoku.

I argued much like this 5 weeks back in a somewhat similar thread and
was rebuked. And the old thread was the same old silly expression
designed to provoke UB, while OP's query *does* represent a defining
corner-case.

James Dow Allen

Beej Jorgensen · Dec 13, 2009

Seebs said:

a[i] = (1, i++, 1);

It seems clear to me that there's a real-world risk that the evaluation of
i on the left is at risk of occurring during the evaluation of the RHS.
So I don't think there's a sequence point between the sides.

That example is different because i is used on the left to determine the
object to be stored, where as in the original it is merely the object in
which the result will be stored.

Why wouldn't the result get stored in a[i] = (1, i++, 1);?

Just because the right side of the assignment is in parentheses and uses
the comma operator doesn't mean the subexpression on the left side can't be
evaluated at the same time. (C99 6.5p3)

(1, i++, 1) definitely has to be evaluated before the assignment, but
the a+i subexpression of *(a+i) (same as a[i]) can be evaluated before
or after (1, i++, 1).

I don't think this example runs afoul of 6.5p2, which forbids things
like a[i++]=i, because of the sequence point after i++. ...?

-Beej

Flash Gordon · Dec 13, 2009

James said:
I'm not sure. In the simple case:
i = (1, i++, i) + 1;
It may be hard to imagine how the C system
could go wrong, but one might be able to imagine
some cache-speeding trick that assumes it
won't encounter this code (or can do what it wants
with it, if marked UB in The Standard).

Certainly if it is UB such assumptions can be made, but is it?
There is a sequence point between the evaluation of i++ and the
evaluation of i to its right, and it is the result of that i which is
yielded by the comma operator and then has 1 added to it before being
assigned to i. So, the sequence point of the comma operator is before
the assignment side effect of the equals operator.

For those who think commas are permitted, what about:
*(p += i, ++i, p += i) = j++, ++j, j;
No problem right?
The commas at left separate left-side sequence points,
and commas at right separate (order) a different
set of sequence points. We end up, in effect with
i += 1, *(p += i+i-1) = j += 2;

What do we know about *which* sequence points are
reached first, right-side vs left-side, or can they
be interleaved?

They can.

Now what about:
*(p += i, ++i, p += i) = i++, ++i, i;
Definitely UB-lookingish.

Yes, because here i is not simply the object to which the right
hand side of the equals operator is being assigned.

While certainly this code would be rejected,
it *is* good to look at border cases.

Well, it doesn't particularly bother me.

I argued much like this 5 weeks back in a somewhat similar thread and
was rebuked. And the old thread was the same old silly expression
designed to provoke UB, while OP's query *does* represent a defining
corner-case.

Ah well, you can't expect to always get the same answer!

Kaz Kylheku · Dec 14, 2009

It does.

That's wrong.

The side effect from the increment operator.

The value being stored in the assignment is that of the rightmost
operand of the comma expression. The computation of the rightmost
operand follows a sequence point. So the modification of i in the
assignment is well-ordered with regard to the prior side effects
in the comma expression.

The sequence points from the comma operator are not relevant
because there is no sequence point between the evaluation
of the right and left operands of the assignment operator.

There is a data flow dependency, however. The value cannot be
stored before it is computed.

In the expression

i = (i, i++, i) + 1;
             ^

the value to be stored is derived from the value of the expression
denoted by the caret, by adding 1.

The denoted expression is the right operand of a comma, so its
evaluation is delayed until prior side effects have settled.

The sequencing in the comma operator, plus the dataflow dependency
in the assignment, add up to well-defined behavior.

Keith Thompson · Dec 14, 2009

pete said:
It does.

That's wrong.

The side effect from the increment operator.

The sequence points from the comma operator are not relevant
because there is no sequence point between the evaluation
of the right and left operands of the assignment operator.

If the right operand of the assignment operator is evaluated first,
then there shouldn't be any problem with
i = i++;
but there is a problem.
Assignment is not a sequence point.

The problem with

i = i++;

is that the side effect of the "++" can happen any time before
the end of the statement.

In

i = (i, i++, i) + 1;

let's consider the subexpression

(i, i++, i)

Here, the side effect of the "++" must happen before the next sequence
point, which occurs, not at the end of the statement, but at the comma
operator. The entire subexpression yields the value of i after it's
been incremented, and the value of i is updated before the
subexpression completes.

Looking at the full expression:

i = (i, i++, i) + 1;

I argue that adding 1 to the result of the subexpression and
assigning that to i doesn't introduce any undefined behavior.
The assignment cannot modify i until the RHS has been evaluated.
The RHS cannot yield a result until after the side effect of the
increment has occurred.

So you could at least have a reasonable and consistent set of rules
that makes "i = i++" undefined but makes "i = (i, i++, i) + 1" well
defined.

Whether C99 actually has such rules is another question. N1256 says:

Between the previous and next sequence point an object shall
have its stored value modified at most once by the evaluation
of an expression. Furthermore, the prior value shall be read
only to determine the value to be stored.

I *think* this makes the expression in question well defined, but
I'm not certain.

The C201X drafts (the latest is
<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1425.pdf>)
use different wording in this area, referring to operations being
"sequenced before" or "sequenced after" other operations. The new
wording might make this case clearer (I'm too lazy to check at
the moment).

Beej Jorgensen · Dec 14, 2009

Does the statement given below invoke undefined behavior?
i = (i, i++, i) + 1;

I am almost convinced that it does not because of the following
reasons

Ok, I've been poring over the latest draft, which takes a better stab at
all of this. I still don't really know the answer, but here's more
stuff according to that draft. (I stared at C99 today trying to coax
the real answer out of it, but I was just getting unhappy with the model
which didn't seem to want to spit it out. C09 is more complex with some
abstractions that I think help clarify these issues.)

Comments welcomed on everything I've written here. Please be aware that
this is my understanding of a document that I just looked at for the
first time today and is in no way necessarily correct or definitive.

1> the RHS must be evaluated before a value can be stored in i

It's a little bit nuanced, because "evaluation" is two things:

# Evaluation of an expression in general includes both value
# computations and initiation of side effects. [5.1.2.3p2]

Note that it's not "resolution" of side effects (which don't necessarily
occur until a sequence point.)

With respect to expressions:

# The value computations of the operands of an operator are sequenced
# before the value computation of the result of the operator. [6.5p1]

So, yes, the value computation must be done before the assignment, but
not necessarily the resolution of side effects.

In terms of sequencing of operations:

# Given any two evaluations A and B, if A is sequenced before B, then
# the execution of A shall precede the execution of B. [...] If A is
# not sequenced before or after B, then A and B are unsequenced.
# [5.1.2.3p3]

Remember, we're talking about "evaluations", which does not necessarily
include resolution of side effects.

And how this relates to expressions (this is *the* paragraph that lays
down the law):

# If a side effect on a scalar object is unsequenced relative to either
# a different side effect on the same scalar object or a value
# computation using the value of the same scalar object, the behavior is
# undefined. [6.5p2]

So back to the example:

i = (i, i++, i) + 1;

We have two side effects in the assignment and the ++. The question is,
are they sequenced?

Well, we know that the value computations of the operands to + are
sequenced before the value computation of the result of +. So the value
of 1 and the value of (i,i++,i) are computed before the result of + is.

What of the comma operator?

# The left operand of a comma operator is evaluated as a void
# expression; there is a sequence point between its evaluation and that
# of the right operand. Then the right operand is evaluated; the result
# has its type and value. [6.5.17p2]

What is a sequence point?

# The presence of a sequence point between the evaluation of expressions
# A and B implies that every value computation and side effect
# associated with A is sequenced before every value computation and side
# effect associated with B. [5.1.2.3p3]

So now we get our forced sequencing of side effects, as well. With the
expression (i,i++,i), the side effect of i++ must be complete before the
value of the expression (namely i) can be computed. And the value of
the expression must be computed before it can subsequently be used by +.

And +'s value must be computed before the assignment can occur:

# The side effect of updating the stored value of the left operand is
# sequenced after the value computations of the left and right operands.
# [6.5.16p3]

Working backward:

o For the assignment side effect to occur, the value computations of
both operands of the assignment must be complete.

o For the value computations on the right side of the assignment to be
complete, the value computations of the + operator's operands have
to be complete.

o For the value computation of (i,i++,i) to be complete, i++'s side
effects must be complete.

And so, I think, the side effect of i++ is sequenced before the side
effect of i=, and so in this case is not undefined behavior.

Some counter cases:

i = i++;

While the sequence of value computations is defined for i=i++, the
side effects are unsequenced, and so it is undefined behavior.

    |----- A ----|   |----- B ----|
k = (i, i /= 3, i) + (i, i *= 5, i); // "please...kill me..."

In this case, the value computations of both subexpressions A and B
must be complete before +, and therefore, by the previous pages of
arguments, the side effects of i/=3 and i*=5 must also be complete
before the +.

And, therefore, the side effects of i/=3 and i*=5 must also be
complete before the result of the value computation of + is finally
assigned into k.

However, the two subexpressions A and B are unsequenced relative to
one another and both modify the same object, and so the behavior is
undefined.

Do I believe it myself? I don't even know anymore.

What do you think, folks?

-Beej

(Remember: this analysis is based on the draft, not the Standard. I'm
just presuming they're going to try to keep it basically compatible.)

Ben Bacarisse · Dec 14, 2009

Beej Jorgensen said:
Ok, I've been poring over the latest draft, which takes a better stab at
all of this.

So back to the example:

i = (i, i++, i) + 1;

We have two side effects in the assignment and the ++. The question is,
are they sequenced?

Well, we know that the value computations of the operands to + are
sequenced before the value computation of the result of +. So the value
of 1 and the value of (i,i++,i) are computed before the result of + is.

What of the comma operator?

# The left operand of a comma operator is evaluated as a void
# expression; there is a sequence point between its evaluation and that
# of the right operand. Then the right operand is evaluated; the result
# has its type and value. [6.5.17p2]

What is a sequence point?

# The presence of a sequence point between the evaluation of expressions
# A and B implies that every value computation and side effect
# associated with A is sequenced before every value computation and side
# effect associated with B. [5.1.2.3p3]

So now we get our forced sequencing of side effects, as well. With the
expression (i,i++,i), the side effect of i++ must be complete before the
value of the expression (namely i) can be computed. And the value of
the expression must be computed before it can subsequently be used by +.

And +'s value must be computed before the assignment can occur:

# The side effect of updating the stored value of the left operand is
# sequenced after the value computations of the left and right operands.
# [6.5.16p3]

I find all this wording much clearer than the old description. The
trouble I always had with the old wording is that sequence points are
points in the program text, but the restriction on what is permitted
is worded in terms of temporal ordering of actual events. When the
C99 standard says, in effect, that the order of execution is
unspecified, you are left trying to relate possible execution paths
through the text so as to get all the event orderings that are possible
to see if any violate the constraint.

This new wording greatly simplifies the task of ascertaining the
permitted orderings. I like it much better.

Working backward:

o For the assignment side effect to occur, the value computations of
both operands of the assignment must be complete.

o For the value computations on the right side of the assignment to be
complete, the value computations of the + operator's operands have
to be complete.

o For the value computation of (i,i++,i) to be complete, i++'s side
effects must be complete.

And so, I think, the side effect of i++ is sequenced before the side
effect of i=, and so in this case is not undefined behavior.

I agree. I also agree (for what it is worth) that it is not undefined
even using the current text of the standard.

<snip>

Michael Foukarakis · Dec 14, 2009

I disagree.
I know that the value of (i,i++,i) is one greater
than the original value of (i).
I computed that without accomplishing any side effects.

The postfix increment operator's side effect is incrementing its
operand by 1. That's what you "computed". Get it now? Basic
comprehension OK?

Beej's post is great and very informative. The OP's construct doesn't
invoke UB.

Beej Jorgensen · Dec 14, 2009

I disagree.
I know that the value of (i,i++,i) is one greater
than the original value of (i).
I computed that without accomplishing any side effects.

Then I think you skipped over a sequence point without performing every
value computation and side effect associated with subexpression i++, in
violation of 5.1.2.3p3:

# The presence of a sequence point between the evaluation of expressions
# A and B implies that every value computation and side effect
# associated with A is sequenced before every value computation and side
# effect associated with B.

But you're arguing the side effects don't necessarily take place at the
sequence point, is that right?

-Beej

Eric Sosman · Dec 14, 2009

Certainly if it is UB such assumptions can be made, but is it?
There is a sequence point between the evaluation of i++ and the
evaluation of i to its right, and it is the result of that i which is
yielded by the comma operator and then has 1 added to it before being
assigned to i. So, the sequence point of the comma operator is before
the assignment side effect of the equals operator.

The value of the parenthesized sub-expression is the
value of `i' after incrementation, yes. But where is it
written that the sub-expression's value must be determined
by actually reading it from `i'? If an optimizing compiler
knew that `i' was 42 before the line in question, could it
not replace the assignment with `i=44', with the `i++'
happening at some undetermined moment?

Michael Tsang · Dec 14, 2009

Does the statement given below invoke undefined behavior?
i = (i, i++, i) + 1;

I am almost convinced that it does not because of the following
reasons

1> the RHS must be evaluated before a value can be stored in i
2> evaluation of RHS does not invoke UB due to the sequence points
introduced by comma operator

Correct me if I am wrong!

Thanks

I don't think it is UB. Let SQ 0 be the last sequence point before the full
expression, SQ 1 be the sequence point between i and i++, SQ 2 be the
sequence point between i++ and i, SQ 3 be the sequence point at the end of
the full expression. Because the right hand side must be read in order to
determine the value stored, the = operator must be evaluated between SQ 2
and SQ 3 but in the right hand side, i++ is done between SQ 1 and SQ 2 so
there is no UB.
