sequence points in subexpressions

Kaz Kylheku · Dec 14, 2009

No.
The side effects of the increment operator
are not needed for the dataflow dependency.

That is true in general, but not in this specific expression.

The key here is that the comma operator imposes the side effect barrier.

The value of the right operand of the assignment operator
can be calculated without any side effects taking place;
it will be equal to (i + 2).

You are gapingly overlooking the fact that algebraic evaluation
shortcuts are optimizations which are only permitted if they don't break
the abstract semantics.

Sequence points are only defined by
when the side effects of an evaluation take place,
not the data calculations.

That is false. A sequence point establishes not only that prior side
effects have settled, but that the next evaluation has not yet started.

I.e. if the prior evaluation has the side effect of modifying X, and the
next evaluation accesses X (a data calculation), then this means that
the side effect of X is settled, and the next evaluation will use the
new, updated, stable value of X.

(p = p -> next = q)

means the exact same thing as

(p = (p -> next = q))

But the side effect of evaluating (p -> next = q)
does not have to take place before the update of p.

This is different because no operator is used in this expression which
has sequencing properties.

Since it does not contain any, this example you have given can be used
to neither support nor refute any of your claims about the properties of
sequence points.

Kaz Kylheku · Dec 14, 2009

I disagree.
I know that the value of (i,i++,i) is one greater
than the original value of (i).
I computed that without accomplishing any side effects.

What you are describing is the existence of an algebraic shortcut,
which is an optimization.

Remember that the actual machine can apply arbitrary optimizations
in computing the visible behavior of the program, but the results have
to be like what the abstract machine would have computed.

The abstract machine requires adherence to sequence points.

Optimizations can discard sequence points only when it makes no
difference to the computed value, or externally visible behavior.

Keith Thompson · Dec 14, 2009

pete said:
No.
I'm saying that the word "evaluation" includes side effects,
and I'm saying that value of an expression can be determined and used
prior to the evaluation of that expression being completed.

Consider:

int i = 0;
int j = 0;
i ++;
j = i;

I can determine the value of the expression ``i'' on line 4
(it's 1) without completing the evaluation of ``i ++'' on line 3.
Nevertheless, the side effect of the "++" must occur before the
side effect of the "=" on line 4.

A compiler may generate code that performs the operations in a
different order, or that omits some operations altogether, but any
such optimization cannot produce visible behavior outside the range
of permitted behaviors of the abstract machine. In particular,
optimization cannot introduce undefined behavior when the behavior
was well defined in the first place.

I think you'll agree that the behavior of my 4-line snippet
above is well defined. I argue that the behavior of

i = (i, i++, i) + 1;

is well defined for the same reason: the sequence points impose
requirements on when the side effects take place, and limit the
optimizer's options to rearrange operations.

[A request: when replying in this thread, please include the original
statement we're discussing.]

Antoninus Twink · Dec 14, 2009

If it's that borderline and you'd never need it, why does it actually
matter, other than as a sort of C language sudoku?

*ding*

It seems a light has come on.

Chad · Dec 14, 2009

Consider:

int i = 0;
int j = 0;
i ++;
j = i;

I can determine the value of the expression ``i'' on line 4
(it's 1) without completing the evaluation of ``i ++'' on line 3.
Nevertheless, the side effect of the "++" must occur before the
side effect of the "=" on line 4.

How can you determine the value of the expression 'i' on line 4
without completing the evaluation of 'i++' on line 3?

Beej Jorgensen · Dec 14, 2009

How can you determine the value of the expression 'i' on line 4
without completing the evaluation of 'i++' on line 3?

It depends on whether by "completing the evaluation of" you mean that side
effects have been resolved, or that they are still pending and only the
value calculation has been performed.

Because maybe the side effect of storing the result of i++ in i hasn't
been done yet, even though the answer is known. The side effect of
storing the result must merely happen before the next sequence point.

Here's some example fake "assembly" of (i,i++,i) starting with i = 3490
that does this:

; I think this example violates the
; Standard by ignoring a sequence point

i = 3490
i_inc = i + 1
result = i_inc ; we've calculated the result before storing it
i++ ; now we store it

Where I'm differing with Pete is that I think there's a sequence point after
i++, and therefore the side effect must take place by then:

i = 3490
;; == sequence point ==
i_inc = i + 1
i++ ; now we store it because of the seq point
;; == sequence point ==
result = i_inc
;; == sequence point ==

Take special note of that last line there. It could just as well have
been:

result = i

and had it work. This is the bit Pete is saying that I do agree with.

-Beej

Keith Thompson · Dec 14, 2009

Chad said:
How can you determine the value of the expression 'i' on line 4
without completing the evaluation of 'i++' on line 3?

By analyzing the code. We know that the value that will be stored in
j is 1; if we can figure it out, so can the compiler.

Consider this complete program:

#include <stdio.h>

int main(void)
{
    int i = 0;
    int j = 0;
    i ++;
    j = i;
    printf("i = %d, j = %d\n", i, j);
    return 0;
}

The compiler is free to replace the assignment ``j = i;'' with the
equivalent of ``j = 1;'', or even to eliminate j altogether and
replace the printf call with ``puts("i = 1, j = 1")''. What it
*can't* do is generate code that produces output other than
i = 1, j = 1

In particular, if it generates code for "j = i;" that actually reads i
and updates j, it can't postpone the side effect of the "++" operator
so it occurs after the assignment.

Beej Jorgensen · Dec 14, 2009

The value of the parenthesized sub-expression is the
value of `i' after incrementation, yes. But where is it
written that the sub-expression's value must be determined
by actually reading it from `i'?

I don't think that's written--I think it can get the calculated value of
i++ from wherever it wants, but I think it is written that the side
effect of i++ (i.e. the modification of i) must occur before the
rightmost i is evaluated in the subexpression (i, i++, i):

[C99 5.1.2.3p2]
# At certain specified points in the execution sequence called sequence
# points, all side effects of previous evaluations shall be complete and
# no side effects of subsequent evaluations shall have taken place.

[C99 6.5.17p2]
# The left operand of a comma operator is evaluated as a void
# expression; there is a sequence point after its evaluation.

I see how it's theoretically possible to perform this calculation:

i = (i, i++, i) + 1

without applying the side effect of storing i+1 in i, but I don't see
how it would be *legal* to do so under the Standard.

If an optimizing compiler knew that `i' was 42 before the line in
question, could it not replace the assignment with `i=44', with the
`i++' happening at some undetermined moment?

In a world without sequence points, I'd totally allow i++ to store the
result in i at some undetermined moment, but the sequence point at the
comma forces the side effect to take place at that particular moment.
(Perhaps the side effect hasn't taken place in "machine code reality",
but it must have taken place in "C code reality".)

Of course, if an optimizing compiler can determine that none of the side
effects or calculations will be visible, it is free to optimize the
entire expression away (spelled out in C99 5.1.2.3p3). But I figure
that since we're discussing it, this particular optimization has not
occurred in this case.

-Beej

Johannes Schaub (litb) · Dec 14, 2009

Does the statement given below invoke undefined behavior?
i = (i, i++, i) + 1;

I am almost convinced that it does not because of the following
reasons

1> the RHS must be evaluated before a value can be stored in i
2> evaluation of RHS does not invoke UB due to the sequence points
introduced by comma operator

Hello there! I'm the evil guy that stated it is UB in C. After reading
all of your analyses, I agree that this is not undefined behavior in C1x
(great work @ Beej Jorgensen!)

But I still think it is UB in C99. The additional requirements about value
computations are missing from C99, and so "Between the previous and next
sequence point an object shall have its stored value modified at most once
by the evaluation of an expression." seems to render the behavior undefined.

In C99 it doesn't matter whether the assignment to i needs to compute a
value first. Merely the missing sequence point between the assignment and
the "i++" is enough to render behavior UB by the above quote.

Of course, my analysis might be too simple and miss some important points.

Beej Jorgensen · Dec 14, 2009

But I still think it is UB in C99. The additional requirements about value
computations are missing from C99, and so "Between the previous and next
sequence point an object shall have its stored value modified at most once
by the evaluation of an expression." seems to render the behavior undefined.

I'm still not sure, because I don't know where the "next sequence point"
is in C99.

A     B    C       D
|     |    |       |
i = (i, i++, i) + 1

i is modified twice between A and D, which causes UB by your above cite.
But what of sequence point C? Does it not count?

I can't help but feel that C99 is missing a needed dimension in the
model, and 201x is addressing this. (Though it might have been a
necessary addition due to the threading stuff [5.1.2.4 "Multi-threaded
executions and data races"] which heavily relies on sequencing side
effects.)

-Beej

Johannes Schaub (litb) · Dec 14, 2009

Beej said:
I'm still not sure because I'm not sure where the "next sequence point"
is in C99.

A     B    C       D
|     |    |       |
i = (i, i++, i) + 1

i is modified twice between A and D, which causes UB by your above cite.
But what of sequence point C? Does it not count?

I think "previous and next" means the sequence point prior to the point of
execution in the execution sequence, and next to the point of execution. In
the UB scenario, the assignment happens between B and C. If it happens
between C and D, behavior is not UB. Where it happens is unspecified.

I share your worry, as I don't know what is meant in 6.5.16/3 by "The side
effect of updating the stored value of the left operand shall
occur between the previous and the next sequence point." In general, I
think sequence points are not at certain places in the code, but in the
execution of it. (In describing the comma operator, for instance, C99 says
"The left operand of a comma operator is evaluated as a void expression;
there is a sequence point after its evaluation." It doesn't say "there is
a sequence point after the first operand.") But then in the description of
the assignment expression, it appears to refer to the sequence points of
the enclosing full expression. Weird!

I can't help but feel that C99 is missing a needed dimension in the
model, and 201x is addressing this. (Though it might have been a
necessary addition due to the threading stuff [5.1.2.4 "Multi-threaded
executions and data races"] which heavily relies on sequencing side
effects.)

Same here

Peter Nilsson · Dec 14, 2009

pete said:
No.
I'm saying that the word "evaluation" includes side
effects, and I'm saying that value of an expression can
be determined and used prior to the evaluation of that
expression being completed.

Consider...

int f(int i) { printf("%d\n", i); return i; }
int i = 42;
i = f(i) + 1;

An implementation could deduce that i ends up with the value
of one more than it started with. So then, could it do the
assignment _before_ the function call?

One would hope not. Why should the situation be any different
for...

i = (printf("%d\n", i), i) + 1;

...or...

i = (printf("%d\n", i++), i);

...or...

int f(int *i) { printf("%d\n", *i); return ++*i; }
i = f(&i);

Yes, an implementation can make deductions about the
expression, but the middle sequence point(s), and the
concept that the rhs must be evaluated, guarantees there
are no multiple assignments of the same object, and no
access for a purpose other than to compute the new
value, between sequence points for the cases above.

I don't know if C90 and C99 make this clear, but
without it, C would be defective IMO.

Beej Jorgensen · Dec 14, 2009

I think "previous and next" means the sequence point prior to the point of
execution in the execution sequence, and next to the point of execution. In
the UB scenario, the assignment happens between B and C.

Hmmm. Is that possible? I think it being possible hinges on us being
able to use the value of (i,i++,i) before the i++ side effect takes
place (since that value is needed for the assignment into i.)

C99 5.1.2.3p2:
# At certain specified points in the execution sequence called sequence
# points, all side effects of previous evaluations shall be complete and
# no side effects of subsequent evaluations shall have taken place.

So the assignment would have to count as a "previous evaluation" by
sequence point C...? That would imply that the entire + calculation
would also need to be complete by sequence point C...

C99 5.1.2.3p2 is all about "side effects"... it doesn't say anything
about the value computations. (in 201x the same paragraph clarifies
that it applies to both "every value computation and side effect".) Are
the value computations in C99 free to run off into the glorious future
and do all kinds of stuff beyond the sequence point before the sequence
point is hit? So, then, is this conforming pseudo-assembly that
demonstrates the UB in C99:

i = (i, i++, i) + 1

0 ;; == sequence point A ==
1 ;; == sequence point B ==
2 inc_i = i + 1
3 i = inc_i + 1 ; assignment
4 i = inc_i ; postincrement
5 ;; == sequence point C ==
6 ;; == sequence point D ==

Note that lines 3 and 4 can be interchanged, leading to the undefined
behavior.

(In 201x, I think this pseudo-assembly would be non-conforming because
the line 3 assignment would have to occur after sequence point C--thus
leading to a well-defined result.)

-Beej

Kaz Kylheku · Dec 15, 2009

Hmmm. Is that possible?

Not even by any halfway reasonable interpretation of C90.

Flash Gordon · Dec 15, 2009

Kenneth Brody wrote:

Now, had the original example used "i" instead of "1" outside the parens:

i = (i, i++, i) + i;

then this would be a totally different question. I'm not certain if
this constitutes "undefined" or "unspecified". However, given the same
sequence points because of the comma operators, I would have to lean
towards "unspecified".

I would say it is undefined, because there is no sequence point between
the i++ and the +i. It would still be undefined for the same reason if
you had
j = (i, i++, i) + i;

Beej Jorgensen · Dec 15, 2009

Not even by any halfway reasonable interpretation of C90.

I grant that, but what about my 25% reasonable interpretation?

-Beej

Johannes Schaub (litb) · Dec 15, 2009

Richard said:
Why? The left hand expression in brackets is a sequence of
expressions. The i++ MUST be evaluated before the last one or it makes
a mockery of the entire reason for sequence points IMO.

You are missing that the right operand of + is "i" (the variable), not "1"
(the constant). So the previous value in "i" is accessed *not* only to
determine the new value to be stored, but independently of the store action.
This is what makes it undefined behavior.

Johannes Schaub (litb) · Dec 15, 2009

Kenneth said:
You don't see any sequence point between "i++" and "+i" here?

i = (i, i++, i) + i;
      ^    ^
   here and here

Those are not sequence points between "i++" and "+i". Those are sequence
points between "i", "i++" and "i". Notice that when we talk about "sequence
point between A and B" we are not talking about tokens, but about
evaluations - in the above case, evaluations of i++ and +i.

Kaz Kylheku · Dec 15, 2009

Why? The left hand expression in brackets is a sequence of
expressions. The i++ MUST be evaluated before the last one or it makes
a mockery of the entire reason for sequence points IMO.

The + i makes it undefined. The operands of + can be evaluated in any
order, including interleaved and parallel. So this use violates the
second rule. The left operand of + makes modifications to i, which
race against the access of i on the opposite side. We don't know whether
this right hand side i is accessed before, during or after the i++
expression embedded in the left side.

Beej Jorgensen · Dec 15, 2009

i = (i, i++, i) + i;
    |_________|   |
         |        |
         A        B

Why? The left hand expression in brackets is a sequence of
expressions. The i++ MUST be evaluated before the last one or it makes
a mockery of the entire reason for sequence points IMO.

But which subexpression will be evaluated first, A or B? C99 6.5p3 says
it's unspecified for operands of +.

-Beej
