sequence points

Kai-Uwe Bux

Please consider

#include <iostream>

int main ( void ) {
    int a = 0;
    a = ( (a = 5), 4 ); // (*)
    std::cout << a << '\n';
}

I would like to determine whether line (*) has undefined behavior (modifying
a variable twice between sequence points).

What I think is this: We have an assignment expression

lhs = rhs

where the right hand side is this:

(a = 5), 4

I note that the comma will introduce a sequence point that separates the
side-effects of a=5 from the side-effects of 4. The main question therefore
is whether the right hand side

(a = 5), 4

can have a value before the side effects of its evaluation take place. My
understanding of the standard is that sequence points only separate side
effects but do not tell us anything about how, where, and when values of
expressions are established. However, if that understanding is correct, the
abstract machine would be allowed to predict that the right hand side will
have value 4 and perform the side effect of the ambient assignment

lhs = rhs

before the side effects of the rhs take place. Along this possible path of
execution, the value of a would be modified twice. Thus, as of now, I
believe that line (*) has undefined behavior.


However, I am not sure about this interpretation of the standard; and I
would appreciate your help. (In fact, I hope that I am wrong.)


Note that similar considerations would apply to:

a = f();

Is it guaranteed that the side-effects of evaluating f() take place before
the value of a is changed? Or would a conforming implementation be allowed to
put all side-effects of f() on hold until it knows the value, perform the
assignment, and then run the side-effects of f()?



Thanks

Kai-Uwe Bux
 
Guest

Kai-Uwe Bux said:
[snip]
Note that similar considerations would apply to:

a = f();

Is it guaranteed that the side-effects of evaluating f() take place before
the value of a is changed? Or would a conforming implementation be allowed to
put all side-effects of f() on hold until it knows the value, perform the
assignment, and then run the side-effects of f()?


The standard says that "[o]nce the execution of a function begins, no
expressions from the calling function are evaluated until execution of
the called function has completed," with a footnote adding that "[i]n
other words, function executions do not interleave with each other." I
interpret that as meaning that all side-effects of a function must be
complete before it returns.
 
Ben Bacarisse

Kai-Uwe Bux said:
Please consider

#include <iostream>

int main ( void ) {
int a = 0;
a = ( (a = 5), 4 ); // (*)
std::cout << a << '\n';
}

I would like to determine whether line(*) has undefined behavior (modifying
a variable twice between sequence points).

What I think is this: We have an assignment expression

lhs = rhs

where the right hand side is this:

(a = 5), 4

I note that the comma will introduce a sequence point that separates the
side-effects of a=5 from the side-effects of 4. The main question therefore
is whether the right hand side

(a = 5), 4

can have a value before the side effects of its evaluation take place. My
understanding of the standard is that sequence points only separate side
effect but do not tell us anything about how, where and when values of
expressions are established. However, if that understanding is correct, the
abstract machine would be allowed to predict that the right hand side will
have value 4 and perform the side effect of the ambient assignment

lhs = rhs

before the side effects of the rhs take place. Along this possible path of
execution, the value of a would be modified twice. Thus, as of now, I
believe that line (*) has undefined behavior.

I take it the clause that is bothering you is this one following the
well-known wording about scalar values being modified only once
between sequence points:

The requirements of this paragraph shall be met for each allowable
ordering of the subexpressions of a full expression; otherwise the
behavior is undefined.

This is missing in the C99 standard and, in C, your example is
unequivocally well-defined. However, I can't see the behaviour you
propose as being allowed under the wording "each allowable ordering"
so, in short, I think the above is as sound in C++ as it is in C.
 
Lance Diduck

Kai-Uwe Bux said:
[snip]
I would think that a good optimising compiler would recognize that
int a=0;
a=((a=5),4);
cout<<a;
would reduce to
cout<<4;
since assigning a (non-volatile) int has no side effects. Whether this is
correct by the standard (the C standard at that), there is probably no
consensus. This is like the "is the Return Value Optimization correct?"
debate. RVO is usually correct, but not always, especially if the copy
constructor has a side effect. But in practice, people want RVO
anyway.

I think you could clarify the question by only considering volatile
ints. Then mistakes in the assignment sequence
0,5,4
may have observable consequences. Otherwise, the state of 'a' does
not have any consequence outside its own context, i.e. there are no
"side effects".
Other things to consider:
if a were not of type int, then
1) T a=0; //may have side effect in T constructor
2) a=5 assignment operator could be overloaded to return something
other than the value of a
which leads to
3) the type of the object returned by the expr 'a=5' could itself
overload operator, (the comma operator), producing a side effect
and so forth.

I think the question of correctness is very hairy. Given that few
developers actually overload , and = to do things semantically
different from the built-in types, even finding a compiler that
was actually tested for these occurrences and actually gets it right
(given they actually agree on the interpretation of the standard)
is very difficult.
 
James Kanze

Lance Diduck said:
I would believe that a good optimising compiler would recognize that
int a=0;
a=((a=5),4);
cout<<a;
would reduce to
cout<<4;
since assigning (non volatile)int has no side effects. Whether
this is correct by the standard (the C standard at that) there
is probably no consensus.

That's definitely allowed, under the "as if" rule, even if the
expression in question has no undefined behavior, but that's a
separate issue.

The issue with sequence points is more complex. As Kai-Uwe
correctly understands, they only introduce a partial ordering.
I *think* that in this case, the two assignments are ordered,
because the outer assignment requires the results of the rhs,
and there is a sequence point in the expression there. But I'd
ask the question in comp.std.c/comp.std.c++, just to be sure.

Note that in C++, the current draft replaces sequence points
with another concept, in order to be able to define ordering
when threads are involved, so the answer may change in the next
version of the standard.

Lance Diduck said:
This is like the "is the Return Value Optimization Correct"
debate? RVO is usually correct,but not always, esp if the copy
constructor has a side effect. But in practice, people want RVO
anyway.

RVO is always correct, even if the copy constructor has a side
effect, because the standard says it is.
 
Pete Becker

James Kanze said:
Note that in C++, the current draft replaces sequence points
with another concept, in order to be able to define ordering
when threads are involved, so the answer may change in the next
version of the standard.

However, the intention was that the new formulation would have the same
semantics as the previous one. The reason for changing the wording was
to provide a better base for the changes needed for multi-threading.
 
James Kanze

Pete Becker said:
On 2007-10-15 03:38:49 -0400, James Kanze said:
[snip]
However, the intention was that the new formulation would have
the same semantics as the previous one.

Presumably, that only applies to the cases where we know what
the current formulation really means :).

Pete Becker said:
The reason for changing the wording was to provide a better
base for the changes needed for multi-threading.

That's certainly the motivation, but that shouldn't prevent us
from formulating something more exact and more understandable as
well.
 
Kai-Uwe Bux

James said:
That's definitly allowed, under the "as if" rule, even if the
expression in question has no undefined behavior, but that's a
separate issue.

The issue with sequence points is more complex. As Kai-Uwe
correctly understands, they only introduce a partial ordering.
I *think* that in this case, the two assignments are ordered,
because the outer assignment requires the results of the rhs,
and there is a sequence point in the expression there.
[snip]

That actually is the core of the matter.

Let me slightly modify the example so that all subexpressions have
side-effects:

int a = 0;
int b = 0;
a = ( ( a = 5 ), ( b = 4 ) );

The comma operator introduces a sequence point that clearly separates

a=5 from b=4

The question at hand is whether this sequence point also separates

a = rhs from a=5


On a more fundamental level, it is the very computational model of C++ about
which I am not sure. It seems to me that there is an 'obvious'
understanding: an expression specifies a computation; the computation has
side effects and yields a value. The value is not available to take part in
other side effects before it has been computed.

However, there is also a 'counter-intuitive' reading of the standard: an
expression specifies a computation. This computation establishes a value
and can cause side effects. However, the time at which the side effects take
place is independent of the actual establishment of the value. In
particular, a conforming implementation could proceed as follows: For each
expression, create (bottom-up, starting with the innermost subexpressions) a
pair (value, instruction_sequence), where value is the value of the
expression and instruction_sequence can be executed to make all side effects
of the evaluation happen. Then, for each operator, we have rules like these:

given lhs + rhs:
  find the pairs (lhs_value, lhs_sequence) and (rhs_value, rhs_sequence)
  the pair for lhs + rhs is:
    ( lhs_value + rhs_value, shuffle( lhs_sequence, rhs_sequence ) )

given lhs = rhs:
  find ( lhs_value, lhs_sequence ) and ( rhs_value, rhs_sequence )
  the pair for lhs = rhs is:
    ( rhs_value, shuffle( lhs_sequence, rhs_sequence, {assignment} ) )

given lhs, rhs:
  the pair for lhs, rhs is:
    ( rhs_value, concat( lhs_sequence, rhs_sequence ) )
                 ^^^^^^

Note how the last rule will ensure that all side-effects of the left-hand
side in a comma-expression take place before any side-effect of the
right-hand side.

If such an implementation was conforming, the comma sequence point would not
separate the outer assignment from the inner.

James said:
Note that in C++, the current draft replaces sequence points
with another concept, in order to be able to define ordering
when threads are involved, so the answer may change in the next
version of the standard.

Good point. It appears that the next standard is much clearer with regard to
the example above.

The draft that I have uses "sequenced before" and "sequenced after" to
define a partial order on computations and side-effects. It says about
assignment

In all cases, the assignment is sequenced after the value computation of
the right and left operands, and before the value computation of the
assignment expression.

This seems to say that

int volatile a = 0;
int volatile b = 4;
a = ( b = 3 );

sequences the assignment to b before the value of b=3 is computed, which in
turn is sequenced before the assignment to a.

It also says that in

a = ( ( a = 5 ), ( b = 4 ) )

the outer assignment is sequenced after the value computation of the rhs. It
does not say that the outer assignment is sequenced after the side-effects
of the right hand side.

However(!), the draft says about the comma operator:

Every value computation and side effect associated with the left
expression is sequenced before every value computation and side effect
associated with the right expression.

If we take it that the value-computation of the rhs in a comma-expression
is involved in the value-computation of the comma-expression, then this
says that at least the side-effects of a = 5 are sequenced before the
value-computation of b = 4, which in turn is sequenced before the outer
assignment.


The main difference between the draft and the standard with regard to this
example seems to be that, in the draft, sequencing restrictions are given
for value-computations and side-effects, whereas in the current standard,
sequencing restrictions are only given for side-effects.


[snip]


Best

Kai-Uwe Bux
 
Greg Herlihy

Kai-Uwe Bux said:
Let me slightly modify the example so that all subexpressions have
side-effects:

int a = 0;
int b = 0;
a = ( ( a = 5 ), ( b = 4 ) );

The comma operator introduces a sequence point that clearly separates

a=5 from b=4

The question at hand is whether this sequence point also separates

a = rhs from a=5

The answer at hand is of course, yes - by transitivity: "b=4" is
evaluated after "a=5" due to the comma operator between them, whereas
"b=4" is evaluated before "a=rhs" because the evaluated value of "b=4"
is the "rhs" assigned to "a" to complete the expression. Therefore the
"a=rhs" assignment must occur after the "a=5" assignment - because we
know that the evaluation of "b=4" must take place after the latter and
before the former.

In C++, operands and operators guide the evaluation of an expression -
but do not mandate a single interpretation (like Java does). So
"sequence points" were invented as a way to describe just how much
leeway a C++ compiler does have when evaluating an expression - and by
the same token - to let a C++ programmer recognize when an expression
has granted too much latitude to the compiler and could therefore lead
to an unexpected result.

In this example, the semantics of the expression are as follows:

assign 5 to a as the left operand of the comma expression
assign 4 to b as the right operand of the comma expression
evaluate b=4 as 4
evaluate the entire comma expression to the right hand operand: 4
assign 4 to a

Next we can determine how much latitude the compiler has in following
this order. We know that assigning 5 to "a" must be done first (and
"a" must have the value 5) before proceeding to the next step
(courtesy of comma operator)

We know that the right-hand side must completely evaluate to "4" before
"4" can be assigned to "a". We cannot be sure that "b" will have the
value "4" before "b=4" evaluates to 4, but since nothing depends on
whether a or b gets its assigned value first, the uncertain order of
the side effects on the right-hand side of the expression makes no
difference.

In short, the comma operator cleanly divides the evaluation and side
effects of "a=5" from the assignments and evaluations that follow it.
And there is no amount of lawyering that will get around that fact.

Kai-Uwe Bux said:
On a more fundamental level, it is the very computational model of C++ about
which I am not sure. It seems to me that there is an 'obvious'
understanding: an expression specifies a computation, the computation has
side-effect and yields a value. The value is not available to take part in
other side-effect before it has been computed.

However, there is also a 'counter-intuitive' reading of the standard: an
expression specifies a computation. This computation establishes a value
and can cause side-effect. However, the time at which side-effect takes
place is independent of the actual establishment of the value. In
particular, a conforming implementation could proceed as follows: For each
expression, create (bottom-up starting with innermost subexpressions) a
pair: (value, instruction_sequence) where the value is the value and the
instruction_sequence can be executed to make all side-effects of the
evaluation happen. Then, for each operator, we have a rule like these

given lhs + rhs:
find the pairs (lhs_value, lhs_sequence) and (rhs_value, rhs_sequence)
the pair for lhs+rhs is:
(lhs_value + rhs_value, shuffle( lhs_sequence, rhs_sequence) )

given lhs = rhs
find ( lhs_value, lhs_sequence ) and ( rhs_value, rhs_sequence )
the pair for lhs = rhs is:
( rhs_value, shuffle( lhs_sequence, rhs_sequence, {assignment} ) )

given lhs, rhs
the pair for lhs, rhs is:
( rhs_value, concat( lhs_sequence, rhs_sequence ) )
^^^^^^

Note how the last rule will ensure that all side-effects of the left-hand
side in a comma-expression take place before any side-effect of the
right-hand side.

If such an implementation was conforming, the comma sequence point would not
separate the outer assignment from the inner.

But since the comma operator makes precisely that guarantee, this
implementation, as described, would not be conforming.

Greg
 
Kai-Uwe Bux

Greg said:
The answer at hand is of course, yes - by transitivity: "b=4" is
evaluated after "a=5" due to the comma operator between them, whereas
"b=4" is evaluated before "a=rhs" because the evaluated value of "b=4"
is the "rhs" assigned to "a" to complete the expression. Therefore the
"a=rhs" assignment must occur after the "a=5" assignment - because we
know that the evaluation of "b=4" must take place after the latter and
before the former.

Thanks. I just looked that up, and indeed the comma operator makes a
guarantee that goes farther than I was aware of. It does not only ensure
that the side effects of b = 4 come after the side effects of a = 5; it
even ensures that the evaluation of b = 4 does not commence before the side
effects of a = 5 have taken place.

Greg said:
In C++, operands and operators guide the evaluation of an expression -
but do not mandate a single interpretation (like Java does). So
"sequence points" were invented as a way to describe just how much
leeway a C++ compiler does have when evaluating an expression - and by
the same token - to let a C++ programmer recognize when an expression
has granted too much latitude to the compiler and could therefore lead
to an unexpected result.

In this example, the semantics of the expression are as follows:

assign 5 to a as the left operand of the comma expression
assign 4 to b as the right operand of the comma expression
evaluate b=4 as 4
evaluate the entire comma expression to the right hand operand: 4
assign 4 to a

Next we can determine how much latitude the compiler has in following
this order. We know that assigning 5 to "a" must be done first (and
"a" must have the value 5) before proceeding to the next step
(courtesy of comma operator)

We know that right-hand side must completely evaluate to "4" before
"4" can be assigned to "a". We cannot be sure that "b" will have the
value "4" before "b=4" evaluates to 4 - but since there is nothing
that depends on whether a or b gets their assigned value first - the
uncertain order of the side effects on the right hand side of the
expression - makes no difference.

In short, the comma operator cleanly divides the evaluation and side
effects of "a=5" from the assignments and evaluations that follow it.
And there is no amount of lawyering that will get around that fact.

Right. I was confused because I thought the comma operator just inserts a
sequence point that separates side effects.

[snip]


Best

Kai-Uwe Bux
 
