The comma operator, and assigning twice between sequence points

ais523

I've been wondering more about Undefined Behaviour, and the way in
which (i=i++)-like examples can be 'corrected' so they mean something
defined. This was particularly inspired by a line in some computer-
generated code, whose essence was as follows:

void func()
{
    int a, b, *p;
    a=b=0;
    p=&b;
    *p=1+((p=&a),2);
}

Here, the variable actually being assigned to depends on the RHS of
the assignment; but the comma introduces a sequence point, so I think
this is a defined unambiguous assignment to a. (I'm not sure, though:
this is why I'm asking c.l.c.) Some more examples along similar lines
for statements for which I'm not clear about defined/undefined/
unspecified:

c=(a++,b)+(b++,a);

The question here is whether the implementation is forced to evaluate
the two parenthesised groups sequentially due to the sequence points
in them. I think that this line might be UB, because of the
possibility of incrementing both variables first and then adding the
new values of a and b.

a=(a++,a);

My guess about this one is that it isn't UB because the comma forces
the increment to happen before the assignment, leaving the line
equivalent to ++a;.

So my question is: which of these examples are UB, and why?
 
fred.l.kleinschmidt

First of all, I would fire anyone who actually
wrote such a line of code.

Sequence points and the order of execution
are not the same thing.

a=5;
b=6;

For the above, there are definitely sequence points
for each statement. However, after optimization the compiler
may set b=6 before it sets a=5 - as long as it will not
affect the outcome.
 
Eric Sosman

ais523 said:
I've been wondering more about Undefined Behaviour, and the way in
which (i=i++)-like examples can be 'corrected' so they mean something
defined. This was particularly inspired by a line in some computer-
generated code, whose essence was as follows:

void func()
{
int a, b, *p;
a=b=0;
p=&b;
*p=1+((p=&a),2);
}

Here, the variable actually being assigned to depends on the RHS of
the assignment; but the comma introduces a sequence point, so I think
this is a defined unambiguous assignment to a. (I'm not sure, though:
this is why I'm asking c.l.c.)

Undefined. There's no sequence point between the
assignment to p in the RHS and the use of p's value in
the LHS. The compiler is not obliged to delay reading
p on the LHS until after the RHS is evaluated.

Some more examples along similar lines
for statements for which I'm not clear about defined/undefined/
unspecified:

c=(a++,b)+(b++,a);

The question here is whether the implementation is forced to evaluate
the two parenthesised groups sequentially due to the sequence points
in them. I think that this line might be UB, because of the
possibility of incrementing both variables first and then adding the
new values of a and b.

Undefined. There are sequence points at the comma
operators, but no sequence point associated with the `+'.
Hence, there is no sequence point between `a++' and `a',
nor between `b' and `b++'.

a=(a++,a);

My guess about this one is that it isn't UB because the comma forces
the increment to happen before the assignment, leaving the line
equivalent to ++a;.

I think this one is all right -- but it's so nauseating
I'd be delighted to be wrong ...
 
Kenneth Brody

Eric said:
I've been wondering more about Undefined Behaviour, and the way in
which (i=i++)-like examples can be 'corrected' so they mean something
defined. This was particularly inspired by a line in some computer-
generated code, whose essence was as follows: [...]
c=(a++,b)+(b++,a);

The question here is whether the implementation is forced to evaluate
the two parenthesised groups sequentially due to the sequence points
in them. I think that this line might be UB, because of the
possibility of incrementing both variables first and then adding the
new values of a and b.

Undefined. There are sequence points at the comma
operators, but no sequence point associated with the `+'.
Hence, there is no sequence point between `a++' and `a',
nor between `b' and `b++'.

Actually, there are sequence points between them, AFAICS. But,
the order of execution is still unspecified. There is no
guarantee that in "a = foo() + bar()" that foo() would be called
before bar(). I think this one falls under "unspecified" rather
than "undefined".

Of course, I could be wrong. :)

On further reflection, the unspecified order makes it look to
me like there is either a sequence point between "a++" and "a",
but not between "b" and "b++", _or_ there is a sequence point
between "b++" and "b", but not "a" and "a++". (Eww... Is that
even possible?)

In other words:

a++
sequence point
b
b++
sequence point
a
+

or

b++
sequence point
a
a++
sequence point
b
+


Given:

(w,x) + (y,z)

we are guaranteed that "w" will be evaluated before "x", and
that "y" will be evaluated before "z". But, are we guaranteed
that "w" and "x" will be evaluated separately from "y" and "z"?

In other words, can the evaluation order be w, y, x, z, with a
sequence point between "y" and "x"? I don't see why not, given
that the result is "as if" they were done in w/x/y/z order, and
the sequence points are respected.
I think this one is all right -- but it's so nauseating
I'd be delighted to be wrong ...

I think the same holds true for all of the OP's examples. :)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
Eric Sosman

Kenneth said:
Eric said:
ais523 said:
I've been wondering more about Undefined Behaviour, and the way in
which (i=i++)-like examples can be 'corrected' so they mean something
defined. This was particularly inspired by a line in some computer-
generated code, whose essence was as follows: [...]
c=(a++,b)+(b++,a);

The question here is whether the implementation is forced to evaluate
the two parenthesised groups sequentially due to the sequence points
in them. I think that this line might be UB, because of the
possibility of incrementing both variables first and then adding the
new values of a and b.

Undefined. There are sequence points at the comma
operators, but no sequence point associated with the `+'.
Hence, there is no sequence point between `a++' and `a',
nor between `b' and `b++'.

Actually, there are sequence points between them, AFAICS. But,
the order of execution is still unspecified. There is no
guarantee that in "a = foo() + bar()" that foo() would be called
before bar(). I think this one falls under "unspecified" rather
than "undefined".

Of course, I could be wrong. :)

On further reflection, the unspecified order makes it look to
me like there is either a sequence point between "a++" and "a",
but not between "b" and "b++", _or_ there is a sequence point
between "b++" and "b", but not "a" and "a++". (Eww... Is that
even possible?)

In other words:

a++
sequence point
b
b++
sequence point
a
+

or

b++
sequence point
a
a++
sequence point
b
+

I think the sequence points impose only a partial
ordering. Any arrangement of the sub-expressions `a++',
`b', `b++', `a' is allowed, provided `a++' precedes `b'
and `b++' precedes `a':

a++ b++ a b
a++ b++ b a
a++ b b++ a
b++ a a++ b
b++ a++ a b
b++ a++ b a

... and there may be further possibilities involving
overlapped evaluation. The situation seems similar to
that of

f(g(x=1), h(x=2))

... where there are sequence points a-plenty, but none
that separate the two assignments to `x'.
 
Kaz Kylheku

I've been wondering more about Undefined Behaviour, and the way in
which (i=i++)-like examples can be 'corrected' so they mean something
defined. This was particularly inspired by a line in some computer-
generated code, whose essence was as follows:

void func()
{
int a, b, *p;
a=b=0;
p=&b;
*p=1+((p=&a),2);

This is well-defined behavior because of the sequencing. The
assignment to *p cannot take place until the right hand side is
evaluated, and that evaluation is divided into two phases: before the
comma and after.

Without that comma, it would be undefined, because then p, in the same
expression where it is being modified, would be accessed for a purpose
other than determining the new value to be stored back into p.

Here, the variable actually being assigned to depends on the RHS of
the assignment;

Right. The answer to the question /which/ variable is assigned to
depends on the modification of p in the right hand side.

but the comma introduces a sequence point, so I think
this is a defined unambiguous assignment to a. (I'm not sure, though:
this is why I'm asking c.l.c.) Some more examples along similar lines
for statements for which I'm not clear about defined/undefined/
unspecified:

c=(a++,b)+(b++,a);

The problem here is that although within the two constituent clauses
of the + operator, there is sequencing going on, the two clauses
themselves are not sequenced relative to each other.

That is to say, given (x,y)+(z,w) there is a sequence point between
x and y, and between z and w, so these pairs are ordered. But it
cannot be deduced that there is a sequence point between x and z,
between x and w, between y and z and between y and w. You know nothing
about their relative ordering.

The question here is whether the implementation is forced to evaluate
the two parenthesised groups sequentially due to the sequence points
in them.

Nope. The two subexpressions of the + could be sent to different
processor pipelines to be done concurrently.

Even if the two subexpressions are sequenced with respect to each
other, and fully evaluated, you don't know in which order: left then
right, or right then left? No evaluation order is specified for the +
operator.

a=(a++,a);

My guess about this one is that it isn't UB because the comma forces
the increment to happen before the assignment, leaving the line
equivalent to ++a;

That's right.
 
Kaz Kylheku

Sequence points and the order of execution
are not the same thing.

a=5;
b=6;

For the above, there are definitely sequence points
for each statement. However, after optimization the compiler
may set b=6 before it sets a=5 - as long as it will not
affect the outcome.

I think what you're trying to say, rather, is that actual semantics
(where optimization takes place) is not the same as abstract
semantics.
 
Eric Sosman

Kaz said:
This is well-defined behavior because of the sequencing. The
assignment to *p cannot take place until the right hand side is
evaluated, and that evaluation is divided into two phases: before the
comma and after.

Nothing can be stored at *p until after the RHS is
evaluated, but cannot the LHS' p be evaluated earlier?

load R0,p ; get LHS' p
load R1,1
load R2,&a
store R2,p ; p = &a on RHS
load R2,2
add R1,R2
store R1,*R0 ; *p (stale) = RHS
 
christian.bau

This is well-defined behavior because of the sequencing. The
assignment to *p cannot take place until the right hand side is
evaluated, and that evaluation is divided into two phases: before the
comma and after.

The problem is not the assignment to *p, the problem is the assignment
to p. p is used on the left side to get the address for the store to
*p, and on the right side it is changed to &a, without intervening
sequence point. So this is undefined behaviour.
 
Kaz Kylheku

Nothing can be stored at *p until after the RHS is
evaluated, but cannot the LHS' p be evaluated earlier?

Ah shit, you're right of course. No, this is undefined. The value p
can of course be used to calculate the lvalue at any time.

Given

L = (A, B)

the timing of the calculation of lvalue L is not sequenced with
respect to A or B. Only the storage into that L value (which cannot
take place until B is evaluated).

We can illustrate it also like this:

*(l()) = (f(), g())

The function f must be called before g. But the call to l can be
interleaved arbitrarily, so any of these three orders is possible:

l(); f(); g();
f(); l(); g();
f(); g(); l();

In all three cases, the store cannot take place until g and l are
called since it depends on both of them.
 
