sequence points and expression evaluation

cmelliso · Jul 2, 2010

I'm working on a formal definition for C and I'd like to clarify some
information about expression evaluation. Please accept my apologies
if these questions have been asked before; I looked around but
couldn't find these particular answers.

First: why the change to the description of expression evaluation in
Sec 6.5:2 from ISO/IEC 9899:1999 to 9899:201x? It used to say:
"Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored."

Now it says:
"If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings."

What is the difference? It seems like the older definition was more
clear. Furthermore, it seems like they both imply the following
simple statement:
"Between the previous and next sequence point you can't write to a
location twice or read from a location after writing to it. For a
given expression, if there is any evaluation order where such a thing
could happen, the expression is undefined."

Also, the introduction of "indeterminate sequencing" for functions
just confuses things in my mind, as it seems like nothing changed.
There were already sequence points at function calls and returns, so
it seems clear that related sub-expressions at the call-site would
have been sequenced before or after the call. What am I missing?

Finally, if my simplification is correct, I don't see why one needs to
consider delayed side effects, at least in establishing
undefinedness. If there is a problem in an expression involving a
side effect, the path on which you detect that problem would be the
one where the side effect happens as early as possible. If there is
no problem, then there is no way to detect that there was delayed side-
effecting anyway. It seems to me the only way you can "detect"
delayed side-effects is by writing an expression that is undefined
anyway.

As an example,
((x=y) + ...) + ...

The y itself must be evaluated before you can determine what the value
of the assignment expression is. Thus, there's no way to have the
assignment take place BEFORE that evaluation. Further, if you delay
changing the object x represents in memory, the only way you could
detect that in an expression would be by reading or writing x, which
would cause the entire expression to be undefined. Again, what am I
missing?

Is this an oversimplification? I realize there are reams of documents
about sequence points and evaluation order (including JTC1/SC22/WG14
N925 and N926), but I'd like have a simple interpretation as long as
it's still correct.

I would prefer an example/counterexample if you think my
simplification misses something. That is probably the best way to
make me realize I'm dumb

Any information is welcome,
-Chucky

Eric Sosman · Jul 2, 2010

I'm working on a formal definition for C and I'd like to clarify some
information about expression evaluation. Please accept my apologies
if these questions have been asked before; I looked around but
couldn't find these particular answers.
[... questions about wording changes in C1x drafts ...]

Questions about the whys and wherefores of changes to the
Standard are probably better in comp.std.c than here.

As to the specifics you ask about, I have a hunch many of
them are connected to the work of extending C to multi-threaded
environments. It's not enough just to publish a list of library
functions that start and stop threads, synchronize between them,
and so on; it's also necessary to describe some kind of "memory
model" that specifies (for example) the conditions under which a
change made by one thread will necessarily be visible to others.
This is new territory for C, which up through C99 has had almost
no notion of parallel activity.

(Lest anyone think that a memory model is easy to concoct,
ponder the fact that Java -- a multi-threaded language right from
its origins -- didn't acquire a memory model until several years
after the language had already been in use. Given C's greater
tolerance for platform-to-platform variation, I'd expect inventing
a *portable* memory model for C to be substantially more difficult
than it was for Java.)

Paul N · Jul 2, 2010

It used to say:
"Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored."

I can't really help here, but if it's any consolation, I haven't so
far been able to make any sense of the second sentence, despite some
kind help from people here. As I read it, it forbids:

a = b;

because this reads b and uses the result for something other then
working out what to put in b afterwards. Clearly this is not what the
author meant; but I can't tell what they did mean or how the words can
be read in that way...

But (and I think I'm on stronger ground here) the references to
"previous" and "next" seem a bit presumptious. For example, in:

x = (a(), b()) + (c(), d());

there are (amongst others) two sequence points - one between a and b
and one between c and d. The function a is required to be called
before b, and the function c before d. But they could be called in the
order a, b, c, d - in which case the left-hand sequence point comes
first - or c, d, a, b - in which case the right-hand sequence point
comes first. Or a, c, d, b in which either sequence point could be
considered first.

Paul.

Ben Bacarisse · Jul 2, 2010

Paul N said:
I can't really help here, but if it's any consolation, I haven't so
far been able to make any sense of the second sentence, despite some
kind help from people here. As I read it, it forbids:

a = b;

because this reads b and uses the result for something other then
working out what to put in b afterwards. Clearly this is not what the
author meant; but I can't tell what they did mean or how the words can
be read in that way...

The phrase "prior value" can't apply to b since b is not being changed.
The key word is "Furthermore": it means that the sentence refers to the
object whose stored value is being modified. It adds a further
restriction on the use of _that_ value but says nothing about the use of
any others that are not having their stored value changed.

But (and I think I'm on stronger ground here) the references to
"previous" and "next" seem a bit presumptious. For example, in:

x = (a(), b()) + (c(), d());

there are (amongst others) two sequence points - one between a and b
and one between c and d. The function a is required to be called
before b, and the function c before d. But they could be called in the
order a, b, c, d - in which case the left-hand sequence point comes
first - or c, d, a, b - in which case the right-hand sequence point
comes first. Or a, c, d, b in which either sequence point could be
considered first.

There are other possible orders of course. The comma restricts the set
of orderings to ones in which b() comes after a() and comes d() after
c().

I think your example would be simpler without the function calls since
function calls also add sequence points! In

(e1, e2) + (e3, e4)

there is indeed no way to tell which sequence point "happens" before the
other. Most people interpret the current wording to mean that if there
is /any/ otherwise valid order for executing the expression that
violates this rule, then the expression is undefined. For example:

(i=1, j=2) + (j=3, i=4)

can be evaluated in an order that avoid UB (i=1 -- j=3 -- SP1 -- j=2 --
SP2 -- i=4 -- ADD) but it can also be evaluated in an order that does
violate the current wording: i=1 -- SP1 -- j=2 -- j=3 -- SP2 -- i=4 --
ADD. Since the standard does not specify any order for the
sub-expressions of +, this seems to be the only sane way to interpret
the rule. To excuse the above expression because one of the many orders
of execution is not undefined would be absurd.

I think (but I am not sure) that the new wording makes this explicitly
undefined and that this sort of expression is one reason for the change.

Ben Bacarisse · Jul 2, 2010

cmelliso said:
I'm working on a formal definition for C and I'd like to clarify some
information about expression evaluation. Please accept my apologies
if these questions have been asked before; I looked around but
couldn't find these particular answers.

First: why the change to the description of expression evaluation in
Sec 6.5:2 from ISO/IEC 9899:1999 to 9899:201x? It used to say:
"Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored."

Now it says:
"If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings."

What is the difference?

I think it is, in part, just tidying up the wording. See another post
of mine in this thread for an example where I think it makes the
undefinedness more explicit.

It seems like the older definition was more
clear. Furthermore, it seems like they both imply the following
simple statement:
"Between the previous and next sequence point you can't write to a
location twice or read from a location after writing to it. For a
given expression, if there is any evaluation order where such a thing
could happen, the expression is undefined."

As I read it, that would render both statements in

int i = 0;
f(++i);
f(i = 2);

undefined. The value of i could be seen as being read after being
written to in both cases (indeed that is how the standard describes
it). For example, the value of an assignment is "the value of the left
operand after the assignment" (6.5.16 p3). That implies the value must
be read from i after setting it to 2 (in the abstract machine, I mean --
a real compiler will probably just use a literal 2) but before the
sequence point at the function call.

The problem is to write the rule without using "before" or "after"
except as they relate to sequence points. Objects only have known
values at sequence points.

Also, the introduction of "indeterminate sequencing" for functions
just confuses things in my mind, as it seems like nothing changed.
There were already sequence points at function calls and returns, so
it seems clear that related sub-expressions at the call-site would
have been sequenced before or after the call. What am I missing?

Finally, if my simplification is correct, I don't see why one needs to
consider delayed side effects, at least in establishing
undefinedness. If there is a problem in an expression involving a
side effect, the path on which you detect that problem would be the
one where the side effect happens as early as possible. If there is
no problem, then there is no way to detect that there was delayed side-
effecting anyway. It seems to me the only way you can "detect"
delayed side-effects is by writing an expression that is undefined
anyway.

I don't follow this, sorry. I don't really know what you mean by
"detect" and why it matters to you. Also, your use of "problem" is
rather vague. What is "a problem in an expression involving a side
effect"?

As an example,
((x=y) + ...) + ...

The y itself must be evaluated before you can determine what the value
of the assignment expression is. Thus, there's no way to have the
assignment take place BEFORE that evaluation. Further, if you delay
changing the object x represents in memory, the only way you could
detect that in an expression would be by reading or writing x, which
would cause the entire expression to be undefined. Again, what am I
missing?

Not sure. The example did not help me to grasp your point.

Tim Rentsch · Jul 7, 2010

cmelliso said:
I'm working on a formal definition for C and I'd like to clarify some
information about expression evaluation. Please accept my apologies
if these questions have been asked before; I looked around but
couldn't find these particular answers.

First: why the change to the description of expression evaluation in
Sec 6.5:2 from ISO/IEC 9899:1999 to 9899:201x? It used to say:
"Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored."

Now it says:
"If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings."

What is the difference? It seems like the older definition was more
clear. Furthermore, it seems like they both imply the following
simple statement:
"Between the previous and next sequence point you can't write to a
location twice or read from a location after writing to it. For a
given expression, if there is any evaluation order where such a thing
could happen, the expression is undefined."

Also, the introduction of "indeterminate sequencing" for functions
just confuses things in my mind, as it seems like nothing changed.
There were already sequence points at function calls and returns, so
it seems clear that related sub-expressions at the call-site would
have been sequenced before or after the call. What am I missing?

Finally, if my simplification is correct, I don't see why one needs to
consider delayed side effects, at least in establishing
undefinedness. If there is a problem in an expression involving a
side effect, the path on which you detect that problem would be the
one where the side effect happens as early as possible. If there is
no problem, then there is no way to detect that there was delayed side-
effecting anyway. It seems to me the only way you can "detect"
delayed side-effects is by writing an expression that is undefined
anyway.

As an example,
((x=y) + ...) + ...

The y itself must be evaluated before you can determine what the value
of the assignment expression is. Thus, there's no way to have the
assignment take place BEFORE that evaluation. Further, if you delay
changing the object x represents in memory, the only way you could
detect that in an expression would be by reading or writing x, which
would cause the entire expression to be undefined. Again, what am I
missing?

Is this an oversimplification? I realize there are reams of documents
about sequence points and evaluation order (including JTC1/SC22/WG14
N925 and N926), but I'd like have a simple interpretation as long as
it's still correct.

I would prefer an example/counterexample if you think my
simplification misses something. That is probably the best way to
make me realize I'm dumb

My understanding is that the primary impetus for this change in
wording has to do with support for threading. Because threads
can perform multiple actions at a time, it's important to have a
more exactly worded specification about access ordering.

Besides anything having to do with threads, the newer wording
seems better than the old wording because the statement of
requirements is more complete and more exact (even if the
newer wording may be harder to understand). Here are two
examples. First, function calls. It's well understood that
any function call in an expression executes "atomically",
that is, either completely before or completely after
any other side effect in the expression containing the call.
Even though it's well understood, that requirement isn't
explained very clearly in the C99 Standard. So you might
want to look at the wording in C99 and also the new wording,
and consider which one better states or expresses this
"atomic function call" property.

Second example, various corner cases that come up with regard to
reading and writing the same object in a single expression.
Consider the multiple assignment

p = (p->x = q);

with redundant parentheses added to emphasize grouping. Under
the C99 wording, it's debatable whether the behavior for this
statement is defined or undefined. In particular, does C99 mean
to allow that 'p' may be assigned into before 'p->x' is
evaluated? The existing wording doesn't produce a clear cut
answer to this question. The revised wording you offered, with
the phrase "... if there is any evaluation order where ...", has
the same problem, because it doesn't state precisely /which/
evaluation orders are /possible/. The point of the newer wording
is that an expression's abstract syntax tree imposes a partial
ordering as to which subexpressions must be evaluated before
other subexpression, and this partial ordering is specifed using
precise language, so there is no question or argument about which
evaluation orders are possible. The previous wording may seem
easier to understand, but it is vague near the edges, because
certain important phrases or concepts (eg, "previous sequence
point", "next sequence point", "may be read only to determine")
are not defined but are left as ordinary English phrases with
only subjective interpretation to decipher them. The newer
wording is more formal, which may make it hard to understand,
but it is more exact and more complete, which seems like a
good thing (and again, even not counting any considerations
related to threading, which is another factor).

Does that make a little more sense now?

Tim Rentsch · Jul 8, 2010

Ben Bacarisse said:
Paul N said:

[snip]

But (and I think I'm on stronger ground here) the references to
"previous" and "next" seem a bit presumptious. For example, in:

x = (a(), b()) + (c(), d());

there are (amongst others) two sequence points - one between a and b
and one between c and d. The function a is required to be called
before b, and the function c before d. But they could be called in the
order a, b, c, d - in which case the left-hand sequence point comes
first - or c, d, a, b - in which case the right-hand sequence point
comes first. Or a, c, d, b in which either sequence point could be
considered first.

Click to expand...

There are other possible orders of course. The comma restricts the set
of orderings to ones in which b() comes after a() and comes d() after
c().

I think your example would be simpler without the function calls since
function calls also add sequence points! In

(e1, e2) + (e3, e4)

there is indeed no way to tell which sequence point "happens" before the
other. Most people interpret the current wording to mean that if there
is /any/ otherwise valid order for executing the expression that
violates this rule, then the expression is undefined. For example:

(i=1, j=2) + (j=3, i=4)

can be evaluated in an order that avoid UB (i=1 -- j=3 -- SP1 -- j=2 --
SP2 -- i=4 -- ADD) but it can also be evaluated in an order that does
violate the current wording: i=1 -- SP1 -- j=2 -- j=3 -- SP2 -- i=4 --
ADD. Since the standard does not specify any order for the
sub-expressions of +, this seems to be the only sane way to interpret
the rule. To excuse the above expression because one of the many orders
of execution is not undefined would be absurd.

I think (but I am not sure) that the new wording makes this explicitly
undefined and that this sort of expression is one reason for the change.

The new wording does give an unambiguous conclusion that this
example is undefined behavior. I don't know if I would say
"explicitly" since the conclusion relies on deductions
about what orderings are possible, however there is an explicit
statement of undefined behavior under certain circumstances,
and those circumstances can be shown to occur in the given
example. More specifically, because (for example) 'j=2'
is not sequenced before 'j=3', and 'j=3' is not sequenced
before 'j=2' [*], the two assignments to j are unsequenced,
and this condition is explicitly undefined behavior.

[*] The arguments that each assignment is not sequenced
before the other are however not constructive arguments.
These assignments are unsequenced only because there is
no "proof" under the logical rules of the Standard that
one is sequenced before the other, and that makes them
unsequenced.

cmelliso · Jul 9, 2010

Consider the multiple assignment

p = (p->x = q);

with redundant parentheses added to emphasize grouping. Under
the C99 wording, it's debatable whether the behavior for this
statement is defined or undefined. In particular, does C99 mean
to allow that 'p' may be assigned into before 'p->x' is
evaluated?

Is your example meant to be undefined or not? My understanding is
that assuming it's not the case that &(p->x) == &p, or something
similar, then the expression is defined because there are no two
writes to the same memory location, and the read of p required in the
evaluation of the (p->x = q) must come before the write to p.

In some sense, the sequencing relation precisely describes "must
happen before" and "may happen before".

Eric Sosman · Jul 10, 2010

No read of p is required in the evaluation of the (p->x = q),
in order to write to p.

The value which is assigned to p, is q.

But which p's x is written? The "before" p, or the "after?"

struct thing { struct thing *x; };
struct thing s1 = { NULL }, s2 = { NULL };
struct thing *p, *q;
p = &s1;
q = &s2;
p = p->x = q;
assert (s1.x == q); // does this fail?
assert (s2.x == q); // ... or this?

(Actually, since the behavior is undefined it is not certain that
either assert() will fire; anything at all might have happened.)

Tim Rentsch · Jul 10, 2010

cmelliso said:
Is your example meant to be undefined or not? My understanding is
that assuming it's not the case that &(p->x) == &p, or something
similar, then the expression is defined because there are no two
writes to the same memory location, and the read of p required in the
evaluation of the (p->x = q) must come before the write to p.

The question is: Is the read access of variable p in 'p->x'
performed "to determine the value to be stored" in the assignment
'p = ...'? If the read is done /not/ as part of determining the
value to be stored, the behaviour would then be undefined.
(Indeed there have already been a couple of followups saying that
this statement exhibits undefined behavior.) The problem is, the
existing wording isn't clear enough; an argument can be made
either way. Personally I believe that this example conforms to
both the letter and the spirit of the wording of the requirement
about being read to determine the value to be stored, but other
people hold different views, and unfortunately the existing
wording admits both interpretations -- either is plausible. That
ambiguity is what I was trying to get at in my earlier posting.

In some sense, the sequencing relation precisely describes "must
happen before" and "may happen before".

Yes, the new wording using the "sequenced before" relation makes
it clear that 'p = p-x = q;' is defined. Larry Jones has made
the comment that this result was intended all along, from which
I infer that part of the impetus for the newer wording is to
clear up such questions.

cmelliso · Jul 10, 2010

Yes, the new wording using the "sequenced before" relation makes
it clear that 'p = p-x = q;' is defined. Larry Jones has made
the comment that this result was intended all along, from which
I infer that part of the impetus for the newer wording is to
clear up such questions.

I see now. It was clear to me because I was thinking in terms of the
sequencing relation all along. Because the evaluation of operands is
sequenced before the evaluation of an operator, p must be evaluated in
(p->x = q) before the _=_ itself is evaluated, despite the fact that
you know "know" what the result will be without looking at the LHS.
And of course, that _=_ has to be evaluated before the top _=_ can be
evaluated.

sequence points and printf()	17	Dec 1, 2011
sequence points in subexpressions	114	Dec 13, 2009
sequence points	5	Aug 17, 2008
Sequence point.	10	Sep 30, 2013
sequence points and evaluation order	9	Sep 4, 2006
sequence points	10	Oct 14, 2007
Sequence points	4	Jul 23, 2003
Creating sequence points using macros	2	Mar 28, 2007

sequence points and expression evaluation

cmelliso

Eric Sosman

Paul N

Ben Bacarisse

Ben Bacarisse

Tim Rentsch

Tim Rentsch

cmelliso

Eric Sosman

Tim Rentsch

cmelliso

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads