possibly undefined behavior

B

BartC

Peter said:
pete said:
I think of

p = p -> next = q;

as my favorite example of undefined behavior...

and mine is:

a = i++; /* UB */


a[a] = 0; /* when a == i */


What's wrong with that?

#include <stdio.h>

int main(void){
int a[5]={0,1,2,3,4};
int i;

for (i=0; i<5; ++i) printf("%d ",a); puts("");

i=2;
a[a]=0;

for (i=0; i<5; ++i) printf("%d ",a); puts("");
}
 
K

Kaz Kylheku

In "p -> next = q", where p is read to determine where to store the
value of q.

So an expression where p is read to determine where to store the value
indicates that p is read /not/ to determine the value to be stored?

I see.
That's not a bad argument, but it implies that the side effect of
storing the value in the target must occur before the result of the
assignment is used.

Actually, no it doesn't.
But then why does the standard say, "The side
effect of updating the stored value of the left operand shall occur
between the previous and the next sequence point."?

Because it's still possible for the object p to be updated first,
and then for p->next to be updated, even if p->next is unambiguously
understood to be based on the prior value of p.

The question of what side effects are generated by the expression, affecting
what objects, is independent of the question of the order in which
those effects actually complete prior to the end of the evaluation
phase.

The implementation can collect information about the side effects into
a set, and then do the set in a random (possibly wholly or partially parallel)
order. In this side effect set, objects which are the targets of effects no
longer identified as p and p->next, but by their concrete identities (like
machine addresses).
Yes, that's
strictly consistent with what you say, but if that's the intent it's
an odd way to express it.

Consider:

int x, y;
x = y = 3;

Both x and y have the value 3 stored in them, but this can occur in
either order.

Sure in the actual semantics.

But remember that actual semantics can obliterate sequence points as well.

Loop unrolling, function inlining, making entire objects disappear, etc.
The result of "y = 3", and therefore the value stored
in x, is "the value of [y] after the assignment", which seems to imply
an ordering constraint, but I don't think it's intended to.

I would say that using words like ``before'' and ``after'' is a curious way of
showing that non-intent of ordering. :)
The value
stored in x is 3; the value of y after the assignment is 3. 3 is 3,
which satisfies the requirement.

Any actual semantics that sticks 3 into x and y satisfies the requirement.

These objects might even disappear completely; when x and y later
accessed in same basic block of code, this can turn into instructions
that contain 3 as an immediate operand, because the compiler sees
that these variables can have no other live values at that point.
N1362, the pre-C201x draft, re-words the section, but I don't think it
resolves the issue:

An assignment operator stores a value in the object designated by
the left operand. An assignment expression has the value of the
left operand after the assignment, but is not an lvalue. The type
of an assignment expression is the type of the left operand unless
the left operand has qualified type, in which case it is the
unqualified version of the type of the left operand. The side
effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right
operands. The evaluations of the operands are unsequenced.

This makes a clear case for the well-definedness of

p = p->next = q.

The main constituent of this expression is the p assignment. The draft says
that the value computations of the the left and right sides, i.e.
of p and of p->next = q, are sequenced before the completion of the assignment.

The value computation of p->next must pin down what p->next is, and in
so doing, establish the identity of the object where to store q. This
happens before p is modified, and so must use the prior value of p.

What we don't know is whether p is updated first or p->next is updated
first, but we do know that this updated p->next is based off the old p;
i.e. we are talking about p'->next where p' means old p.
But note that it doesn't say that any use of the result is sequenced
after the side effect of updating the stored value.

Right. The update of the object p'->next could well be the absolute last
thing that happens prior to the next sequence point.
 
K

Kaz Kylheku

Peter said:
I think of

p = p -> next = q;

as my favorite example of undefined behavior...

and mine is:

a = i++; /* UB */


a[a] = 0; /* when a == i */


What's wrong with that?


Absolutely nothing. The assignment cannot take place until it is
known where the value is going to be stored. The store
cannot interfere with the calculation which computed where
the store went! Simple causality.
 
B

Ben Bacarisse

Kaz Kylheku said:
Peter said:
I think of

p = p -> next = q;

as my favorite example of undefined behavior...

and mine is:

a = i++; /* UB */

a[a] = 0; /* when a == i */


What's wrong with that?


Absolutely nothing. The assignment cannot take place until it is
known where the value is going to be stored. The store
cannot interfere with the calculation which computed where
the store went! Simple causality.


i = 3;
a = 3;
a[a] = 0;

There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".

I imagine you know this, so presumably your point is that the above
should not be UB; or do we simply have different readings of section
6.5.2?
 
B

BartC

Ben Bacarisse said:
Kaz Kylheku said:
Peter Nilsson wrote:
I think of

p = p -> next = q;

as my favorite example of undefined behavior...

and mine is:

a = i++; /* UB */

a[a] = 0; /* when a == i */

What's wrong with that?


Absolutely nothing. The assignment cannot take place until it is
known where the value is going to be stored. The store
cannot interfere with the calculation which computed where
the store went! Simple causality.


i = 3;
a = 3;
a[a] = 0;


The last line is equivalent to:

*(a+*(a+i)) = 0;

and I still don't see a problem. Nothing is being modified until the outer
*(...).

The only possibility is if, for some reason, the compiler decides to
evaluate *(a+i) twice, and uses a temporary value p = a+i, then overwrites
the *(a+*p) with something, then puts the final value (0) in *(a+*p), where
*p now contains that something. But that would be weird.
There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".

I imagine you know this, so presumably your point is that the above
should not be UB; or do we simply have different readings of section
6.5.2?
 
J

James Kuyper

BartC said:
Ben Bacarisse said:
Kaz Kylheku said:
Peter Nilsson wrote:
I think of

p = p -> next = q;

as my favorite example of undefined behavior...

and mine is:

a = i++; /* UB */

a[a] = 0; /* when a == i */

What's wrong with that?

Absolutely nothing. The assignment cannot take place until it is
known where the value is going to be stored. The store
cannot interfere with the calculation which computed where
the store went! Simple causality.


i = 3;
a = 3;
a[a] = 0;


The last line is equivalent to:

*(a+*(a+i)) = 0;

and I still don't see a problem. Nothing is being modified until the outer
*(...).


Multiple modification is not the issue.
There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".

The issue is that the value a[3] is being modified, and is also being
read to determine which element of a should be modified, not just "to
determine the value to be stored"; something that the standard
explicitly says is undefined behavior.

Do you disagree that the prior value of a[3] is being read? Do you claim
that it is being read to determine the value to be stored? Or are you
saying that you don't understand why the standard says that the behavior
is undefined?
 
B

BartC

James Kuyper said:
BartC said:
i = 3;
a = 3;
a[a] = 0;


The last line is equivalent to:

*(a+*(a+i)) = 0;

and I still don't see a problem. Nothing is being modified until the
outer
*(...).


Multiple modification is not the issue.
There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".

The issue is that the value a[3] is being modified, and is also being read
to determine which element of a should be modified, not just "to determine
the value to be stored"; something that the standard explicitly says is
undefined behavior.

Do you disagree that the prior value of a[3] is being read? Do you claim
that it is being read to determine the value to be stored? Or are you
saying that you don't understand why the standard says that the behavior
is undefined?


I've now seen the paragraph at 6.5#2 (not 6.5.2). Yes it states prior values
cannot be accessed except to get the new value, but gives no good reason and
a very poor example.

I'd still like to see an example where a=i; a[a]=0; gives a problem.
 
K

Kaz Kylheku

BartC said:
a[a] = 0;


The last line is equivalent to:

*(a+*(a+i)) = 0;

and I still don't see a problem. Nothing is being modified until the outer
*(...).


Multiple modification is not the issue.


BartC didn't say it was. The timing of the modification is the issue,
of course.
There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".

The issue is that the value a[3] is being modified, and is also being
read to determine which element of a should be modified, not just "to

Note that is also the case in

a[a[3] = 4] = 0;

a[3] is being modified with 4, and it is accessed to determine which element of
a receives the zero, namely a[4].

The standard says so: the value of an assignment expression is that
of the left operand after the assignment. So the value of a[3] = 4
is that of a[3] (but after the assignment). a[3] is modified, and it is
accessed for another purpose, yet there are no additional sequence points
to order any of it.

An even simpler example is:

f(i = i + 1);

The object i is accesed to determine the value to be stored into i,
but not only that, but also to determine the argument to pass to f.
Oops, violation of 6.5, paragraph 2!

Everyone, stop all your work and fix millions of lines of perfectly portable
code!
determine the value to be stored"; something that the standard
explicitly says is undefined behavior.

The standard is defective here, and contradicts not only itself but
the basic logic of causality.

If the object being modified is used to determine the /location/ where
the store is taking place, that establishes causality just as well
as if it is used to determine the value to be stored.

The store cannot take place until both quantities are known: where and what.

Until the store takes place, it cannot have any effect on the value
of any operand being used to compute the where and what.

I take that as an obvious axiom which doesn't need to be spelled out.
Do you disagree that the prior value of a[3] is being read? Do you claim
that it is being read to determine the value to be stored? Or are you
saying that you don't understand why the standard says that the behavior
is undefined?

What is obvious is that BartC can think, and not only regurgitate portions of
an document which constitutes an unsound and incomplete description of a living
language.
 
J

jameskuyper

Kaz said:
BartC said:
a[a] = 0;

The last line is equivalent to:

*(a+*(a+i)) = 0; ....
There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".


The issue is that the value a[3] is being modified, and is also being
read to determine which element of a should be modified, not just "to


Note that is also the case in

a[a[3] = 4] = 0;

a[3] is being modified with 4, and it is accessed to determine which element of
a receives the zero, namely a[4].


The key difference being, of course, that the value being read is not
also the value being modified. It is the combination of reading and
modifying the same value which makes the behavior undefined, unless
the read is for the purpose of determining the value to be written.

.....
Do you disagree that the prior value of a[3] is being read? Do you claim
that it is being read to determine the value to be stored? Or are you
saying that you don't understand why the standard says that the behavior
is undefined?

What is obvious is that BartC can think, and not only regurgitate portions of
an document which constitutes an unsound and incomplete description of a living
language.

All I was doing was trying to clarify whether he misunderstood the
rule, didn't understand that it applied to this code, or disagreed
with the reason for the rule - his comments did not make that clear.

The fact that I know what the standard says doesn't mean that I
interpret it uncritically - far from it - this newsgroup and
comp.std.c are filled with posts in which I criticize various features
of the standard. I don't defend this particular rule; I used to find
it very confusing; nowadays I've got a much better understanding of
it, so I'm now only moderately confused by it.

The Rationale is silent on this issue. People who were on the
committee have described the reasons for it; my understand of their
explanation is as follows: they had a definite desire to make things
simple for implementors by not requiring them to handle certain
problematic code constructs. On the other hand, they wanted developers
to be able to rely upon such unproblematic constructs as i = i+1. They
tried to find a simple rule that clearly distinguished the two types
of code, and failed. Therefore, they settled for a simple rule that
excludes all of the problematic cases, but also includes some other
unproblematic cases.
 
L

lawrence.jones

Kaz Kylheku said:
Peter said:
a[a] = 0; /* when a == i */


What's wrong with that?


Absolutely nothing.


Not in reality, but it runs afoul of the Standard's proviso that the
previous value of an object being modified can only be read to determine
the new value to be stored; in this case, it's being read to determine
the location to store the new value instead. The Standard's proviso is
overly broad, but it's the best we could do in a reasonably short
English statement. Clive Feather's Formal Model of Sequence Points
(which has never been adopted by the committee) did, indeed, make this
case well-defined. I'm not sure whether the adjustments that are being
made to the sequencing rules to support threads will change the status
quo or not.
 
T

Tim Rentsch

pete said:
I think of

p = p -> next = q;

as my favorite example of undefined behavior resulting
from the value of p being read with the result not being used
to determine the value to be stored.

It's debatable whether the existing wording leaves the behavior in
this case undefined or not. However, the next standard draft
(n1362, the paragraphs quoted in Keith Thompson's posting in this
thread) resolves the question and makes the behavior here
well-defined.
 
T

Tim Rentsch

Ben Bacarisse said:
pete said:
Keith Thompson wrote:

I think of

p = p -> next = q;

as my favorite example of undefined behavior resulting
from the value of p being read with the result not being used
to determine the value to be stored.

and mine is:

a = i++; /* UB */

i is used to determine where the value is stored rather than what
value to store.


To me this example seems no different than 'b = i + i++' -- it's
still just reading and writing a single variable in independent
subexpressions.
 
T

Tim Rentsch

Peter Nilsson said:
pete said:
I think of

p = p -> next = q;

as my favorite example of undefined behavior...

and mine is:

a = i++; /* UB */


a[a] = 0; /* when a == i */


Here again, regardless of whether it was intended that
this case be defined under the current wording, the
next draft standard n1362 makes the behavior here
well-defined.
 
T

Tim Rentsch

Kaz Kylheku said:
BartC said:
a[a] = 0;

The last line is equivalent to:

*(a+*(a+i)) = 0;

and I still don't see a problem. Nothing is being modified until the outer
*(...).


Multiple modification is not the issue.


BartC didn't say it was. The timing of the modification is the issue,
of course.
There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".

The issue is that the value a[3] is being modified, and is also being
read to determine which element of a should be modified, not just "to

Note that is also the case in

a[a[3] = 4] = 0;

a[3] is being modified with 4, and it is accessed to determine which element of
a receives the zero, namely a[4].


Please note the wording of 6.5.16 p 3. It's clear that the value
stored is the same as the value delivered, but it isn't clear that a
read access of the left operand is required to produce that value
(even in the abstract machine). Yes, I admit the wording in
ambiguous, and also that this is a defect that should be fixed; I'm
just pointing out what might be a loose brick in the reasoning.
As far as that goes, if an assignment requires a read access to
produce its value, even a statement like

b = 0;

would read the variable 'b' "other than to determine the value
to be stored".
 
T

Tim Rentsch

jameskuyper said:
Kaz said:
BartC wrote:
a[a] = 0;

The last line is equivalent to:

*(a+*(a+i)) = 0; ...
There are sequence points at the semicolons. Between the last two,
the object a[3] is (as required) modified only once, however the prior
value of a[3] is not "read only to determine the value to be stored".

The issue is that the value a[3] is being modified, and is also being
read to determine which element of a should be modified, not just "to


Note that is also the case in

a[a[3] = 4] = 0;

a[3] is being modified with 4, and it is accessed to determine which element of
a receives the zero, namely a[4].


The key difference being, of course, that the value being read is not
also the value being modified. It is the combination of reading and
modifying the same value which makes the behavior undefined, unless
the read is for the purpose of determining the value to be written.

....
Do you disagree that the prior value of a[3] is being read? Do you claim
that it is being read to determine the value to be stored? Or are you
saying that you don't understand why the standard says that the behavior
is undefined?

What is obvious is that BartC can think, and not only regurgitate portions of
an document which constitutes an unsound and incomplete description of a living
language.

All I was doing was trying to clarify whether he misunderstood the
rule, didn't understand that it applied to this code, or disagreed
with the reason for the rule - his comments did not make that clear.

The fact that I know what the standard says doesn't mean that I
interpret it uncritically - far from it - this newsgroup and
comp.std.c are filled with posts in which I criticize various features
of the standard. I don't defend this particular rule; I used to find
it very confusing; nowadays I've got a much better understanding of
it, so I'm now only moderately confused by it.

The Rationale is silent on this issue. People who were on the
committee have described the reasons for it; my understand of their
explanation is as follows: they had a definite desire to make things
simple for implementors by not requiring them to handle certain
problematic code constructs. On the other hand, they wanted developers
to be able to rely upon such unproblematic constructs as i = i+1. They
tried to find a simple rule that clearly distinguished the two types
of code, and failed. Therefore, they settled for a simple rule that
excludes all of the problematic cases, but also includes some other
unproblematic cases.


<editorial>

and unfortunately potentially allows bogus conclusions for
perfectly sensible constructions, which conclusions any
intelligent person knowledgeable in the ways of programming
would be shocked to discover.

</editorial>
 
T

Tim Rentsch

Kaz Kylheku said:
Peter Nilsson wrote:

a[a] = 0; /* when a == i */

What's wrong with that?


Absolutely nothing.


Not in reality, but it runs afoul of the Standard's proviso that the
previous value of an object being modified can only be read to determine
the new value to be stored; in this case, it's being read to determine
the location to store the new value instead. The Standard's proviso is
overly broad, but it's the best we could do in a reasonably short
English statement. Clive Feather's Formal Model of Sequence Points
(which has never been adopted by the committee) did, indeed, make this
case well-defined. I'm not sure whether the adjustments that are being
made to the sequencing rules to support threads will change the status
quo or not.


The wording in the current draft standard (meaning, n1362)
makes the behavior for this case well-defined. I don't
know if that wording has anything to do with thread support.
 
T

Tim Rentsch

Jack Klein said:
This last sentence above was the silly statement.


Actually, what I said makes a weird kind of sense, in a dumb way. It
just has no relationship to reality.

If there were NO sequence points associated with a function call, not
the one after the evaluation of arguments, not any inside the function
itself, not even the one at the terminating ; of the return statement,
then for any non-array type object x:

x = somefunc(x);

...because there would be no sequence points between reading the
current value of x and assigning a new value, therefore nothing
preventing the assignment to x to occur before the function call and
return.

Irrelevant, because x is read here to determine the value
to be stored into x. The presence or absence of any
sequence points related to function call wouldn't
change the well-definedness of such code.
 
T

Tim Rentsch

Keith Thompson said:
Yes and no.

The assignment evaluates the expression "++i" and stores the result in
i, so the result of the expression must be determined before the value
is stored. But the side effect of "++i" is to modify i; that side
effect doesn't need to occur before the assignment modifies i, since
the side effect isn't necessary for determining what the result of
"++i" is going to be.

Using a well-defined example:

j = ++i;

There are several things that must happen here:

(a) Evaluate "j" as an lvalue (i.e., determine its address).
(b) Evaluate "i" to determine its current value.
(c) Determine the result of "++i".
(d) Store the result of "++i" in j (side effect of "=").
(e) Increment i (side effect of "++"").
(f) Determine (and discard) the result of the assignment expression.

Some of these things must occur before other things can happen. For
example, (c) must precede (d). But (e) can occur either before or
after (d); you don't need to modify i to determine what the result of
"++i" is going to be.

In this case, since i and j are separate objects, there's no problem.
In the case of "i = ++i", the two modifications to i are unordered,
and so the behavior is undefined.

The pre-C201X draft:
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1362.pdf>
has a very interesting re-statement of the rules in 6.5 (it helped me
understand what the C90/C99 wording really means):

An _expression_ is a sequence of operators and operands that
specifies computation of a value, or that designates an object or
a function, or that generates side effects, or that performs a
combination thereof. The value computations of the operands of an
operator are sequenced before the value computation of the result
of the operator.

If a side effect on a scalar object is unsequenced relative to
either a different side effect on the same scalar object or a
value computation using the value of the same scalar object, the
behavior is undefined. If there are multiple allowable orderings
of the subexpressions of an expression, the behavior is undefined
if such an unsequenced side effect occurs in any of the orderings.

The grouping of operators and operands is indicated by the
syntax. Except as specified later, side effects and value
computations of subexpressions are unsequenced.

Thank you Keith for posting this excerpt (along with the other
excerpt having to do with assignments, given in another posting).

Please note that the wording in N1362 makes statements like

p = p->next = q;

and

i = 0, a[0] = 0;
a[a] = 0;

be well-defined. So if, as you say, this has helped with an
understanding of what the current wording means, it might help to
re-read the relevant sections in both drafts and consider what
what meaning was intended in the current standard, and whether
the wording in N1362 was intended to change the earlier meaning,
or just to clarify it.
 
T

Tim Rentsch

Kaz Kylheku said:
Not all instances of this can be detected statically. Think about:

(*p) = (*q)++;

This can be well-defined if p and q point to different objects,
but is undefined if they point to the same object.

The values of p and q can vary at run-time, and can be made to depend
on input to the program.

So to issue the diagnostic at translation time, you have to have the input to
the program available, and be prepared to solve the halting problem. :)

But I agree with you. Unspecified orders of evaluation, in a primarily
imperative language, are complete nonsense, and atrociously irresponsible
engineering that partially keeps us in the dark ages.

Rather than inventing misfeatures and then trying to diagnose them,
we should specify the order of everything, so that there is no ambiguity.

There is a religious belief, completely unsubstantiated, that unspecified
evaluation orders are required for the generation of good code.

I can't speak for other people, but my own beliefs in this area are
scientific, not religious; meaning, I am happy to change my belief
if presented with reliable evidence as to the question. Can you
provide a compiler that will generate code for deterministic
order-of-evaluation that's just as good as (say) gcc on X86 does for
non-deterministic order-of-evalution? If so, bring it on. If not,
please refrain from calling other beliefs "religious" just because
they happen to disagree with yours.
 
T

Tim Rentsch

Keith Thompson said:
[snip-snip-snip]
Actually, no it doesn't.

Oh? Then what exactly did you mean by "In this case, the abstract
semantics says that, literally, the value is stored into p->next, and
then the assignment expression's value is derived by accessing the
value of p->next."?

It seems clear (to me anyway) that the two updates can occur in
either order (or arbitrarily interleaved, for that matter).
However, the value being assigned to p is the value of the right
hand side (namely, p->next = q), which in turn is "the value of
the left operand [p->next] after the assignment." Regardless of
when the two updates occur, saying "after the assignment" means
the assignment operation has been evaluated, so p->next must have
been evaluated (the result being one operand of the assignment
here).

Also, to repeat myself, N1362 clearly establishes the behavior
in such statements as well-defined.


[snip]
[snip]
Again, I'm talking about the abstract semantics in which objects don't
disappear.

A concrete example:

#include <stdio.h>

static int x;
static int y;

static int *x_addr(void) { puts("x_addr"); return &x; }
static int *y_addr(void) { puts("y_addr"); return &y; }

int main(void)
{
*x_addr() = *y_addr() = 3;
return 0;
}

The lines "x_addr" and "y_addr" may be printed in either order.

This example isn't really on point since x_addr() and y_addr() can
both be evaluated before either assignment operator has started,
let alone when they may have finished or produced a value.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top