Ambiguity in semantics of assignments?

Paul Steckler

Here's some code that's giving me differing results, depending
on the compiler.

[includes omitted]

typedef foo {
int A,B;
} FOO;

int main() {
int A;
FOO foo;

A = 57;
foo.A = 42;

A = foo.A += A; /* crucial statement */

printf("A = %d; foo.A = %d\n",A,foo.A);

return 0;
}

Using Cygwin gcc 3.3.3, I get 99 for both A and foo.A.
Using a different compiler, I get 84 for both A and foo.A.

The result given by gcc is what I expected. Is there any
ambiguity in the semantics of the crucial statement here?

-- Paul
 
Dave Vandervies

Here's some code that's giving me differing results, depending
on the compiler.

[includes omitted]

typedef foo {
int A,B;
} FOO;

int main() {
int A;
FOO foo;

A = 57;
foo.A = 42;

A = foo.A += A; /* crucial statement */

printf("A = %d; foo.A = %d\n",A,foo.A);

return 0;
}

Using Cygwin gcc 3.3.3, I get 99 for both A and foo.A.
Using a different compiler, I get 84 for both A and foo.A.

(Which compiler?)

The result given by gcc is what I expected. Is there any
ambiguity in the semantics of the crucial statement here?

I don't see any.

The first suspect is invoking undefined behavior by both changing
an object's value and using its value for something other than
calculating the new value to be stored between sequence points (N869
6.5#2), but you're not doing that here.

Is the code you posted, plus #including <stdio.h> for the printf,
a complete program that demonstrates the problem?
If not, there's probably something in the part you cut out that's
introducing an ambiguity; find it and fix it. If it is, especially if
you get different behavior by using two int variables directly instead
of wrapping them in a struct, you may have encountered a compiler bug.


dave
 
E. Robert Tisdale

Paul said:
Here's some code that's giving me differing results,
depending on the compiler.
> cat foo.c
#include <stdio.h>

typedef struct Foo {
int A, B;
} Foo;

int main() {
int A;
Foo foo;

A = 57;
foo.A = 42;

A = foo.A += A; // crucial statement

printf("A = %d\tfoo.A = %d\n", A, foo.A);

return 0;
}
> gcc -Wall -std=c99 -pedantic -o foo foo.c
> ./foo
A = 99 foo.A = 99

Using Cygwin gcc 3.3.3, I get 99 for both A and foo.A.
Using a different compiler, I get 84 for both A and foo.A.

The result given by gcc is what I expected. Is there any
ambiguity in the semantics of the crucial statement here?

Nope.
 
Eric Sosman

Paul said:
Here's some code that's giving me differing results, depending
on the compiler.

[includes omitted]

typedef foo {
int A,B;
} FOO;

int main() {
int A;
FOO foo;

A = 57;
foo.A = 42;

A = foo.A += A; /* crucial statement */

printf("A = %d; foo.A = %d\n",A,foo.A);

return 0;
}

Using Cygwin gcc 3.3.3, I get 99 for both A and foo.A.
Using a different compiler, I get 84 for both A and foo.A.

The result given by gcc is what I expected. Is there any
ambiguity in the semantics of the crucial statement here?

The statement is worse than ambiguous; it's meaningless
because it resides in a non-C source. Once the compiler
issues the required diagnostic it's free to go ahead and
translate the non-C program, but the Standard no longer
governs what happens when and if the program is executed.

Things would have been different if there had been a
`struct' or `union' keyword right after the `typedef', or
if `foo' had been #define'd as `struct' or `union' itself.
Had that been the case, there would have been no ambiguity,
the Standard would have remained in force, and both outputs
would have been 99.

But as the code stands, both compilers are "right" --
provided they've issued that diagnostic, of course.

(You *did* cut and paste the actual code, didn't you?
Surely you wouldn't have been foolhardy enough to post an
inaccurate paraphrase and then ask for diagnosis of a
fine point of language law, would you? No, I'm sure you
wouldn't have done anything *that* silly -- but then again,
it's pretty silly to torment your compilers with syntax
errors and then try to make sense of the results ...)
 
E. Robert Tisdale

Eric said:
Paul said:
Here's some code that's giving me differing results, depending on the
compiler.

[includes omitted]

typedef foo {
int A,B;
} FOO;

int main() {
int A;
FOO foo;

A = 57;
foo.A = 42;

A = foo.A += A; /* crucial statement */

printf("A = %d; foo.A = %d\n",A,foo.A);

return 0;
}

Using Cygwin gcc 3.3.3, I get 99 for both A and foo.A.
Using a different compiler, I get 84 for both A and foo.A.

The result given by gcc is what I expected. Is there any ambiguity in
the semantics of the crucial statement here?


The statement is worse than ambiguous; it's meaningless
because it resides in a non-C source. Once the compiler
issues the required diagnostic it's free to go ahead and
translate the non-C program, but the Standard no longer
governs what happens when and if the program is executed.

Things would have been different if there had been a
`struct' or `union' keyword right after the `typedef', or
if `foo' had been #define'd as `struct' or `union' itself.
Had that been the case, there would have been no ambiguity,
the Standard would have remained in force, and both outputs
would have been 99.

But as the code stands, both compilers are "right" --
provided they've issued that diagnostic, of course.

(You *did* cut and paste the actual code, didn't you?
Surely you wouldn't have been foolhardy enough to post an
inaccurate paraphrase and then ask for diagnosis of a
fine point of language law, would you? No, I'm sure you
wouldn't have done anything *that* silly -- but then again,
it's pretty silly to torment your compilers with syntax
errors and then try to make sense of the results ...)

Eric Sosman is trying to be clever and funny.
He is saying that the code above is *not* even a C program
and that subscribers to comp.lang.c should not be expected
to make any intelligent comments about it.

Paul Steckler should have tested and included a complete
standard-compliant program that we could test ourselves.
I "guessed" at what Paul meant in my reply to his original posting
but I would *not* expect any other subscriber to do the same.
 
steck

Eric said:
You *did* cut and paste the actual code, didn't you?

OK, you got me. My error, and my apologies.

Here's the full program I tried (cut and paste, albeit in two gulps).
It contains a bit more code:

--
#include <stdio.h>

typedef struct _s_ {
unsigned int A,B;
} ST;

int main(int argc,char **argv) {
ST st;
unsigned long A,B;

st.A = 42;
st.B = 18;

A = 57;
B = 39;

// clever code
A = st.A += A;
B = st.B += B;

puts("Clever code fails");
printf("A and st.A should be %d; A = %d, st.A = %d\n",42 + 57,A,st.A);
printf("B and st.B should be %d; B = %d, st.B = %d\n",18 + 39,B,st.B);

// same init values
st.A = 42;
st.B = 18;

A = 57;
B = 39;

// unclever version sequentializes assignments
st.A += A;
A = st.A;
st.B += B;
B = st.B;

puts("Unclever code succeeds");
printf("A and st.A should be %d; A = %d, st.A = %d\n",42 + 57,A,st.A);
printf("B and st.B should be %d; B = %d, st.B = %d\n",18 + 39,B,st.B);

return 0;
}
--

The other compiler was Green Hills Multi 4 for the PowerPC.
Green Hills has tentatively acknowledged the issue as a bug.

While I believe there's no ambiguity in the nested
assignments, can anyone cite relevant sections of
the C90 or C99 standards to support that belief?

-- Paul
 
Arthur J. O'Dwyer

OK, you got me. My error, and my apologies.

Yup, you /always/ need to give the actual code you're asking about.
Otherwise, the error could be anywhere... such as in the part you
forgot to include! (In this case, it wasn't... but it could have
been.)

typedef struct _s_ {

'_s_' is a terrible name for anything in C, because many names
beginning with an underscore are reserved for the implementation.
I don't think '_s_' itself is reserved, but in this case you could
have avoided the whole debate by leaving it blank.

typedef struct {
unsigned int A,B;
} ST;

int main(int argc,char **argv) {
ST st;
unsigned long A,B;

Learn to indent your code by putting spaces in front of certain
statements to indicate nesting. It'll save you many hassles in the
future. Also, note that 'argc' and 'argv' are not used here, so it
would be appropriate to write

int main(void)
{

instead.
st.A = 42;
st.B = 18;

A = 57;
B = 39;

// clever code
A = st.A += A;
B = st.B += B;

Both of these lines invoke undefined behavior, AFAICT. Look in
section "6.5.16 Assignment operators" of N869, and consider the
following valid interpretations of the line 'A = st.A += A;'

(1)
Sequence point
Compute st.A + A
Update value of object st.A with that result
Update value of object A with value of st.A
Sequence point

(2)
Sequence point
Compute result of st.A += A (which is st.A + A)
Update value of object A with that result
Compute st.A + A
Update value of object st.A with that result
Sequence point

Now, I'm not a (very good) language lawyer, but I would certainly
consider this undefined behavior, rather than a compiler bug. It's
definitely a construct that a good programmer would never use in
serious code, though, so I can't say I find it a life-or-death issue. ;-)

-Arthur
 
steck

Arthur said:
Learn to indent your code by putting spaces in front of certain
statements to indicate nesting. It'll save you many hassles in the
future.

Ummm, blame that one on the Google Groups beta submission form.
My code was beautifully indented.

It's definitely a construct that a good programmer would never use in
serious code, though, so I can't say I find it a life-or-death issue.
;-)

The issue was discovered when compiling the OpenSSL sources.

-- Paul
 
Mark McIntyre

'_s_' is a terrible name for anything in C, because many names
beginning with an underscore are reserved for the implementation.
I don't think '_s_' itself is reserved, but in this case you could
have avoided the whole debate by leaving it blank.

7.1.3 Reserved identifiers
Each header declares or defines all identifiers listed in its associated
subclause, and optionally declares or defines identifiers listed in its
associated future library directions subclause and identifiers which are
always reserved either for any use or for use as file scope identifiers.

- All identifiers that begin with an underscore and either an uppercase
letter or another underscore are always reserved for any use.

- All identifiers that begin with an underscore are always reserved for
use as identifiers with file scope in both the ordinary and tag name
spaces.
 
Dave Vandervies

Both of these lines invoke undefined behavior, AFAICT. Look in
section "6.5.16 Assignment operators" of N869, and consider the
following valid interpretations of the line 'A = st.A += A;'

(1)
Sequence point
Compute st.A + A
Update value of object st.A with that result
Update value of object A with value of st.A
Sequence point

(2)
Sequence point
Compute result of st.A += A (which is st.A + A)
Update value of object A with that result
Compute st.A + A
Update value of object st.A with that result
Sequence point

Now, I'm not a (very good) language lawyer, but I would certainly
consider this undefined behavior, rather than a compiler bug.

Nope. Assignment operators associate right to left, so this parses as
`A = (st.A += A)'. The `st.A+=A' part has a well-defined value that
depends only on the old values of A and st.A, and that's what should be
stored in A (and also st.A).
(The value of A can't be accessed after it's been updated, since the
old value is required to determine the new value.)

So the order of operations is:
sequence point
compute st.A+A (required for both of the following)
unordered
{
update value of st.A with result
update value of A with result
}
sequence point
It's
definitely a construct that a good programmer would never use in
serious code, though, so I can't say I find it a life-or-death issue. ;-)

I'd say that depends on what the serious code is supposed to be doing.
Generally the only good reason to use expressions with multiple
side-effects is if they maintain an invariant of some kind (the canonical
example is `foo[i++]=value', to maintain the invariant `i indexes the
first empty position in foo[]'); if that's what's happening here, then
it makes sense to keep it all in one expression.
On the other hand, given that it appears to be an easy way to work around
a compiler bug, that's a good reason to split it up here.


dave
 
Chris Barts

Ummm, blame that one on Google groups beta submission form.
My code was beautifully indented.

If the Google groups submission form is eating initial space characters,
it's buggy and you'd do well to get your own news client and find a free
server (if your ISP doesn't offer one).

Plus, a good news client implements killfiling and other useful functions
that, so far as I can tell, are missing from the Google web-based Usenet
interface.
The issue was discovered when compiling the OpenSSL sources.

Not good. I would regard this as a bug in the sources, and I'd be tempted
to send in a patch to the maintainers. It's bad practice even if gcc does
the expected thing when it encounters it, and in a security-conscious
application like OpenSSL that is inexcusable.
 
steck

Dave said:
So the order of operations is:
sequence point
compute st.A+A (required for both of the following)
unordered
{
update value of st.A with result
update value of A with result
}
sequence point

This corresponds with my understanding, indicating that
the result of the statement is determinate.

[...] I can't say I find it a life-or-death issue. ;-)

I'd say that depends on what the serious code is supposed to be
doing.

Because it's used in OpenSSL, it's protecting your credit card number,
if not your life. :)
-- Paul
 
Old Wolf

Is there any ambiguity in the semantics of the crucial
statement here?

I don't see any.

The first suspect is invoking undefined behavior by both changing
an object's value and using its value for something other than
calculating the new value to be stored between sequence points (N869
6.5#2), but you're not doing that here.

I disagree. Assignment operators do not create sequence points.
There is clearly a read and a write of A with no intervening
sequence point. The only question is whether this case is
covered by the provision in 6.5#2 (see below for more
discussion of this).

If the statement were: A = B += C, then the shortest set
of operations is clearly:
- add C into B
- write B to A
But some CPUs do not even have an instruction for
(add contents of one address into contents of another address),
and those that do may find it less efficient than register
accumulation.
So to do the steps in that order, a compiler might have to go:
- write B to register R
- add C into R
- write R to B
- write R to A
If A is a register then the optimal set of steps would be:
- write B to A
- add C into A
- write A to B
(clearly better than the 4-step sequence above).

This way would result in A = 84 and foo.A = 84.
For the OP, we could try and see if that is what the compiler
did:
1) What is the value of 'd': d = (A = foo.A += A);
2) Does it make a difference if you force A to not be a register:
int *p = &A; (before the assignment)
3) Does it make a difference if foo.A is a register?
(to do this you probably will need to use a
"register int b;" instead of foo.A)
4) Can you paste the assembly generated for the assignment?

Now, I don't fully grok the wording of the relevant
standard passage:

[#2] Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.60)

But it seems to me that A is being read to determine the
value of foo.A, not to determine the new value of A.
Therefore it is UB.
 
Dave Vandervies

I disagree. Assignment operators do not create sequence points.

There's no need for them to do so here.
There is clearly a read and a write of A with no intervening
sequence point. The only question is whether this case is
covered by the provision in 6.5#2 (see below for more
discussion of this).
[reordered]
Now, I don't fully grok the wording of the relevant
standard passage:

[#2] Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.60)

But it seems to me that A is being read to determine the
value of foo.A, not to determine the new value of A.
Therefore it is UB.

The value assigned to A is the value of the expression `foo.A+=A'.
This value depends on the value read from A, so the access to A is used
to determine the value stored in A. The fact that the read access to
A is also used to determine the new value of foo.A (and that this value
is the same as the new value of A) is incidental.


If the statement were: A = B += C, then the shortest set
of operations is clearly:
- add C into B
- write B to A
But some CPUs do not even have an instruction for
(add contents of one address into contents of another address),
and those that do may find it less efficient than register
accumulation.
So to do the steps in that order, a compiler might have to go:
- write B to register R
- add C into R
- write R to B
- write R to A
If A is a register then the optimal set of steps would be:
- write B to A
- add C into A
- write A to B
(clearly better than the 4-step sequence above).

...except that if A and C are the same register it gives incorrect
results, by erroneously clobbering the old value of A/C before saving it.

If the code were:
D = A + B + C
would you accept this:
-Copy A to D
-Add B into D
-Add C into D
? Would your answer change if B and D were the same register?

This way would result in A = 84 and foo.A = 84.
For the OP, we could try and see if that is what the compiler
did:
1) What is the value of 'd': d = (A = foo.A += A);
2) Does it make a difference if you force A to not be a register:
int *p = &A; (before the assignment)
3) Does it make a difference if foo.A is a register?
(to do this you probably will need to use a
"register int b;" instead of foo.A)
4) Can you paste the assembly generated for the assignment?

The results of this would be helpful in demonstrating that there's a
compiler bug, but it's still a compiler bug.


dave
 
Arthur J. O'Dwyer

Nope. Assignment operators associate right to left, so this parses as
`A = (st.A += A)'. The `st.A+=A' part has a well-defined value that
depends only on the old values of A and st.A, and that's what should be
stored in A (and also st.A).

Correct, except for the UB part.
(The value of A can't be accessed after it's been updated, since the
old value is required to determine the new value.)

Chapter and verse, please.
So the order of operations is:
sequence point
compute st.A+A (required for both of the following)
unordered
{
update value of st.A with result
update value of A with result

This is /obviously/ wrong, since by the second "result" you either mean
the result of st.A+A, which is /not/ necessarily the value stored in A (it
gets the result of st.A+=A, which may have a different type and even a
different value, depending on the types of A and st.A); or else you mean
the result of updating st.A with st.A+A, in which case you're mistaken
about the "unordered" part.
}
sequence point

-Arthur
 
Dave Vandervies

Correct, except for the UB part.


Chapter and verse, please.

"Can't be" is perhaps too strong. The implementation is required to
produce code that acts as if it isn't, though.

(All quotes from N869)

6.5.16#1 gives the syntax for assignment operators:
assignment-expr:
conditional-expr
unary-expr assignment-operator assignment-expr

assignment-operator: one of
= *= /= %= += -= <<= >>= &= ^= |=

So `st.A+=A' is the right operand of '=' in the expression `A=st.A+=A'.

6.5.16#3 (semantics of assignment operators):
[#3] An assignment operator stores a value in the object
designated by the left operand. An assignment expression
has the value of the left operand after the assignment, but
is not an lvalue. The type of an assignment expression is
the type of the left operand unless the left operand has
qualified type, in which case it is the unqualified version
of the type of the left operand. The side effect of
updating the stored value of the left operand shall occur
between the previous and the next sequence point.

So we're storing a value in A. What's that value?

6.5.16.1#2 (semantics for simple assignment):
[#2] In simple assignment (=), the value of the right
operand is converted to the type of the assignment
expression and replaces the value stored in the object
designated by the left operand.

So it's the value of the right operand, `st.A+=A', converted appropriately
(in this case, no conversion, since both st.A and A are ints).
What's that value?

6.5.16.2#3 (semantics for compound assignment):
[#3] A compound assignment of the form E1 op= E2 differs
from the simple assignment expression E1 = E1 op (E2) only
in that the lvalue E1 is evaluated only once.

So (with a stop back at 6.5.16#3 where the value of the expression is
defined to be the value stored in the left operand) 6.5.16.2#3 applies:
It's the new value of st.A, which is the value of `st.A + (A)' evaluated
before the values of the objects are updated[%], converted appropriately.

So the value stored in A is required to be the sum of the old value of
st.A and the old value of A, converted appropriately (to the type of
st.A and then to the type of A).

[%] I'm assuming without proof that `a=a+b' requires that the old value
of a be used as an operand to the addition. If you really want me
to, I can dig up chapter and verse for this too, but hopefully at
least this much is clear.

None of this violates 6.5#2:
[#2] Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.60) *

The prior values of both A and st.A are accessed only to determine the
value to be stored in st.A, and the value to be stored in A is determined
directly from this (and indirectly from the old values) without further
access to the objects.


So since everything can be reduced to storing well-defined values into
objects, and the non-store accesses are used only to determine those
values to be stored, the entire expression is well-defined and has
well-defined side effects.

This is /obviously/ wrong, since by the second "result" you either mean
the result of st.A+A, which is /not/ necessarily the value stored in A (it
gets the result of st.A+=A, which may have a different type and even a
different value, depending on the types of A and st.A);

In the code under discussion, both were ints, so no conversions were
necessary. But since we like pedantic nitpickery here:

The order of operations is:
sequence point
compute st.A+A (call it "result", for notational convenience)
unordered
{
store "result", converted appropriately, into st.A
store "result", converted appropriately, into A
(Note that this involves two (possibly no-op) conversions, to type
of st.A and then to type of A)
}
sequence point

or else you mean
the result of updating st.A with st.A+A, in which case you're mistaken
about the "unordered" part.

I'm trying and failing to see how that interpretation could be considered
sensible enough to not be immediately discarded in favor of the more
sensible alternative, even by the most pedantic nitpicker.


dave
 
Mark F. Haigh

Here's some code that's giving me differing results, depending
on the compiler.

A = foo.A += A; /* crucial statement */

The result given by gcc is what I expected. Is there any
ambiguity in the semantics of the crucial statement here?

No, there's not. 6.5.16:

3 An assignment operator stores a value in the object designated
by the left operand. An assignment expression has the value of
the left operand after the assignment, but is not an lvalue.
The type of an assignment expression is the type of the left
operand unless the left operand has qualified type, in which
case it is the unqualified version of the type of the left
operand. The side effect of updating the stored value of the
left operand shall occur between the previous and the next
sequence point.


The value of the += in 'foo.A += A' is well defined, even though the
state of the foo.A object itself is indeterminate until the next
sequence point. It is the value of the +=, not the value of foo.A,
that should be stored in A. This may be where the compiler gets it
wrong.

As you noted in another post, this is a compiler bug, and from the
looks of it, a relatively severe one. Hmmm.


Mark F. Haigh
 
j0mbolar

Nope. Assignment operators associate right to left, so this parses as
`A = (st.A += A)'. The `st.A+=A' part has a well-defined value that
depends only on the old values of A and st.A, and that's what should be
stored in A (and also st.A).
(The value of A can't be accessed after it's been updated, since the
old value is required to determine the new value.)

What I am unsure about is what the following means:

"The order of evaluation of the operands is unspecified.
If an attempt is made to modify the result of an assignment
operator or to access it after the next sequence point,
the behavior is undefined."

Which is 6.5.16#4

How can you modify the result of an assignment operator?
I don't think this is possible.

e.g. int i = 10;
int b;

b = ++i = i;

is how I interpret that.

additionally,
"to access the result of an assignment
/after/ the next sequence point is undefined."

That doesn't seem to make any sense.

e.g. int i = 10;
<before next sequence point>
i = i + 10;
<next sequence point>
i++; /* accessed the result of an
assignment after the next sequence point */


also, while on the topic of sequence points,

what happens with something like this:

foo = a + func(++a); ?
 
