How to understand C FAQ 3.8?

mrby

Hi,

I was reading section 3 of the C FAQ:
http://www.eskimo.com/~scs/C-faq/top.html

I need more information about question 3.8:

==================
3.8: How can I understand these complex expressions? What's a
"sequence point"?

A: A sequence point is a point in time (at the end of the
evaluation of a full expression, or at the ||, &&, ?:, or comma
operators, or just before a function call) at which the dust
has settled and all side effects are guaranteed to be complete.
The ANSI/ISO C Standard states that

Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.

See also question 3.9 below.

References: ISO Sec. 5.1.2.3, Sec. 6.3, Sec. 6.6, Annex C;
Rationale Sec. 2.1.2.3; H&S Sec. 7.12.1 pp. 228-9.
=========================

I understand "a legal expression (i.e: not undefined)" as:
1) Any variable should be modified at most once.
2) The variable modified should not be referenced
"elsewhere" within the expression.

So ... I can tell that the following code is undefined:

a[i] = i++;
//i is modified and referenced "elsewhere"
//see FAQ 3.1.

i++*i++
// i is modified twice!

But how about *p++ ? It is commonly used and should not
be "undefined" behavior, but p is modified and the new
value is accessed.

This conflicts with my understanding and with FAQ 3.2.
It says that postfix ++ merely guarantees that the variable
will be incremented before the expression "finishes", so how
can it guarantee that p is incremented right before the *
access? (i.e., is there a possibility that the * operator accesses
the old value of p?)

I am not sure I have made myself clear.
Could someone explain it for me?

Thanks,
mrby
 
Eric Sosman

mrby said:
[Quotation from the FAQ, itself quoting the Standard:]
Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

[... and here the FAQ's own text resumes:]
The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.

[... and this is mrby]:
So ... I can tell that the following code is undefined:

a[i] = i++;
//i is modified and referenced "elsewhere"
//see FAQ 3.1.

i++*i++
// i is modified twice!

But how about *p++ ? It is commonly used and should not
be "undefined" behavior, but p is modified and the new
value is accessed. [...]


(You mean the "old" value, I'm sure.) The object `p'
is used only once here, as the operand of the `++' operator.
The expression `p++' produces a value, and that value is
the operand of the `*' operator. `p++' is not an object,
just as `p+1' is not an object.

The "only to determine the value to be stored" language
prohibits things like `i++ * i'. The "only one change" rule
would permit this, but since we don't know when the change
actually takes place we don't know what value we'll get for
the right-hand `i'. (Indeed, there's not even a guarantee
that `i' will have *any* value; optimizers may well generate
truly bizarre results from illegal constructs like this.)

If you are scrupulous in making a distinction between
an object and the value that happens to occupy it at the
moment, I think much of your confusion will disappear.
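
To make the object/value distinction concrete, here is a minimal sketch of the usual `*p++` idiom. The function name sum_ints is invented for illustration and does not come from the FAQ or the Standard.

```c
#include <stddef.h>

/* Sum n ints starting at p, using the *p++ idiom.
 * In each iteration the object p is touched exactly once, by ++.
 * The * operator is applied to the value that p++ yields, which is
 * the pointer value p held before the increment. */
int sum_ints(const int *p, size_t n)
{
    int total = 0;

    while (n-- > 0)
        total += *p++;   /* one modification of p, no other access to p */
    return total;
}
```

Called on an array holding 1, 2, 3, 4 with n equal to 4, this returns 10. By contrast, something like `*p++ + *p` would read p a second time for a purpose other than computing the new value of p, which is the case the second sentence of the rule forbids.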
 
Merrill & Michele

"Eric Sosman"
mrby wrote:
[Quotation from the FAQ, itself quoting the Standard:]
Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

[... and here the FAQ's own text resumes:]
The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.

[... and this is mrby]:
So ... I can tell that the following code is undefined:

a[i] = i++;
//i is modified and referenced "elsewhere"
//see FAQ 3.1.

i++*i++
// i is modified twice!

But how about *p++ ? It is commonly used and should not
be "undefined" behavior, but p is modified and the new
value is accessed. [...]


(You mean the "old" value, I'm sure.) The object `p'
is used only once here, as the operand of the `++' operator.
The expression `p++' produces a value, and that value is
the operand of the `*' operator. `p++' is not an object,
just as `p+1' is not an object.

The "only to determine the value to be stored" language
prohibits things like `i++ * i'. The "only one change" rule
would permit this, but since we don't know when the change
actually takes place we don't know what value we'll get for
the right-hand `i'. (Indeed, there's not even a guarantee
that `i' will have *any* value; optimizers may well generate
truly bizarre results from illegal constructs like this.)


Can you re-visit the first sentence of the above paragraph? MPJ
 
Joona I Palaste

Eric Sosman said:
mrby said:
[Quotation from the FAQ, itself quoting the Standard:]
Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

[... and here the FAQ's own text resumes:]
The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.

[... and this is mrby]:
So ... I can tell that the following code is undefined:

a[i] = i++;
//i is modified and referenced "elsewhere"
//see FAQ 3.1.

i++*i++
// i is modified twice!

But how about *p++ ? It is commonly used and should not
be "undefined" behavior, but p is modified and the new
value is accessed. [...]

(You mean the "old" value, I'm sure.) The object `p'
is used only once here, as the operand of the `++' operator.
The expression `p++' produces a value, and that value is
the operand of the `*' operator. `p++' is not an object,
just as `p+1' is not an object.
The "only to determine the value to be stored" language
prohibits things like `i++ * i'. The "only one change" rule
would permit this, but since we don't know when the change
actually takes place we don't know what value we'll get for
the right-hand `i'. (Indeed, there's not even a guarantee
that `i' will have *any* value; optimizers may well generate
truly bizarre results from illegal constructs like this.)
If you are scrupulous in making a distinction between
an object and the value that happens to occupy it at the
moment, I think much of your confusion will disappear.

Actually, if merely modifying an object and accessing its new value
caused undefined behaviour, modifying objects would be pretty damn
near impossible. Look at this code:

int i=0;
i=1;

The second line modifies the object i and then accesses it, returning
its new value 1.

--
/-- Joona Palaste ([email protected]) ------------- Finland --------\
\-------------------------------------------------------- rules! --------/
"Parthenogenetic procreation in humans will result in the founding of a new
religion."
- John Nordberg
 
Flash Gordon

Eric Sosman said:
mrby said:
[Quotation from the FAQ, itself quoting the Standard:]
Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

[... and here the FAQ's own text resumes:]
The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.
The "only to determine the value to be stored" language
prohibits things like `i++ * i'. The "only one change" rule
would permit this, but since we don't know when the change
actually takes place we don't know what value we'll get for
the right-hand `i'. (Indeed, there's not even a guarantee
that `i' will have *any* value; optimizers may well generate
truly bizarre results from illegal constructs like this.)
If you are scrupulous in making a distinction between
an object and the value that happens to occupy it at the
moment, I think much of your confusion will disappear.

Actually, if merely modifying an object and accessing its new value
caused undefined behaviour, modifying objects would be pretty damn
near impossible. Look at this code:

int i=0;
i=1;

The second line modifies the object i and then accesses it, returning
its new value 1.

Is it accessing the value of the object "i" or is it accessing the value
of the expression "i=1" ?
 
Jack Klein

Eric Sosman said:
mrby said:
[Quotation from the FAQ, itself quoting the Standard:]
Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

[... and here the FAQ's own text resumes:]
The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.

[... and this is mrby]:
So ... I can tell that the following code is undefined:

a[i] = i++;
//i is modified and referenced "elsewhere"
//see FAQ 3.1.

i++*i++
// i is modified twice!

But how about *p++ ? It is commonly used and should not
be "undefined" behavior, but p is modified and the new
value is accessed. [...]

(You mean the "old" value, I'm sure.) The object `p'
is used only once here, as the operand of the `++' operator.
The expression `p++' produces a value, and that value is
the operand of the `*' operator. `p++' is not an object,
just as `p+1' is not an object.
The "only to determine the value to be stored" language
prohibits things like `i++ * i'. The "only one change" rule
would permit this, but since we don't know when the change
actually takes place we don't know what value we'll get for
the right-hand `i'. (Indeed, there's not even a guarantee
that `i' will have *any* value; optimizers may well generate
truly bizarre results from illegal constructs like this.)
If you are scrupulous in making a distinction between
an object and the value that happens to occupy it at the
moment, I think much of your confusion will disappear.

Actually, if merely modifying an object and accessing its new value
caused undefined behaviour, modifying objects would be pretty damn
near impossible. Look at this code:

int i=0;
i=1;

The second line modifies the object i and then accesses it, returning
its new value 1.


You need to read up on the definition of the assignment operators. They
most specifically are not required to read back the value assigned,
although an implementation could unless the value was volatile. If
'i' was a volatile int, an implementation that generated code for the
expression "i = 1;" that read the value of 'i' back after the
assignment was made would not be conforming.

An assignment expression returns the value that (is/will be) assigned
to the destination, not the value of the destination after the assignment.

The value of the expression "i=1" is the int value 1, not the rvalue
generated by reading the lvalue 'i'.
 
Chris Torek

[Assignment operators] most specifically are not required to read
back the value assigned, although an implementation could unless
the value was volatile. If 'i' was a volatile int, an implementation
that generated code for the expression "i = 1;" that read the value
of 'i' back after the assignment was made would not be conforming.

As an aside (on the "volatile" issue), I recall some discussion
either here or in comp.std.c on code of the form:

volatile int *p;
int status;
...
status = *p = 0;
if ((status & READY) == 0)
return HARDWARE_NOT_PRESENT_OR_NOT_RESPONDING;

Suppose "p" here points to a control-and-status register in
hardware, where writing 0 to *p resets the device, and a subsequent
read should return with the READY bit set. In real code, I would
write this as:

*p = 0;
status = *p;
if ((status & READY) == 0)
return HARDWARE_NOT_PRESENT_OR_NOT_RESPONDING;

rather than combining the lines -- but the question and discussion
was about whether an implementation had to read back *p, or could
simply set "status" to 0, or in fact *had* to set "status" to 0.
Was there any final consensus? Your implication here is that status
must be 0, even if *p would have the READY bit set. (My own opinion
is that the implementation-defined-ness of "what constitutes an
access" to volatile things makes the whole question at best somewhat
academic, but I am curious about the "re-read prohibition"
implication.)

Back on topic, we *can* say for certain that code of the form:

int i;
unsigned char c;

i = c = 257;

sets i to 1, not 257, when UCHAR_MAX is 255. That is, the value
undergoes any conversions implied by the type of the left hand side
of the right hand assignment: since c is set to 1, i must also be
set to 1. Here c is not volatile, so we may assume that there are
no behind-the-scenes hardware machinations making it act like
something other than ordinary RAM; hence, a compiler is free to
re-read c or not, as long as i gets the value 1. Or as you put
it:
An assignment expression returns the value that (is/will be) assigned
to the destination, not the value of the destination after the assignment.

The value of the expression "i=1" is the int value 1, not the rvalue
generated by reading the lvalue 'i'.

When there are no "volatile"s involved, the difference between "the
int value 1" and "the rvalue generated by reading the lvalue 'i'"
is effectively invisible. But the C Standard prohibits the
*programmer* from re-reading i without a sequence point, and since:

j = i = 0;

is well-defined, it must at least *logically* not be a re-read
(which I think is how the whole "volatile" question came up in
the first place).
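
The conversion point can be checked with a small sketch. The function name chained_assign and its parameter v are made up for this example; passing the value in as a parameter rather than writing the literal 257 is only there to avoid a compiler warning about the constant.

```c
/* Returns what ends up in i after the chained assignment i = c = v.
 * The value of the inner assignment (c = v) is v converted to
 * unsigned char, and that converted value is what i receives. */
int chained_assign(int v)
{
    int i;
    unsigned char c;

    i = c = v;   /* i gets the value stored in c, not the original v */
    return i;
}
```

On an implementation where UCHAR_MAX is 255, chained_assign(257) returns 1. Whether the compiler gets there by re-reading c or by reusing the converted value is not observable here, since c is not volatile.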
 
Eric Sosman

Merrill said:
Can you re-visit the first sentence of the above paragraph? MPJ

With pleasure:

The "only to determine the value to be stored" language
prohibits things like `i++ * i'.

Does that help?
 
Merrill & Michele

"Eric Sosman"

With pleasure:

The "only to determine the value to be stored" language
prohibits things like `i++ * i'.

Does that help?

Yeah. For all the things that get said and don't get said around here, that
part seemed important. I wanted to make sure that is what you wanted to
say. MPJ
 
Joona I Palaste

You need to read up on the definition of the assignment operators. They
most specifically are not required to read back the value assigned,
although an implementation could unless the value was volatile. If
'i' was a volatile int, an implementation that generated code for the
expression "i = 1;" that read the value of 'i' back after the
assignment was made would not be conforming.
An assignment expression returns the value that (is/will be) assigned
to the destination, not the value of the destination after the assignment.
The value of the expression "i=1" is the int value 1, not the rvalue
generated by reading the lvalue 'i'.

Now that I have read this, I have understood that the original cause for
this discussion, the expression *p++, fits into these rules quite well
too. Assume an environment with a linear memory model, which defines
and supports casting integers into pointer values. (This is for
simplicity of explanation.) Now let p be an int pointer with the value
(int *)0xDEADBEEF. Now we calculate *p++. We first calculate p++, which
causes p to get the value (int *)0xDEADBEF0, and returns the value
(int *)0xDEADBEEF. Then we indirect through this value, getting a value
of type int, in the exact same way as we would calculate
*((int *)0xDEADBEEF).
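
The two steps described above can be spelled out as a rough sketch. Both function names are hypothetical, and the pointer-to-pointer parameter is only a device for letting a function modify the caller's p. The explicit ordering in the second function is one possible ordering; the Standard only requires that the increment be complete by the next sequence point.

```c
/* Read through p and advance it, written as the single expression *p++.
 * pp points at the caller's pointer object. */
int fetch_and_advance(const int **pp)
{
    return *(*pp)++;
}

/* The same effect written out step by step. */
int fetch_and_advance_long(const int **pp)
{
    const int *old = *pp;   /* the value that p++ yields */

    *pp = old + 1;          /* side effect: p now points at the next int */
    return *old;            /* indirect through the old value, not through p */
}
```

Both functions return the int that p pointed at on entry and leave p pointing one element further on.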
 
Keith Thompson

Joona I Palaste said:
Now that I have read this, I have understood that the original cause for
this discussion, the expression *p++, fits into these rules quite well
too. Assume an environment with a linear memory model, which defines
and supports casting integers into pointer values. (This is for
simplicity of explanation.) Now let p be an int pointer with the value
(int *)0xDEADBEEF. Now we calculate *p++. We first calculate p++, which
causes p to get the value (int *)0xDEADBEF0, and returns the value
(int *)0xDEADBEEF. Then we indirect through this value, getting a value
of type int, in the exact same way as we would calculate
*((int *)0xDEADBEEF).

As long as you're showing concrete values for clarity, you should use
more plausible values. Arithmetic on int* values is scaled by
sizeof(int). If (int*)0xDEADBEEF is valid, incrementing it will
probably yield (int*)0xDEADBEF3, assuming sizeof(int)==4. (On the
other hand, 0xDEADBEEF is misaligned on many systems; incrementing
0xDEADBEF0 to 0xDEADBEF4 is more realistic.)

None of which invalidates your point, of course.
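
The scaling is easy to observe without inventing any addresses. A small sketch, with int_step_in_bytes as a made-up name, that measures the distance in bytes between an int pointer and its incremented value:

```c
#include <stddef.h>

/* Number of bytes by which an int pointer advances when incremented. */
size_t int_step_in_bytes(void)
{
    int pair[2] = {0, 0};
    int *p = pair;
    int *q = p;

    q++;   /* q now points at pair[1] */
    return (size_t)((char *)q - (char *)p);
}
```

The result equals sizeof(int) on any implementation, which is 4 under the assumption used above.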
 
Jack Klein

[Assignment operators] most specifically are not required to read
back the value assigned, although an implementation could unless
the value was volatile. If 'i' was a volatile int, an implementation
that generated code for the expression "i = 1;" that read the value
of 'i' back after the assignment was made would not be conforming.

As an aside (on the "volatile" issue), I recall some discussion
either here or in comp.std.c on code of the form:

volatile int *p;
int status;
...
status = *p = 0;
if ((status & READY) == 0)
return HARDWARE_NOT_PRESENT_OR_NOT_RESPONDING;

Suppose "p" here points to a control-and-status register in
hardware, where writing 0 to *p resets the device, and a subsequent
read should return with the READY bit set. In real code, I would
write this as:

*p = 0;
status = *p;
if ((status & READY) == 0)
return HARDWARE_NOT_PRESENT_OR_NOT_RESPONDING;

rather than combining the lines -- but the question and discussion
was about whether an implementation had to read back *p, or could
simply set "status" to 0, or in fact *had* to set "status" to 0.
Was there any final consensus? Your implication here is that status
must be 0, even if *p would have the READY bit set. (My own opinion
is that the implementation-defined-ness of "what constitutes an
access" to volatile things makes the whole question at best somewhat
academic, but I am curious about the "re-read prohibition"
implication.)

That's another topic on which the unacceptably vague wording in the C
standard tends to be waved away by committee members who participate
in comp.std.c.

Back on topic, we *can* say for certain that code of the form:

int i;
unsigned char c;

i = c = 257;

sets i to 1, not 257, when UCHAR_MAX is 255. That is, the value
undergoes any conversions implied by the type of the left hand side
of the right hand assignment: since c is set to 1, i must also be
set to 1. Here c is not volatile, so we may assume that there are
no behind-the-scenes hardware machinations making it act like
something other than ordinary RAM; hence, a compiler is free to
re-read c or not, as long as i gets the value 1. Or as you put
it:


When there are no "volatile"s involved, the difference between "the
int value 1" and "the rvalue generated by reading the lvalue 'i'"
is effectively invisible. But the C Standard prohibits the
*programmer* from re-reading i without a sequence point, and since:

j = i = 0;

is well-defined, it must at least *logically* not be a re-read
(which I think is how the whole "volatile" question came up in
the first place).

In this case, the poor wording is:

"An assignment expression has the value of the left operand after the
assignment, but is not an lvalue."

It could be construed to say that the value is read back from the left
operand after being stored, which is certainly not the intent. You
can determine this from the final "but it is not an lvalue" clause.
Since it is not an lvalue, it must be that concept formerly known as
an rvalue.

You can also infer this from:

"The side effect of updating the stored value of the left operand
shall occur between the previous and the next sequence point."

I would much prefer wording like:

"An assignment expression has the value of the right operand after
conversion to the type of the left hand operand, and is an rvalue."

...except of course the term "rvalue" has been banished from the
standard and replaced, as indicated in a footnote I can't be bothered
to look up at the moment, with the phrase "the value of an
expression".

Hence the use of "is not an lvalue" in place of "is an rvalue", as the
new phraseology would yield:

"An assignment expression has the value of the left operand after the
assignment, but is the value of the expression."
 
Dave Vandervies

[Assignment operators] most specifically are not required to read
back the value assigned, although an implementation could unless
the value was volatile. If 'i' was a volatile int, an implementation
that generated code for the expression "i = 1;" that read the value
of 'i' back after the assignment was made would not be conforming.

As an aside (on the "volatile" issue), I recall some discussion
either here or in comp.std.c on code of the form:

volatile int *p;
int status;
...
status = *p = 0;
if ((status & READY) == 0)
return HARDWARE_NOT_PRESENT_OR_NOT_RESPONDING;

Suppose "p" here points to a control-and-status register in
hardware, where writing 0 to *p resets the device, and a subsequent
read should return with the READY bit set. In real code, I would
write this as:

*p = 0;
status = *p;
if ((status & READY) == 0)
return HARDWARE_NOT_PRESENT_OR_NOT_RESPONDING;

rather than combining the lines -- but the question and discussion
was about whether an implementation had to read back *p, or could
simply set "status" to 0, or in fact *had* to set "status" to 0.
Was there any final consensus? Your implication here is that status
must be 0, even if *p would have the READY bit set.

I don't think it's the discussion you're thinking of, but a bit back we
were discussing here whether assignments of the form
A = B += A
invoked undefined behavior by accessing A multiple times other than to
determine the value to be stored in it.

The thread starts with <[email protected]>;
Google seems to have replaced their working and familiar interface with
a new one that still has a few glitches, but you should still be able
to find it at
http://groups.google.com/[email protected]

In <[email protected]> (should be findable at
http://groups.google.com/[email protected] )
I gave quotes from N869 to support my claim that giving results other
than the equivalent of
B += A;
A = B;
is a compiler bug.

This applies to a read access to A-as-right-operand-of-+= and not of
A-as-left-operand-of-= in this expression, but since everything happens
between sequence points, it's not immediately clear that it'd be any
more valid to re-access the left operand to determine the value of the
expression than to re-access the right operand.
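
For reference, the claimed equivalence looks like this as a sketch. The function names are hypothetical, and the pointer parameters are only there so that the results can be inspected by the caller. It assumes the reading argued for above, namely that the compound form is well-defined when A and B are distinct objects.

```c
/* The compound form under discussion: A = B += A.
 * a and b must point to distinct objects; if they pointed to the same
 * object, that object would be modified twice between sequence points. */
void compound_form(int *a, int *b)
{
    *a = *b += *a;
}

/* The two-statement form it is claimed to be equivalent to. */
void two_step_form(int *a, int *b)
{
    *b += *a;
    *a = *b;
}
```

Starting from A equal to 3 and B equal to 4, both forms should leave A and B equal to 7.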


dave
(Or I might just need another coffee...)
 
