expressions and order of evaluation

Taras_96

Hi everyone,

I have a couple of questions about expressions and the subsequent
order of evaluation.

Firstly, as I understand it, an expression is what wiki defines it to
be: "An expression in a programming language is a combination of
values, variables, operators, and functions that are interpreted
(evaluated) according to the particular rules of precedence and of
association for a particular programming language, which computes and
then produces (returns, in a stateful environment) another value".

I am also aware that expressions can be grouped into sub-expressions.

eg: 2 + 3 + 4 can be thought of as two expressions, (2+3) & 5 + 4

According to TCPL (Bjarne Stroustrup), the order of evaluation of sub-
expressions is not defined. Because of this, I'm guessing that how a
compound expression is decomposed into a sub-expression can affect
whether an expression is defined or not. How is a compound expression
decomposed into atomic sub-expressions? Is it according to precedence
& associativity rules?

eg: the addition operator is left-associative. If we have the
expression:

f() + g() + h()

Which, because the addition operator is left associative, can be
written (into 'atomic' sub-expressions) as

( f() + g() ) + h()

Then is it the case that we can't guarantee whether h() gets evaluated
first or second, but *can* we guarantee that IF f() IS evaluated
before h(), then the expression ( f() + g() ) will be evaluated before
h() (ie: sub-expressions get evaluated as a complete unit)?
--------------------------------------------

Bjarne also writes that the expression

int i = 1;

v[i] = i++;

is undefined: "may be evaluated as either v[1]=1 or v[2]=1 or may
cause some even stranger behavior."

This comes as a surprise, as I would have thought that the RHS of an
assignment operator would be evaluated before the LHS!

By the same virtue, does it mean that v[i] = ++i is also undefined?
(if my understanding of post/pre fix operators is correct).

Does this mean that the following is also undefined?

class A
{
    int a;
public:
    int & returnA() { return a; }
};

returnA() = 3 + 2;

as we don't know whether returnA() is evaluated first or 3+2 is
evaluated first? I would think the 'logical' choice would be that 3+2
would be evaluated first...
------------------

A c++ expression always evaluates to a value, so x = 3 + 2 is an
expression that actually evaluates to 5... so in y=x=3+2, y gets
assigned the value of (x = 3 + 2), which is 5. Is it always the case
that the evaluated value for a statement involving an assignment
operator is always what both individual sides evaluate to (I assume
that this is the case)? I also guess that this does not hold true for
user defined types, as theoretically you could return anything from
the operator= function
------------------

Finally, at http://tiny.cc/6C2UA, Victor Bazarov states that
expression x[x[0]] = 2 is not conforming "for one simple reason:
you're accessing and changing the value of x[0] in the same
expression."

x[0] = 0;
x[2] = 123;
x[x[0]] = 2;

Could someone explain this a bit further? Going on my previous 'guess'
at precedence, I suppose that something like:

x[0] = (x[0] + 2);

would be defined, but:

x[0] = x[0] + 2

would not

I'm impressed if you've read this far :)

Taras
 
Andrey Tarasevich

Taras_96 said:
...
According to TCPL (Bjarne Stroustrup), the order of evaluation of sub-
expressions is not defined.

That's true, as long as we are talking about built-in operators. In the
world of user-defined operators things are "more ordered", so to say,
but even there there's quite a bit of freedom for implementation.
Because of this, I'm guessing that how a
compound expression is decomposed into a sub-expression can affect
whether an expression is defined or not. How is a compound expression
decomposed into atomic sub-expressions? Is it according to precedence
& associativity rules?

While the actual order of evaluation is not defined, the decomposition
into subexpressions is always defined. It is defined by the language
grammar. It is also defined by the rules of associativity and
precedence, which are just a semi-informal attempt to express the
grammar in more simplified form.

One can argue that the "grammar" is a purely _syntactical_ construct,
while the decomposition in question is a _semantical_ one. This is true.
But the language specification explicitly bridges the two by stating
that operator-operand grouping is identical to syntactical grouping
defined by the grammar. In other words, the grammar is intentionally and
specifically built to reflect the semantical relationships existing in
the language.
eg: the addition operator is left-associative. If we have the
expression:

f() + g() + h()

Which, because the addition operator is left associative, can be
written (into 'atomic' sub-expressions) as

( f() + g() ) + h()

Yes. This describes the "meaning" of the expression with the one and
only one purpose: to define the correct result. This is what tells us
that 2+2*2 is 2+(2*2)=6 and not (2+2)*2=8.
Then is it the case that we can't guarantee whether h() gets evaluated
first or second, but *can* we guarantee that IF f() IS evaluated
before h(), then the expression ( f() + g() ) will be evaluated before
h() (ie: sub-expressions get evaluated as a complete unit)?

No. Absolutely not. Firstly, the order in which the operands of the
expression are prepared is never defined. So 'f()', 'g()' and 'h()' can
be called in absolutely any order. Let's say 'F', 'G' and 'H' are the
results of these calls. Secondly, the only thing that this grouping
tells us is that the correct result is the one that we'd obtain if we
first evaluated 'F + G' and then added the 'H' to the intermediate
result. However, the implementation is free to evaluate it in any order
at all, as long as the result is correct. The implementation is free to
do 'G + H' first and then add 'F' if it is sure that the result will
remain correct.
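
A minimal sketch of the point above, assuming hypothetical functions f, g and h that record when they are called. The grouping fixes the result, but the recorded call order is up to the implementation, so the only thing the sketch checks is that each function ran once and that the sum is the expected one.

```cpp
#include <cassert>
#include <string>

// Hypothetical functions for illustration: each appends its name to a log
// so the call order chosen by the compiler can be inspected afterwards.
static std::string call_log;

int f() { call_log += 'f'; return 1; }
int g() { call_log += 'g'; return 2; }
int h() { call_log += 'h'; return 3; }

// The grouping is (f() + g()) + h(), so the result is fixed at 6,
// but the three calls may happen in any order.
int sum_with_log() {
    call_log.clear();
    int result = f() + g() + h();
    assert(call_log.size() == 3);  // each function ran exactly once
    return result;
}
```

Inspecting call_log after a call to sum_with_log() shows whichever order a given compiler happened to pick; a program should not rely on it.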
--------------------------------------------

Bjarne also writes that the expression

int i = 1;

v[i] = i++;

is undefined: "may be evaluated as either v[1]=1 or v[2]=1 or may
cause some even stranger behavior."

This comes as a surprise, as I would have thought that the RHS of an
assignment operator would be evaluated before the LHS!


Why? Firstly, there's absolutely no reason to evaluate RHS before LHS.
Secondly, the big problem here is the side effect of 'i++'. Side effects
can take place at any moment before the next sequence point, which
ruins the above regardless of the order of evaluation.
By the same virtue, does it mean that v[i] = ++i is also undefined?
(if my understanding of post/pre fix operators is correct).
Yes.

Does this mean that the following is also undefined?

class A
{
    int a;
public:
    int & returnA() { return a; }
};

returnA() = 3 + 2;


The code is invalid. Did you mean

A a;
a.returnA() = 3 + 2;

And this is perfectly valid.
as we don't know whether returnA() is evaluated first or 3+2 is
evaluated first? I would think the 'logical' choice would be that 3+2
would be evaluated first...

But it doesn't matter! Why do you even care which one is evaluated
first? The LHS of a built-in assignment is an lvalue, it says _where_ to
put the result. The RHS of a built-in assignment is an rvalue, it says
_what_ the result is. It doesn't matter at all which one we determine
first and which one we determine second.
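
A short sketch of the "where" and "what" roles described above, using a corrected version of the class from the question. The value() accessor and the helper function name are assumptions added only so the result can be checked.

```cpp
#include <cassert>

// Corrected version of the class from the question.
class A {
    int a = 0;
public:
    int& returnA() { return a; }      // yields an lvalue: "where" to store
    int value() const { return a; }   // hypothetical accessor, added for the check
};

int assign_through_reference() {
    A obj;
    assert(obj.value() == 0);
    // Either side may be evaluated first; the store happens only once
    // both the location and the value are known.
    obj.returnA() = 3 + 2;
    return obj.value();
}
```

Whichever operand the compiler evaluates first, assign_through_reference() yields 5, because neither operand has a side effect the other depends on.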
A c++ expression always evaluates to a value, so x = 3 + 2 is an
expression that actually evaluates to 5...

Well... Not exactly. This expression evaluates to '(X) (3+2)', where 'X'
is the type of 'x'. You didn't say what it is. If 'x' is of type 'int',
then yes, it evaluates to 5.
so in y=x=3+2, y gets
assigned the value of (x = 3 + 2), which is 5.

Yes (taking into account the above)
Is it always the case
that the evaluated value for a statement involving an assignment
operator is always what both individual sides evaluate to (I assume
that this is the case)?

Sorry, I don't understand the question.
I also guess that this does not hold true for
user defined types, as theoretically you could return anything from
the operator= function

User-defined functions is a completely different world, as far as this
issue is concerned.
------------------

Finally, at http://tiny.cc/6C2UA, Victor Bazarov states that
expression x[x[0]] = 2 is not conforming "for one simple reason:
you're accessing and changing the value of x[0] in the same
expression."

If 'x[0]' is originally '0' then yes, you do indeed access and change
value of the same object. Note, that in C++ it is not necessarily a
problem. You are _allowed_ to access and change value of the same
object, as long as the "accessing" is done for the purpose of
"changing", i.e. for determining the new value of the object. This is
not a very precise definition, which is why the validity of 'x[x[0]] =
2' has been an open question in the C/C++ world for a while. Unfortunately, I
don't know whether it's been resolved already.
Could someone explain this a bit further? Going on my previous 'guess'
at precedence, I suppose that something like:

x[0] = (x[0] + 2);

would be defined, but:

x[0] = x[0] + 2

would not

No. Redundant parentheses never have any effect on the "definedness" of
an expression with built-in operators in C++. The 'x[0] = x[0] + 2'
expression is always perfectly well-defined, because of what I mentioned
above: we clearly access the old value of 'x[0]' for the sole purpose of
determining the new value of 'x[0]'. This makes it valid. But in
'x[x[0]] = 2' it is not as clear, which is what makes 'x[x[0]] = 2'
different.
 
Erik Wikström

Hi everyone,

I have a couple of questions about expressions and the subsequent
order of evaluation.

Firstly, as I understand it, an expression is what wiki defines it to
be: "An expression in a programming language is a combination of
values, variables, operators, and functions that are interpreted
(evaluated) according to the particular rules of precedence and of
association for a particular programming language, which computes and
then produces (returns, in a stateful environment) another value".

Well, when talking C++ an expression is what the standard defines it to
be, regardless of what it might be in other contexts.
I am also aware that expressions can be grouped into sub-expressions.

eg: 2 + 3 + 4 can be thought of as two expressions, (2+3) & 5 + 4

According to TCPL (Bjarne Stroustrup), the order of evaluation of sub-
expressions is not defined. Because of this, I'm guessing that how a
compound expression is decomposed into a sub-expression can affect
whether an expression is defined or not. How is a compound expression
decomposed into atomic sub-expressions? Is it according to precedence
& associativity rules?
Yes.

eg: the addition operator is left-associative. If we have the
expression:

f() + g() + h()

Which, because the addition operator is left associative, can be
written (into 'atomic' sub-expressions) as

( f() + g() ) + h()

Then is it the case that we can't guarantee whether h() gets evaluated
first or second, but *can* we guarantee that IF f() IS evaluated
before h(), then the expression ( f() + g() ) will be evaluated before
h() (ie: sub-expressions get evaluated as a complete unit)?

The order of evaluation of subexpressions is undefined, so any ordering
is possible. Obviously the expression ( f() + g() ) has to be evaluated
before the full expression, but there is no requirement on the order of
evaluation of f(), g(), and h().
Bjarne also writes that the expression

int i = 1;

v[i] = i++;

is undefined: "may be evaluated as either v[1]=1 or v[2]=1 or may
cause some even stranger behavior."

This comes as a surprise, as I would have thought that the RHS of an
assignment operator would be evaluated before the LHS!


No, the RHS and LHS are both subexpressions and their order of
evaluation is not defined.
By the same virtue, does it mean that v[i] = ++i is also undefined?
(if my understanding of post/pre fix operators is correct).
Yes.

Does this mean that the following is also undefined?

class A
{
    int a;
public:
    int & returnA() { return a; }
};

returnA() = 3 + 2;

as we don't know whether returnA() is evaluated first or 3+2 is
evaluated first? I would think the 'logical' choice would be that 3+2
would be evaluated first...


No, the order of evaluation is not defined, but it does not have to be
for the semantics to be well defined.
A c++ expression always evaluates to a value, so x = 3 + 2 is an
expression that actually evaluates to 5... so in y=x=3+2, y gets
assigned the value of (x = 3 + 2), which is 5. Is it always the case
that the evaluated value for a statement involving an assignment
operator is always what both individual sides evaluate to (I assume
that this is the case)? I also guess that this does not hold true for
user defined types, as theoretically you could return anything from
the operator= function

y = x = 2 + 3 is the same as y = (x = 2 + 3), now x = 2 + 3 evaluates to
x (which will have the value of 5), so the whole expression is equal to
y = x, and evaluates to y, which is also 5.

When it comes to user defined types they do not use assignment, rather
they use operator= which is a function that returns whatever the user wants.

class U; // Some class with operator=(int) defined
U u;
u = 5; // Same as u.operator=(5);
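
A small sketch of both cases, assuming a hypothetical class U whose operator= deliberately returns bool instead of the conventional reference. It is only meant to show that the built-in chain yields the assigned value while a user-defined operator= yields whatever its author chose.

```cpp
#include <cassert>

// Hypothetical class: operator= is an ordinary function, so it may return
// anything. Here it returns bool rather than the conventional U&.
struct U {
    int stored = 0;
    bool operator=(int v) { stored = v; return v > 0; }
};

int chained_builtin() {
    int x = 0, y = 0;
    y = x = 2 + 3;   // parsed as y = (x = 2 + 3)
    assert(x == 5);
    return y;
}

bool user_defined_result() {
    U u;
    bool r = (u = 5);   // same as u.operator=(5)
    assert(u.stored == 5);
    return r;
}
```

With this U, an expression such as `y = u = 5` would assign a bool to y, which is why chaining only behaves as expected when operator= returns a reference to the object.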
Finally, at http://tiny.cc/6C2UA, Victor Bazarov states that
expression x[x[0]] = 2 is not conforming "for one simple reason:
you're accessing and changing the value of x[0] in the same
expression."

x[0] = 0;
x[2] = 123;
x[x[0]] = 2;

Could someone explain this a bit further?

I can not explain it better than Victor, accessing and changing the
value of x[0] in the same expression is not legal, just like i = i++ is
illegal.
Going on my previous 'guess' at precedence, I suppose that something
like:

x[0] = (x[0] + 2);

would be defined, but:

x[0] = x[0] + 2

They are both equal, the LHS is already a sub-expression adding the
parenthesis will change nothing.
 
James Kanze

I have a couple of questions about expressions and the
subsequent order of evaluation.
Firstly, as I understand it, an expression is what wiki
defines it to be: "An expression in a programming language is
a combination of values, variables, operators, and functions
that are interpreted (evaluated) according to the particular
rules of precedence and of association for a particular
programming language, which computes and then produces
(returns, in a stateful environment) another value".

In C++, an expression doesn't necessarily return a value, or
even return. (In C++, "(void)0" and "throw something" are
expressions.)
I am also aware that expressions can be grouped into
sub-expressions.
eg: 2 + 3 + 4 can be thought of as two expressions, (2+3) & 5 + 4
According to TCPL (Bjarne Stroustrup), the order of evaluation
of sub-expressions is not defined. Because of this, I'm
guessing that how a compound expression is decomposed into a
sub-expression can affect whether an expression is defined or
not. How is a compound expression decomposed into atomic
sub-expressions? Is it according to precedence & associativity
rules?
eg: the addition operator is left-associative. If we have the
expression:
f() + g() + h()
Which, because the addition operator is left associative, can
be written (into 'atomic' sub-expressions) as
( f() + g() ) + h()
Then is it the case that we can't guarantee whether h() gets
evaluated first or second, but *can* we guarantee that IF f()
IS evaluated before h(), then the expression ( f() + g() )
will be evaluated before h() (ie: sub-expressions get
evaluated as a complete unit)?

No. A compiler is allowed to evaluate all of the functions
first, then do the two additions. The only things constraining
ordering are direct dependencies (f() and g() must both be
evaluated before their results are added) and something called
sequence points. Sequence points only introduce a partial
ordering, however, and very few operators induce a sequence
point.
int i = 1;
v[i] = i++;

is undefined: "may be evaluated as either v[1]=1 or v[2]=1
or may cause some even stranger behavior."
This comes as a surprise, as I would have thought that the RHS
of an assignment operator would be evaluated before the LHS!

There are two issues here. The first is that the compiler can
evaluate either side of the assignment first, or even do parts
of one side, then parts of the other. (Note too that while Java
does impose an order, it requires the left hand side of an
assignment to be evaluated first. I'm not sure from where you
get the idea that the right hand side should be evaluated
first.)

The second is that the language imposes some constraints
regarding what a legal program can do in an expression. In
particular "between the previous and next sequence point a
scalar object shall have its stored value modified at most once
by the evaluation of an expression. Furthermore, the prior
value shall be accessed only to determine the value to be
stored." Otherwise, you get undefined behavior. Since your
expression contains no sequence points, it's undefined behavior.
(I've probably forgotten some exotic cases, but the end of the
full expression, the &&, || ?: and comma operators, and a
function call or a return are sequence points. Note that they
don't necessarily impose a full ordering; a compiler can still
intermingle the evaluation of a function's arguments with other
parts of the expression; all of the side effects of evaluating
the function's arguments must occur before the function is
called, however.)
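
A hedged sketch of how the sequence points listed above can be used to get defined behavior. The function names are hypothetical. The first function splits the assignment-with-increment from the question into two full expressions; the other two rely on && and the built-in comma operator ordering their operands.

```cpp
#include <cassert>

// Rewriting the undefined assignment so that each side effect
// sits in its own full expression.
int store_then_increment(int (&v)[4]) {
    int i = 1;
    v[i] = i;   // the end of this full expression orders the store...
    i++;        // ...before the increment
    assert(v[1] == 1);
    return i;
}

// && guarantees the left operand, including its side effects,
// is complete before the right operand is evaluated.
bool and_orders_operands() {
    int i = 0;
    return (i++ == 0) && (i == 1);
}

// The built-in comma operator gives the same left-to-right guarantee.
int comma_orders_operands() {
    int i = 0;
    return (i++, i);
}
```

None of this applies to overloaded &&, || or comma operators under the sequence-point rules, since those are ordinary function calls.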
By the same virtue, does it mean that v[i] = ++i is also
undefined? (if my understanding of post/pre fix operators is
correct).


Yes. For the same reasons.
Does this mean that the following is also undefined?
class A
{
    int a;
public:
    int & returnA() { return a; }
};
returnA() = 3 + 2;
as we don't know whether returnA() is evaluated first or 3+2
is evaluated first? I would think the 'logical' choice would
be that 3+2 would be evaluated first...

In this case, it almost certainly will be---I don't know of a
compiler that won't evaluate 3+2 at compile time :). More
generally, however, there's no real reason; one popular strategy
is to evaluate which ever side requires the most registers
first.

A c++ expression always evaluates to a value,

No, you can have void expressions in C++.
so x = 3 + 2 is
an expression that actually evaluates to 5...

An important side note here: an expression has a type; if the
type isn't void, it also has a value. The type can affect the
results: if x is an unsigned char, and UCHAR_MAX is 255, then
the results of
x = 1024
are 0, since the expression has type unsigned char, and the
results must be a value which can be represented in an unsigned
char.
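
The conversion described above can be checked with a short sketch. The helper name is hypothetical; it returns the value of the assignment expression itself, which is the right hand side converted to unsigned char.

```cpp
#include <cassert>
#include <climits>

// Returns the value of the assignment expression itself.
unsigned assignment_value(int rhs) {
    unsigned char x = 0;
    // (x = rhs) has type unsigned char; rhs is converted modulo UCHAR_MAX + 1.
    unsigned result = (x = rhs);
    assert(result == x);
    return result;
}
```

On a platform where UCHAR_MAX is 255, assignment_value(1024) gives 0, matching the example above.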
so in y=x=3+2, y gets assigned the value of (x = 3 + 2),
which is 5. Is it always the case that the evaluated value for
a statement involving an assignment operator is always what
both individual sides evaluate to (I assume that this is the
case)?

Sort of, depending on what you consider the "right hand side".
After evaluating the right hand side, the resulting value (which
has a type) is converted to the type of the left hand side, and
it is the converted value which is assigned, and is the value of
the expression.
I also guess that this does not hold true for user defined
types, as theoretically you could return anything from the
operator= function

Correct. More generally, precedence and associativity determine
how the expression is parsed. Then overload resolution is
applied to each operator. If overload resolution resolves to a
user defined operator, all further considerations deal with the
function. This includes the return type, lvalue-ness (and
lvalue requirements) and sequence points.
------------------
Finally, at http://tiny.cc/6C2UA, Victor Bazarov states that
expression x[x[0]] = 2 is not conforming "for one simple reason:
you're accessing and changing the value of x[0] in the same
expression."

That's a tricky one (supposing x[0] contains 0). Technically,
according to the current wording, I think he's right; if x[0]
contains 0, you're modifying x[0], and you're accessing it other
than to determine the value to be stored (which is independent
of the value of x[0]). In this case, I'm not sure that this was
intended (you obviously can't modify x[0] until you've read it).

Everything concerning ordering constraints, etc., has recently
been rewritten to take multithreaded environments into
consideration; the next version of the standard uses the concept
of sequencing (an operation is sequenced before, unsequenced or
indeterminately sequenced), rather than sequence points. I've
not read it in enough detail to be sure, but I think it will
result in the above being defined. There is a sentence: "The
value computations of the operands of an operator are sequenced
before the value computation of the result of the operator." And
the "undefined behavior" being considered here is defined by the
following sentence "If a side effect on a scalar object is
unsequenced relative to either a different side effect on the
same scalar object or a value computation using the value of the
same scalar object, the behavior is undefined." If I understand
this correctly, in the above expression, the evaluation of x[0]
is part of the value computations of the operands of the
assignment operator (along with x[x[0]]), and so is sequenced
before the actual assignment. Which means that the expression
does not contain undefined behavior.
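
For reference, a sketch of the snippet under discussion, wrapped in a hypothetical helper. It assumes the "sequenced before" reading given above, under which the read of x[0] happens before the store; under the older sequence-point wording the status of the expression was contested, as this thread shows.

```cpp
#include <cassert>

// The snippet from the question, wrapped in a hypothetical helper.
int index_through_element(int (&x)[3]) {
    x[0] = 0;
    x[2] = 123;
    // x[0] is read to find the element to assign to; here that element
    // is x[0] itself, so the store replaces the value just read.
    x[x[0]] = 2;
    assert(x[2] == 123);  // x[2] is not the element that was written
    return x[0];
}
```

Under that reading the call returns 2 and leaves x[2] untouched.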
x[0] = 0;
x[2] = 123;
x[x[0]] = 2;
Could someone explain this a bit further?

There's no real logical explanation. It's just what the
standard says.
 
James Kanze

On Nov 23, 10:36 pm, Andrey Tarasevich wrote:

[...]
Finally, at http://tiny.cc/6C2UA, Victor Bazarov states that
expression x[x[0]] = 2 is not conforming "for one simple
reason: you're accessing and changing the value of x[0] in
the same expression."
If 'x[0]' is originally '0' then yes, you do indeed access and
change value of the same object. Note, that in C++ it is not
necessarily a problem. You are _allowed_ to access and change
value of the same object, as long as the "accessing" is done
for the purpose of "changing", i.e. for determining the new
value of the object. This is not a very precise definition,
which is why the validity of 'x[x[0]] = 2' has been an open
question in the C/C++ world for a while. Unfortunately, I don't
know whether it's been resolved already.

I don't think that there's any question with regards to what the
standard says. It says very explicitly "the prior value shall
be accessed only to determine the value to be stored". (The C
standard says exactly the same thing.) Nothing there about
determining where the storage will take place.

Whether this is intentional or not is another question. It
doesn't matter, however, since the next version of the standard
defines this in radically different terms. And from a first,
very quick reading, I gather that this expression *will* be
allowed; the wording doesn't talk about "value to be stored" but
"the value computations of the operands of an operator".
 
taras.diakiw

Hi everyone,

First off, thanks for all of the detailed replies. It's taken me a
while to digest the information, and I've decided to write a few
separate responses in an effort to partition the discussion into
separate ideas.

First off, the mention of sequence points. I understand the postfix
increment operator (and equivalently the prefix increment operator)
very roughly - "'use' first, increment later" (I'm aware that this is
over-simplified, and thus is most probably not entirely correct, and
the term 'use' is not precisely defined). If we think of a postfix
increment operator being implemented as a function, then the pseudo-
code would look something like:

* store copy of argument
* increment argument
* return copy of argument

In this case, a statement like:

i = i++;

would be well defined, as regardless of whether the LHS or the RHS is
evaluated first, at the end of the statement the LHS will 'receive'
the old value of i (the LHS evaluates to a l-value, the RHS evaluates
to the old value of i).

However, obviously this is incorrect, which implies that my
understanding of the postfix increment operator is not complete, and
my pseudo-code is not precisely what happens in reality. I believe
that this concept of 'sequence points' has something to do with it :).

What are these sequence points, and how does their use/implementation
differ to my 'pseudo-code' above? Also, would you have the same
problems with UDTs (as the operators are actually functions, so my
pseudo-code would be closer to what is actually happening)?
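
As a hedged illustration of the UDT case, here is a hypothetical Counter type whose postfix ++ follows the three pseudo-code steps listed earlier in this post. Because the operators here are real function calls, each call completes before its result is used, so as far as I understand the rules `c = c++` leaves c holding its old value rather than being undefined.

```cpp
#include <cassert>

// Hypothetical user-defined type whose postfix ++ follows the three
// pseudo-code steps.
struct Counter {
    int value;
    Counter operator++(int) {
        Counter old = *this;  // store copy of argument
        ++value;              // increment argument
        return old;           // return copy of argument
    }
};

int postfix_on_udt() {
    Counter c{1};
    Counter before = c++;     // a function call, complete before the copy
    assert(before.value == 1 && c.value == 2);
    c = c++;                  // operator++ finishes, then operator= runs
    return c.value;           // the old value, 2, was assigned back
}
```

For a built-in int there is no function call to order the increment against the assignment, which is where the pseudo-code model stops matching reality.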

I have included the previous snippets relating to 'sequence points'
below.

Cheers

Taras

Taras_96 said:
... ...
v[i] = i++;

is undefined: "may be evaluated as either v[1]=1 or v[2]=1 or may
cause some even stranger behavior."
This comes as a surprise, as I would have thought that the RHS of an
assignment operator would be evaluated before the LHS!

Why? Firstly, there's absolutely no reason to evaluate RHS before LHS.
Secondly, the big problem here is the side effect of 'i++'. Side effects
can take place at any moment before the next sequence point, which
ruins the above regardless of the order of evaluation.
 
taras.diakiw

But it doesn't matter! Why do you even care which one is evaluated
first? The LHS of a built-in assignment is an lvalue, it says _where_ to
put the result. The RHS of a built-in assignment is an rvalue, it says
_what_ the result is. It doesn't matter at all which one we determine
first and which one we determine second.

On 2008-11-23 20:39, Taras_96 wrote:

No, the RHS and LHS are both subexpressions and their order of
evaluation is not defined.

There are two issues here. The first is that the compiler can
evaluate either side of the assignment first, or even do parts
of one side, then parts of the other. (Note too that while Java
does impose an order, it requires the left hand side of an
assignment to be evaluated first. I'm not sure from where you
get the idea that the right hand side should be evaluated
first.)

Because I was thinking of the assignment operation not as just another
operator, but as a particular sequence:

1) evaluate the RHS
2) store the result in the l-value on the LHS

However, if you think of assignment as simply another function with
two parameters:

1) evaluate both operands (order not determinable)
2) call the function

Then it clears up a lot of confusion I was having :D

Taras
 
Taras_96

Taras_96 wrote:

No. Absolutely not. Firstly, the order in which the operands of the
expression are prepared is never defined. So 'f()', 'g()' and 'h()' can
be called in absolutely any order. Let's say 'F', 'G' and 'H' are the
results of these calls. Secondly, the only thing that this grouping
tells us is that the correct result is the one that we'd obtain if we
first evaluated 'F + G' and then added the 'H' to the intermediate
result. However, the implementation is free to evaluate it in any order
at all, as long as the result is correct. The implementation is free to
do 'G + H' first and then add 'F' if it is sure that the result will
remain correct.

I'm guessing here that this is ignoring any side effects that these
functions may have.

Taras_96 wrote:

While the actual order of evaluation is not defined, the decomposition
into subexpressions is always defined. It is defined by the language
grammar. It is also defined by the rules of associativity and
precedence, which are just a semi-informal attempt to express the
grammar in more simplified form.

I suppose what brought upon this line of thinking was that ALUs may
only have two inputs for operands, and thus a complicated expression
must be broken down into expressions involving only two operands (I'm
reminded of reverse polish notation here) - is this of relevance?

Taras
 
Taras_96

Taras_96 wrote:
No. Redundant parentheses never have any effect on the "definedness" of
an expression with built-in operators in C++. The 'x[0] = x[0] + 2'
expression is always perfectly well-defined, because of what I mentioned
above: we clearly access the old value of 'x[0]' for the sole purpose of
determining the new value of 'x[0]'. This makes it valid. But in
'x[x[0]] = 2' it is not as clear, which is what makes 'x[x[0]] = 2'
different.

Thanks Andrey

Taras
 
Taras_96

On 2008-11-23 20:39, Taras_96 wrote:


No, the order of evaluation is not defined, but it does not have to be
for the semantics to be well defined.

Are you referring to the fact that if you have:

int x;
x = 3 + 2;

That even though it isn't defined whether '3' 'gets evaluated first',
or '2', or 'x', that the result is semantically well defined?

Thanks

Taras
 
E

Erik Wikström

I'm guessing here that this is ignoring any side effects that these
functions may have.

Well, the fact that the order of evaluation is undefined only becomes a
problem if the functions have side effects. If the functions are pure
you can not tell in which order they were evaluated, but neither will it
matter. If the functions have side effects you must make sure that the
correct behaviour of the program does not depend on the order in which
the functions are evaluated.
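
As a rough sketch of that point: f, g and h stand in for the functions discussed in this thread, while call_log, sum_unordered and sum_ordered are names invented for illustration. Each function returns a value and, as a side effect, appends its name to a log.

```cpp
#include <cassert>
#include <string>

// Shared log written to by the side effects (hypothetical, for illustration).
static std::string call_log;

int f() { call_log += 'f'; return 1; }
int g() { call_log += 'g'; return 2; }
int h() { call_log += 'h'; return 3; }

// The sum is the same whatever order the calls are made in,
// but the contents of call_log depend on that order.
int sum_unordered() {
    call_log.clear();
    return f() + g() + h();
}

// Separate statements fix the order of the calls.
int sum_ordered() {
    call_log.clear();
    const int F = f();
    const int G = g();
    const int H = h();
    return F + G + H;
}
```

After sum_unordered, call_log may hold the three letters in any order. After sum_ordered it always holds "fgh", so code that cares about the order of the side effects should be written the second way.
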
I suppose what brought upon this line of thinking was that ALUs may
only have two inputs for operands, and thus a complicated expression
must be broken down into expressions involving only two operands (I'm
reminded of reverse polish notation here) - is this of relevance?

Maybe, or it might just be because that's the way we usually evaluate
expressions in our minds. In fact I would not be surprised if a non-RISC
processor had an instruction which added two values together, multiplied
the sum by some third value, and then stored the result.
 
E

Erik Wikström

Are you referring to the fact that if you have:

int x;
x = 3 + 2;

That even though it isn't defined whether '3' 'gets evaluated first',
or '2', or 'x', that the result is semantically well defined?

Yes, the only case in which the order of evaluation matters is if there
are any side effects of the evaluation.
 
J

James Kanze

On 2008-11-24 05:51:59 -0500, James Kanze said:
Just a passing comment: the changes in wording were intended
to preserve the existing semantics. Whether they succeeded, of
course, is a different question.

I thought that the wording was changed mainly to support
threading. Of course, I would be very upset if the new wording
rendered a previously legal single threaded program illegal.
But if it "accidentally" makes something like a[a[0]] = x legal,
where it wasn't before, I don't see any real problem; I suspect
that the illegality here is just an oversight anyway.
 
J

James Kanze

First off, the mention of sequence points. I understand the
postfix increment operator (and equivalently the prefix
increment operator) very roughly - "'use' first, increment
later" (I'm aware that this is over-simplified, and thus is
most probably not entirely correct, and the term 'use' is not
precisely defined). If we think of a postfix increment
operator being implemented as a function, then the pseudo-code
would look something like:
* store copy of argument
* increment argument
* return copy of argument
In this case, a statement like:
would be well defined, as regardless of whether the LHS or the
RHS is evaluated first, at the end of the statement the LHS
will 'receive' the old value of i (the LHS evaluates to a
l-value, the RHS evaluates to the old value of i).
However, obviously this is incorrect, which implies that my
understanding of the postfix increment operator is not
complete, and my pseudo-code is not precisely what happens in
reality. I believe that this concept of 'sequence points' has
something to do with it :).
What are these sequence points, and how does their
use/implementation differ from my 'pseudo-code' above? Also,
would you have the same problems with UDTs (as the operators
are actually functions, so my pseudo-code would be closer to
what is actually happening)?

First, pseudo-code may help in getting a vague understanding of
what is happening, but you have to be very careful about it.
The language definition gives an implementation a large degree
of liberty in implementing various operators (i.e. the
corresponding pseudo-code), for optimization reasons. It also
forbids certain operations, again in order to allow better
optimization.

Basically, every expression (and every sub-expression of an
expression) has a value (unless its type is void) and side
effects. Thus, for example, the value of "a+b" is the sum of a
and b, and the side effects are null. The value of "a++" is the
original value of a, and the side effect is the update of a with
the incremented value. In a very real sense, the two are
independent.
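
A small sketch of the value/side-effect split, under stated assumptions: Counter and old_value_of_builtin are hypothetical names, not anything from the standard or from this thread. For a class type the overloaded operator is an ordinary function, so the three-step pseudo-code can be written out literally. For the built-in operator only the two results (the value and the side effect) are specified, not the steps.

```cpp
#include <cassert>

// Hypothetical user-defined counter, used only for illustration.
struct Counter {
    int value;

    // Postfix increment written as the three steps from the pseudo-code.
    Counter operator++(int) {
        Counter old = *this;  // store copy of argument
        ++value;              // increment argument
        return old;           // return copy of argument
    }
};

// Built-in postfix increment: the value of the expression a++ is the old
// value of a, and the update of a is a separate side effect.
int old_value_of_builtin(int& a) {
    int v = a++;
    return v;
}
```

In both cases the caller sees the old value as the result and the incremented object afterwards. The difference discussed in this thread is only about when the built-in side effect is allowed to take place.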

The standard doesn't really specify when the value is
calculated; that's entirely up to the compiler. In practice,
however, it's generally acknowledged that it must be calculated
before it is used, and the standard does require that the values
used to calculate it be those which were "stable" at the last
sequence point. And all the standard says about side effects is
that they must occur sometime between the previous and the
following sequence point.

There is a sequence point at the end of all full expressions,
before a function call (but after its arguments have been
evaluated), after a return, and at a very few operators: ||, &&,
?: and the comma operator. Note too that sequence points only
define a partial ordering: in an expression like "f(++a) +
g(++b)", there are sequence points at each function call, but
they only order the call relative to the arguments of that
function; the following sequence, for example, would be legal:
increment b
increment a
call f
call g
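
The partial ordering can be sketched as follows. The bodies of f and g, the events log and the partial_order wrapper are assumptions made for this example. Each increment is finished before the call that receives it, but nothing orders the call to f relative to the call to g.

```cpp
#include <cassert>
#include <string>

// Hypothetical log of which function ran first, for illustration only.
static std::string events;

int f(int x) { events += 'f'; return x; }
int g(int x) { events += 'g'; return x * 10; }

// The sequence point before each call guarantees that f sees the
// incremented a and g sees the incremented b. The relative order of
// the two calls is left to the implementation.
int partial_order(int a, int b) {
    events.clear();
    return f(++a) + g(++b);
}
```

The returned sum is the same on every implementation because a and b are distinct objects, each modified once. Only the contents of events ("fg" or "gf") can differ.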

Finally, the language specification introduces some limitations
as to what you can do, in theory to allow for some optimizations
(although no one has yet been able to show where it would make a
difference using today's compiler technology). In particular,
if you modify an object, the compiler is allowed to assume that you
do not access that object otherwise in the expression, *except*
to determine the new value. It's a somewhat artificial rule,
but it must be respected. (From a stylistic point of view, it's
generally a good idea that an expression only modify a single
object. There are a few cases which are so ubiquitous as to not
cause problems, but in general, code is clearer and more
readable if you avoid doing several things in the same
statement.)
 
T

Taras_96

James said:
Basically, every expression (and every sub-expression of an
expression) has a value (unless its type is void) and side
effects. Thus, for example, the value of "a+b" is the sum of a
and b, and the side effects are null. The value of "a++" is the
original value of a, and the side effect is the update of a with
the incremented value. In a very real sense, the two are
independent.

The standard doesn't really specify when the value is
calculated; that's entirely up to the compiler. In practice,
however, it's generally acknowledged that it must be calculated
before it is used, and the standard does require that the values
used to calculate it be those which were "stable" at the last
sequence point. And all the standard says about side effects is
that they must occur sometime between the previous and the
following sequence point.

So in the expression:

a = b++ + c;

the value of b being 'used' is the original value of b. The value of
(b++ + c) is (the sum of the original value of b and c), and the LHS
evaluates to an l-value. The value of the expression (b++ + c) must be
calculated before it is 'used' (ie: before it is assigned to the
l-value).

i = i++;

The value of (i++) is the original value of i. As above, the LHS
evaluates to an l-value (which doesn't change regardless of when the
side effect is applied). So why wouldn't the value on the RHS (which
AFAIK is the original value of i regardless of when the side effect is
applied) get assigned to the l-value regardless of when the side
effect is applied? As you mentioned, the value of i++ (which is the
original value of i) must be calculated before it is 'used' (ie:
assigned), which has been done.
 
J

James Kanze

So in the expression:
a = b++ + c;
the value of b being 'used' is the original value of b.

The value of b being used is the value of b. The expression
'b++' has a value which is the original value of b, and a type
which is the type of b. It has a side effect of incrementing b,
which can occur at any time between the preceding and the
following sequence point.
The value of (b++ + c) is (the sum of the original value of b
and c), and the LHS evaluates to an l-value. The value of the
expression (b++ + c) must be calculated before it is 'used'
(ie: before it is assigned to the l-value).

The value, yes. When any side effects occur is unspecified.
The value of (i++) is the original value of i.

Maybe. The standard says that you have undefined behavior, so
who knows. Maybe the value of (i++) is never used.
As above, the LHS evaluates to an l-value (which doesn't
change regardless of when the side effect is applied). So why
wouldn't the value on the RHS (which AFAIK is the original
value of i regardless of when the side effect is applied) get
assigned to the l-value regardless of when the side effect is
applied?

Because the standard says that this expression has undefined
behavior. (Apparently, there are, or at least were, machines
where it would cause the hardware to hang.)
As you mentioned, the value of i++ (which is the original
value of i) must be calculated before it is 'used' (ie:
assigned), which has been done.

The problem isn't with the value of i++. The problem is that you
are modifying the same object twice without an intervening
sequence point.
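
A hedged sketch of the difference, following the sequence-point rule as described in this thread (later revisions of the standard reworded these rules, so treat this as an illustration of the rule discussed here, not a statement about every version of the language). The helper names assign_sum and take_then_increment are invented for the example. The statement 'i = i++;' itself is deliberately left out of the compiled code.

```cpp
#include <cassert>

// Well defined: b is modified once (by b++), a is modified once
// (by the assignment), and they are different objects.
int assign_sum(int& b, int c) {
    int a = 0;
    a = b++ + c;  // value: old b + c; side effect: b is incremented
    return a;
}

// One unambiguous way to say "remember the old value, then increment":
// use a second object, so that i is modified only once per statement.
int take_then_increment(int& i) {
    int old = i;  // read i
    ++i;          // the only modification of i in this statement
    return old;
}
```

Writing the intent as separate statements removes the double modification, so the question of when the side effect of i++ lands never comes up.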
 
