possibly undefined behavior

Mark

Hello

does this code invoke UB?

int func(int i)
{
    return (i * i);
}

int main(void)
{
    int i = 2;

    i = func(i);
    return 0;
}

GCC with "-pedantic -W -Wall -Wextra" says nothing though.
 
James Kuyper

Mark said:
Hello

does this code invoke UB?

int func(int i)
{
    return (i * i);
}

int main(void)
{
    int i = 2;

    i = func(i);
    return 0;
}

Not as far as I can see, though I may have missed something (that would
be hard to do on code this simple, but it's unfortunately always a
possibility). Do you have any particular reason for thinking otherwise?
 
Mark

James Kuyper said:
Not as far as I can see, though I may have missed something (that would be
hard to do on code this simple, but it's unfortunately always a
possibility). Do you have any particular reason for thinking otherwise?

I was thinking that it may invoke UB, because the variable 'i' is passed as
an argument and then the result is written back to it. I didn't find any
evidence in the C standard, but thought it might be implementation-defined
how parameters and return values are handled in such a case.
 
Thad Smith

Mark said:
I was thinking that it may invoke UB, because the variable 'i' is passed as
an argument and then the result is written back to it.

This is well defined.

I didn't find any evidence in the C standard, but thought it might be
implementation-defined how parameters and return values are handled in
such a case.

The implementation is not required by Standard C to define the
mechanisms used for passing parameters and return values, although many
do this to facilitate interfacing assembly code.

Even though the specific mechanism of parameter passing varies with
different implementations, they all have the required effect.
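
A minimal sketch of that required effect, assuming nothing beyond standard C: the parameter is a separate object initialized with a copy of the argument, and the assignment in the caller only happens once the call has returned a value. The names `square`, `square_and_clobber` and `demo_pass_by_value` are hypothetical, not from the original post.

```c
#include <assert.h>

/* The parameter n is a distinct object holding a copy of the argument. */
static int square(int n)
{
    return n * n;
}

/* Modifying the parameter changes only the local copy. */
static int square_and_clobber(int n)
{
    int result = n * n;
    n = 0;                          /* does not touch the caller's variable */
    return result;
}

/* Returns 1 if the behavior matches the description above. */
static int demo_pass_by_value(void)
{
    int i = 2;
    int r = square_and_clobber(i);  /* the caller's i is still 2 afterwards */
    assert(i == 2);
    i = square(i);                  /* read i, call, then store the result in i */
    return r == 4 && i == 4;
}
```

Calling `demo_pass_by_value()` from `main` should return 1 on any conforming implementation, whatever registers or stack layout it uses underneath.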
 
Anand Hariharan

I was thinking that it may invoke UB, because the variable 'i' is passed as
an argument and then the result is written back to it. I didn't find any
evidence in the C standard, but thought it might be implementation-defined
how parameters and return values are handled in such a case.

A function call is a sequence point. It's okay to read from 'i' and
write to 'i' so long as there is an intervening sequence point.

- Anand
 
Keith Thompson

Anand Hariharan said:
A function call is a sequence point. It's okay to read from 'i' and
write to 'i' so long as there is an intervening sequence point.

It's even ok to do so without an intervening sequence point.
i = i + 1 is perfectly valid, because the value of i is read (on the
RHS) to determine the value to be stored in i (on the LHS).

You only get UB if the same object is modified twice between sequence
points, or if it's read and written with the result not being used to
determine the value to be stored (as in i = i++). The latter rule may
seem confusing (it confused me for a long time), but the point is that
if the value read is used to determine the value to be stored, that
imposes an ordering. If it's read and written with no imposed
ordering, the behavior is undefined.
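
As a rough illustration of the rule, here are two well-defined ways to write what `i = i++` might have been meant to do. This is only a sketch; the function names `increment_once` and `increment_and_get_old` are made up for the example.

```c
#include <assert.h>   /* only needed for the checks in the usage note */

/* i = i + 1: the read of i feeds the value being stored, so it is ordered. */
static int increment_once(int i)
{
    i = i + 1;
    return i;
}

/* If the old value is wanted as well, split the work across statements. */
static int increment_and_get_old(int *i)
{
    int old = *i;   /* read */
    *i = old + 1;   /* the only write, in its own full expression */
    return old;
}
```

For example, with `int x = 5;`, the call `increment_and_get_old(&x)` yields 5 and leaves 6 in `x`, with no reliance on evaluation order inside a single expression.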
 
luserXtrog

It's even ok to do so without an intervening sequence point.
i = i + 1 is perfectly valid, because the value of i is read (on the
RHS) to determine the value to be stored in i (on the LHS).

You only get UB if the same object is modified twice between sequence
points, or if it's read and written with the result not being used to
determine the value to be stored (as in i = i++).  The latter rule may
seem confusing (it confused me for a long time), but the point is that
if the value read is used to determine the value to be stored, that
imposes an ordering.  If it's read and written with no imposed
ordering, the behavior is undefined.

eg.
i=i; /* ok */
i=i+i; /* ok */
i=i+i+i; /* ok */
i=i=i; /* NOT OK */
i=i,i=i; /* ok, comma is a sequence point */
i=sin(i); /* ok, one read one write */
(i=i)&&(i=i); /* probably ok, but don't tell 'em I told you */
(i=i)*(i=i); /* NOT OK, 2 writes */
(i=i)?(i==i):(i+=i-(i=i)); /* NOT OK 2 writes (that cause the prob)*/

This is a variant of the popular
i=i++;
which is equivalent to
i=(i=i+1);
see 'em now? 2 '='s == BAD.

What's not forbidden is allowed!
 
Ben Bacarisse

pete said:
Keith Thompson wrote:

I think of

p = p -> next = q;

as my favorite example of undefined behavior resulting
from the value of p being read with the result not being used
to determine the value to be stored.

and mine is:

a[i] = i++; /* UB */

i is used to determine where the value is stored rather than what
value to store.
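
One well-defined rewrite, assuming the intent was "store the old index value at the old index, then advance the index" (the other reading is equally plausible, which is part of the problem). The name `store_then_advance` is hypothetical.

```c
#include <assert.h>   /* only needed for the checks in the usage note */

/* Store the old index value at a[old index], then advance the index. */
static void store_then_advance(int a[], int *i)
{
    a[*i] = *i;   /* the index is only read in this statement */
    (*i)++;       /* and only modified in this one */
}
```

With `int a[3] = {0}; int i = 1;`, calling `store_then_advance(a, &i)` sets `a[1]` to 1 and leaves `i` at 2.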
 
Richard Bos

Jack Klein said:
It would be UB if there were not a sequence point involved in the
function call.

Nope. Take this code:

#define MACRO(i) ((i)*(i))

int main(void)
{
    int i = 2;

    i = MACRO(i);
    return 0;
}

This has no more undefined behaviour than the original code.

It's only UB if the object assigned to (i.e., i) is also read for other
purposes than determining the assigned value. In both these cases, i is
read only to determine i*i, which is then assigned back to i. This is
legal.

You're right in so far that _if_ func() had been more involved, and did
things to i other than computing a value from it (for which it'd have
needed to be passed &i rather than i's value, in a sanely written
program), _then_ the function call sequence point would have prevented
that case of UB.
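
A small side-by-side sketch of the two forms, using the hypothetical names `SQUARE_MACRO`, `square_func` and `demo_macro_vs_function`. In both, i is read only to compute the value that is then stored back.

```c
#include <assert.h>   /* only needed for the checks in the usage note */

#define SQUARE_MACRO(x) ((x) * (x))

static int square_func(int x)
{
    return x * x;
}

/* Returns 1 if the macro form and the function form agree. */
static int demo_macro_vs_function(void)
{
    int a = 2;
    int b = 2;

    a = SQUARE_MACRO(a);   /* expands to a = ((a) * (a)); */
    b = square_func(b);    /* same reads, plus a function call */
    return a == b && a == 4;
}
```

This only covers a plain variable as the argument. An argument with its own side effect, such as `i++`, would be expanded twice by the macro and is a different matter.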

Richard
 
Anand Hariharan

It's even ok to do so without an intervening sequence point.
i = i + 1 is perfectly valid, because the value of i is read (on the
RHS) to determine the value to be stored in i (on the LHS).

You only get UB if the same object is modified twice between sequence
points, or if it's read and written with the result not being used to
determine the value to be stored (as in i = i++).  The latter rule may
seem confusing (it confused me for a long time), but the point is that
if the value read is used to determine the value to be stored, that
imposes an ordering.  If it's read and written with no imposed
ordering, the behavior is undefined.

Not playing devil's advocate here, but does not

i = ++i;

impose an ordering, and hence should have well-defined behaviour?

- Anand
 
Keith Thompson

Anand Hariharan said:
Not playing devil's advocate here, but does not

i = ++i;

impose an ordering, and hence should have well-defined behaviour?

Yes and no.

The assignment evaluates the expression "++i" and stores the result in
i, so the result of the expression must be determined before the value
is stored. But the side effect of "++i" is to modify i; that side
effect doesn't need to occur before the assignment modifies i, since
the side effect isn't necessary for determining what the result of
"++i" is going to be.

Using a well-defined example:

j = ++i;

There are several things that must happen here:

(a) Evaluate "j" as an lvalue (i.e., determine its address).
(b) Evaluate "i" to determine its current value.
(c) Determine the result of "++i".
(d) Store the result of "++i" in j (side effect of "=").
(e) Increment i (side effect of "++").
(f) Determine (and discard) the result of the assignment expression.

Some of these things must occur before other things can happen. For
example, (c) must precede (d). But (e) can occur either before or
after (d); you don't need to modify i to determine what the result of
"++i" is going to be.

In this case, since i and j are separate objects, there's no problem.
In the case of "i = ++i", the two modifications to i are unordered,
and so the behavior is undefined.
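
To make the steps for "j = ++i" concrete, here is one permitted ordering written out by hand, with each step in its own statement. It is only a sketch; `preincrement_into` is a made-up name, and step (a) is implicit in the pointer parameter.

```c
#include <assert.h>   /* only needed for the checks in the usage note */

/* One permitted ordering of the steps of j = ++i, spelled out. */
static int preincrement_into(int *j, int *i)
{
    int current = *i;          /* (b) read i */
    int result = current + 1;  /* (c) result of ++i */
    *j = result;               /* (d) side effect of = */
    *i = result;               /* (e) side effect of ++; could equally precede (d) */
    return result;             /* (f) value of the assignment expression */
}
```

With `int i = 4, j = 0;`, the call `preincrement_into(&j, &i)` returns 5 and leaves both variables at 5. Swapping the statements marked (d) and (e) gives the same outcome, which is why the compiler is free to pick either order.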

The pre-C201X draft:
<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1362.pdf>
has a very interesting re-statement of the rules in 6.5 (it helped me
understand what the C90/C99 wording really means):

An _expression_ is a sequence of operators and operands that
specifies computation of a value, or that designates an object or
a function, or that generates side effects, or that performs a
combination thereof. The value computations of the operands of an
operator are sequenced before the value computation of the result
of the operator.

If a side effect on a scalar object is unsequenced relative to
either a different side effect on the same scalar object or a
value computation using the value of the same scalar object, the
behavior is undefined. If there are multiple allowable orderings
of the subexpressions of an expression, the behavior is undefined
if such an unsequenced side effect occurs in any of the orderings.

The grouping of operators and operands is indicated by the
syntax. Except as specified later, side effects and value
computations of subexpressions are unsequenced.
 
Anand Hariharan

Yes and no.

The assignment evaluates the expression "++i" and stores the result in
i, so the result of the expression must be determined before the value
is stored. But the side effect of "++i" is to modify i; that side
effect doesn't need to occur before the assignment modifies i, since the
side effect isn't necessary for determining what the result of "++i" is
going to be.

Using a well-defined example:

j = ++i;

There are several things that must happen here:

(a) Evaluate "j" as an lvalue (i.e., determine its address).
(b) Evaluate "i" to determine its current value.
(c) Determine the result of "++i".
(d) Store the result of "++i" in j (side effect of "=").
(e) Increment i (side effect of "++").
(f) Determine (and discard) the result of the assignment expression.

Some of these things must occur before other things can happen. For
example, (c) must precede (d). But (e) can occur either before or after
(d); you don't need to modify i to determine what the result of "++i" is
going to be.

In this case, since i and j are separate objects, there's no problem. In
the case of "i = ++i", the two modifications to i are unordered, and so
the behavior is undefined.
[snip explanation from Standard]

I understand what you are saying (thank you for the patient analysis), so
this is more a rant than anything else:

I understand several definitions of UB cannot be avoided (e.g., "char
*c=NULL; *c;") or even useful (e.g., "int *p = (int *)0x1234;"), but when
statements such as "i = i++;" -

* have no useful value,
* code that has such statements is broken (and most likely don't know
about it),
* can be detected by the compiler

- why cannot the standard require a diagnostic?

How different is this rant from "Why is gets() still in the standard?"?

- Anand
 
Keith Thompson

Anand Hariharan said:
In this case, since i and j are separate objects, there's no problem. In
the case of "i = ++i", the two modifications to i are unordered, and so
the behavior is undefined.
[snip explanation from Standard]

I understand what you are saying (thank you for the patient analysis), so
this is more a rant than anything else:

I understand several definitions of UB cannot be avoided (e.g., "char
*c=NULL; *c;") or even useful (e.g., "int *p = (int *)0x1234;"), but when
statements such as "i = i++;" -

* have no useful value,
* code that has such statements is broken (and most likely don't know
about it),
* can be detected by the compiler

- why cannot the standard require a diagnostic?

How different is this rant from "Why is gets() still in the standard?"?

Because these situations can't always be detected by the compiler.

Consider (untested code):

int arr[10] = {0};
int i = rand() * 10;
int j = rand() * 10;
arr[i] = arr[j]++;

This is well defined if i != j, but it invokes undefined behavior if
i == j.

Or, similarly:

*p1 = (*p2) ++;

where p1 and p2 may or may not be equal.

Some cases, such as "i = i++", can be detected fairly easily (and
compiler writers are certainly free to spend as much effort as they
like detecting such cases). Other cases can be detected with some
data-flow analysis:

int *p0 = /* ... */;
int *p1 = p0 + 1;
*p0 = (*(p1 - 1))++;

In my first example, a stunningly clever compiler might issue a
message: "Warning: 10% chance of nasal demons".

It would be interesting to try to define rigorously a set of rules for
which cases a compiler is required to detect, but I'm skeptical of the
success of such an effort. And any such formulation could render some
compilers non-conforming. Under the current rules, a compiler author
could make a legitimate choice to concentrate on fast and simple code
generation and de-emphasize detection of potential problems.
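
A sketch of how the array case can be written so that it is defined whether or not the two indices coincide. The assumed intent is that the element at index i receives the old value of the element at index j, and the latter is incremented; the names `assign_post_increment` and `random_index` are hypothetical, and `rand() % 10` is just one simple way to get an index in 0..9.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdlib.h>   /* rand() */

/* Assumed intent: arr[i] receives the old arr[j], and arr[j] is incremented. */
static void assign_post_increment(int arr[], int i, int j)
{
    int old = arr[j];
    arr[j] = old + 1;
    arr[i] = old;      /* when i == j this store wins, by our own choice */
}

/* Picks an index in 0..9; rand() returns an int in 0..RAND_MAX. */
static int random_index(void)
{
    return rand() % 10;
}
```

Because each store is a separate statement, the i == j case has a definite result (the element keeps its old value) instead of being left to chance.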
 
Anand Hariharan

I understand several definitions of UB cannot be avoided (e.g., "char
*c=NULL; *c;") or even useful (e.g., "int *p = (int *)0x1234;"), but
when statements such as "i = i++;" -

* have no useful value,
* code that has such statements is broken (and most likely don't know
about it),
* can be detected by the compiler

- why cannot the standard require a diagnostic?
(...)

Because these situations can't always be detected by the compiler.

Consider (untested code):

int arr[10] = {0};
int i = rand() * 10;
int j = rand() * 10;
arr[i] = arr[j]++;

This is well defined if i != j, but it invokes undefined behavior if i
== j.

[snip other examples]

I assume you meant rand() to be a non standard function that returns a
random floating point value in [0.0, 1.0).

Thank you for the explanation,
- Anand
 
Keith Thompson

Anand Hariharan said:
Consider (untested code):

int arr[10] = {0};
int i = rand() * 10;
int j = rand() * 10;
arr[i] = arr[j]++;

This is well defined if i != j, but it invokes undefined behavior if i
== j.

[snip other examples]

I assume you meant rand() to be a non standard function that returns a
random floating point value in [0.0, 1.0).


No, I meant rand() to be a *standard* function that returns a random
floating point value in [0.0, 1.0).

Unfortunately, my intentions were inconsistent with reality.

(I *told* you it was untested code!)

Thank you for the explanation,

You're welcome.
 
Kaz Kylheku

I think of

p = p -> next = q;

as my favorite example of undefined behavior resulting
from the value of p being read with the result not being used
to determine the value to be stored.

Where is p being read with the result not being used to determine
the value to be stored?

The standard makes it clear that the value of an assignment expression
is that of the left operand, after the assignment.
(See C99, 6.5.16, paragraph 3).

So above, what is assigned to p? The value of the assignment
expression (p->next = q).

What is the value of that expression? It is the value of p->next after the
assignment.

You seem to be assuming that the value of the assignment expression is that of
the right operand (converted to the type of the left), with no ordering
dependency with respect to the assignment. I.e. that the value of q can flow
out of p->next = q expression right into the p assignment, independently
of the completion of the p->next assignment.

But as you can see, that is not true. Since the standard says that
the value is that of the left operand after assignment, we must take this to be
literally true. The standard describes abstract semantics, which must be obeyed
in accordance with the ``as if'' principle, regardless of any optimizations.

In this case, the abstract semantics says that, literally, the value is stored
into p->next, and then the assignment expression's value is derived by
accessing the value of p->next. That's what it means to get the value
of p->next ``after the assignment''. And of course p cannot be modified until
that value is available.

If such an ordering were not required, then the standard wouldn't use the
phrase ``after the assignment'', but only something like ``the value of the
assignment expression is the same as that which is stored in the left operand,
of the same type''.
 
Keith Thompson

Kaz Kylheku said:
Where is p being read with the result not being used to determine
the value to be stored?

In "p -> next = q", where p is read to determine where to store the
value of q.

The standard makes it clear that the value of an assignment expression
is that of the left operand, after the assignment.
(See C99, 6.5.16, paragraph 3).

So above, what is assigned to p? The value of the assignment
expression (p->next = q).

What is the value of that expression? It is the value of p->next after the
assignment.

You seem to be assuming that the value of the assignment expression
is that of the right operand (converted to the type of the left),
with no ordering dependency with respect to the assignment.
I.e. that the value of q can flow out of p->next = q expression
right into the p assignment, independently of the completion of the
p->next assignment.

But as you can see, that is not true. Since the standard says that
the value is that of the left operand after assignment, we must take
this to be literally true. The standard describes abstract
semantics, which must be obeyed in accordance with the ``as if''
principle, regardless of any optimizations.

In this case, the abstract semantics says that, literally, the value
is stored into p->next, and then the assignment expression's value
is derived by accessing the value of p->next. That's what it means
to get the value of p->next ``after the assignment''. And of course
p cannot be modified until that value is available.

If such an ordering were not required, then the standard wouldn't
use the phrase ``after the assignment'', but only something like
``the value of the assignment expression is the same as that which
is stored in the left operand, of the same type''.

That's not a bad argument, but it implies that the side effect of
storing the value in the target must occur before the result of the
assignment is used. But then why does the standard say, "The side
effect of updating the stored value of the left operand shall occur
between the previous and the next sequence point."? Yes, that's
strictly consistent with what you say, but if that's the intent it's
an odd way to express it.

Consider:

int x, y;
x = y = 3;

Both x and y have the value 3 stored in them, but this can occur in
either order. The result of "y = 3", and therefore the value stored
in x, is "the value of [y] after the assignment", which seems to imply
an ordering constraint, but I don't think it's intended to. The value
stored in x is 3; the value of y after the assignment is 3. 3 is 3,
which satisfies the requirement.

N1362, the pre-C201x draft, re-words the section, but I don't think it
resolves the issue:

An assignment operator stores a value in the object designated by
the left operand. An assignment expression has the value of the
left operand after the assignment, but is not an lvalue. The type
of an assignment expression is the type of the left operand unless
the left operand has qualified type, in which case it is the
unqualified version of the type of the left operand. The side
effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right
operands. The evaluations of the operands are unsequenced.

But note that it doesn't say that any use of the result is sequenced
after the side effect of updating the stored value.
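
Whichever reading of the standard is right, the linked-list case is easy to write so that the question never arises. A minimal sketch, with a hypothetical `struct node` and helper `append_and_advance`:

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stddef.h>   /* NULL */

struct node {
    int value;
    struct node *next;
};

/* Links q after *p and advances *p to q, one side effect per statement. */
static void append_and_advance(struct node **p, struct node *q)
{
    (*p)->next = q;   /* p is read here only to find where to store q */
    *p = q;           /* p is modified in a separate full expression */
}
```

Given nodes `a` and `b` and `struct node *p = &a;`, the call `append_and_advance(&p, &b)` leaves `a.next` pointing at `b` and `p` pointing at `b`.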
 
Kaz Kylheku

I understand several definitions of UB cannot be avoided (e.g., "char
*c=NULL; *c;") or even useful (e.g., "int *p = (int *)0x1234;"), but when
statements such as "i = i++;" -

* have no useful value,
* code that has such statements is broken (and most likely don't know
about it),
* can be detected by the compiler

Not all instances of this can be detected statically. Think about:

(*p) = (*q)++;

This can be well-defined if p and q point to different objects,
but is undefined if they point to the same object.

The values of p and q can vary at run-time, and can be made to depend
on input to the program.

So to issue the diagnostic at translation time, you have to have the input to
the program available, and be prepared to solve the halting problem. :)

But I agree with you. Unspecified orders of evaluation, in a primarily
imperative language, are complete nonsense, and atrociously irresponsible
engineering that partially keeps us in the dark ages.

Rather than inventing misfeatures and then trying to diagnose them,
we should specify the order of everything, so that there is no ambiguity.

There is a religious belief, completely unsubstantiated, that unspecified
evaluation orders are required for the generation of good code.

This is pure bunk because:

- actual evaluation can be considerably rearranged in the face of
required orders.

informal proof 1: there are already sequence points in C programs. If
optimizers could not move effects across abstract sequence points,
most optimizations would not be possible. Optimizations like
function inlining and loop unrolling ``obliterate semicolons''.

informal proof 2: programmers are encouraged to rewrite ambiguous-looking
code into multiple statements, with sequence points.
But wait, aren't we supposed to stuff everything into one expression
with lots of side effects to get the benefit of speed?
Maybe, if you're working with a PDP-11 C compiler from a 1979 Unix box.

- the few cases where this is true are now addressed with restrict
pointers.

suppose that side effects are nicely ordered left to right
(they aren't, of course, but consider an imaginary C dialect)
and you have this expression:

(*p) = (*q)++;

because this is well-defined, the compiler for our imaginary
dialect has to make it work properly. The problem is that p and q may or may
not point to the same object, and it has to work regardless. The compiler
for this strictly evaluated dialect could generate better code if it could
assume that p and q do not point to the same object, just like it does for
code like:

i = j++;

where i and j are known not to be aliases since they are separately
defined variables.

In the C99 language, we can make p and q restrict-qualified
pointers. By doing so, we promise to the language implementation
that these objects are not aliased.

So we have a way to tell the compiler: ``Please assume these objects
accessed through pointers are different objects, so that updating
one has no effect on the value of the other, or else I will eat my
unsigned shorts.''

But in the C language being what it is, with its unspecified
evaluation orders, we don't actually need to indicate
that p and q are different objects. The (*p) = (*q)++ expression
encodes the assumption that they are!

In other words, ambiguity in expressions is also a way of promising to the
compiler that there is no aliasing. With it you can express ``since I am
updating several things here without a sequence point, or accessing some
things while modifying others, I am hereby promising that they are all
distinct things.''

Using a declared attribute of the pointer (restrict qualifier) is
a better way of achieving this. It can't hurt you if you don't use it,
and you don't have to jam multiple operations into one evaluation between two
sequence points to get the optimization benefit.

If p and q are declared as pointing to distinct objects, then this assumption
still helps optimization even if there are sequence points:

*p = *q;
(*q)++;

In spite of the sequence point, the compiler can assume that the
assignment to *p has no effect on *q. We are free to restructure
the code; we don't lose the no-aliasing assumption just because
we added a semicolon.
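
A sketch of the restrict version described above, assuming a C99 compiler. The function name `copy_then_increment` is made up for the example; the promise of non-aliasing is the caller's responsibility.

```c
#include <assert.h>   /* only needed for the checks in the usage note */

/* restrict is a promise by the caller that p and q refer to different objects. */
static void copy_then_increment(int *restrict p, int *restrict q)
{
    *p = *q;    /* the compiler may assume this store does not change *q */
    (*q)++;
}
```

With `int x = 0, y = 7;`, the call `copy_then_increment(&x, &y)` leaves 7 in `x` and 8 in `y`. Passing the same address for both arguments would break the promise, so that call is deliberately not shown.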
 
