Does this program have undefined behavior?

Seebs

I still think it's not equivalent,
because it can affect the legality of other code in the same
translation unit.

Hmmmm.

foo_s.h:
struct foo { int x; };

foo_f.h:
int use_foo(struct foo *a);

bar.c:
#include "foo_s.h"
#include "foo_f.h"

int main(void) {
struct foo x = { 0 };
use_foo(&x);
}

foo.c:
#include "foo_s.h"
struct bar { int x; };
extern int use_foo(struct bar *b) {
    return b->x;
}

Common initial sequence says this should work, I think.

Now, consider the choice of whether to declare use_foo that way,
or declare it as:

#include "foo_f.h"
extern int use_foo(struct foo *a) {
    return a->x;
}

And consider another function, in foo.c:

void dummy(void) {
struct foo z = { 0 };
use_foo(&z);
}

Obviously, which way you declare use_foo affects whether dummy() is
valid or not -- but I don't think it affects whether main() in bar.c
is valid or not, because all that's at issue is whether the arguments
are compatible, and the structs in question have a common initial
sequence.

So I think they're equivalent definitions, even though one of them
clearly breaks other code in the same translation unit.

-s
 
James Dow Allen

James Dow Allen said:
[snip] ; [snip]
Semi-colon.  Less well known, perhaps, than the Comma "sequence point"
but it also serializes.

; does not mean there is a sequence point.  For example, there is no
sequence point in
  break;
nor in
  continue;

foo(a++ + a++)
gives undefined behavior; one could say that's because a needed
sequence point is missing.  In this way, the concept "sequence
point" becomes meaningful.
Is there an example program where saying " 'continue;' lacks
a sequence point" is meaningful?  That is, where the same
program could give different results *because* of that "missing"
sequence point?  I can't think of one; maybe I'm lacking
imagination.

I can't either or I would have included one!  I don't think there is
one.  My point was only about how C is specified.

OK. We seem to be in agreement that the "lack of a sequence point"
in "break;" or "continue;" is a concept which lacks any meaning
except in the context of a literary review of The C Standard(tm).

Out of curiosity, would you also concede that it is difficult or
impossible to imagine a C compiler(Note 1) which handles
if (setjmp((a, b), c))
properly but mishandles
if (a, setjmp(b, c))

(Note 1: Obviously we exclude compilers which go out of their way
to deliberately mishandle the latter in the spirit of the villain
in the Nicolas Cage movie _8MM_, "just because they could.")

FWIW, James never had a reputation for lackadaisical or imprecise
thinking. I've been consulted by experts to explain the detailed
operations of certain machines. For some reason the mechanisms
whereby 1-bits change to 0-bits in real hardware have intrigued me
in a way that studying whether The Standard(tm) has inconsistencies
in its discussions of commas and semi-colons does not.

Recently I was corrected for suggesting that
p += 0;
might be a no-operation when p is a null pointer.
I immediately thought "Aha!"; perhaps some machines with
1's-complement arithmetic use negative-zero as the null-pointer
representation; adding zero to it flips it to positive-zero
and no longer null -- that's probably why.

Yet, although 2 or 3 people spoke up to clarify The Standard(tm),
no one thought c.l.c'ers had the intellectual curiosity
to wonder *why* (p += 0) might not work on some machines.

I'm retired now, but *loved* C when I was an active programmer.
If I were starting out today, I might have to look elsewhere
for a programming language whose culture suits my temperament.

James Dow Allen
 
Seebs

Yet, although 2 or 3 people spoke up to clarify The Standard(tm),
no one thought c.l.c'ers had the intellectual curiosity
to wonder *why* (p += 0) might not work on some machines.

I'm not that curious, but there's a lot of possible explanations.

e.g., one could imagine a system where emitting code for an address
arithmetic operation does additional checks of the pointer's validity
which aren't otherwise performed.

-s
 
Keith Thompson

Kenneth Brody said:
Ben said:
James Dow Allen said:
[snip] ; [snip]
Semi-colon. Less well known, perhaps, than the Comma "sequence point"
but it also serializes.

; does not mean there is a sequence point. For example, there is no
sequence point in

break;

nor in

continue;

<snip>

Given the definition:

static /* volatile */ int i;

would you say that this does, or does not, include a sequence point?

i;

Yes. See C99 6.8p4. ``i'' is a "full expression"; the end of a full
expression is a sequence point.

If not, why not, given that "i++;" definitely does include a sequence point?

Mu.

And if so, how is "i;" different from "break;" in the context of
sequence points?

The difference is that ``break'' is not a full expression (or any kind
of expression).
 
Luca Forlizzi

There is another sequence point.  Look at func_2().  Notice that it has
a statement.  The end of that statement is a sequence point.  That sequence
point must be hit before the function returns its value.  So, before func_2()
returns 3, there has been a sequence point after the modification of g.

Some time ago I discovered the thread "i=i++ (almost)" where, I
believe, Lawrence Kirby has a different opinion.
In the thread, several examples similar to the one posted here have
been discussed, and Mr. Kirby advocates that they should be considered
undefined.
Some examples are:
i=f(i++);
i = (i = 2, 3);
i=0*g(); // g is a function that modifies i

The core of his argument, as far as I understand it, is that the
standard does not say that storing the value of an assignment in the
LHS has to occur after the RHS has been evaluated.

The thread is from 1994: has there been some DR clarifying such
examples, or is the debate still open? :)

Luca Forlizzi
 
Flash Gordon

Luca said:
Some time ago I discovered the thread "i=i++ (almost)" where, I
believe, Lawrence Kirby has a different opinion.
In the thread, several examples similar to the one posted here have
been discussed, and Mr. Kirby advocates that they should be considered
undefined.
Some examples are:
i=f(i++);
i = (i = 2, 3);

The above two are definitely undefined IMHO.

i=0*g(); // g is a function that modifies i

The above probably is undefined too, because the multiply by 0 means the
value to be assigned to i can be known prior to calling g.

The core of his argument, as far as I understand it, is that the
standard does not say that storing the value of an assignment in the
LHS has to occur after the RHS has been evaluated.

The thread is from 1994: has there been some DR clarifying such
examples, or is the debate still open? :)

Well, the example that started this thread was different, and as I
posted earlier I believe it does not invoke undefined behaviour. The
difference being the need to call the function in order to determine the
value to be assigned.
 
David Resnick

I believe you are correct that it is completely defined. My reasoning is
as follows...

There is a sequence point between the assignment in the function and the
return statement. There is also a sequence point on returning from the
function. Therefore there are sequence points between the assignment
inside the function and the assignment outside the function. Simplify
the code to the following and it is still well defined...

#include <stdio.h>
int g = 1;

int func_2(void)
{
    g = 2; /* Sequence point here */
    return 3; /* Sequence point here */
}

int main(void)
{
    g = func_2();
    printf("g = %d\n", g);
    return 0;
}

Hmmm. How is that different from the program below, which I was
assured was UB some time ago (and which broke when we changed gcc
versions)?

#include <stdio.h>
#include <stdlib.h>

typedef struct blah
{
    int a;
    int b;
} blah;

static blah *quux;

static int foo(void)
{
    // yes, I know, not safe if it fails, this is a short POC
    quux = realloc(quux, 2 * sizeof *quux);
    return 2;
}

int main(void)
{
    quux = malloc(sizeof *quux);
    quux->b = 1;
    quux->a = foo();

    printf("%d %d\n", quux->a, quux->b);

    return 0;
}

Plenty of sequence points there, but this seems bogus. And valgrind
reports it as erroneous too...

-David
 
Keith Thompson

David Resnick said:
Hmmm. How is that different from the program below, which I was
assured was UB some time ago (and which broke when we changed gcc
versions)?

#include <stdio.h>
#include <stdlib.h>

typedef struct blah
{
int a;
int b;
} blah;

static blah *quux;

static int foo(void)
{
// yes, I know, not safe if it fails, this is a short POC
quux = realloc(quux, 2 * sizeof *quux);
return 2;
}

int main(void)
{
quux = malloc(sizeof *quux);
quux->b = 1;
quux->a = foo();

printf("%d %d\n", quux->a, quux->b);

return 0;
}

Plenty of sequence points there, but this seems bogus. And valgrind
reports it as erroneous too...

I think the difference is that, in the "quux" example, the evaluation
of the RHS ("foo()") can affect the evaluation of the LHS ("quux->a").
So the following actions:
  (1) Evaluate quux->a as an lvalue, determining what object is to
      be modified.
  (2) Evaluate the RHS, foo() (which also modifies quux).
  (3) Copy the result of (2) to the object identified by (1).
can occur in either of two orders: (1)(2)(3) or (2)(1)(3).

In the other example, the assignment in question is "g = func_2()".
func_2() modifies the value of g, but it doesn't affect the evaluation
of the LHS of the assignment. (Evaluating the LHS of an assignment
determines the identity of the object to be modified; it doesn't
examine the previous value of the object.)

Note that the C201X draft (the latest is
<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1401.pdf>) has
substantially modified the way sequence points are described. I find
the new description clearer. I haven't really studied it in any
depth, but it might make these questions easier to resolve.
 
Keith Thompson

pete said:
This part of the standard is highly counterintuitive:

N869
6.5.16 Assignment operators

[#4] The order of evaluation of the operands is unspecified.

How is that counterintuitive?

To perform an assignment, you need to (1) evaluate the LHS as an
lvalue, determining the object to be modified, (2) evaluate the RHS,
determining the value to be assigned, and (3) perform the assignment,
modifying the object. Given that the order of evaluation of most
other C operators is unspecified, are you saying you'd intuitively
expect steps (1) and (2) to be performed in some language-specified
order?

In most cases, it doesn't make any difference. Cases where it matters
should probably be rewritten anyway. This:

arr[i=2] = (i=3);

is IMHO bad code, even in a language that defines the order of
evaluation.
 
Nick

Flash Gordon said:
Luca said:
i=0*g(); // g is a function that modifies i

The above probably is [undefined behaviour] because the multiply by 0
means the value to be assigned to i can be known prior to calling g.

In which case, i=x*g(); // g is a function that modifies i

is "possibly" undefined behaviour, but unless your code is such (and
your compiler is clever enough to notice) that there are known times
when x is zero it won't be (cue endless discussion about whether UB is
absolute or not), because without this knowledge the optimisation can't
take place, and without the optimisation the behaviour is completely
defined.

"Mr Schrodinger, your cat's exhibiting undefined behaviour again".
 
Phil Carmody

Ben Bacarisse said:
James Dow Allen said:
[snip] ; [snip]

Semi-colon. Less well known, perhaps, than the Comma "sequence point"
but it also serializes.

; does not mean there is a sequence point. For example, there is no
sequence point in

break;

nor in

continue;

One learns something new every day. There's probably one just *before*
the space before the break or the continue, but not necessarily.

labelbar:
goto labelfoo;
labelfoo:
goto labelbar;

Seems to have no sequence points at all!

Phil
 
Phil Carmody

Seebs said:
Hmmmm.

foo_s.h:
struct foo { int x; };

foo_f.h:
int use_foo(struct foo *a);

bar.c:
#include "foo_s.h"
#include "foo_f.h"

int main(void) {
struct foo x = { 0 };
use_foo(&x);
}

foo.c:
#include "foo_s.h"
struct bar { int x; };
extern int use_foo(struct bar *b) {
    return b->x;
}

Common initial sequence says this should work, I think.

No it does not. I think I told you this a couple of weeks ago
where you attempted to use the same logic, and you didn't reply.
You're making up your own language rules, you're not talking
about C.

In ALL-CAPS this time:

STANDALONE STRUCTS HAVE NO 'COMMON INITIAL SEQUENCE' RULE.

Only when they are in a *union* is there such a rider.

Phil
 
Charlton Wilbur

FG> The above probably is because the multiply by 0 means the value
FG> to be assigned to i can be known prior to calling g.

It's defined because there is a sequence point in the function, at the
semicolon following the statement which modifies i, before the result of
the function is assigned to i.

The following fragment of code has no undefined behavior in it:

int i;

int g()
{
    i = i + 1;
    return 4;
}

i = g();

The practical ordering:

* The i = g() statement begins to be evaluated.
* g() is evaluated.
  * The i = i + 1 statement begins to be evaluated.
  * The value of i is read.
  * 1 is added to that value.
  * That value is stored in i.
  * There is a sequence point at the end of the statement.
  * 4 is returned.
  * There is a sequence point at the end of the statement.
* The value 4 is stored in i.

The constraint is thus:

    Between the previous and next sequence point an object shall have
    its stored value modified at most once by the evaluation of an
    expression. Furthermore, the prior value shall be accessed only to
    determine the value to be stored.

This code meets that criterion and is thus not undefined, at least for
that reason.

Charlton
 
Seebs

No it does not. I think I told you this a couple of weeks ago
where you attempted to use the same logic, and you didn't reply.

Oh, I missed that.

You're making up your own language rules, you're not talking
about C.

Possible, but a little surprising to me.

STANDALONE STRUCTS HAVE NO 'COMMON INITIAL SEQUENCE' RULE.
Only when they are in a *union* is there such a rider.

And since they could be in a union in a new translation unit, it
has to apply anyway. Build your two structures, build all your code
referring to them. Compile it. Now go build a new module in which
you write
union u {
struct foo f;
struct bar b;
};

Suddenly, their common initial sequences have to share the same
layout. Since this now has to apply to the already-compiled code, it
must have applied all along.

But in particular, for this SPECIFIC case I'm covered anyway, because even
if we imagine that the padding could vary between two structures, a pointer
to a structure, suitably converted, points to its first member, and vice
versa, so the type punning has to work for the first member. I do not think
it is possible to make a conforming implementation where that won't hold.

-s
 
Luca Forlizzi

The above two are definitely undefined IMHO.


The above probably is because the multiply by 0 means the value to be
assigned to i can be known prior to calling g.

That's exactly the reason given in that thread.
 
Luca Forlizzi

    FG> Luca Forlizzi wrote:

    >> i=0*g(); // g is a function that modifies i

    FG> The above probably is because the multiply by 0 means the value
    FG> to be assigned to i can be known prior to calling g.

It's defined because there is a sequence point in the function, at the
semicolon following the statement which modifies i, before the result of
the function is assigned to i.

The following fragment of code has no undefined behavior in it:

int i;

int g()
{
  i = i + 1;
  return 4;

}

i = g();

The practical ordering:

* The i = g() statement begins to be evaluated.
* g() is evaluated.
  * The i = i + 1 statement begins to be evaluated.
  * The value of i is read.
  * 1 is added to that value.
  * That value is stored in i.
  * There is a sequence point at the end of the statement
  * 4 is returned.
  * There is a sequence point at the end of the statement.
* The value 4 is stored in i.

The constraint is thus:

    Between the previous and next sequence point an object shall have
    its stored value modified at most once by the evaluation of an
    expression. Furthermore, the prior value shall be accessed only to
    determine the value to be stored.

This code meets that criterion and is thus not undefined, at least for
that reason.

I would have agreed with this reasoning before reading the thread I was
referring to
(http://groups.google.it/group/comp.lang.c++/browse_thread/thread/1de1d00083e2ed67/8644012c80f949da?q=almost+group%3Acomp.std.c&lnk=nl&)
However now I believe this does not correctly use the concept of
sequence point. Mr. Kirby advocates that sequence points only
constrain the (direct) operands of the operators that define them. Let
me cite Mr. Kirby, discussing the i=0*g(); example:
"The side effect of updating the stored value of the left operand shall occur
between the previous and next sequence point"
clearly permits the 2nd ordering since the only difference is when the
side effect of the assignment happens. It is constrained by the sequence
points at the end of the expression and the end of the previous expression
but it is not constrained by the sequence point defined by the function call
(because it is no part of the function call). This last part is perhaps
what needs to be expanded so consider:
i + (0, i++, 1)
Clearly in this example the 2 i's are not separated by sequence points
(the expression is undefined because it depends on whether i or i++ is
evaluated first). Sequence points only constrain the arguments to the
operators that define them.

So in your example, according to this view, sequence points before or
inside the function call do not constrain the order of evaluation of the
assignment's operands. Given that the standard does not say that the side
effect of updating the LHS of the assignment has to occur after the RHS
evaluation (as strange as that may be), in i=g(); there are two writes
to i and no sequence points (related to the assignment) between them.
So I think that your example is undefined, even though I expect that
every implementation produces the same result on it.

However, Mr. Kirby wrote in 1994. It may be that there has been some
official statement from the Committee clarifying the issue.
Does anyone know?

Luca
 
Luca Forlizzi

So in your example, according to this view, sequence points before or
inside the function call do not constrain the order of evaluation of the
assignment's operands. Given that the standard does not say that the side
effect of updating the LHS of the assignment has to occur after the RHS
evaluation (as strange as that may be), in i=g(); there are two writes
to i and no sequence points (related to the assignment) between them.
So I think that your example is undefined, even though I expect that
every implementation produces the same result on it.

However, Mr. Kirby wrote in 1994. It may be that there has been some
official statement from the Committee clarifying the issue.
Does anyone know?

Luca

I'll answer myself :) Just after posting the previous message, I had
a look at the draft N1401.
I found that the description of the assignment operator has changed,
and now there is:
"The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
The evaluations of the operands are unsequenced."

This is exactly what is needed to render expressions of the form
i=g(); // g modifies i
well defined, since now it's stated that the update of the LHS happens
after the evaluation of the RHS.
So this confirms that Mr. Kirby's interpretation of C89/C99 was sound,
requiring a change to make (or at least to clearly make)
such expressions well defined.
The famous i=i++; , though, is still undefined in the new draft.

Luca
 
Charlton Wilbur

N> In which case, i=x*g(); // g is a function that modifies i

N> is "possibly" undefined behaviour,

Only if g() itself has undefined behavior.

The critical step you're missing is that g() must have at least two
sequence points within it -- one after all the arguments have been
evaluated, just before the function body is entered, and one after the
statement in which i is modified. Regardless of the order in which the
terms of that statement are evaluated, those two sequence points are
enough to prevent it from being undefined behavior.

Now, if g() has something idiotic like i = i++ * i++; in it, sure,
i = x * g(); will result in undefined behavior -- but that's because of the
content of g(), not because of the statement i = x * g();

See http://c-faq.com/expr/seqpoints.html.

Charlton
 
Nick

Charlton Wilbur said:
N> In which case, i=x*g(); // g is a function that modifies i

N> is "possibly" undefined behaviour,

Only if g() itself has undefined behavior.

The critical step you're missing is that g() must have at least two
sequence points within it -- one after all the arguments have been
evaluated, just before the function body is entered, and one after the
statement in which i is modified. Regardless of the order in which the
terms of that statement are evaluated, those two sequence points are
enough to prevent it from being undefined behavior.

Now, if g() has something idiotic like i = i++ * i++; in it, sure,
i = x * g(); will result in undefined behavior -- but that's because of the
content of g(), not because of the statement i = x * g();

I understood the post previous to mine in the thread to say that if x in
the above was a constant zero, then even if g would normally be defined
(although it modifies i) then it wouldn't be, because the multiplication
by zero allows the result of x to be discarded, and so allows the
assignment of 0 to i and the alteration in g to happen "out of order".

So my point was that if the optimiser can see that x will always be zero
by this point, then it can make the optimisation, and then we will get
undefined behaviour. If it can't work that out (even if it would be
true) then it wouldn't.
 
