*str++ is undefined?

Michael B Allen

I have some code:

*str++ = tolower(*str);

that gcc is complaining about:

warning: operation on `str' may be undefined

I'm getting similar warnings for 'di' and 'bi' in:

di = ++di % dn;

and

bi = ++bi % BUFSIZ;

I take it incrementing and evaluating together is undefined
behavior? What's the rule?

Thanks,
Mike

$ gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)
 
CBFalconer

Michael said:
*str++ = tolower(*str);

that gcc is complaining about:

warning: operation on `str' may be undefined

I'm getting similar warnings for 'di' and 'bi' in:

di = ++di % dn;

and

bi = ++bi % BUFSIZ;

I take it incrementing and evaluating together is undefined
behavior? What's the rule?

No. You are using an undefined value, because the ++ may actually
take effect either before or after the other use of the same
variable. The same thing would be fine for separate strings:

*s1++ = tolower((unsigned char) *s2++);

Note that you need the cast here unless s2 already points to
unsigned char. Plain char won't do.
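
A minimal sketch of the two-pointer form above, wrapped in a hypothetical helper named copy_lower (the name and signature are illustrative, not from the thread). The cast matters because tolower() expects an int whose value is representable as unsigned char or equal to EOF, and a plain char may hold a negative value:

```c
#include <ctype.h>

/* Hypothetical helper: copy src into dst, lowercasing each character.
   The caller must make sure dst has room for strlen(src) + 1 bytes. */
static void copy_lower(char *dst, const char *src)
{
    while (*src != '\0') {
        /* Two different pointers are modified, each exactly once, so the
           increments cannot interfere with the reads. */
        *dst++ = (char)tolower((unsigned char)*src++);
    }
    *dst = '\0';
}
```

Because dst and src are separate objects, it does not matter when each increment takes effect within the statement.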
 
Michael B Allen

CBFalconer said:
No. You are using an undefined value, because the ++ may actually take
effect either before or after the other use of the same variable.

That's stupid. The right side of the expression must be evaluated before
an assignment can be made. Where is the ambiguity exactly?

Mike
 
Andrey Tarasevich

Michael said:
That's stupid. The right side of the expression must be evaluated before
an assignment can be made.
...

That's true. But the act of actual assignment consists of one and only
one action: storing the [suitably converted] result of the 'tolower'
call in the destination object of type 'char'. But the process of
_locating_ the destination object is not a part of the act of
assignment. The compiler is free to determine the destination object
_before_ the right-hand side is evaluated, and then perform the actual
assignment after the right-hand side is evaluated. For example, the
above expression can be evaluated in accordance with the following schedule:

// Determine the destination
char* dst = str++;

// Prepare the operand
char op = *str;

// Call the function
char res = tolower(op);

// Perform the assignment
*dst = res;

Or it can be evaluated in accordance with a different schedule:

// Prepare the operand
char op = *str;

// Call the function
char res = tolower(op);

// Determine the destination
char* dst = str++;

// Perform the assignment
*dst = res;

Note that both schedules are completely legal, i.e. they don't violate
any sequencing requirements imposed by the language specification. In
both cases the actual assignment is performed at the very end (i.e. "the
right side of the expression is evaluated before the assignment", as you
said). But the outcomes are completely different.

(One can easily come up with yet more possible schedules with yet more
possible outcomes).
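
To make the difference concrete, here is a sketch that writes each schedule out as an ordinary, well-defined function. The function names are hypothetical and the unsigned char casts are an addition; the statement order follows the two schedules above.

```c
#include <ctype.h>

/* Schedule 1: locate the destination (and increment str) first. */
static void schedule_dst_first(char *str)
{
    char *dst = str++;                            /* destination is the old str */
    char op = *str;                               /* reads the NEXT character */
    char res = (char)tolower((unsigned char)op);
    *dst = res;
}

/* Schedule 2: read the operand first, then locate the destination. */
static void schedule_rhs_first(char *str)
{
    char op = *str;                               /* reads the CURRENT character */
    char res = (char)tolower((unsigned char)op);
    char *dst = str++;                            /* destination is the old str */
    *dst = res;
}
```

Starting from the buffer "AB", the first function leaves "bB" and the second leaves "aB".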
 
CBFalconer

Michael said:
That's stupid. The right side of the expression must be evaluated before
an assignment can be made. Where is the ambiguity exactly?

But not before the computation of the address where the result will
be stored.
 
Michael B Allen

That's stupid. The right side of the expression must be evaluated
before an assignment can be made.
...

That's true. But the act of actual assignment consists of one and only
one action: storing the [suitably converted] result of the 'tolower'
call in the destination object of type 'char'. But the process of
_locating_ the destination object is not a part of the act of
assignment. The compiler is free to determine the destination object
_before_ the right-hand side is evaluated, and then perform the actual
assignment after the right hand side is evaluated.

Ok, so I guess I just have to use another variable like:

int ch = *str;
*str++ = toupper(ch);

In practice if I don't do this is it really possible for such a simple
case to do the wrong thing here?

Mike
 
Barry Schwarz

That's stupid. The right side of the expression must be evaluated before
an assignment can be made. Where is the ambiguity exactly?

++ is an operator. Unless optimized away (obviously not the case
here), the compiler generates code to perform the evaluation specified
by the operator. The result of evaluating a post ++ is the current
value of the operand.

But ++ also has a side effect. In this case it will increment the
value of its operand. The only thing you know about when this side
effect occurs is that it will occur at or before the next sequence point.

There are two sequence points in the statement. One is when the
function tolower() is called. (Obviously, the argument must be
evaluated before the function can be called.) The second is at the
semi-colon.

Since there is no dependency/relationship between the evaluation of
the ++ and the function call, they can occur in either order.
(Consider the expression a+b+tolower(c). a+b can be evaluated before
or after tolower is called.)

If the function call occurs first, then the ++ will be evaluated
after the argument and there is no confusion.

If the ++ is evaluated first, then it could be evaluated before or
after the argument.

While unlikely, it is possible for the generated code to
evaluate the argument, then evaluate the ++, and then call the
function. Again in this case there is no confusion.

In the more likely case, the ++ will be evaluated before the
argument. Now the problem is we do not know when the side effect
occurs. Does it occur before or after the argument is evaluated?

To eliminate this ambiguity and to avoid restricting compilers that
generate code for parallel machines or other advanced architectures,
the standard states:

"Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored."

(It is the second sentence that the statement in question violates,
since str is read for two purposes. On the left it is read to compute
the incremented value that ++ stores back, which is allowed; on the
right it is read again so that it can be dereferenced to provide the
argument to tolower, which is not.)

The standard also provides two examples:

"This paragraph renders undefined statement expressions such as

i = ++i + 1;
a[i++] = i;

while allowing

i = i + 1;
a[i] = i;"
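
The di and bi statements from the original post fall under the same paragraph: in di = ++di % dn; the object di is modified twice (once by ++ and once by =) with no sequence point in between. A sketch of a well-defined replacement, using a hypothetical helper named next_index that is not from the thread:

```c
#include <stddef.h>

/* Hypothetical helper: advance index i by one, wrapping at n (n > 0). */
static size_t next_index(size_t i, size_t n)
{
    /* i is only read here; the single modification happens in the
       caller's assignment, so the behavior is well defined. */
    return (i + 1) % n;
}
```

A call such as di = next_index(di, dn); or simply di = (di + 1) % dn; modifies di once and reads its prior value only to compute the new one.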


 
Eric Sosman

Michael said:
*str++ = tolower(*str);

that gcc is complaining about:

warning: operation on `str' may be undefined
[...]

Ok, so I guess I just have to use another variable like:

int ch = *str;
*str++ = toupper(ch);

More simply,

*str = toupper(*str);
++str;

... or more likely (if `str' is a `char*')

*str = toupper((unsigned char)*str);
++str;

Michael said:
In practice if I don't do this is it really possible for such a simple
case to do the wrong thing here?

No. It cannot do the "wrong" thing because there is no
"right" or "wrong" when the behavior is undefined. The original
statement *has no meaning* in C; it is nonsense. Grammatically
correct nonsense, but nonsense nonetheless ("Colorless green
ideas sleep furiously"). Whatever the implementation does with
such stuff is both forgivable and forgiven; you have not told it
what you want, so you have no complaint if it does the unwanted.

Is it possible that an implementation might do something
different with the original statement than it does with the
various alternatives? Yes. I don't recall having trouble with
this exact case (perhaps because it's so obviously wrong, and
experienced programmers avoid writing it), but I certainly
have encountered problems with code that violated the same rule.
In one case, the meaningless code behaved as intended when compiled
without optimization but failed mysteriously when optimization was
turned up for the "production" build. After considerable detective
work (the failure was subtle, and this was a two-million-line
program), the troublesome function was identified. The engineer
recompiled it with debugging flags on and optimization off, and
tried to figure out what was wrong -- except, of course, nothing
was wrong any more ... My engineer spent almost a full day hunting
down the bad code; she might instead have devoted that day to
something more productive than cleaning up after a nincompoop.
 
Ben Pfaff

Ok, so I guess I just have to use another variable like:

int ch = *str;
*str++ = toupper(ch);

In practice if I don't do this is it really possible for such a simple
case to do the wrong thing here?

Yes. Different compilers will do different things.
 
Chris Croughton

That's true. But the act of actual assignment consists of one and only
one action: storing the [suitably converted] result of the 'tolower'
call in the destination object of type 'char'. But the process of
_locating_ the destination object is not a part of the act of
assignment. The compiler is free to determine the destination object
_before_ the right-hand side is evaluated, and then perform the actual
assignment after the right hand side is evaluated.

Ok, so I guess I just have to use another variable like:

int ch = *str;
*str++ = toupper(ch);

There's an easy way with no extra variable. Since what you want is for
str to be incremented after the assignment, what's wrong with:

*str = toupper(*str);
++str;

Or if you need it as a single expression you could write it:

*str = toupper(*str), ++str

Either way, it's likely to be just as efficient as your original but
with a well defined effect.
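
As a sketch of how the two-statement form looks inside a complete loop, here is a hypothetical helper named upper_in_place (the name is illustrative). The unsigned char cast is an addition that follows the advice given elsewhere in the thread:

```c
#include <ctype.h>

/* Hypothetical helper: uppercase a NUL-terminated string in place. */
static void upper_in_place(char *str)
{
    while (*str != '\0') {
        /* The store through str completes at the end of this statement... */
        *str = (char)toupper((unsigned char)*str);
        /* ...and only then is the pointer advanced. */
        ++str;
    }
}
```

Each statement modifies a single object once, so the result does not depend on the compiler or the optimisation level.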

Michael said:
In practice if I don't do this is it really possible for such a simple
case to do the wrong thing here?

Well, not the 'wrong' thing since whatever the compiler does with it is
just as right or wrong as any other, but it's certainly possible for it
to not do what you want. It may even do different things when just
changing the optimisation level, or depending on what code is around it.

Chris C
 
Old Wolf

[Actually, Michael B Allan wrote that]
There are two sequence points in the statement. One is when the
function tolower() is called. (Obviously, the argument must be
evaluated before the function can be called.) The second is at the
semi-colon.

A minor nit: tolower() could be a macro, so there would be
no sequence point. (The behaviour is still undefined, of course).
 
