pointer arithmetic question.

somenath

Hi All,

The following program is crashing.

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    char s[] = "test";
    char *t = s;
    while (*t)
    {
        *t = toupper(*t++);
        //t++;
    }
    printf("\n%s\n", s);
    printf("\n");
    return 0;
}

But as far as I can tell it should not have undefined behaviour. Let me
describe my understanding line by line.

char s[] ="test";
This line defines a modifiable character array and initializes it to
"test".

char *t=s;

Here s is converted to a pointer to the first element of the character
array and has type char *. That value is assigned to t, which is legal.

while(*t)
If the value of *t is '\0' , the expression while (*t) will be false
and the loop will end.

*t = toupper(*t++);
The = operator has right-to-left associativity, so it is evaluated
from right to left: toupper(*t++) will be evaluated first, then the
value will be assigned to *t. ++ has higher precedence than the *
operator, so t++ will be evaluated, then the value of *t++ will be
passed to the function toupper, and as a side effect of t++, t will
point to the next character in the s array.
The return value of toupper will be placed in *t.
This will continue until *t becomes '\0'.

If I do not increment t in this expression, i.e. *t = toupper(*t);, and
instead do it on the next line, it works as expected.
Where is my understanding going wrong?

I think the crash may be due to the following reason.
====================
In the expression *t = toupper(*t++); the *t on the left hand side of =
may use the updated value of t, that is, the result of t++.
So if t then points to '\0', dereferencing it could cause undefined
behaviour.
But how can that happen? The post-increment operator's side effect
should take effect after the sequence point, in this case after
execution of *t = toupper(*t++);
I am not sure.
I find the C standard very cryptic when looking for answers. Is there
any book which describes the standard with examples and can be
understood by non-experts? Is there any easy way to know the different
causes of undefined behaviour in an expression?


Regards,
Somenath
 
Philip Lantz

somenath said:
Hi All,

The following program is crashing.

#include<stdio.h>
#include<ctype.h>
int main(void)
{
char s[] ="test";
char *t=s;
while(*t)
{
*t = toupper(*t++);
//t++;
}
printf("\n%s\n",s);
printf("\n");
return 0;
}

But according to me it may not have undefined behaviour. So let me
describe my understanding line by line.

char s[] ="test";
This line define a modifiable character array and initialize it to
"test";

char *t=s;

s is a pointer the first element of the character array "test" and has
type as char *. That value is assigned to t. Which is legal.

while(*t)
If the value of *t is '\0' , the expression while (*t) will be false
and the loop will end.

*t = toupper(*t++);
The = operator has right-to-left associativity, so it is evaluated
from right to left: toupper(*t++) will be evaluated first, then the
value will be assigned to *t.

Associativity doesn't constrain evaluation order. The order of
evaluation of the operands of = can occur in either order, just like the
order of evaluation of the operands of most other binary operators. (Of
course, the LHS of = is evaluated as an lvalue not as an rvalue, meaning
it is evaluated to determine the destination of the assignment, rather
than to determine a value.)

++ has higher precedence than the * operator, so t++ will be evaluated,
then the value of *t++ will be passed to the function toupper, and as a
side effect of t++, t will point to the next character in the s array.
The return value of toupper will be placed in *t.

There is a sequence point after evaluation of the arguments to toupper
and before the call. That means that the increment of t has to be
completed before the call to toupper.
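
A minimal sketch of this point, using well-defined code rather than the original expression. The names `counter`, `seen_inside`, `observe`, and `demo` are hypothetical and only stand in for t and toupper. Because of the sequence point before the call, the callee should already see the incremented value, while the argument it receives is the old one.

```c
int counter;        /* hypothetical global, plays the role of t */
int seen_inside;    /* value of counter as observed inside the callee */

/* Records what counter looks like once the call has started. */
int observe(int arg)
{
    seen_inside = counter;
    return arg;
}

/* Passes counter++ as the argument. The sequence point between
   argument evaluation and the call means the increment is complete
   by the time observe() runs. */
void demo(int start, int *passed, int *seen)
{
    counter = start;
    *passed = observe(counter++);
    *seen = seen_inside;
}
```

Calling `demo(5, &passed, &seen)` leaves the old value in `passed` and the incremented value in `seen`.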

However, since the evaluation of the LHS of the = may be performed
either before or after evaluation of the RHS, the value of t used to
store the result of the assignment may be either the old or the new
value of t. In your case, based on the behavior you have observed, it
appears to be using the new value of t. (Of course, since the behavior
is undefined, it could be doing anything.)

I think the crash may be due to the following reason.
====================
In the expression *t = toupper(*t++); *t (the left hand side of = )
may have the updated value that is the result of t++.

Yes, this is the cause of the crash.

So if t then points to '\0', dereferencing it could cause undefined
behaviour.

No, dereferencing a pointer that points to '\0' is fine; it is done all
the time. But, the problem is that you overwrite that '\0', so the loop
doesn't terminate.

But how can that happen? The post-increment operator's side effect
should take effect after the sequence point, in this case after
execution of *t = toupper(*t++);

Side effects are guaranteed to take place *after* the *preceding*
sequence point and *before* the *following* sequence point. You cannot
expect a side effect to be delayed until the following sequence point.
Also, in this case, the sequence point following t++ is the one before
the call to toupper, not the one at the end of the full expression.

I find the C standard very cryptic when looking for answers. Is there
any easy way to know the different causes of undefined behaviour in an
expression?

The way for non-experts (and experts, too) to avoid undefined behaviour
in an expression is to not apply side effects to any components of an
expression that also appear elsewhere in the expression--except that it
is okay to use the LHS of an assignment expression in the RHS. This rule
may be overly restrictive, but it is easily understood and makes code
easy to read and easy to write.
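
Applied to the loop from the question, the rule could look like the sketch below. The function name `upcase_in_place` is hypothetical, and the cast to unsigned char is an addition that the original program does not have (toupper expects a value representable as unsigned char, or EOF).

```c
#include <ctype.h>

/* Upper-cases s in place. The read and the store both use the same,
   unmodified t; the increment happens separately in the for header. */
void upcase_in_place(char *s)
{
    for (char *t = s; *t != '\0'; t++) {
        /* The cast avoids passing a negative char value to toupper. */
        *t = (char)toupper((unsigned char)*t);
    }
}
```

Here *t appears on both sides of the assignment, which the rule allows, and nothing in the expression modifies t.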
 
Kaz Kylheku

*t = toupper(*t++);

Since t is modified in this expression, its value may not simultaneously be
used for any purpose other than determining the new value being stored into t.

Here, t is also used to compute the storage location where to put the
result of toupper.

This use is not divided from the t++ modification by a sequence point.

*t = toupper(*t++);
The = operator has associativity
*GONG*

Associativity is how you parse the symbols to make a parse
tree. Evaluation is tree walking. A tree can be walked in many orders.

In C, the tree of an expression, so to speak, may be evaluated in any possible
order, subject to only sequence points, which are basically assertions that one
subtree must be done before another.

Once you use "associativity" or "precedence" in describing what you
think the evaluation order should be, you've basically lost.

from right to left so toupper(*t++); will be evaluated first then the

No, we have this tree


        =
       / \
      *   call()
     /    /    \
    t  toupper  args
                /
               *
              /
          ++(post)
            /
           t

The only sequence point in this expression tree is before the function call.
That is, once toupper is evaluated to a function pointer, and once the args are
evaluated, and the function is ready to be called, a sequence point takes
place, and then the function is invoked.

Furthermore, to complicate things, the completion of side effects can take
place out of order with respect to the tree. Side effects can be gathered into
a "queue" and then "flushed" at the next sequence point.

However, this is played out in the right subtree of the main = node.

It has no bearing on when the left side of the = is evaluated.

(Of course, when the function is being called, the evaluation of this tree is
suspended!)

For instance, here is a possible order:

        = 9
       / \
    *8    call() 7
    /     /    \
   t   toupper  args
   1      4     /
               * 5
              /
         ++(post) 3,6
            /
           t
           2

I put 3,6 next to the ++ because it has two events: computation of its value in
the expression tree (the value yielded by ++, which is the prior value of t),
and the event of updating t to the new value: the completion of the side
effect.

In my order above, I gave it #6: the increment happens just before 7, the
call to the function (which is preceded by a sequence point). This update
cannot be delayed past the function call.

You also have to keep in mind that parallel orders are possible, and there
is optimization. Both 1 and 2 just access t so they can be merged.

Note how in my order, the dereference on the left happens at point 8,
after the function call. But the value of t accessed for that purpose happens
at point 1. So it's the old value of t.

Here is a different possible order:

        = 9
       / \
    *8    call() 7
    /     /    \
   t   toupper  args
   5      1     /
               * 6
              /
         ++(post) 3,4
            /
           t
           2

Note this sneaky evaluation order. The t++ is evaluated early (steps 2, 3, 4),
and to completion (4 means the side effect completes and t is updated).

The very next step is 5, which is the evaluation of t in the left hand side.
It now fetches the new value!

Then evaluation goes back to the right side and completes the function call.

The result of the call is assigned to the new location pointed at by the new t,
not the old location.
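
The two orders can be imitated with well-defined code by spelling out each step as its own statement. This is only a sketch: `step_store_old` and `step_store_new` are hypothetical names, and a real compiler is not obliged to pick either order, since the original expression is undefined.

```c
#include <ctype.h>

/* First order: the left-hand t is read before the increment, so the
   result is stored through the OLD value of t. */
void step_store_old(char **t)
{
    char *dest = *t;                              /* left side first */
    char up = (char)toupper((unsigned char)**t);  /* argument and call */
    (*t)++;                                       /* side effect of t++ */
    *dest = up;
}

/* Second order: the increment completes before the left-hand t is
   read, so the result is stored through the NEW value of t.
   The caller must make sure *t + 1 is still inside the array. */
void step_store_new(char **t)
{
    char up = (char)toupper((unsigned char)**t);
    (*t)++;
    **t = up;   /* lands one element later; can overwrite the '\0' */
}
```

Running `step_store_new` four times on a copy of "test" writes 'T' into the position that held the terminating '\0', which is consistent with the loop in the question failing to stop.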

In all orders, the = will be 9, because = is the root, and so it is visited
last. The evaluation of the tree cannot just be any order whatsoever. It
has to be a bottom-up traversal! But many bottom-up traversals are possible.

Precedence and associativity are related to evaluation order like this.
They establish what is bottom and what is up. For instance,
a + b * c gives us this tree:

    +
   / \
  a   *
     / \
    b   c

In all possible traversals, the * node is visited before the + node.

However, the a, b and c nodes can be visited in any of the six possible orders.

They are all bottom nodes (leaf nodes) and we can pick any bottom node to be
evaluated first.

The constraints are: * cannot be evaluated before b and c. (You can't multiply
until you have the values of the multiplicands!) And + cannot happen until the
* is done, and the value of a is known (you can't add until you have the
two terms.)

Yet, six serial evaluation orders are possible, plus parallel evaluation.
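
One way to watch this is to replace the leaves with function calls that record when they run. The sketch below uses hypothetical names (`leaf`, `visit_log`, `sum_with_trace`). Function calls are not interleaved with each other, so this is merely unspecified, not undefined: the result is always the same, but the recorded order may differ between compilers or optimization levels.

```c
#include <stddef.h>

char visit_log[4];    /* hypothetical trace buffer: 3 names + '\0' */
size_t visit_count;

/* Records that a leaf was visited, then yields its value. */
int leaf(char name, int value)
{
    visit_log[visit_count++] = name;
    visit_log[visit_count] = '\0';
    return value;
}

/* Evaluates a + b * c with traced leaves a=1, b=2, c=3. */
int sum_with_trace(void)
{
    visit_count = 0;
    visit_log[0] = '\0';
    return leaf('a', 1) + leaf('b', 2) * leaf('c', 3);
}
```

After a call, `visit_log` holds some permutation of "abc"; the language does not say which one.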
 
Paul N

Associativity doesn't constrain evaluation order.

Here's an example to illustrate the point:

x = a && (function1(b) + function2(b));

As far as associativity goes, the brackets mean that it is the result
of the addition that is acted on by the &&. In mathematical terms you
might say that the addition is "done first". But as far as execution
goes, it is very different. One of the rules of && is that, if the
first operand is zero (meaning that the result must be zero), the
second operand is not even evaluated. So here, the computer will first
check whether a is zero, and if it is then the two functions will not
be called. So it is doing the && (or part of it) before it does the
addition.
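
A small sketch that makes the short-circuit visible by counting calls. The bodies of `function1` and `function2` are made up for illustration, and `calls` and `guarded_sum` are hypothetical names.

```c
int calls;   /* how many times function1/function2 have run */

int function1(int b) { calls++; return b + 1; }
int function2(int b) { calls++; return b * 2; }

/* Same shape as the expression above: the addition is the right
   operand of &&, yet it only runs when a is nonzero. */
int guarded_sum(int a, int b)
{
    return a && (function1(b) + function2(b));
}
```

With a equal to zero, `calls` stays unchanged; with a nonzero, both functions run and the result is 1 whenever their sum is nonzero.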
 
Tim Rentsch

pete said:
pete said:
Hi All,

The following program is crashing.
char s[] ="test";
char *t=s;
*t = toupper(*t++);
//t++;
But how that can happen also ? As post increment operator's side
effect will be taking effect after the sequence point in this case
after execution of the code *t = toupper(*t++);
Also,
for the above definition of (t),
the opcode for this expression: (t++)
may or may not be
the same as the opcode for this expression: (++t, t-1).
Whether the side effect takes place before
or after the value of (t++)is determined,
is up to the implementation.

That's the old way.
In the new standard,
there is a sequence point
in the value of a postfix increment expression.

n1570
6.5.2.4 Postfix increment and decrement operators
2 The value computation of the result is sequenced
before the side effect of updating the stored value of the operand.

Not a sequence point but a sequenced-before relationship. A
sequenced-before relationship is less restrictive than a sequence
point. (The term 'sequence point' is a shorthand meaning all the
value computations and side-effects of expressions before the
sequence point are sequenced before all the value computations
and side-effects of expressions after the sequence point. The
exact definition is given in 5.1.2.3 p3.)

Note that the relationship described here is only between
producing the result and updating the stored value. In the
assignment

*t = toupper( *t++ );

there is still no sequencing relationship between the
subexpressions '*t' and '*t++'. Because neither of
these subexpressions is required to be sequenced before
the other, this runs afoul of the condition described
in 6.5 p2, and hence is undefined behavior.
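
For contrast, here is a sketch of a form where postfix increment is unproblematic, assuming the source and destination are distinct, non-overlapping buffers. The name `upcase_copy` is hypothetical. Each pointer is modified once and is not read anywhere else in the expression, so the missing sequencing between the two sides does not matter.

```c
#include <ctype.h>

/* Copies src to dst, upper-casing each character. dst must have room
   for strlen(src) + 1 bytes and must not overlap src. */
void upcase_copy(char *dst, const char *src)
{
    while (*src != '\0') {
        /* dst and src are different objects, each incremented once. */
        *dst++ = (char)toupper((unsigned char)*src++);
    }
    *dst = '\0';
}
```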
 
Kaz Kylheku

pete said:
pete said:
somenath wrote:

Hi All,

The following program is crashing.

char s[] ="test";
char *t=s;

*t = toupper(*t++);
//t++;
But how that can happen also ? As post increment operator's side
effect will be taking effect after the sequence point in this case
after execution of the code *t = toupper(*t++);
Also,
for the above definition of (t),
the opcode for this expression: (t++)
may or may not be
the same as the opcode for this expression: (++t, t-1).
Whether the side effect takes place before
or after the value of (t++)is determined,
is up to the implementation.

That's the old way.
In the new standard,
there is a sequence point
in the value of a postfix increment expression.

n1570
6.5.2.4 Postfix increment and decrement operators
2 The value computation of the result is sequenced
before the side effect of updating the stored value of the operand.

Not a sequence point but a sequenced-before relationship. A
sequenced-before relationship is less restrictive than a sequence
point.

No it isn't. It is exactly the same thing.

(The term 'sequence point' is a shorthand meaning all the
value computations and side-effects of expressions before the
sequence point are sequenced before all the value computations

The expressions before a sequence point are only those which are
listed as being before that particular sequence point.
and side-effects of expressions after the sequence point. The
exact definition is given in 5.1.2.3 p3.)

This is restricted to the subexpression in which it is happening,
and so it doesn't cover all expressions.

So even though the comma operator has a sequence point,
A is not sequenced before D:

(A, B) + (C, D)

A sequence point amounts to the same thing as "sequenced before".

To say that "A is evaluated, then a sequence point takes place, and then B"
is logically equivalent to "A is sequenced before B".
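
The (A, B) + (C, D) case can be traced with a sketch like the following, where `note`, `trace`, and `comma_sum` are hypothetical names. Each comma operator orders its own pair, but nothing orders one pair relative to the other, so only the two within-pair orderings can be relied on.

```c
#include <stddef.h>

char trace[8];     /* hypothetical log of the order of evaluation */
size_t trace_len;

/* Appends tag to the log and yields value. */
int note(char tag, int value)
{
    trace[trace_len++] = tag;
    trace[trace_len] = '\0';
    return value;
}

/* (A, B) + (C, D): A is sequenced before B, and C before D. */
int comma_sum(void)
{
    trace_len = 0;
    trace[0] = '\0';
    return ((void)note('A', 1), note('B', 2))
         + ((void)note('C', 3), note('D', 4));
}
```

The log may read "ABCD", "CDAB", or an interleaving such as "ACBD"; in every case 'A' precedes 'B' and 'C' precedes 'D'.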
Note that the relationship described here is only between
producing the result and updating the stored value. In the
assignment

*t = toupper( *t++ );

there is still no sequencing relationship between the
subexpressions '*t' and '*t++'.

This would still be true even if the wording was that there is
a sequence point in the ++ operator.
 
Tim Rentsch

Kaz Kylheku said:
pete said:
pete wrote:

somenath wrote:

Hi All,

The following program is crashing.

char s[] ="test";
char *t=s;

*t = toupper(*t++);
//t++;

But how that can happen also ? As post increment operator's side
effect will be taking effect after the sequence point in this case
after execution of the code *t = toupper(*t++);

Also,
for the above definition of (t),
the opcode for this expression: (t++)
may or may not be
the same as the opcode for this expression: (++t, t-1).
Whether the side effect takes place before
or after the value of (t++)is determined,
is up to the implementation.

That's the old way.
In the new standard,
there is a sequence point
in the value of a postfix increment expression.

n1570
6.5.2.4 Postfix increment and decrement operators
2 The value computation of the result is sequenced
before the side effect of updating the stored value of the operand.

Not a sequence point but a sequenced-before relationship. A
sequenced-before relationship is less restrictive than a sequence
point.

No it isn't. It is exactly the same thing.

No, they are different. An example will illustrate.

The semantics for assignment includes a sequenced-before
relationship. This relationship allows expressions like

i = a = i+1;

to have well-defined behavior, rather than being undefined
behavior.

Under the existing semantics, the two side-effects of this
expression (ie, the updating of 'i' and 'a') can occur in
any order.

If the sequenced-before relationship were instead a sequence
point, then the side-effects of the operands would have to be
completed before the store into 'i' can proceed. That is, the
store into 'a' must be done before the store into 'i' starts.
That additional restriction doesn't hold under the current
semantics, which specifies only a sequenced-before relationship.

The difference is evident if we consider an expression like

a[i] = a[j] = 7;

If the semantics for assignment specified a sequence point,
then this expression would have well-defined behavior even
when i == j. As it is, under the current semantics which
specifies only a sequenced-before relationship, when i == j
this expression has undefined behavior, because there are
two modifications to the same object with no sequencing
relationship between them.
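
If i and j might be equal, one way to stay clear of the problem is to split the chained assignment into two statements, as in this sketch (`set_both` is a hypothetical helper). The end of each statement is a sequence point, so the two stores are ordered even when they hit the same element.

```c
#include <stddef.h>

/* Stores v into a[j] and then a[i]. Safe for i == j because each
   store is its own full expression. */
void set_both(int *a, size_t i, size_t j, int v)
{
    a[j] = v;
    a[i] = v;
}
```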

[snip remainder]

P.S. Sorry about coming through aioe.org for this posting;
it is temporary while eternel-september.org is offline or until I
can find another newsgroups hosting site.
 
Kaz Kylheku

Kaz Kylheku said:
pete wrote:

somenath wrote:

Hi All,

The following program is crashing.

char s[] ="test";
char *t=s;

*t = toupper(*t++);
//t++;

But how that can happen also ? As post increment operator's side
effect will be taking effect after the sequence point in this case
after execution of the code *t = toupper(*t++);

Also,
for the above definition of (t),
the opcode for this expression: (t++)
may or may not be
the same as the opcode for this expression: (++t, t-1).
Whether the side effect takes place before
or after the value of (t++)is determined,
is up to the implementation.

That's the old way.
In the new standard,
there is a sequence point
in the value of a postfix increment expression.

n1570
6.5.2.4 Postfix increment and decrement operators
2 The value computation of the result is sequenced
before the side effect of updating the stored value of the operand.

Not a sequence point but a sequenced-before relationship. A
sequenced-before relationship is less restrictive than a sequence
point.

No it isn't. It is exactly the same thing.

No, they are different. An example will illustrate.

The semantics for assignment includes a sequenced-before
relationship. This relationship allows expressions like

i = a = i+1;

to have well-defined behavior, rather than being undefined
behavior.


I'm a fundamentalist believer in the literal interpretation of
the value of an assignment expression being that of the left
operand after the assignment. I.e. to me it means one of two things.

1. The left operand has to be identified during the evaluation
of the assignment expression. The side effect of updating
that operand can happen later, but once the effective
address of the operand is established, it does not change.
That is to say, evaluation of the assignment expression and all of its
constituents is complete before that expression yields a value, except
possibly for delayed side effects. There should not be a
(re-)evaluation of the raw expression a at side effect time.

A violation of this principle means that an expression's
value is used, even though the expression has not been completely
evaluated, or else that an expression which should be evaluated
just once is being evaluated twice.

The semantic description of assignment does not suggest that
any part of the evaluation of the assignment may be delayed
to the next sequence point, only the effect of updating the
operand. Updating an operand is not the same thing as
calculating an operand's effective address and then updating it.
The object known as the operand is not known until the expression which
designates it is calculated. There is no operand until then.

2. Or else "after the assignment" literally means after the complete
assignment (side effect and all). This means that the expression's
value is not available until the side effect completes.
 
Tim Rentsch

Kaz Kylheku said:
Kaz Kylheku said:
pete wrote:

somenath wrote:

Hi All,

The following program is crashing.

char s[] ="test";
char *t=s;

*t = toupper(*t++);
//t++;

But how that can happen also ? As post increment operator's side
effect will be taking effect after the sequence point in this case
after execution of the code *t = toupper(*t++);

Also,
for the above definition of (t),
the opcode for this expression: (t++)
may or may not be
the same as the opcode for this expression: (++t, t-1).
Whether the side effect takes place before
or after the value of (t++)is determined,
is up to the implementation.

That's the old way.
In the new standard,
there is a sequence point
in the value of a postfix increment expression.

n1570
6.5.2.4 Postfix increment and decrement operators
2 The value computation of the result is sequenced
before the side effect of updating the stored value of the operand.

Not a sequence point but a sequenced-before relationship. A
sequenced-before relationship is less restrictive than a sequence
point.

No it isn't. It is exactly the same thing.

No, they are different. An example will illustrate.

The semantics for assignment includes a sequenced-before
relationship. This relationship allows expressions like

i = a = i+1;

to have well-defined behavior, rather than being undefined
behavior.


I'm a fundamentalist believer in the literal interpretation of
the value of an assignment expression being that of the left
operand after the assignment. I.e. to me it means one of two things.

1. The left operand has to be identified during the evaluation
of the assignment expression. The side effect of updating
that operand can happen later, but once the effective
address of the operand is established, it does not change.
That is to say, evaluation of the assignment expression and all of its
constituents is complete before that expression yields a value, except
possibly for delayed side effects. There should not be a
(re-)evaluation of the raw expression a at side effect time.

A violation of this principle means that an expression's
value is used, even though the expression has not been completely
evaluated, or else that an expression which should be evaluated
just once is being evaluated twice.

The semantic description of assignment does not suggest that
any part of the evaluation of the assignment may be delayed
to the next sequence point, only the effect of updating the
operand. Updating an operand is not the same thing as
calculating an operand's effective address and then updating it.
The object known as the operand is not known until the expression which
designates it is calculated. There is no operand until then.


The Standard guarantees this. The side-effect of updating the
stored value of the left operand of an assignment is sequenced
after the value computations of the left and right operands. Also,
more generally, the value computations of the operands of an
operator (not just assignment) are sequenced before the value
computation of the result of the operator. So the update of an
assignment operator isn't started until all of its sub-expressions'
values have been completely computed.
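
A common pattern that relies on this guarantee is an assignment whose right side reads the very object being assigned. The sketch below is an illustration with hypothetical names (`follow`, `next`): the read of i inside next[i] is part of the right operand's value computation, so it happens before i is updated.

```c
/* Follows `steps` links through the index table `next`, starting at
   i. Every index visited must be valid for `next`. */
int follow(const int *next, int i, int steps)
{
    while (steps-- > 0) {
        i = next[i];   /* i is read for the lookup, then overwritten */
    }
    return i;
}
```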

2. Or else "after the assignment" literally means after the
complete assignment (side effect and all). This means that the
expression's value is not available until the side effect
completes.

The Standard does not guarantee this. That's why assignments like

a[i] = a[j] = 7;

have undefined behavior when i == j;
 
Kaz Kylheku

Kaz Kylheku said:
2. Or else "after the assignment" literally means after the
complete assignment (side effect and all). This means that the
expression's value is not available until the side effect
completes.

The Standard does not guarantee this. That's why assignments like

a[i] = a[j] = 7;

have undefined behavior when i == j;


Well, the sequencing (i.e. sequence point: same thing, as I contend) is between
the evaluation of a[j] and 7, and the assignment. It is not "intervening"
between the update of a[j] and a[i].

This doesn't show "sequenced before" is a different concept from "sequence
point".

Exactly like "sequenced before", sequence points can be localized within a
subexpression, so that they do not intervene between unrelated evaluations in
the surrounding full expression. This is well-known via the example (A, B) +
(C, D) where neither sequence point intervenes between A and D. (We don't even
know whether A is evaluated first or D, or whether they are interleaved.)

A sequence point doesn't mean that all effects are settled and no new ones
begin; just those in the scope of the operation to which the particular
sequence point immediately belongs, which can be "comma operator" or "full
expression", etc.
 
Tim Rentsch

Kaz Kylheku said:
Kaz Kylheku said:
2. Or else "after the assignment" literally means after the
complete assignment (side effect and all). This means that the
expression's value is not available until the side effect
completes.

The Standard does not guarantee this. That's why assignments like

a[i] = a[j] = 7;

have undefined behavior when i == j;


Well, the sequencing (i.e. sequence point: same thing, as I contend) is between
the evaluation of a[j] and 7, and the assignment. It is not "intervening"
between the update of a[j] and a[i].

This doesn't show "sequenced before" is a different concept from "sequence
point".

Exactly like "sequenced before", sequence points can be localized within a
subexpression, so that they do not intervene between unrelated evaluations in
the surrounding full expression. This is well-known via the example (A, B) +
(C, D) where neither sequence point intervenes between A and D. (We don't even
know whether A is evaluated first or D, or whether they are interleaved.)

A sequence point doesn't mean that all effects are settled and no new ones
begin; just those in the scope of the operation to which the particular
sequence point immediately belongs, which can be "comma operator" or "full
expression", etc.


The problem with these descriptions is that how you are using the
terminology doesn't correspond to how the Standard defines them.
If there is a sequence point between expression A and expression B,
every value computation and every side-effect of expression A is
sequenced before every value computation and every side-effect of
expression B. This usage corresponds to how C99 uses the term
sequence point (although C11 defines the term more carefully, in
terms of the 'sequenced before' relationship). As far as the
Standard is concerned, there is no such thing as a 'sequence point'
that defines a sequencing relationship for value computations but
not for side-effects; the Standard defines the term so that it
always includes both.
 
