Undefined Behavior

Andrey Tarasevich

now again consider a^=b^=a^=b
when the compiler sees this expression it starts from the RHS, as the
assignment operator is right associative
so first it has to calculate the rightmost a^=b, then the value of a has
to change... it then proceeds to calculate b^=a... then again a^=b...

You are making a popular mistake assuming that operator associativity
somehow defines the order of computation. This is absolutely incorrect.
Operator associativity only defines how the expression should be
_parsed_, i.e. it says which operand belongs to which operator, but it
introduces no temporal ordering on the actual computations whatsoever.
This expression has no sequence points in the middle, which means that
there's absolutely no way to predict in which order it will be evaluated
and when the side effects are going to take place.

In this case the associativity dictates the following association of
operators and arguments

a ^= (b ^= (a ^= b))

which is equivalent to

a = a ^ (b = b ^ (a = a ^ b))

This means that the compiler has to evaluate the following intermediate
values

v1 = a ^ b
v2 = b ^ v1
v3 = a ^ v2

and realize the following side effects

a = v1
b = v2
a = v3

Now the important part is that there are absolutely no ordering
requirements on the side effects, meaning that the compiler is free to
realize these side effects in any order and at any moment after they are
introduced. The compiler is free to do it in this order, for example

1. v1 = a ^ b
2. v2 = b ^ v1
3. v3 = a ^ v2
4. a = v3
5. b = v2
6. a = v1

Needless to add, this will not "swap" your variables.
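
For completeness, a minimal sketch of the same XOR trick written so that
each modification of a sits in its own full expression, which makes the
behavior well defined (a plain temporary variable is clearer and usually
at least as fast):

    /* well-defined XOR swap: one modification per full expression */
    void swap_xor(int *a, int *b)
    {
        if (a != b) {    /* XOR-swapping an object with itself would zero it */
            *a ^= *b;
            *b ^= *a;
            *a ^= *b;
        }
    }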
 
Andrey Tarasevich

Andrey said:
[...]

Needless to add, this will not "swap" your variables.

The point of the above, of course, is to illustrate what might happen if
the compiler actually tried to interpret the expression in question in
accordance with the C computation model (normally, that's what real-life
compilers will try to do). From the formal point of view though, the
compiler is not required to do anything that deterministic, since this
expression produces undefined behavior.
 
c.lang.myself

Not perfectly.
Given that a and b are of int-type,
(a^=b) is capable of generating a trap representation.

Can you please elaborate on this trap representation?
 
Antoninus Twink

Can you please elaborate on this trap representation?

It's something you don't need to know about or worry about - it's just
something many of the clc regulars have a morbid obsession with.

If you're writing real-world C programs on a modern desktop, you will
never encounter trap representations, so they may as well not exist.

If you're programming mainframes from the 1970s, or if you have an
academic interest in the most obscure technicalities of the ISO
Standard, then you should live and breathe trap representations, and
expect to find them everywhere you look. Then you can join the clc
regulars' club.
 
James Kuyper

Can you please elaborate on this trap representation?

A trap representation for a given data type is a bit pattern that
doesn't actually represent a valid value in that type. Most data types
are allowed to have trap representations. Whether or not they actually
do is up to your implementation. The main exceptions are the character
types. Also, a struct or a union object is never a trap representation,
even though it may have a member which is a trap representation.

Uninitialized objects might contain a trap representation. You can
create a trap representation in an object by accessing the object as if
it were an array of bytes and changing the value of some of those bytes.
A pointer object whose value used to point into a memory block allocated
by calling a member of the malloc() family can become a trap
representation as a result of free()ing that memory block. A pointer
that points at or into a non-static object that is local to a function
can become a trap representation when that function returns. A FILE *
value can become a trap representation as a result of fclose()ing the
file. There are also several other more obscure ways in which an object
may acquire a trap representation.
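
A minimal sketch of the free() case, assuming nothing beyond what the
standard guarantees; after the call to free(), even reading the value of
p (never mind dereferencing it) is undefined behavior on an
implementation where the stale pointer value is a trap representation:

    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);
        if (p == NULL)
            return 1;
        *p = 42;
        free(p);
        /* The value of p is now indeterminate. Even a harmless-looking
           test such as "if (p != NULL)" is undefined behavior if the
           stale value happens to be a trap representation. */
        p = NULL;    /* storing a fresh, valid value is always allowed */
        return 0;
    }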

Any attempt to read the value of an object containing a trap
representation has undefined behavior. Any attempt to store a trap
representation in an object has undefined behavior.

The standard explicitly states (6.2.6.2p2) that any given signed integer
type has one bit pattern that might or might not be a trap
representation - it's up to the implementation to decide (and to
document their decision). For types that use a one's complement or
sign-magnitude representation, this is the bit pattern that would
otherwise represent negative 0. If the type uses a twos-complement
representation, this is the bit pattern that would otherwise represent
-2^N, where N is the number of value bits in the type.

Some people read that clause as allowing only that one trap
representation, and requiring that all other bit patterns must be valid.
I don't read it that way. It seems to me that what it says still allows
for the possibility of other trap representations as well. An
implementation that used 1 padding bit, 1 sign bit, and 30 value bits
for 'int' could set INT_MAX to 1000000000, and INT_MIN to -1000000000,
and declare that all bit patterns that would seem to represent values
outside that range are actually trap representations. It's been argued
that this violates the requirement that for any signed type "Each bit
that is a value bit shall have the same value as the same bit in the
object representation of the corresponding unsigned type." But every value
bit does have that value, in every non-trap representation that has that
bit set.
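
As a concrete illustration of the character-type exception, the one way
that is always safe is to inspect an object's representation through
unsigned char, which never has trap representations. A minimal sketch:

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        int x = -1;
        const unsigned char *p = (const unsigned char *)&x;
        size_t i;

        /* Reading the object representation byte by byte is always
           defined, whatever bit pattern x happens to hold. */
        for (i = 0; i < sizeof x; i++)
            printf("%02X ", (unsigned)p[i]);
        putchar('\n');
        return 0;
    }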
 
Tim Rentsch

James Kuyper said:
The standard quite explicitly says that the consequences of undefined
behavior can include failing to compile, which clearly indicates that it
can precede execution of the relevant code. The standard does not
explain this in any detail, but I believe that the relevant rule is that
the undefined behavior is allowed at any point after execution of the
relevant code becomes inevitable.

I had to read this a couple of times before I understood the main
point. I mostly agree with the main point (explained further below),
but the inference stated in the first sentence seems wrong to me.
It's true that undefined behavior during, e.g., macro processing might
cause compilation to fail, but execution-time undefined behavior can't
start until program execution is started. Whatever execution-time
undefined behavior there might be isn't allowed to creep across the
boundary into program translation.

Example:

    if (some condition)
        a ^= b ^= a ^= b;

For code like this, the behavior of the code becomes undefined as soon
as it becomes inevitable that the if() clause will be executed. This
means that at points in the code prior to the if() statement, the compiler
is allowed to generate code using optimizations that only work if the
if-condition is not true. As a result, those optimizations may cause
your code to misbehave long before evaluation of the offending statement.

That's true, but only up to a point, under the as-if rule. In
particular, the undefined behavior must not be allowed to go back
before any access of a volatile variable, or any operation on any
potentially interactive device. In practical terms, the undefined
behavior usually cannot be moved back before a call to an external
function, since any unknown function might call longjmp(), preventing
the code in question from being reached.

I don't think any of these refinements are at odds with the intended
point, but they seem important enough to be worth mentioning.
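
To make the distinction concrete, here is a sketch (assuming a hosted
implementation where stdout is a potentially interactive device): under
the reading above, the first line of output must still appear, because
the undefined behavior of the statement that follows it is not allowed
to reach back across the I/O operation:

    #include <stdio.h>

    int main(void)
    {
        int a = 1, b = 2;

        printf("this line must still be written\n");
        fflush(stdout);      /* operation on a potentially interactive device */

        a ^= b ^= a ^= b;    /* undefined: a is modified twice between
                                sequence points */

        printf("a=%d b=%d\n", a, b);    /* no guarantees from here on */
        return 0;
    }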
 
Tim Rentsch

James Kuyper said:
[...]

The standard explicitly states (6.2.6.2p2) that any given signed integer
type has one bit pattern that might or might not be a trap
representation - it's up to the implementation to decide (and to
document their decision). For types that use a one's complement or
sign-magnitude representation, this is the bit pattern that would
otherwise represent negative 0. If the type uses a twos-complement
representation, this is the bit pattern that would otherwise represent
-2^N, where N is the number of value bits in the type.

Some people read that clause as allowing only that one trap
representation, and requiring that all other bit patterns must be valid.
I don't read it that way. It seems to me that what it says still allows
for the possibility of other trap representations as well. An
implementation that used 1 padding bit, 1 sign bit, and 30 value bits
for 'int' could set INT_MAX to 1000000000, and INT_MIN to -1000000000,
and declare that all bit patterns that would seem to represent values
outside that range are actually trap representations. It's been argued
that this violates the requirement that for any signed type "Each bit
that is a value bit shall have the same value as the same bit in the
object representation of the corresponding unsigned type." But every value
bit does have that value, in every non-trap representation that has that
bit set.

I must admit to having trouble with this one. What's the basis for
the position you state? In the example given the presence of a
padding bit seems completely irrelevant (except perhaps to make the
number of bits a multiple of 8 while preserving round limits?) -- is
there any difference between this example and one using 31 value bits
to represent values in [-2000000000 .. 2000000000]? It seems like
all you are saying is that you think some combinations of value
bits are allowed to be trap representations whereas other people
think they aren't (not counting the distinguished ones explicitly
identified in the Standard, of course). What's the argument to
support this position?
 
jameskuyper

Tim said:
James Kuyper said:
[...]

I must admit to having trouble with this one. What's the basis for
the position you state? In the example given the presence of a
padding bit seems completely irrelevant (except perhaps to make the
number of bits a multiple of 8 while preserving round limits?)

Exactly - it serves to make an extremely implausible but legal
implementation very marginally more plausible.
-- is
there any difference between this example and one using 31 value bits
to represent values in [-2000000000 .. 2000000000]?

Yes - I can imagine reasons for wanting INT_MAX to be a power of 10; I
find it much harder to come up with reasons for wanting INT_MAX to be
2 times a power of 10. Again, it's just a matter of making an
intrinsically implausible implementation a little more plausible.
... It seems like
all you are saying is that you think some combinations of value
bits are allowed to be trap representations whereas other people
think they aren't (not counting the distinguished ones explicitly
identified in the Standard, of course). What's the argument to
support this position?

Which position - mine or theirs? My position is based upon the fact
that the standard explicitly allows for trap representations, and says
nothing to limit how many any given type may have. The opposing
position is based upon the claim that 6.2.6.2p2 defines the only trap
representation involving value bits that a signed integer type is
allowed to have. As I read it, 6.2.6.2p2 serves primarily to explain
the fact that the bit pattern that would otherwise represent negative
zero in 1's complement or sign-magnitude representations is allowed to
be a trap representation. This clears up any ambiguity that might
arise due to the fact that 0 has two distinct representations for such
types. It doesn't imply in any way that negative zero is the only
allowed trap representation. The fact that it also defines a bit
pattern for 2's complement representations that is allowed to be a
trap representation is a weak point in my argument. If my argument is
correct, that clause is redundant; but it doesn't directly contradict
my conclusion.
 
Tim Rentsch

Get a good textbook on Temporal Mechanics. Miles O'Brien of Deep
Space Nine has one. The subject isn't as simple as you might think.

I already have one. It was required reading in the Temporal Mechanics
course that I took ten years from now.
 
Tim Rentsch

jameskuyper said:
Tim said:
James Kuyper said:
[...]

... It seems like all you are saying is that you think some combinations
of value bits are allowed to be trap representations whereas other people
think they aren't (not counting the distinguished ones explicitly
identified in the Standard, of course). What's the argument to support
this position?

Which position - mine or theirs? My position is based upon the fact
that the standard explicitly allows for trap representations, and says
nothing to limit how many any given type may have. The opposing
position is based upon the claim that 6.2.6.2p2 defines the only trap
representation involving value bits that a signed integer type is
allowed to have. As I read it, 6.2.6.2p2 serves primarily to explain
the fact that the bit pattern that would otherwise represent negative
zero in 1's complement or sign-magnitude representations is allowed to
be a trap representation. This clears up any ambiguity that might
arise due to the fact that 0 has two distinct representations for such
types. It doesn't imply in any way that negative zero is the only
allowed trap representation. The fact that it also defines a bit
pattern for 2's complement representations that is allowed to be a
trap representation is a weak point in my argument. If my argument is
correct, that clause is redundant; but it doesn't directly contradict
my conclusion.

So if I may paraphrase/summarize your argument, you're saying that
trap representations involving some combination of value bits
aren't explicitly forbidden, and therefore they are allowed.
Is that right?

If that's what you're saying, I agree that trap representations (TR) aren't explicitly
forbidden, in the sense that there is no statement in the standard
that says directly that other combinations of value bits may not be
TR. However, I think this requirement is given implicitly by how
integer values are represented, specifically in the definition of
value bits. For unsigned types, from 6.2.6.2p1:

If there are N value bits, each bit shall represent a
different power of 2 between 1 and 2**(N-1)

And for signed types, from 6.2.6.2p2:

Each bit that is a value bit shall have the same value as the same
bit in the object representation of the corresponding unsigned
type (if there are M value bits in the signed type and N in the
unsigned type, then M<=N).

The statements seem specific enough so that TR involving combinations
of value bits aren't allowed, except by explicit exception, which
explains why the three distinguished representations are identified
specifically as possibly being TR. I agree that those representations
being associated with negative zero (in two cases) muddies the point;
however, I think that's incidental, since some implementations support
negative zero, and there is some other text discussing negative zeros
that specifically are not TR. Of course I agree with your point that
the TR for 2's complement weakens the case that other TR are allowed.

If we look for corroborating evidence, think about this. If what
you're saying is right, then bitfields should also be allowed TR
(besides the distinguished three) as part of their value bit range.
In fact, since there aren't any limits set on what values signed
bitfields must represent (with 0 as a possible exception, for example
in (signed _Bool)), signed bitfields couldn't be used portably at
all (not counting the trivial case where they only ever have the
value 0). There is support for the proposition that bitfields
may not have TR in 6.7.7p6:

EXAMPLE 3 The following obscure constructions

    typedef signed int t;
    typedef int plain;
    struct tag {
        unsigned t:4;
        const t:5;
        plain r:5;
    };

declare a typedef name t with type signed int, a typedef name
plain with type int, and a structure with three bit-field
members, one named t that contains values in the range [0, 15],
an unnamed const-qualified bit-field which (if it could be
accessed) would contain values in either the range [-15, +15] or
[-16, +15], and one named r that contains values in one of the
ranges [0, 31], [-15, +15], or [-16, +15]. (The choice of range
is implementation-defined.)

The numeric ranges given indicate pretty clearly that bitfields may
not have TR in their usual value range. Unless there is other text
that differentiates bitfields and non-bitfield integer types with
respect to TR (and I haven't been able to find any), this evidence
indicates pretty strongly that TR are not allowed in the usual
value range for integer types.
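
The implementation-defined choice mentioned in that example (whether a
plain 'int' bit-field is signed or unsigned, which is why r's range is
[0, 31], [-15, +15], or [-16, +15]) can be probed with a small sketch
like this:

    #include <stdio.h>

    struct probe {
        int r:5;    /* plain int bit-field: signedness is implementation-defined */
    };

    int main(void)
    {
        struct probe p;
        p.r = -1;    /* stays -1 if r is signed, becomes 31 if unsigned */
        if (p.r < 0)
            puts("plain int bit-fields are signed here");
        else
            puts("plain int bit-fields are unsigned here");
        return 0;
    }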
 
