Unions Redux

Old Wolf · Mar 14, 2007

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".

Here's a concrete example:

#include <stdio.h>

int main(void)
{
union { int s; unsigned int us; } u;

u.us = 50;
printf("%d\n", u.s);
return 0;
}

Is this program well-defined (printing 50), implementation-defined, or
UB ?

Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.

In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?

Pierre Asselin · Mar 15, 2007

Old Wolf said:
[ snip ]
union { int s; unsigned int us; } u;
u.us = 50;
printf("%d\n", u.s);

I was looking at N1124. Annex J lists among the unspecified
behaviors,

-- The value of a union member other than the last one stored
into (6.2.6.1).

Annex J is informative, not normative, but it still makes sense to
look at section 6.2.6.1 to see why the behavior is unspecified.
There we see that u.s and u.us have object representations as
sequences of unsigned char (paragraphs 2,4), but that some object
representations may be trap representations leading to undefined
behavior when you access u.s (par. 5). Further down, 6.2.6.2(1-2)
leave open the possibility that int has more trap representations
than unsigned int, for example if there are padding bits or if ints
are sign-magnitude and M<N-1 in 6.2.6.2(2).

So it seems that your code has undefined (not just unspecified)
behavior, by 6.2.6.1(5).

If you want to read your 50 as a signed int, you need to convert
the bits, e.g. by assignment, u.s= u.us; on a really wicked machine,
that assignment need not be a no-op !

My argument doesn't work the other way. 6.2.6.2(5) says:

A valid (non-trap) object representation of a signed integer
type where the sign bit is zero is a valid object representation
of the corresponding unsigned type, and shall represent
the same value.

so I haven't ruled out
u.s = 50;
printf("%u\n", u.us);

I can't see anything else in 6.2.6.1 that could rule it out, either.
Paragraph 6.2.6.1(7) applies, but I'm pretty sure that sizeof(int)==
sizeof(unsigned int) by 6.2.6.2 so there are no leftover bytes
for (7) to ... bite (D'oh).

Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.

I don't think that's relevant. It says that the compiler must
assume, for optimization purposes, that u.s and u.us are potentially
aliased (well, duh). For example,

memcpy(buf, &u.us, sizeof(u.us)); /* unsigned char buf[BIG] */
do_something((const unsigned char *)buf);
u.s= 50;
memcpy(buf, &u.us, sizeof(u.us)); /* can't optimize out */

In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?

As I said, I don't think the aliasing rules matter. Your example
amounts to a C++ reinterpret_cast<int> and can hit a trap
representation.

Jack Klein · Mar 15, 2007

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".

Most of the rambling was caused by the original OP, I think, rather
than the material. I am not criticizing, just observing.

Here's a concrete example:

#include <stdio.h>

int main(void)
{
union { int s; unsigned int us; } u;

u.us = 50;
printf("%d\n", u.s);
return 0;
}

Is this program well-defined (printing 50), implementation-defined, or
UB ?

The program is well-defined, I'll elaborate further down.

Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.

As you pointed out, it does not violate the alias rules, but there are
other rules to consider. In this particular case, you are accessing
an object of type unsigned int with an lvalue expression of type
signed int. The entire question of the validity of the operation
depends on the object representation compatibility, and has nothing at
all to do with the fact that there is a union involved.

In this particular case, the operation is well-defined because of the
standard's guarantees about corresponding signed and unsigned integer
types. For a positive value within the range of both types, the bit
representation is identical for both.

However, if you had assigned INT_MAX + 1 to u.us, and your
implementation is one of the universal ones where UINT_MAX > INT_MAX,
the behavior would be implementation-defined because, at least
theoretically, (unsigned)INT_MAX + 1 could contain a bit pattern that
is a trap representation for signed int.

In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?

Personally, I think people get too wound up in the mystical and magical
properties of unions.

They are good for two things:

1. Space saving, such as a struct containing a data type specifier
and a union of all the possible data types. This is a frequent
feature in message passing systems. Generally, type punning is not
used here.

2. Another way to do type punning.

Consider:

int test_lone(long l)
{
int *ip = (int *)&l;
int i = *ip;
return i==l;
}

Is this code undefined, implementation-defined, or unspecified?

Technically it is undefined, but on an implementation like today's
typical desktop, where int and long have the same representation and
alignment, the result will be that the function returns 1. On an
implementation where int and long are different sizes, who knows.

Now consider:

int test_long(long l)
{
union { long ll; int ii; } li;
li.ll = l;
return li.ll==li.ii;
}

Is this code undefined? Well, yes, but it is no different in
functionality than the first function. If int and long have the same
representation, it will return 1.

There is no difference in aliasing in a union than there is via
pointer casting.

Robert Gamble · Mar 15, 2007

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".

Here's a concrete example:

#include <stdio.h>

int main(void)
{
union { int s; unsigned int us; } u;

u.us = 50;
printf("%d\n", u.s);
return 0;
}

Is this program well-defined (printing 50), implementation-defined, or
UB ?

Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.

In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?

C89: undefined.
It is undefined because the member of the union accessed is not the
member last stored, this is explicitly stated in the Standard.

C99: well-defined.
In C99 the aliasing rules and representation requirements determine
the legitimacy in this case. As you noted, your example does not
violate the aliasing rules so you are good there. The Standard
explicitly states that the set of non-negative values common to any
given signed integer type and its corresponding unsigned type have the
same representations in both types so you are good there as well.

Robert Gamble

Old Wolf · Mar 15, 2007

There is no difference in aliasing in a union than there is via
pointer casting.

Thanks for finally putting the issue to rest! I'm glad it
really is that simple.

Yevgen Muntyan · Mar 15, 2007

Robert said:
Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".

Here's a concrete example:

#include <stdio.h>

int main(void)
{
union { int s; unsigned int us; } u;

u.us = 50;
printf("%d\n", u.s);
return 0;
}

Is this program well-defined (printing 50), implementation-defined, or
UB ?

Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.

In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?

C89: undefined.
It is undefined because the member of the union accessed is not the
member last stored, this is explicitly stated in the Standard.

Where? 6.3.2.3 says it's implementation-defined (ANSI standard if
it matters).

Yevgen

Robert Gamble · Mar 15, 2007

Old Wolf said:

[ snip ]
union { int s; unsigned int us; } u;
u.us = 50;
printf("%d\n", u.s);

I was looking at N1124. Annex J lists among the unspecified
behaviors,

-- The value of a union member other than the last one stored
into (6.2.6.1).

Annex J is informative, not normative,

Indeed it is not and in this case it is simply erroneous. The
sentence you cited is leftover from a previous version of the
Standard, there is no supporting text in the Standard proper.

but it still makes sense to
look at section 6.2.6.1 to see why the behavior is unspecified.

You are making the false assumption that 6.2.6.1 actually contains
verbiage to this effect, it doesn't.

There we see that u.s and u.us have object representations as
sequences of unsigned char (paragraphs 2,4), but that some object
representations may be trap representations leading to undefined
behavior when you access u.s (par. 5). Further down, 6.2.6.2(1-2)
leave open the possibility that int has more trap representations
than unsigned int, for example if there are padding bits or if ints
are sign-magnitude and M<N-1 in 6.2.6.2(2).

So it seems that your code has undefined (not just unspecified)
behavior, by 6.2.6.1(5).

I have no idea how you were able to come to that conclusion from
anything you have cited so far. You appear to have come to a
premature conclusion and then tried, unconvincingly, to make the
evidence fit that conclusion.

If you want to read your 50 as a signed int, you need to convert
the bits, e.g. by assignment, u.s= u.us; on a really wicked machine,
that assignment need not be a no-op !

My argument doesn't work the other way. 6.2.6.2(5) says:

A valid (non-trap) object representation of a signed integer
type where the sign bit is zero is a valid object representation
of the corresponding unsigned type, and shall represent
the same value.

The opposite is also true, 6.2.5p9:

"The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same."

so I haven't ruled out
u.s = 50;
printf("%u\n", u.us);

I can't see anything else in 6.2.6.1 that could rule it out, either.
Paragraph 6.2.6.1(7) applies, but I'm pretty sure that sizeof(int)==
sizeof(unsigned int) by 6.2.6.2 so there are no leftover bytes
for (7) to ... bite (D'oh).

Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.

I don't think that's relevant. It says that the compiler must
assume, for optimization purposes, that u.s and u.us are potentially
aliased (well, duh). For example,

memcpy(buf, &u.us, sizeof(u.us)); /* unsigned char buf[BIG] */
do_something((const unsigned char *)buf);
u.s= 50;
memcpy(buf, &u.us, sizeof(u.us)); /* can't optimize out */

In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?

As I said, I don't think the aliasing rules matter. Your example
amounts to a C++ reinterpret_cast<int> and can hit a trap
representation.

I don't know what C++ has to do with this.

Robert Gamble

Yevgen Muntyan · Mar 15, 2007

Jack said:
Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".

Most of the rambling was caused by the original OP, I think, rather
than the material. I am not criticizing, just observing. [snip]
There is no difference in aliasing in a union than there is via
pointer casting.

Sorry if it's something obvious or stupid, but please consider this
(no pointers involved).
Suppose double is eight bytes big, int is four bytes, there are
no padding bits in int.

/* (1) get bits from a double and see what happens */
double a = 3.45; unsigned int b;
memcpy(&b, &a, sizeof b);
printf("%u", b);

/* (2) do same thing using a union */
union U {double a; unsigned int b;} u;
u.a = 3.14;
printf("%u", u.b);

/* (3) initialize union with memcpy and access its member */
double d;
union U {double a; unsigned int b;} u;
d = 3.14;
memcpy(&u, &d, sizeof d);
printf("%u", u.b);

Which of the three are valid? I think (1) is; (3) maybe; (2) maybe, if
(3) valid and aliasing rules don't work here. If aliasing rules
do apply to (2), then how is first assignment in (2) different
from memcpy() in (3)?

It's in fact the same question as in the other post by OP, about
aliasing. Perhaps all such magic is allowed and that's it. Perhaps
I'm just stupid that I can't understand these simple things.

Yevgen

CBFalconer · Mar 15, 2007

Jack said:
.... snip ...

There is no difference in aliasing in a union than there is via
pointer casting.

Not so. The compiler is able to insert conversion code to
implement a cast. It is not able to do so for an aliased union.
Which is why it is implementation-defined or undefined behavior to access a
union component as other than the form in which it was stored.

To illustrate, consider a perverse 16 bit machine in which pointers
are machine addresses stored hi order byte first, and integers are
stored low order byte first. How is the compiler to know when to
flip the bytes when accessed through a union?

Robert Gamble · Mar 15, 2007

Robert said:

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".
Here's a concrete example:
#include <stdio.h>
int main(void)
{
union { int s; unsigned int us; } u;
u.us = 50;
printf("%d\n", u.s);
return 0;
}
Is this program well-defined (printing 50), implementation-defined, or
UB ?
Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.
In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?

C89: undefined.
It is undefined because the member of the union accessed is not the
member last stored, this is explicitly stated in the Standard.

Where? 6.3.2.3 says it's implementation-defined (ANSI standard if
it matters).

It's a mistake in the Standard, it is supposed to be undefined.

Robert Gamble

Yevgen Muntyan · Mar 15, 2007

Robert said:

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".
Here's a concrete example:
#include <stdio.h>
int main(void)
{
union { int s; unsigned int us; } u;
u.us = 50;
printf("%d\n", u.s);
return 0;
}
Is this program well-defined (printing 50), implementation-defined, or
UB ?
Note that the aliasing rules in C99 6.5 are not violated here -- it is
not forbidden under that section to access an object of some type T
with an lvalue expression whose type is the signed or unsigned
version of T.
In other words, is there anything other than the aliasing rules that
restrict 'free' use of unions?
C89: undefined.
It is undefined because the member of the union accessed is not the
member last stored, this is explicitly stated in the Standard.

Where? 6.3.2.3 says it's implementation-defined (ANSI standard if
it matters).

It's a mistake in the Standard, it is supposed to be undefined.

So an infamous compiler that doesn't strive for C99 compliance (or a
DeathStation version from 1991, for that matter) could make the
following explode?

union U {int a; unsigned char u[25];};
int main (void)
{
unsigned char c;
union U u;
u.a = 8;
c = u.u[0];
return 0;
}

Perhaps "do not touch union member which wasn't previously set" is
not just a product of my paranoid mind?

DR's 257, 283, 236 and thread named "Union arrangement" in comp.lang.c
(the end of it) are interesting reading, by the way.
C faq is more loyal to this:

http://www.c-faq.com/struct/union.html
A union is essentially a structure in which all of the fields overlay
each other; you can only use one field at a time. (You can also cheat by
writing to one field and reading from another, to inspect a type's
bit patterns or interpret them differently, but that's obviously pretty
machine-dependent.)

Yevgen

Jack Klein · Mar 16, 2007

Jack said:

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".

Most of the rambling was caused by the original OP, I think, rather
than the material. I am not criticizing, just observing. [snip]
There is no difference in aliasing in a union than there is via
pointer casting.

Sorry if it's something obvious or stupid, but please consider this
(no pointers involved).

You are mistaken, of course there are pointers involved. Just not
pointer objects.

Suppose double is eight bytes big, int is four bytes, there are
no padding bits in int.

/* (1) get bits from a double and see what happens */
double a = 3.45; unsigned int b;
memcpy(&b, &a, sizeof b);

There are two pointers used in the function call statement. The &
operator generates two addresses, which are passed to memcpy() as
pointers to void.

printf("%u", b);

The behavior is undefined.

/* (2) do same thing using a union */
union U {double a; unsigned int b;} u;
u.a = 3.14;
printf("%u", u.b);

The behavior is undefined.

/* (3) initialize union with memcpy and access its member */
double d;
union U {double a; unsigned int b;} u;
d = 3.14;
memcpy(&u, &d, sizeof d);
printf("%u", u.b);

The behavior is undefined.

Which of the three are valid? I think (1) is; (3) maybe; (2) maybe, if
(3) valid and aliasing rules don't work here. If aliasing rules
do apply to (2), then how is first assignment in (2) different
from memcpy() in (3)?

None of the three are valid. The standard does not give you
permission to access an lvalue of type unsigned int after writing some
or all of the bits of a double to it.

It's in fact the same question as in the other post by OP, about
aliasing. Perhaps all such magic is allowed and that's it. Perhaps
I'm just stupid that I can't understand these simple things.

I am beginning to wonder if you are being deliberately obtuse. The
only "magic" that is allowed unconditionally is that any object, or
any block of memory belonging to a program, may be read as an array of
unsigned characters without invoking undefined behavior.

There is often a need, even in strictly conforming C code, to access
raw memory regardless of the type of objects it contains. A simple
operation like generating an MD5 signature for a file, for example.

Unsigned char is the one and only "raw" data type in C. It is the one
and only type allowed to read the bytes of any object, and it cannot
trap because every possible combination of bits in a byte is a valid
representation of an unsigned char value.

Read the first five paragraphs of 6.2.6.1 about representation of
types, and the terms "object representation", and "trap
representation".

You, and many other C programmers, seem to think that a union has some
magical properties. It does not. Any aliasing of types you do with a
union can also be done via explicit pointer conversion and
dereference, or implicit pointer conversion as memcpy() does.

If the aliasing is defined by the standard in any one of the three
situations, it is well-defined for all of them. If it is undefined,
but works in an implementation-defined manner on your particular
platform, it will work the same way no matter what method you use to
actually alias it.

There is nothing magical about a union when it comes to type punning,
and all three of your examples cause undefined behavior because they
are omitted from the inclusive list of allowed aliasing in 6.5p7.
There is no mention of reading some or all of the object
representation of a double with an lvalue of type unsigned int.

Pierre Asselin · Mar 16, 2007

Robert Gamble said:
On Mar 14, 10:32 pm, (e-mail address removed) (Pierre Asselin)

Indeed it is not and in this case it is simply erroneous. The
sentence you cited is leftover from a previous version of the
Standard, there is no supporting text in the Standard proper.

Has a defect report been filed ?

[ ... ]
So it seems that [Old Wolf's] code has undefined (not just
unspecified) behavior, by 6.2.6.1(5).

I have no idea how you were able to come to that conclusion from
anything you have cited so far. You appear to have come to a
premature conclusion and then tried, unconvincingly, to make the
evidence fit that conclusion.

Not at all. I was trying to prove from 6.2.6 that O.W. can store
and read the bits of a small positive integer interchangeably
through lvalues of type int and unsigned int, but I could only do
it for the signed --> unsigned direction.

The opposite is also true, 6.2.5p9:

"The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same."

Indeed. That takes care of the unsigned --> signed direction,
especially in view of footnote 31. So Old Wolf's example is
conforming after all.

This looks like a small defect in the standard. Section 6.2.6.1
opens with: "The representations of all types are unspecified except
as stated in this subclause". I took "subclause" to mean 6.2.6.1
itself but that doesn't work. It could conceivably mean all of
6.2.6, which is titled "Representations of types", but not 6.2.5 .
Yet 6.2.5(9) is needed to close a loophole.

I don't know what C++ has to do with this.

It spelled out the concept of a bit-preserving cast, as opposed to
C's value-preserving (for the most part) cast.

Yevgen Muntyan · Mar 16, 2007

Jack said:

On 14 Mar 2007 15:10:44 -0700, "Old Wolf" <[email protected]>
wrote in comp.lang.c:

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".
Most of the rambling was caused by the original OP, I think, rather
than the material. I am not criticizing, just observing. [snip]
There is no difference in aliasing in a union than there is via
pointer casting.

Sorry if it's something obvious or stupid, but please consider this
(no pointers involved).

You are mistaken, of course there are pointers involved. Just not
pointer objects.

Um, of course there are pointers. I believe I do understand what &
operator means, and what's passed to memcpy(). I meant something like
"aliasing via pointer casting". If there is this sort of aliasing
involved, *then* I am mistaken.

There are two pointers used in the function call statement. The &
operator generates two addresses, which are passed to memcpy() as
pointers to void.

The behavior is undefined.

Why? This line does not access value of a. If anything violates
aliasing rules here, then it's memcpy() call. Does it violate
aliasing rules (I honestly can't see how they are applicable here).

The behavior is undefined.

The behavior is undefined.

None of the three are valid. The standard does not give you
permission to access an lvalue of type unsigned int after writing some
or all of the bits of a double to it.

Um, I guess this is the main thing. I can't understand this, since
I can't see the difference between

memcpy(&intvalue, &doublevalue,1);

and

memcpy(charbuf,&doublevalue,1); memcpy(&intvalue,charbuf,1);

This is the thing I can't get: how do bits copied from
an object carry object's type? There is explicit wording
about memcpy() in 6.5p5, but that's only about case
when destination doesn't have effective type.
Does 6.5p6 imply this: "you can read a bit from an object only using
an lvalue of appropriate type, regardless of how far away you carried
that bit using a char array or anything else" (sounds like some
physics to me); and to get it more ridiculous: can you print
bit sequence of a double to the screen, write it down, enter it
back, and initialize an unsigned int using that bit sequence?

I am beginning to wonder if you are being deliberately obtuse.

"Magic" was a leftover from extensive editing. Sorry for that.

Read the first five paragraphs of 6.2.6.1 about representation of
types, and the terms "object representation", and "trap
representation".

I said "Suppose double is eight bytes big, int is four bytes, there are
no padding bits in int." No problems with representation. If my examples
are bad, then it's for reasons other than invalid/trap representation.

[snip]

Yevgen

Flash Gordon · Mar 16, 2007

Yevgen Muntyan wrote, On 16/03/07 03:19:

Jack said:

Jack Klein wrote:
On 14 Mar 2007 15:10:44 -0700, "Old Wolf" <[email protected]>
wrote in comp.lang.c:

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".
Most of the rambling was caused by the original OP, I think, rather
than the material. I am not criticizing, just observing.
[snip]
There is no difference in aliasing in a union than there is via
pointer casting.
Sorry if it's something obvious or stupid, but please consider this
(no pointers involved).

You are mistaken, of course there are pointers involved. Just not
pointer objects.

Um, of course there are pointers. I believe I do understand what &
operator means, and what's passed to memcpy(). I meant something like
"aliasing via pointer casting". If there is this sort of aliasing
involved, *then* I am mistaken.

There are two pointers used in the function call statement. The &
operator generates two addresses, which are passed to memcpy() as
pointers to void.

The behavior is undefined.

Why? This line does not access value of a. If anything violates
aliasing rules here, then it's memcpy() call. Does it violate
aliasing rules (I honestly can't see how they are applicable here).

It accesses the bit pattern of (part of a) double as if it was an
unsigned int. That bit pattern might be a trap representation for
unsigned int. As was suggested, look up object and trap representations.

Um, I guess this is the main thing. I can't understand this, since
I can't see the difference between

memcpy(&intvalue, &doublevalue,1);

and

memcpy(charbuf,&doublevalue,1); memcpy(&intvalue,charbuf,1);

There is no difference. In either case accessing intvalue afterwards
invokes undefined behaviour.

This is the thing I can't get: how do bits copied from
an object carry object's type?

They do not. However, what is a valid bit pattern for one type might be
a trap for another.

There is explicit wording
about memcpy() in 6.5p5, but that's only about case
when destination doesn't have effective type.
Does 6.5p6 imply this: "you can read a bit from an object only using
an lvalue of appropriate type, regardless of how far away you carried
that bit using a char array or anything else" (sounds like some
physics to me); and to get it more ridiculous: can you print
bit sequence of a double to the screen, write it down, enter it
back, and initialize an unsigned int using that bit sequence?

You are the one assuming there is some kind of magic going on, there
isn't. There is no guarantee that the bit pattern that is valid for a
double is valid for any other type apart from unsigned char.

"Magic" was a leftover from extensive editing. Sorry for that.

I said "Suppose double is eight bytes big, int is four bytes, there are
no padding bits in int."

The C standard takes into account implementations where that is not
true, therefore the C standard leaves undefined things which might have
an obvious possible definition for a simple implementation you select.

Undefined does not mean it *will* go BANG, only that it is allowed to.

No problems with representation. If my examples
are bad, then it's for reasons other than invalid/trap representation.

Again, you are wrong. All bits 1 is allowed to be a trap representation
for a signed integer type if there are no padding bits and 2s complement
is used. For sign-magnitude and ones complement machines -0 is allowed
to be a trap.

If you are going to exclude all the possible implementations where it
could go wrong then of course you will not see why it is undefined, but
if you allow for the range of implementations that the C standard allows
for then you will see that it makes sense for it to be undefined.

Yevgen Muntyan · Mar 16, 2007

Flash said:
Yevgen Muntyan wrote, On 16/03/07 03:19:

Jack said:

On Thu, 15 Mar 2007 05:55:15 GMT, Yevgen Muntyan

Jack Klein wrote:
On 14 Mar 2007 15:10:44 -0700, "Old Wolf" <[email protected]>
wrote in comp.lang.c:

Ok, we've had two long and haphazard threads about unions recently,
and I still don't feel any closer to certainty about what is
permitted
and what isn't. The other thread topics were "Real Life Unions"
and "union {unsigned char u[10]; ...} ".
Most of the rambling was caused by the original OP, I think, rather
than the material. I am not criticizing, just observing.
[snip]
There is no difference in aliasing in a union than there is via
pointer casting.
Sorry if it's something obvious or stupid, but please consider this
(no pointers involved).

You are mistaken, of course there are pointers involved. Just not
pointer objects.

Um, of course there are pointers. I believe I do understand what &
operator means, and what's passed to memcpy(). I meant something like
"aliasing via pointer casting". If there is this sort of aliasing
involved, *then* I am mistaken.

Suppose double is eight bytes big, int is four bytes, there are
no padding bits in int.

/* (1) get bits from a double and see what happens */
double a = 3.45; unsigned int b;
memcpy(&b, &a, sizeof b);

There are two pointers used in the function call statement. The &
operator generates two addresses, which are passed to memcpy() as
pointers to void.

printf("%u", b);

The behavior is undefined.

Why? This line does not access the value of a. If anything violates the
aliasing rules here, then it's the memcpy() call. Does it violate them?
(I honestly can't see how they are applicable here.)

It accesses the bit pattern of (part of a) double as if it was an
unsigned int. That bit pattern might be a trap representation for
unsigned int. As was suggested, look up object and trap representations.
....

No problems with representation. If my examples
are bad, then it's for reasons other than invalid/trap representation.

Again, you are wrong. All bits 1 is allowed to be a trap representation
for a signed integer type if there are no padding bits and 2s complement
is used. For sign-magnitude and ones complement machines -0 is allowed
to be a trap.

It was unsigned int. I specifically chose it, and assumed no padding bits
in int, to avoid problems with representation. I could formulate it in
more verbose ways, for instance by wrapping every example in an if() which
checks whether there are padding bits, to get pieces of code which
do *not* exhibit UB on any implementation. I did miss the possibility of
padding bits in double. Here you go:

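#include <limits.h>
#include <string.h>   /* memcpy */

#if 1
#define LONGSIZE 4
#else
#define LONGSIZE 8
#endif

/* Counts the bits needed to represent n; for the maximum value of an
   unsigned type this is the number of value bits. */
unsigned long
kind_of_log2 (unsigned long n)
{
    unsigned i;
    for (i = 0; n; ++i)
        n >>= 1;
    return i;
}

int main (void)
{
    /* Copy only if neither unsigned int nor unsigned long has padding bits. */
    if (sizeof(int) == 4 && CHAR_BIT == 8 &&
        kind_of_log2(UINT_MAX) == 32 &&
        ( (sizeof(long) == 4 && kind_of_log2(ULONG_MAX) == 32) ||
          (sizeof(long) == 8 && kind_of_log2(ULONG_MAX) == 64) ))
    {
        long a = 1658; unsigned int b;
        memcpy(&b, &a, sizeof b);
    }
    return 0;
}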

kind_of_log2() may be wrong here, but it's certainly possible to
write a correct one which would compute number of value bits
(there are even constants for that, right?)
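
For what it's worth, here is a minimal sketch of such a value-bit counter. The names value_bits and uint_has_no_padding are made up for illustration. The idea is that the maximum value of an unsigned type has the form 2^N - 1, so the number of right shifts needed to clear it equals N, the number of value bits. As far as I know C99's <limits.h> has no ready-made constant for this; width macros such as UINT_WIDTH only appeared in later revisions.

```c
#include <limits.h>

/* Hypothetical helper: number of value bits in an unsigned type whose
   maximum value is max (e.g. UINT_MAX or ULONG_MAX). */
unsigned value_bits(unsigned long max)
{
    unsigned bits = 0;
    while (max != 0) {
        max >>= 1;
        ++bits;
    }
    return bits;
}

/* Nonzero if unsigned int has no padding bits, i.e. every bit of its
   object representation is a value bit. */
int uint_has_no_padding(void)
{
    return value_bits(UINT_MAX) == sizeof(unsigned int) * CHAR_BIT;
}
```

A guard like if (uint_has_no_padding()) { ... } could then replace the hard-coded 32 and 64 comparisons.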

If you are going to exclude all the possible implementations where it
could go wrong then of course you will not see why it is undefined, but
if you allow for the range of implementations that the C standard allows
for then you will see that it makes sense for it to be undefined.

As far as I understand, the aliasing rules were invented to allow for
optimizations, not to avoid problems with trap representations.
I may be wrong, but the Rationale doesn't say so, IIRC. Now, does it
make sense for the standard to make the code above undefined because
there may be trap representations somewhere?
It may be so, for exactly the reason you say, and so the standard doesn't
bother to spell out the conditions under which there are no problems with
trap representations and such (it'd be pointless, since code like
this is bad). But then which part of the standard explicitly makes this
code undefined? If none does, then 6.2 describes
exactly what happens in the code above (it doesn't say what
value will be in 'b'; that's implementation-defined, and the
program above will tell you).

Best regards,
Yevgen

Pierre Asselin · Mar 16, 2007

Not so. The compiler is able to insert conversion code to
implement a cast. It is not able to do so for an aliased union.

Jack said *pointer* casting. I assume he means something like
*((target_type *) &source_var), subject to size and alignment match.
The cast changes the bits of &source_var, but not source_var itself.
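
A minimal sketch of the distinction, with made-up function names. convert_value uses an ordinary cast, so the compiler is free to emit conversion code. reinterpret_bits casts only the pointer and then reads the stored bits as they are. The sketch uses int and unsigned int because the aliasing rules in 6.5 permit that pairing. It is not meant to cover arbitrary pairs of types.

```c
#include <limits.h>

/* Value conversion: the compiler may emit code to convert the value.
   Converting -1 to unsigned int yields UINT_MAX on every implementation. */
unsigned int convert_value(int source_var)
{
    return (unsigned int) source_var;
}

/* Pointer cast: only the pointer is converted; the bits of source_var
   are read unchanged through an lvalue of the corresponding unsigned
   type. */
unsigned int reinterpret_bits(int source_var)
{
    return *((unsigned int *) &source_var);
}
```

For non-negative values the two should agree, since 6.2.6.2 says a valid signed representation with a zero sign bit represents the same value in the unsigned type. For negative values the second result depends on how the implementation represents signed integers.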

Robert Gamble · Mar 18, 2007

Jack Klein wrote:

... snip ...

Not so. The compiler is able to insert conversion code to
implement a cast. It is not able to do so for an aliased union.
Which is why it is implementation-defined or undefined behavior to access a
union component as other than the form in which it was stored.

To illustrate, consider a perverse 16 bit machine in which pointers
are machine addresses stored hi order byte first, and integers are
stored low order byte first. How is the compiler to know when to
flip the bytes when accessed through a union?

DR 283 addresses this by adding a footnote to 6.5.2.3#3 which states:
"If the member used to access the contents of a union object is not
the same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called "type punning"). This might be a
trap representation."
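
A minimal sketch of what that footnote describes, using the union from the start of the thread. The function names are made up for illustration. The first function stores through the unsigned member and reads through the signed one, so the result is whatever the stored bytes mean as an int. On common implementations without padding bits a small value such as 50 reads back as 50, but the footnote leaves room for a trap representation elsewhere. The second function reads through an unsigned char member, which has no trap representations.

```c
/* Store through the unsigned member, read back through the signed one.
   The bytes are reinterpreted, not converted. */
int pun_unsigned_to_signed(unsigned int value)
{
    union { int s; unsigned int us; } u;
    u.us = value;
    return u.s;
}

/* Inspect the first byte of the object representation; which byte of
   the value this is depends on the implementation's byte order. */
unsigned char first_byte(unsigned int value)
{
    union { unsigned int us; unsigned char bytes[sizeof(unsigned int)]; } u;
    u.us = value;
    return u.bytes[0];
}
```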

Robert Gamble

Old Wolf · Mar 18, 2007

It was unsigned int. I specifically chose it and assumed no padding bits
in int to avoid problems with representation.
#if 1
#define LONGSIZE 4
#else
#define LONGSIZE 8
#endif

Look, none of this matters. The C Standard explicitly says that
it is undefined behaviour to read the bit pattern of a double as
if it were an unsigned int (and vice versa) -- end of story. It
doesn't say "sometimes undefined" or "undefined if size of long
is not 4 blah blah blah".

If you want to question why the Standard says this, or perhaps
suggest the Standard be changed -- the place for that is the
newsgroup comp.std.c .

muntyan · Mar 18, 2007

Look, none of this matters. The C Standard explicitly says that
it is undefined behaviour to read the bit pattern of a double as
if it were an unsigned int (and vice versa) -- end of story. It
doesn't say "sometimes undefined" or "undefined if size of long
is not 4 blah blah blah".

Where does it say that? Is it the aliasing rules or something else?
Because it was suggested to me here that it doesn't matter whether we do

memcpy(&intval, &doubleval, sizeof intval);

or

memcpy(chararray, &doubleval, sizeof intval);
memcpy(&intval, chararray, sizeof intval);

and clearly the aliasing rules do not apply to the latter. If it's indeed
the aliasing rules, then so be it, it'd be magic and that'd be fine. In
two posts I was told about trap values, which I take as a possible
explanation of why the standard would opt to say something is undefined
(i.e. the reason, the rationale), and that is fine. But exactly which place
in the standard says that

unsigned long longval;
unsigned int intval;
/* make sure no trap representation involved */
....
memcpy(&intval, &longval, sizeof intval);

is undefined?
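
To make the two variants concrete, here they are as a sketch with made-up function names. Both move the same bytes, so intval ends up with the same object representation either way. The sketch assumes sizeof(unsigned int) <= sizeof(unsigned long) and that unsigned int has no trap representations on the implementation at hand. It does not settle which clause of the standard applies.

```c
#include <string.h>

/* Copy the first sizeof(unsigned int) bytes of longval straight into
   an unsigned int. */
unsigned int copy_direct(unsigned long longval)
{
    unsigned int intval;
    memcpy(&intval, &longval, sizeof intval);
    return intval;
}

/* The same copy, staged through an unsigned char buffer. */
unsigned int copy_via_chars(unsigned long longval)
{
    unsigned char chararray[sizeof(unsigned int)];
    unsigned int intval;
    memcpy(chararray, &longval, sizeof intval);
    memcpy(&intval, chararray, sizeof intval);
    return intval;
}
```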

Best regards,
Yevgen
