Repeated types in union

Edward Rutherford

Hello:

Is the following code an undefined behavior?


union {
int a;
int b;
} u;
u.a = 3;
printf("%d\n", u.b);


Cheers

Edward
 
Jens Gustedt

On 12/10/2011 11:06 PM, Edward Rutherford wrote:
Is the following code an undefined behavior?


union {
int a;
int b;
} u;
u.a = 3;
printf("%d\n", u.b);

Not that I see. Accessing a member other than the one that was last
stored is only undefined behavior if the bit pattern results in a trap
representation for the new type.

If the first member has padding bytes that the other type uses for its
data representation, the value of these bytes is *unspecified* which is
not the same thing as UB.

In any case, none of these things can happen for your example.
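
To make the case concrete, here is the question's snippet wrapped in a function. This is only a sketch: `union same_type` and `store_a_read_b` are names invented for illustration. Under the reading given above, the function would simply return the value that was stored.

```c
/* Hypothetical name for the union from the question: two members of the
   same type, int. Both begin at the start of the union, so they name the
   same bytes. */
union same_type {
    int a;
    int b;
};

/* Store through one member, then read through the other. */
int store_a_read_b(int value)
{
    union same_type u;
    u.a = value;
    return u.b;   /* same bytes, same type: yields the stored value */
}
```

So `printf("%d\n", store_a_read_b(3));` would print 3, matching the original question.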

Jens
 
Eric Sosman

Hello:

Is the following code an undefined behavior?


union {
int a;
int b;
} u;
u.a = 3;
printf("%d\n", u.b);

(I rush in where angels fear to tread...)

First, there's no problem with the issue mentioned in your
subject line: It's perfectly all right to have several union members
with distinct names but the same type. If that were not so, even
something as simple as `union { int i; time_t t; } u;' could be in
trouble. See also 6.2.5p20, which says that union members have
"possibly distinct" types.

The "write one member, read another" question has been discussed
more than once, and my impression of the debates is that there have
been two camps: Not "It's legal" and "It's illegal," but "It's legal"
and "You'll probably get away with it, but it might not be squeaky-
clean, and my head hurts can we talk about something else, please?"
(I'm in the latter camp.)

It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
represent `3', and that `u.b' thereby receives the same bytes. No
argument there: The storage allocated to `u.b' holds a representation
of `3'.

The part that makes my head ache is figuring out whether the
compiler is required to "notice" that storing to `u.a' affects the
value of `u.b'. If the compiler has already loaded `u.b' into a
register, say, is it required to re-fetch because `u.a' was changed?
Is the compiler allowed to consider `u.b' uninitialized because it
has never been stored to, despite the store to `u.a'?

To those in the "It's legal" camp, I offer a few puzzling and
possibly disturbing points:

- The footnote to 6.2.5p21 points out that "an object with union
type can only contain one member at a time" -- meaning that if
`u' contains `u.a', it does not contain `u.b'. Footnotes, of
course, are suggestive but non-normative.

- The footnote to 6.5.2.3p3 supports the "It's legal" camp by
describing the mechanism of type punning. Footnotes, of course,
are suggestive but non-normative.

- 6.5.2.3p5 gives a "special guarantee" for union members that
are structs, but does not extend a similar guarantee for other
member types.

- 6.7.2.1p14 has the normative language for the first footnote
mentioned above: "The value of at most one of the members can be
stored in a union object at any time." Your `u' can hold `u.a'
or `u.b', but not both at once.

Those are the citations I can find (if I've missed any I'm sure
others will point them out). Their cumulative impression on me is
that the matter is not settled beyond doubt, but the aforementioned
angels may see things differently.

As a practical matter, it's not all that important what I think
or what the angels think, but what the providers of your compilers
think. If a compiler does something unfortunate with your code you
will find yourself retracing this same argument with implementors
who are trying to stamp NOT A BUG on your complaint. If the angels
weigh in on your side, the implementors of the offending compiler
may eventually accede and agree to ship a fix -- "In a forthcoming
release," oh joy, oh joy. I think you might choose better battles:
Fight over things you Really Really Need and are Really Solid Bugs,
and don't waste troops trying to subjugate the unpopulated hinterland.
 
Barry Schwarz

Hello:

Is the following code an undefined behavior?


union {
int a;
int b;
} u;
u.a = 3;
printf("%d\n", u.b);

In C89, paragraph 3.3.2.3 states "With one exception, if a member of a
union object is accessed after a value has been stored in a different
member of the object, the behavior is implementation-defined." The
exception referred to is not related to your example. So the answer
to your question is: yes if the implementation says it is and no if
the implementation says something else.

In C99, the reference to implementation-defined behavior is removed.
Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
member of an object of union type, the bytes of the object
representation that do not correspond to that member but do correspond
to other members take unspecified values." Since a and b occupy the
same bytes, none of those bytes become unspecified. And footnote 82
indicates the intended behavior is for the bits of b to be
"reinterpreted" for the type of b. Since both a and b have the same
type, it seems to me the intention is to retrieve the same value.
 
Jens Gustedt

On 12/12/2011 06:49 PM, christian.bau wrote:
You are right, but that seems to have some awful consequences. Take
this code:

union {
int a;
long b;
} u;
u.a = 3;
printf("%ld\n", u.b);

So on an implementation where int and long have the same size and
representation, this code would be well-defined and print "3"?

Now take this code:

void f (int* a, long* b) { *a = 3; *b = 4; *a = *a + 2; }

If I call f (&u.a, &u.b) is this required to set both to 6?
And since the compiler doesn't know that I'm going to make this call,
lots of optimization goes out of the window?

If I remember correctly the aliasing rules state that the compiler is
allowed to assume that a and b (inside the function) point to different
objects because they are of different types. Thus in the second
assignment to *a the compiler can assume that *a is still 3 and store 5
in place.

Jens
 
Edward Rutherford

Eric said:
[snip]

Thanks for the explanation, Eric.

Does that mean the "It's Legal" brigade would say it's always legal to
read an unsigned char from a union, whatever was previously stored in
it, on the grounds that an unsigned char cannot contain a trap
representation?
 
ralph

As a practical matter, it's not all that important what I think
or what the angels think, but what the providers of your compilers
think. If a compiler does something unfortunate with your code you
will find yourself retracing this same argument with implementors
who are trying to stamp NOT A BUG on your complaint. If the angels
weigh in on your side, the implementors of the offending compiler
may eventually accede and agree to ship a fix -- "In a forthcoming
release," oh joy, oh joy. I think you might choose better battles:
Fight over things you Really Really Need and are Really Solid Bugs,
and don't waste troops trying to subjugate the unpopulated hinterland.

Hear! Hear!

Consider that quoted and stolen. <bg>

-ralph
 
Jens Gustedt

On 12/12/2011 09:09 PM, Edward Rutherford wrote:
Thanks for the explanation, Eric.

Does that mean the "It's Legal" brigade would say it's always legal to
read an unsigned char from a union, whatever was previously stored in
it, on the grounds that an unsigned char cannot contain a trap
representation?

One thing is sure: the standard explicitly allows copying any object
(with memcpy) into an array of `unsigned char`. This is even the way the
term object representation is introduced.

So, first of all, this means that we are allowed to read all the bytes
of a union. Second, it means that all bytes of the object representation
can be interpreted as unsigned char.
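
A minimal sketch of what this permits, using a hypothetical `union mixed` whose members differ in size and a hypothetical function `first_byte_after_storing_c`: after a store to the smaller member, the whole union can still be copied into an `unsigned char` array and every byte read, even though most of those bytes hold unspecified values.

```c
#include <string.h>

/* Hypothetical union whose members have different sizes. */
union mixed {
    char c;
    int  i;
};

unsigned char first_byte_after_storing_c(char c)
{
    union mixed u = { 0 };           /* start from a known state */
    unsigned char bytes[sizeof u];

    u.c = c;
    /* Copy every byte of the union. Bytes outside u.c now hold
       unspecified values (6.2.6.1p7), but unsigned char has no trap
       representations, so copying and reading them is harmless. */
    memcpy(bytes, &u, sizeof u);
    return bytes[0];                 /* u.c starts at offset 0 */
}
```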

Jens
 
Eric Sosman

[...]
Does that mean the "It's Legal" brigade would say it's always legal to
read an unsigned char from a union, whatever was previously stored in
it, on the grounds that an unsigned char cannot contain a trap
representation?

The varieties of `char' are something of a special case, because
C has always had the notion that it's possible to inspect and maybe
fiddle with the individual bytes of a multi-byte object. At your
peril, of course, since you might invalidate the multi-byte thing.
But still: Things like memcpy() are defined in terms of copying the
individual bytes, and the copy of a valid object must itself be
valid.

The Standard tightens this just a trifle, by allowing the `char'
flavors other than `unsigned' to have trap representations. Still,
`unsigned char' remains as the "atom" of C memory: Its mapping between
representations and values is one-to-one, which guarantees fidelity
both in value and in representation when copying or comparing, and
also guarantees that there are no trap representations.
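
The byte-by-byte view described above can be sketched as follows. `bytewise_copy` and `copy_double` are hypothetical helpers; the loop is a hand-written stand-in for memcpy(), not a claim about how any library actually implements it.

```c
#include <stddef.h>

/* Copy n bytes one unsigned char at a time. Since unsigned char maps
   representations to values one-to-one, every byte is reproduced
   exactly, so the destination receives an identical representation. */
static void bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Copy a double through its bytes; the copy of a valid object is valid. */
double copy_double(double x)
{
    double y;
    bytewise_copy(&y, &x, sizeof x);
    return y;
}
```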

But back to the `union' issue: I'm still not 100% comfortable
with the idea of writing to one member and reading another. It sort
of looks like it should work, but I've not heard a watertight argument
that it *must* work, even in the face of a ferociously aggressive
optimizer. I think the "It's legal" faction have found arguments they
deem satisfactory; perhaps they've looked more diligently than I have.

Down to nuts and bolts: Is this a theoretical question, or do you
have an actual use case in mind? If the latter, could you describe it?
Maybe someone will be able to say "Well, in *that* case it works" or
"If you did it *this other* way you wouldn't care."
 
Jens Gustedt

Hello,

On 12/14/2011 12:10 AM, christian.bau wrote:
You are right. On the other hand, footnote 82 says:

"If the member used to access the contents of a union object is not
the same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called "type punning"). "

Which is a direct contradiction. I am assuming that the rules for
union members apply in the same way whether the compiler knows that it
is accessing different members of the same union or not.

I think this assumption can't be made. Generally, inside a function the
compiler has no way to know that the pointers originate from the same
object. On the contrary, the aliasing rules were invented to ensure that,
under the given circumstances, they *must* point to different objects.

And these things happen. gcc assumes (or at least some version of gcc
did) that they are different, even if the function is inlined and it
could deduce that both point to the same address.

(Also, footnotes in the standard are not normative)

Jens
 
Keith Thompson

christian.bau said:
I bet more than one person has tried to read the representation of a
float or double as a 32 or 64 bit integer. Last time I tried, I found
one way that worked on one compiler and failed on another, and another
way that worked on the other compiler and failed at the first (one
method was using a union, one was casting the address of a float to
"pointer to unsigned int"), but I couldn't find any code that worked
on both compilers. And having code with an #ifdef checking the
compiler that is used doesn't really inspire confidence in the code
:-(

So use memcpy(). (I suppose that's not strictly portable to
freestanding implementations, but I'd expect memcpy() to be one of the
things that most freestanding implementations actually provide.)
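
A sketch of the memcpy() approach for the float case. It assumes `float` and `uint32_t` are both 32 bits (checked at compile time with a C11 `_Static_assert`) and that floats use the same byte order as integers, which holds on common platforms but is not required by the standard. `float_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Assumption made explicit: compilation fails if the sizes differ. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the bits of f as an unsigned 32-bit integer. memcpy() copies
   the object representation, so no lvalue of the wrong type is used. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With IEEE 754 binary32 floats, `float_bits(1.0f)` would be 0x3F800000.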
 
James Kuyper

On 12/13/2011 06:17 PM, christian.bau wrote:
....
I bet more than one person has tried to read the representation of a
float or double as a 32 or 64 bit integer. Last time I tried, I found
one way that worked on one compiler and failed on another, and another
way that worked on the other compiler and failed at the first (one
method was using a union, one was casting the address of a float to
"pointer to unsigned int"), but I couldn't find any code that worked
on both compilers.

Try reading it using unsigned char; if that doesn't work (for
appropriate values of "work"), the implementation is non-conforming.
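
As an illustration of inspecting a representation through `unsigned char`, the hypothetical function below counts the 1 bits in a double's object representation. Because it examines every byte, the result does not depend on byte order or on CHAR_BIT. The specific counts mentioned afterwards assume IEEE 754 binary64 doubles, which the C standard does not require.

```c
#include <limits.h>
#include <stddef.h>

/* Count the 1 bits in the object representation of d by viewing it
   as an array of unsigned char, which is always permitted. */
unsigned count_representation_bits(double d)
{
    const unsigned char *p = (const unsigned char *)&d;
    unsigned count = 0;

    for (size_t i = 0; i < sizeof d; i++) {
        for (unsigned bit = 0; bit < CHAR_BIT; bit++)
            count += (p[i] >> bit) & 1u;
    }
    return count;
}
```

On an IEEE 754 platform, 1.0 (bit pattern 0x3FF0000000000000) gives 10, -1.0 gives 11, and 0.0 gives 0.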
 
Noob

christian.bau said:
I bet more than one person has tried to read the representation of a
float or double as a 32 or 64 bit integer. Last time I tried, I found
one way that worked on one compiler and failed on another, and another
way that worked on the other compiler and failed at the first (one
method was using a union, one was casting the address of a float to
"pointer to unsigned int"), but I couldn't find any code that worked
on both compilers. And having code with an #ifdef checking the
compiler that is used doesn't really inspire confidence in the code :-(

Doesn't the following work for you?

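#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed copy of the bytes of d, or NULL if the
   allocation fails. The caller must free() the result. */
unsigned char *foo(double d)
{
    size_t n = sizeof d;
    unsigned char *buf = malloc(n);
    if (buf) memcpy(buf, &d, n);
    return buf;
}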
 
Tim Rentsch

christian.bau said:
Down to nuts and bolts: Is this a theoretical question, or do you
have an actual use case in mind? If the latter, could you describe it?
Maybe someone will be able to say "Well, in *that* case it works" or
"If you did it *this other* way you wouldn't care."

I bet more than one person has tried to read the representation of a
float or double as a 32 or 64 bit integer. Last time I tried, I found
one way that worked on one compiler and failed on another, and another
way that worked on the other compiler and failed at the first (one
method was using a union, one was casting the address of a float to
"pointer to unsigned int"), but I couldn't find any code that worked
on both compilers. [snip]

The straightforward method using a union is required to work.
More specifically, this (assuming uint64_t exists and double
is 64 bits):

double d = ...something...;
union { double d; uint64_t u64; } u = { d };
uint64_t u64 = u.u64;

If it didn't then that compiler is not conforming.
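
Tim's fragment, filled out into a compilable function under the same assumptions (64-bit double, uint64_t available). `double_bits` is a hypothetical name, and the C11 `_Static_assert` merely makes the size assumption explicit.

```c
#include <stdint.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

uint64_t double_bits(double d)
{
    /* The initializer stores into the first member, d; reading u64
       reinterprets those same bytes as an unsigned 64-bit integer. */
    union { double d; uint64_t u64; } u = { d };
    return u.u64;
}
```

With IEEE 754 doubles, `double_bits(1.0)` would be 0x3FF0000000000000.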
 
Tim Rentsch

Barry Schwarz said:
In C89, paragraph 3.3.2.3 states "With one exception, if a member of a
union object is accessed after a value has been stored in a different
member of the object, the behavior is implementation-defined." The
exception referred to is not related to your example. So the answer
to your question is: yes if the implementation says it is and no if
the implementation says something else.

In C99, the reference to implementation-defined behavior is removed.
Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
member of an object of union type, the bytes of the object
representation that do not correspond to that member but do correspond
to other members take unspecified values." Since a and b occupy the
same bytes, none of those bytes become unspecified. And footnote 82
indicates the intended behavior is for the bits of b to be
"reinterpreted" for the type of b. Since both a and b have the same
type, it seems to me the intention is to retrieve the same value.

I agree with your analysis, but just wanted to add one
item. Practically speaking, the behavior under C89/C90
and C99 is likely to be the same. This idea is also
supported by DR 283 (which is what prompted adding the
footnote), which makes it clear that the semantics in
the two cases are meant to be the same.
 
Tim Rentsch

christian.bau said:
You are right, but that seems to have some awful consequences. Take
this code:

union {
int a;
long b;
} u;
u.a = 3;
printf("%ld\n", u.b);

So on an implementation where int and long have the same size and
representation, this code would be well-defined and print "3"?

Yes.


Now take this code:

void f (int* a, long* b) { *a = 3; *b = 4; *a = *a + 2; }

If I call f (&u.a, &u.b) is this required to set both to 6?

No. The semantics of f() are different from the earlier
example because of some subtleties in effective type rules.
In fact that makes f() have undefined behavior for the
particular call mentioned.

And since the compiler doesn't know that I'm going to make this call,
lots of optimization goes out of the window?

No, the optimizations are still okay, because of
how effective type rules work.
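
For contrast, here is a hypothetical variant of f() in which both parameters point to `int`. With matching types, the effective type rules give the compiler no license to assume the pointers refer to different objects, so it must re-read `*a` after the store through `b`. The names `f_same_type`, `union two_ints`, and `call_with_aliased_members` are invented for this sketch.

```c
/* Same body as f() above, but both pointers have type int *. */
void f_same_type(int *a, int *b)
{
    *a = 3;
    *b = 4;          /* may modify *a: same type, so aliasing is allowed */
    *a = *a + 2;     /* must see the 4 stored through b if they alias */
}

union two_ints { int a; int b; };

int call_with_aliased_members(void)
{
    union two_ints u;
    f_same_type(&u.a, &u.b);   /* both point to the same int object */
    return u.a;
}
```

Here the call leaves 6 in the union, because a and b really do designate the same int object.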
 
Tim Rentsch

christian.bau said:
You are right. On the other hand, footnote 82 says:

"If the member used to access the contents of a union object is not
the same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called "type punning"). "

Which is a direct contradiction. I am assuming that the rules for
union members apply in the same way whether the compiler knows that it
is accessing different members of the same union or not.

It isn't a contradiction, because how the objects are accessed
is different in the two cases. When a member is accessed (i.e.,
using '.' or '->') the effective type
is determined by the declared type of the member.
When an object is accessed through a pointer, there
is no declared type, so the rule for what the effective
type is or must be is different.
 
Tim Rentsch

Eric Sosman said:
Hello:

Is the following code an undefined behavior?


union {
int a;
int b;
} u;
u.a = 3;
printf("%d\n", u.b);

[snip]

The "write one member, read another" question has been discussed
more than once, and my impression of the debates is that there have
been two camps: Not "It's legal" and "It's illegal," but "It's legal"
and "You'll probably get away with it, but it might not be squeaky-
clean, and my head hurts can we talk about something else, please?"
(I'm in the latter camp.)

Let's see if we can get you over into that other camp. :)

It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
represent `3', and that `u.b' thereby receives the same bytes. No
argument there: The storage allocated to `u.b' holds a representation
of `3'.

The part that makes my head ache is figuring out whether the
compiler is required to "notice" that storing to `u.a' affects the
value of `u.b'. If the compiler has already loaded `u.b' into a
register, say, is it required to re-fetch because `u.a' was changed?
Is the compiler allowed to consider `u.b' uninitialized because it
has never been stored to, despite the store to `u.a'?

The case in question is quite straightforward, because the two
members must occupy the same bytes (on every implementation)
and also have the same type. Hence the accesses do not violate
the effective type rules, and must proceed as described by the
semantics.

The semantics in this case are defined principally by 6.2.5 p20
and 6.3.2.1 p2. There is also the question of how the two
objects line up relative to one another, but that follows by
virtue of unions not having any padding before any members. (I'm
sure interested parties can find the appropriate references.)
These paragraphs are pretty simple to read; I don't see any
room for uncertainty. Since the accesses in this case clearly
do not violate the effective type rules, the behavior is
correspondingly well-defined.
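
The layout fact relied on here can be checked directly: a pointer to a union object, suitably converted, points to each of its members, so both members share the union's starting address. `members_share_storage` is a hypothetical helper written for this sketch.

```c
union same_type { int a; int b; };

/* Returns nonzero if u.a, u.b and u itself all start at the same
   address, i.e. the two members occupy exactly the same bytes. */
int members_share_storage(void)
{
    union same_type u;   /* only addresses are taken; no value is read */
    return (void *)&u.a == (void *)&u.b
        && (void *)&u.a == (void *)&u;
}
```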

To respond to your other points:

To those in the "It's legal" camp, I offer a few puzzling and
possibly disturbing points:

- The footnote to 6.2.5p21 points out that "an object with union
type can only contain one member at a time" -- meaning that if
`u' contains `u.a', it does not contain `u.b'. Footnotes, of
course, are suggestive but non-normative.

This comment is made in the context of defining the term
"aggregate type". Clearly a union is not an aggregate type
because it cannot hold two (or more) independent values. I don't
think there's any mystery about that.

- The footnote to 6.5.2.3p3 supports the "It's legal" camp by
describing the mechanism of type punning. Footnotes, of course,
are suggestive but non-normative.

And the comment in the footnote is supported by normative text,
as noted above.

- 6.5.2.3p5 gives a "special guarantee" for union members that
are structs, but does not extend a similar guarantee for other
member types.

It does, but notice that the guarantee made here is stronger
than just other member access. Under this passage we are
allowed to access struct members inside a union object _even
though no mention is made of a union at the point of access_.
It's a special guarantee because it's a stronger guarantee
than holds for other union member types.

- 6.7.2.1p14 has the normative language for the first footnote
mentioned above: "The value of at most one of the members can be
stored in a union object at any time." Your `u' can hold `u.a'
or `u.b', but not both at once.

What it says is that at most one member can be _stored_ at any
one time. That is obviously true since storing into another member
will eradicate the effects of the first store. The union can't
hold two independent values, but it does hold the object referred
to by u.b, and that happens to be the same object as the one
referred to by u.a. Again, I don't think there's any mystery
here -- all that's being described is the destructive effects
of a member store on previous stores, in much the same way
that the effects of 'i = 3;' are wiped out by a subsequent 'i = 4;'.
It isn't talking about read access, just stores.
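
The 'i = 3; i = 4;' analogy, sketched with the same-type union from the original question (`last_store_wins` is a hypothetical name): the second store replaces the first, and a read through either member sees the later value.

```c
union same_type { int a; int b; };

int last_store_wins(void)
{
    union same_type u;
    u.a = 3;
    u.b = 4;       /* replaces the value stored via u.a */
    return u.a;    /* reads the bytes written by the store to u.b */
}
```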

Those are the citations I can find (if I've missed any I'm sure
others will point them out).

I've looked fairly carefully, and didn't find any others.

Their cumulative impression on me is
that the matter is not settled beyond doubt, but the aforementioned
angels may see things differently.

Hopefully you're a little closer now to seeing the light. :)

As a practical matter, it's not all that important what I think
or what the angels think, but what the providers of your compilers
think. [snip]

There always are practical considerations dealing with any C
language question on any compiler. My preference is to disentangle
the two sets of considerations, and work to understand one without
confusing myself thinking about the other. Then, having a thoroughly
considered understanding of questions in one area, that normally
helps make a more informed decision as regards the larger issues.
And I think that is a good course here.
 
Tim Rentsch

Eric Sosman said:
[snip]

But back to the `union' issue: I'm still not 100% comfortable
with the idea of writing to one member and reading another. It sort
of looks like it should work, but I've not heard a watertight argument
that it *must* work, even in the face of a ferociously aggressive
optimizer. I think the "It's legal" faction have found arguments they
deem satisfactory; perhaps they've looked more diligently than I have.

Questions about optimization are complicated because the rules
regarding effective types (obviously pertinent to optimization)
are subtle. However, the simple cases are not subtle. If we
consider a case like this:

double d;
union { double d; uint64_t u64; } u;
uint64_t u64_bits_of_double;

d = ... some value ... ;
u.d = d;
u64_bits_of_double = u.u64;

the effective type considerations are quite straightforward,
because all the accesses involved are done using declared types.
There is no doubt that the accesses here meet the requirements of
the effective type rules; so any optimizations, no matter how
aggressive, must be faithful to the defined semantics. (The
example of course assumes that double is 64 bits and uint64_t
is defined.)
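
The union access above and the memcpy() approach suggested elsewhere in the thread should agree. A small sketch that computes both under the same assumptions (64-bit double, uint64_t defined); `union_and_memcpy_agree` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Obtain the bits of d twice, once by union member access and once by
   memcpy(), and report whether the two results are identical. */
int union_and_memcpy_agree(double d)
{
    union { double d; uint64_t u64; } u;
    uint64_t via_memcpy;

    u.d = d;
    memcpy(&via_memcpy, &d, sizeof via_memcpy);
    return u.u64 == via_memcpy;
}
```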
 
