Repeated types in union

Discussion in 'C Programming' started by Edward Rutherford, Dec 10, 2011.

  1. Hello :

    Is the following code an undefined behavior?


    union {
    int a;
    int b;
    } u;
    u.a = 3;
    printf("%d\n", u.b);


    Cheers

    Edward
    Edward Rutherford, Dec 10, 2011
    #1

  2. Edward Rutherford

    Jens Gustedt Guest

    On 12/10/2011 11:06 PM, Edward Rutherford wrote:
    > Is the following code an undefined behavior?
    >
    >
    > union {
    > int a;
    > int b;
    > } u;
    > u.a = 3;
    > printf("%d\n", u.b);


    Not that I see. Accessing a different member than the one that was last
    stored is only undefined behavior if the bit pattern results in a trap
    representation for the new type.

    If the first member has padding bytes that the other type uses for its
    data representation, the value of these bytes is *unspecified* which is
    not the same thing as UB.

    In any case, none of these things can happen for your example.

    Jens
    Jens Gustedt, Dec 10, 2011
    #2

  3. Edward Rutherford

    Eric Sosman Guest

    On 12/10/2011 5:06 PM, Edward Rutherford wrote:
    > Hello :
    >
    > Is the following code an undefined behavior?
    >
    >
    > union {
    > int a;
    > int b;
    > } u;
    > u.a = 3;
    > printf("%d\n", u.b);


    (I rush in where angels fear to tread...)

    First, there's no problem with the issue mentioned in your
    subject line: It's perfectly all right to have several union members
    with distinct names but the same type. If that were not so, even
    something as simple as `union { int i; time_t t; } u;' could be in
    trouble. See also 6.2.5p20, which says that union members have
    "possibly distinct" types.

    The "write one member, read another" question has been discussed
    more than once, and my impression of the debates is that there have
    been two camps: Not "It's legal" and "It's illegal," but "It's legal"
    and "You'll probably get away with it, but it might not be squeaky-
    clean, and my head hurts can we talk about something else, please?"
    (I'm in the latter camp.)

    It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
    represent `3', and that `u.b' thereby receives the same bytes. No
    argument there: The storage allocated to `u.b' holds a representation
    of `3'.

    The part that makes my head ache is figuring out whether the
    compiler is required to "notice" that storing to `u.a' affects the
    value of `u.b'. If the compiler has already loaded `u.b' into a
    register, say, is it required to re-fetch because `u.a' was changed?
    Is the compiler allowed to consider `u.b' uninitialized because it
    has never been stored to, despite the store to `u.a'?

    To those in the "It's legal" camp, I offer a few puzzling and
    possibly disturbing points:

    - The footnote to 6.2.5p21 points out that "an object with union
    type can only contain one member at a time" -- meaning that if
    `u' contains `u.a', it does not contain `u.b'. Footnotes, of
    course, are suggestive but non-normative.

    - The footnote to 6.5.2.3p3 supports the "It's legal" camp by
    describing the mechanism of type punning. Footnotes, of course,
    are suggestive but non-normative.

    - 6.5.2.3p5 gives a "special guarantee" for union members that
    are structs, but does not extend a similar guarantee for other
    member types.

    - 6.7.2.1p14 has the normative language for the first footnote
    mentioned above: "The value of at most one of the members can be
    stored in a union object at any time." Your `u' can hold `u.a'
    or `u.b', but not both at once.

    Those are the citations I can find (if I've missed any I'm sure
    others will point them out). Their cumulative impression on me is
    that the matter is not settled beyond doubt, but the aforementioned
    angels may see things differently.

    As a practical matter, it's not all that important what I think
    or what the angels think, but what the providers of your compilers
    think. If a compiler does something unfortunate with your code you
    will find yourself retracing this same argument with implementors
    who are trying to stamp NOT A BUG on your complaint. If the angels
    weigh in on your side, the implementors of the offending compiler
    may eventually accede and agree to ship a fix -- "In a forthcoming
    release," oh joy, oh joy. I think you might choose better battles:
    Fight over things you Really Really Need and are Really Solid Bugs,
    and don't waste troops trying to subjugate the unpopulated hinterland.

    --
    Eric Sosman
    Eric Sosman, Dec 10, 2011
    #3
  4. On Sat, 10 Dec 2011 22:06:08 +0000 (UTC), Edward Rutherford
    <> wrote:

    >Hello :
    >
    >Is the following code an undefined behavior?
    >
    >
    > union {
    > int a;
    > int b;
    > } u;
    > u.a = 3;
    > printf("%d\n", u.b);


    In C89, paragraph 3.3.2.3 states "With one exception, if a member of a
    union object is accessed after a value has been stored in a different
    member of the object, the behavior is implementation-defined." The
    exception referred to is not related to your example. So the answer
    to your question is: yes if the implementation says it is and no if
    the implementation says something else.

    In C99, the reference to implementation-defined behavior is removed.
    Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
    member of an object of union type, the bytes of the object
    representation that do not correspond to that member but do correspond
    to other members take unspecified values." Since a and b occupy the
    same bytes, none of those bytes becomes unspecified. And footnote 82
    indicates the intended behavior is for the bits of b to be
    "reinterpreted" for the type of b. Since both a and b have the same
    type, it seems to me the intention is to retrieve the same value.
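
    A minimal sketch of the case analysed above, assuming a C99 or later
    compiler. The names `two_ints` and `store_a_read_b` are hypothetical and
    only serve the illustration; they do not come from the thread. Both
    members have type int and occupy the same bytes, so the value read
    through `b` is the value stored through `a`.

```c
/* Hypothetical illustration: two union members of the same type. */
union two_ints {
    int a;
    int b;
};

/* Store through member a, then read through member b.
   Both members are int and share the same bytes, so no byte takes an
   unspecified value and reinterpretation yields the stored value. */
int store_a_read_b(int value)
{
    union two_ints u;
    u.a = value;
    return u.b;
}
```

    Calling `store_a_read_b(3)` returns 3, which matches the reading of
    6.2.6.1 and footnote 82 given above.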

    --
    Remove del for email
    Barry Schwarz, Dec 11, 2011
    #4
  5. Edward Rutherford

    Jens Gustedt Guest

    On 12/12/2011 06:49 PM, christian.bau wrote:
    > On Dec 11, 6:04 am, Barry Schwarz <> wrote:
    > You are right, but that seems to have some awful consequences. Take
    > this code:
    >
    > union {
    > int a;
    > long b;
    > } u;
    > u.a = 3;
    > printf("%ld\n", u.b);
    >
    > So on an implementation where int and long have the same size and
    > representation, this code would be well-defined and print "3"?
    >
    > Now take this code:
    >
    > void f (int* a, long* b) { *a = 3; *b = 4; *a = *a + 2; }
    >
    > If I call f (&u.a, &u.b) is this required to set both to 6?
    > And since the compiler doesn't know that I'm going to make this call,
    > lots of optimization goes out of the window?


    If I remember correctly, the aliasing rules state that the compiler is
    allowed to assume that a and b (inside the function) point to different
    objects because they are of different types. Thus in the second
    assignment to *a the compiler can assume that *a is still 3 and store 5
    in place.

    Jens
    Jens Gustedt, Dec 12, 2011
    #5
  6. Eric Sosman wrote:

    > On 12/10/2011 5:06 PM, Edward Rutherford wrote:
    >> Hello :
    >>
    >> Is the following code an undefined behavior?
    >>
    >>
    >> union {
    >> int a;
    >> int b;
    >> } u;
    >> u.a = 3;
    >> printf("%d\n", u.b);

    >
    > (I rush in where angels fear to tread...)
    >
    > First, there's no problem with the issue mentioned in your
    > subject line: It's perfectly all right to have several union members
    > with distinct names but the same type. If that were not so, even
    > something as simple as `union { int i; time_t t; } u;' could be in
    > trouble. See also 6.2.5p20, which says that union members have
    > "possibly distinct" types.
    >
    > The "write one member, read another" question has been discussed
    > more than once, and my impression of the debates is that there have been
    > two camps: Not "It's legal" and "It's illegal," but "It's legal" and
    > "You'll probably get away with it, but it might not be squeaky- clean,
    > and my head hurts can we talk about something else, please?" (I'm in the
    > latter camp.)
    >
    > It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
    > represent `3', and that `u.b' thereby receives the same bytes. No
    > argument there: The storage allocated to `u.b' holds a representation of
    > `3'.
    >
    > The part that makes my head ache is figuring out whether the
    > compiler is required to "notice" that storing to `u.a' affects the value
    > of `u.b'. If the compiler has already loaded `u.b' into a register,
    > say, is it required to re-fetch because `u.a' was changed? Is the
    > compiler allowed to consider `u.b' uninitialized because it has never
    > been stored to, despite the store to `u.a'?
    >
    > To those in the "It's legal" camp, I offer a few puzzling and
    > possibly disturbing points:
    >
    > - The footnote to 6.2.5p21 points out that "an object with union
    > type can only contain one member at a time" -- meaning that if
    > `u' contains `u.a', it does not contain `u.b'. Footnotes, of
    > course, are suggestive but non-normative.
    >
    > - The footnote to 6.5.2.3p3 supports the "It's legal" camp by
    > describing the mechanism of type punning. Footnotes, of course,
    > are suggestive but non-normative.
    >
    > - 6.5.2.3p5 gives a "special guarantee" for union members that
    > are structs, but does not extend a similar guarantee for other
    > member types.
    >
    > - 6.7.2.1p14 has the normative language for the first footnote
    > mentioned above: "The value of at most one of the members can be
    > stored in a union object at any time." Your `u' can hold `u.a'
    > or `u.b', but not both at once.
    >
    > Those are the citations I can find (if I've missed any I'm sure
    > others will point them out). Their cumulative impression on me is that
    > the matter is not settled beyond doubt, but the aforementioned angels
    > may see things differently.
    >
    > As a practical matter, it's not all that important what I think
    > or what the angels think, but what the providers of your compilers
    > think. If a compiler does something unfortunate with your code you will
    > find yourself retracing this same argument with implementors who are
    > trying to stamp NOT A BUG on your complaint. If the angels weigh in on
    > your side, the implementors of the offending compiler may eventually
    > accede and agree to ship a fix -- "In a forthcoming release," oh joy, oh
    > joy. I think you might choose better battles: Fight over things you
    > Really Really Need and are Really Solid Bugs, and don't waste troops
    > trying to subjugate the unpopulated hinterland.


    Thanks for the explanation, Eric.

    Does that mean the "It's Legal" brigade would say it's always legal to
    read an unsigned char from an union, whatever was previously stored in
    it, on the grounds that an unsigned char cannot contain a trap
    representation?
    Edward Rutherford, Dec 12, 2011
    #6
  7. Edward Rutherford

    ralph Guest

    On Sat, 10 Dec 2011 18:14:08 -0500, Eric Sosman
    <> wrote:

    >
    > As a practical matter, it's not all that important what I think
    >or what the angels think, but what the providers of your compilers
    >think. If a compiler does something unfortunate with your code you
    >will find yourself retracing this same argument with implementors
    >who are trying to stamp NOT A BUG on your complaint. If the angels
    >weigh in on your side, the implementors of the offending compiler
    >may eventually accede and agree to ship a fix -- "In a forthcoming
    >release," oh joy, oh joy. I think you might choose better battles:
    >Fight over things you Really Really Need and are Really Solid Bugs,
    >and don't waste troops trying to subjugate the unpopulated hinterland.


    Hear! Hear!

    Consider that quoted and stolen. <bg>

    -ralph
    ralph, Dec 12, 2011
    #7
  8. Edward Rutherford

    Jens Gustedt Guest

    On 12/12/2011 09:09 PM, Edward Rutherford wrote:

    > Thanks for the explanation, Eric.
    >
    > Does that mean the "It's Legal" brigade would say it's always legal to
    > read an unsigned char from an union, whatever was previously stored in
    > it, on the grounds that an unsigned char cannot contain a trap
    > representation?


    One thing is certain: the standard explicitly allows copying any object
    (with memcpy) into an array of `unsigned char`. This is even how the
    term "object representation" is introduced.

    So, first of all, this means that we are allowed to read all the bytes
    of a union. Second, it means that all bytes of the object representation
    can be interpreted as unsigned char.
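
    As a rough sketch of that point (the names `int_pair` and
    `read_b_via_bytes` are made up for this example), the bytes of a union
    member can be copied into an array of unsigned char and copied back out
    again. Copying an object representation byte for byte preserves the
    value, so the round trip returns what was stored.

```c
#include <string.h>

/* Hypothetical illustration type, not taken from the thread. */
union int_pair {
    int a;
    int b;
};

/* Store through a, view the bytes of b as unsigned char, then rebuild
   an int from those bytes. unsigned char has no trap representations,
   so inspecting the bytes is always permitted. */
int read_b_via_bytes(int value)
{
    union int_pair u;
    unsigned char bytes[sizeof u.b];
    int result;

    u.a = value;
    memcpy(bytes, &u.b, sizeof u.b);        /* object representation as bytes */
    memcpy(&result, bytes, sizeof result);  /* bytes back into an int */
    return result;
}
```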

    Jens
    Jens Gustedt, Dec 12, 2011
    #8
  9. Edward Rutherford

    Eric Sosman Guest

    On 12/12/2011 3:09 PM, Edward Rutherford wrote:
    > [...]
    > Does that mean the "It's Legal" brigade would say it's always legal to
    > read an unsigned char from an union, whatever was previously stored in
    > it, on the grounds that an unsigned char cannot contain a trap
    > representation?


    The varieties of `char' are something of a special case, because
    C has always had the notion that it's possible to inspect and maybe
    fiddle with the individual bytes of a multi-byte object. At your
    peril, of course, since you might invalidate the multi-byte thing.
    But still: Things like memcpy() are defined in terms of copying the
    individual bytes, and the copy of a valid object must itself be
    valid.

    The Standard tightens this just a trifle, by allowing the `char'
    flavors other than `unsigned' to have trap representations. Still,
    `unsigned char' remains as the "atom" of C memory: Its mapping between
    representations and values is one-to-one, which guarantees fidelity
    both in value and in representation when copying or comparing, and
    also guarantees that there are no trap representations.

    But back to the `union' issue: I'm still not 100% comfortable
    with the idea of writing to one member and reading another. It sort
    of looks like it should work, but I've not heard a watertight argument
    that it *must* work, even in the face of a ferociously aggressive
    optimizer. I think the "It's legal" faction have found arguments they
    deem satisfactory; perhaps they've looked more diligently than I have.

    Down to nuts and bolts: Is this a theoretical question, or do you
    have an actual use case in mind? If the latter, could you describe it?
    Maybe someone will be able to say "Well, in *that* case it works" or
    "If you did it *this other* way you wouldn't care."

    --
    Eric Sosman
    Eric Sosman, Dec 13, 2011
    #9
  10. Edward Rutherford

    Jens Gustedt Guest

    Hello,

    On 12/14/2011 12:10 AM, christian.bau wrote:
    > On Dec 12, 7:09 pm, Jens Gustedt <> wrote:
    >
    >> If I remember correctly the aliasing rules state that the compiler is
    >> allowed to assume that a and b (insided the function) point to different
    >> objects because they are of different types. Thus in the second
    >> assignment to *a the compiler can assume that *a is still 3 and store 5
    >> in place.

    >
    > You are right. On the other hand, footnote 82 says:
    >
    > "If the member used to access the contents of a union object is not
    > the same as the member last used to store a value in the object, the
    > appropriate part of the object representation of the value is
    > reinterpreted as an object representation in the new type as described
    > in 6.2.6 (a process sometimes called "type punning"). "
    >
    > Which is a direct contradiction. I am assuming that the rules for
    > union members apply in the same way whether the compiler knows that it
    > is accessing different members of the same union or not.


    I think this assumption can't be made. Generally, inside a function the
    compiler has no way to know that the pointers originate from the same
    object. On the contrary, the aliasing rules were invented to ensure that
    under the given circumstances they *must* point to different objects.

    And these things happen. gcc assumes (or at least there has been some
    version of gcc) that they are different, even if the function is inlined
    and it could deduce that both point to the same address.

    (Also, footnotes in the standard are not normative)

    Jens
    Jens Gustedt, Dec 13, 2011
    #10
  11. "christian.bau" <> writes:
    > On Dec 13, 12:48 am, Eric Sosman <> wrote:
    >>      Down to nuts and bolts: Is this a theoretical question, or do you
    >> have an actual use case in mind?  If the latter, could you describe it?
    >> Maybe someone will be able to say "Well, in *that* case it works" or
    >> "If you did it *this other* way you wouldn't care."

    >
    > I bet more than one person has tried to read the representation of a
    > float or double as a 32 or 64 bit integer. Last time I tried, I found
    > one way that worked on one compiler and failed on another, and another
    > way that worked on the other compiler and failed at the first (one
    > method was using a union, one was casting the address of a float to
    > "pointer to unsigned int"), but I couldn't find any code that worked
    > on both compilers. And having code with an #ifdef checking the
    > compiler that is used doesn't really inspire confidence in the code
    > :-(


    So use memcpy(). (I suppose that's not strictly portable to
    freestanding implementations, but I'd expect memcpy() to be one of the
    things that most freestanding implementations actually provide.)
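
    A minimal sketch of the memcpy() approach, under the assumptions that
    uint64_t exists and that double is 64 bits wide (checked at run time
    with assert). The function names `double_bits` and `bits_to_double` are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a double into a uint64_t.
   Assumes double and uint64_t have the same size. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* The inverse operation: rebuild a double from its stored bits. */
double bits_to_double(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

    On an implementation that uses the IEEE 754 binary64 format,
    `double_bits(1.0)` yields 0x3FF0000000000000; on other formats the bit
    pattern differs, but the round trip through `bits_to_double` still
    returns the original value.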

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Dec 13, 2011
    #11
  12. Edward Rutherford

    James Kuyper Guest

    On 12/13/2011 06:17 PM, christian.bau wrote:
    ....
    > I bet more than one person has tried to read the representation of a
    > float or double as a 32 or 64 bit integer. Last time I tried, I found
    > one way that worked on one compiler and failed on another, and another
    > way that worked on the other compiler and failed at the first (one
    > method was using a union, one was casting the address of a float to
    > "pointer to unsigned int"), but I couldn't find any code that worked
    > on both compilers.


    Try reading it using unsigned char; if that doesn't work (for
    appropriate values of "work"), the implementation is non-conforming.
    James Kuyper, Dec 13, 2011
    #12
  13. Edward Rutherford

    Noob Guest

    christian.bau wrote:

    > On Dec 13, 12:48 am, Eric Sosman wrote:
    >
    >> Down to nuts and bolts: Is this a theoretical question, or do you
    >> have an actual use case in mind? If the latter, could you describe it?
    >> Maybe someone will be able to say "Well, in *that* case it works" or
    >> "If you did it *this other* way you wouldn't care."

    >
    > I bet more than one person has tried to read the representation of a
    > float or double as a 32 or 64 bit integer. Last time I tried, I found
    > one way that worked on one compiler and failed on another, and another
    > way that worked on the other compiler and failed at the first (one
    > method was using a union, one was casting the address of a float to
    > "pointer to unsigned int"), but I couldn't find any code that worked
    > on both compilers. And having code with an #ifdef checking the
    > compiler that is used doesn't really inspire confidence in the code :-(


    Doesn't the following work for you?

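    #include <stdlib.h>
    #include <string.h>

    /* Returns a malloc'ed copy of the bytes of d, or NULL on failure.
       The caller is responsible for calling free() on the result. */
    unsigned char *foo(double d)
    {
        size_t n = sizeof d;
        unsigned char *buf = malloc(n);
        if (buf) memcpy(buf, &d, n);
        return buf;
    }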
    Noob, Dec 14, 2011
    #13
  14. Edward Rutherford

    Tim Rentsch Guest

    "christian.bau" <> writes:

    > On Dec 13, 12:48 am, Eric Sosman <> wrote:
    >
    >> Down to nuts and bolts: Is this a theoretical question, or do you
    >> have an actual use case in mind? If the latter, could you describe it?
    >> Maybe someone will be able to say "Well, in *that* case it works" or
    >> "If you did it *this other* way you wouldn't care."

    >
    > I bet more than one person has tried to read the representation of a
    > float or double as a 32 or 64 bit integer. Last time I tried, I found
    > one way that worked on one compiler and failed on another, and another
    > way that worked on the other compiler and failed at the first (one
    > method was using a union, one was casting the address of a float to
    > "pointer to unsigned int"), but I couldn't find any code that worked
    > on both compilers. [snip]


    The straightforward method using a union is required to work.
    More specifically, this (assuming uint64_t exists and double
    is 64 bits):

    double d = ...something...;
    union { double d; uint64_t u64; } u = { d };
    uint64_t u64 = u.u64;

    If it doesn't, then that compiler is not conforming.
    Tim Rentsch, Jan 25, 2012
    #14
  15. Edward Rutherford

    Tim Rentsch Guest

    Barry Schwarz <> writes:

    > On Sat, 10 Dec 2011 22:06:08 +0000 (UTC), Edward Rutherford
    > <> wrote:
    >
    >>Hello :
    >>
    >>Is the following code an undefined behavior?
    >>
    >>
    >> union {
    >> int a;
    >> int b;
    >> } u;
    >> u.a = 3;
    >> printf("%d\n", u.b);

    >
    > In C89, paragraph 3.3.2.3 states "With one exception, if a member of a
    > union object is accessed after a value has been stored in a different
    > member of the object, the behavior is implementation-defined." The
    > exception referred to is not related to your example. So the answer
    > to your question is: yes if the implementation says it is and no if
    > the implementation says something else.
    >
    > In C99, the reference to implementation defined is removed.
    > Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
    > member of an object of union type, the bytes of the object
    > representation that do not correspond to that member but do correspond
    > to other members take unspecified values." Since a and b occupy the
    > same bytes, none of those byte become unspecified. And footnote 82
    > indicates the intended behavior is for the bits of b to
    > "reinterpreted" for the type of b. Since both a and b have the same
    > type, it seems to me the intention is to retrieve the same value.


    I agree with your analysis, but just wanted to add one
    item. Practically speaking, the behavior under C89/C90
    and C99 is likely to be the same. This idea is also
    supported by DR 283 (which is what prompted adding the
    footnote), which makes it clear that the intended
    semantics in the two cases is meant to be the same.
    Tim Rentsch, Jan 25, 2012
    #15
  16. Edward Rutherford

    Tim Rentsch Guest

    "christian.bau" <> writes:

    > On Dec 11, 6:04 am, Barry Schwarz <> wrote:
    >> In C99, the reference to implementation defined is removed.
    >> Furthermore, paragraph 6.2.6.1-7 states "When a value is stored in a
    >> member of an object of union type, the bytes of the object
    >> representation that do not correspond to that member but do correspond
    >> to other members take unspecified values." Since a and b occupy the
    >> same bytes, none of those byte become unspecified. And footnote 82
    >> indicates the intended behavior is for the bits of b to
    >> "reinterpreted" for the type of b. Since both a and b have the same
    >> type, it seems to me the intention is to retrieve the same value.

    >
    > You are right, but that seems to have some awful consequences. Take
    > this code:
    >
    > union {
    > int a;
    > long b;
    > } u;
    > u.a = 3;
    > printf("%ld\n", u.b);
    >
    > So on an implementation where int and long have the same size and
    > representation, this code would be well-defined and print "3"?


    Yes.


    > Now take this code:
    >
    > void f (int* a, long* b) { *a = 3; *b = 4; *a = *a + 2; }
    >
    > If I call f (&u.a, &u.b) is this required to set both to 6?


    No. The semantics of f() are different from the earlier
    example because of some subtleties in effective type rules.
    In fact that makes f() have undefined behavior for the
    particular call mentioned.


    > And since the compiler doesn't know that I'm going to make this call,
    > lots of optimization goes out of the window?


    No, the optimizations are still okay, because of
    how effective type rules work.
    Tim Rentsch, Jan 25, 2012
    #16
  17. Edward Rutherford

    Tim Rentsch Guest

    "christian.bau" <> writes:

    > On Dec 12, 7:09 pm, Jens Gustedt <> wrote:
    >
    >> If I remember correctly the aliasing rules state that the compiler is
    >> allowed to assume that a and b (insided the function) point to different
    >> objects because they are of different types. Thus in the second
    >> assignment to *a the compiler can assume that *a is still 3 and store 5
    >> in place.

    >
    > You are right. On the other hand, footnote 82 says:
    >
    > "If the member used to access the contents of a union object is not
    > the same as the member last used to store a value in the object, the
    > appropriate part of the object representation of the value is
    > reinterpreted as an object representation in the new type as described
    > in 6.2.6 (a process sometimes called "type punning"). "
    >
    > Which is a direct contradiction. I am assuming that the rules for
    > union members apply in the same way whether the compiler knows that it
    > is accessing different members of the same union or not.


    It isn't a contradiction, because how the objects are
    accessed is different in the two cases. When a member
    is accessed (i.e., using '.' or '->') the effective type
    is determined by the declared type of the member.
    When an object is accessed through a pointer, there
    is no declared type, so the rule for what the effective
    type is or must be is different.
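
    For contrast, here is a sketch of the same-type counterpart of f(),
    which is the situation in the original question. This is an
    illustration under my own assumptions, not code from the thread, and
    the names `pair`, `f_same_type` and `call_f_on_union` are hypothetical.
    Because both pointers have type int *, the compiler may not assume they
    point to different objects, and every access uses an lvalue of type int
    on storage whose members are declared int.

```c
/* Hypothetical illustration type: two members of the same type. */
union pair {
    int a;
    int b;
};

/* Same shape as f() in the quoted post, but both parameters are int *.
   Pointers of the same type are allowed to alias, so the store through
   b must be visible to the following read through a. */
void f_same_type(int *a, int *b)
{
    *a = 3;
    *b = 4;
    *a = *a + 2;
}

/* Pass the addresses of both members; they designate the same bytes. */
int call_f_on_union(void)
{
    union pair u;
    f_same_type(&u.a, &u.b);
    return u.a;
}
```

    Here `call_f_on_union()` returns 6, since the store of 4 through `b`
    is what `*a + 2` reads. With int * and long * parameters, as in the
    quoted f(), that guarantee does not hold.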
    Tim Rentsch, Jan 25, 2012
    #17
  18. Edward Rutherford

    Tim Rentsch Guest

    Eric Sosman <> writes:

    > On 12/10/2011 5:06 PM, Edward Rutherford wrote:
    >> Hello :
    >>
    >> Is the following code an undefined behavior?
    >>
    >>
    >> union {
    >> int a;
    >> int b;
    >> } u;
    >> u.a = 3;
    >> printf("%d\n", u.b);

    >
    > [snip]
    >
    > The "write one member, read another" question has been discussed
    > more than once, and my impression of the debates is that there have
    > been two camps: Not "It's legal" and "It's illegal," but "It's legal"
    > and "You'll probably get away with it, but it might not be squeaky-
    > clean, and my head hurts can we talk about something else, please?"
    > (I'm in the latter camp.)


    Let's see if we can get you over into that other camp. :)

    > It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
    > represent `3', and that `u.b' thereby receives the same bytes. No
    > argument there: The storage allocated to `u.b' holds a representation
    > of `3'.
    >
    > The part that makes my head ache is figuring out whether the
    > compiler is required to "notice" that storing to `u.a' affects the
    > value of `u.b'. If the compiler has already loaded `u.b' into a
    > register, say, is it required to re-fetch because `u.a' was changed?
    > Is the compiler allowed to consider `u.b' uninitialized because it
    > has never been stored to, despite the store to `u.a'?


    The case in question is quite straightforward, because the two
    members must occupy the same bytes (on every implementation)
    and also have the same type. Hence the accesses do not violate
    the effective type rules, and must proceed as described by the
    semantics.

    The semantics in this case are defined principally by 6.2.5 p20
    and 6.3.2.1 p2. There is also the question of how the two
    objects line up relative to one another, but that follows by
    virtue of unions not having any padding before any members. (I'm
    sure interested parties can find the appropriate references.)
    These paragraphs are pretty simple to read; I don't see any
    room for uncertainty. Since the accesses in this case clearly
    do not violate the effective type rules, the behavior is
    correspondingly well-defined.

    To respond to your other points:

    > To those in the "It's legal" camp, I offer a few puzzling and
    > possibly disturbing points:
    >
    > - The footnote to 6.2.5p21 points out that "an object with union
    > type can only contain one member at a time" -- meaning that if
    > `u' contains `u.a', it does not contain `u.b'. Footnotes, of
    > course, are suggestive but non-normative.


    This comment is made in the context of defining the term
    "aggregate type". Clearly a union is not an aggregate type
    because it cannot hold two (or more) independent values. I don't
    think there's any mystery about that.

    > - The footnote to 6.5.2.3p3 supports the "It's legal" camp by
    > describing the mechanism of type punning. Footnotes, of course,
    > are suggestive but non-normative.


    And the comment in the footnote is supported by normative text,
    as noted above.

    > - 6.5.2.3p5 gives a "special guarantee" for union members that
    > are structs, but does not extend a similar guarantee for other
    > member types.


    It does, but notice that the guarantee made here is stronger
    than just other member access. Under this passage we are
    allowed to access struct members inside a union object _even
    though no mention is made of a union at the point of access_.
    It's a special guarantee because it's a stronger guarantee
    than holds for other union member types.

    > - 6.7.2.1p14 has the normative language for the first footnote
    > mentioned above: "The value of at most one of the members can be
    > stored in a union object at any time." Your `u' can hold `u.a'
    > or `u.b', but not both at once.


    What it says is that at most one member can be _stored_ at any
    one time. That is obviously true since storing into another member
    will eradicate the effects of the first store. The union can't
    hold two independent values, but it does hold the object referred
    to by u.b, and that happens to be the same object as the one
    referred to by u.a. Again, I don't think there's any mystery
    here -- all that's being described is the destructive effects
    of a member store on previous stores, in much the same way
    that the effects of 'i = 3;' are wiped out by a subsequent 'i = 4;'.
    It isn't talking about read access, just stores.


    > Those are the citations I can find (if I've missed any I'm sure
    > others will point them out).


    I've looked fairly carefully, and didn't find any others.

    > Their cumulative impression on me is
    > that the matter is not settled beyond doubt, but the aforementioned
    > angels may see things differently.


    Hopefully you're a little closer now to seeing the light. :)


    > As a practical matter, it's not all that important what I think
    > or what the angels think, but what the providers of your compilers
    > think. [snip]


    There always are practical considerations dealing with any C
    language question on any compiler. My preference is to disentangle
    the two sets of considerations, and work to understand one without
    confusing myself thinking about the other. Then, having a thoroughly
    considered understanding of questions in one area, that normally
    helps make a more informed decision as regards the larger issues.
    And I think that is a good course here.
    Tim Rentsch, Jan 25, 2012
    #18
  19. Edward Rutherford

    Tim Rentsch Guest

    Eric Sosman <> writes:

    > [snip]
    >
    > But back to the `union' issue: I'm still not 100% comfortable
    > with the idea of writing to one member and reading another. It sort
    > of looks like it should work, but I've not heard a watertight argument
    > that it *must* work, even in the face of a ferociously aggressive
    > optimizer. I think the "It's legal" faction have found arguments they
    > deem satisfactory; perhaps they've looked more diligently than I have.


    Questions about optimization are complicated because the rules
    regarding effective types (obviously pertinent to optimization)
    are subtle. However, the simple cases are not subtle. If we
    consider a case like this:

    double d;
    union { double d; uint64_t u64; } u;
    uint64_t u64_bits_of_double;

    d = ... some value ... ;
    u.d = d;
    u64_bits_of_double = u.u64;

    the effective type considerations are quite straightforward,
    because all the accesses involved are done using declared types.
    There is no doubt that the accesses here meet the requirements of
    the effective type rules; so any optimizations, no matter how
    aggressive, must be faithful to the defined semantics. (The
    example of course assumes that double is 64 bits and uint64_t
    is defined.)
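
    The fragment above can be wrapped into a complete function roughly as
    follows. This is a sketch under the same assumptions (uint64_t is
    defined and double is 64 bits, checked here with a run-time assert);
    the function name `double_bits_via_union` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read the bits of a double through a union member of type uint64_t.
   Every access names a declared member of the union object itself,
   so no pointer to a differently typed object is involved. */
uint64_t double_bits_via_union(double d)
{
    union { double d; uint64_t u64; } u;
    assert(sizeof u.d == sizeof u.u64);
    u.d = d;
    return u.u64;
}
```

    On an implementation that uses the IEEE 754 binary64 format,
    `double_bits_via_union(1.0)` yields 0x3FF0000000000000.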
    Tim Rentsch, Jan 25, 2012
    #19