Using a union to get different representation

Discussion in 'C Programming' started by Clint Olsen, Jul 7, 2005.

  1. Clint Olsen

    Clint Olsen Guest

    I was wondering if it's considered undefined behavior to use a member of a
    union when it wasn't initialized with that member. Example:

    typedef unsigned long hval_t;

    hval_t hval_init(void)
    {
    union hval_init_u { double dbl; hval_t hval; };
    union hval_init_u phi = { (sqrt(5) + 1) / 2 }; /* golden ratio */

    return phi.hval;
    }

    I saw on a thread here quite some time ago that it's reasonable to use a
    char * to index into a double value and copy the bytes to an unsigned long,
    but it seems much quicker and easier to do it this way.

    I was wanting to use this as my default bit pattern for a hash function.
    It doesn't have to be portable. It just needs to be somewhat random, and
    this seemed like a reasonable way.

    -Clint
    Clint Olsen, Jul 7, 2005
    #1
    1. Advertising

  2. Clint Olsen

    Me Guest

    > I was wondering if it's considered undefined behavior to use a member of a
    > union when it wasn't initialized with that member. Example:
    >
    > typedef unsigned long hval_t;
    >
    > hval_t hval_init(void)
    > {
    > union hval_init_u { double dbl; hval_t hval; };
    > union hval_init_u phi = { (sqrt(5) + 1) / 2 }; /* golden ratio */
    >
    > return phi.hval;
    > }


    I can't seem to find it in the C standard but here is text from the C++
    standard: "In a union, at most one of the data members can be active at
    any time, that is, the value of at most one of the data members can be
    stored in a union at any time. [Note: one special guarantee is made in
    order to simplify the use of unions: If a POD-union contains several
    POD-structs that share a common initial sequence (9.2), and if an
    object of this POD-union type contains one of the POD-structs, it is
    permitted to inspect the common initial sequence of any of POD-struct
    members; see 9.2. ]"

    The C99 standard has almost exactly the same the initial common
    sequence wording but not the active member wording, which is weird
    because it is related to it. I'm taking the C++ wording as the correct
    one, so I would say the above is undefined but works in practice due to
    people abusing it so much.

    > I saw on a thread here quite some time ago that it's reasonable to use a
    > char * to index into a double value and copy the bytes to an unsigned long,
    > but it seems much quicker and easier to do it this way.


    Don't be afraid to use memcpy. Most compilers implement it as an
    intrinsic so it should generate the most efficient code (except one
    that actually is guaranteed to work).
    Me, Jul 7, 2005
    #2
    1. Advertising

  3. Clint Olsen

    Clint Olsen Guest

    On 2005-07-07, Me <> wrote:
    > I can't seem to find it in the C standard but here is text from the C++
    > standard: "In a union, at most one of the data members can be active at
    > any time, that is, the value of at most one of the data members can be
    > stored in a union at any time. [Note: one special guarantee is made in
    > order to simplify the use of unions: If a POD-union contains several
    > POD-structs that share a common initial sequence (9.2), and if an object
    > of this POD-union type contains one of the POD-structs, it is permitted
    > to inspect the common initial sequence of any of POD-struct members; see
    > 9.2. ]"
    >
    > The C99 standard has almost exactly the same the initial common sequence
    > wording but not the active member wording, which is weird because it is
    > related to it. I'm taking the C++ wording as the correct one, so I would
    > say the above is undefined but works in practice due to people abusing it
    > so much.
    >
    > Don't be afraid to use memcpy. Most compilers implement it as an
    > intrinsic so it should generate the most efficient code (except one
    > that actually is guaranteed to work).


    I had considered that, but since it was short(er) to implement the bounds
    checking for both the unsigned long and double in the for loop directly, I
    just copied it manually. Ok, so I was too lazy to write a MIN macro :)
    It's just very convenient with the union since it handles any length
    differences for me.

    I was thinking that perhaps writing and reading from different members
    could cause a trap representation for the read value? It certainly
    couldn't cause alignment problems as the union guarantees this shouldn't be
    a problem.

    Thanks for looking into this.

    -Clint
    Clint Olsen, Jul 7, 2005
    #3
  4. Clint Olsen

    Me Guest

    > I was thinking that perhaps writing and reading from different members
    > could cause a trap representation for the read value? It certainly
    > couldn't cause alignment problems as the union guarantees this shouldn't be
    > a problem.


    No combination of value bits is allowed to be a trap representation on
    an unsigned integer but combinations of padding bits are (and the only
    thing guaranteed not to have padding bits is unsigned char). In your
    original post, you said it didn't have to be portable and I assumed you
    knew the format of doubles and unsigned longs on your implementation.
    Me, Jul 7, 2005
    #4
  5. Clint Olsen

    Clint Olsen Guest

    On 2005-07-07, Me <> wrote:
    > No combination of value bits is allowed to be a trap representation on an
    > unsigned integer but combinations of padding bits are (and the only thing
    > guaranteed not to have padding bits is unsigned char). In your original
    > post, you said it didn't have to be portable and I assumed you knew the
    > format of doubles and unsigned longs on your implementation.


    Actually, I don't know the format, but as you mentioned that isn't a
    consequence for me since I don't care. The bits will be 'random' enough to
    get the job done.

    On page 52 (6.2.6.1) of the standard, it says the following (7):

    "When a value is stored in a member of an object union type, the bytes of
    the object representation that do not correspond to that member but do
    correspond to other members take unspecified values, but the value of the
    union object shall not thereby become a trap representation."

    So, it would seem that this is also not an issue.

    -Clint
    Clint Olsen, Jul 7, 2005
    #5
  6. Clint Olsen

    Richard Bos Guest

    "Me" <> wrote:

    [ Please do not snip attribution lines of people whom you still quote. ]

    > > I was wondering if it's considered undefined behavior to use a member of a
    > > union when it wasn't initialized with that member. Example:


    > I can't seem to find it in the C standard but here is text from the C++
    > standard:


    The C++ Standard has no real bearing on what is and is not allowed in C,
    since the two are different languages; C++ was derived from C, but the
    type system is one of the areas where they differ, so I wouldn't trust
    the C++ Standard in this regard even if it does appear to agree with C.

    > "In a union, at most one of the data members can be active at
    > any time, that is, the value of at most one of the data members can be
    > stored in a union at any time. [Note: one special guarantee is made in
    > order to simplify the use of unions: If a POD-union


    For example, what is a POD-union, and how does it differ from a normal
    union? In C, there is no such thing.

    However, n1124 (the public draft of C99 + TC2) does contain this, in
    6.2.6.1#7:

    # When a value is stored in a member of an object of union type, the
    # bytes of the object representation that do not correspond to that
    # member but do correspond to other members take unspecified values.

    (I note also that the wording of 6.2.6.1#6 and 7 is, IMO, much improved
    over their counterparts in n869, but that the implications for the OP
    remain the same.)

    Richard
    Richard Bos, Jul 7, 2005
    #6
  7. On Wed, 06 Jul 2005 19:33:04 -0500, Clint Olsen wrote:

    > I was wondering if it's considered undefined behavior to use a member of a
    > union when it wasn't initialized with that member. Example:


    The state of definedness is a bit muddy but it is certainly not portable.

    > typedef unsigned long hval_t;
    >
    > hval_t hval_init(void)
    > {
    > union hval_init_u { double dbl; hval_t hval; };
    > union hval_init_u phi = { (sqrt(5) + 1) / 2 }; /* golden ratio */
    >
    > return phi.hval;
    > }
    >
    > I saw on a thread here quite some time ago that it's reasonable to use a
    > char * to index into a double value and copy the bytes to an unsigned long,


    That needs to be unsigned char *, plain char doesn't have adequate
    properties for this sort of copying.

    > but it seems much quicker and easier to do it this way.
    >
    > I was wanting to use this as my default bit pattern for a hash function.


    You need to be very careful here, hashing floating point values doesn't
    usually make a lot of sense because they are approximations to start with.
    Hashing is strongly associated with equality and where equality testing
    doesn't make sense hashing doesn't either. And in some cases where
    equality testing does make sense (e.g. as part of ordering ordering, e.g.
    a sort comparison) hashing isn't applicable.

    So what are you doing that requires the hashing of a floating point value?

    > It doesn't have to be portable. It just needs to be somewhat random, and
    > this seemed like a reasonable way.


    The normal requirement of a hash value is the equal input values hash to
    the same result (hence the relationship with equality). Is that what you
    are looking for?

    If so consider that it is possible for a floating point representation to
    have padding bits that don't contribute to the value. It is also possible
    to have things like +0 and -0 i.e. different representations of values
    that compare equal. IOW two equal values may not hash to the same result.

    I guess you could follow the Java bodginess by creating a floating point
    equality test method that is consistent with this method of hashing.

    Lawrence
    Lawrence Kirby, Jul 7, 2005
    #7
  8. On Thu, 07 Jul 2005 12:27:47 +0100, Lawrence Kirby wrote:

    ....

    > I guess you could follow the Java bodginess by creating a floating point
    > equality test method that is consistent with this method of hashing.


    To be fair to Java the property that NaNs always compare unequal means
    that normal comparison semantics are not viable for hashing. But this is
    also creates another potential problem here.

    Lawrence
    Lawrence Kirby, Jul 7, 2005
    #8
  9. Clint Olsen

    Clint Olsen Guest

    On 2005-07-07, Lawrence Kirby <> wrote:
    > That needs to be unsigned char *, plain char doesn't have adequate
    > properties for this sort of copying.


    Oh yes, sorry. I did see it was unsigned char *.

    > You need to be very careful here, hashing floating point values doesn't
    > usually make a lot of sense because they are approximations to start
    > with. Hashing is strongly associated with equality and where equality
    > testing doesn't make sense hashing doesn't either. And in some cases
    > where equality testing does make sense (e.g. as part of ordering
    > ordering, e.g. a sort comparison) hashing isn't applicable.
    >
    > So what are you doing that requires the hashing of a floating point
    > value?


    I'm actually not interested in hashing a floating point value. This is
    just my initial value in the unsigned long accumulator in the event that no
    initialization is provided to the function. It's just an arbitrary value.
    It just loads an unsigned long with as many bits as are applicable. I
    don't have to write some elaborate function to seed the accumulator with a
    value.

    If you notice, the floating point calculation is the golden ratio.
    Calculating the integer representation of the golden ratio is otherwise a
    somewhat tedious task. Generally only one bit at a time can be calculated.

    Unfortunately, there isn't a standard library function that allows you to
    extract (as an unsigned long) the mantissa of a floating point value. That
    would be better, but I'm willing to settle for something a bit more crude
    especially if it can be coded in one line and doesn't cause my hard drive
    to be formatted.

    > If so consider that it is possible for a floating point representation to
    > have padding bits that don't contribute to the value. It is also possible
    > to have things like +0 and -0 i.e. different representations of values
    > that compare equal. IOW two equal values may not hash to the same result.


    Yeah, the location of the mantissa and what-not are totally out of my
    control as this is platform dependent. But as I said this is not critical.

    -Clint
    Clint Olsen, Jul 7, 2005
    #9
  10. Clint Olsen

    Me Guest

    Clint Olsen wrote:
    > On 2005-07-07, Me <> wrote:
    > > I can't seem to find it in the C standard but here is text from the C++
    > > standard: "In a union, at most one of the data members can be active at
    > > any time, that is, the value of at most one of the data members can be
    > > stored in a union at any time. [Note: one special guarantee is made in
    > > order to simplify the use of unions: If a POD-union contains several
    > > POD-structs that share a common initial sequence (9.2), and if an object
    > > of this POD-union type contains one of the POD-structs, it is permitted
    > > to inspect the common initial sequence of any of POD-struct members; see
    > > 9.2. ]"
    > >
    > > The C99 standard has almost exactly the same the initial common sequence
    > > wording but not the active member wording, which is weird because it is
    > > related to it. I'm taking the C++ wording as the correct one, so I would
    > > say the above is undefined but works in practice due to people abusing it
    > > so much.
    > >
    > > Don't be afraid to use memcpy. Most compilers implement it as an
    > > intrinsic so it should generate the most efficient code (except one
    > > that actually is guaranteed to work).

    >
    > I had considered that, but since it was short(er) to implement the bounds
    > checking for both the unsigned long and double in the for loop directly, I
    > just copied it manually. Ok, so I was too lazy to write a MIN macro :)
    > It's just very convenient with the union since it handles any length
    > differences for me.
    >
    > I was thinking that perhaps writing and reading from different members
    > could cause a trap representation for the read value? It certainly
    > couldn't cause alignment problems as the union guarantees this shouldn't be
    > a problem.


    Ok, it appears that C99 doesn't have the "active member" wording that
    the C++ standard does. I did a bit more detailed searching because this
    troubles me and there was some discussion to make union member access
    apply to the definition of "effective type" (i.e. you don't need
    "active member" wording as "effective type" would take care of it),
    which makes sense because the examples you can come up for aliasing
    violations apply to both cases. I don't know what ever came from that
    discussion but I can't find the relevant sections updated in the TC2
    draft (most likely because nobody can agree on the correct wording
    yet). But I definitely would be siding with the "active member" wording
    from the C++ standard on this one because it seems like the right thing
    to do and I have a strong feeling that wording will be made to agree in
    a future C standard once somebody comes up with an agreed upon way to
    express it. For more information about this, have a look at:

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_219.htm
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm
    search for "alias" and continue reading from that point:
    http://www.open-std.org/JTC1/SC22/WG14/www/docs/n973.txt
    http://www.open-std.org/JTC1/SC22/WG14/www/docs/n987.txt
    Me, Jul 7, 2005
    #10
  11. Clint Olsen

    Chris Torek Guest

    In article <slrndcou20.5jh.clint@belle.0lsen.net>
    Clint Olsen <> wrote:
    >I was wondering if it's considered undefined behavior to use a member of a
    >union when it wasn't initialized with that member. ...


    Yes, if the member you use is not "unsigned char" (or an array thereof).

    >Example:
    >
    >typedef unsigned long hval_t;
    >
    >hval_t hval_init(void)
    >{
    > union hval_init_u { double dbl; hval_t hval; };
    > union hval_init_u phi = { (sqrt(5) + 1) / 2 }; /* golden ratio */
    >
    > return phi.hval;
    >}


    See the thread titled "aliasing/Torek's strtod" (or something along
    those lines). I ran into a problem using a version of gcc (2.x I
    think but maybe 3.x) on the i386 with the strtod() function in
    BSD/OS (which was a copy of the one written, I think, by David Gay
    and available for anyone to use under reasonably flexible limits).
    This code used a union similar to the above, except that the second
    member was an array of size 2, so as to exactly overlay all eight
    bytes of the (IEEE) double. In other words:

    union { double x; int y[2]; } var;

    The compiler "knew" that since var.x and var.y could not both
    be active simultaneously, it was OK to leave var.x in the CPU's
    floating-point registers and manipulate var.y in memory. Thus,
    assignments to var.x did not affect var.y, and vice versa.

    If your C compiler is similarly clever, it might compute the
    golden ratio in a floating-point register, not store that to
    memory, and then return the not-stored (i.e., junk) value out
    of the memory location.

    >I was wanting to use this as my default bit pattern for a hash function.
    >It doesn't have to be portable. It just needs to be somewhat random, and
    >this seemed like a reasonable way.


    It may be entirely random (or seem that way), in that hval_init()
    may return a different value depending on what is in memory at the
    time, instead of returning the same value on each call.

    I might suggest the following variant:


    hval_t hval_init(void) {
    static int beenhere;
    static hval_t phi_bits;
    #define min(x,y) ((x) < (y) ? (x) : (y))

    if (!beenhere) {
    union { double d; unsigned char bytes[sizeof(double)]; } u;

    /*
    * Initialize the "default value" to a bitwise representation
    * of phi, the golden ratio, as stored in a double. Use
    * however many bits fit in an hval_t (discarding those
    * that do not fit; we ignore endianness so we get different
    * answers on different machines but consistent answers within
    * any given run of this code on one machine).
    *
    * If hval_t has more bits than occur in a double, the extra
    * bits will be all-zeros.
    */
    u.d = (sqrt(5.0) + 1.0) / 2.0;
    memcpy(&phi_bits, u.bytes, min(sizeof phi_bits, sizeof u.d));
    }
    return phi_bits;
    #undef min
    }

    This could still potentially copy a trap representation into
    phi_bits, but will actually work on real machines, including the
    x86.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Jul 8, 2005
    #11
  12. Clint Olsen

    Chris Torek Guest

    In article <> I wrote, in part:
    > hval_t hval_init(void) {
    > static int beenhere;
    > static hval_t phi_bits;
    > #define min(x,y) ((x) < (y) ? (x) : (y))
    >
    > if (!beenhere) {
    > union { double d; unsigned char bytes[sizeof(double)]; } u;

    [comment snippage]
    > u.d = (sqrt(5.0) + 1.0) / 2.0;
    > memcpy(&phi_bits, u.bytes, min(sizeof phi_bits, sizeof u.d));
    > }
    > return phi_bits;
    > #undef min
    > }


    Of course, inside the test for "!beenhere", one should also set
    beenhere:

    if (!beenhere) {
    ... as before ...
    beenhere = 1;
    }

    since the point is to avoid re-computing phi() on each call.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Jul 8, 2005
    #12
  13. Clint Olsen

    Clint Olsen Guest

    On 2005-07-08, Chris Torek <> wrote:
    > union { double x; int y[2]; } var;
    >
    > The compiler "knew" that since var.x and var.y could not both be
    > active simultaneously, it was OK to leave var.x in the CPU's
    > floating-point registers and manipulate var.y in memory. Thus,
    > assignments to var.x did not affect var.y, and vice versa.


    Since a union is defined as an overlapping set of member objects, would you
    consider this a conforming compiler? I suppose I could play around with
    the optimizer settings and see if gcc tries to do this.

    > If your C compiler is similarly clever, it might compute the
    > golden ratio in a floating-point register, not store that to
    > memory, and then return the not-stored (i.e., junk) value out
    > of the memory location.
    >
    > hval_t hval_init(void) {
    > static int beenhere;
    > static hval_t phi_bits;
    > #define min(x,y) ((x) < (y) ? (x) : (y))
    >
    > if (!beenhere) {
    > union { double d; unsigned char bytes[sizeof(double)]; } u;
    >
    > /*
    > * Initialize the "default value" to a bitwise representation
    > * of phi, the golden ratio, as stored in a double. Use
    > * however many bits fit in an hval_t (discarding those
    > * that do not fit; we ignore endianness so we get different
    > * answers on different machines but consistent answers within
    > * any given run of this code on one machine).
    > *
    > * If hval_t has more bits than occur in a double, the extra
    > * bits will be all-zeros.
    > */
    > u.d = (sqrt(5.0) + 1.0) / 2.0;
    > memcpy(&phi_bits, u.bytes, min(sizeof phi_bits, sizeof u.d));
    > }
    > return phi_bits;
    > #undef min
    > }


    So, if we're going to resort to just using memcpy, why bother using the
    union at all?

    hval_t phi_bits;
    double d = (sqrt(5.0) + 1.0) / 2.0;

    memcpy(&phi_bits, &d, min(sizeof(hval_t), sizeof(double)));

    > This could still potentially copy a trap representation into phi_bits,
    > but will actually work on real machines, including the x86.


    I mentioned something about trap representations and unions in another post
    in this thread. I'm curious if I'm interpreting that correctly.

    Thanks,

    -Clint
    Clint Olsen, Jul 8, 2005
    #13
  14. Clint Olsen

    Chris Torek Guest

    >On 2005-07-08, Chris Torek <> wrote:
    >> union { double x; int y[2]; } var;
    >>
    >> The compiler "knew" that since var.x and var.y could not both be
    >> active simultaneously, it was OK to leave var.x in the CPU's
    >> floating-point registers and manipulate var.y in memory. Thus,
    >> assignments to var.x did not affect var.y, and vice versa.


    In article <slrndcsamj.14u.clint@belle.0lsen.net>
    Clint Olsen <> wrote:
    >Since a union is defined as an overlapping set of member objects,
    >would you consider this a conforming compiler?


    I believe so. The C standard only guarantees what you want (and
    in this case, what I wanted, if only to avoid changing the code)
    for "unsigned char" union members.

    >So, if we're going to resort to just using memcpy, why bother using the
    >union at all?
    >
    >hval_t phi_bits;
    >double d = (sqrt(5.0) + 1.0) / 2.0;
    >
    >memcpy(&phi_bits, &d, min(sizeof(hval_t), sizeof(double)));


    Indeed, this should work fine. Stylistically, I would prefer:

    memcpy(&phi_bits, &d, min(sizeof phi_bits, sizeof d));

    since I prefer to name the objects, rather than their types, when
    possible.

    >I mentioned something about trap representations and unions in another post
    >in this thread. I'm curious if I'm interpreting that correctly.


    I have little to add on trap representations. The Univac 1100 had
    them, but did not actually trap on them: some particular signed
    bit patterns represented -0, which was only "sort of equal" to +0
    and would sometimes give peculiar results for some operations.
    Many machines today still have trap representations for floating
    point (any signalling NaN). Few if any modern machines have trap
    representations for any kind of integer, though.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Jul 8, 2005
    #14
  15. Clint Olsen

    Tim Rentsch Guest

    Chris Torek <> writes:

    > >On 2005-07-08, Chris Torek <> wrote:
    > >> union { double x; int y[2]; } var;
    > >>
    > >> The compiler "knew" that since var.x and var.y could not both be
    > >> active simultaneously, it was OK to leave var.x in the CPU's
    > >> floating-point registers and manipulate var.y in memory. Thus,
    > >> assignments to var.x did not affect var.y, and vice versa.

    >
    > In article <slrndcsamj.14u.clint@belle.0lsen.net>
    > Clint Olsen <> wrote:
    > >Since a union is defined as an overlapping set of member objects,
    > >would you consider this a conforming compiler?

    >
    > I believe so. The C standard only guarantees what you want (and
    > in this case, what I wanted, if only to avoid changing the code)
    > for "unsigned char" union members.


    Apparently this question has caused some confusion, and it's
    supposed to be addressed by a TC. See:

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

    My reading of the wording in the TC is that storing in one
    member then accessing through another should (after the TC
    takes effect) get the re-interpreted value that was stored
    in the previously-stored-into member.
    Tim Rentsch, Jul 13, 2005
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Matt Garman
    Replies:
    1
    Views:
    651
    Matt Garman
    Apr 25, 2004
  2. Peter Dunker

    union in struct without union name

    Peter Dunker, Apr 26, 2004, in forum: C Programming
    Replies:
    2
    Views:
    853
    Chris Torek
    Apr 26, 2004
  3. manders2k
    Replies:
    6
    Views:
    379
    Michael Spencer
    Feb 11, 2004
  4. Saeed Amrollahi
    Replies:
    1
    Views:
    315
    Alf P. Steinbach
    Nov 20, 2007
  5. Johannes Schaub (litb)

    Reading from different union members

    Johannes Schaub (litb), Dec 30, 2009, in forum: C Programming
    Replies:
    5
    Views:
    488
    Tim Rentsch
    Jan 13, 2010
Loading...

Share This Page