The union version doesn't work because the standard only
allows you to inspect the common initial sequence of two structs.
What does it mean?
union X
{
    int i;
    char c[sizeof(int)];
};
X tmp;
1. "tmp" is aligned for "X::i" ("X::i" has offset zero)
2. "X::i" and "X::c" start at the same memory address
3. type "int" consists of "sizeof(int)" chars placed without holes - one by one
Why not?
C89 specifically required that only the most recently written member of
a union could be read, and violating this resulted in
implementation-defined behavior ("With one exception, if a member of a
union object is accessed after a value has been stored in a different
member of the object, the behavior is implementation-defined."
§6.3.2.3). Interpreted literally, that means the following code gives
implementation-defined behavior:
union X {
    int i;
    float j;
};
int main() {
    union X x;
    int a;
    x.i = 1;
    x.j = 1.0;
    x.i = 0;
    a = x.i;
    return 0;
}
Even though x.i was the most recently stored member when x.i is
accessed, the access to x.i does take place "after a value has been
stored in a different member of the object."
In both C++ and C99, this (explicit) requirement seems to have
disappeared (though both still contain language about a "special" rule
dispensing with the requirement on the common initial sequence, sort of
implying that the disappearance of the rule may not have been entirely
intentional). It's open to argument that the behavior is nonetheless
undefined, simply because neither explicitly defines what happens when you
read from a different member than was last written.
OTOH, the standard explicitly requires that the storage for the objects
in the union overlap, and that the union be aligned so that a pointer to
the beginning of the union can be used to dereference any member (and
vice versa) -- and this is true in both C and C++. So the alignment is
guaranteed to work, but the type-pun (arguably) might not.
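To separate the part that is guaranteed from the part that is arguable,
here is a minimal sketch that checks only the layout and never reads an
inactive member. The helper name members_share_address is made up for
illustration, and alignof/static_assert assume a C++11 compiler.

```cpp
#include <cassert>   // only needed for the usage shown after the block

union X
{
    int i;
    char c[sizeof(int)];
};

// The union must be at least as strictly aligned as its int member,
// and large enough to hold it.
static_assert(alignof(X) >= alignof(int), "X is aligned for X::i");
static_assert(sizeof(X) >= sizeof(int), "X has room for X::i");

// Hypothetical helper: true if the union and both of its members start
// at the same address. It only takes addresses; no member is read.
bool members_share_address(const X& x)
{
    const void* whole    = &x;
    const void* as_int   = &x.i;
    const void* as_chars = x.c;
    return whole == as_int && whole == as_chars;
}
```

Usage would be something like "X tmp; assert(members_share_address(tmp));".
None of this says anything about what you get when you read tmp.c after
writing tmp.i, which is the part in dispute.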
A reinterpret_cast (even if it looks like a C-style cast) usually has a
problem with alignment: even though everybody "knows" that char has no
alignment requirements, the standard doesn't seem to directly guarantee
it (then again, the required similarity between pointer to char and
pointer to void could be interpreted as such). If you put the int into
dynamically allocated memory, it guarantees that its first byte is
aligned to be accessed as a char, but the remainder still might not be.
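For reference, the cast version under discussion looks roughly like the
sketch below. The function name byte_of is hypothetical, and I've used
unsigned char rather than plain char so the result doesn't depend on
whether char is signed; which byte you get for a given index depends on
the platform's byte order.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns byte n of the stored representation of
// value, in memory order (so the result is endianness-dependent).
unsigned char byte_of(const int& value, std::size_t n)
{
    assert(n < sizeof(int));   // stay inside the int

    // Take the address of the int, cast it to pointer to unsigned char,
    // then index (i.e. dereference) the resulting pointer.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return p[n];
}
```

Looping n from 0 to sizeof(int) - 1 walks the whole int one byte at a
time.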
The shift and mask method works for essentially any data, but it's
clumsy (at best) to make it entirely portable. You need to convert the
int to unsigned before you do right shifting, and you need to use
CHAR_BIT to figure out how many bits there are in a byte, and use that
as the basis for your mask, etc. Even with all that, you have to live
with the fact that the int could contain some padding bits, so you could
have some number of bits in the byte-by-byte representation that are
zero for all possible inputs.
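A sketch of the shift and mask version, to show how much ceremony is
involved. The name value_byte is invented for the example, the
static_assert assumes C++11, and I've used UCHAR_MAX (which is
2^CHAR_BIT - 1) as the mask instead of building it from CHAR_BIT by
hand. The code refuses to compile if unsigned has padding bits, since
the shift count could then reach the width of the type.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <limits>

// Assumption: unsigned has no padding bits, so every shift count used
// below is smaller than the width of the type.
static_assert(static_cast<std::size_t>(std::numeric_limits<unsigned>::digits)
                  == sizeof(unsigned) * CHAR_BIT,
              "unsigned has padding bits; this sketch does not handle that");

// Hypothetical helper: returns byte n of the value (not of the stored
// representation), counting from the least significant byte.
unsigned char value_byte(int value, std::size_t n)
{
    assert(n < sizeof(int));

    // Convert to unsigned first: right-shifting a negative int is not
    // portable.
    const unsigned u = static_cast<unsigned>(value);

    // Shift the wanted byte down to the bottom, then mask off the rest.
    return static_cast<unsigned char>((u >> (n * CHAR_BIT)) & UCHAR_MAX);
}
```

Unlike the cast or the union, this numbers the bytes by significance, so
the result is the same regardless of byte order.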
AFAIK, the lack of portability of either the cast or the union method is
purely theoretical. None of the methods is what I'd call beautiful by
any means, though the (portable) version of the shift/mask method is
undoubtedly the longest, probably the ugliest, and the most likely to
involve extra instructions. Between the cast and the union, it's close
to a toss-up: neither guarantees portability (in C++; the cast is
semi-portable in C99), but both are portable for all practical purposes.
Both strike me as ugly, though I think the cast is somewhat more so. The
cast by itself is fairly ugly, but when you add in the requirement to take
the address of the int, then cast that to pointer to char, then dereference
the resulting pointer, the whole is really pretty hideous (and the fact
that it's basically the only way to use a reinterpret_cast doesn't make
it any less hideous, IMO).
If possible, the real answer is to avoid all of the above, and simply
find an entirely different way to solve the problem.