Proper union use

Adrian · Nov 26, 2007

I remember from C that you are not supposed to access different
members of a union at the same time?

It seems to say the same thing in the C++ standard (9.5.1)
"In a union, at most one of the data members can be active at any
time, that is, the value of at most one of the data members can be
stored in a union at any time."

What does it mean by "any time": forever, block scope, function scope?

So is any of the code below in this simplistic example valid? I know
it works on every implementation I've used, but that doesn't make it
correct.

PS The whole point of this is to make some IPC messaging code more
readable
PPS I know about struct padding ;-)

#include <iostream>

union IP {
    unsigned long plain;
    char dotted[4];
};

void fill_by_ref(IP &ip);

int main(int argc, char *argv[])
{
    IP ip;

    ip.dotted[0]=127;
    ip.dotted[1]=0;
    ip.dotted[2]=0;
    ip.dotted[3]=1;
    std::cout << "ip.plain=" << ip.plain << std::endl; // Illegal I assume from 9.5.1?

    IP ip2;
    fill_by_ref(ip2);
    std::cout << "ip2.plain=" << ip2.plain << std::endl; // Is this any better?

    IP ip3(ip);
    std::cout << "ip3.plain=" << ip3.plain << std::endl; // Or is this any better?

    return 0;
}

void fill_by_ref(IP &ip)
{
    ip.dotted[0]=127;
    ip.dotted[1]=0;
    ip.dotted[2]=0;
    ip.dotted[3]=1;
}

Victor Bazarov · Nov 26, 2007

Adrian said:
I remember from C that you are not supposed to access different
members of a union at the same time?

Are you asking? How would we know what you remember?

It seems to say the same thing in the C++ standard (9.5.1)
"In a union, at most one of the data members can be active at any
time, that is, the value of at most one of the data members can be
stored in a union at any time."

What does it mean by "any time" forever, block scope, function scope?

Any time during the lifetime of the instance.

So is any of the code below in this simplistic example valid? I know
it works on any implementation I've used but that doesn't make it
correct

"Is valid" and "has defined behaviour" are not necessarily the same.

PS The whole point of this is to make some IPC messaging code more
readable
PPS I know about struct padding ;-)

#include <iostream>

union IP {
unsigned long plain;
char dotted[4];
};

void fill_by_ref(IP &ip);

int main(int argc, char *argv[])
{
IP ip;

ip.dotted[0]=127;
ip.dotted[1]=0;
ip.dotted[2]=0;
ip.dotted[3]=1;
std::cout << "ip.plain=" << ip.plain << std::endl; // Illegal I assume from 9.5.1?

Not illegal -- undefined.

IP ip2;
fill_by_ref(ip2);
std::cout << "ip2.plain=" << ip2.plain << std::endl; // Is this any better?

Nope.

IP ip3(ip);
std::cout << "ip3.plain=" << ip3.plain << std::endl; // Or is this any better?

Not really. 'ip' has been assigned using the 'dotted' member. So,
any other union initialised from it would also have the invisible
"flag" set to only use the 'dotted' member, AFAIUI.

The point of unions is not to "assign one member and extract another".
It's to share memory. Whatever you assign, you extract. You can only
safely switch between those when assigning. Assign A, assign B (safe),
read B (safe), assign A (safe), assign B (safe), assign A (safe), read
A (safe), read A (safe)... And so on. You cannot assign A and then
read B. Ever.

return 0;
}

void fill_by_ref(IP &ip)
{
ip.dotted[0]=127;
ip.dotted[1]=0;
ip.dotted[2]=0;
ip.dotted[3]=1;
}

V
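
The "invisible flag" described above does not exist in the language itself; the program has to track the active member on its own. A minimal sketch of one way to do that follows. The class name TaggedIP and its members are hypothetical (nothing in the thread defines them), and std::uint32_t is assumed to be available.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper, not from the thread: it records which member was
// assigned last, so reading the other member trips an assertion instead of
// silently invoking undefined behaviour.
class TaggedIP {
public:
    enum Active { None, Plain, Dotted };

    TaggedIP() : active_(None) {}

    // Assigning a member is always safe and makes it the active one.
    void set_plain(std::uint32_t v) {
        data_.plain = v;
        active_ = Plain;
    }

    void set_dotted(unsigned char a, unsigned char b,
                    unsigned char c, unsigned char d) {
        data_.dotted[0] = a;
        data_.dotted[1] = b;
        data_.dotted[2] = c;
        data_.dotted[3] = d;
        active_ = Dotted;
    }

    // Reading is only allowed for the member that was assigned last.
    std::uint32_t plain() const {
        assert(active_ == Plain);
        return data_.plain;
    }

    unsigned char dotted(unsigned i) const {
        assert(active_ == Dotted && i < 4);
        return data_.dotted[i];
    }

    Active active() const { return active_; }

private:
    union {
        std::uint32_t plain;
        unsigned char dotted[4];
    } data_;
    Active active_;
};
```

With this, set_dotted(127, 0, 0, 1) followed by dotted(0) is fine, while calling plain() at that point fails the assertion in a debug build rather than reading the inactive member.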

Tomás Ó hÉilidhe · Nov 26, 2007

Adrian:

union IP {
unsigned long plain;
char dotted[4];
};

You could always go with something like:

typedef whatever_32_bit_type IP4addrs;

char unsigned &Octet(IP4addrs &addrs,unsigned const i)
{
    // Index into the bytes of addrs; i must be less than sizeof(IP4addrs).
    return reinterpret_cast<char unsigned*>(&addrs)[i];
}

I would have made a "whatever_8_bit_type" type, but you've either got
8-bit bytes or you don't.

int main()
{
IP4addrs x;

Octet(x,0) = 192;
Octet(x,1) = 168;
Octet(x,2) = 1;
Octet(x,3) = 254;
}

Then of course you could always turn IP4addrs into a class and have a
method called Octet. The possibilities.
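
The class variant mentioned above might look roughly like this. It is only a sketch: std::uint32_t stands in for "whatever_32_bit_type", the member names raw_ and raw() are made up for illustration, and the octets are in memory order, so the numeric value of raw() depends on the machine's byte order.

```cpp
#include <cassert>
#include <cstdint>

class IP4addrs {
public:
    IP4addrs() : raw_(0) {}

    // Reference to the i-th byte of the stored value, in memory order.
    // Access through unsigned char is the case the aliasing rules permit.
    unsigned char &Octet(unsigned i) {
        assert(i < sizeof raw_);
        return reinterpret_cast<unsigned char *>(&raw_)[i];
    }

    // Read-only access for const objects.
    unsigned char Octet(unsigned i) const {
        assert(i < sizeof raw_);
        return reinterpret_cast<const unsigned char *>(&raw_)[i];
    }

    // The value as a whole; its numeric meaning is byte-order dependent.
    std::uint32_t raw() const { return raw_; }

private:
    std::uint32_t raw_;  // assumed 32-bit storage (hypothetical member name)
};
```

Usage then mirrors the free-function version: x.Octet(0) = 192; x.Octet(1) = 168; and so on.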

James Kanze · Nov 27, 2007

Adrian wrote:

[...]

"Is valid" and "has defined behaviour" are not necessarily the
same.

I would say that code which has undefined behavior isn't valid.

PS The whole point of this is to make some IPC messaging
code more readable
PPS I know about struct padding ;-)
#include <iostream>
union IP {
unsigned long plain;
char dotted[4];
};
void fill_by_ref(IP &ip);
int main(int argc, char *argv[])
{
IP ip;

ip.dotted[0]=127;
ip.dotted[1]=0;
ip.dotted[2]=0;
ip.dotted[3]=1;
std::cout << "ip.plain=" << ip.plain << std::endl; // Illegal I assume from 9.5.1?

Not illegal -- undefined.

Undefined behavior according to the standard.

There are two fundamental problems. The first is that he
assumes that an unsigned long is four bytes; on the machines I
usually work on, it is 8, so accessing ip.plain here results in
accessing uninitialized memory. He definitely should be using
uint32_t. The second problem is, of course, the fact that
accessing plain when the union currently holds dotted is
undefined behavior. (The issue is somewhat clouded by the fact
that the standard does give accesses through a char type some
special privileges, but I don't think that they apply here.)
From the point of view of the standard, something like:

uint32_t ip ;
uint8_t* dotted = reinterpret_cast< uint8_t* >( &ip ) ;
dotted[ 0 ] = 127 ;
dotted[ 1 ] = 0 ;
dotted[ 2 ] = 0 ;
dotted[ 3 ] = 1 ;
std::cout << ip << std::endl ;

is defined.

Technically speaking, the standard doesn't yet contain the types
uint32_t and uint8_t. However, they are part of the C standard,
and have been adopted for the C++ standard. Also, they're not
necessarily present---they are only present if the hardware
actually supports such types. If it doesn't, of course (if the
machine word is, say, 36 bits, with 9-bit chars), then he'll
need something even more complicated, I think. Of course, if
uint32_t isn't defined, there's also the possibility that
initializing it as raw memory, with byte values, may result in a
trapping representation.

What compilers actually support is another thing. I've actually
used compilers which didn't support any sort of type punning.
The standard requires type punning to work if 1) it's done by a
cast of the address, and 2) one of the types is either char or
unsigned char. IIRC, however, some compilers don't respect this
either. (From a compiler's point of view: it's easy to detect
and respect the union. Type punning with pointers is more
difficult, since it may not be visible in the translation unit
being compiled. I think that current releases of g++ "define"
the case with the union, for example, but not the ones with the
cast.)
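
For compilers where neither the union nor the pointer cast can be relied on, a commonly used alternative is to copy the bytes with std::memcpy, which sidesteps the aliasing question altogether. The sketch below is not from the thread; the helper names from_octets and to_octets are hypothetical, and it assumes uint32_t exists and occupies exactly four bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::uint32_t) == 4, "sketch assumes 8-bit bytes");

// Hypothetical helper: build a uint32_t whose bytes, in memory order,
// are the four octets given.
inline std::uint32_t from_octets(const unsigned char *octets) {
    assert(octets != nullptr);
    std::uint32_t ip;
    std::memcpy(&ip, octets, sizeof ip);
    return ip;
}

// Hypothetical helper: copy the bytes of ip back out, in memory order.
inline void to_octets(std::uint32_t ip, unsigned char *octets) {
    assert(octets != nullptr);
    std::memcpy(octets, &ip, sizeof ip);
}
```

As with the cast, the bytes round-trip exactly, but the numeric value of the uint32_t still depends on the machine's byte order.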

An even more portable solution would involve something like:

LargeEnoughIntType ip ;
ip = (127 << 24)
   | (  0 << 16)
   | (  0 <<  8)
   | (  1      ) ;

Note that this does not have the same semantics, however. You
cannot simply memcpy into your transmission buffer, and expect
it to be correct. On the other hand, with the type punning
solutions, the value in the transmission buffer may be correct,
but displaying it as a decimal or hexadecimal value probably
won't give anything useful.
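
To make the difference in semantics concrete, here is a sketch of the shift-based approach together with the explicit serialization step it requires in place of memcpy. The names IPv4Value, make_ip and put_ip are hypothetical, and std::uint_least32_t is used as the "large enough" type. The casts before shifting matter: shifting a plain int left by 24 would overflow for octets of 128 and above when int is 32 bits.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint_least32_t IPv4Value;  // hypothetical "large enough" type

// Compose the numeric value from four octets, most significant first.
inline IPv4Value make_ip(unsigned a, unsigned b, unsigned c, unsigned d) {
    assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
    return (static_cast<IPv4Value>(a) << 24)
         | (static_cast<IPv4Value>(b) << 16)
         | (static_cast<IPv4Value>(c) << 8)
         |  static_cast<IPv4Value>(d);
}

// Write the value into a transmission buffer in network (big-endian)
// order, one octet at a time, independent of the host byte order.
inline void put_ip(IPv4Value ip, unsigned char *buf) {
    assert(buf != nullptr);
    buf[0] = static_cast<unsigned char>((ip >> 24) & 0xFF);
    buf[1] = static_cast<unsigned char>((ip >> 16) & 0xFF);
    buf[2] = static_cast<unsigned char>((ip >> 8) & 0xFF);
    buf[3] = static_cast<unsigned char>(ip & 0xFF);
}
```

Here make_ip(127, 0, 0, 1) yields 0x7F000001 on every conforming implementation, and put_ip writes the octets 127, 0, 0, 1 to the buffer in that order.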

Pete Becker · Nov 27, 2007

Technically speaking, the standard doesn't yet contain the types
uint32_t and uint8_t. However, they are part of the C standard,
and have been adopted for the C++ standard. Also, they're not
necessarily present---they are only present if the hardware
actually supports such types.

Just to close the loop for everyone who's panicked because these aren't
necessarily present: in general, use uint_least32_t and uint_least8_t.
They're always present, and they're required to have at least the
designated number of bits. Assuming, of course, that your library
provides <stdint.h> or <cstdint>.
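
One thing to keep in mind with the "least" types is that they may be wider than the designated number of bits, so code that relies on 32-bit wrap-around has to mask explicitly. A small sketch of that, with the hypothetical names u32, kMask32 and add_mod32:

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint_least32_t u32;      // at least 32 bits, possibly more
const u32 kMask32 = 0xFFFFFFFFUL;     // low 32 bits

// Addition modulo 2^32 that behaves the same whether u32 is exactly
// 32 bits wide or wider.
inline u32 add_mod32(u32 a, u32 b) {
    assert(a <= kMask32 && b <= kMask32);
    return (a + b) & kMask32;
}
```

On a machine where u32 is exactly 32 bits the mask is redundant; on one where it is wider, the mask is what keeps the result within 32 bits.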

James Kanze · Nov 28, 2007

Just to close the loop for everyone who's panicked because these aren't
necessarily present: in general, use uint_least32_t and uint_least8_t.
They're always present, and they're required to have at least the
designated number of bits. Assuming, of course, that your library
provides <stdint.h> or <cstdint>.

Good point. For various reasons, historically, uint32_t, etc.
were available on a lot of compilers even before C99, and I've
gotten into the habit of using them. In practice, however, I
think that uint_least32_t etc. would be more appropriate in just
about all of the cases I use them. (The particular case of type
punning with a union might be an exception. But then, it's
undefined behavior, regardless of the types involved.)

Although not relevant here, the question is a bit more awkward
in the case of signed ints. If I'm reading a signed 32 bit int
off the network, I need a type which can hold all possible
values in the range -2147483648...2147483647. The smallest
required type which provides this guarantee is int_least64_t,
but on a 32 bit 2's complement machine, this is far from ideal.
(In addition, the algorithm is a lot easier if I actually have
int32_t and uint32_t, since memcpy'ing the uint32_t into the
int32_t is guaranteed to give the correct results. As a recent
thread in comp.std.c++ pointed out, reading a signed int in
Internet format is very hard to do in a 100% portable way.)
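
The easier variant described in the parenthesis above, for machines that do have int32_t and uint32_t, might be sketched as follows. The function name read_be_i32 is hypothetical, and the buffer is assumed to hold four octets in network (big-endian) order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a signed 32-bit integer from four octets in
// network (big-endian) order. Compiles only where int32_t and uint32_t
// exist.
inline std::int32_t read_be_i32(const unsigned char *buf) {
    assert(buf != nullptr);
    // Assemble the bit pattern as an unsigned value first.
    const std::uint32_t u = (static_cast<std::uint32_t>(buf[0]) << 24)
                          | (static_cast<std::uint32_t>(buf[1]) << 16)
                          | (static_cast<std::uint32_t>(buf[2]) << 8)
                          |  static_cast<std::uint32_t>(buf[3]);
    // int32_t is required to be 2's complement with no padding bits, so
    // copying the bit pattern gives the intended signed value.
    std::int32_t s;
    std::memcpy(&s, &u, sizeof s);
    return s;
}
```

For example, the octets FF FF FF FE read back as -2, and 80 00 00 00 as -2147483648.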

In this case, the fact that the type might not be defined is a
feature. I can freely use it, knowing that if my code is ever
ported to a machine where the hardware types are such that it
won't work, the code won't compile, so whoever is maintaining it
at that time will have to rework it to avoid the underlying
assumptions (which hold on 99.9% of the material around today).