Proper union use


Adrian

I remember from C that you are not supposed to access different
members of a union at the same time?

It seems to say the same thing in the C++ standard (9.5.1)
"In a union, at most one of the data members can be active at any
time, that is, the value of at most one of the data members can be
stored in a union at any time."

What does it mean by "any time": forever, block scope, function scope?

So is any of the code below in this simplistic example valid? I know
it works on any implementation I've used but that doesn't make it
correct :)

PS The whole point of this is to make some IPC messaging code more
readable
PPS I know about struct padding ;-)

#include <iostream>

union IP {
    unsigned long plain;
    char dotted[4];
};

void fill_by_ref(IP &ip);

int main(int argc, char *argv[])
{
    IP ip;

    ip.dotted[0]=127;
    ip.dotted[1]=0;
    ip.dotted[2]=0;
    ip.dotted[3]=1;
    std::cout << "ip.plain=" << ip.plain << std::endl; // Illegal I assume from 9.5.1?

    IP ip2;
    fill_by_ref(ip2);
    std::cout << "ip2.plain=" << ip2.plain << std::endl; // Is this any better?

    IP ip3(ip);
    std::cout << "ip3.plain=" << ip3.plain << std::endl; // Or is this any better?

    return 0;
}

void fill_by_ref(IP &ip)
{
    ip.dotted[0]=127;
    ip.dotted[1]=0;
    ip.dotted[2]=0;
    ip.dotted[3]=1;
}
 

Victor Bazarov

Adrian said:
> I remember from C that you are not supposed to access different
> members of a union at the same time?

Are you asking? How would we know what you remember?
> It seems to say the same thing in the C++ standard (9.5.1)
> "In a union, at most one of the data members can be active at any
> time, that is, the value of at most one of the data members can be
> stored in a union at any time."
>
> What does it mean by "any time" forever, block scope, function scope?

Any time during the lifetime of the instance.
> So is any of the code below in this simplistic example valid? I know
> it works on any implementation I've used but that doesn't make it
> correct :)

"Is valid" and "has defined behaviour" are not necessarily the same.
> PS The whole point of this is to make some IPC messaging code more
> readable
> PPS I know about struct padding ;-)
>
> #include <iostream>
>
> union IP {
>     unsigned long plain;
>     char dotted[4];
> };
>
> void fill_by_ref(IP &ip);
>
> int main(int argc, char *argv[])
> {
>     IP ip;
>
>     ip.dotted[0]=127;
>     ip.dotted[1]=0;
>     ip.dotted[2]=0;
>     ip.dotted[3]=1;
>     std::cout << "ip.plain=" << ip.plain << std::endl; // Illegal I assume from 9.5.1?

Not illegal -- undefined.
>     IP ip2;
>     fill_by_ref(ip2);
>     std::cout << "ip2.plain=" << ip2.plain << std::endl; // Is this any better?
Nope.


>     IP ip3(ip);
>     std::cout << "ip3.plain=" << ip3.plain << std::endl; // Or is this any better?

Not really. 'ip' has been assigned using the 'dotted' member. So,
any other union initialised from it would also have the invisible
"flag" set to only use the 'dotted' member, AFAIUI.

The point of unions is not to "assign one member and extract another".
It's to share memory. Whatever you assign, you extract. You can only
safely switch between those when assigning. Assign A, assign B (safe),
read B (safe), assign A (safe), assign B (safe), assign A (safe), read
A (safe), read A (safe)... And so on. You cannot assign A and then
read B. Ever.
>     return 0;
> }
>
> void fill_by_ref(IP &ip)
> {
>     ip.dotted[0]=127;
>     ip.dotted[1]=0;
>     ip.dotted[2]=0;
>     ip.dotted[3]=1;
> }

V
 

Tomás Ó hÉilidhe

Adrian:
> union IP {
>     unsigned long plain;
>     char dotted[4];
> };


You could always go with something like:

typedef whatever_32_bit_type IP4addrs;


char unsigned &Octet(IP4addrs &addrs,unsigned const i)
{
    // Treat the address as an array of bytes and return a reference to byte i
    return reinterpret_cast<char unsigned*>(&addrs)[i];
}


I would have made a "whatever_8_bit_type" type, but you've either got
8-bit bytes or you don't.


int main()
{
    IP4addrs x;

    Octet(x,0) = 192;
    Octet(x,1) = 168;
    Octet(x,2) = 1;
    Octet(x,3) = 254;
}

Then of course you could always turn IP4addrs into a class and have a
method called Octet. The possibilities.
 

James Kanze

Victor Bazarov wrote:
> Adrian wrote:
>> [...]
> "Is valid" and "has defined behaviour" are not necessarily the
> same.

I would say that code which has undefined behavior isn't valid.
>> PS The whole point of this is to make some IPC messaging
>> code more readable
>> PPS I know about struct padding ;-)
>>
>> #include <iostream>
>>
>> union IP {
>>     unsigned long plain;
>>     char dotted[4];
>> };
>>
>> void fill_by_ref(IP &ip);
>>
>> int main(int argc, char *argv[])
>> {
>>     IP ip;
>>
>>     ip.dotted[0]=127;
>>     ip.dotted[1]=0;
>>     ip.dotted[2]=0;
>>     ip.dotted[3]=1;
>>     std::cout << "ip.plain=" << ip.plain << std::endl; // Illegal I assume from 9.5.1?
>
> Not illegal -- undefined.

Undefined behavior according to the standard.

There are two fundamental problems. The first is that he
assumes that an unsigned long is four bytes; on the machines I
usually work on, it is 8, so accessing ip.plain here results in
accessing uninitialized memory. He definitely should be using
uint32_t. The second problem is, of course, the fact that
accessing plain when the union currently holds dotted is
undefined behavior. (The issue is somewhat clouded by the fact
that the standard does give accesses through a char type some
special privileges, but I don't think that they apply here.)
From the point of view of the standard, something like:

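#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t ip ;
    // Write the four bytes through an unsigned byte pointer to ip's storage
    std::uint8_t* dotted = reinterpret_cast< std::uint8_t* >( &ip ) ;
    dotted[ 0 ] = 127 ;
    dotted[ 1 ] = 0 ;
    dotted[ 2 ] = 0 ;
    dotted[ 3 ] = 1 ;
    std::cout << ip << std::endl ;
}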

is defined.

Technically speaking, the standard doesn't yet contain the types
uint32_t and uint8_t. However, they are part of the C standard,
and have been adopted for the C++ standard. Also, they're not
necessarily present---they are only present if the hardware
actually supports such types. If it doesn't (if the machine word
is, say, 36 bits, with 9-bit chars), then of course he'll need
something even more complicated, I think. And if uint32_t isn't
defined, there's also the possibility that initializing the
integer as raw memory, with byte values, may result in a trap
representation.

What compilers actually support is another thing. I've actually
used compilers which didn't support any sort of type punning.
The standard requires type punning to work if 1) it's done by a
cast of the address, and 2) one of the types is either char or
unsigned char. IIRC, however, some compilers don't respect this
either. (From a compiler's point of view: it's easy to detect
and respect the union. Type punning with pointers is more
difficult, since it may not be visible in the translation unit
being compiled. I think that current releases of g++ "define"
the case with the union, for example, but not the ones with the
cast.)

An even more portable solution would involve something like:

LargeEnoughIntType ip ;
ip =   (127 << 24)
     | (  0 << 16)
     | (  0 <<  8)
     | (  1      ) ;

Note that this does not have the same semantics, however. You
cannot simply memcpy into your transmission buffer, and expect
it to be correct. On the other hand, with the type punning
solutions, the value in the transmission buffer may be correct,
but displaying it as a decimal or hexadecimal value probably
won't give anything useful.
 

Pete Becker

> Technically speaking, the standard doesn't yet contain the types
> uint32_t and uint8_t. However, they are part of the C standard,
> and have been adopted for the C++ standard. Also, they're not
> necessarily present---they are only present if the hardware
> actually supports such types.

Just to close the loop for everyone who's panicked because these aren't
necessarily present: in general, use uint_least32_t and uint_least8_t.
They're always present, and they're required to have at least the
designated number of bits. Assuming, of course, that your library
provides <stdint.h> or <cstdint>.
 

James Kanze

> Just to close the loop for everyone who's panicked because these aren't
> necessarily present: in general, use uint_least32_t and uint_least8_t.
> They're always present, and they're required to have at least the
> designated number of bits. Assuming, of course, that your library
> provides <stdint.h> or <cstdint>.

Good point. For various reasons, historically, uint32_t, etc.
were available on a lot of compilers even before C99, and I've
gotten into the habit of using them. In practice, however, I
think that uint_least32_t etc. would be more appropriate in just
about all of the cases I use them. (The particular case of type
punning with a union might be an exception. But then, it's
undefined behavior, regardless of the types involved.)

Although not relevant here, the question is a bit more awkward
in the case of signed ints. If I'm reading a signed 32 bit int
off the network, I need a type which can hold all possible
values in the range -2147483648...2147483647. The smallest
required type which provides this guarantee is int_least64_t,
but on a 32 bit 2's complement machine, this is far from ideal.
(In addition, the algorithm is a lot easier if I actually have
int32_t and uint32_t, since memcpy'ing the uint32_t into the
int32_t is guaranteed to give the correct results. As a recent
thread in comp.std.c++ pointed out, reading a signed int in
Internet format is very hard to do in a 100% portable way.)

In this case, the fact that the type might not be defined is a
feature. I can freely use it, knowing that if my code is ever
ported to a machine where the hardware types are such that it
won't work, the code won't compile, so whoever is maintaining it
at that time will have to rework it to avoid the underlying
assumptions (which hold on 99.9% of the material around today).
 
