Safe use of unions

dingoatemydonut

The quoted text below is from a comp.std.c thread that originated
from a discussion on comp.lang.c. I've edited out the parts
that do not apply to my question.

Robert said:
Dann said:
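#include <stdio.h>

int main(void)
{
    /* Each union member wraps its data in a struct. */
    typedef union foo_u {
        struct a {
            unsigned char carr[sizeof(unsigned int)];
        } aa;
        struct b {
            unsigned int ui;
        } bb;
    } foo;

    foo bar;
    bar.bb.ui = 1;                             /* store through the unsigned int */
    printf("%u\n", (unsigned)bar.aa.carr[0]);  /* read back its first byte */
    return 0;
}

#include <stdio.h>

int main(void)
{
    /* The union members are the array and the unsigned int themselves. */
    typedef union foo_u {
        unsigned char carr[sizeof(unsigned int)];
        unsigned int ui;
    } foo;

    foo bar;
    bar.ui = 1;                             /* store through ui */
    printf("%u\n", (unsigned)bar.carr[0]);  /* read the first byte through carr */
    return 0;
}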

Is the first sample safe but the second not safe?

Neither are safe.

Why is either example unsafe? I understand the output of
the printf calls is unspecified. But I do not see anything
that would be cause for concern other than that.
 
Jack Klein

I disagree with Robert's assessment. They are both perfectly safe.
Any area of memory at all that a program has a right to access
(static, automatic, or allocated) may be read as an array of unsigned
char.

The standard still uses the phrase "character type" in several places,
which is an anachronism from the C89/C90 days. Only unsigned char is
truly safe now, since C99 specifically allows signed char, and
therefore plain char if signed, to have padding bits and trap
representations.

It is also perfectly safe to write to any such memory via an lvalue of
any character type, not just unsigned char, provided that the memory
is not then accessed with an lvalue of another type until it has first
been modified through an lvalue of that other type.

For example, paragraph 5 of 6.2.6.1 General, under 6.2.6
Representations of types:

"Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called
a trap representation."

Also, paragraph 7 of 6.5 Expressions:

"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:73)

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of
the object,

— a type that is the signed or unsigned type corresponding to the
effective type of the object,

— a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or

— a character type."

Recognition of this special dispensation for unsigned char actually
caused a change in the definition of the term "undefined behavior"
between the C99 draft N869 and the final C99 standard; C90 and N869
use the same wording.

C90: "3.16 undefined behavior: Behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of indeterminately
valued objects, for which this International Standard imposes no
requirements"

N869: "3.18 undefined behavior: behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of indeterminately
valued objects, for which this International Standard imposes no
requirements"

ISO 9899:1999: "3.4.3 undefined behavior: behavior, upon use of a
nonportable or erroneous program construct or of erroneous data, for
which this International Standard imposes no requirements"

The phrase "or of indeterminately valued objects" was specifically
removed because accessing any object as a suitably sized array of
unsigned char is not undefined, as unsigned char has no trap
representations.
 
Robert Gamble

Jack said:
Robert said:
Dann Corbit wrote:
[snip]

I disagree with Robert's assessment. They are both perfectly safe.

I agree with you about the second example; I was wrong. The C89
Standard had the following wording:

"With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the
behavior is implementation-defined."

It was widely accepted that the behavior was actually meant to be
undefined.

Section J.1 - "Unspecified Behavior" in the C99 Standard states:

"The value of a union member other than the last one stored into
(6.2.6.1)."

However, the corresponding wording in the body of the C99 Standard has
been removed; I did not realize this when I wrote my original
response. The validity of such a construct is now determined by the
aliasing rules, under which this specific example is well-defined.
(Note to self: stop accepting statements made in Section J without
further research.)

I still am not convinced about the first example though.
Any area of memory at all that a program has a right to access
(static, automatic, or allocated) may be read as an array of unsigned
char.
[snip]

Also, paragraph 7 of 6.5 Expressions:

"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:73)

- a type compatible with the effective type of the object,

- a qualified version of a type compatible with the effective type of
the object,

- a type that is the signed or unsigned type corresponding to the
effective type of the object,

- a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,

- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or

- a character type."

The fact that "a character type" appears *after* the bullet about
aggregate or union types, and so is not one of the "aforementioned
types" that bullet refers to, makes me skeptical that this paragraph
validates the first example.

Robert Gamble
 
Richard G. Riley

Robert Gamble said:
Jack said:
[snip]

I disagree with Robert's assessment. They are both perfectly safe.

I agree with you about the second example; I was wrong. The C89
Standard had the following wording:

"With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the
behavior is implementation-defined."

It was widely accepted that the behavior was actually meant to be
undefined.

How does this affect the oft-seen habit of "field overlaying" where, for
example, a struct of 4 chars overlays an int in a union in order to give
byte access to certain bit groups of the int value (see the union
bits32_tag example in Expert C Programming)? Or am I misunderstanding or
reading out of context? Or is this habit non-standard and non-portable?

Robert Gamble

Richard said:
Robert Gamble said:
Jack said:
On 30 Jun 2006 07:55:27 -0700, (e-mail address removed) wrote in
comp.lang.c:

[snip]

I disagree with Robert's assessment. They are both perfectly safe.

I agree with you about the second example; I was wrong. The C89
Standard had the following wording:

"With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the
behavior is implementation-defined."

It was widely accepted that the behavior was actually meant to be
undefined.

How does this affect the oft-seen habit of "field overlaying" where, for
example, a struct of 4 chars overlays an int in a union in order to give
byte access to certain bit groups of the int value (see the union
bits32_tag example in Expert C Programming)? Or am I misunderstanding or
reading out of context? Or is this habit non-standard and non-portable?

The technique you describe (and the example you cite) is highly
non-portable. Size assumptions aside, array elements are guaranteed to
have no padding between them, but structure members are not. The
technique is undefined in C90 for the reasons cited above, which is
why it is often advised to use a pointer to unsigned char to examine
the contents instead. If the struct contained unsigned chars instead
of chars, and there were a guarantee of no padding between the
members, then (ignoring size assumptions again) this technique might
be safe in C99. But it doesn't and there isn't, so it is better to use
either the pointer to unsigned char technique, which works equally
well in C90 and C99, or a union that maps an array of sizeof(type)
unsigned chars onto type, which is safe in C99 but undefined in C90.

Robert Gamble
 
