The quoted text below is from a comp.std.c thread that originated in
a discussion on comp.lang.c. I've edited out the parts that do not
apply to my question.
Robert said:
Dann said:
#include <stdio.h>

int main(void)
{
    typedef union foo_u {
        struct a {
            unsigned char carr[sizeof(unsigned int)];
        } aa;
        struct b {
            unsigned int ui;
        } bb;
    } foo;
    foo bar;

    bar.bb.ui = 1;
    printf("%u\n", (unsigned)bar.aa.carr[0]);
    return 0;
}
#include <stdio.h>

int main(void)
{
    typedef union foo_u {
        unsigned char carr[sizeof(unsigned int)];
        unsigned int ui;
    } foo;
    foo bar;

    bar.ui = 1;
    printf("%u\n", (unsigned)bar.carr[0]);
    return 0;
}
Is the first sample safe but the second not safe?
Neither is safe.
Why is either example unsafe? I understand the output of
the printf calls is unspecified. But I do not see anything
that would be cause for concern other than that.
I disagree with Robert's assessment. They are both perfectly safe.
Any area of memory at all that a program has a right to access
(static, automatic, or allocated) may be read as an array of unsigned
char.
The standard still uses the phrase "character type" in several places,
which is an anachronism from the C89/C90 days. Only unsigned char is
truly safe now, since C99 specifically allows signed char, and
therefore plain char if signed, to have padding bits and trap
representations.
It is also perfectly safe to write to any such memory via an lvalue of
any character type, not just unsigned char, provided that the memory
is not accessed through an lvalue of another type until it has first
been modified through an lvalue of that other type.
See, for example, paragraph 5 of 6.2.6.1 General, under 6.2.6
Representations of types:
"Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called
a trap representation."
Also, paragraph 7 of 6.5 Expressions:
"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:73)
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of
the object,
— a type that is the signed or unsigned type corresponding to the
effective type of the object,
— a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
— a character type."
Recognition of this special dispensation for unsigned char actually
caused the definition of the term "undefined behavior" to change
between C90 (and C99 draft N869) and the final C99 standard.
C90: "3.16 undefined behavior: Behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of indeterminately
valued objects, for which this International Standard imposes no
requirements"
N869: "3.18
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct, of
erroneous data, or of indeterminately valued objects, for which this
International Standard imposes no requirements"
ISO 9899:1999: "3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements"
The phrase "or of indeterminately valued objects" was specifically
removed because accessing any object as a suitably sized array of
unsigned char is not undefined, as unsigned char has no trap
representations.