Can one get away with an under-allocated union?



Alexander Klauer

Hi,

suppose I allocate space for a structure, can I safely interpret
the allocated object as a union, even if the size of the space
allocated is smaller than the size of the union type?

This question appears to have come up before; at least I found
the (very old) threads

"struct pointer casting", c.std.c, 1993/03/22
http://groups.google.com/group/comp.lang.c/browse_thread/thread/1c12a4c6afb312a4

"Union and malloc", c.l.c, 1998/08/22
http://groups.google.com/group/comp.std.c/browse_thread/thread/960a336931f63f02

and the answer appears to lean towards "No". Does the C99
perspective change anything (I have the N1256 draft)? In order
to create some practical ground for discussion, consider the
following C99 program:

-----> start <-----

#include <stdio.h>
#include <stdlib.h>

enum Type {
    T_SCALAR,
    T_VECTOR
};

struct S {
    enum Type type;
};

struct S1 {
    enum Type type;
    int scalar;
};

struct S2 {
    enum Type type;
    int vector[3];
};

union U {
    struct S s;
    struct S1 s1;
    struct S2 s2;
};

void print_u(const union U * u) {
    switch (u->s.type) {
    case T_SCALAR:
        printf("%d\n", u->s1.scalar);
        break;
    case T_VECTOR:
        printf("(%d,%d,%d)\n",
               u->s2.vector[0],
               u->s2.vector[1],
               u->s2.vector[2]);
        break;
    }
}

int main(void) {
    struct S1 * s1 = malloc(sizeof(*s1));
    if (s1 == NULL)
        exit(EXIT_FAILURE);
    *s1 = (struct S1) { .type = T_SCALAR, .scalar = 42 };

    print_u((union U *) s1);
}

----> end <-----

There are several issues with this program.

* The cast "(union U *) s1": 6.3.2.3p7 allows this cast,
provided that the resulting pointer is correctly aligned for
the union. One should think this requirement to be fulfilled
because the value of s1 was returned by a successful call to
malloc. However, as Mark Brader has pointed out in
<[email protected]>, the wording of
7.20.3p1, "The pointer returned if the allocation succeeds is
suitably aligned so that it may be assigned to a pointer to any
type of object and then used to access such an object or an
array of such objects in the space allocated (until the space
is explicitly deallocated)", may be construed to imply that
malloc may return pointers not suitably aligned for types whose
size exceeds the allocated size. Is this still an accepted
interpretation of the wording of the standard?

* Strict aliasing and the access to u->s: the strict aliasing
rule laid down in 6.5p7, next-to-last item, allows the access
to u->s1 after the cast discussed above. Furthermore,
the "special guarantee" from 6.5.2.3p5 allows the access of
struct S in an object of type union U containing a struct S1.
Do 6.5p7 and 6.5.2.3p5 combine, making the access to u->s
legal?

* u is under-allocated for its type (assume sizeof(*u) >
sizeof(struct S1)). Does this, in itself, invoke UB? Clearly, an
assignment to u->s2 would be UB, caused by under-allocation.
(In the present case, the type of *u is const-qualified, so
this assignment is not possible. However, const-qualification
is not recursive, so with slightly more complicated structure
types, a UB assignment is possible.) But in the absence of
such explicit violations, may the compiler assume that the
non-NULL pointer u points to at least sizeof(*u) bytes, so that
UB may ensue anyway?

The reason I ask this question is that I have the following
situation (which I think is fairly common, but I may be wrong):
I have a list of pointers to objects of different sizes. When I
retrieve a pointer, I want to know what type of data it points
to, and then operate on that data accordingly. The natural
solution appears to be using a union type. But allocating an
entire union for each object is wasteful.

There is, of course, a simple workaround. Just replace each

struct SomeStruct {
enum Type type;
/* lots of members */
};

with

struct SomeStructReal {
/* lots of members */
};

struct SomeStruct {
enum Type type;
struct SomeStructReal * p;
};

and then allocate space for struct SomeStructReal and the union
in which objects of type struct SomeStruct and similar reside.
But isn't this a little unnatural? In other words: if
under-allocating unions leads to undefined behaviour, are there
any actual implementations exhibiting unintended behaviour in
such a case? If not, the standard should IMHO be fixed to make
such use of unions well-defined. Or is there any compelling
reason the standard makes under-allocating unions undefined (if
it does)?
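
A minimal sketch of the workaround described above, for illustration
only. The names Handle, ScalarReal, VectorReal and the handle_*
functions are hypothetical, and a single void * stands in for the
per-type "struct SomeStructReal *" pointers, which is a simplification
of the layout in the post:

```c
#include <stdlib.h>

enum Type { T_SCALAR, T_VECTOR };

/* Payload types: only the members, no tag (hypothetical names). */
struct ScalarReal { int scalar; };
struct VectorReal { int vector[3]; };

/* Fixed-size handle: the tag plus a pointer to the payload,
   which is allocated separately at exactly its own size. */
struct Handle {
    enum Type type;
    void *p;
};

/* Both init functions return 0 on success, -1 if malloc fails. */
int handle_init_scalar(struct Handle *h, int value)
{
    struct ScalarReal *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->scalar = value;
    h->type = T_SCALAR;
    h->p = r;
    return 0;
}

int handle_init_vector(struct Handle *h, int x, int y, int z)
{
    struct VectorReal *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->vector[0] = x;
    r->vector[1] = y;
    r->vector[2] = z;
    h->type = T_VECTOR;
    h->p = r;
    return 0;
}

/* Dispatch on the tag, then read the payload through the right type. */
int handle_sum(const struct Handle *h)
{
    switch (h->type) {
    case T_SCALAR: {
        const struct ScalarReal *r = h->p;
        return r->scalar;
    }
    case T_VECTOR: {
        const struct VectorReal *r = h->p;
        return r->vector[0] + r->vector[1] + r->vector[2];
    }
    }
    return 0;
}

void handle_free(struct Handle *h)
{
    free(h->p);
    h->p = NULL;
}
```

Every handle has the same size, so a list of them never under-allocates
anything; the price is one extra allocation and one extra indirection
per object.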

Finally, if I am right in surmising that my situation is common,
maybe this question should go into the FAQ?

Alexander
 

Barry Schwarz

Hi,

suppose I allocate space for a structure, can I safely interpret
the allocated object as a union, even if the size of the space
allocated is smaller than the size of the union type?

This question appears to have come up before; at least I found
the (very old) threads

"struct pointer casting", c.std.c, 1993/03/22
http://groups.google.com/group/comp.lang.c/browse_thread/thread/1c12a4c6afb312a4

"Union and malloc", c.l.c, 1998/08/22
http://groups.google.com/group/comp.std.c/browse_thread/thread/960a336931f63f02

and the answer appears to lean towards "No". Does the C99
perspective change anything (I have the N1256 draft)? In order
to create some practical ground for discussion, consider the
following C99 program:

If you make some reasonable assumptions, the answer remains
emphatically no.

-----> start <-----

#include <stdio.h>
#include <stdlib.h>

enum Type {
    T_SCALAR,
    T_VECTOR
};

These constants have type int. Assume sizeof(int) is 4.

struct S {
    enum Type type;
};

While it is possible for the compiler to decide that type should be a
char or short, it only makes a difference if the compiler decides to
make type a long or long long. Let's assume it is an int.

While terminal padding is allowed, most compilers will set
sizeof(struct S) to sizeof(enum Type), which is 4 for this example.

struct S1 {
    enum Type type;
    int scalar;
};

Similarly, sizeof(struct S1) usually will be 8.

struct S2 {
    enum Type type;
    int vector[3];
};

And sizeof(struct S2) will be 16 (4 for type plus 3 * 4 for vector).

union U {
    struct S s;
    struct S1 s1;
    struct S2 s2;
};

void print_u(const union U * u) {

For sake of the example, let u contain the value 0x1000. It points to
an allocated area of 8 bytes.

    switch (u->s.type) {

Since all the members of the union begin at the front of the union, s
begins at 0x1000. Since the first member of a struct begins at the
front of the struct, s.type also begins at 0x1000.

    case T_SCALAR:

This is the only code that executes based on main below.

        printf("%d\n", u->s1.scalar);

s1.scalar will begin at 0x1004.

        break;
    case T_VECTOR:

If this code were to execute,

        printf("(%d,%d,%d)\n",
               u->s2.vector[0],

s2.vector[0] would begin at 0x1004.

               u->s2.vector[1],

But s2.vector[1] would begin at 0x1008, which is not part of the
allocated memory.

               u->s2.vector[2]);

The same is true for s2.vector[2], which would begin at 0x100C.

        break;
    }
}

int main(void) {
    struct S1 * s1 = malloc(sizeof(*s1));
    if (s1 == NULL)
        exit(EXIT_FAILURE);
    *s1 = (struct S1) { .type = T_SCALAR, .scalar = 42 };

    print_u((union U *) s1);
}

----> end <-----
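
The offsets in the walk-through above can also be computed rather than
assumed. The following is a small sketch using offsetof and sizeof; the
helper names vector_elem_offset and elem_fits are made up for this
illustration, and the actual numbers depend on the implementation's
sizes and padding:

```c
#include <stddef.h>

enum Type { T_SCALAR, T_VECTOR };

struct S1 {
    enum Type type;
    int scalar;
};

struct S2 {
    enum Type type;
    int vector[3];
};

/* Byte offset of vector[i] from the start of a struct S2. Since every
   union member starts at the front of the union, this is also the
   offset of u->s2.vector[i] from u. */
size_t vector_elem_offset(size_t i)
{
    return offsetof(struct S2, vector) + i * sizeof(int);
}

/* Nonzero if vector[i] lies wholly inside a block of `allocated`
   bytes that starts where the struct starts. */
int elem_fits(size_t i, size_t allocated)
{
    return vector_elem_offset(i) + sizeof(int) <= allocated;
}
```

Calling elem_fits(i, sizeof(struct S1)) for i = 0, 1, 2 tells which
elements of s2.vector would still fall inside a block obtained from
malloc(sizeof(struct S1)). With a 4-byte int and no padding, only
element 0 does.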

There are several issues with this program.

* The cast "(union U *) s1": 6.3.2.3p7 allows this cast,
provided that the resulting pointer is correctly aligned for
the union. One should think this requirement to be fulfilled
because the value of s1 was returned by a successful call to
malloc. However, as Mark Brader has pointed out in
<[email protected]>, the wording of
7.20.3p1, "The pointer returned if the allocation succeeds is
suitably aligned so that it may be assigned to a pointer to any
type of object and then used to access such an object or an
array of such objects in the space allocated (until the space
is explicitly deallocated)", may be construed to imply that
malloc may return pointers not suitably aligned for types whose
size exceeds the allocated size. Is this still an accepted
interpretation of the wording of the standard?

Was it ever? malloc has no clue about what type of pointer the result
will be stored in nor the size of any object to be stored in the area.
That is part of the reason why the returned value must be properly
aligned for any object regardless of any mismatch between requested
size and actual size of the object.

* Strict aliasing and the access to u->s: the strict aliasing
rule laid down in 6.5p7, next-to-last item, allows the access
to u->s1 after the cast discussed above. Furthermore,
the "special guarantee" from 6.5.2.3p5 allows the access of
struct S in an object of type union U containing a struct S1.
Do 6.5p7 and 6.5.2.3p5 combine, making the access to u->s
legal?

* u is under-allocated for its type (assume sizeof(*u) >
sizeof(struct S1)). Does this, in itself, invoke UB? Clearly, an
assignment to u->s2 would be UB, caused by under-allocation.

As noted above, assignment is not necessary to invoke UB. Any
reference to any part of s2 that is not within the allocated area will
invoke UB.

(In the present case, the type of *u is const-qualified, so
this assignment is not possible. However, const-qualification
is not recursive, so with slightly more complicated structure
types, a UB assignment is possible.) But in the absence of
such explicit violations, may the compiler assume that the
non-NULL pointer u points to at least sizeof(*u) bytes, so that
UB may ensue anyway?

The compiler always assumes you are telling it the truth. You defined
u as a pointer to union. The compiler will generate any code
accessing the union as if that were true. Thus, the generated code will
cause UB when the size is insufficient, when the pointer is
indeterminate (prior to being assigned a value or after an allocated
area has been freed), when the pointer is NULL, and possibly others I
haven't thought of.
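
For contrast, here is a sketch of the variant that avoids the
under-allocation: the object is allocated as a full union U, and the tag
is read through u->s under the common-initial-sequence guarantee of
6.5.2.3p5. The types are those of the example program; the function
names new_scalar and first_value are made up for this illustration:

```c
#include <stdlib.h>

enum Type { T_SCALAR, T_VECTOR };

struct S  { enum Type type; };
struct S1 { enum Type type; int scalar; };
struct S2 { enum Type type; int vector[3]; };

union U {
    struct S s;
    struct S1 s1;
    struct S2 s2;
};

/* Allocate room for the whole union, then store the scalar member.
   Returns NULL if malloc fails; the caller frees the result. */
union U *new_scalar(int value)
{
    union U *u = malloc(sizeof *u);
    if (u != NULL)
        u->s1 = (struct S1) { .type = T_SCALAR, .scalar = value };
    return u;
}

/* Inspect the tag through the common initial sequence, then read
   only the member that was actually stored. */
int first_value(const union U *u)
{
    switch (u->s.type) {
    case T_SCALAR:
        return u->s1.scalar;
    case T_VECTOR:
        return u->s2.vector[0];
    }
    return 0;
}
```

This wastes sizeof(union U) - sizeof(struct S1) bytes per scalar, which
is exactly the cost the original question was trying to avoid.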
 

Alexander Klauer

Hi Barry,

Barry said:
If you make some reasonable assumptions, the answer remains
emphatically no.

OK. Moreover, I've just found 6.2.6.1p7 in N1256, which gives
the compiler explicit licence to store whatever it wants to
store in the surplus bytes in a union, which in turn implies
that it may assume all bytes of the union to be accessible,
even if the values of some of them may be unspecified. So the
answer is definitely no.

[snip example source]

Barry said:
Was it ever? malloc has no clue about what type of pointer
the result will be stored in nor the size of any object to be
stored in the area. That is part of the reason why the
returned value must be properly aligned for any object
regardless of any mismatch between requested size and actual
size of the object.

But isn't the size passed to malloc a clue? Suppose some
(hypothetical) implementation has a long double type with
sizeof(long double) == 16 and alignment requirement of 16,
whereas all other types have an alignment requirement of 8 or
less. Now assume, 8 bytes are malloc'd. Then malloc knows that
no long double is ever going to be stored in the allocated
space because its size is insufficient for a long double.
Hence, malloc is not required to return a 16-byte-aligned
pointer in this case.

Barry said:
As noted above, assignment is not necessary to invoke UB. Any
reference to any part of s2 that is not within the allocated
area will invoke UB.

Indeed, as does any other reference to any part of the union,
see above.

Barry said:
The compiler always assumes you are telling it the truth. You
defined u as a pointer to union. The compiler will generate any
code accessing the union as if that were true. Thus, the
generated code will cause UB when the size is insufficient, when
the pointer is indeterminate (prior to being assigned a value or
after an allocated area has been freed), when the pointer is
NULL, and possibly others I haven't thought of.

OK.

I guess that means that if a union type is used often, with
roughly uniform distribution of member usage, its members
should all be of roughly the same size. Or use the "pointer
with type information" approach.
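
A further option, not raised in the thread itself, drops the union
altogether: each structure embeds a common header as its first member,
is allocated at exactly its own size, and is reached through a pointer
to the header. This relies on the rule that a pointer to a structure,
suitably converted, points to its initial member and vice versa. A
hedged sketch, with all names (Header, Scalar, Vector, make_*, sum_of)
invented for the illustration:

```c
#include <stdlib.h>

enum Type { T_SCALAR, T_VECTOR };

/* Common header, embedded as the FIRST member of every variant. */
struct Header { enum Type type; };

struct Scalar { struct Header hdr; int scalar; };
struct Vector { struct Header hdr; int vector[3]; };

/* Each constructor allocates exactly sizeof its own variant and
   returns a pointer to the embedded header (NULL on failure). */
struct Header *make_scalar(int value)
{
    struct Scalar *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->hdr.type = T_SCALAR;
    s->scalar = value;
    return &s->hdr;
}

struct Header *make_vector(int x, int y, int z)
{
    struct Vector *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->hdr.type = T_VECTOR;
    v->vector[0] = x;
    v->vector[1] = y;
    v->vector[2] = z;
    return &v->hdr;
}

/* Check the tag first, then convert back to the enclosing type.
   The conversion is only valid for the variant that was created. */
int sum_of(const struct Header *h)
{
    switch (h->type) {
    case T_SCALAR:
        return ((const struct Scalar *)h)->scalar;
    case T_VECTOR: {
        const struct Vector *v = (const struct Vector *)h;
        return v->vector[0] + v->vector[1] + v->vector[2];
    }
    }
    return 0;
}
```

The header pointer has the same address as the block returned by
malloc, so it can be passed to free directly. No object is ever
accessed through a type larger than its allocation.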

Alexander
 

Eric Sosman

Barry said:
[...] That is part of the reason why the
returned value must be properly aligned for any object
regardless of any mismatch between requested size and actual
size of the object.

But isn't the size passed to malloc a clue? Suppose some
(hypothetical) implementation has a long double type with
sizeof(long double) == 16 and alignment requirement of 16,
whereas all other types have an alignment requirement of 8 or
less. Now assume, 8 bytes are malloc'd. Then malloc knows that
no long double is ever going to be stored in the allocated
space because its size is insufficient for a long double.
Hence, malloc is not required to return a 16-byte-aligned
pointer in this case.

Nonetheless, the `void*' returned by malloc() must be convertible
to a `long double*' and back without loss of information. If your
hypothetical implementation's `long double*' doesn't preserve the
low-order four bits ("you know what I mean") of the corresponding
`void*', then malloc(1) must return a 16-byte-aligned pointer. See
7.20.3p1, second sentence, and by "see" I don't mean "sort of recall
a paraphrase."
 

Alexander Klauer

Eric said:
Barry said:
[...] That is part of the reason why the
returned value must be properly aligned for any object
regardless of any mismatch between requested size and actual
size of the object.

But isn't the size passed to malloc a clue? Suppose some
(hypothetical) implementation has a long double type with
sizeof(long double) == 16 and alignment requirement of 16,
whereas all other types have an alignment requirement of 8 or
less. Now assume, 8 bytes are malloc'd. Then malloc knows
that no long double is ever going to be stored in the
allocated space because its size is insufficient for a long
double. Hence, malloc is not required to return a
16-byte-aligned pointer in this case.

Nonetheless, the `void*' returned by malloc() must be convertible
to a `long double*' and back without loss of information. If your
hypothetical implementation's `long double*' doesn't preserve the
low-order four bits ("you know what I mean") of the corresponding
`void*', then malloc(1) must return a 16-byte-aligned pointer. See
7.20.3p1, second sentence, and by "see" I don't mean "sort of recall
a paraphrase."

In my original post, I quoted that sentence from the N1256
document because I do not have the original standard. At least
compared to C89 the wording does not appear to have changed
except for the bit in parentheses. According to
<[email protected]> and its follow-up, both
views have been argued (though I don't know what specific
discussion(s) "It has been argued" refers to, sorry). To me, it
seems to hinge on whether "may be assigned to a pointer to any
type of object" is meant to be a standalone requirement, or to
be weakened by the later qualification "in the space
allocated".

Anyway, this is a minor nit which has little bearing on my
original question. My feeling is that any sane implementation
should simply use all-purpose alignment regardless of the
argument passed to malloc() and thus be conformant with both
interpretations.

Alexander
 

Eric Sosman

[...]
Anyway, this is a minor nit which has little bearing on my
original question. My feeling is that any sane implementation
should simply use all-purpose alignment regardless of the
argument passed to malloc() and thus be conformant with both
interpretations.

All the actual implementations I've seen behave this way (I have
not, of course, seen all actual implementations). One likely reason
is simplicity: If malloc() always delivers maximally-aligned memory,
there's nothing further to worry about.

Another likely reason is that being tricky probably has only a
small payoff: There's no gain at all unless malloc()'s argument is
less than the maximal alignment (call it M), and even then the
savings is strictly less than M. Since M is usually small -- 8 or
less, often -- a per-allocation savings of fewer than M bytes is not
likely to be worth a lot of effort and/or risk.
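
How a particular malloc() behaves for small requests can be observed
directly. The sketch below is only an experiment, not a conformance
test: it assumes that uintptr_t exists and that converting a pointer to
it yields something address-like, which is implementation-defined. The
function names and the 4096 cap are arbitrary choices for this
illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two, capped at 4096, that divides the integer
   value of the pointer. */
size_t address_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    size_t align = 1;

    /* Double the candidate while the next address bit is zero. */
    while (align < 4096 && (a & align) == 0)
        align <<= 1;
    return align;
}

/* Alignment observed for ONE allocation of n bytes, or 0 if malloc
   fails. A single sample may be more aligned than malloc promises. */
size_t malloc_alignment(size_t n)
{
    void *p = malloc(n);
    size_t align;

    if (p == NULL)
        return 0;
    align = address_alignment(p);
    free(p);
    return align;
}
```

Comparing malloc_alignment(1) with malloc_alignment(64) over many calls
gives a rough idea of whether the allocator hands out less-aligned
blocks for small sizes. A single call proves nothing either way.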
 