Can one get away with an under-allocated union?

Discussion in 'C Programming' started by Alexander Klauer, Dec 25, 2010.

  1. Hi,

    suppose I allocate space for a structure, can I safely interpret
    the allocated object as a union, even if the size of the space
    allocated is smaller than the size of the union type?

    This question appears to have come up before; at least I found
    the (very old) threads

    "struct pointer casting", c.std.c, 1993/03/22
    http://groups.google.com/group/comp.lang.c/browse_thread/thread/1c12a4c6afb312a4

    "Union and malloc", c.l.c, 1998/08/22
    http://groups.google.com/group/comp.std.c/browse_thread/thread/960a336931f63f02

    and the answer appears to lean towards "No". Does the C99
    perspective change anything (I have the N1256 draft)? In order
    to create some practical ground for discussion, consider the
    following C99 program:

    -----> start <-----

    #include <stdio.h>
    #include <stdlib.h>

    enum Type {
        T_SCALAR,
        T_VECTOR
    };

    struct S {
        enum Type type;
    };

    struct S1 {
        enum Type type;
        int scalar;
    };

    struct S2 {
        enum Type type;
        int vector[3];
    };

    union U {
        struct S s;
        struct S1 s1;
        struct S2 s2;
    };

    void print_u(const union U * u) {
        switch (u->s.type) {
        case T_SCALAR:
            printf("%d\n", u->s1.scalar);
            break;
        case T_VECTOR:
            printf("(%d,%d,%d)\n",
                   u->s2.vector[0],
                   u->s2.vector[1],
                   u->s2.vector[2]);
            break;
        }
    }

    int main(void) {
        struct S1 * s1 = malloc(sizeof(*s1));
        if (s1 == NULL)
            exit(EXIT_FAILURE);
        *s1 = (struct S1) { .type = T_SCALAR, .scalar = 42 };

        print_u((union U *) s1);
    }

    ----> end <-----

    There are several issues with this program.

    * The cast "(union U *) s1": 6.3.2.3p7 allows this cast,
    provided that the resulting pointer is correctly aligned for
    the union. One should think this requirement to be fulfilled
    because the value of s1 was returned by a successful call to
    malloc. However, as Mark Brader has pointed out in
    <>, the wording of
    7.20.3p1, "The pointer returned if the allocation succeeds is
    suitably aligned so that it may be assigned to a pointer to any
    type of object and then used to access such an object or an
    array of such objects in the space allocated (until the space
    is explicitly deallocated)", may be construed to imply that
    malloc may return pointers not suitably aligned for types whose
    size exceeds the allocated size. Is this still an accepted
    interpretation of the wording of the standard?

    * Strict aliasing and the access to u->s: the strict aliasing
    rule laid down in 6.5p7, next-to-last item, allows the access
    to u->s1 after the cast discussed above. Furthermore,
    the "special guarantee" from 6.5.2.3p5 allows the access of
    struct S in an object of type union U containing a struct S1.
    Do 6.5p7 and 6.5.2.3p5 combine, making the access to u->s
    legal?

    * u is under-allocated for its type (assume sizeof(*u) >
    sizeof(struct S1)). Does this, in itself, invoke UB? Clearly, an
    assignment to u->s2 would be UB, caused by under-allocation.
    (In the present case, the type of *u is const-qualified, so
    this assignment is not possible. However, const-qualification
    is not recursive, so with slightly more complicated structure
    types, a UB assignment is possible.) But in the absence of
    such explicit violations, may the compiler assume that the
    non-NULL pointer u points to at least sizeof(*u) bytes, so that
    UB may ensue anyway?

    The reason I ask this question is that I have the following
    situation (which I think is fairly common, but I may be wrong):
    I have a list of pointers to objects of different sizes. When I
    retrieve a pointer, I want to know what type of data it points
    to, and then operate on that data accordingly. The natural
    solution appears to be using a union type. But allocating an
    entire union for each object is wasteful.

    There is, of course, a simple workaround. Just replace each

    struct SomeStruct {
        enum Type type;
        /* lots of members */
    };

    with

    struct SomeStructReal {
        /* lots of members */
    };

    struct SomeStruct {
        enum Type type;
        struct SomeStructReal * p;
    };

    and then allocate space for struct SomeStructReal and the union
    in which objects of type struct SomeStruct and similar reside.
    But isn't this a little unnatural? In other words: if
    under-allocating unions leads to undefined behaviour, are there
    any actual implementations exhibiting unintended behaviour in
    such a case? If not, the standard should IMHO be fixed to make
    such use of unions well-defined. Or is there any compelling
    reason the standard makes under-allocating unions undefined (if
    it does)?
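
A minimal sketch of one possible reading of this workaround, for illustration only. The names Handle, ScalarReal, VectorReal, make_scalar, make_vector and format_handle are hypothetical and do not appear in the post, and a single void * member stands in for the per-type payload pointers (a simplification, not the exact layout described above). The point it shows is that the tagged handle has one fixed size while each payload is allocated at exactly its own size, so nothing is ever viewed through a type larger than its allocation.

```c
#include <stdio.h>
#include <stdlib.h>

enum Type { T_SCALAR, T_VECTOR };

/* Payloads carry no tag and are allocated at their own exact size. */
struct ScalarReal { int scalar; };
struct VectorReal { int vector[3]; };

/* Fixed-size handle: a tag plus a pointer to the payload.
   Hypothetical simplification: one void * instead of typed pointers. */
struct Handle {
    enum Type type;
    void *p;            /* ScalarReal or VectorReal, selected by type */
};

/* Builds a scalar handle; p stays NULL if malloc fails. */
struct Handle make_scalar(int value) {
    struct Handle h = { T_SCALAR, NULL };
    struct ScalarReal *s = malloc(sizeof *s);
    if (s != NULL) {
        s->scalar = value;
        h.p = s;
    }
    return h;
}

/* Builds a vector handle; p stays NULL if malloc fails. */
struct Handle make_vector(int x, int y, int z) {
    struct Handle h = { T_VECTOR, NULL };
    struct VectorReal *v = malloc(sizeof *v);
    if (v != NULL) {
        v->vector[0] = x;
        v->vector[1] = y;
        v->vector[2] = z;
        h.p = v;
    }
    return h;
}

/* Formats the payload into buf; returns snprintf's result, or -1 if
   the handle is empty or the tag is unknown. */
int format_handle(const struct Handle *h, char *buf, size_t n) {
    if (h->p == NULL)
        return -1;
    switch (h->type) {
    case T_SCALAR: {
        const struct ScalarReal *s = h->p;
        return snprintf(buf, n, "%d", s->scalar);
    }
    case T_VECTOR: {
        const struct VectorReal *v = h->p;
        return snprintf(buf, n, "(%d,%d,%d)",
                        v->vector[0], v->vector[1], v->vector[2]);
    }
    }
    return -1;
}
```

A caller would build a handle with make_scalar(42), pass its address to format_handle together with a character buffer, and release the payload with free(h.p) when done. The cost is the extra indirection and the second allocation that the post calls "a little unnatural".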

    Finally, if I am right in surmising that my situation is common,
    maybe this question should go into the FAQ?

    Alexander
     
    Alexander Klauer, Dec 25, 2010
    #1

  2. On Sat, 25 Dec 2010 20:13:46 +0100, Alexander Klauer
    <> wrote:

    >Hi,
    >
    >suppose I allocate space for a structure, can I safely interpret
    >the allocated object as a union, even if the size of the space
    >allocated is smaller than the size of the union type?
    >
    >This question appears to have come up before; at least I found
    >the (very old) threads
    >
    >"struct pointer casting", c.std.c, 1993/03/22
    >http://groups.google.com/group/comp.lang.c/browse_thread/thread/1c12a4c6afb312a4
    >
    >"Union and malloc", c.l.c, 1998/08/22
    >http://groups.google.com/group/comp.std.c/browse_thread/thread/960a336931f63f02
    >
    >and the answer appears to lean towards "No". Does the C99
    >perspective change anything (I have the N1256 draft)? In order
    >to create some practical ground for discussion, consider the
    >following C99 program:


    If you make some reasonable assumptions, the answer remains
    emphatically no.

    >
    >-----> start <-----
    >
    >#include<stdio.h>
    >#include<stdlib.h>
    >
    >enum Type {
    > T_SCALAR,
    > T_VECTOR
    >};


    These constants have type int. Assume sizeof(int) is 4.

    >
    >struct S {
    > enum Type type;
    >};


    While it is possible for the compiler to decide that type should be a
    char or short, it only makes a difference if the compiler decides to
    make type a long or long long. Let's assume it is an int.

    While terminal padding is allowed, most compilers will set the
    sizeof(struct S) to the sizeof(enum Type) which is 4 for this example.

    >
    >struct S1 {
    > enum Type type;
    > int scalar;
    >};


    Similarly, sizeof(struct S1) usually will be 8.

    >
    >struct S2 {
    > enum Type type;
    > int vector[3];
    >};


    And sizeof(struct S2) will be 12.

    >
    >union U {
    > struct S s;
    > struct S1 s1;
    > struct S2 s2;
    >};
    >
    >void print_u(const union U * u) {


    For the sake of the example, let u contain the value 0x1000. It
    points to an allocated area of 8 bytes.

    > switch (u->s.type) {


    Since all the members of the union begin at the front of the union, s
    begins at 0x1000. Since the first member of a struct begins at the
    front of the struct, s.type also begins at 0x1000.

    > case T_SCALAR:


    This is the only code that executes based on main below.

    > printf("%d\n", u->s1.scalar);


    s1.scalar will begin at 0x1004.

    > break;
    > case T_VECTOR:


    If this code were to execute,

    > printf("(%d,%d,%d)\n",
    > u->s2.vector[0],


    s2.vector[0] would begin at 0x1004.

    > u->s2.vector[1],


    But s2.vector[1] would begin at 0x1008 which is not part of the
    allocated memory.

    > u->s2.vector[2]);


    The same is true for s2.vector[2] which would begin at 0x100C.

    > break;
    > }
    >}
    >
    >int main(void) {
    > struct S1 * s1 = malloc(sizeof(*s1));
    > if (s1 == NULL)
    > exit(EXIT_FAILURE);
    > *s1 = (struct S1) { .type = T_SCALAR, .scalar = 42 };
    >
    > print_u((union U *) s1);
    >}
    >
    >----> end <-----
    >
    >There are several issues with this program.
    >
    >* The cast "(union U *) s1": 6.3.2.3p7 allows this cast,
    >provided that the resulting pointer is correctly aligned for
    >the union. One should think this requirement to be fulfilled
    >because the value of s1 was returned by a successful call to
    >malloc. However, as Mark Brader has pointed out in
    ><>, the wording of
    >7.20.3p1, "The pointer returned if the allocation succeeds is
    >suitably aligned so that it may be assigned to a pointer to any
    >type of object and then used to access such an object or an
    >array of such objects in the space allocated (until the space
    >is explicitly deallocated)", may be construed to imply that
    >malloc may return pointers not suitably aligned for types whose
    >size exceeds the allocated size. Is this still an accepted
    >interpretation of the wording of the standard?


    Was it ever? malloc has no clue about what type of pointer the result
    will be stored in nor the size of any object to be stored in the area.
    That is part of the reason why the returned value must be properly
    aligned for any object regardless of any mismatch between requested
    size and actual size of the object.

    >
    >* Strict aliasing and the access to u->s: the strict aliasing
    >rule laid down in 6.5p7, next-to-last item, allows the access
    >to u->s1 after the cast discussed above. Furthermore,
    >the "special guarantee" from 6.5.2.3p5 allows the access of
    >struct S in an object of type union U containing a struct S1.
    >Do 6.5p7 and 6.5.2.3p5 combine, making the access to u->s
    >legal?
    >
    >* u is under-allocated for its type (assume sizeof(*u) >
    >sizeof(struct S1)). Does this, in itself, evoke UB? Clearly, an
    >assignment to u->s2 would be UB, caused by under-allocation.


    As noted above, assignment is not necessary to invoke UB. Any
    reference to any part of s2 that is not within the allocated area will
    invoke UB.

    >(In the present case, the type of *u is const-qualified, so
    >this assignment is not possible. However, const-qualification
    >is not recursive, so with slightly more complicated structure
    >types, an UB assignment is possible.) But in the absence of
    >such explicit violations, may the compiler assume, the non-NULL
    >pointer u points to at least sizeof(*u) bytes and thus may UB
    >ensue?


    The compiler always assumes you are telling it the truth. You defined
    u as a pointer to union. The compiler will generate any code
    accessing the union as if that were true. Thus, the generated code
    will cause UB when the size is insufficient, when the pointer is
    indeterminate (prior to being assigned a value or after an allocated
    area has been freed), when the pointer is NULL, and possibly others I
    haven't thought of.
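
The layout argument made earlier in this reply can be checked on a given implementation with sizeof and offsetof. What follows is only a sketch: the helper names fits, vector_elem_fits and print_layout are hypothetical, and the sizes 4, 8 and 12 used in the walkthrough are assumptions about a typical implementation, not guarantees of the standard. The code computes where each member would start and whether it lies inside an allocation of sizeof(struct S1) bytes.

```c
#include <stddef.h>
#include <stdio.h>

enum Type { T_SCALAR, T_VECTOR };
struct S  { enum Type type; };
struct S1 { enum Type type; int scalar; };
struct S2 { enum Type type; int vector[3]; };
union U   { struct S s; struct S1 s1; struct S2 s2; };

/* Returns 1 if the size bytes starting offset bytes into a block lie
   entirely within the first allocated bytes of that block. */
int fits(size_t offset, size_t size, size_t allocated) {
    return offset <= allocated && size <= allocated - offset;
}

/* Returns 1 if u->s2.vector[i] would lie within an allocation of
   allocated bytes viewed through a union U pointer. Every union
   member starts at offset 0, so only the offset inside S2 matters. */
int vector_elem_fits(size_t i, size_t allocated) {
    size_t offset = offsetof(struct S2, vector) + i * sizeof(int);
    return fits(offset, sizeof(int), allocated);
}

/* Prints the sizes and which members fall inside an allocation of
   sizeof(struct S1) bytes, as in the program under discussion. */
void print_layout(void) {
    size_t allocated = sizeof(struct S1);
    size_t i;

    printf("sizeof: S=%zu S1=%zu S2=%zu U=%zu\n",
           sizeof(struct S), sizeof(struct S1),
           sizeof(struct S2), sizeof(union U));
    printf("s1.scalar at offset %zu: %s\n",
           offsetof(struct S1, scalar),
           fits(offsetof(struct S1, scalar), sizeof(int), allocated)
               ? "inside" : "outside");
    for (i = 0; i < 3; i++)
        printf("s2.vector[%zu] at offset %zu: %s\n", i,
               offsetof(struct S2, vector) + i * sizeof(int),
               vector_elem_fits(i, allocated) ? "inside" : "outside");
}
```

On an implementation matching the assumptions above (4-byte int and enum, no padding), print_layout() would be expected to report s1.scalar and s2.vector[0] as inside the 8-byte allocation and s2.vector[1] and s2.vector[2] as outside it.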

    >
    >The reason I ask this question is that I have the following
    >situation (which I think is fairly common, but I may be wrong):
    >I have a list of pointers to objects of different sizes. When I
    >retrieve a pointer, I want to know what type of data it points
    >to, and then operate on that data accordingly. The natural
    >solution appears to be using a union type. But allocating an
    >entire union for each object is wasteful.
    >
    >There is, of course, a simple workaround. Just replace each
    >
    >struct SomeStruct {
    > enum Type type;
    > /* lots of members */
    >};
    >
    >with
    >
    >struct SomeStructReal {
    > /* lots of members */
    >};
    >
    >struct SomeStruct {
    > enum Type type;
    > struct SomeStructReal * p;
    >};
    >
    >and then allocate space for struct SomeStructReal and the union
    >in which objects of type struct SomeStruct and similar reside.
    >But isn't this a little unnatural? In other words: if
    >under-allocating unions leads to undefined behaviour, are there
    >any actual implementations exhibiting unintended behaviour in
    >such a case? If not, the standard should IMHO be fixed to make
    >such use of unions well-defined. Or is there any compelling
    >reason the standard makes under-allocating unions undefined (if
    >it does)?
    >
    >Finally, if I am right in surmising that my situation is common,
    >maybe this question should go into the FAQ?
    >
    >Alexander


    --
    Remove del for email
     
    Barry Schwarz, Dec 26, 2010
    #2

  3. Hi Barry,

    Barry Schwarz wrote:
    > On Sat, 25 Dec 2010 20:13:46 +0100, Alexander Klauer
    > <> wrote:
    >
    >>Hi,
    >>
    >>suppose I allocate space for a structure, can I safely
    >>interpret the allocated object as a union, even if the size of
    >>the space allocated is smaller than the size of the union
    >>type?
    >>
    >>This question appears to have come up before; at least I found
    >>the (very old) threads
    >>
    >>"struct pointer casting", c.std.c, 1993/03/22
    >>http://groups.google.com/group/comp.lang.c/browse_thread/thread/1c12a4c6afb312a4
    >>
    >>"Union and malloc", c.l.c, 1998/08/22
    >>http://groups.google.com/group/comp.std.c/browse_thread/thread/960a336931f63f02
    >>
    >>and the answer appears to lean towards "No". Does the C99
    >>perspective change anything (I have the N1256 draft)? In order
    >>to create some practical ground for discussion, consider the
    >>following C99 program:

    >
    > If you make some reasonable assumptions, the answer remains
    > emphatically no.


    OK. Moreover, I've just found 6.2.6.1p7 in N1256, which gives
    the compiler explicit licence to store whatever it wants to
    store in the surplus bytes in a union, which in turn implies
    that it may assume all bytes of the union to be accessible,
    even if the values of some of them may be unspecified. So the
    answer is definitely no.
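
For contrast, here is a minimal sketch of the variant that avoids the problem: allocate sizeof(union U) bytes, so that any surplus bytes the compiler touches belong to the allocation. The names new_scalar and format_u are hypothetical; format_u mirrors print_u from the original program but writes into a buffer instead of stdout. This trades memory for well-definedness, which is exactly the waste the original question hoped to avoid.

```c
#include <stdio.h>
#include <stdlib.h>

enum Type { T_SCALAR, T_VECTOR };
struct S  { enum Type type; };
struct S1 { enum Type type; int scalar; };
struct S2 { enum Type type; int vector[3]; };
union U   { struct S s; struct S1 s1; struct S2 s2; };

/* Allocates a complete union U and stores a scalar in it.
   Returns NULL if malloc fails. */
union U *new_scalar(int value) {
    union U *u = malloc(sizeof *u);     /* full union, not sizeof(struct S1) */
    if (u != NULL)
        u->s1 = (struct S1){ .type = T_SCALAR, .scalar = value };
    return u;
}

/* Formats the active member into buf; returns snprintf's result,
   or -1 for an unknown tag. */
int format_u(const union U *u, char *buf, size_t n) {
    /* Reading s.type relies on the common initial sequence that
       struct S shares with struct S1 and struct S2. */
    switch (u->s.type) {
    case T_SCALAR:
        return snprintf(buf, n, "%d", u->s1.scalar);
    case T_VECTOR:
        return snprintf(buf, n, "(%d,%d,%d)",
                        u->s2.vector[0],
                        u->s2.vector[1],
                        u->s2.vector[2]);
    }
    return -1;
}
```

A caller would obtain an object with new_scalar(42), format it with format_u, and release it with free.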

    [snip example source]

    >>There are several issues with this program.
    >>
    >>* The cast "(union U *) s1": 6.3.2.3p7 allows this cast,
    >>provided that the resulting pointer is correctly aligned for
    >>the union. One should think this requirement to be fulfilled
    >>because the value of s1 was returned by a successful call to
    >>malloc. However, as Mark Brader has pointed out in
    >><>, the wording of
    >>7.20.3p1, "The pointer returned if the allocation succeeds is
    >>suitably aligned so that it may be assigned to a pointer to
    >>any type of object and then used to access such an object or
    >>an array of such objects in the space allocated (until the
    >>space is explicitly deallocated)", may be construed to imply
    >>that malloc may return pointers not suitably aligned for types
    >>whose size exceeds the allocated size. Is this still an
    >>accepted interpretation of the wording of the standard?

    >
    > Was it ever? malloc has no clue about what type of pointer
    > the result will be stored in nor the size of any object to be
    > stored in the area. That is part of the reason why the
    > returned value must be properly aligned for any object
    > regardless of any mismatch between requested size and actual
    > size of the object.


    But isn't the size passed to malloc a clue? Suppose some
    (hypothetical) implementation has a long double type with
    sizeof(long double) == 16 and alignment requirement of 16,
    whereas all other types have an alignment requirement of 8 or
    less. Now assume, 8 bytes are malloc'd. Then malloc knows that
    no long double is ever going to be stored in the allocated
    space because its size is insufficient for a long double.
    Hence, malloc is not required to return a 16-byte-aligned
    pointer in this case.

    >>* Strict aliasing and the access to u->s: the strict aliasing
    >>rule laid down in 6.5p7, next-to-last item, allows the access
    >>to u->s1 after the cast discussed above. Furthermore,
    >>the "special guarantee" from 6.5.2.3p5 allows the access of
    >>struct S in an object of type union U containing a struct S1.
    >>Do 6.5p7 and 6.5.2.3p5 combine, making the access to u->s
    >>legal?
    >>
    >>* u is under-allocated for its type (assume sizeof(*u) >
    >>sizeof(struct S1)). Does this, in itself, evoke UB? Clearly,
    >>an assignment to u->s2 would be UB, caused by
    >>under-allocation.

    >
    > As noted above, assignment is not necessary to invoke UB. Any
    > reference to any part of s2 that is not within the allocated
    > area will invoke UB.


    Indeed, as does any other reference to any part of the union,
    see above.

    >>(In the present case, the type of *u is const-qualified, so
    >>this assignment is not possible. However, const-qualification
    >>is not recursive, so with slightly more complicated structure
    >>types, an UB assignment is possible.) But in the absence of
    >>such explicit violations, may the compiler assume, the
    >>non-NULL pointer u points to at least sizeof(*u) bytes and
    >>thus may UB ensue?

    >
    > The compiler always assumes you are telling it the truth. You
    > defined u as a pointer to union. The compiler will generate any
    > code accessing the union as if that were true. Thus, the
    > generated code will cause UB when the size is insufficient, when
    > the pointer is indeterminate (prior to being assigned a value or
    > after an allocated area has been freed), when the pointer is
    > NULL, and possibly others I haven't thought of.


    OK.

    I guess that means that if a union type is used often, with
    roughly uniform distribution of member usage, its members
    should all be of roughly the same size. Or use the "pointer
    with type information" approach.

    Alexander
     
    Alexander Klauer, Dec 26, 2010
    #3
  4. On 12/26/2010 3:27 PM, Alexander Klauer wrote:
    >
    > Barry Schwarz wrote:
    >> [...] That is part of the reason why the
    >> returned value must be properly aligned for any object
    >> regardless of any mismatch between requested size and actual
    >> size of the object.

    >
    > But isn't the size passed to malloc a clue? Suppose some
    > (hypothetical) implementation has a long double type with
    > sizeof(long double) == 16 and alignment requirement of 16,
    > whereas all other types have an alignment requirement of 8 or
    > less. Now assume, 8 bytes are malloc'd. Then malloc knows that
    > no long double is ever going to be stored in the allocated
    > space because its size is insufficient for a long double.
    > Hence, malloc is not required to return a 16-byte-aligned
    > pointer in this case.


    Nonetheless, the `void*' returned by malloc() must be convertible
    to a `long double*' and back without loss of information. If your
    hypothetical implementation's `long double*' doesn't preserve the
    low-order four bits ("you know what I mean") of the corresponding
    `void*', then malloc(1) must return a 16-byte-aligned pointer. See
    7.20.3p1, second sentence, and by "see" I don't mean "sort of recall
    a paraphrase."

    --
    Eric Sosman
     
    Eric Sosman, Dec 26, 2010
    #4
  5. Eric Sosman wrote:
    > On 12/26/2010 3:27 PM, Alexander Klauer wrote:
    >>
    >> Barry Schwarz wrote:
    >>> [...] That is part of the reason why the
    >>> returned value must be properly aligned for any object
    >>> regardless of any mismatch between requested size and actual
    >>> size of the object.

    >>
    >> But isn't the size passed to malloc a clue? Suppose some
    >> (hypothetical) implementation has a long double type with
    >> sizeof(long double) == 16 and alignment requirement of 16,
    >> whereas all other types have an alignment requirement of 8 or
    >> less. Now assume, 8 bytes are malloc'd. Then malloc knows
    >> that no long double is ever going to be stored in the
    >> allocated space because its size is insufficient for a long
    >> double. Hence, malloc is not required to return a
    >> 16-byte-aligned pointer in this case.

    >
    > Nonetheless, the `void*' returned by malloc() must be
    > convertible
    > to a `long double*' and back without loss of information. If
    > your hypothetical implementation's `long double*' doesn't
    > preserve the low-order four bits ("you know what I mean") of
    > the corresponding
    > `void*', then malloc(1) must return a 16-byte-aligned pointer.
    > See 7.20.3p1, second sentence, and by "see" I don't mean
    > "sort of recall a paraphrase."


    In my original post, I quoted that sentence from the N1256
    document because I do not have the original standard. At least
    compared to C89 the wording does not appear to have changed
    except for the bit in parentheses. According to
    <> and its follow-up, both
    views have been argued (though I don't know what specific
    discussion(s) "It has been argued" refers to, sorry). To me, it
    seems to hinge on whether "may be assigned to a pointer to any
    type of object" is meant to be a standalone requirement, or to
    be weakened by the later qualification "in the space
    allocated".

    Anyway, this is a minor nit which has lost its bearing with my
    original question. My feeling is that any sane implementation
    should simply use all-purpose alignment regardless of the
    argument passed to malloc() and thus be conformant with both
    interpretations.

    Alexander
     
    Alexander Klauer, Dec 26, 2010
    #5
  6. On 12/26/2010 5:11 PM, Alexander Klauer wrote:
    > [...]
    > Anyway, this is a minor nit which has lost its bearing with my
    > original question. My feeling is that any sane implementation
    > should simply use all-purpose alignment regardless of the
    > argument passed to malloc() and thus be conformant with both
    > interpretations.


    All the actual implementations I've seen behave this way (I have
    not, of course, seen all actual implementations). One likely reason
    is simplicity: If malloc() always delivers maximally-aligned memory,
    there's nothing further to worry about.

    Another likely reason is that being tricky probably has only a
    small payoff: There's no gain at all unless malloc()'s argument is
    less than the maximal alignment (call it M), and even then the
    savings is strictly less than M. Since M is usually small -- 8 or
    less, often -- a per-allocation savings of fewer than M bytes is not
    likely to be worth a lot of effort and/or risk.
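
What a particular malloc() does can be observed with a small probe. The sketch below stays within C99, which has no direct way to name the maximal alignment M, so it estimates M from the offset of a union of a few strictly aligned types; that estimate, the conversion of pointers to uintptr_t (whose mapping is implementation-defined and whose existence is optional), and the names MaxAlign, AlignProbe, MAX_ALIGN, is_aligned and malloc_is_max_aligned are all assumptions made for illustration. A result from one run says nothing about what the standard requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for "the most strictly aligned type". */
union MaxAlign {
    long long ll;
    long double ld;
    void *p;
    void (*fp)(void);
};

/* The offset of m after a single char estimates its alignment. */
struct AlignProbe {
    char c;
    union MaxAlign m;
};

#define MAX_ALIGN offsetof(struct AlignProbe, m)

/* Returns 1 if p, converted to an integer, is a multiple of align. */
int is_aligned(const void *p, size_t align) {
    return (uintptr_t)p % align == 0;
}

/* Allocates n bytes and reports whether the result is aligned to
   MAX_ALIGN: 1 if so, 0 if not, -1 if malloc returned NULL. */
int malloc_is_max_aligned(size_t n) {
    void *p = malloc(n);
    int ok;

    if (p == NULL)
        return -1;
    ok = is_aligned(p, MAX_ALIGN);
    free(p);
    return ok;
}
```

Calling malloc_is_max_aligned(1) is the interesting case for this subthread: an implementation that always hands out all-purpose alignment returns 1 here, while one that economizes on small requests may return 0.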

    --
    Eric Sosman
     
    Eric Sosman, Dec 26, 2010
    #6
