[union] Are pointers to inherited structs valid?

Discussion in 'C Programming' started by Maciej Labanowicz, Jan 1, 2013.

  1. Hi,

    Please analyze the following example:

    /*--[beg:test.c]-------------------------------------------------*/
    01:
    02: #include <stdio.h> /* printf */
    03: #include <stdlib.h> /* EXIT_SUCCESS */
    04:
    05: struct a_s { int x; };
    06: struct b_s { struct a_s super; int y; };
    07: struct c_s { struct b_s super; int z; };
    08:
    09: union common_u {
    10:     struct a_s * ptr_a;
    11:     struct b_s * ptr_b;
    12:     struct c_s * ptr_c;
    13: };
    14:
    15: int main(void)
    16: {
    17:     struct c_s c;
    18:     union common_u common;
    19:
    20:     ((struct a_s *)(&c))->x = 5;
    21:     ((struct b_s *)(&c))->y = 6;
    22:     c.z = 7;
    23:
    24:     printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    25:
    26:     common.ptr_c = &c;
    27:     common.ptr_c->z += 10;
    28:
    29:     common.ptr_a->x += 20;
    30:     common.ptr_b->y += 30;
    31:
    32:     printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    33:
    34:     return EXIT_SUCCESS;
    35: }
    /*--[eof:test.c]-------------------------------------------------*/

    /*--[beg:output]-------------------------------------------------*/
    01: x=5,y=6,z=7
    02: x=25,y=36,z=17
    /*--[eof:output]-------------------------------------------------*/

    There are structs that implement inheritance of members:

    a_s
    |
    +b_s
    |
    +c_s

    So, casts in lines 20,21 are valid in C.
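
    A minimal sketch of why those casts work: a pointer to a struct,
    suitably converted, points to its first member, so each nested
    'super' member sits at offset 0. The helper names below are
    hypothetical and are not part of test.c.

```c
#include <assert.h>
#include <stddef.h> /* offsetof */

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Returns 1 when each nested 'super' member starts at offset 0. */
int first_member_offsets_are_zero(void)
{
    return offsetof(struct b_s, super) == 0
        && offsetof(struct c_s, super) == 0;
}

/* Writes through the converted pointers, then reads the members by name. */
int cast_roundtrip(void)
{
    struct c_s c = { { { 0 }, 0 }, 0 };

    ((struct a_s *)&c)->x = 5;
    ((struct b_s *)&c)->y = 6;

    /* The address of c and of its innermost first member coincide. */
    assert((void *)&c == (void *)&c.super.super);

    return c.super.super.x == 5 && c.super.y == 6;
}
```

    Both helpers are expected to return 1 on a conforming implementation.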

    'union common_u' contains pointers to all of those structs.

    Line 26 contains assignment of address of 'c' (leaf in the tree) to
    union member: ptr_c.

    So 'common.ptr_c' pointer is valid (line 27 is correct).

    The question is:
    Are the rest of the union pointers valid (ANSI-C/ISO/C89)?
    That is, common.ptr_b and common.ptr_a (lines: 29,30)

    Best Regards

    --
    Maciek
     
    Maciej Labanowicz, Jan 1, 2013
    #1

  2. On Tue, 1 Jan 2013 03:45:48 -0800 (PST), Maciej Labanowicz
    <> wrote:

    >Hi,
    >
    >Please analyze following example:
    >
    >/*--[beg:test.c]-------------------------------------------------*/
    >01:
    >02: #include <stdio.h> /* printf */
    >03: #include <stdlib.h> /* EXIT_SUCCESS */
    >04:
    >05: struct a_s { int x; };
    >06: struct b_s { struct a_s super; int y; };
    >07: struct c_s { struct b_s super; int z; };
    >08:
    >09: union common_u {
    >10: struct a_s * ptr_a;
    >11: struct b_s * ptr_b;
    >12: struct c_s * ptr_c;
    >13: };
    >14:
    >15: int main(void)
    >16: {
    >17: struct c_s c;
    >18: union common_u common;
    >19:
    >20: ((struct a_s *)(&c))->x = 5;
    >21: ((struct b_s *)(&c))->y = 6;
    >22: c.z = 7;
    >23:
    >24: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    >25:
    >26: common.ptr_c = &c;
    >27: common.ptr_c->z += 10;
    >28:
    >29: common.ptr_a->x += 20;
    >30: common.ptr_b->y += 30;
    >31:
    >32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    >33:
    >34: return EXIT_SUCCESS;
    >35: }
    >/*--[eof:test.c]-------------------------------------------------*/
    >
    >/*--[beg:output]-------------------------------------------------*/
    >01: x=5,y=6,z=7
    >02: x=25,y=36,z=17
    >/*--[eof:output]-------------------------------------------------*/
    >
    >There are structs that implements inheritance of members:
    >
    > a_s
    > |
    > +b_s
    > |
    > +c_s
    >
    >So, casts in lines 20,21 are valid in C.
    >
    >'union common_u' contains pointers to all of those structs.
    >
    >Line 26 contains assignment of address of 'c' (leaf in the tree) to
    >union member: ptr_c.
    >
    >So 'common.ptr_c' pointer is valid (line 27 is correct).
    >
    >Question is:
    > Is the rest of the union pointers are valid (ANSI-C/ISO/C89) ?
    > common.ptr_b and common.ptr_a (lines: 29,30)


    Assuming N1570 is still current in this area, look at footnote 95.

    BTW, in the real world this code justifies terminating employment.

    --
    Remove del for email
     
    Barry Schwarz, Jan 2, 2013
    #2

  3. On 1/1/2013 06:45, Maciej Labanowicz wrote:
    >[...]
    > 04:
    > 05: struct a_s { int x; };
    > 06: struct b_s { struct a_s super; int y; };
    > 07: struct c_s { struct b_s super; int z; };
    > 08:
    > 09: union common_u {
    > 10: struct a_s * ptr_a;
    > 11: struct b_s * ptr_b;
    > 12: struct c_s * ptr_c;
    > 13: };
    > 14:
    > 15: int main(void)
    > 16: {
    > 17: struct c_s c;
    > 18: union common_u common;
    > [...]
    > 26: common.ptr_c = &c;
    > 27: common.ptr_c->z += 10;
    > 28:
    > 29: common.ptr_a->x += 20;
    > 30: common.ptr_b->y += 30;
    > 31:
    > 32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    > 33:
    > 34: return EXIT_SUCCESS;
    > 35: }
    > [...]
    >
    >
    > Question is:
    > Is the rest of the union pointers are valid (ANSI-C/ISO/C89) ?
    > common.ptr_b and common.ptr_a (lines: 29,30)


    You appear to be type-punning the value of the 'ptr_c' member as a
    'struct a_s *' on line 29 and as a 'struct b_s *' on line 30.

    In C89, we can see this:

    "A pointer to void shall have the same representation and alignment
    requirements as a pointer to a character type. Similarly, pointers to
    qualified or unqualified versions of compatible types shall have the
    same representation and alignment requirements. Pointers to other
    types need not have the same representation or alignment requirements."

    so my answer to your question would be "no". However in practice, it's
    probably always going to work. In C99, we can see this:

    "A pointer to void shall have the same representation and alignment
    requirements as a pointer to a character type.39) Similarly, pointers to
    qualified or unqualified versions of compatible types shall have the
    same representation and alignment requirements. All pointers to
    structure types shall have the same representation and alignment
    requirements as each other. All pointers to union types shall have the
    same representation and alignment requirements as each other. Pointers
    to other types need not have the same representation or alignment
    requirements.

    39) The same representation and alignment requirements are meant to
    imply interchangeability as arguments to functions, return values from
    functions, and members of unions."

    But the implementation's actual pointer representation could be
    complicated and so there's still no guarantee. If you can dream up a
    pointer representation, then you can dream up a counter-example to your
    code's portability.

    - Shao Miller
     
    Shao Miller, Jan 2, 2013
    #3
  4. Barry Schwarz <> writes:

    > On Tue, 1 Jan 2013 03:45:48 -0800 (PST), Maciej Labanowicz
    > <> wrote: [condensed]
    >>
    >>05: struct a_s { int x; };
    >>06: struct b_s { struct a_s super; int y; };
    >>07: struct c_s { struct b_s super; int z; };
    >>08:
    >>09: union common_u {
    >>10: struct a_s * ptr_a;
    >>11: struct b_s * ptr_b;
    >>12: struct c_s * ptr_c;
    >>13: };
    >>14:


    >>18: union common_u common;


    >>26: common.ptr_c = &c;
    >>27: common.ptr_c->z += 10;
    >>28:
    >>29: common.ptr_a->x += 20;
    >>30: common.ptr_b->y += 30;


    >>So 'common.ptr_c' pointer is valid (line 27 is correct).
    >>
    >>Question is:
    >> Is the rest of the union pointers are valid (ANSI-C/ISO/C89) ?
    >> common.ptr_b and common.ptr_a (lines: 29,30)

    >
    > Assuming N1570 is still current in this area, look at footnote 95.


    It isn't just that the union member access needs to get the right
    bytes -- it is also important that the representations of the
    different members agree. That agreement holds under C99 and C11,
    but not under C89/C90.
     
    Tim Rentsch, Jan 2, 2013
    #4
  5. Maciej Labanowicz <> writes:

    > Hi,
    >
    > Please analyze following example:
    >
    > /*--[beg:test.c]-------------------------------------------------*/
    > 01:
    > 02: #include <stdio.h> /* printf */
    > 03: #include <stdlib.h> /* EXIT_SUCCESS */
    > 04:
    > 05: struct a_s { int x; };
    > 06: struct b_s { struct a_s super; int y; };
    > 07: struct c_s { struct b_s super; int z; };
    > 08:
    > 09: union common_u {
    > 10: struct a_s * ptr_a;
    > 11: struct b_s * ptr_b;
    > 12: struct c_s * ptr_c;
    > 13: };
    > 14:
    > 15: int main(void)
    > 16: {
    > 17: struct c_s c;
    > 18: union common_u common;
    > 19:
    > 20: ((struct a_s *)(&c))->x = 5;
    > 21: ((struct b_s *)(&c))->y = 6;
    > 22: c.z = 7;
    > 23:
    > 24: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    > 25:
    > 26: common.ptr_c = &c;
    > 27: common.ptr_c->z += 10;
    > 28:
    > 29: common.ptr_a->x += 20;
    > 30: common.ptr_b->y += 30;
    > 31:
    > 32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    > 33:
    > 34: return EXIT_SUCCESS;
    > 35: }
    > /*--[eof:test.c]-------------------------------------------------*/
    >
    > /*--[beg:output]-------------------------------------------------*/
    > 01: x=5,y=6,z=7
    > 02: x=25,y=36,z=17
    > /*--[eof:output]-------------------------------------------------*/
    >
    > There are structs that implements inheritance of members:
    >
    > a_s
    > |
    > +b_s
    > |
    > +c_s
    >
    > So, casts in lines 20,21 are valid in C.
    >
    > 'union common_u' contains pointers to all of those structs.
    >
    > Line 26 contains assignment of address of 'c' (leaf in the tree) to
    > union member: ptr_c.
    >
    > So 'common.ptr_c' pointer is valid (line 27 is correct).
    >
    > Question is:
    > Is the rest of the union pointers are valid (ANSI-C/ISO/C89) ?
    > common.ptr_b and common.ptr_a (lines: 29,30)


    As a practical matter it should work. Strictly speaking it is
    not guaranteed under C89/C90/C95, though it is under C99 and the
    current standard, C11.

    However, even though you can (most probably) get away with this
    approach, code like this should raise a BIG RED FLAG whenever you
    see it, especially if you are the one writing it. What you want
    to do can easily be done in a way that's completely type safe
    (ie, without using either casts or void *), as the printf() call
    shows. Why use casting or type punning when not absolutely
    necessary? Is there something else about what you're trying to
    do that makes a cast-free approach unattractive? If there is,
    you probably should ask about that, because it's likely a
    different approach would reduce or eliminate that shortcoming,
    and give an overall better result.
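
    One way to read that suggestion, as a sketch only: take the
    addresses of the nested members directly, so that neither a cast
    nor a union is involved. The function name bump_without_casts is
    made up for illustration; the struct definitions are copied from
    the original post.

```c
#include <assert.h>
#include <stddef.h> /* NULL */

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Applies the same updates as lines 27, 29 and 30, using member access only. */
void bump_without_casts(struct c_s *pc)
{
    struct b_s *pb;
    struct a_s *pa;

    assert(pc != NULL);

    pb = &pc->super;   /* the b_s part of *pc */
    pa = &pb->super;   /* the a_s part of *pc */

    pc->z += 10;
    pa->x += 20;
    pb->y += 30;
}
```

    Every pointer here has the type of the object it points to, so the
    question of matching pointer representations never comes up.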
     
    Tim Rentsch, Jan 2, 2013
    #5
  6. Shao Miller <> writes:

    > On 1/1/2013 06:45, Maciej Labanowicz wrote:
    >>[...]
    >> 04:
    >> 05: struct a_s { int x; };
    >> 06: struct b_s { struct a_s super; int y; };
    >> 07: struct c_s { struct b_s super; int z; };
    >> 08:
    >> 09: union common_u {
    >> 10: struct a_s * ptr_a;
    >> 11: struct b_s * ptr_b;
    >> 12: struct c_s * ptr_c;
    >> 13: };
    >> 14:
    >> 15: int main(void)
    >> 16: {
    >> 17: struct c_s c;
    >> 18: union common_u common;
    >> [...]
    >> 26: common.ptr_c = &c;
    >> 27: common.ptr_c->z += 10;
    >> 28:
    >> 29: common.ptr_a->x += 20;
    >> 30: common.ptr_b->y += 30;
    >> 31:
    >> 32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
    >> 33:
    >> 34: return EXIT_SUCCESS;
    >> 35: }
    >> [...]
    >>
    >>
    >> Question is:
    >> Is the rest of the union pointers are valid (ANSI-C/ISO/C89) ?
    >> common.ptr_b and common.ptr_a (lines: 29,30)

    >
    > You appear to be type-punning the value of the 'ptr_c' member as a
    > 'struct a_s *' on line 29 and as a 'struct b_s *' on line 30.
    >
    > [in C99 and C11, pointers to struct have the same representation,
    > but in C89/C90 this guarantee is not present.]


    Good to have this pointed out - thank you for tracking it down.


    > [under the C99 rules --]
    > But the implementation's actual pointer representation could be
    > complicated and so there's still no guarantee. If you can dream up a
    > pointer representation, then you can dream up a counter-example to
    > your code's portability.


    The stipulation that all pointers to structs have the same
    representation and alignment requirements means that the
    type-punning union member access has to work. That's what
    having the same representation means -- that the same object
    representation (ie, the same bytes) will have the same value.
    Any choice of representations for the two cases that doesn't
    produce identical results here means the two representations
    are not the same, ie, the implementation is not conforming
    (under C99/C11 rules).
     
    Tim Rentsch, Jan 2, 2013
    #6
  7. On 1/2/2013 13:45, Tim Rentsch wrote:
    > Shao Miller <> writes:
    >> But the implementation's actual pointer representation could be
    >> complicated and so there's still no guarantee. If you can dream up a
    >> pointer representation, then you can dream up a counter-example to
    >> your code's portability.

    >
    > The stipulation that all pointers to structs have the same
    > representation and alignment requirements means that the
    > type-punning union member access has to work. That's what
    > having the same representation means -- that the same object
    > representation (ie, the same bytes) will have the same value.
    > Any choice of representations for the two cases that doesn't
    > produce identical results here means the two representations
    > are not the same, ie, the implementation is not conforming
    > (under C99/C11 rules).
    >


    Here are three examples that I would consider to be counter-examples:

    1. A 'struct any *' pointer representation that is a simple index.

    This could provide a level of indirection into a table. The table
    element could have type and bounds information, along with some other
    form of address for the pointee. When the representation (a simple
    index) is loaded into a 'struct bar *' instead of into a 'struct foo *',
    a trap could be generated.

    2. A 'struct any *' pointer representation that encodes bounds
    information. While the original post "has this covered" because the
    bounds of the original pointee encompass the bounds of the members
    and sub-members, it's not safe in the general case. When the
    representation is loaded into a 'struct bigger *' instead of a 'struct
    smaller *', the bounds mismatch could generate a trap.

    3. A 'struct any *' pointer representation that encodes type
    information. Maybe for the sole reason of generating a trap when the
    representation is loaded into an incompatible pointer type of object.

    It seems clear to me that size, alignment, argument promotion (none) and
    format of 'struct foo *' and 'struct bar *' must be the same, but I
    don't yet understand how that ties into compatible types nor into
    defined behaviour, since

    "Certain object representations need not represent a value of the
    object type. If the stored value of an object has such a representation
    and is read by an lvalue expression that does not have character type,
    the behavior is undefined. If such a representation is produced by a
    side effect that modifies all or any part of the object by an lvalue
    expression that does not have character type, the behavior is
    undefined.41) Such a representation is called a trap representation."

    Why can a valid 'struct foo *' value's representation represent a valid
    'struct foo *' value but not a trap for a 'struct bar *'? For example,
    it might be useful to trap a 'const struct baz *' representation read
    into a 'struct baz *' object. A single bit in the representation would
    be sufficient for that. The representation would be the same, wouldn't it?

    - Shao Miller
     
    Shao Miller, Jan 3, 2013
    #7
  8. On 1/2/2013 19:56, Shao Miller wrote:
    > On 1/2/2013 13:45, Tim Rentsch wrote:
    >> Shao Miller <> writes:
    >>> But the implementation's actual pointer representation could be
    >>> complicated and so there's still no guarantee. If you can dream up a
    >>> pointer representation, then you can dream up a counter-example to
    >>> your code's portability.

    >>
    >> The stipulation that all pointers to structs have the same
    >> representation and alignment requirements means that the
    >> type-punning union member access has to work. That's what
    >> having the same representation means -- that the same object
    >> representation (ie, the same bytes) will have the same value.
    >> Any choice of representations for the two cases that doesn't
    >> produce identical results here means the two representations
    >> are not the same, ie, the implementation is not conforming
    >> (under C99/C11 rules).
    >>

    >
    > Here are three examples that I would consider to be counter-examples:
    >
    > 1. A 'struct any *' pointer representation that is a simple index.
    >
    > This could provide a level of indirection into a table. The table
    > element could have type and bounds information, along with some other
    > form of address for the pointee. When the representation (a simple
    > index) is loaded into a 'struct bar *' instead of into a 'struct foo *',
    > a trap could be generated.
    >
    > 2. A 'struct any *' pointer representation that encodes bounds
    > information. While the original post "has this covered" because the
    > bounds of the original pointee encompass the bounds of the members
    > and sub-members, it's not safe in the general case. When the
    > representation is loaded into a 'struct bigger *' instead of a 'struct
    > smaller *', the bounds mismatch could generate a trap.
    >
    > 3. A 'struct any *' pointer representation that encodes type
    > information. Maybe for the sole reason of generating a trap when the
    > representation is loaded into an incompatible pointer type of object.
    >
    > It seems clear to me that size, alignment, argument promotion (none) and
    > format of 'struct foo *' and 'struct bar *' must be the same, but I
    > don't yet understand how that ties into compatible types nor into
    > defined behaviour, since
    >
    > "Certain object representations need not represent a value of the
    > object type. If the stored value of an object has such a representation
    > and is read by an lvalue expression that does not have character type,
    > the behavior is undefined. If such a representation is produced by a
    > side effect that modifies all or any part of the object by an lvalue
    > expression that does not have character type, the behavior is
    > undefined.41) Such a representation is called a trap representation."
    >
    > Why can a valid 'struct foo *' value's representation represent a valid
    > 'struct foo *' value but not a trap for a 'struct bar *'? For example,
    > it might be useful to trap a 'const struct baz *' representation read
    > into a 'struct baz *' object. A single bit in the representation would
    > be sufficient for that. The representation would be the same, wouldn't it?


    Example #1: "...interchangeability as arguments to functions..."

    /* libbaz.h */

    typedef void f_baz_callback(structptr_t);

    extern void BazFunc(f_baz_callback * Callback, structptr_t StructPtr);

    /* libbaz.c */

    typedef struct any * structptr_t;
    #include "libbaz.h"

    void BazFunc(f_baz_callback * callback, structptr_t sptr) {
        /*
         * 'struct any' is an incomplete object type.
         * Trap representations are more limited than if it was
         * a complete object type.
         *
         * A trap representation for _any_ pointer type could
         * still be present. A trapresentation for _any_
         * 'struct XXX *' could still be present.
         *
         * A trapresentation based on bounds could still be present
         * if 'sptr' is non-null, but somehow indicates 0 bytes
         * of storage, or some other invalid value.
         *
         * A trapresentation based on lifetime could still be
         * present. Same with 'const'-ness.
         *
         * etc.
         *
         * foo.c and bar.c have a different type for 'sptr', but
         * since the representation is the same, there's no problem.
         */
        callback(sptr);
    }

    /* foo.c */

    typedef struct s_foo * structptr_t;
    #include "libbaz.h"

    struct s_foo {
        int i;
    };

    f_baz_callback foo_callback;
    void foo_callback(structptr_t sptr) {
        sptr->i = 42;
    }

    void foo_func(void) {
        struct s_foo foo;

        BazFunc(foo_callback, &foo);
    }

    /* bar.c */

    typedef struct s_bar * structptr_t;
    #include "libbaz.h"

    struct s_bar {
        double d;
    };

    f_baz_callback bar_callback;
    void bar_callback(structptr_t sptr) {
        sptr->d = 3.14159;
    }

    void bar_func(void) {
        struct s_bar bar;

        BazFunc(bar_callback, &bar);
    }

    Example #2: "...and members of unions."

    /* libnextgen.h version 1.0 */

    struct apple;
    struct orange;

    union u_dyn_obj {
        struct apple * apple;
        struct orange * orange;
    };

    extern void NextGenFunc(union u_dyn_obj DynamicObject);

    /* libnextgen.h version 2.0 */

    struct apple;
    struct orange;
    struct dog;
    struct cat;

    union u_dyn_obj {
        struct apple * apple;
        struct orange * orange;
        struct dog * dog;
        struct cat * cat;
    };

    extern void NextGenFunc(union u_dyn_obj DynamicObject);

    /* user.c */

    #include "libnextgen.h"

    /*
     * Assumed definition, not in the original post: 'struct apple'
     * has to be a complete type for the declaration in UserFunc.
     */
    struct apple {
        int seeds;
    };

    void UserFunc(void) {
        struct apple apple;
        union u_dyn_obj dyn_obj;

        /*
         * It doesn't matter which version of the header we
         * were built with, _nor_ which version of the library
         * is installed, because the representation (and thus
         * size) and alignment are always going to be the same.
         *
         * We only work with apples and oranges, but 2.0's
         * support for dogs and cats doesn't affect us.
         */
        dyn_obj.apple = &apple;
        NextGenFunc(dyn_obj);
    }

    Example #3: "... Such a representation is called a trap representation."

    /* hmmm1.c */

    #include <stdlib.h>
    #include <stdio.h>

    struct s_smaller {
        char arr[4];
    };

    struct s_bigger {
        char arr[sizeof (struct s_smaller)];
        double d;
    };

    int main(void) {
        void * storage;
        struct s_smaller * smaller;

        /* Allocate enough storage for an s_smaller */
        storage = calloc(1, sizeof (struct s_smaller));
        if (!storage)
            return 0;
        smaller = storage;

        /*
         * Problem #3.1: Although the representation is
         * the same for both types, the value cannot
         * point to an s_bigger due to insufficient storage.
         * There's enough storage for arr, but that's
         * irrelevant.
         */
        (*((struct s_bigger **) &smaller))->arr[0] = 'C';

        printf("Result: %s\n", (char *) storage);

        return 0;
    }

    Example #4: "... Such a representation is called a trap representation."

    /* hmmm2.c */

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    struct s_smaller {
        char arr[4];
    };

    struct s_bigger {
        char arr[sizeof (struct s_smaller)];
        double d;
    };

    union u_of_ptrs {
        struct s_smaller * smaller;
        struct s_bigger * bigger;
    };

    void discard_provenance(
        union u_of_ptrs * left,
        union u_of_ptrs * right,
        union u_of_ptrs * combined
    );

    int main(void) {
        void * storage;
        union u_of_ptrs first, first_backup, second, third;

        /* Allocate enough storage for an s_bigger */
        storage = calloc(1, sizeof (struct s_bigger));
        if (!storage)
            return 0;

        /* Plenty of storage for an s_smaller */
        first.smaller = storage;

        /* Backup */
        memcpy(&first_backup, &first, sizeof first_backup);

        /* Free some storage */
        storage = realloc(storage, sizeof (struct s_smaller));
        if (!storage)
            return 0;

        /* Right amount of storage */
        second.smaller = storage;

        /* Compare the representations */
        if (memcmp(&first_backup, &second, sizeof first_backup))
            return 0;

        /* Discard any "provenance" for a later test */
        discard_provenance(&first_backup, &second, &third);

        /*
         * Problem #4.1: second.bigger cannot point to an
         * s_bigger, as there's insufficient storage.
         * There's storage enough for arr, but that's
         * irrelevant.
         */
        second.bigger->arr[0] = '1';

        /*
         * Problem #4.2: Same problem with first.bigger, even
         * though its "provenance" was from the earlier allocation.
         */
        first.bigger->arr[1] = '2';

        /*
         * Problem #4.3: Same problem with third.bigger, even
         * though its "provenance" has been discarded.
         */
        third.bigger->arr[2] = '3';

        printf("Result: %s\n", (char *) storage);

        return 0;
    }

    void discard_provenance(
        union u_of_ptrs * left,
        union u_of_ptrs * right,
        union u_of_ptrs * combined
    ) {
        unsigned char * lp = (void *) left;
        unsigned char * rp = (void *) right;
        unsigned char * cp = (void *) combined;
        unsigned char * end = (void *) (combined + 1);

        while (cp < end)
            *cp++ = *lp++ & *rp++;
    }

    I would certainly appreciate a C99-/C11-conforming implementation that
    is able to catch the problems of examples #3 & #4. One way would be to
    deem trap representations for one object type and not for another, where
    the types are not compatible.

    My interpretation of "same representation and alignment requirements"
    for struct pointer types is along the lines of:

    - If there are padding bits in one, there are padding bits at the
    same positions in the other
    - If there are parity bits in one, there are parity bits at the same
    positions in the other
    - If a segment is encoded in one, then it is encoded in the same way
    in the other
    - If type information is encoded in one, then it is encoded in the
    same way in the other
    - If bounds information is encoded in one, then it is encoded in the
    same way in the other
    - If lifetime/duration information is encoded in one, then it is
    encoded in the same way in the other
    - etc.
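
    The size and alignment parts of that list can at least be checked
    at compile time. A rough sketch, assuming a C11 compiler (for
    _Static_assert and _Alignof); 'struct foo', 'struct bar' and the
    function name are placeholders, and none of this says anything
    about how the bits are interpreted as values.

```c
#include <assert.h>
#include <string.h>

struct foo { int i; };
struct bar { double d; };

_Static_assert(sizeof (struct foo *) == sizeof (struct bar *),
               "struct pointers differ in size");
_Static_assert(_Alignof (struct foo *) == _Alignof (struct bar *),
               "struct pointers differ in alignment");

/*
 * Copies the bytes of a 'struct foo *' into a 'struct bar *' object and
 * back again. The intermediate object is only ever accessed as bytes,
 * so its value as a 'struct bar *' is never read.
 * Returns 1 if the pointer survives the round trip.
 */
int roundtrip_through_bar_ptr(struct foo *in)
{
    struct bar *mid;
    struct foo *out;

    assert(sizeof mid == sizeof out);

    memcpy(&mid, &in, sizeof mid);
    memcpy(&out, &mid, sizeof out);

    return out == in;
}
```

    The round trip only moves bytes, so it passes under either
    interpretation; the disagreement is about what happens when 'mid'
    itself is read as a 'struct bar *'.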

    Since this interpretation supports the fair examples #1 & #2 as well as
    the more contrived examples #3 & #4, I fail to understand the benefit of
    adopting a more restrictive interpretation which seemingly prohibits the
    problems of #3 and #4 from being caught; perhaps with trap
    representations. But perhaps I've misunderstood.

    - Shao Miller
     
    Shao Miller, Jan 3, 2013
    #8
  9. Shao Miller <> writes:

    > On 1/2/2013 13:45, Tim Rentsch wrote:
    >> Shao Miller <> writes:
    >>> But the implementation's actual pointer representation could be
    >>> complicated and so there's still no guarantee. If you can dream up a
    >>> pointer representation, then you can dream up a counter-example to
    >>> your code's portability.

    >>
    >> The stipulation that all pointers to structs have the same
    >> representation and alignment requirements means that the
    >> type-punning union member access has to work. That's what
    >> having the same representation means -- that the same object
    >> representation (ie, the same bytes) will have the same value.
    >> Any choice of representations for the two cases that doesn't
    >> produce identical results here means the two representations
    >> are not the same, ie, the implementation is not conforming
    >> (under C99/C11 rules).
    >>

    >
    > Here are three examples that I would consider to be counter-examples:
    >
    > 1. A 'struct any *' pointer representation that is a simple index.
    >
    > This could provide a level of indirection into a table. The table
    > element could have type and bounds information, along with some other
    > form of address for the pointee. When the representation (a simple
    > index) is loaded into a 'struct bar *' instead of into a 'struct foo
    > *', a trap could be generated.
    >
    > 2. A 'struct any *' pointer representation that encodes bounds
    > information. While the original post "has this covered" because the
    > bounds of the original pointee encompass the bounds of the members
    > and sub-members, it's not safe in the general case. When the
    > representation is loaded into a 'struct bigger *' instead of a 'struct
    > smaller *', the bounds mismatch could generate a trap.
    >
    > 3. A 'struct any *' pointer representation that encodes type
    > information. Maybe for the sole reason of generating a trap when the
    > representation is loaded into an incompatible pointer type of object.


    These ideas aren't consistent with how the Standard uses the
    notion of having the same representation in other instances. For
    example, an object of type (int) has the same representation and
    alignment requirements as an object of type (const int). Yet it's
    ridiculous to think that loading an (int) object through a pointer
    of type (const int *) might cause a trap when accessing the object
    just as a plain int wouldn't, despite the two types being distinct
    and not compatible.
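
    A small illustration of the int / const int case; the function
    name read_as_const is hypothetical.

```c
#include <assert.h>
#include <stddef.h> /* NULL */

/* Reads an int object through a pointer to const int. */
int read_as_const(int *p)
{
    const int *cp = p;   /* implicit conversion adds the qualifier */

    assert(p != NULL);
    return *cp;          /* same bytes, same value as reading *p */
}
```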

    > It seems clear to me that size, alignment, argument promotion (none)
    > and format of 'struct foo *' and 'struct bar *' must be the same, but
    > I don't yet understand how that ties into compatible types nor into
    > defined behaviour, since
    >
    > "Certain object representations need not represent a value of the
    > object type. If the stored value of an object has such a
    > representation and is read by an lvalue expression that does not have
    > character type, the behavior is undefined. If such a representation is
    > produced by a side effect that modifies all or any part of the object
    > by an lvalue expression that does not have character type, the
    > behavior is undefined.41) Such a representation is called a trap
    > representation."
    >
    > Why can a valid 'struct foo *' value's representation represent a
    > valid 'struct foo *' value but not a trap for a 'struct bar *'? For
    > example, it might be useful to trap a 'const struct baz *'
    > representation read into a 'struct baz *' object. A single bit in the
    > representation would be sufficient for that. The representation would
    > be the same, wouldn't it?


    No. I expect you're thinking of "representation" as more or less
    synonymous with "format", but representation means more than that.
    The representation of a type is the mapping from the bits (ie, the
    byte values of the object representation) to values in the type's
    abstract value space, including trap values. If two types have
    the same representation, that means the two mappings produce
    corresponding values (ie, for each object representation) in the
    two abstract value spaces. For C, corresponding values are what
    would be produced by conversion between the two types in question.
    In other words, if types A and B have the same representation,
    then copying the bytes (eg, with memcpy()) from an 'A a;' into a
    'B b;' must give the same results as 'b = (B) a;'. Any change in
    behavior between the two cases means the two representations are
    not the same. Accessing via type B using a union member access
    works the same way that the memcpy() would.
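
    A minimal sketch of the claim in the paragraph above, using the
    struct names from the original post. The function name is
    hypothetical, and the sketch assumes the reading argued here: that
    'struct b_s *' and 'struct a_s *' having the same representation
    makes a byte copy and a cast interchangeable. Whether the Standard
    guarantees that is exactly what this subthread disputes.

```c
#include <assert.h>
#include <string.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };

/* Returns 1 if copying the bytes of a 'struct b_s *' into a
   'struct a_s *' gives the same result as converting with a cast. */
int copy_matches_conversion(void)
{
    struct b_s b = { { 1 }, 2 };
    struct b_s *pb = &b;
    struct a_s *by_cast = (struct a_s *)pb;  /* conversion */
    struct a_s *by_copy;                     /* filled by byte copy */

    /* Same representation implies same size for the two pointer types. */
    assert(sizeof by_copy == sizeof pb);
    memcpy(&by_copy, &pb, sizeof by_copy);

    /* Under the reading above, both pointers must designate b.super. */
    return by_copy == by_cast && by_copy->x == 1;
}
```

    Calling copy_matches_conversion() from a test harness and checking
    for 1 is enough to exercise it on a given implementation.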

    For pointers, there is the additional concern that the converted
    or corresponding value be a non-trap value in the abstract value
    space of the new pointer type. However, in the particular example
    here (ie, in the original posting, even though since disappeared
    in the subthread), we know the pointer conversions have to work
    because of the way the particular structs being pointed to are
    nested.
     
    Tim Rentsch, Jan 3, 2013
    #9
  10. Maciej Labanowicz

    Shao Miller Guest

    On 1/3/2013 18:31, Tim Rentsch wrote:
    > Shao Miller writes:
    >
    >> On 1/2/2013 13:45, Tim Rentsch wrote:
    >>> Shao Miller writes:
    >>>> But the implementation's actual pointer representation could be
    >>>> complicated and so there's still no guarantee. If you can dream up a
    >>>> pointer representation, then you can dream up a counter-example to
    >>>> your code's portability.
    >>>
    >>> The stipulation that all pointers to structs have the same
    >>> representation and alignment requirements means that the
    >>> type-punning union member access has to work. That's what
    >>> having the same representation means -- that the same object
    >>> representation (ie, the same bytes) will have the same value.
    >>> Any choice of representations for the two cases that doesn't
    >>> produce identical results here means the two representations
    >>> are not the same, ie, the implementation is not conforming
    >>> (under C99/C11 rules).
    >>>

    >>
    >> Here are three examples that I would consider to be counter-examples:
    >>
    >> 1. A 'struct any *' pointer representation that is a simple index.
    >>
    >> This could provide a level of indirection into a table. The table
    >> element could have type and bounds information, along with some other
    >> form of address for the pointee. When the representation (a simple
    >> index) is loaded into a 'struct bar *' instead of into a 'struct foo
    >> *', a trap could be generated.
    >>
    >> 2. A 'struct any *' pointer representation that encodes bounds
    >> information. While the original post "has this covered" because the
    >> bounds of the original pointee encompass the bounds of the members
    >> and sub-members, it's not safe in the general case. When the
    >> representation is loaded into a 'struct bigger *' instead of a 'struct
    >> smaller *', the bounds mismatch could generate a trap.
    >>
    >> 3. A 'struct any *' pointer representation that encodes type
    >> information. Maybe for the sole reason of generating a trap when the
    >> representation is loaded into an incompatible pointer type of object.

    >
    > These ideas aren't consistent with how the Standard uses the
    > notion of having the same representation in other instances. For
    > example, an object of type (int) has the same representation and
    > alignment requirements as an object of type (const int). Yet it's
    > ridiculous to think that loading an (int) object through a pointer
    > of type (const int *) might cause a trap when accessing the object
    > just as a plain int wouldn't, despite the two types being distinct
    > and not compatible.
    >


    Ok. I agree with your example. But 2 points:

    - The representation of 'int' is discussed in much greater detail than
    the representation of any pointer type. Pointer representations are
    much more opaque and free for the implementation to decide upon.

    - I don't think it makes practical sense to encode type information in
    the padding bits of an 'int', but it certainly seems useful to encode
    extra information in a pointer representation, since they are derived
    types with abstract values.

    Surely if, in

    void somefunc(void) {
        unsigned char c;
        /* ... */
    }

    'c' is permitted to have a trap representation due to its "provenance,"
    then it is especially convenient that pointer representations are
    opaque, so "provenance" or other meta-data can be encoded directly. No?

    >> It seems clear to me that size, alignment, argument promotion (none)
    >> and format of 'struct foo *' and 'struct bar *' must be the same, but
    >> I don't yet understand how that ties into compatible types nor into
    >> defined behaviour, since
    >>
    >> "Certain object representations need not represent a value of the
    >> object type. If the stored value of an object has such a
    >> representation and is read by an lvalue expression that does not have
    >> character type, the behavior is undefined. If such a representation is
    >> produced by a side effect that modifies all or any part of the object
    >> by an lvalue expression that does not have character type, the
    >> behavior is undefined. Such a representation is called a trap
    >> representation."
    >>
    >> Why can a valid 'struct foo *' value's representation represent a
    >> valid 'struct foo *' value but not a trap for a 'struct bar *'? For
    >> example, it might be useful to trap a 'const struct baz *'
    >> representation read into a 'struct baz *' object. A single bit in the
    >> representation would be sufficient for that. The representation would
    >> be the same, wouldn't it?

    >
    > No. I expect you're thinking of "representation" as more or less
    > synonymous with "format",


    Yes, you are right about that.

    > but representation means more than that.
    > The representation of a type is the mapping from the bits (ie, the
    > byte values of the object representation) to values in the type's
    > abstract value space, including trap values. If two types have
    > the same representation, that means the two mappings produce
    > corresponding values (ie, for each object representation) in the
    > two abstract value spaces. For C, corresponding values are what
    > would be produced by conversion between the two types in question.
    > In other words, if types A and B have the same representation,
    > then copying the bytes (eg, with memcpy()) from an 'A a;' into a
    > 'B b;' must give the same results as 'b = (B) a;'. Any change in
    > behavior between the two cases means the two representations are
    > not the same.


    I'm struggling to reconcile that with C99's 3.17p1 and 6.2.6.1. 3.17p1:

    "value
    precise meaning of the contents of an object when interpreted as
    having a specific type"

    I'm missing the part where it's possible for the same object
    representation to represent the same value for two incompatible types,
    since the value depends on the type.

    Regarding conversion, 6.3p2 has that

    "Conversion of an operand value to a compatible type causes no change
    to the value or the representation."

    Why mention both of them instead of simply "representation," if there's
    a one-to-one correspondence between representation and value, given
    compatible type? (Let alone incompatible types with the same
    representation.)

    Regarding pointer conversion, 6.3.2.3p1 has that

    "For any qualifier q, a pointer to a non-q-qualified type may be
    converted to a pointer to the q-qualified version of the type; the
    values stored in the original and converted pointers shall compare equal."

    Doesn't this explicitly hint that a 'const int *' value's representation
    is permitted to be a trap representation for an 'int *', but not the
    other way around? It seems convenient that such meta-data can be
    directly encoded into the pointer representation, since pointer
    representation is so opaque.

    There's also p7:

    "A pointer to an object or incomplete type may be converted to a
    pointer to a different object or incomplete type. If the resulting
    pointer is not correctly aligned for the pointed-to type, the
    behavior is undefined. Otherwise, when converted back again, the result
    shall compare equal to the original pointer. ..."

    Doesn't this explicitly hint that it's not the most portable idea to do
    anything much with a converted pointer other than to eventually convert
    it back before using it? If I understand you correctly, there's no
    conversion happening, as the value is simply becoming one in a different
    type's value space, so there's no problem with p7.

    Regarding your equivalence between the 'memcpy' and the cast for two
    types with the same representation, 6.5.4p4 has that

    "Preceding an expression by a parenthesized type name converts the
    value of the expression to the named type. This construction is called a
    cast. A cast that specifies no conversion has no effect on the type
    or value of an expression."

    If '(B) a' is already the same value as 'a' due to the types having the
    same representation, then there is no conversion, right? If that's the
    case, then the type of '(B) a' should be 'A'. Like 3.17p1, type and
    value are once again tied together, so it seems to me that incompatible
    types can have incompatible values.

    HOWEVER, you said _corresponding_values_. So I'd ask: Might a value in
    the value space for type 'A' have a corresponding but invalid value in
    the value space for type 'B'? If it might, then I fail to understand
    why the original post's code is well-defined in C99 and C11.

    > Accessing via type B using a union member access
    > works the same way that the memcpy() would.


    I absolutely agree with your equivalence between 'memcpy' and union
    members. Also: Re-interpreting the object representation with something
    like:

    A * ptr;
    (*((B **) &ptr));

    (where types 'A' and 'B' have the same representation.)

    > For pointers, there is the additional concern that the converted
    > or corresponding value be a non-trap value in the abstract value
    > space of the new pointer type. However, in the particular example
    > here (ie, in the original posting, even though since disappeared
    > in the subthread), we know the pointer conversions have to work
    > because of the way the particular structs being pointed to are
    > nested.
    >


    Ah, that answers my last question, above. But there's a bit of a jump
    in the logic that I can't grasp, and that's why the nesting of the
    structures in the original example has anything at all to do with the
    corresponding pointer value having to work. Yes, I agree that the
    original example's bounds are covered because of the nesting, but I
    don't understand why that's the only important subject.

    To back up a bit from the original example, 'char *' and 'void *' have
    the same representation. Would you say that in:


    void reinterpret(void) {
        void * vp = &vp;
        vp = (*((char **) &vp)) + 1;
    }

    the expression-statement has Standard-defined behaviour? I'm worried
    about this example because an implementation might wish to represent
    "the stride" of the pointer arithmetic, just as "Multi-Dimensional Array
    Simulator"[1] does. Implicit and explicit conversions (like the
    promotions, casts, equality and ternary semantics, etc.) seem to offer
    all the protection we need, while re-interpretation does not.

    - Shao Miller

    [1] http://www.iso-9899.info/wiki/Code_snippets
     
    Shao Miller, Jan 4, 2013
    #10
  11. Maciej Labanowicz

    Shao Miller Guest

    On 1/3/2013 20:55, Shao Miller wrote:
    >
    > I absolutely agree with your equivalence between 'memcpy' and union
    > members. Also: Re-interpreting the object representation with something
    > like:
    >
    > A * ptr;
    > (*((B **) &ptr));
    >
    > (where types 'A' and 'B' have the same representation.)
    >


    I meant where 'A *' and 'B *' have the same representation.
     
    Shao Miller, Jan 4, 2013
    #11
  12. Maciej Labanowicz

    Shao Miller Guest

    On 1/3/2013 21:00, Shao Miller wrote:
    > On 1/3/2013 20:55, Shao Miller wrote:
    >>
    >> I absolutely agree with your equivalence between 'memcpy' and union
    >> members. Also: Re-interpreting the object representation with something
    >> like:
    >>
    >> A * ptr;
    >> (*((B **) &ptr));
    >>
    >> (where types 'A' and 'B' have the same representation.)
    >>

    >
    > I meant where 'A *' and 'B *' have the same representation.
    >


    (And alignment requirements.)

    However, please allow me to retract this equivalence with type-punning
    via union members and 'memcpy'. After reviewing some discussion with
    Mr. Clive Feather, I'm no longer so sure. He points out that the stored
    value has an effective type, and the lvalue used to access it here has
    a type that 6.5p7 does not permit for that effective type.

    - Shao Miller
     
    Shao Miller, Jan 4, 2013
    #12
  13. Maciej Labanowicz

    Shao Miller Guest

    On 1/3/2013 20:55, Shao Miller wrote:
    > On 1/3/2013 18:31, Tim Rentsch wrote:
    >> Shao Miller writes:
    >>
    >>> On 1/2/2013 13:45, Tim Rentsch wrote:
    >>>> Shao Miller writes:
    >>>>> But the implementation's actual pointer representation could be
    >>>>> complicated and so there's still no guarantee. If you can dream up a
    >>>>> pointer representation, then you can dream up a counter-example to
    >>>>> your code's portability.
    >>>>
    >>>> The stipulation that all pointers to structs have the same
    >>>> representation and alignment requirements means that the
    >>>> type-punning union member access has to work. That's what
    >>>> having the same representation means -- that the same object
    >>>> representation (ie, the same bytes) will have the same value.
    >>>> Any choice of representations for the two cases that doesn't
    >>>> produce identical results here means the two representations
    >>>> are not the same, ie, the implementation is not conforming
    >>>> (under C99/C11 rules).
    >>>>
    >>>
    >>> Here are three examples that I would consider to be counter-examples:
    >>>
    >>> 1. A 'struct any *' pointer representation that is a simple index.
    >>>
    >>> This could provide a level of indirection into a table. The table
    >>> element could have type and bounds information, along with some other
    >>> form of address for the pointee. When the representation (a simple
    >>> index) is loaded into a 'struct bar *' instead of into a 'struct foo
    >>> *', a trap could be generated.
    >>>
    >>> 2. A 'struct any *' pointer representation that encodes bounds
    >>> information. While the original post "has this covered" because the
    >>> bounds of the original pointee encompass the bounds of the members
    >>> and sub-members, it's not safe in the general case. When the
    >>> representation is loaded into a 'struct bigger *' instead of a 'struct
    >>> smaller *', the bounds mismatch could generate a trap.
    >>>
    >>> 3. A 'struct any *' pointer representation that encodes type
    >>> information. Maybe for the sole reason of generating a trap when the
    >>> representation is loaded into an incompatible pointer type of object.

    >>
    >> These ideas aren't consistent with how the Standard uses the
    >> notion of having the same representation in other instances. For
    >> example, an object of type (int) has the same representation and
    >> alignment requirements as an object of type (const int). Yet it's
    >> ridiculous to think that loading an (int) object through a pointer
    >> of type (const int *) might cause a trap when accessing the object
    >> just as a plain int wouldn't, despite the two types being distinct
    >> and not compatible.
    >>

    >
    > Ok. I agree with your example. But 2 points:
    >
    > - The representation of 'int' is discussed in much greater detail than
    > the representation of any pointer type. Pointer representations are
    > much more opaque and free for the implementation to decide upon.
    >
    > - I don't think it makes practical sense to encode type information in
    > the padding bits of an 'int', but it certainly seems useful to encode
    > extra information in a pointer representation, since they are derived
    > types with abstract values.
    >
    > Surely if, in
    >
    > void somefunc(void) {
    > unsigned char c;
    > /* ... */
    > }
    >
    > 'c' is permitted to have a trap representation due to its "provenance,"
    > then it is especially convenient that pointer representations are
    > opaque, so "provenance" or other meta-data can be encoded directly. No?
    >
    >>> It seems clear to me that size, alignment, argument promotion (none)
    >>> and format of 'struct foo *' and 'struct bar *' must be the same, but
    >>> I don't yet understand how that ties into compatible types nor into
    >>> defined behaviour, since
    >>>
    >>> "Certain object representations need not represent a value of the
    >>> object type. If the stored value of an object has such a
    >>> representation and is read by an lvalue expression that does not have
    >>> character type, the behavior is undefined. If such a representation is
    >>> produced by a side effect that modifies all or any part of the object
    >>> by an lvalue expression that does not have character type, the
    >>> behavior is undefined. Such a representation is called a trap
    >>> representation."
    >>>
    >>> Why can a valid 'struct foo *' value's representation represent a
    >>> valid 'struct foo *' value but not a trap for a 'struct bar *'? For
    >>> example, it might be useful to trap a 'const struct baz *'
    >>> representation read into a 'struct baz *' object. A single bit in the
    >>> representation would be sufficient for that. The representation would
    >>> be the same, wouldn't it?

    >>
    >> No. I expect you're thinking of "representation" as more or less
    >> synonymous with "format",

    >
    > Yes, you are right about that.
    >
    >> but representation means more than that.
    >> The representation of a type is the mapping from the bits (ie, the
    >> byte values of the object representation) to values in the type's
    >> abstract value space, including trap values. If two types have
    >> the same representation, that means the two mappings produce
    >> corresponding values (ie, for each object representation) in the
    >> two abstract value spaces. For C, corresponding values are what
    >> would be produced by conversion between the two types in question.
    >> In other words, if types A and B have the same representation,
    >> then copying the bytes (eg, with memcpy()) from an 'A a;' into a
    >> 'B b;' must give the same results as 'b = (B) a;'. Any change in
    >> behavior between the two cases means the two representations are
    >> not the same.

    >
    > I'm struggling to reconcile that with C99's 3.17p1 and 6.2.6.1. 3.17p1:
    >
    > "value
    > precise meaning of the contents of an object when interpreted as
    > having a specific type"
    >
    > I'm missing the part where it's possible for the same object
    > representation to represent the same value for two incompatible types,
    > since the value depends on the type.
    >
    > Regarding conversion, 6.3p2 has that
    >
    > "Conversion of an operand value to a compatible type causes no change
    > to the value or the representation."
    >
    > Why mention both of them instead of simply "representation," if there's
    > a one-to-one correspondence between representation and value, given
    > compatible type? (Let alone incompatible types with the same
    > representation.)
    >
    > Regarding pointer conversion, 6.3.2.3p1 has that
    >
    > "For any qualifier q, a pointer to a non-q-qualified type may be
    > converted to a pointer to the q-qualified version of the type; the
    > values stored in the original and converted pointers shall compare equal."
    >
    > Doesn't this explicitly hint that a 'const int *' value's representation
    > is permitted to be a trap representation for an 'int *', but not the
    > other way around? It seems convenient that such meta-data can be
    > directly encoded into the pointer representation, since pointer
    > representation is so opaque.
    >
    > There's also p7:
    >
    > "A pointer to an object or incomplete type may be converted to a
    > pointer to a different object or incomplete type. If the resulting
    > pointer is not correctly aligned for the pointed-to type, the
    > behavior is undefined. Otherwise, when converted back again, the result
    > shall compare equal to the original pointer. ..."
    >
    > Doesn't this explicitly hint that it's not the most portable idea to do
    > anything much with a converted pointer other than to eventually convert
    > it back before using it? If I understand you correctly, there's no
    > conversion happening, as the value is simply becoming one in a different
    > type's value space, so there's no problem with p7.
    >
    > Regarding your equivalence between the 'memcpy' and the cast for two
    > types with the same representation, 6.5.4p4 has that
    >
    > "Preceding an expression by a parenthesized type name converts the
    > value of the expression to the named type. This construction is called a
    > cast. A cast that specifies no conversion has no effect on the type
    > or value of an expression."
    >
    > If '(B) a' is already the same value as 'a' due to the types having the
    > same representation, then there is no conversion, right? If that's the
    > case, then the type of '(B) a' should be 'A'. Like 3.17p1, type and
    > value are once again tied together, so it seems to me that incompatible
    > types can have incompatible values.
    >
    > HOWEVER, you said _corresponding_values_. So I'd ask: May a value in
    > the value space for type 'A' not have a corresponding, but invalid value
    > in the value space for type 'B'? If it may, then I fail to understand
    > why the original post's code is well-defined in C99 and C11.
    >
    >> Accessing via type B using a union member access
    >> works the same way that the memcpy() would.

    >
    > I absolutely agree with your equivalence between 'memcpy' and union
    > members. Also: Re-interpreting the object representation with something
    > like:
    >
    > A * ptr;
    > (*((B **) &ptr));
    >
    > (where types 'A' and 'B' have the same representation.)
    >
    >> For pointers, there is the additional concern that the converted
    >> or corresponding value be a non-trap value in the abstract value
    >> space of the new pointer type. However, in the particular example
    >> here (ie, in the original posting, even though since disappeared
    >> in the subthread), we know the pointer conversions have to work
    >> because of the way the particular structs being pointed to are
    >> nested.
    >>

    >
    > Ah, that answers my last question, above. But there's a bit of a jump
    > in the logic that I can't grasp, and that's why the nesting of the
    > structures in the original example has anything at all to do with the
    > corresponding pointer value having to work. Yes, I agree that the
    > original example's bounds are covered because of the nesting, but I
    > don't understand why that's the only important subject.
    >
    > To back up a bit from the original example, 'char *' and 'void *' have
    > the same representation. Would you say that in:
    >
    >
    > void reinterpret(void) {
    > void * vp = &vp;
    > vp = (*((char **) &vp)) + 1;
    > }
    >


    Since else-thread I'm retracting the union member type-punning
    equivalence with this kind of raw re-interpretation, please allow me to
    also retract this example and replace it with:

    void reinterpret(void) {
        union {
            void * vp;
            char * cp;
        } u = { &u };
        u.cp = u.cp + 1;
    }
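
    For comparison, here is a sketch that puts the union form next to a
    form that uses only implicit conversions between 'void *' and
    'char *'. All three function names are hypothetical. The conversion
    form is the one described in this post as offering "all the
    protection we need"; the union form is the one whose status is being
    asked about, so the check below shows only what a given
    implementation does, not what the Standard requires.

```c
/* Advance a 'void *' by one byte using conversions only. */
void *advance_by_conversion(void *vp)
{
    char *cp = vp;   /* implicit void * -> char * conversion */
    return cp + 1;   /* implicit char * -> void * conversion */
}

/* Same arithmetic, but the 'void *' bytes are re-read as a 'char *'
   through a union, as in the reinterpret() example. */
void *advance_by_pun(void *vp)
{
    union {
        void *vp;
        char *cp;
    } u;

    u.vp = vp;
    u.cp = u.cp + 1;
    return u.vp;
}

/* Returns 1 if both forms yield a pointer to the second byte of buf. */
int advance_forms_agree(void)
{
    char buf[2] = { 0, 0 };

    return advance_by_conversion(buf) == (void *)(buf + 1)
        && advance_by_pun(buf) == (void *)(buf + 1);
}
```

    An implementation that encoded a "stride" in the pointer
    representation, as worried about in this post, is the kind of case
    where the two forms could in principle differ.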

    > the expression-statement has Standard-defined behaviour? I'm worried
    > about this example because an implementation might wish to represent
    > "the stride" of the pointer arithmetic, just as "Multi-Dimensional Array
    > Simulator"[1] does. Implicit and explicit conversions (like the
    > promotions, casts, equality and ternary semantics, etc.) seem to offer
    > all the protection we need, while re-interpretation does not.
    >
    > [1] http://www.iso-9899.info/wiki/Code_snippets
     
    Shao Miller, Jan 4, 2013
    #13
  14. Maciej Labanowicz

    Tim Rentsch Guest

    Shao Miller writes:

    > On 1/3/2013 18:31, Tim Rentsch wrote:
    >> Shao Miller writes:
    >>
    >>> On 1/2/2013 13:45, Tim Rentsch wrote:
    >>>> Shao Miller writes:
    >>>>> But the implementation's actual pointer representation could be
    >>>>> complicated and so there's still no guarantee. If you can dream up a
    >>>>> pointer representation, then you can dream up a counter-example to
    >>>>> your code's portability.
    >>>>
    >>>> The stipulation that all pointers to structs have the same
    >>>> representation and alignment requirements means that the
    >>>> type-punning union member access has to work. That's what
    >>>> having the same representation means -- that the same object
    >>>> representation (ie, the same bytes) will have the same value.
    >>>> Any choice of representations for the two cases that doesn't
    >>>> produce identical results here means the two representations
    >>>> are not the same, ie, the implementation is not conforming
    >>>> (under C99/C11 rules).
    >>>>
    >>>
    >>> Here are three examples that I would consider to be counter-examples:
    >>>
    >>> 1. A 'struct any *' pointer representation that is a simple index.
    >>>
    >>> This could provide a level of indirection into a table. The table
    >>> element could have type and bounds information, along with some other
    >>> form of address for the pointee. When the representation (a simple
    >>> index) is loaded into a 'struct bar *' instead of into a 'struct foo
    >>> *', a trap could be generated.
    >>>
    >>> 2. A 'struct any *' pointer representation that encodes bounds
    >>> information. While the original post "has this covered" because the
    >>> bounds of the original pointee encompass the bounds of the members
    >>> and sub-members, it's not safe in the general case. When the
    >>> representation is loaded into a 'struct bigger *' instead of a 'struct
    >>> smaller *', the bounds mismatch could generate a trap.
    >>>
    >>> 3. A 'struct any *' pointer representation that encodes type
    >>> information. Maybe for the sole reason of generating a trap when the
    >>> representation is loaded into an incompatible pointer type of object.

    >>
    >> These ideas aren't consistent with how the Standard uses the
    >> notion of having the same representation in other instances. For
    >> example, an object of type (int) has the same representation and
    >> alignment requirements as an object of type (const int). Yet it's
    >> ridiculous to think that loading an (int) object through a pointer
    >> of type (const int *) might cause a trap when accessing the object
    >> just as a plain int wouldn't, despite the two types being distinct
    >> and not compatible.

    >
    > Ok. I agree with your example. But 2 points:
    >
    > - The representation of 'int' is discussed in much greater
    > detail than the representation of any pointer type. Pointer
    > representations are much more opaque and free for the
    > implementation to decide upon.


    That doesn't change the point I was making.

    > - I don't think it makes practical sense to encode type
    > information in the padding bits of an 'int', but it certainly
    > seems useful to encode extra information in a pointer
    > representation, since they are derived types with abstract
    > values.


    Even if that's true, it doesn't change what the Standard mandates.

    > Surely if, in
    >
    > void somefunc(void) {
    > unsigned char c;
    > /* ... */
    > }
    >
    > 'c' is permitted to have a trap representation due to its
    > "provenance,"


    It isn't. You are either mis-remembering or have misunderstood.

    > then it is especially convenient that pointer
    > representations are opaque, so "provenance" or other meta-data can be
    > encoded directly. No?


    Irrelevant. Such a statement might be an argument for changing
    a future Standard, but it has no bearing on what is said
    in the current Standard.

    >>> It seems clear to me that size, alignment, argument promotion (none)
    >>> and format of 'struct foo *' and 'struct bar *' must be the same, but
    >>> I don't yet understand how that ties into compatible types nor into
    >>> defined behaviour, since
    >>>
    >>> "Certain object representations need not represent a value of the
    >>> object type. If the stored value of an object has such a
    >>> representation and is read by an lvalue expression that does not have
    >>> character type, the behavior is undefined. If such a representation is
    >>> produced by a side effect that modifies all or any part of the object
    >>> by an lvalue expression that does not have character type, the
    >>> behavior is undefined. Such a representation is called a trap
    >>> representation."
    >>>
    >>> Why can a valid 'struct foo *' value's representation represent a
    >>> valid 'struct foo *' value but not a trap for a 'struct bar *'? For
    >>> example, it might be useful to trap a 'const struct baz *'
    >>> representation read into a 'struct baz *' object. A single bit in the
    >>> representation would be sufficient for that. The representation would
    >>> be the same, wouldn't it?

    >>
    >> No. I expect you're thinking of "representation" as more or less
    >> synonymous with "format",

    >
    > Yes, you are right about that.
    >
    >> but representation means more than that.
    >> The representation of a type is the mapping from the bits (ie, the
    >> byte values of the object representation) to values in the type's
    >> abstract value space, including trap values. If two types have
    >> the same representation, that means the two mappings produce
    >> corresponding values (ie, for each object representation) in the
    >> two abstract value spaces. For C, corresponding values are what
    >> would be produced by conversion between the two types in question.
    >> In other words, if types A and B have the same representation,
    >> then copying the bytes (eg, with memcpy()) from an 'A a;' into a
    >> 'B b;' must give the same results as 'b = (B) a;'. Any change in
    >> behavior between the two cases means the two representations are
    >> not the same.

    >
    > I'm struggling to reconcile that with C99's 3.17p1 and 6.2.6.1.
    > [quoted paragraph snipped]
    >
    > I'm missing the part where it's possible for the same object
    > representation to represent the same value for two incompatible
    > types, since the value depends on the type.


    I don't see why you are confused. There is no wording that
    forbids it, and it's obviously possible, as 'int' and 'const int'
    illustrate. On many machines 'int' and 'long' provide another
    example. Or two of the three character types.

    > Regarding conversion, 6.3p2 has that
    >
    > "Conversion of an operand value to a compatible type causes no
    > change to the value or the representation."
    >
    > Why mention both of them instead of simply "representation," if
    > there's a one-to-one correspondence between representation and
    > value, given compatible type? (Let alone incompatible types
    > with the same representation.)


    Do you think the Standard includes a sentence saying compatible
    types must have the same representation and alignment requirements?

    Incidentally, there isn't a one-to-one correspondence between object
    representations and values (necessarily, that is). The mapping is
    _from_ object representations _to_ the abstract value space, but it
    need not be one-to-one; also, the abstract value space includes
    "trap values" which correspond to trap representations but are not
    'values' as the Standard normally uses the term.

    > Regarding pointer conversion, 6.3.2.3p1 has that
    >
    > "For any qualifier q, a pointer to a non-q-qualified type may be
    > converted to a pointer to the q-qualified version of the type; the
    > values stored in the original and converted pointers shall compare
    > equal."
    >
    > Doesn't this explicitly hint that a 'const int *' value's
    > representation is permitted to be a trap representation for an 'int
    > *', but not the other way around? [snip]


    No. Converting a valid 'const int *' to an 'int *' is well-defined
    and must succeed.
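
    A minimal sketch of that round trip, for illustration only. The
    function name is made up here, and the pointed-to object is
    deliberately not const-qualified, so writing through the converted
    pointer is allowed:

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the qualifier round trip
   behaves as 6.3.2.3 describes. */
int const_round_trip(void)
{
    int n = 42;
    int *orig = &n;
    const int *cp = orig;    /* implicit conversion adds the qualifier */
    int *back = (int *)cp;   /* explicit cast removes it again */

    assert(cp == orig);      /* converted pointer compares equal */
    *back += 1;              /* fine: n itself is not const */
    return back == orig && n == 43;
}
```

    Had 'n' been declared 'const int', the conversion would still be
    defined, but the store through 'back' would not be.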

    > There's also p7:
    >
    > "A pointer to an object or incomplete type may be converted to a
    > pointer to a different object or incomplete type. If the resulting
    > pointer is not correctly aligned(57) for the pointed-to type, the
    > behavior is undefined. Otherwise, when converted back again, the
    > result shall compare equal to the original pointer. ..."
    >
    > Doesn't this explicitly hint that it's not the most portable idea to
    > do anything much with a converted pointer other than to eventually
    > convert it back before using it?


    No.

    > If I understand you correctly, there's no conversion happening,
    > as the value is simply becoming one in a different type's value
    > space, so there's no problem with p7.


    What I think you mean is there is no change to the object
    representation (which I didn't say and which doesn't have to
    be true). What I said was basically that the result must be the
    same whether the object representation changes or not (in cases
    where the two types involved have the same representation).

    > Regarding your equivalence between the 'memcpy' and the cast for two
    > types with the same representation, 6.5.4p4 has that
    >
    > "Preceding an expression by a parenthesized type name converts the
    > value of the expression to the named type. This construction is called
    > a cast.(89) A cast that specifies no conversion has no effect on the
    > type or value of an expression."
    >
    > If '(B) a' is already the same value as 'a' due to the types having
    > the same representation, then there is no conversion, right?


    Wrong. Casting always does a conversion, even if the conversion
    doesn't change either the value or the object representation.
    Assignment also always does a conversion, even if the types are
    the same. Furthermore for the case we are discussing, namely two
    pointer-to-structure types, if the referenced types are different
    then the value spaces of the two pointer types are disjoint, so
    it can't be the case that the two values are the same.

    > If that's the case, then the type of '(B) a' should be 'A'.
    > Like 3.17p1, type and value are once again tied together, so it
    > seems to me that incompatible types can have incompatible
    > values.


    This sentence is gibberish.

    > HOWEVER, you said _corresponding_values_. So I'd ask: May a
    > value in the value space for type 'A' not have a corresponding,
    > but invalid value in the value space for type 'B'? If it may,
    > then I fail to understand why the original post's code is
    > well-defined in C99 and C11.


    I shouldn't have to explain this again. Converting the value
    with a cast has to work, because of how the structs are nested.
    Therefore reinterpreting the object representation using a union
    member access has to work, because that's what "having the same
    representation" means.
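
    To make the memcpy()/cast equivalence concrete for the pointer
    types in this thread, here is a small sketch. It assumes the 6.2.5
    guarantee that all pointers to structure types share one
    representation, and the function name is invented for illustration.
    It uses memcpy() rather than a union, so the copied pointer is read
    through an object of its own declared type:

```c
#include <assert.h>
#include <string.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };

/* Hypothetical helper: returns 1 when copying the bytes of a
   'struct b_s *' into a 'struct a_s *' gives the same pointer
   as an explicit cast. */
int memcpy_matches_cast(void)
{
    struct b_s b = { { 1 }, 2 };
    struct b_s *pb = &b;
    struct a_s *by_cast = (struct a_s *)pb;  /* conversion */
    struct a_s *by_copy;

    /* Same representation implies same size. */
    assert(sizeof by_copy == sizeof pb);

    /* Reinterpretation: copy the object representation only. */
    memcpy(&by_copy, &pb, sizeof by_copy);

    return by_copy == by_cast && by_copy->x == 1;
}
```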

    >> Accessing via type B using a union member access
    >> works the same way that the memcpy() would.

    >
    > I absolutely agree with your equivalence between 'memcpy' and union
    > members. Also: Re-interpreting the object representation with
    > something like:
    >
    > A * ptr;
    > (*((B **) &ptr));
    >
    > (where types 'A' and 'B' have the same representation.)


    That doesn't work, as I think you pointed out subsequently,
    because of effective type rules. Except for that, yes, same
    idea.

    >> For pointers, there is the additional concern that the converted
    >> or corresponding value be a non-trap value in the abstract value
    >> space of the new pointer type. However, in the particular example
    >> here (ie, in the original posting, even though since disappeared
    >> in the subthread), we know the pointer conversions have to work
    >> because of the way the particular structs being pointed to are
    >> nested.

    >
    > Ah, that answers my last question, above. But there's a bit of
    > a jump in the logic that I can't grasp, and that's why the
    > nesting of the structures in the original example has anything
    > at all to do with the corresponding pointer value having to
    > work. Yes, I agree that the original example's bounds are
    > covered because of the nesting, but I don't understand why
    > that's the only important subject.


    There are two important facts: one, the struct values are
    nested appropriately; and two, the pointers to those structs
    have the same representation (and alignment requirements).
    Therefore the type-punning union member access gets a set
    of bits that are both interpreted correctly and valid for
    the type in question.

    > To back up a bit from the original example, 'char *' and 'void
    > *' have the same representation. Would you say that in:
    >
    >
    > void reinterpret(void) {
    > void * vp = &vp;
    > vp = (*((char **) &vp)) + 1;
    > }
    >


    Again, there is a violation of effective type rules in this case,
    but if the analogous thing were done using union member access
    then yes it has to work.

    > the expression-statement has Standard-defined behaviour? I'm
    > worried about this example because an implementation might wish
    > to represent "the stride" of the pointer arithmetic, just as
    > "Multi-Dimensional Array Simulator"[1] does. Implicit and
    > explicit conversions (like the promotions, casts, equality and
    > ternary semantics, etc.) seem to offer all the protection we
    > need, while re-interpretation does not.


    You're confusing what you think might be a good idea with
    what the Standard mandates. My comments are concerned only
    with the latter.
     
    Tim Rentsch, Jan 7, 2013
    #14
  15. Maciej Labanowicz

    Shao Miller Guest

    On 1/6/2013 23:00, Tim Rentsch wrote:
    > Shao Miller <> writes:
    >> To back up a bit from the original example, 'char *' and 'void
    >> *' have the same representation. Would you say that in:
    >>
    >> void reinterpret(void) {
    >> void * vp = &vp;
    >> vp = (*((char **) &vp)) + 1;
    >> }

    >
    > Again, there is a violation of effective type rules in this case,
    > but if the analogous thing were done using union member access
    > then yes it has to work.
    >


    And here is the analogous thing, offered elsethread:

    void reinterpret(void) {
    union {
    void * vp;
    char * cp;
    } u = { &u };
    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
    }

    The 'u.cp' expression marked by the comment (having type 'char *') is an
    lvalue whose type is not one of those listed by 6.5p7, but it attempts
    to access the value of 'u.vp'. (Doesn't it?) This appears to yield
    undefined behaviour, doesn't it? Or would you suggest that the 'u'
    sub-expression (having the union type) is the lvalue for purposes of
    6.5p7, and that the type of the containing expression 'u.cp' doesn't matter?

    - Shao Miller
     
    Shao Miller, Jan 7, 2013
    #15
  16. Maciej Labanowicz

    Tim Rentsch Guest

    Shao Miller <> writes:

    > On 1/6/2013 23:00, Tim Rentsch wrote:
    >> Shao Miller <> writes:
    >>> To back up a bit from the original example, 'char *' and 'void
    >>> *' have the same representation. Would you say that in:
    >>>
    >>> void reinterpret(void) {
    >>> void * vp = &vp;
    >>> vp = (*((char **) &vp)) + 1;
    >>> }

    >>
    >> Again, there is a violation of effective type rules in this case,
    >> but if the analogous thing were done using union member access
    >> then yes it has to work.
    >>

    >
    > And here is the analogous thing, offered elsethread:
    >
    > void reinterpret(void) {
    > union {
    > void * vp;
    > char * cp;
    > } u = { &u };
    > u.cp = u.cp + 1;
    > /* Hmm ^^^^ */
    > }
    >
    > The 'u.cp' expression marked by the comment (having type 'char *') is
    > an lvalue whose type is not one of those listed by 6.5p7, but it
    > attempts to access the value of 'u.vp'. (Doesn't it?) This appears
    > to yield undefined behaviour, doesn't it? Or would you suggest that
    > the 'u' sub-expression (having the union type) is the lvalue for
    > purposes of 6.5p7, and that the type of the containing expression
    > 'u.cp' doesn't matter?


    Look harder. Think more. Write less.
     
    Tim Rentsch, Jan 7, 2013
    #16
  17. Maciej Labanowicz

    Shao Miller Guest

    On 1/7/2013 02:06, Tim Rentsch wrote:
    > Shao Miller <> writes:
    >
    >> On 1/6/2013 23:00, Tim Rentsch wrote:
    >>> Shao Miller <> writes:
    >>>> To back up a bit from the original example, 'char *' and 'void
    >>>> *' have the same representation. Would you say that in:
    >>>>
    >>>> void reinterpret(void) {
    >>>> void * vp = &vp;
    >>>> vp = (*((char **) &vp)) + 1;
    >>>> }
    >>>
    >>> Again, there is a violation of effective type rules in this case,
    >>> but if the analogous thing were done using union member access
    >>> then yes it has to work.
    >>>

    >>
    >> And here is the analogous thing, offered elsethread:
    >>
    >> void reinterpret(void) {
    >> union {
    >> void * vp;
    >> char * cp;
    >> } u = { &u };
    >> u.cp = u.cp + 1;
    >> /* Hmm ^^^^ */
    >> }
    >>
    >> The 'u.cp' expression marked by the comment (having type 'char *') is
    >> an lvalue whose type is not one of those listed by 6.5p7, but it
    >> attempts to access the value of 'u.vp'. (Doesn't it?) This appears
    >> to yield undefined behaviour, doesn't it? Or would you suggest that
    >> the 'u' sub-expression (having the union type) is the lvalue for
    >> purposes of 6.5p7, and that the type of the containing expression
    >> 'u.cp' doesn't matter?

    >
    > Look harder. Think more. Write less.
    >


    Please don't resort to this sort of personally-directed nonsense as
    you've done before. If you don't have an answer, please simply say so.
    If you really think I've missed something, it'd certainly be more
    helpful to point it out instead of implying laziness or stupidity.

    If you think I write too much, well, I think you write too little
    Standard, and too much "Mr. T. Rentsch knows best." Unfortunately, that
    doesn't work for me, as your knowledge isn't directly accessible to me.
    I'm sorry if that makes our discussions difficult! If you choose to
    help me to understand your valuable perspective, I'll be appreciative.

    Just in case you're nit-picking an error in the code that hardly seems
    relevant to the meat of the question, please allow me to offer the
    corrected code:

    void reinterpret(void) {
    union {
    void * vp;
    char * cp;
    } u;
    u.vp = &u;
    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
    }

    int main(void) {
    reinterpret();
    return 0;
    }

    Otherwise, would anyone else please point out what I might've missed
    about whether or not the above example results in undefined behaviour?
    The "shall"[6.5p7] is outside of a constraint, so that'd seem to be
    undefined behaviour if the lvalue under consideration is 'u.cp'. If the
    lvalue is 'u', then its union type _is_ permitted by 6.5p7 (as
    acknowledged in a previous post, above), but it'd be good to know
    _which_ is the lvalue under consideration.

    - Shao Miller
     
    Shao Miller, Jan 7, 2013
    #17
  18. Maciej Labanowicz

    Shao Miller Guest

    On 1/6/2013 23:00, Tim Rentsch wrote:
    > Shao Miller <> writes:
    >> Surely if, in
    >>
    >> void somefunc(void) {
    >> unsigned char c;
    >> /* ... */
    >> }
    >>
    >> 'c' is permitted to have a trap representation due to its
    >> "provenance,"

    >
    > It isn't. You are either mis-remembering or have misunderstood.
    >


    Committee Discussion in Defect Report #260:

    "In addition the C Standard does not prohibit an implementation from
    tracking the provenance of the bit-pattern representing a value. An
    indeterminate value happening to have a bit pattern that is identical to
    a bit pattern representing a determinate value is not sufficient to
    allow access to the indeterminate value free from undefined behavior."

    That suggests to me that real implementation representatives discussed
    it, and some of them must have argued that there is more to object
    representation and value than a simple mapping. I suggest that there
    are other meta-considerations (such as "indeterminate value"), some of
    which are crucial to an implementation that wishes to have "enforceable
    coding rules":

    http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1663.pdf

    'c' above is permitted to have a trap representation without even having
    that fact coded into its object representation. If I've misunderstood,
    then I apologize. If you have further knowledge of the status of DR
    #260, then please share! :)

    - Shao Miller
     
    Shao Miller, Jan 7, 2013
    #18
  19. Maciej Labanowicz

    Shao Miller Guest

    On 1/7/2013 04:03, Shao Miller wrote:
    >
    > void reinterpret(void) {
    > union {
    > void * vp;
    > char * cp;
    > } u;
    > u.vp = &u;
    > u.cp = u.cp + 1;
    > /* Hmm ^^^^ */
    > }
    >
    > int main(void) {
    > reinterpret();
    > return 0;
    > }
    >
    > Otherwise, would anyone else please point out what I might've missed
    > about whether or not the above example results in undefined behaviour?
    > The "shall"[6.5p7] is outside of a constraint, so that'd seem to be
    > undefined behaviour if the lvalue under consideration is 'u.cp'. If the
    > lvalue is 'u', then its union type _is_ permitted by 6.5p7 (as
    > acknowledged in a previous post, above), but it'd be good to know
    > _which_ is the lvalue under consideration.


    Mr. Clive D. W. Feather very kindly gave his valuable time and shared in
    agreement about this code.

    6.5p7 makes this undefined behaviour, just as it does for the original
    post's use of the two different union members, despite the two
    pointer-to-structure types having the same representation and alignment
    requirements.

    The penultimate bullet of 6.5p7 regarding unions is so that the
    following code is well-defined:

    void reinterpret(void) {
    union {
    void * vp;
    char * cp;
    } u, v;
    u.vp = &u;

    /* Union lvalue on right accesses the stored value */
    v = u;
    (void) v;
    }

    int main(void) {
    reinterpret();
    return 0;
    }
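
    For comparison, a sketch (not from the original post; the function
    name is hypothetical) that performs the same three updates as the
    original test.c using only member access and a cast, so the 6.5p7
    question about reading a different union member never arises:

```c
#include <assert.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Hypothetical helper: applies the original post's updates without
   any union, then returns x + y + z. */
int update_via_conversion(void)
{
    struct c_s c = { { { 5 }, 6 }, 7 };
    struct c_s *pc = &c;
    struct b_s *pb = &pc->super;         /* plain member access */
    struct a_s *pa = (struct a_s *)pc;   /* cast, as in the original line 20 */

    pc->z += 10;
    pa->x += 20;
    pb->y += 30;

    /* x is now 25, y is 36, z is 17 */
    return c.super.super.x + c.super.y + c.z;
}
```

    The values match the second line of the original program's output
    (x=25,y=36,z=17), so the function returns 78.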

    I'm glad that if I've lost some marbles, someone else lost the same ones. :)

    - Shao Miller
     
    Shao Miller, Jan 7, 2013
    #19
  20. Maciej Labanowicz

    Tim Rentsch Guest

    Shao Miller <> writes:

    > On 1/6/2013 23:00, Tim Rentsch wrote:
    >> Shao Miller <> writes:
    >>> Surely if, in
    >>>
    >>> void somefunc(void) {
    >>> unsigned char c;
    >>> /* ... */
    >>> }
    >>>
    >>> 'c' is permitted to have a trap representation due to its
    >>> "provenance,"

    >>
    >> It isn't. You are either mis-remembering or have misunderstood.

    >
    > Committee Discussion in Defect Report #260: [snip]


    The type unsigned char does not have trap representations. There
    are no exceptions. Types that don't have trap representations
    never have a trap representation.

    In C11, accessing a variable like 'c' above before it has been
    initialized is undefined behavior. But that is because C11
    added (relative to, eg, N1256) a specific statement regarding
    such cases, stating explicitly that the behavior is undefined;
    it has nothing to do with provenance or trap representations.
    Indeed, seeing that this proviso was added in C11 makes it
    obvious that DR 260 doesn't apply to cases like the example
    above, because otherwise there would be no reason to add it.
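
    The absence of trap representations is also why unsigned char is
    the type used to inspect object representations (6.2.6.1). A
    minimal sketch, with a made-up function name, that reads every
    byte of an initialized int through unsigned char and rebuilds the
    value from those bytes:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of 'value'
   byte by byte and reconstructs an int from the copy. */
int roundtrip_through_bytes(int value)
{
    unsigned char bytes[sizeof value];
    const unsigned char *p = (const unsigned char *)&value;
    size_t i;
    int copy;

    /* Every byte is readable as unsigned char, whatever its bits. */
    for (i = 0; i < sizeof value; i++)
        bytes[i] = p[i];

    memcpy(&copy, bytes, sizeof copy);
    return copy;
}
```

    Note that 'value' is initialized here, so this says nothing about
    the uninitialized case discussed above.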
     
    Tim Rentsch, Jan 12, 2013
    #20