Strict aliasing rules in ISO C: does anyone understand them?

Discussion in 'C Programming' started by nicolas.riesch@genevoise.ch, Oct 13, 2005.

  1. Guest

    I am trying to understand the strict aliasing rules in the C Standard.
    Since gcc applies these rules by default, I want to be sure that I
    fully understand the issue.

    For questions (1), (2) and (3), I think that the answers are all "yes",
    but I would be glad to have strong confirmation.

    As for questions (4), (5) and (6), I really don't know. Please help!

    --------

    The Standard says (
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf chapter 6.5
    ):

    An object shall have its stored value accessed only by an lvalue
    expression that has one of
    the following types:
    - a type compatible with the effective type of the object,
    - a qualified version of a type compatible with the effective type
    of the object,
    - a type that is the signed or unsigned type corresponding to the
    effective type of the object,
    - a type that is the signed or unsigned type corresponding to a
    qualified version of the effective type of the object,
    - an aggregate or union type that includes one of the aforementioned
    types among its members
    (including, recursively, a member of a subaggregate or contained
    union), or
    - a character type.


    ***** Question (1) *****

    Let's have two struct having different tag names, like:

    struct s1 {int i;};
    struct s2 {int i;};

    struct s1 *p1;
    struct s2 *p2;

    The compiler is free to assume that p1 and p2 point to different memory
    locations and don't alias.
    Two structs with different tag names are considered to be different types.

    In the Standard, we read the wording "effective type of the object"
    many times.

    This "effective type of the object" may be "int", "double", etc., but
    it may also be a struct type, right?

    And I suppose it may also be an array type or a union type as
    well. Is that correct?


    ***** Question (2) *****

    In the little program that follows, the line "printf("%d\n", *x);"
    normally prints 123,
    but an optimizing compiler could print garbage instead of 123.
    Is my reasoning correct?

    On the other hand, the line "printf("%d\n", p1->i);" always prints 999
    as expected, right?

    ----

    #include <stdio.h>
    #include <stdlib.h>

    struct s1 { int i; double f; };


    int main(void)
    {
        struct s1* p1;
        int* x;

        p1 = malloc(sizeof(*p1));
        p1->i = 123;            // object of type 'struct s1' contains 123

        x = &(p1->i);

        printf("%d\n", *x);     // I try to access a value stored in an
                                // object of type 'struct s1'
                                // through *x which is of type 'int'.
                                // I think this is not allowed by the
                                // standard !

        *x = 999;               // I store 999 in *x, which is of type 'int'

        printf("%d\n", p1->i);  // I access a value stored in *x which is of
                                // type 'int'
                                // by *p1 ( as p1->i is a shortcut for
                                // (*p1).i )
                                // which is of type 'struct s1',
                                // but contains a member of type 'int'.
                                // I think this is allowed by the standard.

        return 0;
    }


    ***** Question (3) *****

    The Standard forbids (if I am not mistaken) a pointer of type "struct A
    *" from accessing data written through a pointer of type "struct B *", as
    they are different types.

    This means that the common practice of faking inheritance in C, as in
    this code snippet, is now utterly wrong. Is that correct?


    --- myfile.c ---

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { RED, BLUE, GREEN } Color;

    struct Point { int x;
    int y;
    };

    struct Color_Point { int x;
    int y;
    Color color;
    };

    struct Color_Point2{ struct Point point;
    Color color;
    };

    int main(int argc, char* argv[])
    {
        struct Point* p;

        struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point));
        my_color_point->x = 10;
        my_color_point->y = 20;
        my_color_point->color = GREEN;

        p = (struct Point*)my_color_point;

        // trying to access data stored in a "struct Color_Point" object
        // using a "struct Point*" pointer is forbidden by the Standard ???
        printf("x:%d, y:%d\n", p->x, p->y);

        struct Color_Point2* my_color_point2 = malloc(sizeof(struct Color_Point2));
        my_color_point2->point.x = 100;
        my_color_point2->point.y = 200;
        my_color_point2->color = RED;

        p = (struct Point*)my_color_point2;

        // trying to access data stored in a "struct Color_Point2" object
        // using a "struct Point*" pointer is forbidden by the Standard ???
        printf("x:%d, y:%d\n", p->x, p->y);

        p = &my_color_point2->point;

        printf("x:%d, y:%d\n", p->x, p->y);  // but this is correct, right ???

        return 0;
    }


    Is the line "p = (struct Point*)my_color_point" also a case of what is
    called "type-punning" ???


    ***** Question (4) *****

    In the Standard, chapter 6.5.2.3, it is written:

    One special guarantee is made in order to simplify the use of unions:
    if a union contains
    several structures that share a common initial sequence (see below),
    and if the union
    object currently contains one of these structures, it is permitted to
    inspect the common
    initial part of any of them anywhere that a declaration of the complete
    type of the union is
    visible. Two structures share a common initial sequence if
    corresponding members have
    compatible types (and, for bit-fields, the same widths) for a sequence
    of one or more
    initial members.

    I find this statement completely obscure.

    Let's have:

    struct s1 {int i;};
    struct s2 {int i;};

    struct s1 *p1;
    struct s2 *p2;

    A compiler is free to assume that *p1 and *p2 don't alias.

    If we just put a union declaration like this before this code, then it
    acts like a flag to the compiler, indicating that pointers to "struct
    s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
    to the same location.

    union p1_p2_alias_flag { struct s1 st1;
    struct s2 st2;
    };

    There is no need to use "union p1_p2_alias_flag" for accessing data,
    and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
    anywhere else.
    I mean, it is possible to access data using directly p1 and p2.

    Do you agree, everybody ???


    ***** Question (5) *****

    This question is really hard.

    Consider this code snippet:

    ---------
    #include <stdio.h>

    int main (void)
    {
        struct s1 { int i; };

        struct s1 s = {77};

        unsigned char* x = (unsigned char*)&s;

        // Standard says data stored in "struct s1" type can be read by pointer
        // to "char"
        printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);

        x[0] = 100;  // here, I write data in "char" objects !!!
        x[1] = 101;
        x[2] = 102;
        x[3] = 103;

        // but data stored in "char" objects cannot be read by pointer to
        // "struct s1" ???
        printf("%d\n", s.i);

        return 0;
    }
    -----------

    For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
    (int)x[3]);", I can rewrite the Standard clause like this:

    An object [ here, s of type "struct s1" ] shall have its stored value
    accessed only by an lvalue expression that has one of
    the following types:
    [ blah blah blah ]
    - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
    it is our case, so everything is OK so far !


    But what about the line "printf("%d\n", s.i);"?
    I have read the Standard again and again, but I cannot see how it can
    work.
    If I rewrite the Standard clause, it gives:

    An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
    stored value accessed only by an lvalue expression that has one of
    the following types:
    - a type compatible with the effective type of the object, [ this is
    not our case ]
    - a qualified version of a type compatible with the effective type
    of the object, [ still not our case ]
    - a type that is the signed or unsigned type corresponding to the
    effective type of the object, [ still not our case ]
    - a type that is the signed or unsigned type corresponding to a
    qualified version of the effective type of the object, [ still not our
    case ]
    - an aggregate or union type that includes one of the aforementioned
    types among its members [ we read through "s" which is of type "struct
    s1", but it does not contain a member of type "char" ]
    (including, recursively, a member of a subaggregate or contained
    union), or
    - a character type. [ definitely not our case ]

    We see that none of these conditions applies in our case.

    Where is the flaw in my reasoning?
    Does the last "printf" line of this code snippet work or not, and
    why?


    ***** Question (6) *****

    I often see this code used with socket programming:

    struct sockaddr_in my_addr;
    ...
    bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

    The function bind(...) needs a pointer to "struct sockaddr", but
    my_addr is a "struct sockaddr_in".
    So, in my opinion, the function bind is not guaranteed to access the
    content of the object my_addr safely.

    Does anyone know why this code is not broken (or whether it is)?
    , Oct 13, 2005
    #1

  2. In article <>,
    wrote:

    > ***** Question (2) *****
    >
    > In the little program that follows, the line "printf("%d\n", *x);"
    > normally returns 123,
    > but an optimizing compiler can return garbage instead of 123.
    > Is my reasoning correct ???
    >
    > On the other side, the line "printf("%d\n", p1->i);" always returns 999
    > as expected, right ???
    >
    > ----
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > struct s1 { int i; double f; };
    >
    >
    > int main(void)
    > {
    > struct s1* p1;
    > int* x;
    >
    > p1 = malloc(sizeof(*p1));
    > p1->i = 123; // object of type 'struct s1' contains 123
    >
    > x = &(p1->i);
    >
    > printf("%d\n", *x); // I try to access a value stored in an
    > object of type 'struct s1'
    > // through *x which is of type 'int'.
    > // I think this is not allowed by the
    > standard !
    >
    > *x = 999; // I store 999 in *x, which is of type 'int'
    >
    > printf("%d\n", p1->i); // I access a value stored in *x which is of
    > type 'int'
    > // by *p1 ( as p1->i is a shortcut for
    > (*p1).i )
    > // which is of type 'struct s1',
    > // but contains a member of type 'int'.
    > // I think this is allowed by the standard.
    >
    >
    > return 0;
    > }


    This is all ok. The only unusual thing with structs is that there can be
    padding, and that storing into any struct member could modify any
    padding in the struct. If there is padding between int i and double f,
    then p1->i = 123 could modify the padding, while *x = 999 couldn't.


    > ***** Question (3) *****
    >
    > The Standard forbids ( if I am not mistaken ) pointer of type "struct A
    > *" to access data written by a pointer of type "struct B *", as the are
    > different types.
    >
    > This means that the common usage of faking inheritance in C like in
    > this code sniplet is now utterly wrong, is it correct ???
    >
    >
    > --- myfile.c ---
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > typedef enum { RED, BLUE, GREEN } Color;
    >
    > struct Point { int x;
    > int y;
    > };
    >
    > struct Color_Point { int x;
    > int y;
    > Color color;
    > };
    >
    > struct Color_Point2{ struct Point point;
    > Color color;
    > };
    >
    > int main(int argc, char* argv[])
    > {
    >
    > struct Point* p;
    >
    > struct Color_Point* my_color_point = malloc(sizeof(struct
    > Color_Point));
    > my_color_point->x = 10;
    > my_color_point->y = 20;
    > my_color_point->color = GREEN;
    >
    > p = (struct Point*)my_color_point;


    This is undefined behavior. There is no guarantee that my_color_point is
    correctly aligned for a pointer of type (struct Point *).

    > printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
    > a "struct Color_Point" object using a "struct Point*" pointer is
    > forbidden by the Standard ???


    Yes. There is an exception: if the compiler has seen a declaration of a
    union with members of type "struct Point" and "struct Color_Point", then
    accessing the common initial members of both structs is legal,
    even writing to a member of one struct and reading it as a member of
    the other struct.

    > struct Color_Point2* my_color_point2 = malloc(sizeof(struct
    > Color_Point2));
    > my_color_point2->point.x = 100;
    > my_color_point2->point.y = 200;
    > my_color_point2->color = RED;
    >
    > p = (struct Point*)my_color_point2;


    Yes, you can always cast a pointer to a struct to a pointer to its first
    member.

    > printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
    > a "struct Color_Point2" object using a "struct Point*" pointer is
    > forbidden by the Standard ???


    That's fine.

    > p = &my_color_point2->point;
    >
    > printf("x:%d, y:%d\n", p->x, p->y); // but this is correct, right ???
    >
    >
    > return 0;
    > }



    > Is the line "p = (struct Point*)my_color_point" also a case of what is
    > called "type-punning" ???
    >
    >
    > ***** Question (4) *****
    >
    > In the Standard, chapter 6.5.2.3, it is written:
    >
    > One special guarantee is made in order to simplify the use of unions:
    > if a union contains
    > several structures that share a common initial sequence (see below),
    > and if the union
    > object currently contains one of these structures, it is permitted to
    > inspect the common
    > initial part of any of them anywhere that a declaration of the complete
    > type of the union is
    > visible. Two structures share a common initial sequence if
    > corresponding members have
    > compatible types (and, for bit-fields, the same widths) for a sequence
    > of one or more
    > initial members.
    >
    > I find this statement completely obscure.
    >
    > Let's have:
    >
    > struct s1 {int i;};
    > struct s2 {int i;};
    >
    > struct s1 *p1;
    > struct s2 *p2;
    >
    > A compiler is free to assume that *p1 and *p2 don't alias.


    Exactly.

    > If we just put a union declaration like this before this code, then it
    > acts like a flag to the compiler, indicating that pointers to "struct
    > s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
    > to the same location.
    >
    > union p1_p2_alias_flag { struct s1 st1;
    > struct s2 st2;
    > };
    >
    > There is no need to use "union p1_p2_alias_flag" for accessing data,
    > and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
    > anywhere else.
    > I mean, it is possible to access data using directly p1 and p2.


    Yes, that is right.


    > ***** Question (5) *****
    >
    > This question is really hard.
    >
    > Let's have this code sniplet:
    >
    > ---------
    > #include <stdio.h>
    >
    > int main (void)
    > {
    >
    > struct s1 {int i;
    > };
    >
    > struct s1 s = {77};
    >
    > unsigned char* x = (unsigned char*)&s;
    > printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
    > // Standard says data stored in "struct s1" type can be read by pointer
    > to "char"


    That is if sizeof (int) >= 4, which is nowhere guaranteed.


    > x[0] = 100; // here, I write data in "char" objects !!!
    > x[1] = 101;
    > x[2] = 102;
    > x[3] = 103;
    >
    > printf("%d\n", s.i); // but data stored in "char" objects cannot be
    > read by pointer to "struct s1" ???


    Assuming that sizeof (int) == 4, you have changed every bit in
    the representation of s.i. If the resulting representation is not a trap
    representation, you are fine. It is even OK if, for example, the
    result after storing three bytes, combined with the last remaining byte
    of the number 77, were a trap representation, because you never access
    that value.



    > return 0;
    > }





    > For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
    > (int)x[3]);", I can rewrite the Standard clause like this:
    >
    > An object [ here, s of type "struct s1" ] shall have its stored value
    > accessed only by an lvalue expression that has one of
    > the following types:
    > [ blah blah blah ]
    > - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
    > it is our case, so everything is OK so far !
    >
    >
    > But what about the line "printf("%d\n", s.i);" ??????
    > I read the Standard again and again, but I cannot express how is can
    > work.


    If the bytes stored are a valid representation of an int, then that is
    what it prints. If not, it is undefined behavior. A specific compiler
    might guarantee that int's have no trap representations.

    > ***** Question (6) *****
    >
    > I often see this code used with socket programming:
    >
    > struct sockaddr_in my_addr;
    > ...
    > bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
    >
    > The function bind(...) needs a pointer to "struct sockaddr", but
    > my_addr is a "struct sockaddr_in".
    > So, in my opinion, the function bind is not guaranteed to access safely
    > the content of object my_addr.
    >
    > Someone knows why this code is not broken ( or if it is ) ???


    It depends on the declarations of the types involved. And remember that the
    C Standard is not the only standard. For example, the C Standard doesn't
    guarantee that 'a' + 1 == 'b', but if your C implementation uses ASCII
    or Unicode for its character set, then the ASCII standard or the Unicode
    standard would give you that guarantee.

    In your case, it could be that POSIX guarantees that the code is
    correct. So it will work on any implementation that conforms to the
    POSIX standard (no matter whether it conforms to the C Standard or not),
    even though it might not work on an implementation that conforms to the
    C Standard but not to POSIX.
    Christian Bau, Oct 13, 2005
    #2

  3. Jack Klein Guest

    On 13 Oct 2005 07:39:48 -0700, wrote in
    comp.lang.c:

    >
    > I try to understand strict aliasing rules that are in the C Standard.
    > As gcc applies these rules by default, I just want to be sure to
    > understand fully this issue.
    >
    > For questions (1), (2) and (3), I think that the answers are all "yes",
    > but I would be glad to have strong confirmation.
    >
    > About questions (4), (5) and (6), I really don't know. Please help ! !
    > !
    >
    > --------
    >
    > The Standard says (
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf chapter 6.5
    > ):
    >
    > An object shall have its stored value accessed only by an lvalue
    > expression that has one of
    > the following types:
    > - a type compatible with the effective type of the object,
    > - a qualified version of a type compatible with the effective type
    > of the object,
    > - a type that is the signed or unsigned type corresponding to the
    > effective type of the object,
    > - a type that is the signed or unsigned type corresponding to a
    > qualified version of the effective type of the object,
    > - an aggregate or union type that includes one of the aforementioned
    > types among its members
    > (including, recursively, a member of a subaggregate or contained
    > union), or
    > - a character type.
    >
    >
    > ***** Question (1) *****
    >
    > Let's have two struct having different tag names, like:
    >
    > struct s1 {int i;};
    > struct s2 {int i;};
    >
    > struct s1 *p1;
    > struct s2 *p2;
    >
    > The compiler is free to assume that p1 and p2 point to different memory
    > locations and don't alias.
    > Two struct having different names are considered to be different types.
    >
    > In the standard, we read the wording "effective type of the object"
    > many times.
    >
    > This "effective type of the object" may be an "int", "double", etc, but
    > may also be a "struct" type, right ???
    >
    > And I suppose it may also be an "array" type or an "union" type as
    > well, is it correct ???


    Yes.

    > ***** Question (2) *****
    >
    > In the little program that follows, the line "printf("%d\n", *x);"
    > normally returns 123,
    > but an optimizing compiler can return garbage instead of 123.


    No, an optimizing compiler must still output "123" for this line.

    > Is my reasoning correct ???
    >
    > On the other side, the line "printf("%d\n", p1->i);" always returns 999
    > as expected, right ???
    >
    > ----
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > struct s1 { int i; double f; };
    >
    >
    > int main(void)
    > {
    > struct s1* p1;
    > int* x;
    >
    > p1 = malloc(sizeof(*p1));
    > p1->i = 123; // object of type 'struct s1' contains 123
    >
    > x = &(p1->i);
    >
    > printf("%d\n", *x); // I try to access a value stored in an
    > object of type 'struct s1'
    > // through *x which is of type 'int'.
    > // I think this is not allowed by the
    > standard !


    The effective type of *p1 is 'struct s1'. The effective type of p1->i
    is 'int'. 'x' is a pointer to int, and you have initialized it with a
    pointer to an int. This is perfectly legal.

    Since the int contains the value 123, and 'x' quite properly points to
    that int, *x must retrieve the int value 123. It can't do anything
    else.

    > *x = 999; // I store 999 in *x, which is of type 'int'
    >
    > printf("%d\n", p1->i); // I access a value stored in *x which is of
    > type 'int'
    > // by *p1 ( as p1->i is a shortcut for
    > (*p1).i )
    > // which is of type 'struct s1',
    > // but contains a member of type 'int'.
    > // I think this is allowed by the standard.
    >
    >
    > return 0;
    > }
    >
    >
    > ***** Question (3) *****
    >
    > The Standard forbids ( if I am not mistaken ) pointer of type "struct A
    > *" to access data written by a pointer of type "struct B *", as the are
    > different types.
    >
    > This means that the common usage of faking inheritance in C like in
    > this code sniplet is now utterly wrong, is it correct ???
    >
    >
    > --- myfile.c ---
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > typedef enum { RED, BLUE, GREEN } Color;
    >
    > struct Point { int x;
    > int y;
    > };
    >
    > struct Color_Point { int x;
    > int y;
    > Color color;
    > };
    >
    > struct Color_Point2{ struct Point point;
    > Color color;
    > };
    >
    > int main(int argc, char* argv[])
    > {
    >
    > struct Point* p;
    >
    > struct Color_Point* my_color_point = malloc(sizeof(struct
    > Color_Point));
    > my_color_point->x = 10;
    > my_color_point->y = 20;
    > my_color_point->color = GREEN;
    >
    > p = (struct Point*)my_color_point;
    >
    > printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in


    This is undefined behavior, pure and simple. It works on many
    implementations, but is not guaranteed at all.

    [snip]

    > Is the line "p = (struct Point*)my_color_point" also a case of what is
    > called "type-punning" ???


    Type punning is not a term defined by the standard, but I would say
    that the act of assigning the pointer via a cast is not type punning.
    Accessing a member of the foreign structure type through the pointer
    is.

    > ***** Question (4) *****
    >
    > In the Standard, chapter 6.5.2.3, it is written:
    >
    > One special guarantee is made in order to simplify the use of unions:
    > if a union contains
    > several structures that share a common initial sequence (see below),
    > and if the union
    > object currently contains one of these structures, it is permitted to
    > inspect the common
    > initial part of any of them anywhere that a declaration of the complete
    > type of the union is
    > visible. Two structures share a common initial sequence if
    > corresponding members have
    > compatible types (and, for bit-fields, the same widths) for a sequence
    > of one or more
    > initial members.
    >
    > I find this statement completely obscure.
    >
    > Let's have:
    >
    > struct s1 {int i;};
    > struct s2 {int i;};
    >
    > struct s1 *p1;
    > struct s2 *p2;
    >
    > A compiler is free to assume that *p1 and *p2 don't alias.
    >
    > If we just put a union declaration like this before this code, then it
    > acts like a flag to the compiler, indicating that pointers to "struct
    > s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
    > to the same location.
    >
    > union p1_p2_alias_flag { struct s1 st1;
    > struct s2 st2;
    > };
    >
    > There is no need to use "union p1_p2_alias_flag" for accessing data,
    > and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
    > anywhere else.
    > I mean, it is possible to access data using directly p1 and p2.


    It seems unlikely that a compiler could find a way to prevent it from
    working in general, even if the implementer tried, but such behavior
    would not render the compiler non-conforming.

    On the other hand, since your structure only contains a single member,
    and the first member always begins at the same address as the
    structure itself, this particular usage can't fail.

    Still, the behavior is undefined. Which means the language standard
    places no requirements on it at all.
    >
    > Do you agree, everybody ???
    >
    >
    > ***** Question (5) *****
    >
    > This question is really hard.
    >
    > Let's have this code sniplet:
    >
    > ---------
    > #include <stdio.h>
    >
    > int main (void)
    > {
    >
    > struct s1 {int i;
    > };
    >
    > struct s1 s = {77};
    >
    > unsigned char* x = (unsigned char*)&s;
    > printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
    > // Standard says data stored in "struct s1" type can be read by pointer
    > to "char"
    >
    > x[0] = 100; // here, I write data in "char" objects !!!
    > x[1] = 101;
    > x[2] = 102;
    > x[3] = 103;


    The standard does not say that you can do this. You are assuming that
    sizeof(int) is at least 4, and there are implementations where that is
    not true. Accessing, let alone writing to, x[1], x[2], or x[3] might
    be outside the bounds of the int and the struct, producing undefined
    behavior.

    > printf("%d\n", s.i); // but data stored in "char" objects cannot be
    > read by pointer to "struct s1" ???
    >
    > return 0;
    > }


    No, the point is that accessing s.i, an int, after storing data into
    that memory using a different object type, is undefined. You might
    have created a bit pattern that does not represent a valid value for
    the int, called a trap representation.

    > -----------
    >
    > For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
    > (int)x[3]);", I can rewrite the Standard clause like this:
    >
    > An object [ here, s of type "struct s1" ] shall have its stored value
    > accessed only by an lvalue expression that has one of
    > the following types:
    > [ blah blah blah ]
    > - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
    > it is our case, so everything is OK so far !


    I have worked on a platform where sizeof(int) is 1, and several where
    sizeof(int) is 2. I have never worked on a platform where sizeof(int)
    is 3, but C allows it. On any of these platforms you would be
    invoking undefined behavior.

    > But what about the line "printf("%d\n", s.i);" ??????


    Even assuming that sizeof(int) >= 4 on your implementation, you have
    to understand that all types, other than unsigned char, can have trap
    representations, that is bit patterns that do not represent a valid
    value for the type. By writing arbitrary bit patterns into an int,
    you may have created an invalid bit pattern in that int. When you
    access that invalid bit pattern as an int, the behavior is undefined.

    > I read the Standard again and again, but I cannot express how is can
    > work.
    > If I rewrite the Standard clause, it gives:
    >
    > An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
    > stored value accessed only by an lvalue expression that has one of
    > the following types:
    > - a type compatible with the effective type of the object, [ this is
    > not our case ]
    > - a qualified version of a type compatible with the effective type
    > of the object, [ still not our case ]
    > - a type that is the signed or unsigned type corresponding to the
    > effective type of the object, [ still not our case ]
    > - a type that is the signed or unsigned type corresponding to a
    > qualified version of the effective type of the object, [ still not our
    > case ]
    > - an aggregate or union type that includes one of the aforementioned
    > types among its members [ we read through "s" which is of type "struct
    > s1", but it does not contain a member of type "char" ]
    > (including, recursively, a member of a subaggregate or contained
    > union), or
    > - a character type. [ definitely not our case ]
    >
    > We see that none of these conditions applies in our case.


    The standard provides a specific list of what is allowed. Lists like
    this are always exhaustive. That means anything not on the list is
    undefined.

    > Where is the flaw in my reasoning ???


    There is no flaw in your reasoning, the code produces undefined
    behavior.

    > Does the last "printf" line of this code sniplet work or not ??? and
    > why ???


    There is no question of "work". Whatever it does is just as right or
    wrong as anything else that might happen as far as the language is
    concerned. That's what undefined behavior means. The C standard does
    not know or care what happens.

    > ***** Question (6) *****
    >
    > I often see this code used with socket programming:
    >
    > struct sockaddr_in my_addr;
    > ...
    > bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
    >
    > The function bind(...) needs a pointer to "struct sockaddr", but
    > my_addr is a "struct sockaddr_in".
    > So, in my opinion, the function bind is not guaranteed to access safely
    > the content of object my_addr.
    >
    > Someone knows why this code is not broken ( or if it is ) ???


    That depends on the definition of 'struct sockaddr_in'. If its first
    member is a 'struct sockaddr', the code is legal and well defined
    because a pointer to a structure can always be converted to a pointer
    to its first member. If not, then the code produces undefined
    behavior if the called function actually uses the pointer to access
    members of a 'struct sockaddr'.

    You use terms like "broken" and "work", which do not really apply as
    far as undefined behavior in C is concerned. They are subjective
    terms at best. Code is "broken" if it does not do what you want, you
    consider it to "work" if it does. If it produces undefined behavior,
    it may "work" on one compiler but be "broken" on another, and both
    compilers can be standard conforming.

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    alt.comp.lang.learn.c-c++
    http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
    Jack Klein, Oct 14, 2005
    #3
  4. S.Tobias Guest

    Christian Bau <> wrote:
    > In article <>,
    > wrote:
    >

    [snip]
    >> ***** Question (5) *****
    >>
    >> This question is really hard.
    >>
    >> Let's have this code snippet:
    >>
    >> ---------
    >> #include <stdio.h>
    >>
    >> int main (void)
    >> {
    >>
    >> struct s1 {int i;
    >> };
    >>
    >> struct s1 s = {77};
    >>
    >> unsigned char* x = (unsigned char*)&s;
    >> printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
    >> // Standard says data stored in "struct s1" type can be read by pointer
    >> to "char"

    >
    > That is if sizeof (int) >= 4, which is nowhere guaranteed.
    >
    >
    >> x[0] = 100; // here, I write data in "char" objects !!!
    >> x[1] = 101;
    >> x[2] = 102;
    >> x[3] = 103;
    >>

    Let's suppose that we copy value from another int:
    int i = 42;
    unsigned char *y = (void*)&i;
    assert(sizeof(int) == 4);
    x[0] = y[0];
    //...etc.
    >> printf("%d\n", s.i); // but data stored in "char" objects cannot be
    >> read by pointer to "struct s1" ???


    Storing values through character lvalues did not change the effective
    type of the struct, or its member, therefore it's okay (compiler must
    reread the value from memory).

    Effective type for declared objects is always the declared type.
    Effective type for allocated objects is the last imprinted by
    storing a value, by copying (memcpy, memmove, char array), or, if
    none, is the type of the lvalue it is accessed with.
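
    The declared-object case can be sketched as follows. The function name
    `overwrite_through_chars` is hypothetical, and the sketch assumes the
    bytes copied come from a valid int, so no trap representation arises.

    ```c
    #include <stddef.h>

    struct s1 { int i; };

    /* Overwrites every byte of s->i with the bytes of `value`, using only
       character lvalues, then reads the member back with its own type. */
    int overwrite_through_chars(struct s1 *s, int value)
    {
        unsigned char *dst = (unsigned char *)s;
        const unsigned char *src = (const unsigned char *)&value;
        size_t n;

        for (n = 0; n < sizeof value; n++)
            dst[n] = src[n];

        return s->i;   /* effective type is still int; the new bytes are seen */
    }
    ```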

    > Assuming that sizeof (int) == 4, you have changed exactly every bit in
    > the representation of x. If the representation is not a trap
    > representation, you are fine. And it is even ok if for example the
    > result after storing three bytes, combined with the last remaining byte
    > of the number 77 were a trap representation, because you never access
    > that value.


    (all agreed)

    [snip]
    >> For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
    >> (int)x[3]);", I can rewrite the Standard clause like this:
    >>
    >> An object [ here, s of type "struct s1" ] shall have its stored value
    >> accessed only by an lvalue expression that has one of
    >> the following types:
    >> [ blah blah blah ]
    >> - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
    >> it is our case, so everything is OK so far !
    >>
    >>
    >> But what about the line "printf("%d\n", s.i);" ??????
    >> I read the Standard again and again, but I cannot express how it can
    >> work.


    It means this: struct s1 object can be legally accessed with a character
    lvalue (including writing data to the struct). Since it's legal,
    the compiler must take it into consideration when later accessing
    struct s1. Either it can prove that character lvalues did not refer
    to the struct object, or it must re-read the struct value from memory.

    This is not the case with other types:
    assert(sizeof(int) == sizeof(short));
    int i = 42;
    short *ps = (short *)&i; //assume that alignment is the same
    *ps = 54; //this access is UB; since it is not legal to access int object
    //with short lvalue, compiler need not assume that object `i'
    //was actually changed
    printf("%d\n", i); //may print cached value 42
    //(the Std says it can do or not do virtually anything)

    For another example: when a value is stored through `short' lvalue,
    the compiler need not assume that `struct s1' object was changed,
    because `struct s1' does not contain a `short' member.
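
    Put in function form, the assumption looks roughly like this. The name
    `store_both` is made up for this sketch. Called with two distinct
    objects the behaviour is defined; passing `(short *)&i` as the second
    argument would be the undefined case discussed above.

    ```c
    /* *pi has type int and *ps has type short, so the compiler is allowed
       to assume that the two stores touch different objects. */
    int store_both(int *pi, short *ps)
    {
        *pi = 42;
        *ps = 54;      /* assumed not to modify *pi */
        return *pi;    /* may be folded to the constant 42 */
    }
    ```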

    --
    Stan Tobias
    mailx `echo LID | sed s/[[:upper:]]//g`
    S.Tobias, Oct 14, 2005
    #4
  5. S.Tobias Guest

    Christian Bau <> wrote:
    > In article <>,
    > wrote:


    >> ***** Question (4) *****
    >>
    >> In the Standard, chapter 6.5.2.3, it is written:
    >>
    >> One special guarantee is made in order to simplify the use of unions:
    >> if a union contains
    >> several structures that share a common initial sequence (see below),
    >> and if the union
    >> object currently contains one of these structures, it is permitted to
    >> inspect the common
    >> initial part of any of them anywhere that a declaration of the complete
    >> type of the union is
    >> visible. Two structures share a common initial sequence if
    >> corresponding members have
    >> compatible types (and, for bit-fields, the same widths) for a sequence
    >> of one or more
    >> initial members.
    >>
    >> I find this statement completely obscure.
    >>
    >> Let's have:
    >>
    >> struct s1 {int i;};
    >> struct s2 {int i;};
    >>
    >> struct s1 *p1;
    >> struct s2 *p2;
    >>
    >> A compiler is free to assume that *p1 and *p2 don't alias.

    >
    > Exactly.
    >

    What's more important: `p1->i' and `p2->i' don't alias, even though they
    have the same type!

    However p1 and p2 _may_ point at the same object.
    ((char*)p1)[0] = 0;
    At this point the compiler cannot blindly assume that `*p2' wasn't modified.

    >> If we just put a union declaration like this before this code, then it
    >> acts like a flag to the compiler, indicating that pointers to "struct
    >> s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
    >> to the same location.


    (As I said above, they may point to the same location.)

    >>
    >> union p1_p2_alias_flag { struct s1 st1;
    >> struct s2 st2;
    >> };
    >>
    >> There is no need to use "union p1_p2_alias_flag" for accessing data,
    >> and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
    >> anywhere else.

    (I don't quite understand what you mean here.)
    >> I mean, it is possible to access data using directly p1 and p2.


    After the compiler sees the union declaration, it is obliged to assume
    that `p1->i' and `p2->i' may refer to (alias) the same object.
    (However, it still need not assume that expressions `*p1' and `*p2' alias
    the same object, since they are incompatible types).

    --
    Stan Tobias
    mailx `echo LID | sed s/[[:upper:]]//g`
    S.Tobias, Oct 14, 2005
    #5
  6. In article <> "S.Tobias" <> writes:
    ....
    > >> struct s1 {int i;};
    > >> struct s2 {int i;};
    > >>
    > >> struct s1 *p1;
    > >> struct s2 *p2;
    > >>
    > >> A compiler is free to assume that *p1 and *p2 don't alias.

    > >
    > > Exactly.


    With a caveat. It is free to assume that as long as nothing is assigned
    to either p1 or p2.

    > However p1 and p2 _may_ point at the same object.


    In that case the compiler can not assume that *p1 and *p2 don't alias.
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
    Dik T. Winter, Oct 14, 2005
    #6
  7. S.Tobias Guest

    Dik T. Winter <> wrote:
    > In article <> "S.Tobias" <> writes:
    > ...
    > > >> struct s1 {int i;};
    > > >> struct s2 {int i;};
    > > >>
    > > >> struct s1 *p1;
    > > >> struct s2 *p2;
    > > >>
    > > >> A compiler is free to assume that *p1 and *p2 don't alias.

    [snip]
    > > However p1 and p2 _may_ point at the same object.

    >
    > In that case the compiler can not assume that *p1 and *p2 don't alias.


    I don't agree, otherwise aliasing rules would have no purpose.
    Since `*p1' and `*p2' have incompatible types, the compiler may assume
    (act as if) they don't refer to the same object, it doesn't have to prove
    that both pointers don't point at the same location.
    I believe that the compiler need not even assume that these two alias
    the same object:
    *p1
    *(struct s2 *)p1
    The decision whether to alias or not to alias can be based on
    the type of lvalue (mainly).

    Can you give an example where `*p1' and `*p2' alias the same object
    while the behaviour is defined? (...And where the aliasing is actually
    relevant, e.g. `&*p1' and `&*p2' don't count.)
    Perhaps reading from allocated and separately initialized object, but
    this is not a situation when aliasing rules are very important.

    --
    Stan Tobias
    mailx `echo LID | sed s/[[:upper:]]//g`
    S.Tobias, Oct 14, 2005
    #7
  8. In article <> "S.Tobias" <> writes:
    > Dik T. Winter <> wrote:
    > > In article <> "S.Tobias" <> writes:
    > > ...
    > > > >> struct s1 {int i;};
    > > > >> struct s2 {int i;};
    > > > >>
    > > > >> struct s1 *p1;
    > > > >> struct s2 *p2;
    > > > >>
    > > > >> A compiler is free to assume that *p1 and *p2 don't alias.

    > [snip]
    > > > However p1 and p2 _may_ point at the same object.

    > >
    > > In that case the compiler can not assume that *p1 and *p2 don't alias.

    >
    > I don't agree,


    Sorry, I missed that p1 and p2 have different types. Indeed, p1 and p2
    _may_ point at the same object, but the only way to let that happen is
    by either undefined or implementation defined behaviour. So you were
    right.
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
    Dik T. Winter, Oct 15, 2005
    #8
  9. Thad Smith Guest

    Christian Bau wrote:
    > In article <>,
    > wrote:


    >>--- myfile.c ---
    >>
    >>#include <stdio.h>
    >>#include <stdlib.h>
    >>
    >>typedef enum { RED, BLUE, GREEN } Color;
    >>
    >>struct Point { int x;
    >> int y;
    >> };
    >>
    >>struct Color_Point { int x;
    >> int y;
    >> Color color;
    >> };
    >>
    >>struct Color_Point2{ struct Point point;
    >> Color color;
    >> };
    >>
    >>int main(int argc, char* argv[])
    >>{
    >>
    >>struct Point* p;
    >>
    >>struct Color_Point* my_color_point = malloc(sizeof(struct
    >>Color_Point));
    >>my_color_point->x = 10;
    >>my_color_point->y = 20;
    >>my_color_point->color = GREEN;
    >>
    >>p = (struct Point*)my_color_point;

    >
    >
    > This is undefined behavior. There is no guarantee that my_color_point is
    > correctly aligned for a pointer of type (struct Point *).


    Doesn't the fact that the value of my_color_point was returned by malloc
    guarantee correct alignment?

    Thad
    Thad Smith, Oct 16, 2005
    #9
  10. In article <4351bc52$0$27308$>,
    Thad Smith <> wrote:

    > Christian Bau wrote:
    > > In article <>,
    > > wrote:

    >
    > >>--- myfile.c ---
    > >>
    > >>#include <stdio.h>
    > >>#include <stdlib.h>
    > >>
    > >>typedef enum { RED, BLUE, GREEN } Color;
    > >>
    > >>struct Point { int x;
    > >> int y;
    > >> };
    > >>
    > >>struct Color_Point { int x;
    > >> int y;
    > >> Color color;
    > >> };
    > >>
    > >>struct Color_Point2{ struct Point point;
    > >> Color color;
    > >> };
    > >>
    > >>int main(int argc, char* argv[])
    > >>{
    > >>
    > >>struct Point* p;
    > >>
    > >>struct Color_Point* my_color_point = malloc(sizeof(struct
    > >>Color_Point));
    > >>my_color_point->x = 10;
    > >>my_color_point->y = 20;
    > >>my_color_point->color = GREEN;
    > >>
    > >>p = (struct Point*)my_color_point;

    > >
    > >
    > > This is undefined behavior. There is no guarantee that my_color_point is
    > > correctly aligned for a pointer of type (struct Point *).

    >
    > Doesn't the fact that the value of my_color_point was returned by malloc
    > guarantee correct alignment?


    In this case, yes.

    If you use

    struct Color_Point* my_color_point = malloc(sizeof(struct
    Color_Point) * 2);
    ++my_color_point;
    my_color_point->x = 10;
    my_color_point->y = 20;
    my_color_point->color = GREEN;

    p = (struct Point*)my_color_point;

    you get undefined behavior.
    Christian Bau, Oct 16, 2005
    #10
  11. Old Wolf Guest

    Christian Bau wrote:
    > wrote:
    >
    >>
    >> struct Point { int x;
    >> int y;
    >> };
    >>
    >> struct Color_Point { int x;
    >> int y;
    >> Color color;
    >> };
    >> int main(int argc, char* argv[])
    >> {
    >>
    >> struct Point* p;
    >> p = (struct Point*)my_color_point;

    >
    > This is undefined behavior. There is no guarantee that
    > my_color_point is correctly aligned for a pointer of type
    > (struct Point *).


    I think all structs must have the same alignment requirements.
    However there is UB because one struct might have different
    padding to the other.
    Old Wolf, Oct 16, 2005
    #11
  12. Tim Rentsch Guest

    Jack Klein <> writes:

    > On 13 Oct 2005 07:39:48 -0700, wrote in
    > comp.lang.c:

    [snip]
    > > ***** Question (4) *****
    > >
    > > In the Standard, chapter 6.5.2.3, it is written:
    > >
    > > One special guarantee is made in order to simplify the use of unions:
    > > if a union contains
    > > several structures that share a common initial sequence (see below),
    > > and if the union
    > > object currently contains one of these structures, it is permitted to
    > > inspect the common
    > > initial part of any of them anywhere that a declaration of the complete
    > > type of the union is
    > > visible. Two structures share a common initial sequence if
    > > corresponding members have
    > > compatible types (and, for bit-fields, the same widths) for a sequence
    > > of one or more
    > > initial members.
    > >
    > > I find this statement completely obscure.
    > >
    > > Let's have:
    > >
    > > struct s1 {int i;};
    > > struct s2 {int i;};
    > >
    > > struct s1 *p1;
    > > struct s2 *p2;
    > >
    > > A compiler is free to assume that *p1 and *p2 don't alias.
    > >
    > > If we just put a union declaration like this before this code, then it
    > > acts like a flag to the compiler, indicating that pointers to "struct
    > > s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
    > > to the same location.
    > >
    > > union p1_p2_alias_flag { struct s1 st1;
    > > struct s2 st2;
    > > };
    > >
    > > There is no need to use "union p1_p2_alias_flag" for accessing data,
    > > and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
    > > anywhere else.
    > > I mean, it is possible to access data using directly p1 and p2.

    >
    > It seems unlikely that a compiler could find a way to prevent it from
    > working in general, even if the implementer tried, but such behavior
    > would not render the compiler non-conforming.
    >
    > On the other hand, since your structure only contains a single member,
    > and the first member always begins at the same address as the
    > structure itself, this particular usage can't fail.
    >
    > Still, the behavior is undefined. Which means the language standard
    > places no requirements on it at all.


    It isn't clear what behavior you think is undefined, since
    what is supposed to be executed is stated only approximately.
    However, let's consider a particular example:

    struct s1 {int i; int j;};
    struct s2 {int x; int y;};

    union p1_p2_alias_flag {
    struct s1 st1;
    struct s2 st2;
    };

    int
    affected_function( struct s1 *p1, struct s2 *p2 ){
    p1->j = 3;
    p2->y = 4;
    return p1->j;
    }


    There is no undefined behavior in 'affected_function'.
    Moreover, there are legal calls to the function that must
    return '4' as a value.

    Of course, it is possible to choose argument values (such as
    NULL) for calls to the function that result in undefined
    behavior; but the function must work for the legal cases
    when the two pointers point to the same address. And I
    think that's what the OP was asking about.
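
    A more conservative variant of the same example, as a sketch: every
    access names the union object itself, so it relies only on the common
    initial sequence guarantee quoted above. The function name
    `through_union` is made up here. Some optimizing compilers have been
    reported to apply type-based alias analysis to the two-pointer form, so
    this variant may be the safer one in practice.

    ```c
    struct s1 { int i; int j; };
    struct s2 { int x; int y; };

    union p1_p2_alias_flag {
        struct s1 st1;
        struct s2 st2;
    };

    /* Same stores as affected_function, but made through the union. */
    int through_union(union p1_p2_alias_flag *u)
    {
        u->st1.j = 3;
        u->st2.y = 4;     /* second member of the common initial sequence */
        return u->st1.j;  /* inspects that same member via the other struct */
    }
    ```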
    Tim Rentsch, Oct 18, 2005
    #12
  13. Guest

    Thank you very much, all of you, for having taken the time to answer my
    quite confused questions.
    I understand now that my interpretation of the standard was totally
    wrong.

    For those who will have problems with these aliasing rules and will
    read this thread, this is my final interpretation of the standard.
    I hope this time, I have made no mistake ( if I have, please tell me ).

    The Standard says (
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf chapter 6.5
    ):

    An object shall have its stored value accessed only by an lvalue
    expression that has one of the following types:
    - a type compatible with the effective type of the object,
    - a qualified version of a type compatible with the effective type of
    the object,
    - a type that is the signed or unsigned type corresponding to the
    effective type of the object,
    - a type that is the signed or unsigned type corresponding to a
    qualified version of the effective type of the object,
    - an aggregate or union type that includes one of the aforementioned
    types among its members
    (including, recursively, a member of a subaggregate or contained
    union), or
    - a character type.

    The wording of these rules is not very clear, but this is my attempt at
    an explanation.

    Vocabulary:
    an object is a memory location.
    an aggregate is a struct or an array.
    a character type can be "char", "signed char", or "unsigned char".


    Let's have this code:

    struct s1 {int i; double d;};
    struct s2 {int i; double d;};
    // struct s1 and struct s2 are different types, because their tag
    names s1 and s2 are different.

    int *pi;
    struct s1 *p1;
    struct s1 *p1_a;
    struct s2 *p2;


    1) The "objects" which are mentioned in the Standard are really just
    memory locations.
    So far, there is NO NOTION OF POINTERS at all.
    Pointers are just a means of accessing the objects, no more.
    You just take a sheet of paper ( which represents your computer's
    memory ), and draw rectangles symbolizing all the objects you work with
    in your code.
    Let's suppose that in our computer memory, we have one location
    containing an int, three instances of struct s1 and two of struct s2.


    You should obtain something like this ( rectangles are represented
    here by pairs of brackets [...] ) :

    [int]
    [ struct s1 [int] [double] ]
    [ struct s1 [int] [double] ]
    [ struct s1 [int] [double] ]
    [ struct s2 [int] [double] ]
    [ struct s2 [int] [double] ]

    So far, we have a visual representation of all the objects we work
    with.
    We have 16 objects on our paper:
    - one "int" object
    - three "struct s1" objects
    - each struct s1 object contains an "int" object
    - each struct s1 object contains a "double" object
    - two "struct s2" objects
    - each struct s2 object contains an "int" object
    - each struct s2 object contains a "double" object

    For accessing these objects, we use these pointers in our code:

    pi, p1, p1_a, p2

    Our work will be now to find for each object (=location) which
    pointers may access it.


    2) I take the visual representation above, and I just write (obj1)
    (obj2) ... to label the objects so that I can explain more easily.

    [int (obj1)]
    [ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
    [ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
    [ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
    [ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
    [ struct s2 (obj14) [int (obj15)] [double (obj16)] ]

    Now, let's take each location one after another and see which
    pointers may also access them.

    The object (=memory location) obj1 is of type "int".
    It can be accessed (= read or modified) by *pi, which is a lvalue of
    type "int".
    It can also be accessed by p1->i, which is a shortcut for (*p1).i,
    and *p1 is of type "struct s1 containing "int" as a member".
    It can similarly be accessed by p1_a->i and p2->i.

    The object obj2 is of type "struct s1".
    It can be accessed by *p1_a which is also of type "struct s1".
    It cannot be accessed by *pi which is of type "int".
    It cannot be accessed by *p2 which is of type "struct s2".

    The object obj3 is of type "int".
    It can be accessed by *pi which is of type "int".
    It can be accessed by p1->i, which is a shortcut for (*p1).i, and
    *p1 is of type "struct s1 containing "int" as a member".
    The same way, it can be accessed by p1_a->i.
    But it cannot be accessed by p2->i, which is a shortcut for (*p2).i,
    because *p2 is of type "struct s2 containing "int" as a member".
    It is not explicitly mentioned in the standard, but if access is
    done through a struct, its type must match the type of the container of
    the object we want to access.

    We can do similar analysis for all the remaining locations obj4,
    obj5 ...
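
    The obj3 case can be sketched in code. The function name
    `write_member_then_int` is made up for illustration. Since `*pi` and
    `p1->i` are both lvalues of type int, the compiler has to allow for
    them designating the same object.

    ```c
    struct s1 { int i; double d; };

    /* Stores through the member, then through the plain int pointer. */
    int write_member_then_int(struct s1 *p1, int *pi)
    {
        p1->i = 1;
        *pi = 2;        /* may modify p1->i: both lvalues have type int */
        return p1->i;   /* must be re-read after the store through pi */
    }
    ```

    With `pi == &p1->i` the function returns 2; with an unrelated int it
    returns 1.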

    Just one word about my misunderstanding of the Standard as I first read
    it.
    At first, I tried to find directly if two pointers may alias, but that
    was the wrong approach and leads to a dead end.
    I understand now that it is easier to think first about MEMORY
    LOCATIONS (=objects), AND ONLY THEN think about which pointers may
    access this location, by seeing if they comply with the rules of the
    Standard, as I just did above.
    This gives for each location a set of pointers that may access it, and
    the compiler considers each of these sets as pointers that may alias
    and access the same object.
    This way, the Standard becomes more readable and logical.

    In practice, the problem is often not to do this thorough analysis for
    each object in memory.
    It is more of the kind "I work with this object, can I access it with
    this pointer ? and can I also access it with this other pointer ?".
    "In particular, if I write data in this object using this pointer, can
    this other pointer read these data ?"


    *** about type-punning ***

    double d = 1.234;
    int* i = (int*)&d;

    printf("%d\n", *i); // WRONG

    1.234 is stored in an object of type "double".
    We try to access it through *i, which is of type "int".
    The result is undefined.

    If you want to inspect the content of d ( assuming that a double is 4
    bytes long and being aware of possible trap representations ), you
    can do this:
    unsigned char* c = (unsigned char*)&d;
    and you can access the data with c[0], c[1], c[2] and c[3].
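
    A sketch of the same idea that uses sizeof(double) instead of assuming
    a particular size. The function names `double_bytes` and
    `double_from_bytes` are made up for illustration. The second function
    shows memcpy as a way to go back from bytes to a double without
    dereferencing a pointer of the wrong type.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Copies the object representation of d into out[0..sizeof(double)-1]. */
    void double_bytes(double d, unsigned char *out)
    {
        const unsigned char *c = (const unsigned char *)&d;
        size_t n;

        for (n = 0; n < sizeof d; n++)
            out[n] = c[n];
    }

    /* Rebuilds a double from its bytes with memcpy. */
    double double_from_bytes(const unsigned char *in)
    {
        double d;
        memcpy(&d, in, sizeof d);
        return d;
    }
    ```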


    *** about pointer to char ***

    Besides, don't forget that as the Standard rule says, a pointer to char
    can access any object of any type !
    When the location referenced by a pointer to char is updated, the
    compiler must assume that any data stored in any type may have been
    modified.

    But don't think that this kind of code allows you to bypass the
    aliasing rules:

    struct A *a;
    struct B *b;

    b = (struct B*)(char*)a;

    This won't make "*b" able to access data in "struct A", because "*b" is
    of type "struct B".
    It is the type of the dereferenced pointer that matters. The
    intermediate casting to "char*" is thus totally useless and won't give
    "b" more access possibilities.


    *** about inheritance ***

    struct Point { int x;
    int y;
    };


    struct Color_Point { int x;
    int y;
    Color color;
    };


    struct Color_Point2{ struct Point point;
    Color color;
    };

    struct Point* p;
    struct Color_Point* my_color_point;
    struct Color_Point2* my_color_point2;

    my_color_point = malloc(sizeof(struct Color_Point));
    my_color_point2 = malloc(sizeof(struct Color_Point2));

    p = (struct Point*)my_color_point; // WRONG
    // *p, which is of type "struct Point", cannot access data stored at
    location *my_color_point, which is an object of type "struct
    Color_Point".

    p = &my_color_point2->point; // GOOD
    // *p, which is of type "struct Point", can access data stored at
    location (*my_color_point2).point, which is also of type "struct
    Point".

    p = (struct Point*)my_color_point2; // GOOD
    // *p, which is of type "struct Point", can access data stored at
    location (*my_color_point2).point, which is also of type "struct
    Point".
    // We see that in fact, this is exactly the same case as the
    previous one !
    // C gives the guarantee that we can cast the pointer to a struct to
    the type of its first member, it gives a pointer to this first member
    object.
    // Just notice that this guarantee is about alignment, and that the
    fact that we can access data stored in an object is granted to us by
    the aliasing rules, exactly as in the previous example.


    *** final word ***

    When working with pointers, there seems to be no need to cast pointers.
    ( I don't speak here of casting objects, like casting a "double" to an
    "int" for instance, which is of course allowed.
    It is casting pointers, like "double*" to "int*" or "struct s1*" to
    "struct s2*" which is dangerous. )
    In fact, every time a pointer is cast to point to a different type, the
    alias rules interfere and lead to undefined behaviour.
    So, to avoid any aliasing problem, the best way seems never to cast
    pointers, with these two exceptions:

    a) cast a pointer to char*, so that it can access the byte
    representation of the object ( cast to unsigned char* is best ), as
    allowed by the aliasing rules.

    b) cast of a pointer to struct to a pointer to its first member type,
    like in the last example "p = (struct Point*)my_color_point2;".
    ( but this one is not really necessary, as we can just pass the
    address of the first member as in the last example "p =
    &my_color_point2->point;", so that a cast is avoided ).


    As for pointers to void, such as those returned by malloc, there is no
    need to cast them, as pointers to void may be assigned to and from
    pointers to any type.
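
    For instance, a minimal sketch ( the helper name `new_point` is made
    up ): the void pointer returned by malloc is assigned without a cast.

    ```c
    #include <stdlib.h>

    struct Point { int x; int y; };

    /* Allocates and initialises a Point. The void * from malloc converts
       implicitly, so no cast is written. Returns NULL on failure. */
    struct Point *new_point(int x, int y)
    {
        struct Point *p = malloc(sizeof *p);
        if (p != NULL) {
            p->x = x;
            p->y = y;
        }
        return p;
    }
    ```

    The caller releases the object with free(p) when done.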


    Any suggestion about something I could have missed or misunderstood ?


    Best regards
    , Oct 19, 2005
    #13
  14. Netocrat Guest

    On Wed, 19 Oct 2005 00:17:33 -0700, nicolas.riesch wrote:

    A few corrections but generally what you wrote was accurate.

    > struct s1 {int i; double d;};
    > struct s2 {int i; double d;};
    > // struct s1 and struct s2 are different types, because their tag
    > names s1 and s2 are different.


    They are of the same "effective type".

    ....
    > struct s2 *p2;

    ....
    > [int (obj1)]
    > [ struct s1 (obj2) [int (obj3)] [double (obj4)] ]

    ....
    > The object obj2 is of type "struct s1".

    ....
    > It cannot be accessed by *p2 which is of type "struct s2".


    Actually it can be, since s2 has "a type compatible with the effective
    type of" s1.

    > The object obj3 is of type "int".

    ....
    > But it cannot be accessed by p2->i, which is a shortcut for (*p2).i,
    > because *p2 is of type "struct s2 containing "int" as a member"."


    Same applies here.

    [...]
    > *** about inheritance ***
    >
    > struct Point { int x;
    > int y;
    > };
    >
    >
    > struct Color_Point { int x;
    > int y;
    > Color color;
    > };
    >
    >
    > struct Color_Point2{ struct Point point;
    > Color color;
    > };
    >
    > struct Point* p;
    > struct Color_Point* my_color_point;
    > struct Color_Point2* my_color_point2;
    >
    > my_color_point = malloc(sizeof(struct Color_Point)); my_color_point2 =
    > malloc(sizeof(struct Color_Point2));


    Your analysis being based on malloc'd memory is flawed - malloc'd memory
    is properly aligned for any object and until it is written to, it has no
    effective type. So let's assume instead that you'd caused the pointers to
    reference static objects of the type that they point to or that your code
    has written such an object into the malloc'd memory to establish its
    effective type.

    > p = (struct Point*)my_color_point; // WRONG // *p, which is of type
    > "struct Point", cannot access data stored at location *my_color_point,
    > which is an object of type "struct Color_Point".


    But this code is not accessing data, it's setting a pointer. Since struct
    Color_Point's initial elements are those of struct Point in the same
    order, it can't have stricter alignment requirements. There's nothing
    wrong with the code.

    With the above assumption that my_color_point points to an object with the
    effective type struct Color_Point, it is "wrong" to try to access p->y,
    but not to access p->x. This is because the first member of a structure
    is always at the same (initial, unpadded) location, but the second member
    may be preceded by an arbitrary amount of padding. In practice it's
    unlikely that a compiler would precede y with different amounts of
    padding in each structure type, but in theory it's possible.
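
    Whether the offsets agree on a given implementation can be checked with
    offsetof. This is only a sketch; the function name `y_offsets_match` is
    made up, and a matching result on one compiler says nothing about
    another.

    ```c
    #include <stddef.h>

    typedef enum { RED, BLUE, GREEN } Color;

    struct Point { int x; int y; };
    struct Color_Point { int x; int y; Color color; };

    /* Returns 1 if member y sits at the same offset in both struct types
       on this implementation, 0 otherwise. */
    int y_offsets_match(void)
    {
        return offsetof(struct Point, y) == offsetof(struct Color_Point, y);
    }
    ```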

    [...]
    > the best way seems never to cast pointers, with these two exceptions:

    [to char* so as to access an object's bytes; to a pointer type compatible
    with the first member(s) of a struct]

    That's good practice (there are occasional other exceptions).

    [...]
    --
    http://members.dodo.com.au/~netocrat
    Netocrat, Oct 19, 2005
    #14
  15. S.Tobias Guest

    wrote:
    > An object shall have its stored value accessed only by an lvalue
    > expression that has one of the following types:
    > - a type compatible with the effective type of the object,
    > - a qualified version of a type compatible with the effective type of
    > the object,
    > - a type that is the signed or unsigned type corresponding to the
    > effective type of the object,
    > - a type that is the signed or unsigned type corresponding to a
    > qualified version of the effective type of the object,
    > - an aggregate or union type that includes one of the aforementioned
    > types among its members
    > (including, recursively, a member of a subaggregate or contained
    > union), or

    This one means that an object of type int may be accessed through
    a (bigger) struct type that contains an int member.
    struct s { double d; int i; };
    void f(int *pi, struct s *ps)
    {
    *pi;
    *ps = /*...*/;
    *pi; /* must be re-read from memory */
    }
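
    A compilable variant of the function above, as a sketch. The name
    `read_after_struct_store` and the extra `value` parameter are made up
    so that the effect can be observed.

    ```c
    struct s { double d; int i; };

    /* Returns how much *pi changed across the whole-struct store. */
    int read_after_struct_store(int *pi, struct s *ps, struct s value)
    {
        int before = *pi;
        *ps = value;            /* may modify *pi if pi points into *ps */
        return *pi - before;    /* *pi must be re-read from memory here */
    }
    ```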
    > - a character type.


    Yes, but remember that this is not the whole story. They are type-based
    aliasing rules, and there are other (expression-based) rules, too.

    For example:
    struct sx { int x; } *px;
    struct sy { int y; } *py;
    void *pv = malloc(...);
    px = pv; py = pv;
    *px = ...;
    py->y; //BAD, object does not have `y' member

    Example 2:
    int ai[2][2];
    ai[0][2]; //BAD, this is not the same as ai[1][1]

    ....
    > struct s1 {int i; double d;};
    > struct s2 {int i; double d;};
    > // struct s1 and struct s2 are different types, because their tag
    > names s1 and s2 are different.
    >
    > int *pi;
    > struct s1 *p1;
    > struct s1 *p1_a;
    > struct s2 *p2;



    > For accessing these objects, we use these pointers in our code:
    > pi, p1, p1_a, p2
    >
    > Our work will be now to find for each object (=location) which
    > pointers may access it.
    >

    You're hooked on bad terminology. What matters is the type of the
    lvalue. An lvalue is like a window through which you access an object,
    a pointer is like an arrow. You don't access objects with pointers.
    Pointers merely may be part of expressions that eventually may
    be lvalues. Objects are not locations, but are memory ranges.


    > [int (obj1)]
    > [ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
    > [ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
    > [ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
    > [ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
    > [ struct s2 (obj14) [int (obj15)] [double (obj16)] ]
    >
    > Now, let's take each location one after another and see which
    > pointers may also access them.
    >
    > The object (=memory location) obj1 is of type "int".
    > It can be accessed (= read or modified) by *pi, which is a lvalue of
    > type "int".

    Yes.
    > It can also be accessed by p1->i, which is a shortcut for (*p1).i,
    > and *p1 is of type "struct s1 containing "int" as a member".

    No, `obj1' doesn't have member `i' (in fact, it's not a struct at all).
    > It can similarly be accessed by p1_a->i and p2->i.

    Idem.


    > The object obj2 is of type "struct s1".
    > It can be accessed by *p1_a which is also of type "struct s1".

    Right.
    > It cannot be accessed by *pi which is of type "int".

    It can. One of its members (obj3) has type `int', so that one can
    be accessed, which means that the containing object can be accessed
    as well (when you access a member, you access the whole object, too).

    > It cannot be accessed by *p2 which is of type "struct s2".

    Indeed, it can't.

    >
    > The object obj3 is of type "int".
    > It can be accessed by *pi which is of type "int".
    > It can be accessed by p1->i, which is a shortcut for (*p1).i, and
    > *p1 is of type "struct s1 containing "int" as a member".
    > The same way, it can be accessed by p1_a->i.

    Yes. Moreover, it can be accessed with the expression `*p1' (IOW, that
    expression may read the value of the subobject, or change it), provided
    that `p1' points at the right location (obj2).

    > But it cannot be accessed by p2->i, which is a shortcut for (*p2).i,
    > because *p2 is of type "struct s2 containing "int" as a member"."

    [assuming that `p2' may point to obj2]
    No, this is because `struct s1' (which is the type of obj2) does not have
    `s2::i' member (sorry for C++ notation; struct members have their own
    namespace for each struct type).
    > It is not explicitly mentioned in the standard, but if access is
    > done through a struct, its type must match the type of the container of
    > the object we want to access.

    It is mentioned at the member access operators. If it weren't, nobody
    would argue this.


    > Just one word about my misunderstanding of the Standard as I first read
    > it.
    > At first, I tried to find directly if two pointers may alias, but that
    > was the wrong way to go about it and leads to a dead end.

    Again: pointers don't alias, lvalues may...
    > I understand now that it is easier to think first about MEMORY
    > LOCATIONS (=objects), AND ONLY THEN think about which pointers may
    > access this location, by seeing if they comply with the rules of the
    > Standard, as I just did hereabove.

    Pointers may or may not point to locations, which is covered by
    different rules.
    > This gives for each location a set of pointers that may access it, and
    > the compiler considers each of these sets as pointers that may alias
    > and access the same object.
    > This way, the Standard becomes more readable and logical.
    >
    > In practice, the problem is often not to do this thorough analysis for
    > each object in memory.
    > It is more of the kind "I work with this object, can I access it with
    > this pointer ? and can I also access it with this other pointer ?".
    > "In particular, if I write data in this object using this pointer, can
    > this other pointer read these data ?"
    >

    Again: what matters is the EXPRESSION, not pointers that may be
    one of its components.

    > *** about type-punning ***
    >
    > double d = 1.234;
    > int* i = &d;

    The last line is suspicious (`double *' does not convert to `int *'
    without a cast).
    >
    > printf("%d\n", *i); // WRONG

    Right. Technically, it's UB.

    > unsigned char* c = (unsigned char*)&d;
    > and you can access the data with c[0], c[1], c[2] and c[3].

    Right
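
    A minimal sketch of the byte-wise access, assuming nothing about the size
    of `double' (the quoted text indexes c[0] to c[3], but `sizeof' is the
    safer bound). The function name `copy_via_bytes' is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the object representation of a double byte by byte through
   unsigned char lvalues, then returns the reassembled value. */
double copy_via_bytes(double d)
{
    double out;
    const unsigned char *src = (const unsigned char *)&d;
    unsigned char *dst = (unsigned char *)&out;

    for (size_t k = 0; k < sizeof d; k++)
        dst[k] = src[k];   /* every access uses a character-type lvalue */

    /* Both objects now hold identical bytes. */
    assert(memcmp(&out, &d, sizeof d) == 0);
    return out;
}
```

    Reading `out' as a `double' afterwards is fine, because its bytes form
    the representation of a valid `double' copied from `d'.
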


    > *** about pointer to char ***
    >
    > Besides, don't forget that as the Standard rule says, a pointer to char
    > can access any object of any type !

    Pointer to character type may *point to* any object (of any type).
    So can pointer to void.
    > When the location referenced by a pointer to char is updated, the
    > compiler must assume that any data stored in any type may have been
    > modified.

    No, when an object is modified through an *lvalue of character type*,
    then compiler must assume anything might have been modified (unless
    it can prove otherwise).
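
    A small sketch of what that means in practice. The function names are
    invented, and the driver assumes an `int' without padding bits, so that
    clearing one byte of -1 changes its value.

```c
#include <assert.h>

/* Writes *pi, then writes one byte through a character-type lvalue, then
   reads *pi again. Because pc[0] may designate a byte of *pi, the compiler
   has to reload *pi unless it can prove the two do not overlap. */
int write_byte_then_reload(int *pi, unsigned char *pc)
{
    *pi = -1;
    pc[0] = 0;     /* lvalue of type unsigned char: may modify any object */
    return *pi;
}

/* Hypothetical driver where pc really does point into the int. */
int demo_byte_alias(void)
{
    int n = 0;
    int r = write_byte_then_reload(&n, (unsigned char *)&n);

    assert(r == n);   /* the returned value reflects the byte store */
    return r;
}
```

    If the second parameter were, say, `double *' instead, the compiler
    would be allowed to assume that the store cannot change `*pi'.
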

    > But don't think that this kind of code allows you to bypass the
    > aliasing rules:
    >
    > struct A *a;
    > struct B *b;
    >
    > b = (struct B*)(char*)a;

    The struct cast is suspicious.
    >
    > This won't make "*b" able to access data in "struct A", because "*b" is
    > of type "struct B".
    > It is the type of the dereferenced pointer that matters.

    More-or-less, yes.


    > *** final word ***
    >
    > When working with pointers, there seems to be no need to cast pointers.
    > ( I don't speak here of casting objects, like casting a "double" to an
    > "int" for instance, which is of course allowed.
    > It is casting pointers, like "double*" to "int*" or "struct s1*" to
    > "struct s2*" which is dangerous. )

    No, casting is sometimes necessary (where there's no implicit conversion),
    and is always safe where conversion is well defined.

    > In fact, every time a pointer is cast to point to a different type, the
    > alias rules interfere and lead to undefined behaviour.


    No, aliasing rules have to do with lvalues. Period. End of story.

    Pointers (in the way you talk about them) are subject to conversion rules.
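
    As an illustration of a cast that is both necessary and well defined,
    here is a sketch that converts a pointer to a struct into a pointer to
    its initial member (C99 6.7.2.1 guarantees that the converted pointer
    points to that member). `struct s1' is taken from the thread; the
    function names are hypothetical.

```c
#include <assert.h>

struct s1 { int i; double d; };

/* The cast only converts a pointer value. The access happens through
   the lvalue *pi, whose type (int) matches the member it designates. */
int set_first_member(struct s1 *p, int value)
{
    int *pi = (int *)p;   /* points at p->i, the initial member */

    *pi = value;          /* int lvalue accessing an int object */
    return p->i;
}

int demo_first_member(void)
{
    struct s1 obj = { 0, 2.5 };
    int r = set_first_member(&obj, 7);

    assert(obj.d == 2.5); /* the other member is untouched */
    return r;
}
```

    The conversion rules decide whether `(int *)p' is meaningful; the
    aliasing rules then decide whether the lvalue `*pi' may access the
    object it designates.
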


    --
    Stan Tobias
    mailx `echo LID | sed s/[[:upper:]]//g`
    S.Tobias, Oct 19, 2005
    #15
  16. Netocrat Guest

    On Wed, 19 Oct 2005 09:05:33 +0000, Netocrat wrote:
    > On Wed, 19 Oct 2005 00:17:33 -0700, nicolas.riesch wrote:
    >
    > A few corrections but generally what you wrote was accurate.
    >
    >> struct s1 {int i; double d;};
    >> struct s2 {int i; double d;};
    >> // struct s1 and struct s2 are different types, because their tag
    >> names s1 and s2 are different.

    >
    > They are of the same "effective type".


    OK my reading of the standard was incomplete - they're not of the same
    effective type after all. Your original statement and the follow-ons that
    I mistakenly corrected stand.

    [...]
    >> p = (struct Point*)my_color_point; // WRONG // *p, which is of type
    >> "struct Point", cannot access data stored at location *my_color_point,
    >> which is an object of type "struct Color_Point".

    >
    > But this code is not accessing data, it's setting a pointer. Since struct
    > Color_Point's initial elements are those of struct Point in the same
    > order, it can't have stricter alignment requirements. There's nothing
    > wrong with the code.
    >
    > With the above assumption that my_color_point points to an object with the
    > effective type struct Color_Point, it is "wrong" to try to access p->y,
    > but not to access p->x.


    ....but in the context of aliasing, yes, it's not guaranteed that you will
    get the expected value when reading p->x.

    > This is because the first member of a structure
    > is always at the same (initial, unpadded) location, but the second member
    > may be preceded by an arbitrary amount of padding. In practice it's
    > unlikely that a compiler that would precede y with different amounts of
    > padding in each structure type, but in theory it's possible.


    --
    http://members.dodo.com.au/~netocrat
    Netocrat, Oct 19, 2005
    #16
  17. Guest

    S.Tobias wrote:

    >> [int (obj1)]
    >> [ struct s1 (obj2) [int (obj3)] [double (obj4)] ]
    >> [ struct s1 (obj5) [int (obj6)] [double (obj7)] ]
    >> [ struct s1 (obj8) [int (obj9)] [double (obj10)] ]
    >> [ struct s2 (obj11) [int (obj12)] [double (obj13)] ]
    >> [ struct s2 (obj14) [int (obj15)] [double (obj16)] ]


    >> The object (=memory location) obj1 is of type "int".
    >> It can be accessed (= read or modified) by *pi, which is a lvalue of
    >> type "int".
    >> It can also be accessed by p1->i, which is a shortcut for (*p1).i,
    >> and *p1 is of type "struct s1 containing "int" as a member".


    >No, `obj1' doesn't have member `i' (in fact, it's not a struct at all).


    I was having this example in mind, in fact:

    struct s1 mys1;

    struct s1* p1 = &mys1;
    int* pi = &mys1.i;

    (*p1).i = 123;

    printf("%d\n", *pi); // we read here the value of *pi, which is of type "int",
                         // which has been written in the previous line
                         // by using *p1 which is of type "struct s1"



    >Again: pointers don't alias, lvalues may...


    Absolutely, I must never forget that.


    >Pointer to character type may *point to* any object (of any type).
    >So can pointer to void.


    Yes, pointer to void can *point to* any object, but it cannot be
    dereferenced, so it cannot *access* it.


    >> When the location referenced by a pointer to char is updated, the
    >> compiler must assume that any data stored in any type may have been
    >> modified.


    >No, when an object is modified through an *lvalue of character type*,
    >then compiler must assume anything might have been modified (unless
    >it can prove otherwise).


    Expressing it this way is better, yes.


    And thank you very much for your comment.
    I still must read it carefully until I am sure to understand
    everything.
    , Oct 20, 2005
    #17
  18. Netocrat Guest

    On Wed, 19 Oct 2005 13:24:27 +0000, S.Tobias wrote:
    > wrote:

    [...]
    >> It is not explicitly mentioned in the standard, but if access is
    >> done through a struct, its type must match the type of the container of
    >> the object we want to access.

    > It is mentioned at the member access operators. If it weren't, nobody
    > would argue this.


    Is this an area where the draft and final version differ? I see no
    mention of it in N869's "6.5.2.3 Structure and union members" which is the
    section to which I presume you're referring.

    [...]
    --
    http://members.dodo.com.au/~netocrat
    Netocrat, Oct 21, 2005
    #18
  19. Guest

    On Wed, 19 Oct 2005 13:24:27 +0000, S.Tobias wrote:
    > wrote:

    [...]
    >> It is not explicitly mentioned in the standard, but if access is
    >> done through a struct, its type must match the type of the container of
    >> the object we want to access.

    > It is mentioned at the member access operators. If it weren't, nobody
    > would argue this.


    I see my explanation was unclear.
    Here is the reason why obj3 cannot be accessed by p2->i :

    p2->i is a shortcut for (*p2).i, and *p2 is of type "struct s2
    containing "int" as a member"."

    So far, one could think that *p2 could access obj3, as no rule seems to
    forbid it.
    But the Standard doesn't say that an lvalue complying with these rules
    CAN also access the object (it only MAY, and sometimes it even CANNOT,
    for other reasons).
    Here, the answer is that obj3 is included in a "struct s1", which is in
    a different location from any "struct s2" object because they are
    different types.
    So, a pointer to any "struct s2" OR TO ANY OF ITS MEMBERS cannot access
    any location of a "struct s1".
    , Oct 24, 2005
    #19
  20. S.Tobias Guest

    Netocrat <> wrote:
    > On Wed, 19 Oct 2005 13:24:27 +0000, S.Tobias wrote:
    >> wrote:

    > [...]
    >>> It is not explicitly mentioned in the standard, but if access is
    >>> done through a struct, its type must match the type of the container of
    >>> the object we want to access.

    >> It is mentioned at the member access operators. If it weren't, nobody
    >> would argue this.

    >
    > Is this an area where the draft and final version differ?

    In the relevant parts - no. (In p.5 the first sentence has been dropped,
    and the rest differs by one letter.)
    >I see no
    > mention of it in N869's "6.5.2.3 Structure and union members" which is the
    > section to which I presume you're referring.
    >

    My bad, sorry. It's not explicitly mentioned, but can be derived.
    Pp. 3 and 4 refer to a "member of a structure or union object"; it means
    the operator (and behaviour) is defined iff the _object_ has the specified
    member. (However, I decline to explain what exactly it should be; I think
    the Std means the effective type of the object; it's one of the questions
    on my list to c.s.c.)

    Anyway, if the Std text is not enough, then at least Example 3 shows
    the intention; if it were allowed (in the example) to access `t1::m'
    with `p2->m' (or vice versa), then the second part of the example would
    be moot, and so would the "special guarantee" of p. 5.
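
    For contrast, a sketch of the case the "special guarantee" does cover:
    two structs sharing a common initial sequence, inspected through a union
    whose declaration is visible at the point of access. `t1', `t2' and `m'
    follow the names used above; `union u12' and the function names are
    made up for the example.

```c
#include <assert.h>

struct t1 { int m; };
struct t2 { int m; };

/* The completed union type is visible wherever the members are used. */
union u12 { struct t1 s1; struct t2 s2; };

/* Every access goes through the union, so s1.m and s2.m are known to
   overlap and the compiler must treat them as the same storage. */
int negate_if_negative(union u12 *pu)
{
    if (pu->s1.m < 0)
        pu->s2.m = -pu->s2.m;
    return pu->s1.m;
}

int demo_union(void)
{
    union u12 u;
    int r;

    u.s1.m = -7;
    r = negate_if_negative(&u);
    assert(u.s2.m == r);
    return r;
}
```

    Passing `&u.s1' and `&u.s2' separately to a function that only sees
    `struct t1 *' and `struct t2 *' is the variant the Standard's example
    marks as not valid.
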

    --
    Stan Tobias
    mailx `echo LID | sed s/[[:upper:]]//g`
    S.Tobias, Oct 24, 2005
    #20