Re: Compatible structs

Discussion in 'C Programming' started by Eric Sosman, Nov 2, 2010.

  1. Eric Sosman

    Eric Sosman Guest

    On 11/2/2010 7:12 AM, Vrtt wrote:
    > Hi all,
    >
    > I couldn't find this question in the FAQ, although it looks like a
    > frequently asked one. Apologies if it's been answered here before.
    >
    > I've seen this idiom a lot in libraries:-
    >
    > struct base {
    > int x;
    > char y;
    > };
    >
    > struct extended {
    > int x;
    > char y;
    > double z;
    > };
    >
    > void base_func(struct base *b);
    >
    > struct extended e;
    > base_func((struct base*)&e);
    >
    > Is this actually guaranteed to work, or can struct alignment (or something
    > else) mess it up?


    It will almost certainly work as expected, but it is not
    actually guaranteed to work. If there were a union containing
    both struct types, and if `e' resided in an instance of that union:

    union { struct base b; struct extended e; } u;
    base_func((struct base*)&u.e);
    /* or even ... */ base_func(&u.b);

    ... would be guaranteed to work. Absent the enclosing union,
    C is in theory free to arrange the structs' first two elements
    differently -- but since C supports separate compilation, the
    compiler must usually assume that such a union might exist in
    some other module, and since all `struct extended' must look
    the same throughout the entire program, it most likely arranges
    things as you'd expect.
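
    A minimal sketch of how the layout assumption could be checked on a
    given implementation, using `offsetof' from <stddef.h>. The helper
    names `prefix_layout_matches' and `check_prefix_layout' are
    hypothetical and not from the thread. A passing check only reports
    what this particular compiler does; it says nothing about what the
    Standard guarantees.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

struct base     { int x; char y; };
struct extended { int x; char y; double z; };

/* Hypothetical helper: returns 1 if the first two members of both
   structs sit at the same offsets on this implementation, else 0. */
int prefix_layout_matches(void)
{
    return offsetof(struct base, x) == offsetof(struct extended, x)
        && offsetof(struct base, y) == offsetof(struct extended, y);
}

/* Hypothetical startup check: aborts if the assumption does not hold. */
void check_prefix_layout(void)
{
    assert(prefix_layout_matches());
}
```

    Matching offsets address only the layout question; the aliasing
    issues discussed later in the thread remain even when this check
    passes.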

    However, the layout of the structs is not the only issue.
    I've encountered actual trouble with actual real compilers in
    handling constructs closely related to this one, and my advice
    would be to shun the practice whenever possible. In the case
    at hand it's pretty easy to avoid the problem altogether:

    struct extended {
    struct base b;
    double z;
    };
    ...
    struct extended e;
    base_func (&e.b);

    ... is pure as the driven snow, 100% safe, and highly recommended.
    Yes, you now must write `e.b.x' instead of `e.x', but that's not
    usually a serious hardship in actual use.
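
    A self-contained sketch of the embedded-base approach. The body of
    `base_func' and the wrapper `demo_embedded' are assumptions made for
    illustration; the thread shows only the prototype.

```c
#include <assert.h>

struct base { int x; char y; };

struct extended {
    struct base b;   /* base part embedded as the first member */
    double z;
};

/* Hypothetical body: the thread gives only the prototype of base_func. */
void base_func(struct base *b)
{
    b->x += 1;
}

/* Passes the embedded base with no cast and returns the updated x. */
int demo_embedded(void)
{
    struct extended e = { { 41, 'a' }, 2.5 };

    base_func(&e.b);

    /* A pointer to a struct, suitably converted, points to its first
       member, so both addresses compare equal. */
    assert((void *)&e == (void *)&e.b);

    return e.b.x;
}
```

    Because `base_func' receives a genuine `struct base' object, no
    layout or aliasing assumption is involved.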

    --
    Eric Sosman
     
    Eric Sosman, Nov 2, 2010
    #1

  2. Eric Sosman

    BartC Guest

    "Eric Sosman" <> wrote in message
    news:iaouhn$cqi$-september.org...
    > On 11/2/2010 7:12 AM, Vrtt wrote:
    >> Hi all,
    >>
    >> I couldn't find this question in the FAQ, although it looks like a
    >> frequently asked one. Apologies if it's been answered here before.
    >>
    >> I've seen this idiom a lot in libraries:-
    >>
    >> struct base {
    >> int x;
    >> char y;
    >> };
    >>
    >> struct extended {
    >> int x;
    >> char y;
    >> double z;
    >> };
    >>
    >> void base_func(struct base *b);
    >>
    >> struct extended e;
    >> base_func((struct base*)&e);
    >>
    >> Is this actually guaranteed to work, or can struct alignment (or
    >> something
    >> else) mess it up?

    >
    > It will almost certainly work as expected, but it is not
    > actually guaranteed to work. If there were a union containing
    > both struct types it and if `e' resided in a union instance:
    >
    > union { struct base b; struct extended e; } u;
    > base_func((struct base*)&u.e);
    > /* or even ... */ base_func(&u.b);
    >
    > ... would be guaranteed to work. Absent the enclosing union,
    > C is in theory free to arrange the structs' first two elements
    > differently


    Why should the union make a difference?

    One access to the .y field might be u.b.y, and another might be u.e.y; why
    do the offsets of the two .y fields need to be guaranteed the same?

    I can't see that it's any different to having a base struct like this:

    struct base {
    char y;
    int x;
    };

    There's no reason to reverse the fields to match the extended struct, even
    if they are both in the same union.

    --
    Bartc
     
    BartC, Nov 2, 2010
    #2

  3. Eric Sosman

    Tim Rentsch Guest

    Eric Sosman <> writes:

    > On 11/2/2010 7:12 AM, Vrtt wrote:
    >> Hi all,
    >>
    >> I couldn't find this question in the FAQ, although it looks like a
    >> frequently asked one. Apologies if it's been answered here before.
    >>
    >> I've seen this idiom a lot in libraries:-
    >>
    >> struct base {
    >> int x;
    >> char y;
    >> };
    >>
    >> struct extended {
    >> int x;
    >> char y;
    >> double z;
    >> };
    >>
    >> void base_func(struct base *b);
    >>
    >> struct extended e;
    >> base_func((struct base*)&e);
    >>
    >> Is this actually guaranteed to work, or can struct alignment (or something
    >> else) mess it up?

    >
    > It will almost certainly work as expected, but it is not
    > actually guaranteed to work. If there were a union containing
    > both struct types it and if `e' resided in a union instance:
    >
    > union { struct base b; struct extended e; } u;
    > base_func((struct base*)&u.e);
    > /* or even ... */ base_func(&u.b);
    >
    > ... would be guaranteed to work. [snip]


    Not quite. To ensure defined-ness, the union definition
    needs to be visible at the point where the accesses are
    done, i.e., in base_func. Unless base_func is defined
    later on in the same file as the above call, this code is
    still undefined behavior.
     
    Tim Rentsch, Nov 2, 2010
    #3
  4. Eric Sosman

    Eric Sosman Guest

    On 11/2/2010 8:55 AM, Vrtt wrote:
    > "Eric Sosman"<> wrote in message
    > news:iaouhn$cqi$-september.org...
    >>
    >> struct extended {
    >> struct base b;
    >> double z;
    >> };
    >> ...
    >> struct extended e;
    >> base_func (&e.b);
    >>
    >> ... is pure as the driven snow, 100% safe, and highly recommended.
    >> Yes, you now must write `e.b.x' instead of `e.x', but that's not
    >> usually a serious hardship in actual use.
    >>

    >
    > Could I not reliably do:
    >
    > base_func((void*)&e);
    >
    > ...in this case?


    Yes, but only because your base_func() has a prototype calling
    for a `struct base*' argument. The call as written begins with a
    `struct extended*', converts that to a `void*' by means of a cast,
    and then converts the `void*' to a `struct base*' by virtue of the
    prototype. You could equally well have written

    base_func((struct base*)&e);

    ... to avoid the intermediate conversion.

    I'd suggest, though, that adding a conversion where a conversion-
    free alternative exists is a step in the wrong direction. Theorem:
    When you write a cast to "get the compiler to do the right thing,"
    you are probably asking it to do something wrong. In the present case
    you've got a function that takes a `struct base*' and you've got a
    `struct base' instance ready to hand: just point at it and be done.
    Costuming the `struct base*' as a `struct extended*' and then as a
    `void*' and then as a `struct base*' again is just an indication that
    you're too enthusiastic about playing dress-up, even after Halloween
    is past.

    --
    Eric Sosman
     
    Eric Sosman, Nov 3, 2010
    #4
  5. Eric Sosman

    BartC Guest

    "christian.bau" <> wrote in message
    news:...
    > On Nov 2, 12:37 pm, "BartC" <> wrote:
    >
    >> Why should the union make a difference?

    >
    > Because the C Standard says so.
    >
    > If you have two structs s1 and s2, and there is a union u containing
    > both struct s1 and struct s2, then the compiler must assume that any
    > pointer to a struct s1 is actually a pointer to an element of a union
    > u, and any pointer to a struct s2 might also be a pointer to an
    > element of the same union, and it must produce code that is correct in
    > that case. And the reason why the compiler has to assume this is
    > because the C Standard says so.


    Well I still don't get it (why the offsets of compatible fields in two
    struct types must match, when they belong in the same union, but not
    otherwise).

    And I can think of times when you don't want that behaviour (the two structs
    might have different alignments of the fields, even if the structs are in
    the same union).

    --
    Bartc
     
    BartC, Nov 3, 2010
    #5
  6. Eric Sosman

    crisgoogle Guest

    On Nov 3, 5:03 am, "BartC" <> wrote:
    > "christian.bau" <> wrote in message
    >
    > news:...
    >
    > > On Nov 2, 12:37 pm, "BartC" <> wrote:

    >
    > >> Why should the union make a difference?

    >
    > > Because the C Standard says so.

    >
    > > If you have two structs s1 and s2, and there is a union u containing
    > > both struct s1 and struct s2, then the compiler must assume that any
    > > pointer to a struct s1 is actually a pointer to an element of a union
    > > u, and any pointer to a struct s2 might also be a pointer to an
    > > element of the same union, and it must produce code that is correct in
    > > that case. And the reason why the compiler has to assume this is
    > > because the C Standard says so.

    >
    > Well I still don't get it (why the offsets of compatible fields in two
    > struct types must match, when they belong in the same union, but not
    > otherwise).


    And I still don't get exactly what your question is. Does the
    matching, or non-matching, behaviour bother you?

    Given C's inclination not to define things any more strictly than
    necessary, allowing otherwise-identical-looking structs to have
    different internal alignments makes sense.

    However, forcing the alignment to be the same under special
    circumstances, thereby allowing aliasing of different struct types,
    seems useful.

    > And I can think of times when you don't want that behaviour (the two structs
    > might have different alignments of the fields, even if the structs are in
    > the same union).


    And I can't possibly imagine a scenario where you specifically want to
    prevent having compatible alignment. Or was that not what you're
    getting at here?
     
    crisgoogle, Nov 3, 2010
    #6
  7. Eric Sosman

    BartC Guest

    "crisgoogle" <> wrote in message
    news:...
    > On Nov 3, 5:03 am, "BartC" <> wrote:


    >> Well I still don't get it (why the offsets of compatible fields in two
    >> struct types must match, when they belong in the same union, but not
    >> otherwise).

    >
    > And I still don't get exactly what your question is. Does the
    > matching, or
    > non-matching behaviour bother you?


    It's why the standard specifically says they must match when part of a
    union. What's so special about a union?

    > Given C's inclination not to define things any more strictly than
    > necessary, allowing otherwise-identical-looking structs to have
    > different
    > internal alignments makes sense.


    OK, sometimes it makes sense to align the fields the same way, sometimes it
    doesn't. But according to this thread, using union will guarantee the
    alignment. I just wondered what it was about a union that made it important
    to align fields (eg. see the field .a in the example below).

    > However, forcing the alignment to be the same under special
    > circumstances,
    > thereby allowing aliasing of different struct types, seems useful.
    >
    >> And I can think of times when you don't want that behaviour (the two
    >> structs
    >> might have different alignments of the fields, even if the structs are in
    >> the same union).

    >
    > And I can't possibly imagine a scenario where you specifically want to
    > prevent having compatible alignment. Or was that not what you're
    > getting at
    > here?


    Well, yes. For example:

    struct {char c; int a;} s1; /* 8 bytes with pack(4) */
    struct {char c; int a;} s2; /* 5 bytes with pack(1) */

    You might want s1 for efficiency, or compatibility with external software.
    And you might want s2 for the same sorts of reasons (but space vs. speed)

    Whatever the reasons, you might well want a union of these two structs, but
    where s1.a has offset 4, and s2.a has offset 1.

    --
    Bartc
     
    BartC, Nov 3, 2010
    #7
  8. Eric Sosman

    Mark Wooding Guest

    crisgoogle <> writes:

    > Given C's inclination not to define things any more strictly than
    > necessary, allowing otherwise-identical-looking structs to have
    > different internal alignments makes sense.
    >
    > However, forcing the alignment to be the same under special
    > circumstances, thereby allowing aliasing of different struct types,
    > seems useful.


    The puzzling thing is that this leaves implementations with exceedingly
    little leeway.

    Let's put some stuff in header files to save typing:

    foo.h:
    struct foo {
    int x;
    char y;
    double z;
    struct foo *a;
    };

    extern void print_foo(const struct foo *f);

    bar.h:
    struct bar {
    int x;
    char y;
    double z;
    unsigned long a;
    struct bar *b;
    };

    extern void print_bar(const struct bar *b);

    Now for some actual source files.

    foo.c:
    #include <stdio.h>
    #include "foo.h"

    void print_foo(const struct foo *f)
    { printf("x = %d, y = %c, z = %g\n", f->x, f->y, f->z); }

    bar.c:
    #include <stdio.h>
    #include "bar.h"

    void print_bar(const struct bar *b)
    {
    printf("x = %d, y = %c, z = %g, a = %lu\n",
    b->x, b->y, b->z, b->a);
    }

    So we feed these two source files to our typical compiler, and it
    produces object files. Was the compiler allowed to make the structures
    incompatible?

    I say `no'.

    splat.c:
    #include <stdio.h>
    #include <string.h>
    #include "foo.h"
    #include "bar.h"

    union splat {
    struct foo f;
    struct bar b;
    };

    static void populate_foo(struct foo *f)
    { f->x = 4; f->y = 'q'; f->z = 3.141; f->a = f; }

    static void populate_bar(struct bar *b)
    {
    b->x = 5; b->y = 'z'; b->z = 2.183;
    b->a = 0xdeadbeef; b->b = b;
    }

    int main(int argc, char *argv[])
    {
    /* see below */

    return (0);
    }

    There are several interesting things we might put in `main'. We might
    say this, for example:

    in `main':
    union splat s;

    populate_bar(&s.b);
    print_foo(&s.f);

    Of course, `print_foo' doesn't know anything about the union (barring
    linker connivance that it's really hard to see provides any benefit to
    anyone) because it's in a different translation unit. But it has to
    work anyway.

    It's not just because the structure is part of a union.

    in `main' (alternate 2):
    union splat s;
    struct foo f;
    struct bar b;

    populate_bar(&b);
    memcpy(&s.b, &b, sizeof(b));
    memcpy(&f, &s.f, sizeof(f));
    /* careful now: f.a is indeterminate */
    print_foo(&f); /* doesn't touch f.a */

    Now we wonder what good the union actually does here. There's no
    address magic, because a union is at the same address as all of its
    members. All of the bits were hauled about as unsigned chars (courtesy
    of memcpy). Why would the above be different from this?

    in `main' (alternate 3):
    struct foo f;
    struct bar b;

    populate_bar(&b);
    memcpy(&f, &b, sizeof(b) < sizeof(f) ? sizeof(b) : sizeof(f));
    /* careful now: f.a is indeterminate */
    print_foo(&f); /* doesn't touch f.a */

    For extra fun, why does the union declaration have to be in /this/
    translation unit? We can get more amusing action-at-a-distance by
    putting it somewhere else.

    So, is there actually some valuable latitude here? Was there intended
    to be any latitude in laying out common prefixes of structures?

    -- [mdw]
     
    Mark Wooding, Nov 3, 2010
    #8
  9. Eric Sosman

    crisgoogle Guest

    On Nov 3, 2:18 pm, "BartC" <> wrote:
    > "crisgoogle" <> wrote in message
    >
    > news:...
    >
    > > On Nov 3, 5:03 am, "BartC" <> wrote:
    > >> Well I still don't get it (why the offsets of compatible fields in two
    > >> struct types must match, when they belong in the same union, but not
    > >> otherwise).

    >
    > > And I still don't get exactly what your question is. Does the
    > > matching, or
    > > non-matching behaviour bother you?

    >
    > It's why the standard specifically says they must match when part of a
    > union. What's so special about a union?


    Well, according to the standard itself, it's "to simplify the use of
    unions". It allows you to overlay different struct types in a union,
    but access common initial members safely. The alternative would be
    to force you to use a union of struct types _each_ of which shared
    the same "base" struct. This would add a layer that isn't needed
    when the union rule is in place.
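
    A minimal sketch of the kind of use the union rule is meant to
    simplify: two struct types overlaid in a union, with the shared
    leading members read through either one. All names here (`msg_text',
    `msg_number', `union msg', `KIND_TEXT', `KIND_NUMBER' and the
    helpers) are hypothetical. The sketch assumes the union definition
    is visible wherever the access happens and that the object really
    is a union.

```c
#include <assert.h>

enum { KIND_TEXT = 1, KIND_NUMBER = 2 };

/* Two struct types sharing a common initial sequence: kind, id. */
struct msg_text   { int kind; int id; const char *text; };
struct msg_number { int kind; int id; double value; };

union msg {
    struct msg_text   t;
    struct msg_number n;
};

/* Read the shared members through `t', whichever member was stored. */
int msg_kind(const union msg *m) { return m->t.kind; }
int msg_id(const union msg *m)   { return m->t.id; }

/* Stores through `n', then inspects the common prefix through `t'. */
int demo_common_prefix(void)
{
    union msg m;

    m.n.kind  = KIND_NUMBER;
    m.n.id    = 7;
    m.n.value = 3.5;

    assert(msg_kind(&m) == KIND_NUMBER);
    return msg_id(&m);
}
```

    Without the rule, the same effect would need a separate header
    struct embedded at the start of each message type.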

    > > Given C's inclination not to define things any more strictly than
    > > necessary, allowing otherwise-identical-looking structs to have
    > > different
    > > internal alignments makes sense.

    >
    > OK, sometimes it makes sense to align the fields the same way, sometimes it
    > doesn't. But according to this thread, using union will guarantee the
    > alignment. I just wondered what is was about an union that made it important
    > to align fields (eg. see the field .a in the example below).
    >
    > > However, forcing the alignment to be the same under special
    > > circumstances,
    > > thereby allowing aliasing of different struct types, seems useful.

    >
    > >> And I can think of times when you don't want that behaviour (the two
    > >> structs
    > >> might have different alignments of the fields, even if the structs are in
    > >> the same union).

    >
    > > And I can't possibly imagine a scenario where you specifically want to
    > > prevent having compatible alignment. Or was that not what you're
    > > getting at
    > > here?

    >
    > Well, yes. For example:
    >
    > struct {char c; int a;} s1;     /* 8 bytes with pack(4) */
    > struct {char c; int a;} s2;     /* 5 bytes with pack(1) */


    The obvious response here is that any way to affect the packing
    is non-standard right off the bat, so who cares what the standard
    has to say about unions of these things?

    > You might want s1 for efficiency, or compatibility with external software.
    > And you might want s2 for the same sorts of reasons (but space vs. speed)


    But if you have a union of the things, the total size has to be the
    same regardless of the alignments of the individual bits. So there is
    no saving in size -- there _may_ be a saving in speed due to
    alignment, I suppose.

    > Whatever the reasons, you might well want a union of these two structs, but
    > where s1.a has offset 4, and s2.a has offset 1.


    And if the compiler wants to add this non-standard behaviour on top
    of other non-standard behaviour (the packing) then it's free to do so.

    Now that I think of it, you can even bypass the union rule like so:

    struct s1 {char c; int a;}; /* 8 bytes with pack(4) */
    struct s2 {char c; int a;}; /* 5 bytes with pack(1) */

    struct s3 { struct s1 inner; };
    struct s4 { struct s2 inner; };

    union u1 { struct s3 m3; struct s4 m4; };

    I _think_ that this union does not meet the "common initial sequence"
    criterion, so the two base structs, s1 and s2, can maintain
    their different alignment as you require.
     
    crisgoogle, Nov 3, 2010
    #9
  10. Eric Sosman

    Seebs Guest

    On 2010-11-03, Mark Wooding <> wrote:
    > So, is there actually some valuable latitude here? Was there intended
    > to be any latitude in laying out common prefixes of structures?


    Not really. The key is that you're allowed to optimize as though there
    is no way for a reference through one type to affect something of another
    type unless the aliasing rules would allow it.

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    I am not speaking for my employer, although they do rent some of my opinions.
     
    Seebs, Nov 3, 2010
    #10
  11. Eric Sosman

    Eric Sosman Guest

    On 11/3/2010 5:18 PM, BartC wrote:
    >
    >
    > "crisgoogle" <> wrote in message
    > news:...
    >> On Nov 3, 5:03 am, "BartC" <> wrote:

    >
    >>> Well I still don't get it (why the offsets of compatible fields in two
    >>> struct types must match, when they belong in the same union, but not
    >>> otherwise).

    >>
    >> And I still don't get exactly what your question is. Does the
    >> matching, or
    >> non-matching behaviour bother you?

    >
    > It's why the standard specifically says they must match when part of a
    > union. What's so special about a union?
    >
    >> Given C's inclination not to define things any more strictly than
    >> necessary, allowing otherwise-identical-looking structs to have
    >> different
    >> internal alignments makes sense.

    >
    > OK, sometimes it makes sense to align the fields the same way, sometimes
    > it doesn't. But according to this thread, using union will guarantee the
    > alignment. I just wondered what is was about an union that made it
    > important to align fields (eg. see the field .a in the example below).


    It's really not all *that* important that the union aligns
    the struct elements identically -- although that's a consequence
    of what the Standard requires. The point is that identical
    alignment, in and of itself, is not sufficient to guarantee that
    the type-punning will work as the type-punner hoped.

    Consider a function `void f1(int *ip, long *lp)'. It may be
    the case that `int' and `long' have the same representation, but
    the compiler is nonetheless entitled to assume that storing via
    `*ip' does not affect the value found at `*lp'. The two pointers
    are of different types, and cannot legitimately point at the same
    object. No, not even at a union of `int' and `long' elements,
    because it is undefined behavior to read from an element other
    than the most recently stored.

    Okay, f1() can legitimately assume that its arguments point
    to distinct objects, because its arguments are of distinct pointer
    types. Now consider `void f2(struct a *ap, struct b *bp)'. The
    exact same argument applies: The arguments are of different types,
    and cannot legitimately point to the same object. If the code
    stores to ap->x, the function can assume that bp->x is unchanged,
    and that a value already fetched from bp->x is still valid. This
    is the case even if `struct a' and `struct b' have identical
    representations, just as `int' and `long' do (on some systems).
    They are distinct and incompatible types, so pointers to the two
    of them cannot legitimately point to the same piece of memory.

    ... except for the special case described by the Standard.
    *If* the first few elements of the structs agree, and *if* the
    compiler can see complete declarations of both structs, and *if*
    the compiler can also see the declaration of a union holding them
    both, *then* the compiler has been "told" that the structs may
    cohabit in a single memory area, and can no longer assume that
    pointers to the two struct types point at different places. In
    the absence of any single piece of this information, the compiler
    can revert to its "Different types, different memory" assumption,
    and type-punning becomes unreliable.
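
    A sketch of the `f1' case described above. The name `f1_sketch' and
    its return value are additions for illustration (the function in
    the text returns void). The caller passes two distinct objects, so
    the behaviour is defined; the comments mark what the compiler is
    entitled to assume.

```c
#include <assert.h>

/* int and long are distinct, incompatible types, so the compiler may
   assume the store through ip leaves *lp alone, and may return the
   constant 1 without reading *lp again. */
long f1_sketch(int *ip, long *lp)
{
    *lp = 1;
    *ip = 2;
    return *lp;
}

/* Well-defined use: the two pointers refer to distinct objects. */
long demo_f1(void)
{
    int  i = 0;
    long l = 0;
    long r = f1_sketch(&i, &l);

    assert(i == 2);
    return r;
}
```

    Passing pointers that refer to the same storage would break the
    assumption the compiler is allowed to make, which is the situation
    the aliasing rules leave undefined.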

    (As I read the Standard, the type-punning guarantees also
    require that the pointed-at structs actually inhabit such a union;
    it's not enough that they "might" do so. I've described in other
    threads a circumstance where could-be-in-a-union-but-actually-isn't
    led to a SIGSEGV, though the compiler had done nothing wrong.)

    Why is the Standard so restrictive on this matter? Keep in
    mind that a round-trip to memory is frightfully expensive nowadays;
    RAM is several hundred times slower than CPUs are. Even the fastest
    and closest caches are nine to twelve cycles distant; do you like
    the idea of your 3GHz CPU being shackled to a 300MHz cache? A compiler
    will therefore expend a lot of effort in figuring out whether a store
    does or doesn't invalidate something already fetched; there are very
    large performance gains to be had if an additional fetch can be avoided.
    The Standard's aliasing rules favor "permissive" optimizers, and lay
    down very strict requirements for code that wishes to hobble the
    optimizations.

    > Well, yes. For example:
    >
    > struct {char c; int a;} s1; /* 8 bytes with pack(4) */
    > struct {char c; int a;} s2; /* 5 bytes with pack(1) */
    >
    > You might want s1 for efficiency, or compatibility with external software.
    > And you might want s2 for the same sorts of reasons (but space vs. speed)


    Aside from the (loose) requirements the Standard imposes, issues
    of representation are entirely up to the platform. "You might want,"
    but the C language does not grant, not portably.

    --
    Eric Sosman
     
    Eric Sosman, Nov 4, 2010
    #11
  12. Eric Sosman <> writes:

    > On 11/3/2010 5:18 PM, BartC wrote:
    >>
    >>
    >> "crisgoogle" <> wrote in message
    >> news:...
    >>> On Nov 3, 5:03 am, "BartC" <> wrote:

    >>
    >>>> Well I still don't get it (why the offsets of compatible fields in two
    >>>> struct types must match, when they belong in the same union, but not
    >>>> otherwise).
    >>>
    >>> And I still don't get exactly what your question is. Does the
    >>> matching, or
    >>> non-matching behaviour bother you?

    >>
    >> It's why the standard specifically says they must match when part of a
    >> union. What's so special about a union?
    >>
    >>> Given C's inclination not to define things any more strictly than
    >>> necessary, allowing otherwise-identical-looking structs to have
    >>> different
    >>> internal alignments makes sense.

    >>
    >> OK, sometimes it makes sense to align the fields the same way, sometimes
    >> it doesn't. But according to this thread, using union will guarantee the
    >> alignment. I just wondered what is was about an union that made it
    >> important to align fields (eg. see the field .a in the example below).

    >
    > It's really not all *that* important that the union aligns
    > the struct elements identically -- although that's a consequence
    > of what the Standard requires. The point is that identical
    > alignment, in and of itself, is not sufficient to guarantee that
    > the type-punning will work as the type-punner hoped.
    >
    > Consider a function `void f1(int *ip, long *lp)'. It may be
    > the case that `int' and `long' have the same representation, but
    > the compiler is nonetheless entitled to assume that storing via
    > `*ip' does not affect the value found at `*lp'. The two pointers
    > are of different types, and cannot legitimately point at the same
    > object. No, not even at a union of `int' and `long' elements,
    > because it is undefined behavior to read from an element other
    > than the most recently stored.


    I don't think that's true -- at least for C99. To be clear, it is only
    that last phrase that I think is not correct. Reading from a member of
    a union other than that which was last written is taken to be a
    reinterpretation of the bits. You might get a trap representation but
    it is not undefined per se. There is a footnote to help make this
    intent clear.
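
    A small sketch of that reinterpretation, under the assumption that
    `float' and `uint32_t' have the same size (checked at run time).
    The names `union pun', `bits_via_union' and `bits_via_memcpy' are
    hypothetical. The memcpy version is included only as a comparison
    that copies the same bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union pun { float f; uint32_t u; };

/* Store a float, then read the same bytes back as a uint32_t
   through the other union member. */
uint32_t bits_via_union(float value)
{
    union pun p;

    assert(sizeof(float) == sizeof(uint32_t));  /* assumption of this sketch */
    p.f = value;
    return p.u;   /* the stored bytes reinterpreted as uint32_t */
}

/* The same bytes copied with memcpy, for comparison. */
uint32_t bits_via_memcpy(float value)
{
    uint32_t u;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &value, sizeof u);
    return u;
}
```

    Both functions return the same value for any given input; what that
    value is depends on the implementation's float representation.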

    The notion of "effective type" and the aliasing rules prevent 'ip' and
    'lp' from being used to access the same object, so I am sure your "No,
    not even..." part is correct. Since your post is really about
    the aliasing rules, this is something of a nit-pick.

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Nov 4, 2010
    #12
  13. Eric Sosman

    BartC Guest

    "Eric Sosman" <> wrote in message
    news:iat2o2$hlu$-september.org...
    > On 11/3/2010 5:18 PM, BartC wrote:


    >> It's why the standard specifically says they must match when part of a
    >> union. What's so special about a union?


    > Consider a function `void f1(int *ip, long *lp)'. It may be


    > types. Now consider `void f2(struct a *ap, struct b *bp)'. The


    > They are distinct and incompatible types, so pointers to the two
    > of them cannot legitimately point to the same piece of memory.


    > ... except for the special case described by the Standard.
    > *If* the first few elements of the structs agree, and *if* the
    > compiler can see complete declarations of both structs, and *if*
    > the compiler can also see the declaration of a union holding them
    > both, *then* the compiler has been "told" that the structs may
    > cohabit in a single memory area, and can no longer assume that
    > pointers to the two struct types point at different places. In
    > the absence of any single piece of this information, the compiler
    > can revert to its "Different types, different memory" assumption,
    > and type-punning becomes unreliable.


    OK, thanks. So the union thing is just a mechanism for the programmer to
    tell the compiler to take care; but he doesn't otherwise need to make use of
    the union.

    > Why is the Standard so restrictive on this matter? Keep in
    > mind that a round-trip to memory is frightfully expensive nowadays;
    > RAM is several hundred times slower than CPU's are. Even the fastest
    > and closest caches are nine to twelve cycles distant; do you like
    > the idea of your 3GHz CPU being shackled to a 300MHz cache? A compiler
    > will therefore expend a lot of effort in figuring out whether a store
    > does or doesn't invalidate something already fetched; there are very
    > large performance gains to be had if an additional fetch can be avoided.


    (I rather thought that was the CPU's job; it will know whether a cache entry
    was invalidated or not, and will not access main memory unnecessarily. The
    only thing the compiler can save is executing the actual instructions that
    might reload a register from memory (for example), and not the (data) memory
    access itself.)

    --
    Bartc
     
    BartC, Nov 4, 2010
    #13
  14. Eric Sosman

    Eric Sosman Guest

    On 11/4/2010 7:46 AM, BartC wrote:
    > "Eric Sosman" <> wrote in message
    > news:iat2o2$hlu$-september.org...
    >> On 11/3/2010 5:18 PM, BartC wrote:

    >
    >>> It's why the standard specifically says they must match when part of a
    >>> union. What's so special about a union?

    >
    >> Consider a function `void f1(int *ip, long *lp)'. It may be

    >
    >> types. Now consider `void f2(struct a *ap, struct b *bp)'. The

    >
    >> They are distinct and incompatible types, so pointers to the two
    >> of them cannot legitimately point to the same piece of memory.

    >
    >> ... except for the special case described by the Standard.
    >> *If* the first few elements of the structs agree, and *if* the
    >> compiler can see complete declarations of both structs, and *if*
    >> the compiler can also see the declaration of a union holding them
    >> both, *then* the compiler has been "told" that the structs may
    >> cohabit in a single memory area, and can no longer assume that
    >> pointers to the two struct types point at different places. In
    >> the absence of any single piece of this information, the compiler
    >> can revert to its "Different types, different memory" assumption,
    >> and type-punning becomes unreliable.

    >
    > OK, thanks. So the union thing is just a mechanism for the programmer to
    > tell the compiler to take care; but he doesn't otherwise need to make
    > use of the union.


    A strict interpretation of the Standard (and my own experience
    in a similar though not identical circumstance) suggests that the
    union is in fact essential, because the initially-similar structs
    must actually reside in an instance of the union. That is, given

    struct a { int x, y; };
    struct b { int x, y, z; };
    union u { struct a sa; struct b sb; } uu;

    ... the Standard says it is all right to populate either the sa or
    sb element of uu and then to access the other. But even with
    these declarations in view, the Standard does *not* guarantee that

    struct a a_instance = { 1, 2 };
    struct b *bp = (struct b *)&a_instance;
    printf ("x = %d, y = %d\n", bp->x, bp->y);

    ... will work. The a_instance object is "free-standing" and not a
    member of a union that also includes struct b, so all bets are off.
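
    A minimal sketch of the guaranteed case, using the same struct and
    union declarations. The function name sum_common_fields is made up
    for illustration; the point is only that the object really is a
    union u and that the union declaration is visible where the access
    happens.

```c
struct a { int x, y; };
struct b { int x, y, z; };
union u { struct a sa; struct b sb; };

/* Store through sb, then inspect the common initial sequence through sa.
   This is the case the Standard blesses: both structs live in a union
   object, and the complete union type is visible at the point of access. */
int sum_common_fields(void)
{
    union u uu;

    uu.sb.x = 1;
    uu.sb.y = 2;
    uu.sb.z = 3;

    /* x and y belong to the common initial sequence; z has no
       counterpart in struct a and is not touched through sa. */
    return uu.sa.x + uu.sa.y;
}
```

    Only x and y are read through the other member here; nothing in
    this sketch makes the free-standing cast any safer.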

    >> Why is the Standard so restrictive on this matter? Keep in
    >> mind that a round-trip to memory is frightfully expensive nowadays;
    >> RAM is several hundred times slower than CPUs are. Even the fastest
    >> and closest caches are nine to twelve cycles distant; do you like
    >> the idea of your 3GHz CPU being shackled to a 300MHz cache? A compiler
    >> will therefore expend a lot of effort in figuring out whether a store
    >> does or doesn't invalidate something already fetched; there are very
    >> large performance gains to be had if an additional fetch can be avoided.

    >
    > (I rather thought that was the CPU's job; it will know whether a cache
    > entry was invalidated or not, and will not access main memory
    > unnecessarily. The only thing the compiler can save is executing the
    > actual instructions that might reload a register from memory (for
    > example), and not the (data) memory access itself.)


    You've overlooked the fact that cache is farther from the CPU
    than its own registers are. You've also forgotten that the CPU does
    *not* know whether a cache entry is valid or not; if the CPU could
    keep track of that many addresses and their contents, the cache would
    just be duplicative.

    Suppose the CPU fetches bp->x into R3, and then stores some
    unrelated value to ap->x. Is the value in R3 current or stale? Can
    you avoid a stall by producing the answer in less than one CPU cycle?
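
    To make the R3 question concrete, here is a rough sketch in the
    shape of the f2 mentioned earlier. The body is an assumption for
    illustration, not code from any real program. Because struct a and
    struct b are distinct types, the compiler may assume the store
    through ap cannot change bp->x, and may reuse the register copy
    instead of reloading it.

```c
struct a { int x, y; };
struct b { int x, y, z; };

/* Hypothetical function: reads bp->x, stores to ap->x, reads bp->x again. */
int f2(struct a *ap, struct b *bp)
{
    int before = bp->x;   /* the compiler may keep this value in a register */

    ap->x = 42;           /* store through a pointer to a different struct type */

    /* With "different types, different memory" in force, the second read
       of bp->x may be satisfied from the register rather than reloaded. */
    return before + bp->x;
}
```

    Called with two genuinely separate objects, this behaves the same
    whether or not the reload happens. The trouble only starts when ap
    and bp are made to point at the same free-standing object by a cast.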

    --
    Eric Sosman
     
    Eric Sosman, Nov 5, 2010
    #14
  15. Eric Sosman

    Eric Sosman Guest

    On 11/4/2010 9:44 PM, Eric Sosman wrote:
    > [...] That is, given
    >
    > struct a { int x, y; };
    > struct b { int x, y, z; };
    > union u { struct a sa; struct b sb; } uu;
    >
    > ... the Standard says it is all right to populate either the sa or
    > sb element of uu and then to access the other. [...]


    For clarity: It is all right to access the x and y elements of
    either. If you store to uu.sa and then try to use uu.sb.z, all
    bets are off. Sorry if I've caused confusion.

    --
    Eric Sosman
     
    Eric Sosman, Nov 5, 2010
    #15
  16. BartC

    BartC Guest

    "Eric Sosman" <> wrote in message
    news:iavnjl$luo$-september.org...
    > On 11/4/2010 7:46 AM, BartC wrote:


    >> OK, thanks. So the union thing is just a mechanism for the programmer to
    >> tell the compiler to take care; but he doesn't otherwise need to make
    >> use of the union.

    >
    > A strict interpretation of the Standard (and my own experience
    > in a similar though not identical circumstance) suggests that the
    > union is in fact essential, because the initially-similar structs
    > must actually reside in an instance of the union. That is, given
    >
    > struct a { int x, y; };
    > struct b { int x, y, z; };
    > union u { struct a sa; struct b sb; } uu;
    >
    > ... the Standard says it is all right to populate either the sa or
    > sb element of uu and then to access the other. But even with
    > these declarations in view, the Standard does *not* guarantee that
    >
    > struct a a_instance = { 1, 2 };
    > struct b *bp = (struct b *)&a_instance;
    > printf ("x = %d, y = %d\n", bp->x, bp->y);
    >
    > ... will work. The a_instance object is "free-standing" and not a
    > member of a union that also includes struct b, so all bets are off.


    Your initial comments in the thread suggested this would ('most likely')
    work when there was even the possibility that bp might point to something
    that was part of a union elsewhere in the program.

    OK. (I don't know if it's exactly the same issue, but Microsoft seemed to
    use this technique a lot in their Win32 APIs: struct pointer arguments often
    relied on a size field at the start of the struct, to tell it which of
    several versions were being pointed to. There would be a common set of
    fields, and presumably these were expected to have matching offsets. And no
    union in sight.)
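
    For what it's worth, a size-field scheme of that general kind can
    be written without relying on matching offsets, by embedding the
    older struct as the first member of the newer one. This is only a
    sketch: the names info_v1, info_v2 and describe are invented here
    and are not taken from any real Win32 header.

```c
#include <stddef.h>

/* Hypothetical versioned structs, for illustration only. */
struct info_v1 {
    size_t size;    /* caller sets this to sizeof the struct actually passed */
    int    width;
};

struct info_v2 {
    struct info_v1 v1;   /* first member: &v2 and &v2.v1 convert to each other */
    int            height;
};

/* Returns width for a v1 caller and width * height for a v2 caller. */
int describe(const struct info_v1 *p)
{
    if (p->size >= sizeof(struct info_v2)) {
        /* The size field says the caller really passed an info_v2, so
           converting back to the enclosing struct is legitimate. */
        const struct info_v2 *p2 = (const struct info_v2 *)p;
        return p2->v1.width * p2->height;
    }
    return p->width;
}
```

    A v2 caller would pass &b.v1 with b.v1.size set to sizeof b, so no
    cast is needed at the call site.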

    >>> RAM is several hundred times slower than CPUs are. Even the fastest
    >>> and closest caches are nine to twelve cycles distant; do you like
    >>> the idea of your 3GHz CPU being shackled to a 300MHz cache? A compiler


    >> (I rather thought that was the CPU's job; it will know whether a cache
    >> entry was invalidated or not, and will not access main memory
    >> unnecessarily. The only thing the compiler can save is executing the
    >> actual instructions that might reload a register from memory (for
    >> example), and not the (data) memory access itself.)

    >
    > You've overlooked the fact that cache is farther from the CPU
    > than its own registers are. You've also forgotten that the CPU does
    > *not* know whether a cache entry is valid or not; if the CPU could
    > keep track of that many addresses and their contents, the cache would
    > just be duplicative.
    >
    > Suppose the CPU fetches bp->x into R3, and then stores some
    > unrelated value to ap->x. Is the value in R3 current or stale? Can
    > you avoid a stall by producing the answer in less than one CPU cycle?


    The CPU neither knows nor cares whether R3 is supposed to contain the
    latest snapshot of whatever was at bp->x.

    It is the compiler that is concerned that R3 contains the current value of
    bp->x (because 'bp->x' is what has been coded), and may have to consider
    that ap->x might be an alias that could render R3 invalid. So you're right
    in that it can save some valuable instructions, but if it did decide to
    reload R3 from bp->x, I don't think the CPU will do the main memory fetch
    if it was not necessary.

    --
    Bartc
     
    BartC, Nov 5, 2010
    #16
  17. Jon

    Jon Guest

    BartC wrote:

    > OK. (I don't know if it's exactly the same issue, but Microsoft
    > seemed to use this technique a lot in their Win32 APIs: struct
    > pointer arguments often relied on a size field at the start of the
    > struct, to tell it which of several versions were being pointed to.
    > There would be a common set of fields, and presumably these were
    > expected to have matching offsets. And no union in sight.)
    >


    Ah what a relief it is to have control of your own compiler: you then
    *know* what is/isn't guaranteed by looking at the implementation or
    making the implementation conform to your needs (even if it means
    "bending" fuzzy ISO standard "rules" or taking advantage of "the gray
    areas").
     
    Jon, Nov 6, 2010
    #17
  18. Jon

    Jon Guest

    Eric Sosman wrote:
    > On 11/2/2010 8:55 AM, Vrtt wrote:
    >> "Eric Sosman"<> wrote in message
    >> news:iaouhn$cqi$-september.org...
    >>>
    >>> struct extended {
    >>> struct base b;
    >>> double z;
    >>> };
    >>> ...
    >>> struct extended e;
    >>> base_func (&e.b);
    >>>
    >>> ... is pure as the driven snow, 100% safe, and highly recommended.
    >>> Yes, you now must write `e.b.x' instead of `e.x', but that's not
    >>> usually a serious hardship in actual use.
    >>>

    >>
    >> Could I not reliably do:
    >>
    >> base_func((void*)&e);
    >>
    >> ...in this case?

    >
    > Yes, but only because your base_func() has a prototype calling
    > for a `struct base*' argument. The call as written begins with a
    > `struct extended*', converts that to a `void*' by means of a cast,
    > and then converts the `void*' to a `struct base*' by virtue of the
    > prototype. You could equally well have written
    >
    > base_func((struct base*)&e);
    >
    > ... to avoid the intermediate conversion.
    >
    > I'd suggest, though, that adding a conversion where a conversion-
    > free alternative exists is a step in the wrong direction. Theorem:
    > When you write a cast to "get the compiler to do the right thing,"
    > you are probably asking it to do something wrong. In the present case
    > you've got a function that takes a `struct base*' and you've got a
    > `struct base' instance ready to hand: just point at it and be done.
    > Costuming the `struct base*' as a `struct extended*' and then as a
    > `void*' and then as a `struct base*' again is just an indication that
    > you're too enthusiastic about playing dress-up, even after Halloween
    > is past.


    If the structs *were* declared the same though, there certainly would be
    no issue whatsoever, right? As in:

    struct X {
    int x;
    char y;
    double z;
    };

    struct Y{
    int x;
    char y;
    double z;
    };

    void X_func(struct X *x);

    struct Y y;
    X_func((struct X*)&y);


    Assuming, of course, that they weren't declared with differing packings
    using pragmas or something.
     
    Jon, Nov 6, 2010
    #18
  19. Francis Moreau

    Francis Moreau Guest

    "Jon" <> writes:

    [...]

    >
    > If the structs *were* declared the same though, there certainly would be
    > no issue whatsoever, right? As in:
    >
    > struct X {
    > int x;
    > char y;
    > double z;
    > };
    >
    > struct Y{
    > int x;
    > char y;
    > double z;
    > };
    >
    > void X_func(struct X *x);
    >
    > struct Y y;
    > X_func((struct X*)&y);
    >


    I can try to answer, since I had a related question a couple of months
    ago, which was kindly and patiently resolved by Tim Rentsch.

    I think it's possible for these two structures to have different
    alignment requirements, although it's probably unlikely.

    I also think that if the definition of X_func() dereferences its
    converted argument, it invokes undefined behaviour.

    Finally, since there's no visible union that includes both a
    'struct X' and a 'struct Y' object, X_func() can be hit by aliasing
    issues if it also accesses the 'y' object directly. OK, that sounds
    unlikely judging by the name of the function.
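
    One way to sidestep all three concerns is not to cast at all, but
    to build a genuine struct X from the struct Y, member by member.
    This is a different technique from the one asked about, and the
    helper name X_from_Y is made up for this sketch.

```c
struct X { int x; char y; double z; };
struct Y { int x; char y; double z; };

/* Hypothetical helper: copy each member into a real struct X.
   No pointer conversion between the two struct types is involved,
   so layout, alignment and aliasing questions do not arise. */
struct X X_from_Y(const struct Y *src)
{
    struct X dst;

    dst.x = src->x;
    dst.y = src->y;
    dst.z = src->z;
    return dst;
}
```

    The caller then passes the address of the resulting struct X to
    X_func(), at the cost of a copy (and of copying back if X_func()
    modifies its argument).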

    --
    Francis
     
    Francis Moreau, Nov 6, 2010
    #19
  20. Nick

    Nick Guest

    Francis Moreau <> writes:

    > "Jon" <> writes:
    >
    > [...]
    >
    >>
    >> If the structs *were* declared the same though, there certainly would be
    >> no issue whatsoever, right? As in:
    >>
    >> struct X {
    >> int x;
    >> char y;
    >> double z;
    >> };
    >>
    >> struct Y{
    >> int x;
    >> char y;
    >> double z;
    >> };
    >>
    >> void X_func(struct X *x);
    >>
    >> struct Y y;
    >> X_func((struct X*)&y);
    >>

    >
    > I can try to answer, since I had a related question a couple of months
    > ago, which was kindly and patiently resolved by Tim Rentsch.
    >
    > I think it's possible for these two structures to have different
    > alignment requirements, although it's probably unlikely.
    >
    > I also think that if the definition of X_func() dereferences its
    > converted argument, it invokes undefined behaviour.
    >
    > Finally, since there's no visible union that includes both a
    > 'struct X' and a 'struct Y' object, X_func() can be hit by aliasing
    > issues if it also accesses the 'y' object directly. OK, that sounds
    > unlikely judging by the name of the function.


    I agree. I asked a year or so back about some code I've got that does a
    merge-sort of generic linked lists as long as the "next" pointer is the
    first item (it casts a void * to a structure type of its own with just a
    next pointer).

    When I get round to it I'll probably rewrite it to use the offsetof
    trick to get back to the enclosing node, and ask you to pass it the
    next pointer inside the head node instead. Not quite as neat an
    interface as head node and comparison function pointer, but slightly
    safer.
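
    Roughly what I have in mind, as a sketch only. The names link,
    item, item_from_link and sum_items are invented for illustration
    and are not the actual code; a merge sort would walk the links the
    same way this toy sum does.

```c
#include <stddef.h>

/* Hypothetical generic link; a node may embed it at any position. */
struct link { struct link *next; };

struct item {
    int         value;
    struct link link;    /* deliberately not the first member */
};

/* Recover the enclosing item from a pointer to its embedded link
   by stepping back offsetof(struct item, link) bytes. */
static struct item *item_from_link(struct link *l)
{
    return (struct item *)((char *)l - offsetof(struct item, link));
}

/* Generic-style traversal: the list code only ever sees struct link. */
int sum_items(struct link *head)
{
    int total = 0;

    for (struct link *l = head; l != NULL; l = l->next)
        total += item_from_link(l)->value;
    return total;
}
```

    The caller passes &head_node.link rather than the head node itself,
    which is the slightly less neat interface mentioned above.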
    --
    Online waterways route planner | http://canalplan.eu
    Plan trips, see photos, check facilities | http://canalplan.org.uk
     
    Nick, Nov 6, 2010
    #20