partial struct usage

Discussion in 'C Programming' started by Mark Adler, Sep 24, 2011.

  1. Mark Adler

    Mark Adler Guest

    I have an internal structure with lots of stuff in it for controlling a
    data input operation. E.g.:

    struct data {
    int have;
    char *next;
    ... lots more stuff to keep track of the state
    };

    I would like to expose to the user (in the interface header file) only
    those first two elements, so that I can provide a macro (also in the
    header file) analogous to getc() to pull data that is readily available
    in the buffer, otherwise to call a function to go get more data. But I
    want to hide the other stuff in the structure, in order to make sure
    that users of the library don't end up depending on things that I
    change later for internal use. And for efficiency I don't want yet
    another layer of indirection pointing to another structure from within
    the exposed structure for the other stuff.

    E.g.

    struct ex {
    int have;
    char *next;
    };

    #define getd(d) (((struct ex *)(d))->have ? (((struct ex *)(d))->have--, \
    *(((struct ex *)(d))->next)++) : getdf(d))

    where d is a struct data *.

    My question is: does the C standard provide an assurance that this will
    work? I.e. that two different structures with identical sets of prefix
    elements will have those prefix elements laid out identically in
    memory so that casting one structure to the other will permit correct
    access to those prefix elements?

    Mark
    Mark Adler, Sep 24, 2011
    #1

  2. On 9/24/11 1:23 PM, Mark Adler wrote:
    > I have an internal structure with lots of stuff in it for controlling a
    > data input operation. E.g.:
    >
    > struct data {
    > int have;
    > char *next;
    > ... lots more stuff to keep track of the state
    > };
    >
    > I would like to expose to the user (in the interface header file) only
    > those first two elements, so that I can provide a macro (also in the
    > header file) analogous to getc() to pull data that is readily available
    > in the buffer, otherwise to call a function to go get more data. But I
    > want to hide the other stuff in the structure, in order to make sure
    > that users of the library don't end up depending on things that I change
    > later for internal use. And for efficiency I don't want yet another
    > layer of indirection pointing to another structure from within the
    > exposed structure for the other stuff.
    >
    > E.g.
    >
    > struct ex {
    > int have;
    > char *next;
    > };
    >
    > #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    > *(((struct ex *)d)->next)++) : getdf(d))
    >
    > where d is a struct data *.
    >
    > My question is: does the C standard provide an assurance that this will
    > work? I.e. that two different structures with identical sets of prefix
    > elements will have those prefix elements layed out identically in memory
    > so that casting one structure to the other will permit correct access to
    > those prefix elements?
    >
    > Mark
    >


    If you look at the specifications for unions, you will see a requirement
    for access compatibility for structs with equivalent initial sequences.
    While it technically means it only applies in unions, to make it work
    for unions but not elsewhere is hard, so unless you are using a "debug"
    version of the compiler checking for this sort of "error", it should work.

    Another method is to place the "public" members in a separate struct
    that is the first member of the full struct, then use the fact that the
    address of the first member of a struct is the same as the address of
    the struct (which means the cast gives a usable value). There is no
    pointer indirection cost, just one more level of . syntax.
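
    A minimal sketch of this nested-struct approach, not code from the
    thread: struct ex and getd() follow the original post, while
    data_open(), the refill logic in getdf(), and the source, pos, and buf
    members are hypothetical names invented for illustration. The cast in
    getdf() relies only on the rule that a pointer to a struct, suitably
    converted, points to its first member.

```c
#include <assert.h>
#include <stddef.h>

/* Public part: the only fields a client of the header would see. */
struct ex {
    int have;      /* bytes still available in the buffer */
    char *next;    /* next byte to hand out */
};

/* Slow path, called when the buffer is empty (hypothetical behavior). */
int getdf(struct ex *d);

/* Fast path analogous to getc(); note that it evaluates d more than once. */
#define getd(d) \
    ((d)->have ? ((d)->have--, (unsigned char)*(d)->next++) : getdf(d))

/* Private part: the public struct is the FIRST member. */
struct data {
    struct ex pub;         /* must stay first for the cast in getdf() */
    const char *source;    /* hypothetical internal state */
    size_t pos;
    char buf[4];
};

/* Hypothetical setup helper: hands the client a pointer to the public part. */
struct ex *data_open(struct data *s, const char *src)
{
    s->pub.have = 0;
    s->pub.next = s->buf;
    s->source = src;
    s->pos = 0;
    return &s->pub;
}

/* Hypothetical refill: copies up to sizeof buf bytes from a string. */
int getdf(struct ex *d)
{
    /* Valid only because d really points at the pub member of a struct data. */
    struct data *s = (struct data *)d;
    size_t n = 0;

    while (n < sizeof s->buf && s->source[s->pos] != '\0')
        s->buf[n++] = s->source[s->pos++];
    if (n == 0)
        return -1;                     /* end of input */
    s->pub.have = (int)n - 1;          /* first byte is returned right now */
    s->pub.next = s->buf + 1;
    return (unsigned char)s->buf[0];
}
```

    The client only ever touches d->have and d->next through getd(), so
    everything after pub can change without breaking user code.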
    Richard Damon, Sep 24, 2011
    #2

  3. Mark Adler

    BGB Guest

    On 9/24/2011 10:23 AM, Mark Adler wrote:
    > I have an internal structure with lots of stuff in it for controlling a
    > data input operation. E.g.:
    >
    > struct data {
    > int have;
    > char *next;
    > ... lots more stuff to keep track of the state
    > };
    >
    > I would like to expose to the user (in the interface header file) only
    > those first two elements, so that I can provide a macro (also in the
    > header file) analogous to getc() to pull data that is readily available
    > in the buffer, otherwise to call a function to go get more data. But I
    > want to hide the other stuff in the structure, in order to make sure
    > that users of the library don't end up depending on things that I change
    > later for internal use. And for efficiency I don't want yet another
    > layer of indirection pointing to another structure from within the
    > exposed structure for the other stuff.
    >
    > E.g.
    >
    > struct ex {
    > int have;
    > char *next;
    > };
    >
    > #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    > *(((struct ex *)d)->next)++) : getdf(d))
    >
    > where d is a struct data *.
    >
    > My question is: does the C standard provide an assurance that this will
    > work? I.e. that two different structures with identical sets of prefix
    > elements will have those prefix elements layed out identically in memory
    > so that casting one structure to the other will permit correct access to
    > those prefix elements?
    >


    I think, technically, no...
    but off-hand, I don't know of any implementations where it won't work.

    however, if the structs are physically nested, like:
    struct mylib_publicstuff_s
    {
    ....
    };

    struct mylib_privatestuff_s
    {
    struct mylib_publicstuff_s pu;
    ....
    };

    then potentially, one could just use pointer-casts (although, I am not
    certain if this is guaranteed to work either, so alas...).


    admittedly, I have not generally dealt with this case, as more commonly
    I expose/hide entire structs (externally, only a basic typedef is
    visible), and would probably just use the pointer indirection.

    maybe also possible would be to hide the entire struct, and internally
    make use of getter/setter functions?...
    BGB, Sep 24, 2011
    #3
  4. Mark Adler

    Phil Carmody Guest

    Mark Adler <> writes:
    > I have an internal structure with lots of stuff in it for controlling
    > a data input operation. E.g.:
    >
    > struct data {
    > int have;
    > char *next;
    > ... lots more stuff to keep track of the state
    > };
    >
    > I would like to expose to the user (in the interface header file) only
    > those first two elements, so that I can provide a macro (also in the
    > header file) analogous to getc() to pull data that is readily
    > available in the buffer, otherwise to call a function to go get more
    > data. But I want to hide the other stuff in the structure, in order
    > to make sure that users of the library don't end up depending on
    > things that I change later for internal use. And for efficiency I
    > don't want yet another layer of indirection pointing to another
    > structure from within the exposed structure for the other stuff.
    >
    > E.g.
    >
    > struct ex {
    > int have;
    > char *next;
    > };
    >
    > #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    > *(((struct ex *)d)->next)++) : getdf(d))
    >
    > where d is a struct data *.
    >
    > My question is: does the C standard provide an assurance that this
    > will work? I.e. that two different structures with identical sets of
    > prefix elements will have those prefix elements layed out identically
    > in memory so that casting one structure to the other will permit
    > correct access to those prefix elements?


    struct private {
    struct public {
    int have;
    char *next;
    } pub;
    /* ... other stuff ... */
    };

    let the clients see and pass around pointers to pub, and then either
    use something like container_of() (which is not strictly portable), or
    just rely on the address of the first member being the address of the
    structure and just cast, to map that back to the struct private.

    I wonder whether the union of structures with common initial sequences
    might be the way of absolutely ensuring the fields are in the same
    place:

    union private {
    struct public {
    int have;
    char *next;
    } pub;
    struct priv {
    int have;
    char *next;
    other stuff;
    } priv;
    };

    The address of the union is guaranteed to be the same as the address
    of pub and priv, so casting is safe, but in addition, have and next
    will always be at the same relative address thanks to 6.5.2.3p6:

    """
    6 One special guarantee is made in order to simplify the use of unions: if a union contains
    several structures that share a common initial sequence (see below), and if the union
    object currently contains one of these structures, it is permitted to inspect the common
    initial part of any of them anywhere that a declaration of the completed type of the union
    is visible. Two structures share a common initial sequence if corresponding members
    have compatible types (and, for bit-fields, the same widths) for a sequence of one or more
    initial members.
    """


    Phil

    --
    "Religion is what keeps the poor from murdering the rich."
    -- Napoleon
    Phil Carmody, Sep 24, 2011
    #4
  5. Mark Adler

    Eric Sosman Guest

    On 9/24/2011 1:23 PM, Mark Adler wrote:
    > I have an internal structure with lots of stuff in it for controlling a
    > data input operation. E.g.:
    >
    > struct data {
    > int have;
    > char *next;
    > ... lots more stuff to keep track of the state
    > };
    >
    > I would like to expose to the user (in the interface header file) only
    > those first two elements, so that I can provide a macro (also in the
    > header file) analogous to getc() to pull data that is readily available
    > in the buffer, otherwise to call a function to go get more data. But I
    > want to hide the other stuff in the structure, in order to make sure
    > that users of the library don't end up depending on things that I change
    > later for internal use. And for efficiency I don't want yet another
    > layer of indirection pointing to another structure from within the
    > exposed structure for the other stuff.
    >
    > E.g.
    >
    > struct ex {
    > int have;
    > char *next;
    > };
    >
    > #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    > *(((struct ex *)d)->next)++) : getdf(d))
    >
    > where d is a struct data *.
    >
    > My question is: does the C standard provide an assurance that this will
    > work? I.e. that two different structures with identical sets of prefix
    > elements will have those prefix elements layed out identically in memory
    > so that casting one structure to the other will permit correct access to
    > those prefix elements?


    The layout will be the same if both structs inhabit the same
    union. As a practical matter this means the layout will be the
    same even without the union, because the compiler will not be able
    to prove that in some as-yet-unseen compilation unit the two might
    appear in the same union.

    But beware the optimizer! Identical layout does not imply
    interchangeability, not when the compiler starts getting clever
    about re-ordering things, pre-fetching things, adjusting things
    for best cache line usage, and so on. In your case it will probably
    be all right, because `struct ex' is no larger than and needs no
    stricter alignment than `struct data'. You're skating on thin ice,
    but it's quite unlikely to crack -- just don't carry anything extra
    on your back, okay?

    IMHO, the suggestion offered by both BGB and Phil Carmody is a
    safer and better way to proceed. It has the disadvantage that an
    ignorant or confused client might cook up its own `struct ex' and
    present it to you, and when you try to access the members of the
    nonexistent containing struct trouble will ensue. But then, your
    original has that same drawback, so adopting their method leads to
    no disimprovement.

    --
    Eric Sosman
    Eric Sosman, Sep 24, 2011
    #5
  6. Mark Adler

    Mark Adler Guest

    Thank you guys. I got what I needed.

    Mark
    Mark Adler, Sep 25, 2011
    #6
  7. Mark Adler

    Gene Guest

    On Sep 25, 4:23 pm, Mark Adler <> wrote:
    > Thank you guys.  I got what I needed.
    >
    > Mark
    >
    >


    Questions about pointers to struct sharing fields seem popular. As
    others have said there are few compilers that won't allow you to cast
    blithely among structs to access common initial elements.

    But there is a way to do it that the Standard guarantees will work.
    Someone here kindly pointed me to the key phrase.

    From the publicly available draft of C99 at
    http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf (but I think
    it's in C90, too):

    6.7.2.1p13 ... A pointer to a structure object, suitably converted,
    points to its initial member (or if that member is a
    bit-field, then to the unit in which it resides), and vice versa.

    In your case you'd expose the initial field (singular, but it can be a
    struct) by casting from your own full-powered pointer to that field.
    So you'd end up with code like

    // what the user sees
    struct api_visible_data {
    int have;
    char *next;
    };

    // what your code can see
    struct data {
    // one-element array trick for uniform syntax
    struct api_visible_data user_fields[1];

    ... lots more stuff to keep track of the state

    };

    struct api_visible_data *get_data( ... )
    {
    struct data *the_data = malloc(sizeof(struct data));

    ... set up the data

    return the_data->user_fields;
    }

    void process_data(struct api_visible_data *data_user_fields)
    {
    struct data *data = (struct data *)data_user_fields;

    ... use all the fields including
    data->user_fields->have and ->next

    }

    The need to touch user fields with different syntax can be bad or good
    depending on your point of view.
    Gene, Sep 26, 2011
    #7
  8. On 9/26/11 4:14 AM, Gene wrote:
    >
    > Questions about pointers to struct sharing fields seem popular. As
    > others have said there are few compilers that won't allow you to cast
    > blithely among structs to access common initial elements.
    >
    > But there is a way to do it that the Standard guarentees will work.
    > Someone here kindly pointed me to the key phrase.
    >
    > From the publicly available draft of C99 at
    > http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf (but I think
    > it's in C90, too):
    >
    > 6.7.2.13 ... A pointer to a structure object, suitably converted,
    > points to its initial member (or if that member is a
    > bit-field, then to the unit in which it resides), and vice versa.
    >


    You also get a fairly good promise from 6.5.2.3 para 5

    One special guarantee is made in order to simplify the use of unions: if
    a union contains several structures that share a common initial sequence
    (see below), and if the union object currently contains one of these
    structures, it is permitted to inspect the common initial part of any of
    them anywhere that a declaration of the complete type of the union is
    visible. Two structures share a common initial sequence if corresponding
    members have compatible types (and, for bit-fields, the same widths) for
    a sequence of one or more initial members.

    While technically only applying to structs within a union, since a
    pointer to the structs can be passed to other code, unless the pointer
    includes information that the struct is actually part of a union, and
    modifies access rules based on that, it will hold for more general
    access. The only case where I would expect the compiler to possibly go
    to that effort would be a heavy debug build. (This is a somewhat common
    idiom.)
    Richard Damon, Sep 26, 2011
    #8
  9. Mark Adler

    Mark Adler Guest

    Thanks! 6.7.2.1p13 is now my friend.

    On 2011-09-26 01:14:06 -0700, Gene said:
    > // what your code can see
    > struct data {
    > // one-element array trick for uniform syntax
    > struct api_visible_data user_fields[1];
    >
    > ... lots more stuff to keep track of the state
    >
    > };


    Why does it need to be or want to be a one-element array? If I left
    off the "[1]", wouldn't it do the same thing?

    Mark
    Mark Adler, Sep 27, 2011
    #9
  10. Mark Adler <> writes:

    > Thanks! 6.7.2.13 is now my friend.
    >
    > On 2011-09-26 01:14:06 -0700, Gene said:
    >> // what your code can see
    >> struct data {
    >> // one-element array trick for uniform syntax
    >> struct api_visible_data user_fields[1];
    >>
    >> ... lots more stuff to keep track of the state
    >>
    >> };

    >
    > Why does it need to be or want to be a one-element array? If I left
    > off the "[1]", wouldn't it do the same thing?


    It doesn't need to be. I think the advantage is small -- maybe too
    small to be worth the puzzlement that it causes. The point is that by
    simply referencing user_fields we get a pointer, and since it's a
    pointer to this object that is being passed around, it may be more
    convenient to have this conversion happen automatically.

    It's more commonly seen where users have to allocate some types
    themselves and pass these things around. If users of some API are told
    to write code like this:

    api_object a, b, c;
    api_int(a, 42);
    api_copy(b, a);
    api_do_calculation(c, a, b);

    then just by defining

    typedef struct huge { /* lots of members */ } api_object[1];

    the author can not only ensure that users can declare these things but
    that they will be passed using pointers rather than by copying the
    structure.

    The trouble is that it defeats people's expectations. These objects
    can't now be returned from a function and the fact that the API
    functions can modify the object passed is not obvious from the code.
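
    A small sketch of the one-element-array typedef, to make the effect
    concrete. The API names api_int, api_copy and api_do_calculation come
    from the snippet above; their bodies, the struct members, api_get() and
    demo_api() are assumptions made up for illustration.

```c
#include <assert.h>

/* One-element array typedef: declaring an api_object allocates the whole
   struct, but naming it in an expression yields a pointer to it. */
typedef struct api_huge {
    long value;
    long scratch[64];   /* stands in for "lots of members" */
} api_object[1];

/* An array parameter is adjusted to a pointer: x is a struct api_huge *. */
void api_int(api_object x, long v)
{
    x->value = v;
}

void api_copy(api_object dst, api_object src)
{
    dst->value = src->value;
}

/* Hypothetical calculation: just adds the two inputs. */
void api_do_calculation(api_object out, api_object a, api_object b)
{
    out->value = a->value + b->value;
}

long api_get(api_object x)
{
    return x->value;
}

/* Client code in the style shown above: no & and no struct copies. */
long demo_api(void)
{
    api_object a, b, c;

    api_int(a, 42);
    api_copy(b, a);
    api_do_calculation(c, a, b);
    return api_get(c);
}
```

    Nothing at the call sites hints that a, b and c are passed by address,
    which is exactly the surprise described in the paragraph above.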

    --
    Ben.
    Ben Bacarisse, Sep 27, 2011
    #10
  11. Mark Adler

    Tim Rentsch Guest

    Richard Damon <> writes:

    > On 9/24/11 1:23 PM, Mark Adler wrote:
    >> I have an internal structure with lots of stuff in it for controlling a
    >> data input operation. E.g.:
    >>
    >> struct data {
    >> int have;
    >> char *next;
    >> ... lots more stuff to keep track of the state
    >> };
    >>
    >> I would like to expose to the user (in the interface header file) only
    >> those first two elements, so that I can provide a macro (also in the
    >> header file) analogous to getc() to pull data that is readily available
    >> in the buffer, otherwise to call a function to go get more data. But I
    >> want to hide the other stuff in the structure, in order to make sure
    >> that users of the library don't end up depending on things that I change
    >> later for internal use. And for efficiency I don't want yet another
    >> layer of indirection pointing to another structure from within the
    >> exposed structure for the other stuff.
    >>
    >> E.g.
    >>
    >> struct ex {
    >> int have;
    >> char *next;
    >> };
    >>
    >> #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    >> *(((struct ex *)d)->next)++) : getdf(d))
    >>
    >> where d is a struct data *.
    >>
    >> My question is: does the C standard provide an assurance that this will
    >> work? I.e. that two different structures with identical sets of prefix
    >> elements will have those prefix elements layed out identically in memory
    >> so that casting one structure to the other will permit correct access to
    >> those prefix elements?
    >>
    >> Mark
    >>

    >
    > If you look at the specifications for unions, you will see a
    > requirement for access compatibility for structs with equivalent
    > initial sequences. While it technically means it only applies in
    > unions, to make it work for unions but not elsewhere is hard, so
    > unless you are using a "debug" version of the compiler checking for
    > this sort of "error", it should work. [snip unrelated]


    Dangerous advice. As Eric Sosman correctly points out, identical
    layout is not enough in the presence of aggressive (and allowed
    under the Standard) optimization.
    Tim Rentsch, Jan 24, 2012
    #11
  12. Mark Adler

    Tim Rentsch Guest

    Phil Carmody <> writes:

    > Mark Adler <> writes:
    >> I have an internal structure with lots of stuff in it for controlling
    >> a data input operation. E.g.:
    >>
    >> struct data {
    >> int have;
    >> char *next;
    >> ... lots more stuff to keep track of the state
    >> };
    >>
    >> I would like to expose to the user (in the interface header file) only
    >> those first two elements, so that I can provide a macro (also in the
    >> header file) analogous to getc() to pull data that is readily
    >> available in the buffer, otherwise to call a function to go get more
    >> data. But I want to hide the other stuff in the structure, in order
    >> to make sure that users of the library don't end up depending on
    >> things that I change later for internal use. And for efficiency I
    >> don't want yet another layer of indirection pointing to another
    >> structure from within the exposed structure for the other stuff.
    >>
    >> E.g.
    >>
    >> struct ex {
    >> int have;
    >> char *next;
    >> };
    >>
    >> #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    >> *(((struct ex *)d)->next)++) : getdf(d))
    >>
    >> where d is a struct data *.
    >>
    >> My question is: does the C standard provide an assurance that this
    >> will work? I.e. that two different structures with identical sets of
    >> prefix elements will have those prefix elements layed out identically
    >> in memory so that casting one structure to the other will permit
    >> correct access to those prefix elements?

    >
    > struct private {
    > struct public {
    > int have;
    > char *next;
    > } pub;
    > other stuff;
    > }
    >
    > let the clients see and pass around pointers to pub, and then either
    > use something like container_of() (which is not strictly portable), or


    Actually container_of() should be strictly portable provided the
    outer structure type being cast to matches the actual object
    being referred to. (This assuming container_of() is defined
    properly.)

    > just rely on the address of the first member being the address of the
    > structure and just cast, to map that back to the struct private.


    Using container_of() should be just as portable as this.
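
    For reference, one way container_of() can be written using only
    standard facilities is sketched below. This is an assumption about what
    "defined properly" means here; the Linux kernel version adds a type
    check through GNU extensions, which this sketch omits. The struct names
    and count_refill() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Step back from a member pointer to the enclosing struct. The result is
   only meaningful if ptr really points at that member of a 'type' object. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct public_state {
    int have;
    char *next;
};

struct private_state {
    long refills;               /* hypothetical internal counter */
    struct public_state pub;    /* deliberately NOT the first member */
};

/* Library-side function: receives the public pointer, recovers the rest. */
long count_refill(struct public_state *p)
{
    struct private_state *s = container_of(p, struct private_state, pub);

    return ++s->refills;
}
```

    Unlike the plain cast, this works wherever pub sits inside the outer
    struct, at the cost of one constant subtraction.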


    > I wonder whether the union of structures with common initial sequences
    > might be the way of absolutely ensuring the fields are in the same
    > place:
    >
    > union private {
    > struct public {
    > int have;
    > char *next;
    > } pub;
    > struct priv {
    > int have;
    > char *next;
    > other stuff;
    > } priv;
    > };
    >
    > The address of the union is guaranteed to be the same as the address
    > of pub and priv, so casting is safe, but in addition, have and next
    > will always be at the same relative address thanks to 6.5.2.3p6:
    >
    > """
    > 6 One special guarantee is made in order to simplify the use of unions: if a union contains
    > several structures that share a common initial sequence (see below), and if the union
    > object currently contains one of these structures, it is permitted to inspect the common
    > initial part of any of them anywhere that a declaration of the completed type of the union
    > is visible. Two structures share a common initial sequence if corresponding members
    > have compatible types (and, for bit-fields, the same widths) for a sequence of one or more
    > initial members.
    > """


    Certainly corresponding fields in the two structs are going to
    have the same offsets. The question is, are the rules for
    effective types being followed? The guarantees in 6.5.2.3 p6
    hold only if the struct(s) in question are actually in a union
    object, which wasn't true in the original scenario. Strictly
    speaking it isn't safe to play these kinds of tricks unless the
    smaller struct is (recursively) a member of the other (or the
    struct(s) in question are in an actual union object).
    Tim Rentsch, Jan 24, 2012
    #12
  13. Mark Adler

    Tim Rentsch Guest

    Richard Damon <> writes:

    > On 9/26/11 4:14 AM, Gene wrote:
    >>
    >> Questions about pointers to struct sharing fields seem popular. As
    >> others have said there are few compilers that won't allow you to cast
    >> blithely among structs to access common initial elements.
    >>
    >> But there is a way to do it that the Standard guarentees will work.
    >> Someone here kindly pointed me to the key phrase.
    >>
    >> From the publicly available draft of C99 at
    >> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf (but I think
    >> it's in C90, too):
    >>
    >> 6.7.2.13 ... A pointer to a structure object, suitably converted,
    >> points to its initial member (or if that member is a
    >> bit-field, then to the unit in which it resides), and vice versa.
    >>

    >
    > You also get a fairly good promise from 6.5.2.3 para 5
    >
    > One special guarantee is made in order to simplify the use of unions:
    > if a union contains several structures that share a common initial
    > sequence (see below), and if the union object currently contains one
    > of these structures, it is permitted to inspect the common initial
    > part of any of them anywhere that a declaration of the complete type
    > of the union is visible. Two structures share a common initial
    > sequence if corresponding members have compatible types (and, for
    > bit-fields, the same widths) for a sequence of one or more initial
    > members.
    >
    > While technically only applying to structs within a union, since a
    > pointer to the structs can be passed to other code, unless the pointer
    > includes information that the struct is actually part of a union, and
    > modifies access rules based on that, it will hold for more general
    > access. [snip]


    Technically wrong, because of optimization, but also
    bad advice for a more practical reason: the guarantee
    of 6.5.2.3 p5 holds only if the complete type of such
    a union is visible at the point of access. No union
    declaration visible means the compiler can freely
    assume the two struct types don't intermingle.
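
    By way of contrast, here is a sketch of an access pattern that does
    stay inside the guarantee: both structs live in an actual union object,
    and every access goes through the union type, whose complete
    declaration is visible. The type names, the refills member, and both
    functions are hypothetical.

```c
#include <assert.h>

struct pub {
    int have;
    char *next;
};

struct priv {
    int have;
    char *next;
    long refills;   /* hypothetical extra state */
};

/* The complete union type is visible to all code below. */
union stream {
    struct pub pub;
    struct priv priv;
};

/* Reads the common initial part through the union type. */
int stream_have(const union stream *u)
{
    return u->pub.have;
}

int demo_common_initial_sequence(void)
{
    static char buf[] = "abc";
    union stream u;

    /* The union object currently contains a struct priv... */
    u.priv.have = 3;
    u.priv.next = buf;
    u.priv.refills = 0;

    /* ...and its common initial part may be inspected via pub. */
    assert(u.pub.next == buf);
    return stream_have(&u);
}
```

    Casting a bare struct priv * to struct pub * in a translation unit that
    never sees union stream is a different situation, and it is the one
    being warned against here.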
    Tim Rentsch, Jan 25, 2012
    #13
  14. Mark Adler

    Shao Miller Guest

    On 1/24/2012 18:56, Tim Rentsch wrote:
    > Phil Carmody<> writes:
    >
    >> Mark Adler<> writes:
    >>> I have an internal structure with lots of stuff in it for controlling
    >>> a data input operation. E.g.:
    >>>
    >>> struct data {
    >>> int have;
    >>> char *next;
    >>> ... lots more stuff to keep track of the state
    >>> };
    >>>
    >>> I would like to expose to the user (in the interface header file) only
    >>> those first two elements, so that I can provide a macro (also in the
    >>> header file) analogous to getc() to pull data that is readily
    >>> available in the buffer, otherwise to call a function to go get more
    >>> data. But I want to hide the other stuff in the structure, in order
    >>> to make sure that users of the library don't end up depending on
    >>> things that I change later for internal use. And for efficiency I
    >>> don't want yet another layer of indirection pointing to another
    >>> structure from within the exposed structure for the other stuff.
    >>>
    >>> E.g.
    >>>
    >>> struct ex {
    >>> int have;
    >>> char *next;
    >>> };
    >>>
    >>> #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    >>> *(((struct ex *)d)->next)++) : getdf(d))
    >>>
    >>> where d is a struct data *.
    >>>
    >>> My question is: does the C standard provide an assurance that this
    >>> will work? I.e. that two different structures with identical sets of
    >>> prefix elements will have those prefix elements layed out identically
    >>> in memory so that casting one structure to the other will permit
    >>> correct access to those prefix elements?

    >>
    >> struct private {
    >> struct public {
    >> int have;
    >> char *next;
    >> } pub;
    >> other stuff;
    >> }
    >>
    >> let the clients see and pass around pointers to pub, and then either
    >> use something like container_of() (which is not strictly portable), or

    >
    > Actually container_of() should be strictly portable provided the
    > outer structure type being casted to matches the actual object
    > being referred to. (This assuming container_of() is defined
    > properly.)
    >
    >> just rely on the address of the first member being the address of the
    >> structure and just cast, to map that back to the struct private.

    >
    > Using container_of() should be just as portable as this.
    >


    For the special case where the pointer points to an object at the
    beginning of any containing sub-object... If a pointer carries bounds
    information, subtracting from a pointer needn't be defined (apparently).

    struct s_foo {
    int i[1];
    double d[1];
    } * foo = malloc(sizeof *foo);
    double * dp;
    assert(foo);
    dp = foo->d;
    /* dp bounds might != foo bounds. Uh oh */
    foo = container_of(dp, struct s_foo, d);

    --
    "The stationery store has moved. Aaargh!"
    Shao Miller, Jan 25, 2012
    #14
  15. Mark Adler

    Tim Rentsch Guest

    Shao Miller <> writes:

    > On 1/24/2012 18:56, Tim Rentsch wrote:
    >> Phil Carmody<> writes:
    >>
    >>> Mark Adler<> writes:
    >>>> I have an internal structure with lots of stuff in it for controlling
    >>>> a data input operation. E.g.:
    >>>>
    >>>> struct data {
    >>>> int have;
    >>>> char *next;
    >>>> ... lots more stuff to keep track of the state
    >>>> };
    >>>>
    >>>> I would like to expose to the user (in the interface header file) only
    >>>> those first two elements, so that I can provide a macro (also in the
    >>>> header file) analogous to getc() to pull data that is readily
    >>>> available in the buffer, otherwise to call a function to go get more
    >>>> data. But I want to hide the other stuff in the structure, in order
    >>>> to make sure that users of the library don't end up depending on
    >>>> things that I change later for internal use. And for efficiency I
    >>>> don't want yet another layer of indirection pointing to another
    >>>> structure from within the exposed structure for the other stuff.
    >>>>
    >>>> E.g.
    >>>>
    >>>> struct ex {
    >>>> int have;
    >>>> char *next;
    >>>> };
    >>>>
    >>>> #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    >>>> *(((struct ex *)d)->next)++) : getdf(d))
    >>>>
    >>>> where d is a struct data *.
    >>>>
    >>>> My question is: does the C standard provide an assurance that this
    >>>> will work? I.e. that two different structures with identical sets of
    >>>> prefix elements will have those prefix elements layed out identically
    >>>> in memory so that casting one structure to the other will permit
    >>>> correct access to those prefix elements?
    >>>
    >>> struct private {
    >>> struct public {
    >>> int have;
    >>> char *next;
    >>> } pub;
    >>> other stuff;
    >>> }
    >>>
    >>> let the clients see and pass around pointers to pub, and then either
    >>> use something like container_of() (which is not strictly portable), or

    >>
    >> Actually container_of() should be strictly portable provided the
    >> outer structure type being casted to matches the actual object
    >> being referred to. (This assuming container_of() is defined
    >> properly.)
    >>
    >>> just rely on the address of the first member being the address of the
    >>> structure and just cast, to map that back to the struct private.

    >>
    >> Using container_of() should be just as portable as this.
    >>

    >
    > For the special case where the pointer points to an object at the
    > beginning of any containing sub-object... If a pointer carries bounds
    > information, subtracting from a pointer needn't be defined
    > (apparently).
    >
    > struct s_foo {
    > int i[1];
    > double d[1];
    > } * foo = malloc(sizeof *foo);
    > double * dp;
    > assert(foo);
    > dp = foo->d;
    > /* dp bounds might != foo bounds. Uh oh */
    > foo = container_of(dp, struct s_foo, d);


    The same argument applies to members that aren't arrays,
    which shows the argument must be wrong, since the whole
    point of offsetof() is to be able to convert back and
    forth between pointers to member objects and a pointer
    to the struct as a whole.
    Tim Rentsch, Feb 1, 2012
    #15
  16. Mark Adler

    Shao Miller Guest

    On 1/31/2012 19:11, Tim Rentsch wrote:
    > Shao Miller<> writes:
    >
    >> On 1/24/2012 18:56, Tim Rentsch wrote:
    >>>
    >>> Using container_of() should be just as portable as this.
    >>>

    >>
    >> For the special case where the pointer points to an object at the
    >> beginning of any containing sub-object... If a pointer carries bounds
    >> information, subtracting from a pointer needn't be defined
    >> (apparently).
    >>
    >> struct s_foo {
    >> int i[1];
    >> double d[1];
    >> } * foo = malloc(sizeof *foo);
    >> double * dp;
    >> assert(foo);
    >> dp = foo->d;
    >> /* dp bounds might != foo bounds. Uh oh */
    >> foo = container_of(dp, struct s_foo, d);

    >
    > The same argument applies to members that aren't arrays,
    > which shows the argument must be wrong, since


    The argument _does_ apply to members that aren't arrays, but I've seen
    especial attention regarding "arrays" with "bounds-checking," so the
    example highlights that.

    Why would a bounds-checking implementation _not_ be allowed to
    continually reduce the possible bounds, but never increase them? Say if
    pointer representation went something like:

    struct s_ptr {
    unsigned char * first_byte;
    size_t size;
    size_t current_position;
    };

    "internally" (C pseudo-code for the implementation of the abstract
    machine). So imagine, in the program for consideration, we have:

    struct s_foo {
    int i;
    double d;
    };
    struct s_foo foo = { 0 };
    double * d = &foo.d;

    "Internally," 'd' might have the members populated with values
    equivalent to '(unsigned char *) &foo + offsetof(struct s_foo, d)',
    'sizeof foo.d', '0', respectively.

    Suppose we cast:

    unsigned char * bytes_of_d = (unsigned char *) d;

    Suppose this 'bytes_of_d' now has the same "internal" member values as
    'd'. If you try to:

    unsigned char * backwards = bytes_of_d - 1;

    You've gone out-of-bounds. If, instead, you'd been using an 'unsigned
    char *' cast from a pointer to the whole of 'foo', then the bounds would
    be sufficient to point anywhere within it.
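The hypothetical implementation described above can be simulated in ordinary C. This is only a sketch of the thought experiment, not a description of any real compiler: the helper names `model_member_ptr`, `model_whole_ptr` and `model_subtract` are invented for illustration, and `struct s_ptr` simply mirrors the pseudo-code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a bounds-carrying pointer, mirroring the
   s_ptr pseudo-code. Not how any real implementation is known to work. */
struct s_ptr {
    unsigned char *first_byte;   /* start of the region the pointer may access */
    size_t size;                 /* number of bytes in that region */
    size_t current_position;     /* offset of the pointer within the region */
};

struct s_foo {
    int i;
    double d;
};

/* Model of '&obj.member': the bounds are narrowed to the member alone. */
static struct s_ptr model_member_ptr(void *obj, size_t offset, size_t member_size)
{
    struct s_ptr p = { (unsigned char *)obj + offset, member_size, 0 };
    return p;
}

/* Model of '(unsigned char *) &obj' advanced by 'position' bytes:
   the bounds cover the whole object. */
static struct s_ptr model_whole_ptr(void *obj, size_t obj_size, size_t position)
{
    struct s_ptr p = { obj, obj_size, position };
    return p;
}

/* Model of 'p - n': returns 1 and updates *p when the result stays
   inside the recorded bounds, returns 0 (out of bounds) otherwise. */
static int model_subtract(struct s_ptr *p, size_t n)
{
    if (n > p->current_position)
        return 0;
    p->current_position -= n;
    return 1;
}
```

In this model, a pointer derived from `&foo.d` starts at position 0 of a region only `sizeof foo.d` bytes long, so `model_subtract` rejects stepping back by even one byte, while a pointer derived from the whole of `foo` can step back freely. Whether the Standard actually permits an implementation to behave this way is exactly what is in dispute.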

    > the whole
    > point of offsetof() is to be able to convert back and
    > forth between pointers to member objects and a pointer
    > to the struct as a whole.


    I'm not sure why you suggest this. The whole point of 'offsetof' is to
    provide an integer constant expression giving the offset of a 'struct'
    member, as far as the standard text goes. You might have some Committee
    experiences that grant you knowledge of motives beyond that, but those
    aren't widely accessible.

    It's possible that you simply meant to type 'container_of' rather than
    'offsetof', there.

    Following the absurd-but-seemingly-permitted bounds-checking
    implementation example above, if we had:

    struct s_baz {
    unsigned char c;
    double d;
    };
    struct s_baz baz;
    unsigned char * whole = (unsigned char *) &baz;
    unsigned char * part = &baz.c;

    The bounds for 'whole' could be greater than the bounds for 'part'. Now
    if we cast:

    part = *((unsigned char (*)[sizeof baz]) part);

    Then "internally," the implementation could say "they must know that
    there's more data than the current bounds" and adjust the "internal"
    'size' member value accordingly. Then 'part' and 'whole' would be the
    same. But there doesn't appear to be a way to say "take this pointer
    value and cast it as though its 'current_position' is part-way along."
    I'd expect that, if anything, such a spiteful implementation would always
    reset 'current_position' to 0... "Hey, you said this is what we've got
    here."

    But seriously, in general, how might a bounds-checking implementation
    deal with:

    int mumble(int * i) {
    i = i - 1;
    return *i;
    }

    where you might pass a pointer to the xth element of an array of 'int'
    but you might not? What's safe to assume? What kinds of bounds might
    be encoded or associated with the pointer value? Would the 'beginning'
    map all the way back to some declared type somewhere? Some effective
    type? Or is bounds-checking a special feature of using 'a[i]' notation
    and 's.m' or 'sp->m' notation, while "raw" pointers are immune?

    --
    "The stationery store has moved. Aaargh!"
    Shao Miller, Feb 1, 2012
    #16
  17. Mark Adler

    Phil Carmody Guest

    Tim Rentsch <> writes:
    > Phil Carmody <> writes:
    >
    > > Mark Adler <> writes:
    > >> I have an internal structure with lots of stuff in it for controlling
    > >> a data input operation. E.g.:
    > >>
    > >> struct data {
    > >> int have;
    > >> char *next;
    > >> ... lots more stuff to keep track of the state
    > >> };
    > >>
    > >> I would like to expose to the user (in the interface header file) only
    > >> those first two elements, so that I can provide a macro (also in the
    > >> header file) analogous to getc() to pull data that is readily
    > >> available in the buffer, otherwise to call a function to go get more
    > >> data. But I want to hide the other stuff in the structure, in order
    > >> to make sure that users of the library don't end up depending on
    > >> things that I change later for internal use. And for efficiency I
    > >> don't want yet another layer of indirection pointing to another
    > >> structure from within the exposed structure for the other stuff.
    > >>
    > >> E.g.
    > >>
    > >> struct ex {
    > >> int have;
    > >> char *next;
    > >> };
    > >>
    > >> #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    > >> *(((struct ex *)d)->next)++) : getdf(d))
    > >>
    > >> where d is a struct data *.
    > >>
    > >> My question is: does the C standard provide an assurance that this
    > >> will work? I.e. that two different structures with identical sets of
    > >> prefix elements will have those prefix elements laid out identically
    > >> in memory so that casting one structure to the other will permit
    > >> correct access to those prefix elements?

    > >
    > > struct private {
    > > struct public {
    > > int have;
    > > char *next;
    > > } pub;
    > > other stuff;
    > > }
    > >
    > > let the clients see and pass around pointers to pub, and then either
    > > use something like container_of() (which is not strictly portable), or

    >
    > Actually container_of() should be strictly portable provided the
    > outer structure type being casted to matches the actual object
    > being referred to. (This assuming container_of() is defined
    > properly.)


    All the implementations I've seen (which isn't a huge number) have
    implemented container_of as an almost identical macro in a way that I
    would consider non-portable. Looking more closely, I'm fairly sure
    that this nonportability is historical accident being propagated
    rather than actually being necessary. Nothing more than casting to
    character pointer, subtracting offsetof, and casting again should
    work portably (given the provision you mention).
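A minimal sketch of that three-step definition, using only standard C. The struct names follow the earlier example in this thread. The member `internal_state` is a made-up stand-in for the "other stuff", placed first here so that the offset being subtracted is non-zero. The helper `to_private` is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed "plain" definition with no GNU extensions: cast to char *,
   subtract the member offset, cast to the outer type. 'ptr' must
   really point at the 'member' of an object of type 'type'. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct public {
    int have;
    char *next;
};

struct private {
    int internal_state;      /* hypothetical stand-in for "other stuff" */
    struct public pub;
};

/* Map a client's 'struct public *' back to the enclosing 'struct private'. */
static struct private *to_private(struct public *p)
{
    return container_of(p, struct private, pub);
}
```

If `pub` is kept as the first member, as in the original suggestion, the offset is zero and the macro reduces to a plain cast.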

    Phil
    --
    Unix is simple. It just takes a genius to understand its simplicity
    -- Dennis Ritchie (1941-2011), Unix Co-Creator
    Phil Carmody, Feb 12, 2012
    #17
  18. Mark Adler

    Tim Rentsch Guest

    Shao Miller <> writes:

    > On 1/31/2012 19:11, Tim Rentsch wrote:
    >> Shao Miller<> writes:
    >>
    >>> On 1/24/2012 18:56, Tim Rentsch wrote:
    >>>>
    >>>> Using container_of() should be just as portable as this.
    >>>>
    >>>
    >>> For the special case where the pointer points to an object at the
    >>> beginning of any containing sub-object... If a pointer carries bounds
    >>> information, subtracting from a pointer needn't be defined
    >>> (apparently).
    >>>
    >>> struct s_foo {
    >>> int i[1];
    >>> double d[1];
    >>> } * foo = malloc(sizeof *foo);
    >>> double * dp;
    >>> assert(foo);
    >>> dp = foo->d;
    >>> /* dp bounds might != foo bounds. Uh oh */
    >>> foo = container_of(dp, struct s_foo, d);

    >>
    >> The same argument applies to members that aren't arrays,
    >> which shows the argument must be wrong, since

    >
    > The argument _does_ apply to members that aren't arrays, but I've seen
    > especial attention regarding "arrays" with "bounds-checking," so the
    > example highlights that.
    >
    > Why would a bounds-checking implementation _not_ be allowed to
    > continually reduce the possible bounds, but never increase them?
    > [snip elaboration]


    For the same reason that an implementation can't decide
    that '2 == 2' is 12 - it doesn't match the specification.

    Admittedly the specification for the result of '2 == 2'
    is clearer and easier to understand, but that doesn't
    lessen the point. Whatever mechanism one might imagine
    to explain how an implementation works, all that matters
    is whether the result matches the specification. If the
    purpose is to discover what the Standard requires, arguments
    based on reasoning about some putative underlying mechanism
    usually serve to confuse more than illuminate.
    Tim Rentsch, Mar 8, 2012
    #18
  19. Mark Adler

    Tim Rentsch Guest

    Phil Carmody <> writes:

    > Tim Rentsch <> writes:
    >> Phil Carmody <> writes:
    >>
    >> > Mark Adler <> writes:
    >> >> I have an internal structure with lots of stuff in it for controlling
    >> >> a data input operation. E.g.:
    >> >>
    >> >> struct data {
    >> >> int have;
    >> >> char *next;
    >> >> ... lots more stuff to keep track of the state
    >> >> };
    >> >>
    >> >> I would like to expose to the user (in the interface header file) only
    >> >> those first two elements, so that I can provide a macro (also in the
    >> >> header file) analogous to getc() to pull data that is readily
    >> >> available in the buffer, otherwise to call a function to go get more
    >> >> data. But I want to hide the other stuff in the structure, in order
    >> >> to make sure that users of the library don't end up depending on
    >> >> things that I change later for internal use. And for efficiency I
    >> >> don't want yet another layer of indirection pointing to another
    >> >> structure from within the exposed structure for the other stuff.
    >> >>
    >> >> E.g.
    >> >>
    >> >> struct ex {
    >> >> int have;
    >> >> char *next;
    >> >> };
    >> >>
    >> >> #define getd(d) (((struct ex *)d)->have ? (((struct ex *)d)->have--,
    >> >> *(((struct ex *)d)->next)++) : getdf(d))
    >> >>
    >> >> where d is a struct data *.
    >> >>
    >> >> My question is: does the C standard provide an assurance that this
    >> >> will work? I.e. that two different structures with identical sets of
    >> >> prefix elements will have those prefix elements laid out identically
    >> >> in memory so that casting one structure to the other will permit
    >> >> correct access to those prefix elements?
    >> >
    >> > struct private {
    >> > struct public {
    >> > int have;
    >> > char *next;
    >> > } pub;
    >> > other stuff;
    >> > }
    >> >
    >> > let the clients see and pass around pointers to pub, and then either
    >> > use something like container_of() (which is not strictly portable), or

    >>
    >> Actually container_of() should be strictly portable provided the
    >> outer structure type being casted to matches the actual object
    >> being referred to. (This assuming container_of() is defined
    >> properly.)

    >
    > All the implementations I've seen (which isn't a huge number) have
    > implemented container_of as an almost identical macro in a way that I
    > would consider non-portable. Looking more closely, I'm fairly sure
    > that this nonportability is historical accident being propagated
    > rather than actually being necessary. Nothing more than casting to
    > character pointer, subtracting offsetof, and casting again should
    > work portably (given the provision you mention).


    I see, that's interesting. Do you know what the original
    motivation was for defining it that way, rather than (what
    seems to me to be) the more straightforward portable one?
    Tim Rentsch, Mar 8, 2012
    #19
  20. Mark Adler

    Phil Carmody Guest

    Tim Rentsch <> writes:
    > Phil Carmody <> writes:
    > > Tim Rentsch <> writes:
    > >> Phil Carmody <> writes:
    > >> > Mark Adler <> writes:

    ....
    > >> > struct private {
    > >> > struct public {
    > >> > int have;
    > >> > char *next;
    > >> > } pub;
    > >> > other stuff;
    > >> > }
    > >> >
    > >> > let the clients see and pass around pointers to pub, and then either
    > >> > use something like container_of() (which is not strictly portable), or
    > >>
    > >> Actually container_of() should be strictly portable provided the
    > >> outer structure type being casted to matches the actual object
    > >> being referred to. (This assuming container_of() is defined
    > >> properly.)

    > >
    > > All the implementations I've seen (which isn't a huge number) have
    > > implemented container_of as an almost identical macro in a way that I
    > > would consider non-portable. Looking more closely, I'm fairly sure
    > > that this nonportability is historical accident being propagated
    > > rather than actually being necessary. Nothing more than casting to
    > > character pointer, subtracting offsetof, and casting again should
    > > work portably (given the provision you mention).

    >
    > I see, that's interesting. Do you know what the original
    > motivation was for defining it that way, rather than (what
    > seems to me to be) the more straightforward portable one?


    I had presumed Linux's definition:

    #define container_of(ptr, type, member) ({ \
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
    (type *)( (char *)__mptr - offsetof(type,member) );})

    was lifted from an early gcc definition, as a lot of things like that were.

    It seems that someone else has questioned the seemingly necessary
    additional line:
    http://psomas.wordpress.com/2009/07/01/weird-kernel-macros-container_of/
    """
    EDIT: Apparently, the first line is there for `type checking'. It
    ensures that type has a member called member (however, this is done by
    the offsetof macro too, I think), and if ptr isn't a pointer to the
    correct type (the type of the member), the compiler will print a
    warning, which can be useful for debugging.
    """

    Which implies that it wasn't a gcc-ism after all.

    The gccs I have access to seem to have now pulled it out of the header
    files and absorbed it into the compiler itself.

    So the linux weirdness seems to be confined. (And that's all I've
    surrounded myself in for the last few years.)
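For comparison, a similar type check can probably be had without `typeof` or statement expressions. The sketch below is one possible approach, not anything taken from Linux or gcc: the macro name `container_of_checked` and the helper `foo_from_d` are invented for illustration. It relies on the rule that comparing pointers to incompatible types with `==` is a constraint violation, so a mismatched `ptr` should draw a diagnostic, while `sizeof` keeps the comparison from ever being evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standard-C variant. The sizeof operand is never
   evaluated; it exists only so the compiler checks that 'ptr' and
   '&type.member' have compatible pointer types. Multiplying by 0
   leaves the computed address unchanged. */
#define container_of_checked(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member) \
              + 0 * sizeof((ptr) == &((type *)0)->member)))

struct s_foo {
    int i;
    double d;
};

/* Recover the enclosing struct s_foo from a pointer to its 'd' member. */
static struct s_foo *foo_from_d(double *dp)
{
    return container_of_checked(dp, struct s_foo, d);
}
```

Passing, say, an `int *` where the `d` member is named should then produce a "comparison of distinct pointer types" style warning, which is roughly what the extra `__mptr` line buys in the Linux version.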

    Phil
    --
    > I'd argue that there is much evidence for the existence of a God.

    Pics or it didn't happen.
    -- Tom (/. uid 822)
    Phil Carmody, Mar 13, 2012
    #20