Offset/alignement of structure members

Discussion in 'C Programming' started by Arto Huusko, Oct 17, 2003.

  1. Arto Huusko

    Arto Huusko Guest

    Hello,

    I'm wondering about the portability and correctness of the
    following scenario.

    I have a generic doubly linked list structure, and it contains
    generic nodes that look like this:

    struct node
    {
    struct node *head;
    struct node *tail;
    };

    struct list
    {
    struct node *first;
    struct node *last;
    };

    Now, I want to use the generic list and node structures so that

    - The list and its nodes stay completely generic.
    That means I can have generic functions like

    add_head(struct list *, struct node *);

    Any list can be traversed and examined and manipulated
    with the generic functions, as long as the operations
    are stay on the generic level.

    - The nodes can contain application specific data. That is:
    the node structure is extended by adding application specific
    fields, which can only be accessed if the application knows
    what kind of node it is using.

    Application specific node would look like

    struct my_app_node
    {
    struct node node;
    int datafield1;
    char *datafield2;
    etc.
    };

    Now, an application could do:

    struct list mylist;
    struct my_app_node node1;
    struct my_app_node node2;

    init_list(&mylist);

    add_head(&mylist, (struct node *)&node1);
    add_head(&mylist, (struct node *)&node2);

    So, is this correct ISO C? Is this portable?

    I know that I could do also:

    add_head(&mylist, &node1.node);


    But, of course, both questions boil down to: can I trust
    that offset of "node" inside my_app_node is 0? And also,
    does the compiler let me get away with the cast?


    I also do know that a certainly correct solution would be:

    struct node
    {
    struct node *next;
    struct node *prev;
    void *data;
    };

    struct my_app_node
    {
    struct node node;
    /* data fields follow */
    };

    struct my_app_node mynode;
    mynode.node.data = &mynode;

    But in this scenario I'm completely vasting one pointer.

    (This all motivated by GCC 3.3's strict alignment stuff...)
     
    Arto Huusko, Oct 17, 2003
    #1
    1. Advertising

  2. Arto Huusko

    dis Guest

    "Arto Huusko" <> wrote in message
    news:p...

    > Hello,
    >
    > I'm wondering about the portability and correctness of the
    > following scenario.
    >
    > I have a generic doubly linked list structure, and it contains
    > generic nodes that look like this:
    >
    > struct node
    >


    > struct node *head;
    > struct node *tail;
    > };
    >
    > struct list
    > {
    > struct node *first;
    > struct node *last;
    > };
    >
    > Now, I want to use the generic list and node structures so that
    >
    > - The list and its nodes stay completely generic.
    > That means I can have generic functions like
    >
    > add_head(struct list *, struct node *);
    >
    > Any list can be traversed and examined and manipulated
    > with the generic functions, as long as the operations
    > are stay on the generic level.
    >
    > - The nodes can contain application specific data. That is:
    > the node structure is extended by adding application specific
    > fields, which can only be accessed if the application knows
    > what kind of node it is using.
    >
    > Application specific node would look like
    >
    > struct my_app_node
    > {
    > struct node node;
    > int datafield1;
    > char *datafield2;
    > etc.
    > };
    >
    > Now, an application could do:
    >
    > struct list mylist;
    > struct my_app_node node1;
    > struct my_app_node node2;
    >
    > init_list(&mylist);
    >
    > add_head(&mylist, (struct node *)&node1);
    > add_head(&mylist, (struct node *)&node2);
    >
    > So, is this correct ISO C? Is this portable?


    Yes, this code fragment is conforming.

    > I know that I could do also:
    >
    > add_head(&mylist, &node1.node);
    >
    >
    > But, of course, both questions boil down to: can I trust
    > that offset of "node" inside my_app_node is 0? And also,
    > does the compiler let me get away with the cast?


    Yes, offsetof(struct my_app_node, node) is guaranteed to yield a value of 0.
    The cast is necessary as a pointer to a structure object points to its
    initial member after suitable conversion.

    [snip]
     
    dis, Oct 17, 2003
    #2
    1. Advertising

  3. Arto Huusko

    CBFalconer Guest

    Arto Huusko wrote:
    >
    > I'm wondering about the portability and correctness of the
    > following scenario.
    >
    > I have a generic doubly linked list structure, and it contains
    > generic nodes that look like this:
    >
    > struct node
    > {
    > struct node *head;
    > struct node *tail;
    > };
    >
    > struct list
    > {
    > struct node *first;
    > struct node *last;
    > };
    >
    > Now, I want to use the generic list and node structures so that
    >
    > - The list and its nodes stay completely generic.
    > That means I can have generic functions like
    >
    > add_head(struct list *, struct node *);


    You have to do two basic things. Have the data accessible by code
    the application provides, because ONLY the application knows the
    type of that data. Have the lists manipulated by code that
    preserves those generic pointers, which have to be of type void*.
    Thus your basic node will look something like:

    struct node {
    struct node *next;
    struct node *prev;
    void* dataptr;
    }

    This will NOT be in the published header file, which will only
    contain:

    typedef struct node *node;

    keeping the definition totally hidden from the application. Now
    your call will be something like:

    add_head(node thelist, void *datum);

    where thelist was created by the call:

    node thelist = createlist();

    and you will probably also want

    void destroylist(node thelist, freedata destroydata);

    where destroydata is a pointer to a routine receiving void* and
    releasing the data it points to. This may be required for other
    operations, and you might supply it once and for all via
    createlist.

    Now createlist can malloc a struct node, initialize the pointers
    to null, or to point back to the head if desired, with a NULL
    datum, and you can henceforce use that dummy node to hold the
    first and last pointers.

    An example is worth a good deal of rambling. You can see exactly
    this sort of generic operation in hashlib.zip, especially noting
    the example usages in wdfreq and markov. It can be found at:

    <http://cbfalconer.home.att.net/download/>

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
     
    CBFalconer, Oct 17, 2003
    #3
  4. Arto Huusko

    Eric Sosman Guest

    CBFalconer wrote:
    >
    > You have to do two basic things. Have the data accessible by code
    > the application provides, because ONLY the application knows the
    > type of that data. Have the lists manipulated by code that
    > preserves those generic pointers, which have to be of type void*.
    > Thus your basic node will look something like:
    >
    > struct node {
    > struct node *next;
    > struct node *prev;
    > void* dataptr;
    > }
    > [...]


    This approach works, and has some advantages. One is that
    a given data item can exist simultaneously in an arbitrary
    number of lists.

    However, it also has a disadvantage: The linkage information
    is split away from the "payload," and this gets you into issues
    of allocating storage for the two independently, worrying about
    all those tiny allocations of pared-down `struct node' objects,
    and so on. All these issues are manageable, but they do require
    some management -- and if you don't need the full flexibility
    of CBF's method, the O.P.'s technique of embedding the links
    in the same struct as the payload can be more convenient.

    Back to the O.P.'s original question: In a situation like

    struct node {
    struct node *next;
    struct node *prev;
    };

    struct data {
    struct node links;
    int payload[42];
    double trouble[2];
    };

    .... is the `links' member of `struct data' guaranteed to be
    at offset zero within the struct? Yes, says Section 6.7.2.1
    paragraph 13: "A pointer to a structure object, suitably
    converted, points to its initial member [...] and vice versa.
    There may be unnamed padding within a structure object, but not
    at its beginning." So you can safely feed a pointer to the
    `links' member into your linked-list package, get that same
    pointer back later on, and convert it into a `struct data*'
    without trouble:

    struct list mylist;
    struct data mydata;
    struct data *dptr;
    ...
    add (&mylist, &mydata.links);
    ...
    dptr = (struct data*) get (&mylist);

    In fact, you can do a little bit more without too much
    trouble. Suppose you want each `struct data' to exist in
    exactly two lists (or in any fixed number of lists). You can
    still keep the linkage information with the payload:

    struct data {
    struct node links1;
    struct node links2;
    int payload[42];
    double trouble[2];
    };

    The `links1' element is at offset zero, and can be handled as
    before. The `links2' element is not at offset zero, but it is
    at a constant offset, namely `offsetof(struct data, links2)'.
    You can insert the struct into a secondary list like

    struct list list2;
    struct data mydata;
    ...
    add (&list2, &mydata.links2);

    Later on when you want to retrieve it you must work a little
    harder (the following is spelled out for clarity; in practice
    you'd probably abbreviate things a bit):

    struct list *lptr;
    struct data *dptr;
    ...
    lptr = get (&list2);
    dptr = (struct data *)
    ( (char*)lptr - offsetof(struct data, links2) );

    That is, you retrieve a pointer to a `links2' element (because
    that's the sort of pointer you put into the list in the first
    place), and then you "back up" by a known distance to find the
    start of the `struct data' that contains it. Of course, you've
    got to be careful not to mix `links1' and `links2' pointers in
    the same list; this technique requires that you know what offset
    to use.

    For a small, fixed number of lists -- one is the commonest
    case, sparse matrix representations sometimes use a "row" list
    and a "column" list -- this technique can be recommended. It's
    not good for a variable number of lists (because it's hard to
    keep track of the proper offsets) or for a large number of lists
    (because the bookkeeping gets unwieldy and the chance of error
    grows); for those circumstances I'd prefer CBF's technique.

    --
     
    Eric Sosman, Oct 17, 2003
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Lance Riedel

    Translated Offset to Source Offset

    Lance Riedel, Oct 14, 2003, in forum: XML
    Replies:
    2
    Views:
    526
    Patrick TJ McPhee
    Oct 15, 2003
  2. junky_fellow

    offset of a member inside a structure

    junky_fellow, Oct 14, 2004, in forum: C Programming
    Replies:
    15
    Views:
    1,020
    Tim Rentsch
    Oct 24, 2004
  3. Replies:
    2
    Views:
    409
  4. easter bunny

    icc, gcc sse alignement

    easter bunny, Jul 3, 2007, in forum: C++
    Replies:
    2
    Views:
    417
    easter bunny
    Jul 4, 2007
  5. Roy Smith
    Replies:
    4
    Views:
    320
    Roy Smith
    Jan 27, 2013
Loading...

Share This Page