QUERY: field offset rules within structures

Discussion in 'C Programming' started by Bradford Chamberlain, Jul 9, 2003.

  1. I work a lot with multidimensional arrays of dynamic size in C,
    implementing them using a single-dimensional C array and the
    appropriate multipliers and offsets to index into it appropriately. I
    tend to iterate over subsets of these arrays by walking a pointer and
    an offset per dimension across their dimensions. I've found that this
    tends to result in more performance in a portable manner than doing
    explicit indexing calculations and hoping that the compiler hoists
    invariant expressions out of the loops.

    For example, rather than doing something like:

    {
    for (i=ilo; i<ihi; i+=istr) {
    for (j=jlo; j<jhil j+=jstr) {
    ... a[(i-ailo)*imult + (j-ajlo)*jmult] ...
    }
    }
    }

    I would tend to do:

    {
    double* aptr = ...;
    const int abumpj = ...;
    const int abumpi = ...;

    for (i=ilo; i<ihi; i+=istr) {
    for (j=jlo; j<jhil j+=jstr) {
    ... *aptr ...

    aptr += abumpj;
    }
    aptr += abumpi;
    }
    }

    My question is motivated by the following desire: If I have an array
    of structures, I'd like to be able to walk around *a single field*
    within the array in a similar manner. For example, given the
    structure:

    struct _mystruct {
    short x;
    double y;
    char z;
    }

    I would like the ability to walk through the "y" fields of the array
    using a pointer and offsets.

    I've been able to achieve this using a char* pointer as follows:

    {
    char* xptr = ...; /* point to y field of the initial element */
    const int xbumpj = ...; /* express in bytes rather than */
    const int xbumpi = ...; /* sizeof(doubles) */

    for (i=ilo; i<ihi; i+=istr) {
    for (j=jlo; j<jhil j+=jstr) {
    ... *((double*)xptr) ...

    xptr += xbumpj;
    }
    xptr += xbumpi;
    }
    }

    I'd really prefer to eliminate the reliance on char* pointers and
    pointer casting, however. In particular, I'd like to make xptr a
    double*.

    My question is therefore the following: does C have any requirements
    on structure layout that would ensure that walking from field to field
    within an array of structures will be evenly divisible by sizeof()
    that field's type?

    If not, it seems like my hope is in vain because the additive offsets
    will be in terms of sizeof(<field type>) rather than bytes (or I will
    have to cast my pointer back to char* before adding a generic offset
    to it).

    If so, I'm off and running. However, it's hard for me to believe that
    C would ensure such rules on padding, which is why I ask.

    The only other approach I can think of is to walk a structure pointer
    over my array's elements and then dereference the pointer to access
    the field. I guess I've avoided this approach simply due to the fact
    that it seems more expensive (due to the field offset calculation
    involved).

    Does anyone have any other suggstions for how I might write such code
    without relying on char* pointers?

    Thanks,
    -Brad
     
    Bradford Chamberlain, Jul 9, 2003
    #1
    1. Advertising

  2. On 9 Jul 2003, Bradford Chamberlain wrote:

    > The only other approach I can think of is to walk a structure pointer
    > over my array's elements and then dereference the pointer to access
    > the field. I guess I've avoided this approach simply due to the fact
    > that it seems more expensive (due to the field offset calculation
    > involved).


    This is the right approach. The char * approach produces
    undefined behavior.

    Tak-Shing
     
    Tak-Shing Chan, Jul 9, 2003
    #2
    1. Advertising

  3. Bradford Chamberlain

    Eric Sosman Guest

    Bradford Chamberlain wrote:
    > [...]
    > My question is motivated by the following desire: If I have an array
    > of structures, I'd like to be able to walk around *a single field*
    > within the array in a similar manner. For example, given the
    > structure:
    >
    > struct _mystruct {
    > short x;
    > double y;
    > char z;
    > }
    >
    > I would like the ability to walk through the "y" fields of the array
    > using a pointer and offsets.
    >
    > [...]
    >
    > My question is therefore the following: does C have any requirements
    > on structure layout that would ensure that walking from field to field
    > within an array of structures will be evenly divisible by sizeof()
    > that field's type?


    No. This would be equivalent to requiring that the
    sizeof a struct be a multiple of the sizeof all the elements,
    and there is no such requirement. An example will indicate
    why such a requirement would be a poor idea:

    struct {
    char a[2], b[3], c[5], d[7], e[11];
    char f[13], g[17], h[19], i[23], j[29];
    } x;

    If `sizeof x' had to be a multiple of all of `sizeof x.a'
    through `sizeof x.j', it's easy to see that it would have
    to be a multiple of 2*3*...*29 = 6469693230. Don't you
    think it would be a wee bit wasteful to add more than six
    gigabytes of padding to a payload of 2+3+...+29 = 129
    bytes of actual data?

    What you *can* rely on is that the sizeof a struct is
    a multiple of the strictest alignment required of any of its
    elements. For example, some C implementations use an eight-
    byte `double' but only require it to be aligned to a four-
    byte boundary. On such an implementation, the struct

    struct { char c; double d; } y[2];

    .... might well occupy twelve bytes. This suffices to align
    the `d' elements in an array of such structs, but it's obvious
    that the twelve-byte "step" from y[0].d to y[1].d is not a
    multiple of eight.

    > The only other approach I can think of is to walk a structure pointer
    > over my array's elements and then dereference the pointer to access
    > the field. I guess I've avoided this approach simply due to the fact
    > that it seems more expensive (due to the field offset calculation
    > involved).


    That's the clean way to do it. You'll have to measure the
    performance consequences and decide whether they're important
    or not in your situation. If the performance is unacceptable,
    one alternative would be to "decompose" your structs into
    parallel arrays. That is, instead of your example of

    struct _mystruct {
    short x;
    double y;
    char z;
    }

    .... presumably augmented with `struct _mystruct *p' and the
    like, you'd instead have

    short *my_x;
    double *my_y;
    char *my_z;

    .... and access the three arrays independently. Depending on
    what you're doing, this could either improve or disimprove
    the running time.

    --
     
    Eric Sosman, Jul 9, 2003
    #3
  4. On Wed, 9 Jul 2003, Bradford Chamberlain wrote:
    >
    > My question is motivated by the following desire: If I have an array
    > of structures, I'd like to be able to walk around *a single field*
    > within the array in a similar manner. For example, given the
    > structure:
    >
    > struct _mystruct {
    > short x;
    > double y;
    > char z;
    > }
    >
    > I would like the ability to walk through the "y" fields of the array
    > using a pointer and offsets.

    <snip>

    > I'd really prefer to eliminate the reliance on char* pointers and
    > pointer casting, however. In particular, I'd like to make xptr a
    > double*.

    <snip>

    > Does anyone have any other suggstions for how I might write such code
    > without relying on char* pointers?


    Obviously the other answers you've received are far better than this
    "suggestion," so please don't use it in production code; but I see no
    reason that this [untested code] wouldn't work.


    extern void process(double);
    extern struct _mystruct my_array[42];
    extern int num_structs;

    int i;
    double *p = &(my_array->d);
    for (i=0; i < num_structs; ++i)
    {
    process(*p);
    p = (double *) ((unsigned char *)p + sizeof(struct _mystruct));
    }


    Remember: Don't Do This! :)
    -Arthur
     
    Arthur J. O'Dwyer, Jul 10, 2003
    #4
  5. Thanks to everyone for their answers. It was gratifying to get so
    many responses so quickly. Having spent another night mulling this
    over, I have some more thoughts/questions, and will phrase them in
    terms of Arthur's code:


    "Arthur J. O'Dwyer" <> writes:

    > Obviously the other answers you've received are far better than this
    > "suggestion," so please don't use it in production code; but I see no
    > reason that this [untested code] wouldn't work.
    >
    >
    > extern void process(double);
    > extern struct _mystruct my_array[42];
    > extern int num_structs;
    >
    > int i;
    > double *p = &(my_array->d);
    > for (i=0; i < num_structs; ++i)
    > {
    > process(*p);
    > p = (double *) ((unsigned char *)p + sizeof(struct _mystruct));
    > }



    This is very similar to my original approach, except that you're
    storing the pointer as a double* and then casting it as a char* to
    move it (whereas I always stored as a char* and cast as a double* to
    access it...). Unfortunately the cast to char* doesn't satisfy my
    query since I was hoping to get rid of all references to char*...

    That said, I'm in the same boat as you: I don't see why this type of
    approach would not work. Tak-Shing Chan's response seems to assert
    that such approaches aren't guaranteed to work (though in fact, in
    practice it has on the dozens of platforms I've used, from desktop
    machines to supercomputers). Could someone give a less elusive
    explanation of why it isn't guaranteed to work? In particular, I'd be
    curious to know why such an approach might fail due to the language
    standard, as well as what would cause it to fail in practice with
    actual compilers (since it never has that I've observed).


    For those interested in continuing to pursue this conversation, I've
    also remembered a second reason why I was uncomfortable with the
    solution that walked a "pointer-to-struct" through memory,
    dereferencing it to access a field at a time. The situation requires
    that you understand that I have a "multidimensional array descriptor"
    of sorts, a simplified version of which appears below:

    struct _array {
    void* data; /* pointer to data buffer */
    int numdims; /* number of conceptual dimensions */
    int block[MAXDIM]; /* per-dim multipliers in bytes */
    } array;

    (the real descriptor has additional fields supporting non-0 origins,
    strided arrays, etc.)

    For example, to initialize an n x n array of doubles, I would do
    something like the following:

    array mydblarray;

    mydblarray.data = malloc(n*n*sizeof(double));
    mydblarray.numdims = 2;
    mydblarray.block[0] = sizeof(double);
    mydblarray.block[1] = n * mydblarray.block[0];

    Then, I can create functions that operate on these arrays. For
    example, the following code zeros out the first k x k elements in a 2D
    array of doubles (assuming the array size is >= k in each dim).

    void zero_kxk_dbl_arr(array* dblarr, int k) {
    int i, j;
    char* wlk = dblarr->data;
    bump_j = dblarr->block[0];
    bump_i = dblarr->block[1] - (k*(dblarr->block[0]));

    for (i=0; i<k; i++) {
    for (j=0; j<k; j++) {
    *(double*)wlk = 0.0;

    wlk += bump_j;
    }
    wlk += bump_i;
    }
    }

    Now, the nice thing about this approach is that I can resuse this code
    for a double field in an array of structs by fiddling with the array
    descriptor slightly. For example:

    /* a struct with a double field */
    typedef struct _mystruct {
    short x;
    double y;
    } mystruct;

    /* intialize an array of structs */
    array mystructarray;
    mystructarray.data = malloc(n*n*sizeof(mystruct));
    mystructarray.numdims = 2;
    mystructarray.block[0] = sizeof(mystruct);
    mystructarray.block[1] = n * mystructarray.block[0];

    /* spoof array using y field from mystructarray */
    array tmparray = mystructarray
    tmparray.data += CALC_OFFSET(y);

    /* pass spoofed descriptor in as loosely strided double array */
    zero_kxk_dbl_arr(tmparray, 5);

    This has been a nice property for code reuse, in my experience
    (especially given the fact that I'm not writing all of this code by
    hand, but am generating it automatically from a higher-level array
    language). Yet it relies on the char* approach since (as we learned
    yesterday) we can't make assumptions about the distance between fields
    in two adjacent structs. Furthermore, in this case, I can't use a
    struct walker since the context in which the struct's field is being
    utilized is unaware of the struct's existence.

    So I guess my questions here are two: (1) Does anyone have other
    suggestions for how to approach this without char*s, yet without
    giving away the code reuse flexibility that they result in? and (2)
    I'm sure someone will tell me that what I'm doing is illegal in C, and
    I'm willing to believe that but I don't understand it. Why is it?
    (probably same reason as Arthur's code is). Again, I'd be curious to
    hear both why it's illegal according to the standard even though it
    generally seems to work in practice, as well as how it would fail in
    practice on real compilers (since I haven't observed that anywhere).

    Thanks again for all the answers, I really do appreciate the information
    I've been getting,
    -Brad
     
    Bradford Chamberlain, Jul 10, 2003
    #5
  6. On Thu, 10 Jul 2003, Bradford Chamberlain wrote:
    >
    > "Arthur J. O'Dwyer" <> writes:
    > >
    > > Obviously the other answers you've received are far better than this
    > > "suggestion," so please don't use it in production code; but I see no
    > > reason that this [untested code] wouldn't work.


    Clarification: The below code is, to the best of my knowledge,
    *completely* legal C and C++. There is nothing wrong with it
    from the point of view of the compiler. I just meant that (1)
    it's fugly; and (2) I hadn't actually compiled it myself yet.
    Don't do this because it's a PITA to maintain, not because it's
    not portable or anything (IOW, it *is* portable).

    > > int i;
    > > double *p = &(my_array->d);
    > > for (i=0; i < num_structs; ++i)
    > > {
    > > process(*p);
    > > p = (double *) ((unsigned char *)p + sizeof(struct _mystruct));
    > > }

    >
    > This is very similar to my original approach, except that you're
    > storing the pointer as a double* and then casting it as a char* to


    Nitpick: an **unsigned** char *, which is guaranteed to be able to
    represent the value. I don't know about plain char, and I don't ever
    expect to need to know, because I use 'unsigned char' for hacks like
    this exclusively.

    > move it (whereas I always stored as a char* and cast as a double* to
    > access it...). Unfortunately the cast to char* doesn't satisfy my
    > query since I was hoping to get rid of all references to char*...


    Why?

    > That said, I'm in the same boat as you: I don't see why this type of
    > approach would not work. Tak-Shing Chan's response seems to assert
    > that such approaches aren't guaranteed to work


    I don't know what Tak-Shing Chan was referring to with his very vague
    comment about leading to UB. Maybe he was referring to the fact that
    you can't reliably cast a 'struct _mystruct *' to a 'double *', or vice
    versa, whether you pass them through 'char *' first or not.

    > (though in fact, in
    > practice it has on the dozens of platforms I've used, from desktop
    > machines to supercomputers). Could someone give a less elusive
    > explanation of why it isn't guaranteed to work? In particular, I'd be
    > curious to know why such an approach might fail due to the language
    > standard, as well as what would cause it to fail in practice with
    > actual compilers (since it never has that I've observed).


    What *will* fail is what you originally asked about: whether the internals
    of structures were always aligned so that the 'double' fields of adjacent
    structures were always a multiple of 'sizeof(double)' bytes apart. This
    is not true. Consider the made-up example:

    struct foo
    {
    double d; /* suppose 10-byte doubles on this machine */
    /* and 2 bytes of structure padding */
    int i; /* and 4-byte ints */
    } a[2];

    Now a[0].d and a[1].d are both doubles, and (a+0)->d and (a+1)->d are
    both pointers to double, but the expression

    &(a+1)->d - &(a+0)->d

    yields undefined behavior. As should be obvious, since the two objects
    are 16 bytes apart, which isn't a multiple of [sizeof(double)=] 10.

    So trying to simplistically "step through" the array is doomed to
    failure. You need to account for the fact that each 'double' is
    part of a larger 'struct _mystruct'.

    > For those interested in continuing to pursue this conversation, I've
    > also remembered a second reason why I was uncomfortable with the
    > solution that walked a "pointer-to-struct" through memory,
    > dereferencing it to access a field at a time. The situation requires
    > that you understand that I have a "multidimensional array descriptor"
    > of sorts, a simplified version of which appears below:


    This code is incredibly yucky. Or so it looks to me right now.
    I'm not going to try to comprehend it, except to note that if you
    want bounds-checked array types (which is the only use I see for this
    sort of thing), C++ already has them. Besides, if you're going for
    an ADT sort of thing, then you should really supply some accessor
    functions to do your dereferencing and indexing, and then write your
    main code using those accessors. YMMV.

    > For example, to initialize an n x n array of doubles, I would do
    > something like the following:
    >
    > array mydblarray;
    >
    > mydblarray.data = malloc(n*n*sizeof(double));
    > mydblarray.numdims = 2;
    > mydblarray.block[0] = sizeof(double);
    > mydblarray.block[1] = n * mydblarray.block[0];


    See how yucky that is? To initialize an instance of *my* imaginary
    bounds-checked array type, I'd write

    array *mydblarray;
    mydblarray = new_adtarray(sizeof(double), 2,n,n);
    if (!mydblarray) exit(EXIT_FAILURE);

    and use callbacks like those passed to qsort() in order to do
    complicated things with whole matrices. N.B. that you omitted
    to check the return value of malloc() in the snippet above.
    What with duplicating that yucky code every time you need to
    create a matrix, you're going to run into silly mistakes like
    that a *lot*.

    > This has been a nice property for code reuse, in my experience
    > (especially given the fact that I'm not writing all of this code by
    > hand, but am generating it automatically from a higher-level array
    > language). Yet it relies on the char* approach since (as we learned
    > yesterday) we can't make assumptions about the distance between fields
    > in two adjacent structs. Furthermore, in this case, I can't use a
    > struct walker since the context in which the struct's field is being
    > utilized is unaware of the struct's existence.


    Pass another argument to the struct-walking function giving the sizeof
    the struct in bytes. Then use the (unsigned char *) method I outlined
    upthread. Or, probably better, re-write the core array library so it is
    more user-friendly, even at the expense of a few milliseconds per run.

    > So I guess my questions here are two: (1) Does anyone have other
    > suggestions for how to approach this without char*s, yet without
    > giving away the code reuse flexibility that they result in?


    Template programming. But that's a different language. C uses
    pointers to void and pointers to unsigned char for genericity; it's
    just the way it works (TM).

    > and (2)
    > I'm sure someone will tell me that what I'm doing is illegal in C, and
    > I'm willing to believe that but I don't understand it. Why is it?
    > (probably same reason as Arthur's code is).


    Mine isn't. I didn't look closely at yours, but it probably isn't
    either.

    > Thanks again for all the answers, I really do appreciate the information
    > I've been getting,
    > -Brad


    You're welcome.
    -Arthur
     
    Arthur J. O'Dwyer, Jul 10, 2003
    #6
  7. Bradford Chamberlain

    Eric Sosman Guest

    Bradford Chamberlain wrote:
    >
    > [ lots more than has survived my brutal snippage ]
    >
    > The situation requires
    > that you understand that I have a "multidimensional array descriptor"
    > of sorts, a simplified version of which appears below:
    >
    > struct _array {
    > void* data; /* pointer to data buffer */
    > int numdims; /* number of conceptual dimensions */
    > int block[MAXDIM]; /* per-dim multipliers in bytes */
    > } array;
    >
    > [...]
    >
    > Now, the nice thing about this approach is that I can resuse this code
    > for a double field in an array of structs by fiddling with the array
    > descriptor slightly. For example:
    >
    > /* a struct with a double field */
    > typedef struct _mystruct {
    > short x;
    > double y;
    > } mystruct;
    >
    > /* intialize an array of structs */
    > array mystructarray;
    > mystructarray.data = malloc(n*n*sizeof(mystruct));
    > mystructarray.numdims = 2;
    > mystructarray.block[0] = sizeof(mystruct);
    > mystructarray.block[1] = n * mystructarray.block[0];
    >
    > /* spoof array using y field from mystructarray */
    > array tmparray = mystructarray
    > tmparray.data += CALC_OFFSET(y);


    Bzzt! The `data' element is a `void*', and pointer
    arithmetic isn't defined on `void*'. The gcc compiler (and
    others, perhaps) treats `void*' as synonymous with `char*'
    for this purpose, but that's a non-conforming extension to
    the language. Some conforming alternatives:

    - tmparray.data = (char*)tmparray.data + CALC_OFFSET(y);
    - Declare `data' as `char*' instead of `void*'
    - Declare `data' as `double*' instead of `void*', and
    make adjustments to the pointer arithmetic as in
    Arthur O'Dwyer's post.

    By the way, <stddef.h> provides an offsetof() macro that
    will be useful in implementing your CALC_OFFSET().

    > [...]
    > So I guess my questions here are two: (1) Does anyone have other
    > suggestions for how to approach this without char*s, yet without
    > giving away the code reuse flexibility that they result in? and (2)
    > I'm sure someone will tell me that what I'm doing is illegal in C, and
    > I'm willing to believe that but I don't understand it. Why is it?
    > (probably same reason as Arthur's code is). Again, I'd be curious to
    > hear both why it's illegal according to the standard even though it
    > generally seems to work in practice, as well as how it would fail in
    > practice on real compilers (since I haven't observed that anywhere).


    For (1), I don't understand why you're so anxious about using
    `char*'. It's quite a natural thing to do when you find yourself
    laying things out in storage -- a C object's representation is
    defined as an array of characters, so why not use a character
    pointer? My usual advice is to forget about representations and
    concentrate on values, but this is one of those places where the
    representation comes to the fore -- and `char*' is a perfectly
    appropriate handle for a representation.

    As for (2), I see nothing illegal with what you're doing.
    The eventual conversion to `double*' (snipped; see up-thread)
    will succeed, because you're correctly computing the "stride"
    between successive elements and they're all properly aligned
    as `double' values should be. You could bullet-proof things
    a bit by making `block' an array of `size_t' instead of an
    array of `int', but even that won't be a problem if your arrays
    are of moderate size. Perhaps Tak-Shing spotted some other
    problem that's eluded me (it wouldn't be the first time ...).

    --
     
    Eric Sosman, Jul 10, 2003
    #7
  8. Bradford Chamberlain

    Guest

    Bradford Chamberlain <> wrote:
    >
    > That said, I'm in the same boat as you: I don't see why this type of
    > approach would not work. Tak-Shing Chan's response seems to assert
    > that such approaches aren't guaranteed to work (though in fact, in
    > practice it has on the dozens of platforms I've used, from desktop
    > machines to supercomputers). Could someone give a less elusive
    > explanation of why it isn't guaranteed to work? In particular, I'd be
    > curious to know why such an approach might fail due to the language
    > standard, as well as what would cause it to fail in practice with
    > actual compilers (since it never has that I've observed).


    Consider an implementation where doubles are 8 bytes long, but only need
    to be aligned on 4 byte boundaries. Now consider an array of structs
    like:

    struct {
    int n;
    double d;
    } a[10];

    Now the size of a[0] is 12, as is (char *)&a[1].d - (char *)a[0].d,
    which is 1.5 doubles, making it impossible to move a double pointer from
    one to the next without an intermediate cast to some other type. Such
    machines undoubtedly exist, although I can't name one off the top of my
    head. Machines where doubles can be on any byte boundary are legion,
    including the x86 and the Vax.

    -Larry Jones

    Oh, what the heck. I'll do it. -- Calvin
     
    , Jul 10, 2003
    #8
  9. Eric Sosman <> writes:

    > > tmparray.data += CALC_OFFSET(y);

    >
    > Bzzt! The `data' element is a `void*', and pointer
    > arithmetic isn't defined on `void*'. The gcc compiler (and
    > others, perhaps) treats `void*' as synonymous with `char*'
    > for this purpose, but that's a non-conforming extension to
    > the language.


    Good point -- this was me typing code off the top of my head and not
    compiling it to check it. But you understood what I meant, so that's
    the most important thing.


    > For (1), I don't understand why you're so anxious about using
    > `char*'.


    I was hoping to avoid complicating my already-too-long post with any
    justification. :) But since you asked...

    The reason is simply that I'm using a parallelizing compiler that
    doesn't do so well in the presence of char*'s or char* casts. Its
    alignment rules are designed such that the answer to the original
    question (will the fields of successive records in an array be spaced
    by a multiple of the field type?) happens to be "yes." I was worried
    that relying on this rule would result in a lack of portability to
    more conventional machines/compilers, which resulted in my original
    post (and learned that my worries were justified).

    My sense at this point is that I'm going to have to generate two
    versions of the code: one using the char* technique which a growing
    number of people agree is legal (pending explanation from Tak-Shing),
    and a second which takes advantage of this particular compiler's
    alignment rules to get around its char* allergy.

    Thanks for the opinions and backup, Arthur and Eric.

    As far as the "wow, your data structures are gross" and "boy, this is
    tricky stuff to get right" issues go, the saving graces and motivation
    are: (1) All this code I'm referring to is compiler-generated, so it's
    not designed to be utilized by a user as an ADT; (2) What I put in my
    postings was just a stripped-down version to get to my questions as
    quickly as possible -- the real deal is much more complicated, for
    better or worse; (3) performance is king because this code generation
    is for a scientific computing language, so I really am trying to
    squeeze portable performance out at the risk of making the code hard
    to read or write for a human.

    Thanks again for all the responses,
    -Brad
     
    Bradford Chamberlain, Jul 11, 2003
    #9
  10. Bradford Chamberlain

    Guest

    Bradford Chamberlain <> wrote:
    >
    > The reason is simply that I'm using a parallelizing compiler that
    > doesn't do so well in the presence of char*'s or char* casts. Its
    > alignment rules are designed such that the answer to the original
    > question (will the fields of successive records in an array be spaced
    > by a multiple of the field type?) happens to be "yes."


    I sincerely doubt that that's true. It may well be true for the
    particular types you're interested in, but not for all types. Consider:

    struct {
    int a;
    struct { char b[200]; } c;
    } d[10];

    I'm pretty sure that c won't be aligned on a 200 byte boundary (which
    would require each element of d to be 400 bytes long).

    -Larry Jones

    I sure wish I could get my hands on some REAL dynamite. -- Calvin
     
    , Jul 11, 2003
    #10
  11. writes:

    > > alignment rules are designed such that the answer to the original
    > > question (will the fields of successive records in an array be spaced
    > > by a multiple of the field type?) happens to be "yes."

    >
    > I sincerely doubt that that's true. It may well be true for the
    > particular types you're interested in, but not for all types. Consider:
    >
    > struct {
    > int a;
    > struct { char b[200]; } c;
    > } d[10];
    >
    > I'm pretty sure that c won't be aligned on a 200 byte boundary (which
    > would require each element of d to be 400 bytes long).


    You're right, I had oversimplified the rule in my mind to deal with
    the scalar field types I was most interested in (in particular, I
    don't tend to use char fields much). Your point is a good one: if I
    typedef char ch200[200] and replace the c declaration above with
    ch200, then I can't walk a ch200 pointer from one c field to the next
    without casting it to a pointer to a smaller type. Thanks for
    pointing this out, it's a subtle point that I probably would've
    glossed over otherwise.

    -Brad
     
    Bradford Chamberlain, Jul 11, 2003
    #11
  12. On 09 Jul 2003 13:32:48 -0700, Bradford Chamberlain
    <> wrote:

    >
    >I work a lot with multidimensional arrays of dynamic size in C,
    >implementing them using a single-dimensional C array and the
    >appropriate multipliers and offsets to index into it appropriately. I
    >tend to iterate over subsets of these arrays by walking a pointer and
    >an offset per dimension across their dimensions. I've found that this
    >tends to result in more performance in a portable manner than doing
    >explicit indexing calculations and hoping that the compiler hoists
    >invariant expressions out of the loops.
    >
    >For example, rather than doing something like:
    >
    > {
    > for (i=ilo; i<ihi; i+=istr) {
    > for (j=jlo; j<jhil j+=jstr) {
    > ... a[(i-ailo)*imult + (j-ajlo)*jmult] ...
    > }
    > }
    > }
    >
    >I would tend to do:
    >
    > {
    > double* aptr = ...;
    > const int abumpj = ...;
    > const int abumpi = ...;
    >
    > for (i=ilo; i<ihi; i+=istr) {
    > for (j=jlo; j<jhil j+=jstr) {
    > ... *aptr ...
    >
    > aptr += abumpj;
    > }
    > aptr += abumpi;
    > }
    > }
    >
    >My question is motivated by the following desire: If I have an array
    >of structures, I'd like to be able to walk around *a single field*
    >within the array in a similar manner. For example, given the
    >structure:
    >
    > struct _mystruct {
    > short x;
    > double y;
    > char z;
    > }
    >
    >I would like the ability to walk through the "y" fields of the array
    >using a pointer and offsets.
    >
    >I've been able to achieve this using a char* pointer as follows:
    >
    > {
    > char* xptr = ...; /* point to y field of the initial element */
    > const int xbumpj = ...; /* express in bytes rather than */
    > const int xbumpi = ...; /* sizeof(doubles) */
    >
    > for (i=ilo; i<ihi; i+=istr) {
    > for (j=jlo; j<jhil j+=jstr) {
    > ... *((double*)xptr) ...
    >
    > xptr += xbumpj;
    > }
    > xptr += xbumpi;
    > }
    > }
    >
    >I'd really prefer to eliminate the reliance on char* pointers and
    >pointer casting, however. In particular, I'd like to make xptr a
    >double*.
    >
    >My question is therefore the following: does C have any requirements
    >on structure layout that would ensure that walking from field to field
    >within an array of structures will be evenly divisible by sizeof()
    >that field's type?
    >
    >If not, it seems like my hope is in vain because the additive offsets
    >will be in terms of sizeof(<field type>) rather than bytes (or I will
    >have to cast my pointer back to char* before adding a generic offset
    >to it).
    >
    >If so, I'm off and running. However, it's hard for me to believe that
    >C would ensure such rules on padding, which is why I ask.
    >
    >The only other approach I can think of is to walk a structure pointer
    >over my array's elements and then dereference the pointer to access
    >the field. I guess I've avoided this approach simply due to the fact
    >that it seems more expensive (due to the field offset calculation
    >involved).
    >
    >Does anyone have any other suggstions for how I might write such code
    >without relying on char* pointers?
    >

    If you have a struct STRUCT that contains a member (either scalar or
    aggregate) of type TYPE and if the alignment for TYPE is the same as
    its size (e.g., a double is of size 8 and must be aligned on an 8-byte
    boundary), then sizeof(struct STRUCT) is guaranteed to be a multiple
    of sizeof(TYPE). Consider

    struct STRUCT{
    ...
    TYPE member; /* TYPE could be double to match you example */
    ...
    }
    struct STRUCT mystruct[2];
    intptr_t is0, is1, im0, im1;
    is0 = (intptr_t)&mystruct[0];
    is1 = (intptr_t)&mystruct[1];
    im0 = (intptr_t)&mystruct[0].member;
    im1 = (intptr_t)&mystruct[1].member;

    is1-is0 is the number of characters between the start of
    successive elements of the array which is, by definition,
    sizeof(struct STRUCT).
    im0-is0 is the number of characters from the start of the struct
    to the start of member which is, by definition, offsetof(...).
    im1-is1 is the same.
    im0 == is0 + offsetof(...).
    im1 == is1 + offsetof(...).
    im1-im0 == is1 + offsetof(...) - is0 - offsetof(...)
    == is1-is0
    == sizeof(struct STRUCT)
    im0 is a multiple of sizeof(TYPE) and can be represented as
    k0*sizeof(TYPE) for some integer k0.
    Similarly, im1 can be represented as k1*sizeof(TYPE).
    im1-im0 == (k1-k0)*sizeof(TYPE) == sizeof(struct STRUCT).
    (k1-k0) == sizeof(struct STRUCT)/sizeof(TYPE) which is the ratio
    of two constants and therefore constant.

    Consequently, if you want to step through a particular member for each
    element of an array of struct, you can compute the quotient once as
    part of program initialization and use it as the increment for your
    pointer arithmetic, something like:

    TYPE *ptr;
    int dim, ratio;
    ...
    ratio = sizeof(struct STRUCT) / sizeof(TYPE);
    ...
    for (dim=0, ptr=&mystruct[0].member;
    dim < max_dimension_of_array;
    dim++; ptr+=ratio){
    ...
    }

    This will not work if TYPE's alignment is different from its size.
    For example, a double has size 8 but alignment 4. Then struct x{int
    i1; double d1;} could easily have size 12 which is not a multiple of
    8.

    Unfortunately, I know of no portable way to test a type's alignment
    requirement.


    <<Remove the del for email>>
     
    Barry Schwarz, Jul 13, 2003
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Lance Riedel

    Translated Offset to Source Offset

    Lance Riedel, Oct 14, 2003, in forum: XML
    Replies:
    2
    Views:
    503
    Patrick TJ McPhee
    Oct 15, 2003
  2. Alfonso Morra
    Replies:
    11
    Views:
    722
    Emmanuel Delahaye
    Sep 24, 2005
  3. funkyj
    Replies:
    24
    Views:
    1,049
    Dave Thompson
    Feb 27, 2006
  4. Replies:
    7
    Views:
    490
  5. Roy Smith
    Replies:
    4
    Views:
    278
    Roy Smith
    Jan 27, 2013
Loading...

Share This Page