packing and structs

Discussion in 'C Programming' started by Greg Martin, Oct 25, 2012.

  1. Greg Martin

    Greg Martin Guest

    I have heard it said, but not confirmed, that the only guarantee that
    the standard gives with regards to structs is that the first element is
    aligned with the structure's first byte and that the order of the members
    will not be changed. Does that mean that code like that below should
    print "Hello" but after that anything would be possible?


    Hello, World
    struct words: 14
    char[] str: 13



    /***********************************************/

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    struct words {
        char hello[5];
        char comma;
        char space;
        char world[5];
        char exclaim;
        char term;
    };

    int main (int argc, char* argv[]) {
        char str[] = "Hello, World";
        struct words w;

        memcpy (&w, str, sizeof (str));

        char *cp = (char*) &w;

        while (*cp != '\0') {
            printf ("%c", *cp);
            ++cp;
        }

        printf ("\n");

        printf ("struct words: %d\nchar[] str: %d\n",
                sizeof (w), sizeof (str));

        return 0;
    }
    Greg Martin, Oct 25, 2012
    #1

  2. Greg Martin

    Eric Sosman Guest

    On 10/25/2012 2:31 PM, Greg Martin wrote:
    > I have heard it said, but not confirmed, that the only guarantee that
    > the standard gives with regards to structs is that the first element is
    > aligned with the structure's first byte and that the order of the members
    > will not be changed.


    That's it, mostly. We know that members are properly aligned
    for their types and there's some special language pertaining to
    bit-fields, but you're essentially correct.

    > Does that mean that code like that below should
    > print "Hello" but after that anything would be possible?
    >
    >
    > Hello, World
    > struct words: 14
    > char[] str: 13
    >
    >
    >
    > /***********************************************/
    >
    > #include <stdlib.h>
    > #include <stdio.h>
    > #include <string.h>
    >
    > struct words {
    > char hello[5];
    > char comma;
    > char space;
    > char world[5];
    > char exclaim;
    > char term;
    > };


    We know that the "hello" member begins at the struct's first
    byte, and that the later members appear in order, not overlapping:

    offsetof(struct words, hello) == 0

    offsetof(struct words, comma) >= 0 + 5

    offsetof(struct words, space) >=
    offsetof(struct words, comma) + 1

    offsetof(struct words, world) >=
    offsetof(struct words, space) + 1

    offsetof(struct words, exclaim) >=
    offsetof(struct words, world) + 5

    offsetof(struct words, term) >=
    offsetof(struct words, exclaim) + 1

    Finally, we know that the struct is at least as large as the
    sum of its member sizes plus any padding between them:

    sizeof(struct words) >= offsetof(struct words, term) + 1

    ... hence sizeof(struct words) >= 14 (== 5 + 1 + 1 + 5 + 1 + 1).

    Since none of the members requires any special alignment, it's
    quite likely that sizeof(struct words) will in fact be 14 exactly.
    Perhaps the next most likely value is 16, if a compiler decides to
    put two padding bytes at the end to make the whole thing fit in two
    8-byte units. Descending even further on the likelihood scale, a
    compiler might insert one padding byte before `world' and one more
    at the end, so each array would be contained in a single 8-byte
    unit. Other padding arrangements seem extremely unlikely -- though
    as you observe, they're permitted.

    > int main (int argc, char* argv[]) {
    > char str[] = "Hello, World";
    > struct words w;


    Okay, `w' occupies >=14 bytes of storage.

    > memcpy (&w, str, sizeof (str));


    This fills the first 13 bytes of `w' with a copy of the string.
    The 14th byte (and any others) remain uninitialized. Since `w'
    has sufficient space for everything that's being copied into it,
    there's no problem up to this point.

    Note that memcpy() makes no use of the "struct-ness" of
    the target. In C, any addressable object can be viewed as an
    array of bytes, without regard to the object's actual type.
    That's what memcpy() does: It just copies bytes, and doesn't
    care what type the bytes represent.

    > char *cp = (char*) &w;
    >
    > while (*cp != '\0') {
    > printf ("%c", *cp);
    > ++cp;
    > }


    Here, you're doing much the same thing as memcpy() did: You
    are not using `w' as a struct, but only as a bag of bytes. If
    there are padding bytes, you're using them on exactly the same
    basis as you use member bytes: They're all just bytes. The
    output *will* be "Hello, World" whether there's padding or not.

    Using the "struct-ness" might (in principle) have produced
    some surprises:

    printf("%.5s", w.hello); // fine so far
    printf("%c", w.comma); // BZZT!
    printf("%c", w.space); // BZZT!
    printf("%.5s", w.world); // BZZT!

    There's no telling (in principle) what the final three lines
    would have done.


    > printf ("\n");
    >
    > printf ("struct words: %d\nchar[] str: %d\n",
    > sizeof (w), sizeof (str));


    Nit-pick: "%d" is for signed integers, which `size_t' is
    not. I've used systems where this would have printed the two
    sizes as 14 and 0 thanks to the mismatch; in principle, worse
    things could happen.

    > return 0;
    > }


    --
    Eric Sosman
    Eric Sosman, Oct 25, 2012
    #2

  3. Greg Martin

    Guest

    On Oct 25, 11:57 am, "Where's all the China Blue food?"
    <> wrote:
    > In article <Lzfis.3429$>,
    >  Greg Martin <> wrote:
    >
    > > I have heard it said, but not confirmed, that the only guarantee that
    > > the standard gives with regards to structs is that the first element is
    > > aligned with the structure's first byte and that the order of the members
    > > will not be changed. Does that mean that code like that below should
    > > print "Hello" but after that anything would be possible?
    > > int main (int argc, char* argv[]) {
    > >      char str[] = "Hello, World";
    > >      struct words w;

    >
    > No. You're overlaying the members and optional alignment spacing with a string.
    > You should not assume anything portable from that. Treat a struct as a struct if
    > you want the code to be sensible. That means assign it member by member and
    > extract from it member by member. You should only use the whole structure where
    > it is the whole structure value you want, with all the members and optional
    > alignment bytes as one.
    >
    > If you don't want to be portable, the answer depends on your machine and
    > compiler. Some compilers have a packed declarator that forces no alignment, even
    > if that creates unaligned member access.
    >
    > --
    > My name is Indigo Montoya. \\        Annoying Usenet one post at a time.
    > You flamed my father.       \'         At least I can stay in character.
    > Prepare to be spanked.     //               When you look into the void,
    > Stop posting that!        `/  the void looks into you, and fulfills you.


    Hi, can you illustrate your point with a small example?
    , Oct 25, 2012
    #3
  4. Greg Martin

    James Kuyper Guest

    On 10/25/2012 02:31 PM, Greg Martin wrote:
    > I have heard it said, but not confirmed, that the only guarantee that
    > the standard gives with regards to structs is that the first element is
    > aligned with the structure's first byte and that the order of the members
    > will not be changed.


    Basically. There are some additional requirements for bit-fields, but not
    enough to be of any use, and those requirements aren't relevant to your
    question.

    > ... Does that mean that code like that below should
    > print "Hello" but after that anything would be possible?


    Actually, no. Your struct is guaranteed to be large enough to store the
    entire string that you copy into it. It could be bigger, and it could
    have padding bytes, but it's definitely big enough. After copying the
    string, you print starting from the first byte of that string to the
    terminating null character. Some of the bytes that you'll be printing
    could be padding bytes between the named fields of the structure, but
    that won't interfere with them being printed. They will all be printed,
    and the result should be the same as printf(str). There might be
    uninitialized padding bytes at the end of your struct after the
    terminating null character, but your code stops printing before it would
    otherwise have printed them.

    If you had changed the value of any field of your struct between the
    memcpy() and the printing loop, then any and all of the padding bytes
    could have been changed from what was originally written to them by
    memcpy(). How could this happen? Consider w.comma. It could have been
    set up aligned on a 4-byte boundary, and followed by three padding
    bytes. Then it could be updated using a 4-byte instruction that would,
    as a side effect, also change the values of the three following
    padding bytes. The standard specifies that the value of ALL padding
    bytes becomes unspecified after ANY field in the struct is updated
    (6.2.6.1p6), which allows the compiler to do that.

    > Hello, World
    > struct words: 14
    > char[] str: 13
    >
    >
    >
    > /***********************************************/
    >
    > #include <stdlib.h>
    > #include <stdio.h>
    > #include <string.h>
    >
    > struct words {
    > char hello[5];
    > char comma;
    > char space;
    > char world[5];
    > char exclaim;


    I assume that an earlier version of this program had an '!' at the end
    of the string?

    > char term;
    > };
    >
    > int main (int argc, char* argv[]) {
    > char str[] = "Hello, World";
    > struct words w;
    >
    > memcpy (&w, str, sizeof (str));


    What is not guaranteed, at this point, is that w.comma == ',', or that
    w.space==' ', or that w.world[0] == 'W', or that w.exclaim == '\0'. All
    of those should be true if there's no padding, but could be false if
    there is any padding.

    > char *cp = (char*) &w;
    >
    > while (*cp != '\0') {
    > printf ("%c", *cp);
    > ++cp;
    > }
    >
    > printf ("\n");
    >
    > printf ("struct words: %d\nchar[] str: %d\n",
    > sizeof (w), sizeof (str));
    >
    > return 0;
    > }


    A better test would be to try the following:

    struct words w2 = {"hello", ',', ' ', "World", '!', '\0'};

    and try printing it out with:

    #include <ctype.h> // at file scope
    for (char *cp = (char *)&w2; cp < &w2.term; cp++)
    {
        if (isprint((unsigned char)*cp))
            putchar(*cp);
        else
            printf("\nunprintable character: %d\n", *cp);
    }
    putchar('\n');

    If there are any padding bytes, you'll see something different in your
    output than you might have expected if you didn't realize that there
    could be padding. However, don't get too excited about that possibility.
    Most compilers insert padding only as needed to meet alignment
    requirements, which is unlikely to be relevant in this case.

    You're more likely to have padding in your struct if it contains fields
    of several different basic data types, particularly if more strictly
    aligned data types come after less strictly aligned data types.
    James Kuyper, Oct 25, 2012
    #4
  6. Greg Martin

    Greg Martin Guest

    On 12-10-25 12:20 PM, James Kuyper wrote:
    > On 10/25/2012 02:31 PM, Greg Martin wrote:


    >> struct words {
    >> char hello[5];
    >> char comma;
    >> char space;
    >> char world[5];
    >> char exclaim;

    >
    > I assume that an earlier version of this program had an '!' at the end
    > of the string?
    >


    Yes, I was playing around seeing what the compiler did with changed
    values. It didn't show me anything interesting.

    >
    > A better test would be to try the following:
    >
    > struct words w2 = {"hello", ',', ' ', "World", '!', '\0'};
    >
    > and try printing it out with:
    >
    > #include <ctype.h> // at file scope
    > for (char *cp = (char *)&w2; cp < &w2.term; cp++)
    > {
    >     if (isprint((unsigned char)*cp))
    >         putchar(*cp);
    >     else
    >         printf("\nunprintable character: %d\n", *cp);
    > }
    > putchar('\n');
    >
    > If there are any padding bytes, you'll see something different in your
    > output than you might have expected if you didn't realize that there
    > could be padding. However, don't get too excited about that possibility.
    > Most compilers insert padding only as needed to meet alignment
    > requirements, which is unlikely to be relevant in this case.
    >


    I should have thought of reversing the process and seeing what happened.

    Actually, it didn't occur to me that I would be overwriting any padding,
    which only makes sense, of course, so under those conditions it would
    always work. Sort of a useless parlour trick, I guess.
    Greg Martin, Oct 25, 2012
    #6

Similar Threads
  1. Patricia  Van Hise

    structs with fields that are structs

    Patricia Van Hise, Apr 5, 2004, in forum: C Programming
    Replies:
    5
    Views:
    621
    Al Bowers
    Apr 5, 2004
  2. Chris Hauxwell

    const structs in other structs

    Chris Hauxwell, Apr 23, 2004, in forum: C Programming
    Replies:
    6
    Views:
    548
    Chris Hauxwell
    Apr 27, 2004
  3. tyler
    Replies:
    0
    Views:
    273
    tyler
    Sep 4, 2006
  4. Paminu
    Replies:
    5
    Views:
    630
    Eric Sosman
    Oct 11, 2005
  5. Michael Henry

    Practical packing for structs of bytes

    Michael Henry, Sep 17, 2010, in forum: C Programming
    Replies:
    12
    Views:
    1,522