packing and structs

Greg Martin

I have heard it said, but not confirmed, that the only guarantee that
the standard gives with regards to structs is that the first element is
aligned with the structure's first byte and that the order of the members
will not be changed. Does that mean that code like that below should
print "Hello" but after that anything would be possible?


Hello, World
struct words: 14
char[] str: 13



/***********************************************/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

struct words {
char hello[5];
char comma;
char space;
char world[5];
char exclaim;
char term;
};

int main (int argc, char* argv[]) {
char str[] = "Hello, World";
struct words w;

memcpy (&w, str, sizeof (str));

char *cp = (char*) &w;

while (*cp != '\0') {
printf ("%c", *cp);
++cp;
}

printf ("\n");

printf ("struct words: %d\nchar[] str: %d\n",
sizeof (w), sizeof (str));

return 0;
}
 
Eric Sosman

I have heard it said, but not confirmed, that the only guarantee that
the standard gives with regards to structs is that the first element is
aligned with the structure's first byte and that the order of the members
will not be changed.

That's it, mostly. We know that members are properly aligned
for their types and there's some special language pertaining to
bit-fields, but you're essentially correct.
Does that mean that code like that below should
print "Hello" but after that anything would be possible?


Hello, World
struct words: 14
char[] str: 13



/***********************************************/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

struct words {
char hello[5];
char comma;
char space;
char world[5];
char exclaim;
char term;
};

We know that the "hello" member begins at the struct's first
byte, and that the later members appear in order, not overlapping:

offsetof(struct words, hello) == 0

offsetof(struct words, comma) >= 0 + 5

offsetof(struct words, space) >=
offsetof(struct words, comma) + 1

offsetof(struct words, world) >=
offsetof(struct words, space) + 1

offsetof(struct words, exclaim) >=
offsetof(struct words, world) + 5

offsetof(struct words, term) >=
offsetof(struct words, exclaim) + 1

Finally, we know that the struct is at least as large as the
sum of its element sizes and any padding between them:

sizeof(struct words) >= offsetof(struct words, term) + 1

... hence sizeof(struct words) >= 14 (== 5 + 1 + 1 + 5 + 1 + 1).

Since none of the members requires any special alignment, it's
quite likely that sizeof(struct words) will in fact be 14 exactly.
Perhaps the next most likely value is 16, if a compiler decides to
put two padding bytes at the end to make the whole thing fit in two
8-byte units. Descending even further on the likelihood scale, a
compiler might insert one padding byte before `world' and one more
at the end, so each array would be contained in a single 8-byte
unit. Other padding arrangements seem extremely unlikely -- though
as you observe, they're permitted.
int main (int argc, char* argv[]) {
char str[] = "Hello, World";
struct words w;

Okay, `w' occupies >=14 bytes of storage.
memcpy (&w, str, sizeof (str));

This fills the first 13 bytes of `w' with a copy of the string.
The 14th byte (and any others) remain uninitialized. Since `w'
has sufficient space for everything that's being copied into it,
there's no problem up to this point.

Note that memcpy() makes no use of the "struct-ness" of
the target. In C, any addressable object can be viewed as an
array of bytes, without regard to the object's actual type.
That's what memcpy() does: It just copies bytes, and doesn't
care what type the bytes represent.
char *cp = (char*) &w;

while (*cp != '\0') {
printf ("%c", *cp);
++cp;
}

Here, you're doing much the same thing as memcpy() did: You
are not using `w' as a struct, but only as a bag of bytes. If
there are padding bytes, you're using them on exactly the same
basis as you use member bytes: They're all just bytes. The
output *will* be "Hello, World" whether there's padding or not.

Using the "struct-ness" might (in principle) have produced
some surprises:

printf("%.5s", w.hello); // fine so far
printf("%c", w.comma); // BZZT!
printf("%c", w.space); // BZZT!
printf("%.5s", w.world); // BZZT!

There's no telling (in principle) what the final three lines
would have done.

printf ("\n");

printf ("struct words: %d\nchar[] str: %d\n",
sizeof (w), sizeof (str));

Nit-pick: "%d" is for signed integers, which `size_t' is
not. I've used systems where this would have printed the two
sizes as 14 and 0 thanks to the mismatch; in principle, worse
things could happen.
 
gnuist007

 Greg Martin said:
I have heard it said, but not confirmed, that the only guarantee that
the standard gives with regards to structs is that the first element is
aligned with the structure's first byte and that the order of the members
will not be changed. Does that mean that code like that below should
print "Hello" but after that anything would be possible?
int main (int argc, char* argv[]) {
     char str[] = "Hello, World";
     struct words w;

No. You're overlaying the members and optional alignment spacing with a string.
You should not assume anything portable from that. Treat a struct as a struct if
you want the code to be sensible. That means assign it member by member and
extract from it member by member. You should only use the whole structurewhere
it is the whole structure value you want, with all the members and optional
alignment bytes as one.

If you don't want to be portable, the answer depends on your machine and
compiler. Some compilers have a packed declarator that forces no alignment, even
if that creates unaligned member access.

--
My name is Indigo Montoya. \\        Annoying Usenet one post at a time.
You flamed my father.       \'         At least I can stay in character.
Prepare to be spanked.     //               When you look into the void,
Stop posting that!        `/  the void looks into you, and fulfills you.

Hi, can you illustrate your point with a small example?
 
James Kuyper

I have heard it said, but not confirmed, that the only guarantee that
the standard gives with regards to structs is that the first element is
aligned with the structure's first byte and that the order of the members
will not be changed.

Basically. There are some additional requirements for bit-fields, but not
enough to be of any use, and those requirements aren't relevant to your
question.
... Does that mean that code like that below should
print "Hello" but after that anything would be possible?

Actually, no. Your struct is guaranteed to be large enough to store the
entire string that you copy into it. It could be bigger, and it could
have padding bytes, but it's definitely big enough. After copying the
string, you print starting from the first byte of that string to the
terminating null character. Some of the bytes that you'll be printing
could be padding bytes between the named fields of the structure, but
that won't interfere with them being printed. They will all be printed,
and the result should be the same as printf(str). There might be
uninitialized padding bytes at the end of your struct after the
terminating null character, but your code stops printing before it would
otherwise have printed them.

If you had changed the value of any field of your struct between the
memcpy() and the printing loop, then any and all of the padding bytes
could have been changed from what was originally written to them by
memcpy(). How could this happen? Consider w.comma. It could have been
set up aligned on a 4-byte boundary, and followed by three padding
bytes. Then it could be updated using a 4-byte instruction that would,
as a side effect, also change the values of the three following
padding bytes. The standard specifies that the value of ALL padding
bytes becomes unspecified after ANY field in the struct is updated
(6.2.6.1p6), which allows the compiler to do that.
Hello, World
struct words: 14
char[] str: 13



/***********************************************/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

struct words {
char hello[5];
char comma;
char space;
char world[5];
char exclaim;

I assume that an earlier version of this program had an '!' at the end
of the string?
char term;
};

int main (int argc, char* argv[]) {
char str[] = "Hello, World";
struct words w;

memcpy (&w, str, sizeof (str));

What is not guaranteed, at this point, is that w.comma == ',', or that
w.space==' ', or that w.world[0] == 'W', or that w.exclaim == '\0'. All
of those should be true if there's no padding, but could be false if
there is any padding.
char *cp = (char*) &w;

while (*cp != '\0') {
printf ("%c", *cp);
++cp;
}

printf ("\n");

printf ("struct words: %d\nchar[] str: %d\n",
sizeof (w), sizeof (str));

return 0;
}

A better test would be to try the following:

struct words w2 = {"hello", ',', ' ', "World", '!', '\0'};

and try printing it out with:

#include <ctype.h> // at file scope
for(char *cp = w2.hello; cp < &w2.term; cp++)
{
if(isprint((unsigned char)*cp))
putchar(*cp);
else
printf("\nunprintable character: %d\n", *cp);
}
putchar('\n');

If there are any padding bytes, you'll see something different in your
output than you might have expected if you didn't realize that there
could be padding. However, don't get too excited about that possibility.
Most compilers insert padding only as needed to meet alignment
requirements, which is unlikely to be relevant in this case.

You're more likely to have padding in your struct if it contains fields
of several different basic data types, particularly if more strictly
aligned data types come after less strictly aligned data types.
 
Greg Martin

On 10/25/2012 02:31 PM, Greg Martin wrote:
struct words {
char hello[5];
char comma;
char space;
char world[5];
char exclaim;

I assume that an earlier version of this program had an '!' at the end
of the string?

Yes, I was playing around seeing what the compiler did with changed
values. It didn't show me anything interesting.
A better test would be to try the following:

struct words w2 = {"hello", ',', ' ', "World", '!', '\0'};

and try printing it out with:

#include <ctype.h> // at file scope
for(char *cp = w2.hello; cp < &w2.term; cp++)
{
if(isprint((unsigned char)*cp))
putchar(*cp);
else
printf("\nunprintable character: %d\n", *cp);
}
putchar('\n');

If there are any padding bytes, you'll see something different in your
output than you might have expected if you didn't realize that there
could be padding. However, don't get too excited about that possibility.
Most compilers insert padding only as needed to meet alignment
requirements, which is unlikely to be relevant in this case.

I should have thought of reversing the process and seeing what happened.

Actually, it didn't occur to me that I would be overwriting any padding,
which only makes sense of course, so under the conditions it would
always work. Sort of a useless parlour trick, I guess.
 
