Data alignment question, structures

mathog · Jan 12, 2013

(Apologies if this is a duplicate post, glitch on first attempt.)

Consider the following struct:

typedef struct {
uint16_t one;
uint16_t twothree[2];
uint16_t four;
uint16_t five;
} Mystruct;

It contains data storage which is actually:

{ uint16_t, uint32_t, uint16_t, uint16_t }

Accessing "twothree" as a uint32_t directly, for instance with:

Mystruct instance;
printf("twothree is:%lu\n", (unsigned long)*(uint32_t *)&(instance.twothree[0]));

will be a nonaligned memory access. That will do nothing untoward on
x86 CPUs but will blow up on many others. OK so far?

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer));

2. How about this?

function2(*(Mystruct *) &(buffer));

Assume that within the function access to the data is by memcpy() to the
appropriate offset for the uint32_t and by "=" for the uint16_t fields.
My guess is that (2) is likely to blow up more often than not, since
it is trying to pass a Mystruct unaligned.

What does the language standard say should happen in these two cases?

Thanks,

David Mathog

Ben Bacarisse · Jan 12, 2013

I hope you'll permit a meta-answer. Having pulled my hair out porting
yards of code like this in the 80s, I am almost certain that there is a
better way to do whatever you are trying to do. More often than not,
the best way is through a small set of functions that extract basic types
from a buffer regardless of alignment and byte order. Whether you then
choose to use the values directly or to put them into a struct -- whose
members can now be aligned any way the compiler chooses (because you
never play address tricks with them) -- depends on the rest of the code.
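
A minimal sketch of the kind of extraction functions described above, assuming the file stores its values little-endian and that CHAR_BIT is 8. The names get_u16_le and get_u32_le are hypothetical and do not come from the thread.

```c
#include <stdint.h>

/* Hypothetical helpers: read unsigned values stored little-endian at p.
   They only ever touch single bytes, so p needs no particular alignment,
   and the result does not depend on the host's byte order. */
static uint16_t get_u16_le(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With helpers like these, a record at any offset i can be decoded field by field, for example get_u16_le(buffer + i) for the first field, and the struct that receives the values is free to have whatever padding the compiler likes.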

Eric Sosman · Jan 12, 2013

(Apologies if this is a duplicate post, glitch on first attempt.)

Consider the following struct:

typedef struct {
uint16_t one;
uint16_t twothree[2];
uint16_t four;
uint16_t five;
} Mystruct;

It contains data storage which is actually:

{ uint16_t, uint32_t, uint16_t, uint16_t }

Accessing "twothree" as a uint32_t directly, for instance with:

Mystruct instance;
printf("twothree is:%lu\n", (unsigned long)*(uint32_t *)&(instance.twothree[0]));

will be a nonaligned memory access. That will do nothing untoward on
x86 CPUs but will blow up on many others. OK so far?

Mostly. The nature of the "blow up" varies, though: Some
systems ignore offending low-order address bits, and would
print a mixture of instance.one and instance.twothree[0].
(There's also the possibility of padding bytes, although the
possibility appears remote for this particular struct.)

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer));

Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

2. How about this?

function2(*(Mystruct *) &(buffer));

Not "always;" same reasoning (with even more force this time,
since the potentially misaligned pointer is not only computed,
but also dereferenced).

Assume that within the function access to the data is by memcpy() to the
appropriate offset for the uint32_t and by "=" for the uint16_t fields.
My guess is that (2) is likely to blow up more often than not, since
it is trying to pass a Mystruct unaligned.

What does the language standard say should happen in these two cases?

"The behavior is undefined."

What problem are you trying to solve? As Ben Bacarisse says,
safer and saner approaches are likely to exist.

Shao Miller · Jan 12, 2013

If you have implementation-specific knowledge of what will happen, then
I'd suggest that you do whatever you want to accomplish your goal.

If you actually care about portability (you seem to), I'd suggest
formally serializing and deserializing data to and from the file.

You might be interested in modern C, which is C11. It includes the
'_Alignof' and '_Alignas' keywords, which allow you to consider and
choose alignments, respectively.

Prior to C11, you could try to use something like:

#include <stddef.h> /* For 'offsetof' */
#define Alignof(type) (offsetof(struct { char c; type t; }, t))

to consider alignment, and could use unions to choose alignment, if you
know the complete object type whose alignment needs to be satisfied:

union {
char buffer[512];
Mystruct as_mystruct;
} buffer_and_mystruct;

Here, you know that 'buffer_and_mystruct.buffer' will be properly
aligned for an instance of your structure type.

You can always use 'memcpy' to copy some bytes into a member of an
instance of your structure type; then it'll be safe to examine that
member, as long as it hasn't been populated with a trap representation.
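
As a sketch of that last point applied to the original question: the two uint16_t halves of 'twothree' can be read as one uint32_t by copying bytes instead of casting the pointer. The function name read_twothree is hypothetical, and the result is in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t one;
    uint16_t twothree[2];
    uint16_t four;
    uint16_t five;
} Mystruct;

/* Read the two uint16_t halves as one uint32_t without forming a
   (uint32_t *) that might be misaligned. The value is in the host's
   byte order, the same bytes the cast would have read where it works. */
static uint32_t read_twothree(const Mystruct *m)
{
    uint32_t v;
    memcpy(&v, m->twothree, sizeof v);
    return v;
}
```

Most compilers turn a small fixed-size memcpy like this into a plain load where the target allows it, so the safe form usually costs nothing.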

mathog · Jan 13, 2013

Shao said:
If you have implementation-specific knowledge of what will happen, then
I'd suggest that you do whatever you want to accomplish your goal.

That's the point - I want this code to work wherever, without having
access to all possible platforms to test it first. Implicitly
implementation specific is exactly what I am trying to avoid.

If you actually care about portability (you seem to), I'd suggest
formally serializing and deserializing data to and from the file.

Yeah, I was afraid that was going to be the only standard compliant way
of doing this. The key thing that I have learned is that passing a
pointer to an unaligned structure isn't portable (6.3.2.3p7 cited by
Eric Sosnan earlier in this thread). There are some instances of that
in the code that swaps bytes (big endian to little endian), so those
routines will need to have their interfaces adjusted. They are
otherwise "serial". Better to do this now, before the code is released,
than later. It also seems unavoidable that explicit recordtype_get()
functions will be needed, with the data passed by void * or char *.

You might be interested in modern C, which is C11. It includes the
'_Alignof' and '_Alignas' keywords, which allow you to consider and
choose alignments, respectively.

This doesn't help with the struct offset problem, since we do not know
the offset ahead of time. In a particular file the struct could be
anywhere in the buffer, at 100,102,..,500,502,etc. Or am I
misunderstanding the purpose of the _Alignof and _Alignas?

Thanks,

David Mathog

Shao Miller · Jan 13, 2013

That's the point - I want this code to work wherever, without having
access to all possible platforms to test it first. Implicitly
implementation specific is exactly what I am trying to avoid.

Yeah, I was afraid that was going to be the only standard compliant way
of doing this. The key thing that I have learned is that passing a
pointer to an unaligned structure isn't portable (6.3.2.3p7 cited by
Eric Sosnan [Sosman] earlier in this thread). There are some instances of that
in the code that swaps bytes (big endian to little endian), so those
routines will need to have their interfaces adjusted. They are
otherwise "serial". Better to do this now, before the code is released,
than later.

Based on what you've typed, I am guessing that the file format is not
intended to be portable across different platforms. Is that right?
That is, copying a saved file that was made on an x86 isn't expected to
load properly in the same program on PowerPC? I ask because the padding
within structures might also be a concern, for you.

By formally serializing/deserializing, you can have a portable file format.

It also seems unavoidable that explicit recordtype_get()
functions will be needed, with the data passed by void * or char *.

This doesn't help with the struct offset problem, since we do not know
the offset ahead of time. In a particular file the struct could be
anywhere in the buffer, at 100,102,..,500,502,etc. Or am I
misunderstanding the purpose of the _Alignof and _Alignas?

Ah, no. If the structure can appear at any offset in the buffer, then
these don't help you as much, unless the offset happens to be a multiple
of the alignment requirement. But as mentioned, you can 'memcpy' from
the offset into an aligned location, which you might be interested in
doing anyway, if a structure can be truncated by being near the end of
your buffer.
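
A rough sketch of that 'memcpy' approach with the truncation check included. The helper name mystruct_get and its return convention are made up for illustration, and it still assumes the bytes in the file match this compiler's layout of Mystruct (no padding, same byte order), which the standard does not promise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t one;
    uint16_t twothree[2];
    uint16_t four;
    uint16_t five;
} Mystruct;

/* Hypothetical helper: copy one record out of buffer[0..len) starting at
   offset i. Returns 0 on success, -1 if the record would run past the end. */
static int mystruct_get(Mystruct *out, const unsigned char *buffer,
                        size_t len, size_t i)
{
    if (i > len || len - i < sizeof *out)
        return -1;                        /* truncated record */
    memcpy(out, buffer + i, sizeof *out); /* out itself is properly aligned */
    return 0;
}
```

The caller declares an ordinary Mystruct, which the compiler aligns correctly, so no misaligned pointer is ever formed no matter what value i has.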

At least you can automate some of the serializing/deserializing... You
can have:

struct s_foo {
double d;
int i;
char c;
};

size_t foo_offsets[] = {
offsetof(struct s_foo, d),
offsetof(struct s_foo, i),
offsetof(struct s_foo, c),
};

f_serialize * foo_serials[] = {
serialize_double,
serialize_int,
serialize_char,
};

where 'f_serialize' is a function typedef for a serialization function.
Then you can iterate through these arrays with a loop in a generic
"structure serialization" function, instead of having a monolithic:

void serialize_foo(struct s_foo * foo) {
serialize_double(&foo->d);
serialize_int(&foo->i);
serialize_char(&foo->c);
}

for each of your different structure types.
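
The generic loop could look roughly like the sketch below. The signature chosen for 'f_serialize' (member pointer in, output buffer out, byte count returned) is an assumption, as is the name serialize_struct, and the member serializers are placeholders that just copy the host representation.

```c
#include <stddef.h>
#include <string.h>

struct s_foo {
    double d;
    int i;
    char c;
};

/* Assumed signature for 'f_serialize': write one member at 'member'
   to 'out' and return the number of bytes written. */
typedef size_t f_serialize(const void *member, unsigned char *out);

/* Placeholder member serializers: they copy the host representation.
   A portable format would emit a fixed size and byte order instead. */
static size_t serialize_double(const void *m, unsigned char *out)
{ memcpy(out, m, sizeof(double)); return sizeof(double); }
static size_t serialize_int(const void *m, unsigned char *out)
{ memcpy(out, m, sizeof(int)); return sizeof(int); }
static size_t serialize_char(const void *m, unsigned char *out)
{ memcpy(out, m, sizeof(char)); return sizeof(char); }

static const size_t foo_offsets[] = {
    offsetof(struct s_foo, d),
    offsetof(struct s_foo, i),
    offsetof(struct s_foo, c),
};

static f_serialize *const foo_serials[] = {
    serialize_double,
    serialize_int,
    serialize_char,
};

/* Generic driver: walk the two parallel tables for any structure type. */
static size_t serialize_struct(const void *obj, const size_t *offsets,
                               f_serialize *const *serials, size_t count,
                               unsigned char *out)
{
    size_t written = 0;
    for (size_t k = 0; k < count; k++) {
        const unsigned char *member = (const unsigned char *)obj + offsets[k];
        written += serials[k](member, out + written);
    }
    return written;
}
```

A call such as serialize_struct(&foo, foo_offsets, foo_serials, 3, out) then packs the three members back to back, skipping whatever padding the compiler put between them.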

Or, here is one library you might be interested in:

http://www.leonerd.org.uk/code/libpack/intro.html

Jorgen Grahn · Jan 15, 2013

I hope you'll permit a meta-answer. Having pulled my hair out porting
yards of code like this in the 80s, I am almost certain that there is a
better way to do whatever you are trying to do. More often than not,
the best way is through a small set of functions that extract basic types
from a buffer regardless of alignment and byte order. Whether you then
choose to use the values directly or to put them into a struct -- whose
members can now be aligned any way the compiler chooses (because you
never play address tricks with them) -- depends on the rest of the code.

AOL, except in my case "porting yards of code like this in the 80s"
would be "being bogged down in code like this on and off 1996--present".

Everything just becomes so much simpler and safer if you treat octet
buffers as octet buffers, and structs as structs.

/Jorgen

Eric Sosman · Jan 15, 2013

AOL, except in my case "porting yards of code like this in the 80s"
would be "being bogged down in code like this on and off 1996--present".

Everything just becomes so much simpler and safer if you treat octet
buffers as octet buffers, and structs as structs.

Let's add "powerful" and "flexible" to the "so much more" list.

Powerful: It's quite common to want to read and write not just
a struct, but a data structure. As a simple example, consider
`struct person { char *name; struct person *spouse; }': It will
do you no good at all to send these two pointers to another program.
You need to read and write this data with functions that are aware
of the semantics. (Also, they can be smart enough to handle "I'm
writing John; John's spouse is Mary so I'll also write Mary; Mary's
spouse is John so I'll also write John; John's spouse is Mary so ...")

Flexible: So you've settled on a serialized form (in the old
days we had "wire formats"), and along comes someone braying "Ad-hoc
binary formats are s-o-o-o twentieth-century! Management orders
you to get rid of all that well-tested, highly reliable, blazingly
efficient cruft, and use this shiny new XML DTD instead. Hop to it!"
If your programs are full of fread() and fwrite() calls you've got a
headache; if they already use functions that separate internal and
external representations you'll have a much easier time.

mathog · Jan 18, 2013

Eric said:
Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer));

Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
uint16_t five;
} Mystruct5;

The first would presumably be 8 bytes and the second 10. Is it safe to
access these in a memory buffer via a pointer so long as that pointer
is aligned on a 2 byte boundary? I would assume yes because all of the
data within each struct has that alignment, and I am also assuming that
this code should always work:

Mystruct5 array[10]; /* valid */
Mystruct5 *aptr;
uint16_t x;
array[1].four=20;
aptr = &array[1];
/* the next three lines should all print: 1,4 is:20 */
printf("1,4 is:%u\n",aptr->five);
somefunction1(aptr);
somefunction2(*aptr);

....

void somefunction1(Mystruct5 *aptr){
printf("1,4 is:%u\n",aptr->four);
}
void somefunction2(Mystruct5 aptr){
printf("1,4 is:%u\n",aptr.four);
}

Or may a compiler make some assumption for Mystruct4 that requires it to
be aligned on a 4 byte boundary too? That would not prevent it from
being used in an array, and would not break the above example (after
changing Mystruct5 -> Mystruct4). But it would break both of the two
function calls if aptr was pointing to a buffer in memory where the data
was only 2 byte aligned for the structure. Conversely, adding a 4 byte
boundary alignment requirement on Mystruct5 would add two pad bytes, for
no obvious reason, but it would not break the above code example. I see
no reason for the compiler to add the 4 byte alignment
requirement for these structures, but is it nevertheless free to do so?

Thanks,

David Mathog

James Kuyper · Jan 18, 2013

mathog said:
Eric said:

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer));

Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

In C99 and earlier, there were only a few cases where you could infer
that a pointer was correctly aligned for a given operation. For
instance, a struct must have alignment requirements at least as strict
as those of any of its members, but it can be stricter.

In C2011, several alignment-oriented features were added, such as
_Alignof(), and it's now possible to determine exactly whether or not
the alignment of any arbitrary type allows an operation.

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
uint16_t five;
} Mystruct5;

The first would presumably be 8 bytes and the second 10. Is it safe to
access these in a memory buffer via a pointer so long as that pointer
is aligned on a 2 byte boundary? I would assume yes because all of the
data within each struct has that alignment, and I am also assuming that
this code should always work:

You've made several assumptions not guaranteed by the standard:
sizeof(uint16_t) == 2
_Alignof(uint16_t) == 2
_Alignof(Mystruct4) == _Alignof(uint16_t)
_Alignof(Mystruct5) == _Alignof(uint16_t)

For implementations where all of those things are true, your conclusion
holds, but it's not necessarily the case that any of those things are true.
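
The four assumptions listed above can be checked on a given C11 implementation, for example with something like this sketch. The function names are hypothetical, uintptr_t is an optional type, and converting a pointer to an integer is implementation-defined, so the second function is a common idiom and not a guarantee from the standard.

```c
#include <stdint.h>

typedef struct { uint16_t one, two, three, four; } Mystruct4;
typedef struct { uint16_t one, two, three, four, five; } Mystruct5;

/* Returns 1 if every assumption in the list holds on this implementation. */
static int assumptions_hold(void)
{
    return sizeof(uint16_t) == 2
        && _Alignof(uint16_t) == 2
        && _Alignof(Mystruct4) == _Alignof(uint16_t)
        && _Alignof(Mystruct5) == _Alignof(uint16_t);
}

/* Hypothetical predicate: is p suitably aligned for a Mystruct5?
   Relies on the usual (but implementation-defined) mapping from
   pointers to integers. */
static int aligned_for_mystruct5(const void *p)
{
    return (uintptr_t)p % _Alignof(Mystruct5) == 0;
}
```

The same conditions could be written as _Static_assert declarations so that a build fails outright on a platform where they do not hold.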

Or may a compiler make some assumption for Mystruct4 that requires it to
be aligned on a 4 byte boundary too? ...

Yes.

... I see
no reason for the compiler to add the 4 byte alignment
requirement for these structures, but is it nevertheless free to do so?

Yes.

Eric Sosman · Jan 18, 2013

mathog said:
Eric said:

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer));

Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

On further consideration, how does one know what "correctly aligned"
means for a given struct?

As of C11, the _Alignof operator tells you. In earlier C's
alignment requirements are implementation-defined, and all you
could tell for sure was that the alignment requirement for a
type T had to be an exact divisor of sizeof(T).

Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
uint16_t five;
} Mystruct5;

The first would presumably be 8 bytes and the second 10.

If uint16_t exists, it follows that CHAR_BIT must be 8 or 16.
Then the minimum size of the first struct is 8 or 4, and of the
second 10 or 5. But these are minima: Padding is permitted after
any struct element (let's ignore bit-fields), for any reason the
implementor finds convincing.

Even so, I'd be extremely surprised if the size of the first
struct were not 8 or 4. I'd be only mildly surprised, though, if
the size of the second were 12 or 6.

Is it safe to
access these in a memory buffer via a pointer so long as that pointer
is aligned on a 2 byte boundary? I would assume yes because all of the
data within each struct has that alignment,

The alignment of a struct (or union) is *at least* as strict
as the maximum alignment of any of its elements, but may be stricter.

and I am also assuming that
this code should always work:

Mystruct5 array[10]; /* valid */
Mystruct5 *aptr;
uint16_t x;
array[1].four=20;
aptr = &array[1];
/* the next three lines should all print: 1,4 is:20 */
printf("1,4 is:%u\n",aptr->five);
somefunction1(aptr);
somefunction2(*aptr);

...

void somefunction1(Mystruct5 *aptr){
printf("1,4 is:%u\n",aptr->four);
}
void somefunction2(Mystruct5 aptr){
printf("1,4 is:%u\n",aptr.four);
}

Yes, this will always produce the expected output (once the
typo in the first printf() is corrected and suitable scaffolding
is added). But I don't see what bearing it has on matters of
alignment; could you elucidate?

Or may a compiler make some assumption for Mystruct4 that requires it to
be aligned on a 4 byte boundary too?

Yes. It might even require 8-byte alignment. In theory, the
compiler might insert 1016 padding bytes and require 1024-byte
alignment (this is the same theory that predicts the nasal demon).

That would not prevent it from
being used in an array, and would not break the above example (after
changing Mystruct5 -> Mystruct4). But it would break both of the two
function calls if aptr was pointing to a buffer in memory where the data
was only 2 byte aligned for the structure.

Right.

Conversely, adding a 4 byte
boundary alignment requirement on Mystruct5 would add two pad bytes, for
no obvious reason, but it would not break the above code example. I see
no reason for the compiler to add the 4 byte alignment
requirement for these structures, but is it nevertheless free to do so?

You and I see no pressing need for padding in these structs,
but the implementor might see one (or imagine he sees one) -- and
he's got the only vote.

glen herrmannsfeldt · Jan 18, 2013

mathog said:
Eric said:

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that
Mystruct *instance2 = malloc(sizeof(Mystruct));
would have used, but will always be on a multiple of 2.

(snip)

On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
uint16_t five;
} Mystruct5;

The first would presumably be 8 bytes and the second 10. Is it safe to
access these in a memory buffer via a pointer so long as that pointer
is aligned on a 2 byte boundary?

Consider Alpha, where all memory accesses are on four byte boundaries.
Now, there are special instructions to access 16 bit quantities on
two byte, or even (I believe) one byte boundaries, but it seems to me
that the compiler could optimize this case if it knew the struct
was allocated on a four byte boundary, and padded the end, such that
arrays of the struct aligned them on four byte boundaries. (Or,
similarly, for other boundaries.)

I would assume yes because all of the
data within each struct has that alignment, and I am also assuming that
this code should always work:

Mystruct5 array[10]; /* valid */
Mystruct5 *aptr;
uint16_t x;
array[1].four=20;
aptr = &array[1];
/* the next three lines should all print: 1,4 is:20 */
printf("1,4 is:%u\n",aptr->five);
somefunction1(aptr);
somefunction2(*aptr);

...

void somefunction1(Mystruct5 *aptr){
printf("1,4 is:%u\n",aptr->four);
}
void somefunction2(Mystruct5 aptr){
printf("1,4 is:%u\n",aptr.four);
}

Or may a compiler make some assumption for Mystruct4 that requires it to
be aligned on a 4 byte boundary too? That would not prevent it from
being used in an array, and would not break the above example (after
changing Mystruct5 -> Mystruct4). But it would break both of the two
function calls if aptr was pointing to a buffer in memory where the data
was only 2 byte aligned for the structure. Conversely, adding a 4 byte
boundary alignment requirement on Mystruct5 would add two pad bytes, for
no obvious reason, but it would not break the above code example. I see
no reason for the compiler to add the 4 byte alignment
requirement for these structures, but is it nevertheless free to do so?

Well, on 32 bit word addressed machines it is somewhat obvious what
might happen. Alpha is byte addressed but, at least until MS got into
it, required access addresses to be multiples of four.

Given a pointer to an arbitrary (16 bit) short, the compiler has to use
the general case, but it seems to me that it could optimize the struct
access case.

-- glen

Shao Miller · Jan 18, 2013

mathog said:
Eric said:

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer));

Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
uint16_t five;
} Mystruct5;

The first would presumably be 8 bytes and the second 10. Is it safe to
access these in a memory buffer via a pointer so long as that pointer
is aligned on a 2 byte boundary? I would assume yes because all of the
data within each struct has that alignment, and I am also assuming that
this code should always work:

You can infer that both of these are correctly aligned for a 'uint16_t'
(the first member of each), if that type exists, which it mightn't. You
can convert both a 'Mystruct4 *' and a 'Mystruct5 *' to a 'uint16_t *'
and back again, but you cannot safely convert a 'uint16_t *' of unknown
origin to either a 'Mystruct4 *' or a 'Mystruct5 *'.

There can be padding between members and after the last member. The
alignment requirement must be an integer factor of the size of the
structure, and in C11 it must be an integer power of two. It might be
argued that this further constraint in C11 suggests that it was always
the case, or that anyone who was doing something else has no
representation with Working Group 14.

One sometimes sees padding after the penultimate member for the case of
Flexible Array Members (>= C99).

Mystruct5 array[10]; /* valid */
Mystruct5 *aptr;
uint16_t x;
array[1].four=20;
aptr = &array[1];
/* the next three lines should all print: 1,4 is:20 */
printf("1,4 is:%u\n",aptr->five);

I think you meant 'aptr->four'.

somefunction1(aptr);
somefunction2(*aptr);

...

void somefunction1(Mystruct5 *aptr){
printf("1,4 is:%u\n",aptr->four);
}
void somefunction2(Mystruct5 aptr){
printf("1,4 is:%u\n",aptr.four);
}

Or may a compiler make some assumption for Mystruct4 that requires it to
be aligned on a 4 byte boundary too? That would not prevent it from
being used in an array, and would not break the above example (after
changing Mystruct5 -> Mystruct4). But it would break both of the two
function calls if aptr was pointing to a buffer in memory where the data
was only 2 byte aligned for the structure. Conversely, adding a 4 byte
boundary alignment requirement on Mystruct5 would add two pad bytes, for
no obvious reason, but it would not break the above code example. I see
no reason for the compiler to add the 4 byte alignment
requirement for these structures, but is it nevertheless free to do so?

There's no need to assume, is there? We've already discussed C11 and
pre-C11 methods for detecting alignment requirements.

The alignment requirement of 'Mystruct5' is unlikely to be 10, pre-C11,
and not allowed to be 10, for C11.

Given the definition of "common initial sequence" all the way back to
C89, it is likely that you could cast a 'Mystruct5 *' to a 'Mystruct4 *'
and back again. If your code contains:

#include <stddef.h>

union {
Mystruct4 m4;
Mystruct5 m5;
} dummy;
enum e_dummy {
cv_dummy1 = offsetof(Mystruct4, four),
cv_dummy2 = offsetof(Mystruct5, four)
};

then it is guaranteed that these two enum values will compare as equal.

glen herrmannsfeldt · Jan 18, 2013

(snip)

There can be padding between members and after the last member. The
alignment requirement must be an integer factor of the size of the
structure, and in C11 it must be an integer power of two. It might be
argued that this further constraint in C11 suggests that it was always
the case, or that anyone who was doing something else has no
representation with Working Group 14.

Machines with decimal addressing are long gone now.

I believe the IBM 650 was one, and maybe a few others from
that era.

No idea about any alignment requirements, though.

-- glen

Keith Thompson · Jan 19, 2013

mathog said:
On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
uint16_t five;
} Mystruct5;

The first would presumably be 8 bytes and the second 10. Is it safe to
access these in a memory buffer via a pointer so long as that pointer
is aligned on a 2 byte boundary? I would assume yes because all of the
data within each struct has that alignment, and I am also assuming that
this code should always work:

The required alignment for a given type is largely up to the whims
of the compiler implementer, subject to the constraints of the
standard and common sense.

For the case above, an implementation might require 2-byte alignment
for Mystruct4, or it might require 4-byte or even 8-byte alignment.
Accesses to individual members would only require them to be 2-byte
aligned (or even 1-byte aligned on some implementations), but code
that assigns a whole struct might use 4-byte or 8-byte reads and
writes that might require 4-byte or 8-byte alignment. A single
64-bit move instruction might be much faster than copying the
structure in pieces, which could be worth the cost in alignment gaps.

For Mystruct5, a compiler could conceivably add 6 bytes of padding
to make the structure as a whole 16 bytes, and then impose a 4-byte,
8-byte, or 16-byte alignment on the structure. Again, this could
make accesses to entire Mystruct5 objects substantially faster.
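
Rather than guessing, one can ask the implementation what it chose. A minimal sketch, assuming C11 (alignof from <stdalign.h>); the helper names report_layout() and check_layout_invariants() are invented for this example, and the asserted properties are the ones I believe hold on any conforming implementation:

```c
#include <assert.h>
#include <stdalign.h>  /* alignof (C11) */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint16_t one, two, three, four;
} Mystruct4;

typedef struct {
    uint16_t one, two, three, four, five;
} Mystruct5;

/* Hypothetical helper: print the size and alignment this compiler picked. */
static void report_layout(void)
{
    printf("Mystruct4: size %zu, alignment %zu\n",
           sizeof(Mystruct4), alignof(Mystruct4));
    printf("Mystruct5: size %zu, alignment %zu\n",
           sizeof(Mystruct5), alignof(Mystruct5));
}

/* Hypothetical helper: properties that do not depend on the compiler's whims. */
static void check_layout_invariants(void)
{
    /* The struct is at least as large as its members put together. */
    assert(sizeof(Mystruct4) >= 4 * sizeof(uint16_t));
    assert(sizeof(Mystruct5) >= 5 * sizeof(uint16_t));
    /* The size is a multiple of the alignment, so arrays of it work. */
    assert(sizeof(Mystruct4) % alignof(Mystruct4) == 0);
    assert(sizeof(Mystruct5) % alignof(Mystruct5) == 0);
    /* The struct is aligned at least as strictly as its members. */
    assert(alignof(Mystruct4) >= alignof(uint16_t));
    assert(alignof(Mystruct5) >= alignof(uint16_t));
}
```

The numbers printed are whatever the implementation chose; nothing in the standard fixes them at 8 and 10 bytes or at 2-byte alignment.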

[...]

mathog · Jan 20, 2013

Now, what happens when this struct is embedded in a binary file stored

in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer[i]));

Eric Sosman wrote:
Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

Taking into account the various answers in this thread to the preceding
question it would seem that there is still no safe (safe meaning here
"code that will work on any platform") method to use a C struct
to extract binary data directly from memory, since the compiler can
throw in padding and alignment requirements whenever it feels like it.
Even if the data is reduced to byte representation, something like this
(I know it isn't the same as Mystruct4 above):

typedef struct {
uint8_t one[2]; /* actually uint16_t */
uint8_t two[4]; /* actually uint32_t */
int8_t three[2]; /* etc. */
int8_t four;
} Mystruct4b;

it seems not to be safe to pass a "Mystruct4b *" pointer to a function
which references this data at an arbitrary location in memory. Instead
the only safe method is to pass "char *" pointers and take the data
apart with memcpy() at a very low level, moving it from memory to the
structure, or vice versa.

What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code. Perhaps something along these lines

/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;

Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:

myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}

and the compiler would do the "right thing", using memcpy or whatever,
to hide all of the cruft that is platform specific, up to and including
loading the magic N bytes of memory (as in the Alpha issue Glen
mentioned), shifting, and masking and so forth, to handle data types
smaller than is native for the CPU, without the programmer having to
ever care about the details. One might even dream that the compiler
could be induced to do:

Mystruct4c native,ncopy;
Mystruct4b *foreign;
Mystruct4b acopy;

/* field names/sizes must match for the following statement */
alternate_representations {Mystruct4b, Mystruct4c}

char *buffer;
/* fill buffer from a file or network */
foreign = (Mystruct4b *)&buffer[123];
native = *foreign; /* field to field copy, NOT a memcpy*/
/* change some data in native*/
*foreign = native; /* field to field copy */
ncopy = native; /* this is a memcpy */
acopy = *foreign; /* as is this */
*foreign = acopy; /* as is this */
ncopy = acopy; /* field to field copy */

That is, when a memstruct pointer is dereferenced it does not mean quite
the same thing as when a struct pointer is dereferenced. A few more
rules for the compiler, a lot less work for the programmer.

Thanks,

David Mathog

Shao Miller · Jan 20, 2013

What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code.

There are implementation extensions and libraries that both help with
this sort of thing. Some of the rest of your post resembles what's
available for certain implementations ("packed structures," for
example). I don't think it's universal enough to warrant inclusion in
the Standard, but that's just my opinion.

I know it'd be nice to be able to say, "Any C implementation supports
the features I desire," but the people who make the implementations
might disagree.

I don't know what your actual goal is, but if it involves communicating
data between different C implementations, then there's more than
alignment and padding to worry about.

You appear to wish to work with structures, for some reason. Why is
that? There are other ways to group data values together.

glen herrmannsfeldt · Jan 20, 2013

mathog said:
Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that
Mystruct *instance2 = malloc(sizeof(Mystruct));
would have used, but will always be on a multiple of 2.
1. Will this always work?

(snip)
Eric Sosman wrote:

Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:
typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

Taking into account the various answers in this thread to the preceding
question it would seem that there is still no safe (safe meaning here
"code that will work on any platform") method to use a C struct
to extract binary data directly from memory, since the compiler can
throw in padding and alignment requirements whenever it feels like it.

While I believe that is true, it is still most likely that you can
use it on any appropriate boundary.
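
For code that should not depend on that likelihood, memcpy() avoids the question, since it has no alignment requirement. A minimal sketch, assuming C11 and an implementation that provides uintptr_t; the helpers aligned_for_mystruct4() and load_mystruct4() are hypothetical names, and the copy only makes sense when the bytes were written with the same layout and byte order:

```c
#include <assert.h>
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t one, two, three, four;
} Mystruct4;

/* Hypothetical helper: nonzero if p meets the alignment of Mystruct4.
   The pointer-to-integer conversion is implementation-defined; on the
   usual flat-address platforms the low bits reflect the alignment. */
static int aligned_for_mystruct4(const void *p)
{
    return (uintptr_t)p % alignof(Mystruct4) == 0;
}

/* Hypothetical helper: copy a Mystruct4 image out of a byte buffer.
   Works at any offset because memcpy copies byte by byte as far as
   the abstract machine is concerned. */
static Mystruct4 load_mystruct4(const unsigned char *buf, size_t offset)
{
    Mystruct4 out;
    assert(buf != NULL);
    memcpy(&out, buf + offset, sizeof out);
    return out;
}
```

The alignment test is only useful as a diagnostic here; the copy itself never needs it.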

Even if the data is reduced to byte representation, something like this
(I know it isn't the same as Mystruct4 above):

typedef struct {
uint8_t one[2]; /* actually uint16_t */
uint8_t two[4]; /* actually uint32_t */
int8_t three[2]; /* etc. */
int8_t four;
} Mystruct4b;

it seems not to be safe to pass a "Mystruct4b *" pointer to a function
which references this data at an arbitrary location in memory. Instead
the only safe method is to pass "char *" pointers and take the data
apart with memcpy() at a very low level, moving it from memory to the
structure, or vice versa.

Yes, that is always the most reliable way. Especially if you have
the possibility of different endianness. (Not to mention all the other
possible different representations.)
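
A minimal sketch of that byte-level approach, for a record shaped like Mystruct4b. The little-endian wire order, the two's complement encoding of the signed fields, and all the names here (Record, get_u16le(), decode_record() and so on) are assumptions made for the example, not anything stated in the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Native form of the record; the compiler may pad this however it likes. */
typedef struct {
    uint16_t one;
    uint32_t two;
    int16_t  three;
    int8_t   four;
} Record;

/* Assemble values from individual bytes, so host byte order never matters. */
static uint16_t get_u16le(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

static uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Map a two's complement bit pattern to its value without relying on
   implementation-defined unsigned-to-signed conversion. */
static int16_t to_i16(uint16_t u)
{
    return u < 0x8000u ? (int16_t)u : (int16_t)((int32_t)u - INT32_C(65536));
}

static int8_t to_i8(uint8_t u)
{
    return u < 0x80u ? (int8_t)u : (int8_t)((int)u - 256);
}

/* Decode 9 wire bytes: 2 + 4 + 2 + 1, no padding, any alignment. */
static Record decode_record(const unsigned char *p)
{
    Record r;
    assert(p != NULL);
    r.one   = get_u16le(p);
    r.two   = get_u32le(p + 2);
    r.three = to_i16(get_u16le(p + 6));
    r.four  = to_i8(p[8]);
    return r;
}
```

Because every field is rebuilt from single bytes, the pointer passed in can sit at any offset in the buffer.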

What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code. Perhaps something along these lines

Well, pretty much it optimizes for passing data around within the
program.

For many years Fortran required, or at least it was believed to require,
that COMMON blocks be packed (no padding). On some machines, that was
just a little slow; on others it required a run-time trap for the
misaligned access: copy the data, perform the operation, copy the data
back again, and then return. Much, much slower.

Most RISC processors require data to be aligned, though some have
special instructions for access to misaligned data. (Faster than a byte
copy and performing the operation on the copy.)

/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;

Doesn't help if the actual representation, such as endianness,
is different.

Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:

myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}

and the compiler would do the "right thing", using memcpy or whatever,
to hide all of the cruft that is platform specific, up to and including
loading the magic N bytes of memory (as in the Alpha issue Glen
mentioned), shifting, and masking and so forth, to handle data types
smaller than is native for the CPU, without the programmer having to
ever care about the details.

In the cases where the compiler knows in advance, it isn't
so bad. But then if you pass a pointer to misaligned data, then it is
referenced in a place that doesn't expect it to be misaligned.
Or, the compiler has to generate slow (maybe much slower) code
for all accesses.

One might even dream that the compiler could be induced to do:

Mystruct4c native,ncopy;
Mystruct4b *foreign;
Mystruct4b acopy;

/* field names/sizes must match for the following statement */
alternate_representations {Mystruct4b, Mystruct4c}

Well, when C was new there were still plenty of 36 bit, 48 bit,
and 60 bit machines around, and probably others that I don't know
about. In addition, there is at least (still in the standard) allowance
for sign magnitude or ones complement representation, and finally
endianness. But yes, it could be done.

char *buffer;
/* fill buffer from a file or network */
foreign = (Mystruct4b *)&buffer[123];
native = *foreign; /* field to field copy, NOT a memcpy*/
/* change some data in native*/
*foreign = native; /* field to field copy */
ncopy = native; /* this is a memcpy */
acopy = *foreign; /* as is this */
*foreign = acopy; /* as is this */
ncopy = acopy; /* field to field copy */

Well, there is XDR http://www.ietf.org/rfc/rfc4506.txt
which will do all the work for any reasonable, and also not
so reasonable representation.

That is, when a memstruct pointer is dereferenced it does not mean quite
the same thing as when a struct pointer is dereferenced. A few more
rules for the compiler, a lot less work for the programmer.

Well, as in another thread, consider C a portable assembler.
It helps you, but you still have to do some of the work.

-- glen

Ben Bacarisse · Jan 20, 2013

mathog said:
Taking into account the various answers in this thread to the preceding
question it would seem that there is still no safe (safe meaning here
"code that will work on any platform") method to use a C struct
to extract binary data directly from memory, since the compiler can
throw in padding and alignment requirements whenever it feels like it.
Even if the data is reduced to byte representation, something like this
(I know it isn't the same as Mystruct4 above):

typedef struct {
uint8_t one[2]; /* actually uint16_t */
uint8_t two[4]; /* actually uint32_t */
int8_t three[2]; /* etc. */
int8_t four;
} Mystruct4b;

it seems not to be safe to pass a "Mystruct4b *" pointer to a function
which references this data at an arbitrary location in memory.
Instead the only safe method is to pass "char *" pointers and take the
data apart with memcpy() at a very low level, moving it from memory to
the structure, or vice versa.

What an odd situation.

Maybe it looks odder now than it did in the days C was designed. In
those days, there was a veritable wild west of machine formats: 36-bit
ints, 9-bit bytes, 24-bit addresses, 60-bit floats, little endian, big
endian and even middle-endian byte orders, etc, etc. Since you could
not even map a couple of bytes onto an int and be sure you have the
right value, it would have seemed pointless to try to address the other
problems of alignment and packing.

I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer
arranged in some particular manner", that is, pretty much any data
which is passed from machine to machine, C would have developed a
method for simplifying this sort of code. Perhaps something along
these lines

/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;

You'd want some syntax to distinguish these access-inefficient structs
from normal ones. __attribute__((packed)) maybe?

Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:

myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}

and the compiler would do the "right thing", using memcpy or whatever,
to hide all of the cruft that is platform specific, up to and
including loading the magic N bytes of memory (as in the Alpha issue
Glen mentioned), shifting, and masking and so forth, to handle data
types smaller than is native for the CPU, without the programmer
having to ever care about the details.

But you would still not have portable struct to byte array mapping due
to differing byte orders. What about differing signed integer
representations? What about floating-point formats? You could deal
with all these too, of course, but at some point you have to say that
this is not something that should be built in to a programming
language.
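
In ordinary C the usual answer is to define the external format by value and write the bytes out explicitly. A minimal sketch, assuming a big-endian, two's complement wire format; the helper names put_u16be(), put_u32be() and put_i16be() are invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the most significant byte first, whatever the host byte order is. */
static void put_u16be(unsigned char *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)((v >> 8) & 0xFFu);
    p[1] = (unsigned char)(v & 0xFFu);
}

static void put_u32be(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)((v >> 24) & 0xFFu);
    p[1] = (unsigned char)((v >> 16) & 0xFFu);
    p[2] = (unsigned char)((v >> 8) & 0xFFu);
    p[3] = (unsigned char)(v & 0xFFu);
}

/* Signed-to-unsigned conversion is defined by value (modulo 2^16), so
   the bytes written are the two's complement pattern regardless of how
   the host happens to represent negative numbers. */
static void put_i16be(unsigned char *p, int16_t v)
{
    put_u16be(p, (uint16_t)v);
}
```

Floating-point values need a separate convention of their own; this sketch does not attempt one.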

<snip>

Keith Thompson · Jan 20, 2013

mathog said:
What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code. Perhaps something along these lines

/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;

Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:

myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}

and the compiler would do the "right thing", using memcpy or whatever,
to hide all of the cruft that is platform specific, up to and including
loading the magic N bytes of memory (as in the Alpha issue Glen
mentioned), shifting, and masking and so forth, to handle data types
smaller than is native for the CPU, without the programmer having to
ever care about the details.

[...]

Consider this:

Mystruct4c obj;
void func(uint32_t *arg);
func(&obj.two);

Given your definition of what a "memstruct" is, the code that implements
func() would have to allow for the possibility that its argument points
to an unaligned uint32_t object.

(Note that gcc's "__attribute__((packed))" doesn't solve this; see my
discussion here: http://stackoverflow.com/q/8568432/827263 and here:
http://stackoverflow.com/a/8568441/827263.)

The alternative, I suppose, would be to forbid taking the address of a
member of a "memstruct", treating its members much like bit fields.
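
With today's packed-struct extensions, the workaround is to hand func() an ordinary aligned copy rather than the member's address. A minimal sketch, assuming gcc or clang (the packed attribute is a GNU extension, not standard C); struct Packed4c, this particular func() body, and call_func_on_two() are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GNU C extension: no padding between members, alignment of 1. */
struct __attribute__((packed)) Packed4c {
    uint16_t one;
    uint32_t two;    /* lands at offset 2, so it may be misaligned */
    int16_t  three;
    uint8_t  four;
};

/* Stand-in for func() in the post: written for an aligned uint32_t. */
static void func(uint32_t *arg)
{
    *arg += 1;
}

/* Hypothetical wrapper: never lets the address of the packed member escape. */
static void call_func_on_two(struct Packed4c *obj)
{
    uint32_t tmp;
    assert(obj != NULL);
    tmp = obj->two;   /* member access: the compiler knows it may be misaligned */
    func(&tmp);       /* tmp is an ordinary, suitably aligned object */
    obj->two = tmp;   /* write the result back through the member */
}
```

Passing &obj->two directly is the case that goes wrong, because func() has no way to know its argument came from a packed struct.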
