structure layout question

kyle york

Greetings,

Why does the C standard require the members of a structure not be
re-ordered (6.2.5.20)? Padding is allowed, and platform dependent, which
means one cannot rely on the exact layout anyway, so what's the point?

Without this restriction the compiler could lay out the structure in the
most efficient way possible, for some definition of efficient. It would
be easy enough to turn this reordering off with a compiler specific
pragma as is often done with padding.
 
Walter Roberson

kyle york said:
Why does the C standard require the members of a structure not be
re-ordered (6.2.5.20)? Padding is allowed, and platform dependent, which
means one cannot rely on the exact layout anyway, so what's the point?

As best I recall, arbitrary padding is not permitted: only
where required to bring the entry into alignment (where alignment
rules are platform dependent.) So if you know the alignment rules
for (say) a "pure" double, then you also know the alignment
rules for a double embedded in a struct.

In any case, take the structure and wrap it around with a union,
the other member of which is an array of unsigned char. This is
a legal way to get at the bytes that make up the structure without
worrying about trap representations, and there are scenarios you
can construct where the alignment rules provide guarantees about
what will be where in the unsigned char array that wouldn't be met
if reordering was possible. For example, take the offset of a
member that was known to be followed by a char. char is the most
general alignment, so you know it followed -immediately- after the
end of the member. You know the size of the member via sizeof.
So the unsigned char array indexed at the offset of the member,
plus the sizeof the member, is certain to get you to the beginning of
that char -- but if you'd reordered the elements, the char could
be anywhere relative to the member in question.
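A minimal sketch of the union overlay described above, assuming a hypothetical struct sample with a double followed by a char. It uses offsetof from <stddef.h> to locate the member rather than assuming that the char sits exactly at the end of the double; the ordering rule only guarantees that it cannot come earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example type: a double followed by a char. */
struct sample {
    double d;
    char   c;
};

/* Overlay the struct with a byte array, as the post describes. */
union sample_bytes {
    struct sample s;
    unsigned char raw[sizeof(struct sample)];
};

/* Start from all-zero bytes, then store the two members. */
static void sample_init(union sample_bytes *u, double d, char c)
{
    memset(u, 0, sizeof *u);
    u->s.d = d;
    u->s.c = c;
}

/* Read member c through the byte view, locating it with offsetof. */
static unsigned char read_c_via_bytes(const union sample_bytes *u)
{
    return u->raw[offsetof(struct sample, c)];
}

/* Members are stored in declaration order, so c can never start
   before the end of d. Whether it starts exactly there is up to
   the implementation. */
static int c_follows_d(void)
{
    return offsetof(struct sample, c)
           >= offsetof(struct sample, d) + sizeof(double);
}
```

If reordering were allowed, c_follows_d() could return 0, which is the point being made about code that indexes the byte array relative to a neighbouring member.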

Without this restriction the compiler could lay out the structure in the
most efficient way possible, for some definition of efficient. It would
be easy enough to turn this reordering off with a compiler specific
pragma as is often done with padding.

And it would be easy enough for a compiler to provide a pragma to
pack efficiently, possibly breaking ordering or possibly requiring
slow movements in and out of internal char buffers if the architecture
doesn't support "move unaligned". You could call it something wild such
as ... ummm, say, #pragma pack
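For readers who have not met it, here is a small sketch of what #pragma pack typically does. The pragma is a compiler extension, not part of the C standard; the push/pop form below is the one accepted by GCC, Clang and MSVC, and a compiler that does not recognise it may simply ignore it. The struct names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Natural layout: the implementation may pad between c and i. */
struct natural {
    char c;
    int  i;
};

/* Packed layout: request 1-byte alignment for the members.
   This is an extension; an unknown pragma may be ignored. */
#pragma pack(push, 1)
struct packed {
    char c;
    int  i;
};
#pragma pack(pop)

static size_t natural_size(void) { return sizeof(struct natural); }
static size_t packed_size(void)  { return sizeof(struct packed); }
```

Note that packing changes the padding but still keeps the members in declaration order, so packed_size() can be smaller than natural_size() without any reordering taking place.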
 
Eric Sosman

kyle york wrote On 03/01/07 13:06,:
Greetings,

Why does the C standard require the members of a structure not be
re-ordered (6.2.5.20)? Padding is allowed, and platform dependent, which
means one cannot rely on the exact layout anyway, so what's the point?

Without this restriction the compiler could lay out the structure in the
most efficient way possible, for some definition of efficient. It would
be easy enough to turn this reordering off with a compiler specific
pragma as is often done with padding.

The first element of a struct must come first, and
must be preceded by no padding. This allows a struct
pointer to be converted to a pointer to the struct's
first element and vice versa, which is a useful property.
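A short sketch of that first-member property, using a hypothetical struct point invented for the example. Both conversions rely only on the rule that there is no padding before the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only for illustration. */
struct point {
    int x;
    int y;
};

/* A pointer to the struct, suitably converted, points to its
   first member. */
static int *first_member(struct point *p)
{
    return (int *)p;
}

/* The reverse conversion is also allowed: from the first member
   back to the enclosing struct. */
static struct point *enclosing_point(int *px)
{
    return (struct point *)px;
}
```

With reordering permitted, the compiler could place y first and neither cast would be safe.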

Under certain conditions, different struct types
that share a "common initial subsequence" of elements
can be accessed through a pointer to either type, so
long as the accesses are to the common elements. That's
another useful property.

A struct is often not just a bag of related elements,
but also a description of a "published" format. For
example, an image file that starts with a "magic number"
followed by version numbers followed by ... may well be
described by a struct. There are portability issues
with such usages, but they are useful nonetheless.
 
Ben Pfaff

Walter Roberson writes:

[in a struct]
As best I recall, arbitrary padding is not permitted: only
where required to bring the entry into alignment (where alignment
rules are platform dependent.) So if you know the alignment rules
for (say) a "pure" double, then you also know the alignment
rules for a double embedded in a struct.

This is a nice theory, but I can't see how to back it up with a
quote from the standard. The text of the standard says "There
may be unnamed padding within a structure object, but not at its
beginning." and I don't see any restrictions on that.
 
Walter Roberson

Walter Roberson writes:
[in a struct]
As best I recall, arbitrary padding is not permitted: only
where required to bring the entry into alignment (where alignment
rules are platform dependent.) So if you know the alignment rules
for (say) a "pure" double, then you also know the alignment
rules for a double embedded in a struct.
This is a nice theory, but I can't see how to back it up with a
quote from the standard. The text of the standard says "There
may be unnamed padding within a structure object, but not at its
beginning." and I don't see any restrictions on that.

There is a bit more wording than that in C89 3.5.2.1:

Each non-bit-field member of a structure or union object is
aligned in an implementation-defined manner appropriate to its type.

Within a structure object, the non-bit-field members and the units
in which bit-fields reside have addresses that increase in the order
in which they are declared. A pointer to a structure object,
suitably converted, points to its initial member (or if that
member is a bit-field, then to the unit in which it resides), and
vice versa. There may therefore be unnamed padding within a
structure object, but not at its beginning, as necessary to achieve the
appropriate alignment.


Notice that the alignment within the structure is in a manner
"appropriate to its type". I interpret that as the alignment
appropriate to the type "ex-vivo", outside of a structure. I do not
see anything there that would suggest that a member could have one
alignment within structures and a different alignment outside of
structures.

Notice the padding is not "for arbitrary purposes", but only
"as necessary to achieve the appropriate alignment".

I only apply this hypothesis to the padding -inside- the structure,
and not to any "trailing" padding of the structure. For example,
the classic struct {int i; char c;} might have trailing padding
so that you can form efficient arrays of such elements.
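A small sketch of how one could measure that trailing padding for the struct {int i; char c;} case. The tag name ic and the helper names are made up for the example, and the actual numbers depend on the implementation, so none are claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* The classic example from the post above. */
struct ic {
    int  i;
    char c;
};

/* Bytes left over after the last member ends. */
static size_t trailing_padding(void)
{
    return sizeof(struct ic) - (offsetof(struct ic, c) + sizeof(char));
}

/* Distance in bytes between consecutive array elements.
   Only addresses are taken, so the array needs no initializer. */
static size_t element_stride(void)
{
    struct ic a[2];
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```

The stride always equals sizeof(struct ic), which is why any padding needed to keep the int member of the next element aligned has to live at the end of the struct.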
 
Ben Pfaff

Walter Roberson writes:
[in a struct]
As best I recall, arbitrary padding is not permitted: only
where required to bring the entry into alignment (where alignment
rules are platform dependent.) So if you know the alignment rules
for (say) a "pure" double, then you also know the alignment
rules for a double embedded in a struct.
This is a nice theory, but I can't see how to back it up with a
quote from the standard. The text of the standard says "There
may be unnamed padding within a structure object, but not at its
beginning." and I don't see any restrictions on that.

There is a bit more wording than that in C89 3.5.2.1:

Each non-bit-field member of a structure or union object is
aligned in an implementation-defined manner appropriate to its type.

Within a structure object, the non-bit-field members and the units
in which bit-fields reside have addresses that increase in the order
in which they are declared. A pointer to a structure object,
suitably converted, points to its initial member (or if that
member is a bit-field, then to the unit in which it resides), and
vice versa. There may therefore be unnamed padding within a
structure object, but not at its beginning, as necessary to achieve the
appropriate alignment.

Notice that the alignment within the structure is in a manner
"appropriate to its type". I interpret that as the alignment
appropriate to the type "ex-vivo", outside of a structure. I do not
see anything there that would suggest that a member could have one
alignment within structures and a different alignment outside of
structures.

Interesting, I hadn't noticed that nuance.

But I don't think that "appropriate to its type" means that there
couldn't be more padding than necessary, or that it must be the
same padding as outside a structure.
Notice the padding is not "for arbitrary purposes", but only
"as necessary to achieve the appropriate alignment".

The sentence "There may therefore..." was changed in C99 to
"There may be unnamed padding within a structure object, but not
at its beginning." (as I quoted earlier), which may indicate a
change in requirements by the Standard. (But I do not know.)
 
Eric Sosman

Walter Roberson wrote On 03/01/07 14:17,:
[concerning "excessive" padding in structs]

Notice the padding is not "for arbitrary purposes", but only
"as necessary to achieve the appropriate alignment".

I don't think "appropriate" implies "minimal." The
Rationale gives some hints as to how the authors wanted
these passages understood:

3.5.2.1 Structure and union specifiers
...
Since some existing implementations, in the interest
of enhanced access time, leave internal holes larger
than absolutely necessary, [...]

(As the section number indicates, this is from the original
ANSI C Rationale and not from a more recent ISO version. The
text might therefore be considered "closer" to the original
authors' thinking on the matter.)
 
Stephen Sprunk

kyle york said:
Why does the C standard require the members of a structure not be
re-ordered (6.2.5.20)? Padding is allowed, and platform dependent,
which means one cannot rely on the exact layout anyway, so what's
the point?

Without this restriction the compiler could lay out the structure in the
most efficient way possible, for some definition of efficient. It would be
easy enough to turn this reordering off with a compiler specific pragma as
is often done with padding.

The standard's wording guarantees that two structs defined with the same
initial elements will be laid out the same way in memory and that, as long
as you access only common members, they will be interchangeable. It also
means that any later elements that are not common will be laid out _after_
the common ones, not interspersed with the common ones.

If a compiler was allowed to reorder the elements, these properties would
not hold and a lot of code would break.

S
 
Ian Collins

Stephen said:
The standard's wording guarantees that two structs defined with the same
initial elements will be laid out the same way in memory and that, as long
as you access only common members, they will be interchangeable. It also
means that any later elements that are not common will be laid out _after_
the common ones, not interspersed with the common ones.

If a compiler was allowed to reorder the elements, these properties would
not hold and a lot of code would break.
Just to add to this, the above guarantee is important where structures
are members of a union and the first member or members are used to
identify the appropriate type. One example of this is the X-windows
event object which is a union of all possible event structs.
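A stripped-down sketch of that pattern, loosely modelled on the idea behind the X event union but with invented names (key_event, button_event, event_payload); it is not the Xlib definition. Every struct starts with the same int type member, so the tag can be read through any of them while the union holds one of these structs.

```c
#include <assert.h>

/* Hypothetical event tags and event structs. */
enum event_type { EV_KEY = 1, EV_BUTTON = 2 };

struct key_event {
    int type;      /* common initial member */
    int keycode;
};

struct button_event {
    int type;      /* common initial member */
    int x;
    int y;
};

union event {
    struct key_event    key;
    struct button_event button;
};

/* Inspect the shared tag, then use the matching struct. */
static int event_payload(const union event *ev)
{
    switch (ev->key.type) {
    case EV_KEY:
        return ev->key.keycode;
    case EV_BUTTON:
        return ev->button.x + ev->button.y;
    default:
        return -1;
    }
}
```

Reading ev->key.type when a button_event was stored is covered by the common initial sequence rule only because both structs are guaranteed to put type at the same place.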
 
Walter Roberson

Stephen Sprunk said:
The standard's wording guarantees that two structs defined with the same
initial elements will be laid out the same way in memory and that, as long
as you access only common members, they will be interchangeable.

C89 makes that guarantee where the two structs are the common
prefix of a union encompassing both, but does C89 or C99 promise
it if no union is involved?
 
Doug

The standard's wording guarantees that two structs defined with the same
initial elements will be laid out the same way in memory

<snip>

Ben, Walter,

I'm interested in your discussion about 'minimal' vs 'appropriate'
padding. If we accept Stephen's statement above (which I *think* is
true), then I think this means that there must be a canonical way to
lay out a structure.

If this canonical method is compiler-specific, then I guess
appropriate might not equal minimal.

But I read the standard to say that all compilers (on a given arch)
should lay the structure out in memory the same way (without use of
#pragma pack, etc.). (Thus I can use a 3rd party library without
caring about what it was compiled with.)

Putting this all together, surely that means that all compilers (on
the same arch) must share the same canonical method of laying out a
structure? If so, then surely 'minimal' is the only sensible way to
agree on that canonical method?

Or am I way off base?

Thanks,
Doug
 
Malcolm McLean

kyle york said:
Why does the C standard require the members of a structure not be
re-ordered (6.2.5.20)? Padding is allowed, and platform dependent, which
means one cannot rely on the exact layout anyway, so what's the point?

Without this restriction the compiler could lay out the structure in the
most efficient way possible, for some definition of efficient. It would be
easy enough to turn this reordering off with a compiler specific pragma as
is often done with padding.
The rules are a hangover from the bad old days when people would play silly
games, like putting a header in front of a variable-sized array and defining
the whole thing as a struct with a zero length array as the last member.
The other favourite is a disambiguation field for packets.
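For reference, the header-in-front-of-an-array idiom looks roughly like the sketch below. It uses the C99 flexible array member rather than the older zero-length array, and the names message and message_new are invented for the example. The idiom works only because the array is guaranteed to be laid out after the header member.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header followed by a variable amount of data. */
struct message {
    size_t        len;
    unsigned char data[];   /* C99 flexible array member, must be last */
};

/* Allocate header and payload in one block and copy the payload in.
   Returns NULL if the allocation fails; the caller frees the result. */
static struct message *message_new(const void *src, size_t len)
{
    struct message *m = malloc(sizeof *m + len);
    if (m == NULL)
        return NULL;
    m->len = len;
    if (len > 0)
        memcpy(m->data, src, len);
    return m;
}
```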

With modern processors and modern programming conventions, it should not be
necessary to rely on the arrangement of structure elements in memory. As you
say, doing so is fraught with portability bugs. But legacy code needs to be
supported, which is more important than minor gains in efficiency you might
obtain by allowing the compiler to organise the elements itself.
 
Keith Thompson

C89 makes that guarantee where the two structs are the common
prefix of a union encompassing both, but does C89 or C99 promise
it if no union is involved?

I don't believe so, but it's difficult to imagine an implementation
that would lay out the structs differently if there doesn't happen to
be a union declaration. The mythical DS9K probably does so, but the
compiler has to expend a lot of effort to prove that there is no union
declared anywhere in the scope of the two types; in some cases, it has
to postpone layout decisions until link time.
 
Keith Thompson

Doug said:
<snip>

Ben, Walter,

I'm interested in your discussion about 'minimal' vs 'appropriate'
padding. If we accept Stephen's statement above (which I *think* is
true), then I think this means that there must be a canonical way to
lay out a structure.

If this canonical method is compiler-specific, then I guess
appropriate might not equal minimal.

But I read the standard to say that all compilers (on a given arch)
should lay the structure out in memory the same way (without use of
#pragma pack, etc.). (Thus I can use a 3rd party library without
caring about what it was compiled with.)

Putting this all together, surely that means that all compilers (on
the same arch) must share the same canonical method of laying out a
structure? If so, then surely 'minimal' is the only sensible way to
agree on that canonical method?

Or am I way off base?

I'm afraid you are.

Having all compilers on a given architecture lay out structures in the
same way is (almost) certainly a good idea, and it might be required
by some architecture-specific standard. But the C standard itself
imposes no such requirement.

Realistically, there can even be good reasons for the layout to be
inconsistent; one compiler might optimize the layout for efficiency,
and another might optimize for compatibility with, say, some earlier
version of the same architecture. Interoperation of code compiled by
the two compilers would be difficult, but as I said the C standard
doesn't require it to be easy.
 
Ian Collins

Malcolm said:
With modern processors and modern programming conventions, it should not
be necessary to rely on the arrangement of structure elements in memory.
As you say, doing so is fraught with portability bugs. But legacy code
needs to be supported, which is more important than minor gains in
efficiency you might obtain by allowing the compiler to organise the
elements itself.
No matter how modern the code is, the problem of disambiguation of
structures in a union remains.
 
Eric Sosman

Doug wrote On 03/01/07 16:24,:
<snip>

Ben, Walter,

I'm interested in your discussion about 'minimal' vs 'appropriate'
padding. If we accept Stephen's statement above (which I *think* is
true), then I think this means that there must be a canonical way to
lay out a structure.

If this canonical method is compiler-specific, then I guess
appropriate might not equal minimal.

But I read the standard to say that all compilers (on a given arch)
should lay the structure out in memory the same way (without use of
#pragma pack, etc.). (Thus I can use a 3rd party library without
caring about what it was compiled with.)

No, the C Standard has no concept of an "arch;" it talks
only about "implementations." Every implementation must meet
the requirements of the Standard, but can do so in any way it
chooses. There is no guarantee (in the C Standard) that gcc
and Frobozz Magic C will make identical decisions, even if
they run on the same machine. There is no guarantee even
that a single compiler will make the same decisions when run
with different option flags! As far as the C Standard can
see, "gcc" and "gcc -fomit-frame-pointer" are two distinct
implementations, and need not be compatible.

That said, most platforms publish some kind of "Application
Binary Interface" that specifies some of the decisions that the
C Standard leaves unmade. If you're supposed to pass some kind
of struct to a system service, the struct must be laid out in
thus-and-such a way, and all compilers on that platform had
better toe the line. So the standard you mention usually does
exist -- except that it's not The Standard, and it may or may
not describe things in terms of language-specific constructs
like structs.
Putting this all together, surely that means that all compilers (on
the same arch) must share the same canonical method of laying out a
structure? If so, then surely 'minimal' is the only sensible way to
agree on that canonical method?

I don't see how "canonical" implies "minimal."
Or am I way off base?

I once saw a relief pitcher enter a baseball game and seal
the win without throwing even one pitch. The Red Sox were down
by two in the top of the ninth in Baltimore, with two out and a
man on base. Carlton Fisk singled, advancing the runner to
third and putting himself on first as the tying run. In came
the reliever to face the next batter, the potential go-ahead run.
He took his warm-up throws, got his sign from the catcher, and
threw to first to pick off Fisk and end the game.

He would *definitely* have had you flat-footed. ;-)
 
Malcolm McLean

Ian Collins said:
No matter how modern the code is, the problem of disambiguation of
structures in a union remains.
Generally you don't need unions.
A 256-byte packet arrives. Instead of trying to define the bit pattern with
a C union, you can take the first two bytes, switch on the type, and then
create the appropriate structure, reading that data packet one byte at a
time.
If you need a generic function, it accepts a void * together with some
information telling it what type of packet was received. That probably
entails a call to malloc() to receive the data, but that doesn't matter on a
modern system.
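A rough sketch of the byte-at-a-time approach described above. The wire format here (a 2-byte big-endian type followed by a 2-byte big-endian length) is an assumption made up for the example, as are the names packet_header and parse_header. Nothing in it depends on how the compiler lays out the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory form of the hypothetical header; its layout is
   irrelevant because it is never overlaid on the raw bytes. */
struct packet_header {
    uint16_t type;
    uint16_t length;
};

/* Assemble a 16-bit big-endian value from two bytes. */
static uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Fill *out from the first four bytes of buf.
   Returns 0 on success, -1 if the buffer is too short. */
static int parse_header(const unsigned char *buf, size_t n,
                        struct packet_header *out)
{
    if (n < 4)
        return -1;
    out->type   = read_u16_be(buf);
    out->length = read_u16_be(buf + 2);
    return 0;
}
```

The caller can then switch on the type field and parse the rest of the packet the same way.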
 
Dave Vandervies

Eric Sosman said:
Under certain conditions, different struct types
that share a "common initial subsequence" of elements
can be accessed through a pointer to either type, so
long as the accesses are to the common elements. That's
another useful property.

Aren't the conditions for that pretty much "You're not running it on
the DS9k"?


dave
 
Ian Collins

Malcolm said:
Generally you don't need unions.
A 256-byte packet arrives. Instead of trying to define the bit pattern
with a C union, you can take the first two bytes, switch on the type,
and then create the appropriate structure, reading that data packet one
byte at a time.
If you need a generic function, it accepts a void * together with some
information telling it what type of packet was received. That probably
entails a call to malloc() to receive the data, but that doesn't matter
on a modern system.
It sure does on the 'modern' 8 and 16 bit embedded devices I work with!
 
Stephen Sprunk

Malcolm McLean said:
Generally you don't need unions.
A 256-byte packet arrives. Instead of trying to define the bit
pattern with a C union, you can take the first two bytes, switch on
the type, and then create the appropriate structure, reading that
data packet one byte at a time.
If you need a generic function, it accepts a void * together with
some information telling it what type of packet was received.
That probably entails a call to malloc() to receive the data, but
that doesn't matter on a modern system.

A common technique for implementing inheritance in C is to have the first
element in a subclass be the superclass, or have the initial elements be
identical. That allows you to pass a "Tree" struct to a function that
expects a "Plant" struct with only a cast -- but only if the condition we're
discussing holds. Passing a Plant subclass in as a void* and enumerating
its type requires a function that only cares about the Plant parts to know
about every possible type of Plant subclass and a monstrous switch statement
that would cast the argument to one of potentially millions of different
types of Plants, just so it could access an element that _should_ be at the
same offset in all of them. That's just wasteful.
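A minimal sketch of the first variant (superclass as the first element), with made-up members height_cm and ring_count. The function knows only about plants, yet it can be handed a tree with a cast because the base struct is guaranteed to sit at offset zero.

```c
#include <assert.h>

/* Hypothetical "superclass". */
struct plant {
    int height_cm;
};

/* Hypothetical "subclass": the superclass is the first member. */
struct tree {
    struct plant base;
    int          ring_count;
};

/* Knows nothing about trees or any other kind of plant. */
static int plant_height(const struct plant *p)
{
    return p->height_cm;
}
```

Calling plant_height((const struct plant *)&t) for a struct tree t gives the same result as plant_height(&t.base), with no switch over subclasses.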

S
 
