Pre-offsetof() question

  • Thread starter Arthur J. O'Dwyer

Arthur J. O'Dwyer

As far as I know, C89/C90 did not contain the
now-standard offsetof() macro.

Did C89 mandate that structs had to have a consistent
layout? For example, consider the typical layout of
the following structure:

struct weird
{
int x; /* sizeof(int)==4 here */
double y; /* sizeof(double)==8 here */
int z;
};

Now, let's suppose that the target architecture has typical
80x86 alignment requirements, where 'int' aligns on 4-byte
boundaries and 'double' on 8-byte boundaries.
A C99 compiler might produce a layout that looked like
this:

|_x__|####|___y____|_z__|####|

sizeof (struct weird) == 24 bytes
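
A minimal sketch for checking the conventional layout described above on a given implementation, assuming one that provides <stddef.h>. The names `weird_layout`, `get_weird_layout` and `print_weird_layout` are invented here for illustration and are not from the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct weird
{
    int x;
    double y;
    int z;
};

/* Hypothetical helper type: the offsets and size the compiler actually chose. */
struct weird_layout
{
    size_t off_x;
    size_t off_y;
    size_t off_z;
    size_t size;
};

/* Collect the layout of struct weird using offsetof() and sizeof. */
struct weird_layout get_weird_layout(void)
{
    struct weird_layout l;

    l.off_x = offsetof(struct weird, x);
    l.off_y = offsetof(struct weird, y);
    l.off_z = offsetof(struct weird, z);
    l.size = sizeof(struct weird);

    /* A struct never has padding before its first member. */
    assert(l.off_x == 0);
    return l;
}

/* Print the layout; the casts keep the printf() call valid in C89. */
void print_weird_layout(void)
{
    struct weird_layout l = get_weird_layout();

    printf("x at %lu, y at %lu, z at %lu, sizeof %lu\n",
           (unsigned long)l.off_x, (unsigned long)l.off_y,
           (unsigned long)l.off_z, (unsigned long)l.size);
}
```

With the sizes and alignments assumed in this post, calling print_weird_layout() would be expected to report offsets 0, 8 and 16 and a size of 24. Other implementations may choose differently.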


But could a C89, pre-offsetof() compiler decide to make
the layout of the struct vary, like this:

|_x__|####|___y____|_z__| on 8-byte alignment

|_x__|___y____|####|_z__| on 4-byte alignment

sizeof (struct weird) == 20 bytes


Note that the relative ordering of the members is
preserved; each 'struct weird' has the same size in
bytes; and all objects are properly aligned for their
type. But the "weird" ordering has saved us 4 bytes
per structure!

Does C89 allow this, or is it disallowed by something
in that standard? If so, what?

TIA,
-Arthur
 

Eric Sosman

No version of the Standard describes what alignments
are to be enforced. However, the rules for compatibility
of types guarantee that the same struct type will have the
same arrangement of padding bytes in all translation units.

Could this arrangement be different depending on flags
calling for different "strictnesses" of alignment? Yes, of
course -- but this isn't a contradiction, because using a
different set of compiler flags gives you a different
implementation of C, and the Standard makes no requirement
that translation units compiled by different implementations
must interoperate.

By the way, note that your 8-byte alignment example is
faulty. If a double must be aligned to an 8-byte boundary,
the sizeof a struct containing a double must be a multiple
of 8 bytes. Otherwise, you would not be able to malloc()
an array of two such structs:

struct weird *p = malloc(2 * sizeof *p); // assume 40

0    4    8        16   20   24   28       36   40
|_x__|####|___y____|_z__|_x__|####|___y____|_z__|
^                       ^
|                       |
p                       p+1

Note that (p+1)->y is mis-aligned.
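
The arithmetic behind the diagram can be sketched as follows. `element_member_offset` and `is_aligned` are hypothetical helpers. The figures 20, 8 and 8 used in the note below are this thread's assumed struct size, offset of `y`, and alignment of `double`, not values taken from any real compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset, from the start of an array, of a member of element `index`,
   given the element size and the member's offset within one element. */
size_t element_member_offset(size_t elem_size, size_t member_off, size_t index)
{
    return index * elem_size + member_off;
}

/* Nonzero if `offset` is a multiple of `alignment`.
   Assumes the array itself starts on a suitably aligned address,
   as memory returned by malloc() does. */
int is_aligned(size_t offset, size_t alignment)
{
    assert(alignment != 0);
    return offset % alignment == 0;
}
```

With a 20-byte struct, element_member_offset(20, 8, 1) gives 28, which is not a multiple of 8, so p[1].y would be misaligned. With the struct padded to 24 bytes the same member lands at 32, which is.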
 

Arthur J. O'Dwyer

Full stop: C89 invented the <stddef.h> header, and specified
that it must provide offsetof().

Oops. I guess the point is moot, then.

No version of the Standard describes what alignments
are to be enforced. However, the rules for compatibility
of types guarantee that the same struct type will have the
same arrangement of padding bytes in all translation units.

How so? (Obviously, the existence of 'offsetof' assumes
that all 'struct weird's will have the same layout -- but
would that rule be explicitly stated anywhere if 'offsetof'
didn't exist?)

By the way, note that your 8-byte alignment example is
faulty. If a double must be aligned to an 8-byte boundary,
the sizeof a struct containing a double must be a multiple
of 8 bytes.

Why? (Other than the paragraph which in N869 is 7.17#3,
that is.)

Otherwise, you would not be able to malloc()
an array of two such structs:

struct weird *p = malloc(2 * sizeof *p); // assume 40

0    4    8        16   20   24   28       36   40
|_x__|####|___y____|_z__|_x__|####|___y____|_z__|

Ah -- your diagram is incorrect. :) The "correct" layout
for two optimized (but apparently non-conforming) 'struct
weird's is:

0    4    8        16   20   24       32   36   40
|_x__|####|___y____|_z__|_x__|___y____|####|_z__|
^                       ^
|                       |
p                       p+1

Note that (p+1)->y is mis-aligned.

Not anymore -- not if we remove 7.17#3. I had thought
that C89 didn't have offsetof(); apparently I was
wrong. Never mind, then.

-Arthur
 

Eric Sosman

Arthur J. O'Dwyer said:
Why? (Other than the paragraph which in N869 is 7.17#3,
that is.)


Ah -- your diagram is incorrect. :) The "correct" layout
for two optimized (but apparently non-conforming) 'struct
weird's is:


Not anymore -- not if we remove 7.17#3. I had thought
that C89 didn't have offsetof(); apparently I was
wrong. Never mind, then.

Aha! Finally, the mystery of why offsetof intruded itself
into an apparently unrelated question becomes clear. Just to
be sure I've understood you: You're wondering whether different
instances of struct weird in the same program could arrange
their padding differently. Clearly, this cannot be the case
if offsetof(struct weird, y) is single-valued.

But even without offsetof I think you can rule out such
shenanigans. True, direct assignment of struct objects might
perhaps be clever enough to play games. But memcpy() must
also work:

struct weird *p = malloc(2 * sizeof *p);
p[0].x = ...; p[0].y = ...; p[0].z = ...;
memcpy (p+1, p, sizeof *p);
assert (p[1].x == p[0].x);
assert (p[1].y == p[0].y); // the crucial point
assert (p[1].z == p[0].z);

Since memcpy() knows only the size of the data being copied
and nothing about the nature of the object those data bytes
represent, it cannot possibly know enough to "slide" the
`y' element while copying the bag of bytes from one place
to another. Similar remarks apply to realloc() and to
fwrite()/fread(), and to other type-oblivious ways of moving
data from place to place.
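
A compilable version of the snippet above, as a sketch only. The values 1, 2.5 and 3 stand in for the elided initializers and are assumptions, as is the helper name `memcpy_preserves_members`.

```c
#include <assert.h>   /* for the caller's assert() */
#include <stdlib.h>
#include <string.h>

struct weird
{
    int x;
    double y;
    int z;
};

/* Returns 1 if a memcpy()'d copy has the same member values as the
   original, 0 if it does not, and -1 if malloc() fails. */
int memcpy_preserves_members(void)
{
    struct weird *p = malloc(2 * sizeof *p);
    int same;

    if (p == NULL)
        return -1;

    /* Placeholder values; the original post elides them. */
    p[0].x = 1;
    p[0].y = 2.5;
    p[0].z = 3;

    /* memcpy() sees only a bag of bytes, not the struct's members. */
    memcpy(p + 1, p, sizeof *p);

    same = p[1].x == p[0].x
        && p[1].y == p[0].y   /* the crucial point */
        && p[1].z == p[0].z;

    free(p);
    return same;
}
```

A caller could write assert(memcpy_preserves_members() == 1). If instances were allowed to place `y` differently, this bytewise copy could not be relied on to keep it intact.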
 

Arthur J. O'Dwyer

Aha! Finally, the mystery of why offsetof intruded itself
into an apparently unrelated question becomes clear. Just to
be sure I've understood you: You're wondering whether different
instances of struct weird in the same program could arrange
their padding differently. Clearly, this cannot be the case
if offsetof(struct weird, y) is single-valued.

Yes! You've hit the nail on the head.

But even without offsetof I think you can rule out such
shenanigans. True, direct assignment of struct objects might
perhaps be clever enough to play games. But memcpy() must
also work:

struct weird *p = malloc(2 * sizeof *p);
p[0].x = ...; p[0].y = ...; p[0].z = ...;
memcpy (p+1, p, sizeof *p);
assert (p[1].x == p[0].x);
assert (p[1].y == p[0].y); // the crucial point
assert (p[1].z == p[0].z);

Yes, but *must* these 'assert(...)'s succeed? (Obviously
they needn't succeed if p[0].y is a trap representation,
or one of p[0],p[1] is volatile, for instance.)

Where does it say that

foo x = ...;
foo y = ...;
memcpy(&x, &y, sizeof (foo));
assert (x==y);

must necessarily succeed? I don't see anywhere, except perhaps
footnote 38 (which says that struct assignment may be done
"element-at-a-time or via memcpy"). And I don't think footnotes
are normative, even if the intent of the footnote were clearer.

-Arthur
[Remember, the whole question is moot.] ;-)
 

Jack Klein

What you missed is:

========
6.2.6 Representations of types

6.2.6.1 General

1 The representations of all types are unspecified except as stated in
this subclause.

2 Except for bit-fields, objects are composed of contiguous sequences
of one or more bytes, the number, order, and encoding of which are
either explicitly specified or implementation-defined.

3 Values stored in unsigned bit-fields and objects of type unsigned
char shall be represented using a pure binary notation.

4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type
unsigned char [n] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value. Values stored in
bit-fields consist of m bits, where m is the size specified for the
bit-field. The object representation is the set of m bits the
bit-field comprises in the addressable storage unit holding it. Two
values (other than NaNs) with the same object representation compare
equal, but values that compare equal may have different object
representations.
========

From C99, and note the last sentence in paragraph 4.
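
Paragraph 4 can be illustrated with a scalar type, where "compare equal" has an obvious meaning. This is a sketch only, and the helper name `round_trip_through_bytes` is an assumption made for the example. It copies a value's object representation into an unsigned char array and back.

```c
#include <assert.h>
#include <string.h>

/* Copies a double into an unsigned char array (its object representation),
   then copies those bytes into a second double and returns it. */
double round_trip_through_bytes(double value)
{
    unsigned char bytes[sizeof(double)];
    double copy;

    assert(sizeof bytes == sizeof value);
    memcpy(bytes, &value, sizeof value);
    memcpy(&copy, bytes, sizeof copy);
    return copy;
}
```

For any non-NaN argument the result has the same object representation as the argument, so by the quoted sentence the two must compare equal. For example, round_trip_through_bytes(2.5) == 2.5 holds.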

Even without this, it would be impossible to pass or return structures or
pointers to structures to functions in separate translation units if
an identical structure definition did not result in identically laid
out objects.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
 

Arthur J. O'Dwyer

What you missed is:

========
6.2.6 Representations of types

6.2.6.1 General

1 The representations of all types are unspecified except as stated in
this subclause.

2 Except for bit-fields, objects are composed of contiguous sequences
of one or more bytes, the number, order, and encoding of which are
either explicitly specified or implementation-defined.

Okay, no problems here. The "weird" layout can be defined easily
by the implementation.

3 Values stored in unsigned bit-fields and objects of type unsigned
char shall be represented using a pure binary notation.

4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type
unsigned char [n] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value. Values stored in
bit-fields consist of m bits, where m is the size specified for the
bit-field. The object representation is the set of m bits the
bit-field comprises in the addressable storage unit holding it. Two
values (other than NaNs) with the same object representation compare
equal,

Okay, this is the part I assume you mean. Well,
<devil's-advocate>
what exactly does it mean for two structs to "compare equal"?
I mean, you can't use the == operator on structs, right? And if
we can only talk about member-by-member equality, well then we'll
have to consider a *member-by-member* memcpy -- which works fine!

but values that compare equal may have different object
representations.
========

From C99, and note the last sentence in paragraph 4.

(And not in N869, right?)

Even without this, it would be impossible to pass or return structures or
pointers to structures to functions in separate translation units if
an identical structure definition did not result in identically laid
out objects.

Debatable. But irrelevant. ;-)
Remember, the "weird" layout is perfectly consistent between t.u.'s.
A compiler could say, "Okay, this struct is a candidate for
weirdification," and generate appropriate code across all t.u.'s,
easily enough.

-Arthur
[Remember, still moot.]

P.S.-- As a small on-topic note, am I completely mistaken in my
prior belief that 'offsetof' was a relatively recent addition to
C? If so, why do we get so many variations on FAQ 2.14? :)
 
