How is a struct represented internally?

xpika

Out of curiosity, I would like to know how a struct is represented in C,
and then, knowing that, how to access the members manually.

All I've gathered is that the address of the first member is always the
same as the address of the struct.
 
Nick Keighley

out of curiosity I would like to know how a struct is represented in C

The short answer to this is "why do you need to know?"

The second shortest answer is that it is highly compiler dependent
(even compiler version dependent).
and then knowing that how to access the members manually.

Take a look at the offsetof() macro; it may answer the questions you
have.

All I've gathered is that the address of the first member is always the
same as the address of the struct.


But, again, WHY do you want to do this?
 
Walter Roberson

out of curiosity I would like to know how a struct is represented in C
and then knowing that how to access the members manually.
All I've gathered is that the address of the first member is always the
same as the address of the struct.

The following applies to structures without bit-fields:

Yes, for a structure, the location of the first member is always
the same as the address of the structure. After the first member,
there may be some padding before the next member. The location
of the second member is always at an increasing address relative
to the first member (i.e., structures may not be built "downwards"
in memory.) After the second member, there may be some padding
before the next member. And so on, each member, then possibly
some padding. After the last member in the structure, there may be
some padding.
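A minimal sketch of those rules, assuming a hypothetical struct sample (the name and members are only for illustration). It checks only what C guarantees: the first member is at offset 0, so its address compares equal to the struct's address once both are converted to void *, and later members sit at increasing offsets. The trailing padding it reports is whatever this compiler chose, and it may be zero.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical example type; the members are arbitrary. */
struct sample {
    char   c;
    double d;
    short  s;
};

/* 1 if the first member's address equals the struct's address
   (guaranteed by C), else 0. */
int first_member_at_start(void)
{
    struct sample obj;
    return (void *)&obj == (void *)&obj.c;
}

/* 1 if the members lie at increasing offsets in declaration order,
   starting at offset 0 (also guaranteed), else 0. */
int offsets_increase(void)
{
    return offsetof(struct sample, c) == 0
        && offsetof(struct sample, c) < offsetof(struct sample, d)
        && offsetof(struct sample, d) < offsetof(struct sample, s);
}

/* Bytes after the last member that sizeof still counts; this amount is
   up to the implementation and may be zero. */
size_t trailing_padding(void)
{
    return sizeof(struct sample)
         - (offsetof(struct sample, s) + sizeof(short));
}
```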

C does not specify anything about how much padding (if any) occurs
between any two members. Generally, the padding will be such that
each structure member occurs at an address which is a multiple of
the natural hardware alignment for that type of element. For example,
if you have a structure that has char x; int y; then
the char x will (as you noted) occur right at the beginning of the
structure, but then there would -usually- be a few bytes of padding
so that the int y is on an even address or an address that is a
multiple of 4 (or possibly a multiple of 8), with the exact padding
in this case depending on the hardware target involved. Beyond that,
compilers could put in extra padding for the sake of efficiency.
For example, if you had double P[1024]; double Q[1024]; then
even though there would not need to be any padding between P and Q
in order to get the start of Q aligned properly for a double,
a compiler could choose to put in some padding between them, perhaps
to improve cache performance. Two structures that listed exactly the
same types and sizes for the elements could potentially end up with
different padding, if the compiler decided to optimize the structures
in different ways based upon the way the code accessed the
structure members.
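To see what your own compiler does with the char x; int y; case, you can ask it rather than guess. This sketch assumes a hypothetical struct char_int and hypothetical helper names; the printed numbers depend entirely on the compiler and target (3 bytes of padding after x is common where int is 4 bytes and 4-byte aligned, but nothing requires it).

```c
#include <stdio.h>   /* printf */
#include <stddef.h>  /* offsetof, size_t */

/* The structure from the text: a char followed by an int. */
struct char_int {
    char x;
    int  y;
};

/* Bytes of padding this compiler placed between x and y. */
size_t padding_after_x(void)
{
    return offsetof(struct char_int, y) - sizeof(char);
}

/* Print the layout this particular compiler/target chose.
   Casts to unsigned long keep the format portable to C89 (no %zu). */
void report_char_int(void)
{
    printf("sizeof(int) = %lu, offsetof(y) = %lu, padding after x = %lu, "
           "sizeof(struct char_int) = %lu\n",
           (unsigned long)sizeof(int),
           (unsigned long)offsetof(struct char_int, y),
           (unsigned long)padding_after_x(),
           (unsigned long)sizeof(struct char_int));
}
```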

Thus, unless you have extra knowledge of the way your particular
compiler handles padding and optimization, you cannot (reliably) figure
out the amount of padding in a structure just by reading the structure
declaration.

And that's about it for the rules on structure allocation in memory:
location of the first member, increasing addresses for the others,
padding may occur between any two members, padding may occur at
the end of a structure, and you have no reliable way of predicting
the padding without additional knowledge of the compiler. So there
is no reliable way of "accessing the members yourself" except as
noted below:

In order to find out the distance between the start of a structure
and a particular member, you can use the offsetof() macro. The
distance will be the same for all instances of that particular
structure type, but as noted above, it may be different for another
structure that has exactly the same types in exactly the same order,
so you need to use offsetof() for each particular structure type;
you cannot {generally} say "Well, the offset was 28 bytes in
structure type G, and the same sequence of element types occurs
in structure type H, so the offset of the corresponding element in H
must be 28 as well."
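Accessing a member "yourself" then comes down to taking the struct's address as an unsigned char pointer, adding the offsetof() value, and copying the bytes. A sketch, assuming a hypothetical struct record and hypothetical helper names; it is no better than writing r->count and is shown only to make the offset arithmetic concrete.

```c
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

/* Hypothetical record type used only for illustration. */
struct record {
    char   tag;
    int    count;
    double weight;
};

/* Read 'count' manually: start at the struct's first byte, step
   offsetof(struct record, count) bytes, and copy sizeof(int) bytes out.
   Casting that address to (int *) would also work, since it really is
   the address of an int member; memcpy just makes the byte view explicit. */
int read_count_manually(const struct record *r)
{
    const unsigned char *base = (const unsigned char *)r;
    int value;
    memcpy(&value, base + offsetof(struct record, count), sizeof value);
    return value;
}

/* Write 'weight' the same way. */
void write_weight_manually(struct record *r, double w)
{
    unsigned char *base = (unsigned char *)r;
    memcpy(base + offsetof(struct record, weight), &w, sizeof w);
}
```

For example, after struct record r = { 'a', 42, 0.0 };, read_count_manually(&r) yields 42, and write_weight_manually(&r, 2.5) leaves r.weight equal to 2.5.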
 
Martien Verbruggen

out of curiosity I would like to know how a struct is represented in C
[snip]

to improve cache performance. Two structures that listed exactly the
same types and sizes for the elements could potentially end up with
different padding, if the compiler decided to optimize the structures
in different ways based upon the way the code accessed the
structure members.


Just out of curiosity, is there any compiler that anyone knows of that
actually would do this?

There is a lot of very common code around that uses type punning on
structs with identical initial layout, which would break under a compiler
that didn't lay out those structs the same way.

I haven't (yet) tried to divine from the standard whether there is any
guarantee about identical initial layout for structs with identical
initial types and orders, but there seems to be quite some code out
there that simply assumes that there is such a guarantee.
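For a particular compiler, that assumption can at least be tested with offsetof(). This sketch assumes two hypothetical struct types whose first three members match; a true result only tells you what this implementation does, not what the standard promises.

```c
#include <stddef.h>  /* offsetof */

/* Two hypothetical struct types that begin with the same member types
   in the same order, then diverge. */
struct shape_hdr { int kind; double x; double y; };
struct circle    { int kind; double x; double y; double radius; };

/* 1 if, on this compiler, the shared initial members sit at the same
   offsets in both types, else 0. The standard's explicit guarantee about
   common initial sequences is stated for structs that are members of a
   union; this function only reports what this implementation does. */
int common_prefix_matches(void)
{
    return offsetof(struct shape_hdr, kind) == offsetof(struct circle, kind)
        && offsetof(struct shape_hdr, x)    == offsetof(struct circle, x)
        && offsetof(struct shape_hdr, y)    == offsetof(struct circle, y);
}
```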

Martien
 
Walter Roberson

Martien Verbruggen said:
I haven't (yet) tried to divine from the standard whether there is any
guarantee about identical initial layout for structs with identical
initial types and orders, but there seems to be quite some code out
there that simply assumes that there is such a guarantee.

C89 3.5.2.1 Structure and Union Specifiers
says:

Each non-bit-field member of a structure or union object
is aligned in an implementation-defined manner appropriate to
its type.

This could be construed as implying that only the types matter for
alignment purposes, and thus that identical type lists would get
identical alignments. However, that "implementation-defined manner"
leaves a lot of leeway.


C89 3.1.2.6 Compatible Type and Composite Type
says:

Moreover, two structure, union, or enumeration types declared in
separate translation units are compatible if they have the same
number of members, the same member names, and compatible member types;
for two structures, the members shall be in the same order;
for two structures or unions, the bit-fields shall have the same
width; for two enumerations, the members shall have the same values.

This doesn't actually rule out differences within the -same- translation
unit, but as soon as you added another translation unit with the
same structure setup, the second unit would have to be compatible with
both declarations in the first, thus -effectively- ruling out
padding differences unless the compiler were able to prove through
fancy intra-procedure analysis that the compatibility of the
structure types did not need to be maintained.


There is another constraint on padding different structures differently:

C89 3.3.2.3 Structure and Union Members
says

With one exception, if a member of a union object is accessed
after a value has been stored in a different member of the object,
the behavior is implementation-defined.[41] One special
guarantee is made in order to simplify the use of unions: If a
union contains several structures that share a common initial
sequence (see below), and if the union object currently contains
one of those structures, it is permitted to inspect the common
initial part of any of them. Two structures share a common initial
sequence if corresponding members have compatible types (and,
for bit-fields, the same widths) for a sequence of one or more
initial members.

{footnote} [41] The "byte orders" for scalar types are invisible
to isolated programs that do not indulge in type punning (for
example, by assigning to one member of a union and inspecting
the storage by accessing another member that is an appropriately
sized array of character type), but must be accounted for when
conforming to externally imposed storage layouts.
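The footnote's example of type punning can be sketched as follows, assuming a hypothetical union probe that overlays an unsigned int with an unsigned char array of the same size. Which byte ends up holding the 1 depends on the platform's byte order, which is exactly the point the footnote makes.

```c
/* Hypothetical union overlaying an unsigned int with a character array
   of the same size, as in the footnote's example. */
union probe {
    unsigned int  value;
    unsigned char bytes[sizeof(unsigned int)];
};

/* Store through 'value', inspect through 'bytes'. Returns 1 if the least
   significant byte is stored first (little-endian), 0 otherwise
   (big-endian or some other byte order). */
int lowest_byte_first(void)
{
    union probe p;
    p.value = 1u;
    return p.bytes[0] == 1;
}
```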


I wrote what I did about possible differences in padding with a
simple optimization in mind. Consider this code fragment:

#include <stddef.h>

struct demo { long A[2048]; long B[2048]; };

void copy_demo(void)
{
    struct demo foo;
    size_t idx;
    for (idx = 0; idx < sizeof(foo.A)/sizeof(foo.A[0]); idx++)
        foo.A[idx] = 0;
    for (idx = 0; idx < sizeof(foo.A)/sizeof(foo.A[0]); idx++)
        foo.B[idx] = foo.A[idx];
}

Then on many architectures, foo.A[idx] and foo.B[idx] would map to the
same cache lines, resulting in an inefficient program if A and B were
placed beside each other as implied by mere type alignment concerns.
However, if the compiler inserted unnamed padding of the same width
as a long (or possibly 2 longs on some systems) between demo.A and
demo.B, then the cache-line conflict between foo.A[idx] and foo.B[idx]
would be broken, resulting in a significantly more efficient program on
those architectures. If foo were a local variable whose address
was never taken (including through the address equivalence of
foo and foo.A), then the compiler would be able to deduce that
type compatibility with other translation units was irrelevant and
could [it seems to me] insert padding for the sake of efficiency.
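Whether a given compiler pads the struct type itself this way can be checked with offsetof(). The sketch below repeats struct demo from the fragment above; the function name gap_between_A_and_B is hypothetical. Note that offsetof() reports the layout of the type, so an as-if rearrangement of a single local object whose address is never taken, as described above, would not show up this way.

```c
#include <stddef.h>  /* offsetof, size_t, NULL */

/* Same type as in the fragment above. */
struct demo { long A[2048]; long B[2048]; };

/* Bytes, if any, that this compiler placed between A and B in the type.
   B has the same element type as A, so alignment alone never needs a gap;
   a nonzero result would mean padding was added for some other reason. */
size_t gap_between_A_and_B(void)
{
    const struct demo *p = NULL;  /* never dereferenced: sizeof does not evaluate */
    return offsetof(struct demo, B) - sizeof p->A;
}
```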
 
David Thompson

C89 3.1.2.6 Compatible Type and Composite Type
says:

Moreover, two structure, union, or enumeration types declared in
separate translation units are compatible if they have the same
number of members, the same member names, and compatible member types;
for two structures, the members shall be in the same order;
for two structures or unions, the bit-fields shall have the same
width; for two enumerations, the members shall have the same values.

C99 6.2.7p1 rewords this slightly and additionally requires the tags,
if any, to be the same, and unnamed members (which in C99 can only be
unnamed bit-fields, such as int : 0; used for alignment) to be the same.
It also allows a trivial exception for incomplete (tag-only) types, but
that isn't relevant to layout.

This doesn't actually rule out differences within the -same- translation
unit, but as soon as you added another translation unit with the
same structure setup, the second unit would have to be compatible with
both declarations in the first, thus -effectively- ruling out
padding differences unless the compiler were able to prove through
fancy intra-procedure analysis that the compatibility of the
structure types did not need to be maintained.

Concur, and I doubt any implementation would bother, especially since
in C99 the programmer can easily allow (and notate!) different
padding by specifying different tags, and possibly request or
require it by some compiler-dependent means such as a #pragma.

There is another constraint on padding different structures differently:

C89 3.3.2.3 Structure and Union Members
says

With one exception, if a member of a union object is accessed
after a value has been stored in a different member of the object,
the behavior is implementation-defined.[41] One special
guarantee is made in order to simplify the use of unions: If a
union contains several structures that share a common initial
sequence (see below), and if the union object currently contains
one of those structures, it is permitted to inspect the common
initial part of any of them. Two structures share a common initial
sequence if corresponding members have compatible types (and,
for bit-fields, the same widths) for a sequence of one or more
initial members.

{footnote} [41] The "byte orders" for scalar types are invisible
to isolated programs that do not indulge in type punning (for
example, by assigning to one member of a union and inspecting
the storage by accessing another member that is an appropriately
sized array of character type), but must be accounted for when
conforming to externally imposed storage layouts.

In C99 the first sentence (and its footnote) is replaced by different
wording in 6.2.6.1p6,7, which makes the other members unspecified
(undocumented) rather than implementation-defined (documented);
the remainder stays in 6.5.2.3p5 unchanged except for the added
restriction 'anywhere that the declaration of the complete type of the
union is visible'.
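A sketch of the access the common-initial-sequence rule does permit, assuming hypothetical types struct point1, struct point2 and union any_point. The complete union type is declared before the function that uses it, which satisfies the C99 visibility restriction.

```c
/* Two hypothetical struct types sharing a common initial sequence:
   an int 'kind' followed by a double 'x'. */
struct point1 { int kind; double x; };
struct point2 { int kind; double x; double y; };

/* The complete union type is visible where it is used below. */
union any_point {
    struct point1 p1;
    struct point2 p2;
};

/* Store through p2, then inspect the common initial part through p1:
   this is the access the quoted rule allows. */
int kind_via_other_member(void)
{
    union any_point u;
    u.p2.kind = 7;
    u.p2.x = 1.5;
    u.p2.y = 2.5;
    return u.p1.kind;
}
```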

<snip>
- formerly david.thompson1 || achar(64) || worldnet.att.net
 
