[... a question about the size and padding of a struct
"defined" by uncompilable code ...]
I have not given compilable code. I am just
asking the size of the struct given.
How big is this array?

    int array<7>;

In other words, if the code describing your struct won't even
compile, then you have not "given" a struct at all. If there is
no struct, it has no size and no padding -- and no existence.
Understood. How about the following:
#include <stdio.h>

int main(void) {
    struct test {
        char a;
        int b;
        short c;
    };
    /* sizeof yields a size_t, so print it with %zu rather than %d */
    printf("%zu\n", sizeof(struct test));
    return 0;
}
I completely understand what the size of the struct will be with and
without mapping, but the question is: what parameters decide the padding
involved? Things such as the size of the registers, or the size of the
address/data bus of the processor?
The padding involved is determined by the "ABI" (application binary interface)
rules for the given architecture. These rules have a grave impact on the
interoperability of programs, especially in a mixed environment, whether with
multiple compilers for C or C-related dialects, or with other languages that
need to "bind" to C interfaces.
The width of the address or data bus of the processor is a very low-level
implementation detail of the actual silicon die, and is largely irrelevant.
Moreover, there is more than one bus. Are you talking about the connection
between the L1 cache and the L2 cache? A processor may nowadays read an entire
cache line (or several of them, in burst mode) at a time from main memory; that
doesn't mean we align every structure member to a cache line.
ABI rules also span multiple implementations of an architecture. If we are
compiling for 32-bit x86, we might tell the compiler to optimize for a 386,
486, Pentium, i7 or whatever, but the layout of the structures should be
interoperable across the family.
I think it works like this on GCC targeting 32-bit Intel.
Padding will be inserted before the int so that it starts at an offset divisible
by 4, and before the short, if needed, so that it starts at an offset divisible by 2.
The reason for this is not that the alignment must be there, because processors
in this family support unaligned reads. It's for efficiency of access.
Even processors that can read a word at any byte address still read it faster
if the address is aligned.
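So there is a byte for "a", then three padding bytes. Then "b" is placed,
occupying four bytes, bringing us to offset 8. This is divisible by two, so "c"
is placed there taking two bytes, for a total of ten bytes of members and
interior padding. (Trailing padding, described below, then rounds
sizeof(struct test) up to 12.)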
If we add a second "char a2" after "char a", it should go into the padding.
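Generally, if a type with a weaker alignment is placed after a type with a
stronger alignment, it shouldn't need padding: the stronger member starts at a
multiple of its alignment and its size is a multiple of that alignment, so the
offset just past it already suits the weaker type. Conversely,
if a type with stronger alignment is placed after one with weaker alignment,
then it may require padding up to an offset that is a multiple of its
alignment.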
Furthermore, there may be additional padding at the end of a structure, to
support the notion that structures can be combined together to form an array,
whereby the padding at the end of element [n] establishes the alignment of the
first member of element [n+1].
Thus a structure like this one, { int a; char b; }, might have,
depending on architectural details, some bytes of padding after the b, so that
the overall size is divisible by sizeof(int). On Intel x86, I might expect
no padding between a and b, and three bytes after b.
These are very general concepts, and the details vary quite a lot among
architectures.
For some architectures, the vendors or other organizations that standardize the
architecture develop a set of documents which specify the ABI. The documents
dictate everything from how structures are laid out to what stack frames look
like, which registers are used for what, how arguments are passed between
functions, and so on. If there is such a body of standards, then the compiler
implementor generally follows it.