Structure byte padding rule

  • Thread starter Shivanand Kadwadkar

Shivanand Kadwadkar

Can anyone give the rule behind how structure byte padding works?

Does it depend on machine word size, the size of the largest data type, or
something else?
 
bert

Can anyone give the rule behind how structure byte padding works?

Does it depend on machine word size, the size of the largest data type, or
something else?

No. Whatever one implementation does,
another one can do quite differently,
so long as well-defined code (that is,
code that does not depend on how the
padding bytes are implemented) works
as it is expected to.
 
Eric Sosman

Can anyone give the rule behind how structure byte padding works?

There can be padding after any element, including the last.
(Bit-field elements are special, and complicated, so let's just
ignore them -- besides, you can't point at them anyhow, so their
position within the larger struct doesn't matter much.)

Does it depend on machine word size, the size of the largest data type, or
something else?

The implementation is free to use as much or as little padding
as it wants, and to arrange the padding any way it wants, provided
any padding bytes come after struct elements (that is, there can be
no padding before the first element). Usually, an implementation
will insert the smallest amount of padding necessary to satisfy the
alignment requirements of the element's own type. For example, on
a system where a `double' is eight bytes long and must be aligned
on a four-byte boundary, the struct

struct s { char x; double y; char z; };

... will probably have six padding bytes: three after `x' so that
`y' begins four bytes in (and will be four-byte-aligned if the
struct itself is), and another three after `z' so that in an array
of `struct s' objects the second array element will be four-byte-
aligned if the array itself is. If you want to discover how a given
implementation has padded a given struct, you can use the offsetof()
macro from <stddef.h>:

printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));

However, the alignment requirements for various data types are
also entirely up to the implementation. Thus, different compilers
may pad the same source-code struct differently to satisfy their
differing alignment needs, and the values printed by this code may
differ from one system to another. (Except that the offset of `x'
will always be zero; no padding before the first element.)
 
BGB

Can anyone give the rule behind how structure byte padding works?

Does it depend on machine word size, the size of the largest data type, or
something else?

as others have noted, the specifics are somewhat compiler/target specific.


however, there are a few common "rules of thumb" (for compilers/targets
which use padding):
most base types have a power-of-2 size (note 1);
most base types require an alignment which is the same as their size
(note 2) often up to a certain limit (note 3);
....

note 1: except "long double", which even on x86, differs widely between
compilers and CPU mode. 80, 96, and 128 bit storage sizes exist, as well
as some compilers which simply treat them as double.

note 2: this is not always consistent, as targets may require an
alignment smaller than the size for some types. an example is in 32-bit
x86, where sometimes "long long" will only require a 32-bit alignment
despite being a 64 bit type, and other compilers will still align it to
64 bits out of principle.

note 3: often an architecture will only care about alignment up to a
certain point (such as the native word size, address size, or bus
width), and past this point no greater alignment is needed (even if
larger sizes may exist). for example, on x86 at present such limit is 16
bytes (128 bits), but this may change later if/when larger CPU registers
are added...


so, usual strategy:
for each struct member, the compiler figures out the needed alignment and the
current offset within the struct (directly following the prior member);
if the offset is not aligned, it is padded up to the needed alignment;
following the last member, the struct may in turn be padded up to its
own needed alignment (so that it fits nicely into arrays), which is
usually the greatest alignment needed by any member of the struct.


or such...
 
sandeep

Eric said:
The implementation is free to use as much or as little padding
as it wants, and to arrange the padding any way it wants, provided any
padding bytes come after struct elements (that is, there can be no
padding before the first element). Usually, an implementation will
insert the smallest amount of padding necessary to satisfy the alignment
requirements of the element's own type. For example, on a system where
a `double' is eight bytes long and must be aligned on a four-byte
boundary, the struct

struct s { char x; double y; char z; };

... will probably have six padding bytes: three after `x' so that `y'
begins four bytes in (and will be four-byte-aligned if the struct itself
is), and another three after `z' so that in an array of `struct s'
objects the second array element will be four-byte-aligned if the array
itself is. If you want to discover how a given implementation has
padded a given struct, you can use the offsetof() macro from <stddef.h>:

printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));

Unfortunately though, this code will invoke an undefined behavior on an
implementation where sizeof(struct s) is bigger than INTMAX. I would
advise using the %z argument to printf, this matches the return type of
sizeof() and offsetof() so no explicit casts will be needed.
 
Keith Thompson

sandeep said:
Unfortunately though, this code will invoke an undefined behavior on an
implementation where sizeof(struct s) is bigger than INTMAX. I would
advise using the %z argument to printf, this matches the return type of
sizeof() and offsetof() so no explicit casts will be needed.

A struct containing a char, a double, and a char is vanishingly
unlikely to exceed INT_MAX bytes.

But yes, using "%zu" would make the code a bit cleaner (assuming your
implementation supports it; not all do).
 
Ben Bacarisse

sandeep said:
Eric Sosman writes:

Unfortunately though, this code will invoke an undefined behavior on an
implementation where sizeof(struct s) is bigger than INTMAX.

It's not undefined behaviour -- it's implementation-defined.

I would
advise using the %z argument to printf,

Presumably you mean %zu. 'z' is just a length modifier.

this matches the return type of
sizeof() and offsetof() so no explicit casts will be needed.

If you don't have a C99 version of printf, the most portable solution is
to cast to unsigned long (so there is not even any implementation-
defined behaviour) and use %lu as the format.

However (as I am sure you know) even this advice is over the top for the
code in question!
 
Keith Thompson

Ben Bacarisse said:
It's not undefined behaviour -- it's implementation-defined.
[...]

An overflowing conversion to a signed type either yields an
implementation-defined result or raises an implementation-defined signal
(C99 6.3.1.3p3). The consequences of raising an implementation-defined
signal are (at least potentially) undefined.

The permission to raise a signal is new in C99, and I've never
heard of any compiler taking advantage of it.
 
BGB

A struct containing a char, a double, and a char is vanishingly
unlikely to exceed INT_MAX bytes.

But yes, using "%zu" would make the code a bit cleaner (assuming your
implementation supports it; not all do).

a struct exceeding INT_MAX bytes on any "reasonable" architecture seems
itself exceedingly unlikely...

on a 16-bit target, having a struct this large would be itself a problem
(yes, yes, say on DOS one could have a far pointer and a 64kB struct,
but how likely is this?...).

on most 32-bit systems, this can't practically happen (would need a 2GB
struct, which would have problems fitting into most address spaces).

on 64-bit systems, it could happen, but seriously, how likely is it in
the near future that there will be >=2GB structs?...

unless, maybe:
struct foo_s
{
int arr[1000][1000][1000];
};


more subtly, there is the question of whether existing 64-bit systems have
memory managers which allow objects this large (such as via
malloc/free...).

or, additionally, the last time I did a multi-GB memory allocation (on
64-bit Windows, via "VirtualAlloc()"...), the computer lagged so hard
(due to swapping) that I worried a crash was likely (although, I changed
it to not use COMMIT on the memory, and problem fixed...).


OT:

mostly though this was for a region for my "code/data/bss heap":
basically, for dynamically generated machine code, which has a +-2GB limit.
x86-64 doesn't allow direct 64-bit memory addressing or jumps, meaning
one either has to load addresses into a register and use an indirect
addressing, or use the new RIP-relative addressing and live with a +-2GB
limit, or have all code/data/bss sections within the lower 4GB.

but, if one uses a single 2GB region, they can assure that any local
accesses will be within the +-2GB window, and thus use the cheaper
direct addressing (non-local calls then being handled via trampoline
thunks, and non-local global variables being assumed to be invalid).
 
Ben Bacarisse

Keith Thompson said:
Ben Bacarisse said:
It's not undefined behaviour -- it's implementation-defined.
[...]

An overflowing conversion to a signed type either yields an
implementation-defined result or raises an implementation-defined signal
(C99 6.3.1.3p3). The consequences of raising an implementation-defined
signal are (at least potentially) undefined.

I don't see how except as a rather extreme reading of the standard. The
implementation-defined signal must be "set" to either SIG_IGN or
SIG_DFL. The SIG_IGN case is well-defined; that of SIG_DFL says that
"default handling for that signal will occur". That's maybe a bit vague
but J.3.2 says of implementation-defined behaviour that "[t]he set of
signals, their semantics, and their default handling" must be
documented.

Of course, you could say that the implementation may document the
default handling as being "undefined behaviour" but that seems to me to
be a perverse interpretation. In effect it requires that
implementation-defined behaviour may be defined as undefined!

<snip>
 
Tim Rentsch

Keith Thompson said:
Ben Bacarisse said:
It's not undefined behaviour -- it's implementation-defined.
[...]

An overflowing conversion to a signed type [...snip...]

Nit: an out-of-range conversion. "Overflow", as used in
the Standard, is something else (admittedly similar but
still something else).
 
Tim Rentsch

Ben Bacarisse said:
Keith Thompson said:
Ben Bacarisse said:
Eric Sosman writes:
<snip>
struct s { char x; double y; char z; };
<snip>
printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));

Unfortunately though, this code will invoke an undefined behavior on an
implementation where sizeof(struct s) is bigger than INTMAX.

It's not undefined behaviour -- it's implementation-defined.
[...]

An overflowing conversion to a signed type either yields an
implementation-defined result or raises an implementation-defined signal
(C99 6.3.1.3p3). The consequences of raising an implementation-defined
signal are (at least potentially) undefined.

I don't see how except as a rather extreme reading of the standard.
[snip elaboration]

Because, for example, an implementation can choose to specify
the behavior of the default signal handler by giving a
function body that would exhibit undefined behavior in some
code paths under some conditions (such as trying to convert
the bit pattern corresponding to negative zero on a machine
that uses ones complement but doesn't support negative
zeroes).
 
