Can anyone give the rule behind how structure byte padding works?

Does it depend on the machine word size, the size of the largest data type, or
something else?

2. ### bert (Guest)

<> wrote:
> can any one give rule behind the how structure byte padding works
>
> Is it depends on machine word size or size of the largest data type or
> something else.

No. Whatever one implementation does,
another one can do quite differently,
so long as well-defined code (that is,
code that does not depend on how the
padding is laid out) behaves
as it is expected to.

bert, Feb 24, 2011

3. ### Eric Sosman (Guest)

> can any one give rule behind the how structure byte padding works

There can be padding after any element, including the last.
(Bit-field elements are special, and complicated, so let's just
ignore them -- besides, you can't point at them anyhow, so their
position within the larger struct doesn't matter much.)

> Is it depends on machine word size or size of the largest data type or
> something else.

The implementation is free to use as much or as little padding
as it wants, and to arrange the padding any way it wants, provided
any padding bytes come after struct elements (that is, there can be
no padding before the first element). Usually, an implementation
will insert the smallest amount of padding necessary to satisfy the
alignment requirements of the element's own type. For example, on
a system where a `double' is eight bytes long and must be aligned
on a four-byte boundary, the struct

struct s { char x; double y; char z; };

... will probably have six padding bytes: three after `x' so that
`y' begins four bytes in (and will be four-byte-aligned if the
struct itself is), and another three after `z' so that in an array
of `struct s' objects the second array element will be four-byte-
aligned if the array itself is. If you want to discover how a given
implementation has padded a given struct, you can use the offsetof()
macro from <stddef.h>:

printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));

However, the alignment requirements for various data types are
also entirely up to the implementation. Thus, different compilers
may pad the same source-code struct differently to satisfy their
differing alignment needs, and the values printed by this code may
differ from one system to another. (Except that the offset of `x'
will always be zero; no padding before the first element.)

--
Eric Sosman

Eric Sosman, Feb 24, 2011
4. ### BGB (Guest)

> can any one give rule behind the how structure byte padding works
>
> Is it depends on machine word size or size of the largest data type or
> something else.

as others have noted, the specifics are somewhat compiler/target specific.

however, there are a few common "rules of thumb" (for compilers/targets
most base types have a power-of-2 size (note 1);
most base types require an alignment which is the same as their size
(note 2) often up to a certain limit (note 3);
....

note 1: except "long double", which even on x86, differs widely between
compilers and CPU mode. 80, 96, and 128 bit storage sizes exist, as well
as some compilers which simply treat them as double.

note 2: this is not always consistent, as targets may require an
alignment smaller than the size for some types. an example is in 32-bit
x86, where sometimes "long long" will only require a 32-bit alignment
despite being a 64 bit type, and other compilers will still align it to
64 bits out of principle.

note 3: often an architecture will only care about alignment up to a
certain point (such as the native word size, address size, or bus
width), and past this point no greater alignment is needed (even if
larger sizes may exist). for example, on x86 at present such limit is 16
bytes (128 bits), but this may change later if/when larger CPU registers

so, usual strategy:
for each struct member, it figures out the needed alignment, and the
current offset within the struct (directly following the prior member);
if the offset is not aligned, it is padded up to the needed alignment;
following the last member, the struct may be in-turn padded up to its
own needed alignment (so they can go nicely into arrays), which is
usually that of the greatest needed alignment within the struct.

or such...

BGB, Feb 24, 2011
5. ### sandeep (Guest)

Eric Sosman writes:
> The implementation is free to use as much or as little padding
> as it wants, and to arrange the padding any way it wants, provided any
> padding bytes come after struct elements (that is, there can be no
> padding before the first element). Usually, an implementation will
> insert the smallest amount of padding necessary to satisfy the alignment
> requirements of the element's own type. For example, on a system where
> a `double' is eight bytes long and must be aligned on a four-byte
> boundary, the struct
>
> struct s { char x; double y; char z; };
>
> ... will probably have six padding bytes: three after `x' so that `y'
> begins four bytes in (and will be four-byte-aligned if the struct itself
> is), and another three after `z' so that in an array of `struct s'
> objects the second array element will be four-byte- aligned if the array
> itself is. If you want to discover how a given implementation has
> padded a given struct, you can use the offsetof() macro from <stddef.h>:
>
> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));

Unfortunately though, this code will invoke an undefined behavior on an
implementation where sizeof(struct s) is bigger than INTMAX. I would
advise using the %z argument to printf, this matches the return type of
sizeof() and offsetof() so no explicit casts will be needed.

sandeep, Feb 24, 2011
6. ### Keith Thompson (Guest)

sandeep <> writes:
> Eric Sosman writes:
>> The implementation is free to use as much or as little padding
>> as it wants, and to arrange the padding any way it wants, provided any
>> padding bytes come after struct elements (that is, there can be no
>> padding before the first element). Usually, an implementation will
>> insert the smallest amount of padding necessary to satisfy the alignment
>> requirements of the element's own type. For example, on a system where
>> a `double' is eight bytes long and must be aligned on a four-byte
>> boundary, the struct
>>
>> struct s { char x; double y; char z; };
>>
>> ... will probably have six padding bytes: three after `x' so that `y'
>> begins four bytes in (and will be four-byte-aligned if the struct itself
>> is), and another three after `z' so that in an array of `struct s'
>> objects the second array element will be four-byte- aligned if the array
>> itself is. If you want to discover how a given implementation has
>> padded a given struct, you can use the offsetof() macro from <stddef.h>:
>>
>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>
> Unfortunately though, this code will invoke an undefined behavior on an
> implementation where sizeof(struct s) is bigger than INTMAX. I would
> advise using the %z argument to printf, this matches the return type of
> sizeof() and offsetof() so no explicit casts will be needed.

A struct containing a char, a double, and a char is vanishingly
unlikely to exceed INT_MAX bytes.

But yes, using "%zu" would make the code a bit cleaner (assuming your
implementation supports it; not all do).

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Feb 24, 2011
7. ### Ben Bacarisse (Guest)

sandeep <> writes:

> Eric Sosman writes:

<snip>
>> struct s { char x; double y; char z; };

<snip>
>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>
> Unfortunately though, this code will invoke an undefined behavior on an
> implementation where sizeof(struct s) is bigger than INTMAX.

It's not undefined behaviour -- it's implementation-defined.

> I would
> advise using the %z argument to printf,

Presumably you mean %zu. 'z' is just a length modifier.

> this matches the return type of
> sizeof() and offsetof() so no explicit casts will be needed.

If you don't have a C99 version of printf, the most portable solution is
to cast to unsigned long (so there is not even any implementation-
defined behaviour) and use %lu as the format.

However (as I am sure you know) even this advice is over the top for the
code in question!

--
Ben.

Ben Bacarisse, Feb 24, 2011
8. ### Keith Thompson (Guest)

Ben Bacarisse <> writes:
> sandeep <> writes:
>
>> Eric Sosman writes:

> <snip>
>>> struct s { char x; double y; char z; };

> <snip>
>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>
>> Unfortunately though, this code will invoke an undefined behavior on an
>> implementation where sizeof(struct s) is bigger than INTMAX.

>
> It's not undefined behaviour -- it's implementation-defined.

[...]

An overflowing conversion to a signed type either yields an
implementation-defined result or raises an implementation-defined signal
(C99 6.3.1.3p3). The consequences of raising an implementation-defined
signal are (at least potentially) undefined.

The permission to raise a signal is new in C99, and I've never
heard of any compiler taking advantage of it.


Keith Thompson, Feb 24, 2011
9. ### BGB (Guest)

On 2/24/2011 3:33 PM, Keith Thompson wrote:
> sandeep<> writes:
>> Eric Sosman writes:
>>> The implementation is free to use as much or as little padding
>>> as it wants, and to arrange the padding any way it wants, provided any
>>> padding bytes come after struct elements (that is, there can be no
>>> padding before the first element). Usually, an implementation will
>>> insert the smallest amount of padding necessary to satisfy the alignment
>>> requirements of the element's own type. For example, on a system where
>>> a `double' is eight bytes long and must be aligned on a four-byte
>>> boundary, the struct
>>>
>>> struct s { char x; double y; char z; };
>>>
>>> ... will probably have six padding bytes: three after `x' so that `y'
>>> begins four bytes in (and will be four-byte-aligned if the struct itself
>>> is), and another three after `z' so that in an array of `struct s'
>>> objects the second array element will be four-byte- aligned if the array
>>> itself is. If you want to discover how a given implementation has
>>> padded a given struct, you can use the offsetof() macro from <stddef.h>:
>>>
>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>
>> Unfortunately though, this code will invoke an undefined behavior on an
>> implementation where sizeof(struct s) is bigger than INTMAX. I would
>> advise using the %z argument to printf, this matches the return type of
>> sizeof() and offsetof() so no explicit casts will be needed.

>
> A struct containing a char, a double, and a char is vanishingly
> unlikely to exceed INT_MAX bytes.
>
> But yes, using "%zu" would make the code a bit cleaner (assuming your
> implementation supports it; not all do).
>

a struct exceeding INT_MAX bytes on any "reasonable" architecture seems
itself exceedingly unlikely...

on a 16-bit target, having a struct this large would be itself a problem
(yes, yes, say on DOS one could have a far pointer and a 64kB struct,
but how likely is this?...).

on most 32-bit systems, this can't practically happen (would need a 2GB
struct, which would have problems fitting into most address spaces).

on 64-bit systems, it could happen, but seriously, how likely is it in
the near future that there will be >=2GB structs?...

unless, maybe:
struct foo_s
{
    int arr[1000][1000][1000];
};

more subtly, there is the issue of whether existing 64-bit systems have
memory managers which allow objects this large (such as via
malloc/free...).

or, additionally, the last time I did a multi-GB memory allocation (on
64-bit Windows, via "VirtualAlloc()"...), the computer lagged so hard
(due to swapping) that I worried a crash was likely (although, I changed
it to not use COMMIT on the memory, and problem fixed...).

OT:

mostly though this was for a region for my "code/data/bss heap":
basically, for dynamically generated machine code, which has a +-2GB limit.
x86-64 doesn't allow direct 64-bit memory addressing or jumps, meaning
one either has to load addresses into a register and use an indirect
addressing, or use the new RIP-relative addressing and live with a +-2GB
limit, or have all code/data/bss sections within the lower 4GB.

but, if one uses a single 2GB region, they can assure that any local
accesses will be within the +-2GB window, and thus use the cheaper
direct addressing (non-local calls then being handled via trampoline
thunks, and non-local global variables being assumed to be invalid).

BGB, Feb 24, 2011
10. ### Ben Bacarisse (Guest)

Keith Thompson <> writes:

> Ben Bacarisse <> writes:
>> sandeep <> writes:
>>
>>> Eric Sosman writes:

>> <snip>
>>>> struct s { char x; double y; char z; };

>> <snip>
>>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>>
>>> Unfortunately though, this code will invoke an undefined behavior on an
>>> implementation where sizeof(struct s) is bigger than INTMAX.

>>
>> It's not undefined behaviour -- it's implementation-defined.

> [...]
>
> An overflowing conversion to a signed type either yields an
> implementation-defined result or raises an implementation-defined signal
> (C99 6.3.1.3p3). The consequences of raising an implementation-defined
> signal are (at least potentially) undefined.

I don't see how, except as a rather extreme reading of the standard. The
implementation-defined signal must be "set" to either SIG_IGN or
SIG_DFL. The SIG_IGN case is well-defined; that of SIG_DFL says that
"default handling for that signal will occur". That's maybe a bit vague
but J.3.2 says of implementation-defined behaviour that "[t]he set of
signals, their semantics, and their default handling" must be
documented.

Of course, you could say that the implementation may document the
default handling as being "undefined behaviour", but that seems to me to be a
perverse interpretation. In effect it requires that
implementation-defined behaviour may be defined as undefined!

<snip>
--
Ben.

Ben Bacarisse, Feb 24, 2011
11. ### Tim Rentsch (Guest)

Keith Thompson <> writes:

> Ben Bacarisse <> writes:
>> sandeep <> writes:
>>
>>> Eric Sosman writes:

>> <snip>
>>>> struct s { char x; double y; char z; };

>> <snip>
>>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>>
>>> Unfortunately though, this code will invoke an undefined behavior on an
>>> implementation where sizeof(struct s) is bigger than INTMAX.

>>
>> It's not undefined behaviour -- it's implementation-defined.

> [...]
>
> An overflowing conversion to a signed type [...snip...]

Nit: an out-of-range conversion. "Overflow", as used in
the Standard, is something else (admittedly similar but
still something else).

Tim Rentsch, Mar 11, 2011
12. ### Tim Rentsch (Guest)

Ben Bacarisse <> writes:

> Keith Thompson <> writes:
>
>> Ben Bacarisse <> writes:
>>> sandeep <> writes:
>>>
>>>> Eric Sosman writes:
>>> <snip>
>>>>> struct s { char x; double y; char z; };
>>> <snip>
>>>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>>>
>>>> Unfortunately though, this code will invoke an undefined behavior on an
>>>> implementation where sizeof(struct s) is bigger than INTMAX.
>>>
>>> It's not undefined behaviour -- it's implementation-defined.

>> [...]
>>
>> An overflowing conversion to a signed type either yields an
>> implementation-defined result or raises an implementation-defined signal
>> (C99 6.3.1.3p3). The consequences of raising an implementation-defined
>> signal are (at least potentially) undefined.

>
> I don't see how, except as a rather extreme reading of the standard.
> [snip elaboration]

Because, for example, an implementation can choose to specify
the behavior of the default signal handler by giving a
function body that would exhibit undefined behavior in some
code paths under some conditions (such as trying to convert
the bit pattern corresponding to negative zero on a machine
that uses ones complement but doesn't support negative
zeroes).

Tim Rentsch, Mar 11, 2011