padding?


Zach

I looked in the index of K&R and couldn't find anything on padding.
Could someone please explain what padding is in C programming and
illustrate it with some code. I heard it is often used in constructing
network packets.

Zach
 

Squeamizh

Zach said:
I looked in the index of K&R and couldn't find anything on
padding. Could someone please explain what padding is in C
programming and illustrate it with some code. I heard it is often
used in constructing network packets.

Padding is simply a region of data storage that it is convenient to
"waste". Precisely who is convenienced depends on the situation.
[...]

The tm_yday field in a struct tm is essentially padding as far as
many people are concerned (but those who /do/ use it would
disagree!).

Then I would strongly suggest that it isn't padding. Correct me if
I'm wrong; you seem to be saying that padding is any data that you
don't find useful in a particular situation.
 

Spiros Bousbouras

In C, padding is often used by compilers between struct members, for
example:

struct s
{
    char c;
    /* compiler might insert 7 padding bytes here */
    double d;
};

This is because the target CPU might have alignment requirements on
double access, or because double access is faster when aligned to,
e.g., an 8-byte boundary.

Hence in C, you cannot assume that

struct s mys;

memcpy(&mys, ...);

will work, due to potential padding bytes. To load some data into a
struct, you need to do that member by member, unless using a
non-standard pragma for struct packing.

What won't work? It's not clear what you have in mind for "...".

If structures s1 and s2 are of the same type then you can do s1=s2
or memmove(&s1, &s2, sizeof(s2)). Both will "load some data"
into s1 without explicitly assigning to each member.
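To make both points concrete, here is a minimal sketch. The helper names padding_after_c and copy_with_memcpy are mine, not from the thread, and the amount of padding it reports is whatever your implementation chooses (7 is common but not guaranteed).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct s {
    char c;
    double d;
};

/* Number of padding bytes the implementation placed between c and d.
   The value is implementation-defined; it may well be 0 on some targets. */
size_t padding_after_c(void)
{
    return offsetof(struct s, d) - (offsetof(struct s, c) + sizeof(char));
}

/* Copy one struct s into another object of the same type with memcpy.
   Any padding bytes are copied along with the members; their values
   are never inspected, so this is fine. */
struct s copy_with_memcpy(const struct s *src)
{
    struct s dst;
    assert(src != NULL);
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

Plain assignment (s1 = s2) does the same job as copy_with_memcpy; the trouble only starts when the source bytes come from somewhere that does not share the struct's layout.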
 

Ian Collins

Tor said:
Hence in C, you cannot assume that

struct s mys;

memcpy(&mys, ...);

will work, due to potential padding bytes. To load some data into a
struct, you need to do that member by member, unless using a
non-standard pragma for struct packing.

Or use a static initialiser, or even just a plain old copy.
 

Bjarni Juliusson

Tor said:
I had a buffer in mind, which is the typical case when a network packet
arrives or you read some data from disk.

Put another way, if you have a buffer

char buf[1024];

which has some received network packet in it, you can't do

struct packet_header *header=(struct packet_header *)buf;

and start reading the members of header and expect everything to be
fine. The packet has all members aligned according to the network
protocol, probably with very little padding, and the compiler probably
aligns for instance ints and chars differently, inserting padding in
between them or even reordering them inside the struct. There are no
guarantees (well almost no guarantees, go read the standard).

So, as an example, say you have a network protocol with a header that
contains an 8 bit packet type and a 16 bit payload length. It probably
looks like this:

offset 0: type [byte]
offset 1: length [high byte] [low byte]

Now if you declare a struct for that as

struct header{
char type;
short length;
};

on a little-endian 32 bit computer with no native 16 bit data type and a
32 bit alignment restriction, you'll probably get a struct that looks
like this in memory:

offset 0: type [byte]
offset 1: [padding] [padding] [padding]
offset 4: length [low byte] [2nd byte] [3rd byte] [high byte]

If you set a pointer like that to the beginning of the buffer and try to
read the members, you'll get the right packet type, but the length will
be outside of the header data and might segfault your program, or it
might point to the second byte of the payload, or it might point to
something else entirely.

Without the padding, the length member of the struct would start at the
right place but would still read four bytes instead of the two defined
in the protocol and present in the buffer, and on top of that they would
be in the wrong order.

So that was a quick lesson in padding and network byte order.
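A portable way to read that header is to pick the octets out of the buffer one at a time instead of casting. What follows is only a sketch, assuming the 3-octet layout above with the length sent high byte first. The function name parse_header and the use of fixed-width types from <stdint.h> are my own choices for illustration, not part of any real protocol library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory form of the header. Its layout (padding included) is the
   compiler's business; we never overlay it on the wire bytes. */
struct header {
    uint8_t  type;
    uint16_t length;
};

/* Decode the wire format: octet 0 is the type, octets 1 and 2 are the
   length, most significant octet first.
   Returns 0 on success, -1 if fewer than 3 octets are available. */
int parse_header(const unsigned char *buf, size_t len, struct header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 3)
        return -1;
    out->type = buf[0];
    out->length = (uint16_t)(((unsigned)buf[1] << 8) | buf[2]);
    return 0;
}
```

Because the length is assembled with shifts, the result is the same on little-endian and big-endian hosts, and no member is ever read from a misaligned address.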


Bjarni
 

Spiros Bousbouras

The packet has all members aligned according to the network
protocol, probably with very little padding, and the compiler probably
aligns for instance ints and chars differently, inserting padding in
between them or even reordering them inside the struct.

A compiler cannot reorder the fields of a structure. Paragraph 5 of
6.5.8 says:

When two pointers are compared, the result depends on the
relative locations in the address space of the objects
pointed to.
[...]
If the objects pointed to are members of the same aggregate
object, pointers to structure members declared later compare
greater than pointers to members declared earlier in the
structure,
 

Bjarni Juliusson

Spiros said:
The packet has all members aligned according to the network
protocol, probably with very little padding, and the compiler probably
aligns for instance ints and chars differently, inserting padding in
between them or even reordering them inside the struct.

A compiler cannot reorder the fields of a structure. Paragraph 5 of
6.5.8 says:

When two pointers are compared, the result depends on the
relative locations in the address space of the objects
pointed to.
[...]
If the objects pointed to are members of the same aggregate
object, pointers to structure members declared later compare
greater than pointers to members declared earlier in the
structure,

You are right, I apologise. In fact, it is stated more clearly in
paragraph 13 of 6.7.2.1:

Within a structure object, the [...] members [...] have
addresses that increase in the order in which they are declared.

I misremembered, and thought the only guarantee was that the first
element always ended up first in memory.

Can anyone tell me what the rationale is? It seems to me like it might
be sensible to, say, take all the char members in a struct and pack them
together at the end to preserve alignment of any int members without
lots of padding.


Bjarni
 

Stephen Sprunk

Bjarni said:
You are right, I apologise. In fact, it is stated more clearly in
paragraph 13 of 6.7.2.1:

Within a structure object, the [...] members [...] have
addresses that increase in the order in which they are declared.

I misremembered, and thought the only guarantee was that the first
element always ended up first in memory.

Can anyone tell me what the rationale is? It seems to me like it might
be sensible to, say, take all the char members in a struct and pack them
together at the end to preserve alignment of any int members without
lots of padding.

It flows partially from the requirement that any initial member types
common to two structs be laid out in the same order and at the same
offsets, which is used widely for crude polymorphism. The compiler
can't know when compiling unit A what other structs will be in unit B
(which may not even be written yet) and how many, if any, of their
initial members may be in common. Therefore, the only reordering of
members that _could_ potentially be allowed is fitting later members
into the padding between earlier elements.

Once you're going to put the above restriction on reordering, you don't
lose much if you ban reordering entirely and therefore comply with the
Rule of Least Surprise -- the programmer put the members in a particular
order, so it would be logical for him to expect them to be laid out that
way in memory. If he cares about the padding (which is, in most cases,
a micro-optimization and therefore Evil(tm)), he can reorder them
himself to minimize it.
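As a rough illustration of reordering by hand, compare two structs with the same members in different orders. The struct names and the helper bytes_saved_by_grouping are made up for this sketch, and the actual sizes depend on the implementation, so it reports the difference rather than assuming one.

```c
#include <assert.h>
#include <stddef.h>

/* chars and ints interleaved: each char is typically followed by padding */
struct scattered {
    char a;
    int  x;
    char b;
    int  y;
};

/* same members, ints first and chars together at the end */
struct grouped {
    int  x;
    int  y;
    char a;
    char b;
};

/* Bytes saved by the grouped ordering on this implementation
   (0 if the two layouts happen to be the same size or grouping is worse). */
size_t bytes_saved_by_grouping(void)
{
    size_t before = sizeof(struct scattered);
    size_t after  = sizeof(struct grouped);

    /* The standard guarantees declaration order is preserved in memory. */
    assert(offsetof(struct scattered, a) < offsetof(struct scattered, x));
    assert(offsetof(struct grouped, x) < offsetof(struct grouped, y));

    return before > after ? before - after : 0;
}
```

On a typical machine with 4-byte, 4-aligned int the scattered version is 16 bytes and the grouped one 12, but nothing in the standard promises those numbers.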

More importantly, though, I suspect that all known compilers at the time
followed this rule already, so it was probably a matter of C89
formalizing the behavior to keep future implementations from doing
something that wouldn't be compatible. Remember, ANSI's primary goal
was to standardize existing practice, not to create an ideal language.

S
 

Rainer Weikusat

Richard Heathfield said:
Squeamizh said:


...but that /someone/ or /something/ finds useful, and thus we can't
just leave the padding out. Yes. That's not a formal definition,
obviously, but it seems to me to be a very pragmatic way of looking
at padding.

It is nevertheless wrong: 'Padding' is additional storage beyond the
one necessary to hold, say, the data values of a C struct, which is
used to achieve some effect beyond what is specified in the
C-standard, typically, to conform to ABI-requirements regarding
alignment of objects of a particular size (for instance, '4-byte
integers must always be stored at addresses evenly divisible by four')
in order to be able to generate more efficient machine code (for
instance, because properly aligned 4-byte values can be manipulated
with machine instructions operating on 'words' of data). This is
something different than 'data members someone may consider to be
useless' (and hence, 'a waste of space').
 

Giorgos Keramidas

You are right, I apologise. In fact, it is stated more clearly in
paragraph 13 of 6.7.2.1:

Within a structure object, the [...] members [...] have
addresses that increase in the order in which they are declared.

I misremembered, and thought the only guarantee was that the first
element always ended up first in memory.

Can anyone tell me what the rationale is? It seems to me like it might
be sensible to, say, take all the char members in a struct and pack
them together at the end to preserve alignment of any int members
without lots of padding.

For a struct with no aggregate members, this may not be quite as useful
(but then I may also be missing an important detail behind the rule).
But it seems quite sensible for code that includes structs inside other
structs, i.e.:

struct methods;

struct object {
long magic;
size_t size;
long type;
struct methods *fptr;
size_t nfptr;
};

struct myobject {
struct object parent;
char mydata[10];
};

This is commonly used to 'tag' structures of different types. If the
compiler was allowed to reorder the fields of `myobject', it would be
quite unpredictable where myobject.parent would end up.
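A small sketch of how this tagging is typically used, relying on the guarantee that a pointer to a struct, suitably converted, points to its first member. The constant MYOBJECT_TYPE and the accessor object_type are hypothetical names added for the example.

```c
#include <assert.h>
#include <stddef.h>

struct methods;  /* left incomplete; only pointers to it are used */

struct object {
    long magic;
    size_t size;
    long type;
    struct methods *fptr;
    size_t nfptr;
};

struct myobject {
    struct object parent;  /* must stay the first member */
    char mydata[10];
};

#define MYOBJECT_TYPE 1L   /* hypothetical type tag */

/* Works on any struct whose first member is a struct object. */
long object_type(const struct object *obj)
{
    assert(obj != NULL);
    return obj->type;
}
```

A caller holding a struct myobject m can pass (const struct object *)&m to object_type, since no padding is allowed before the first member and &m and &m.parent therefore refer to the same address.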
 

Boon

Richard said:
Padding in network packets: there's a byte of padding at the end of
the TCP header, purely so that the data is four-byte-aligned. For
some machines, this is an advantage (although the cynical part of
me suspects it was put there purely to make the diagram look
neater). The TCP protocol insists that this padding is set to 0.

You lost me.

http://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_structure

Assuming no options (i.e. header size = 20 octets), the last two fields of a TCP
header are Checksum and Urgent pointer.

"Urgent pointer (16 bits) – if the URG flag is set, then this 16-bit field is an
offset from the sequence number indicating the last urgent data byte"

Perhaps you were thinking of headers with options?
(In that case, padding might be needed.)
Note also that six /other/ bits of the header are unused, but
reserved for future use. ("Reserved for future use" very often
means the same as "padding"!)

Only 4 bits are reserved for future use and should be set to zero.
CWR and ECE were defined in 2001.

http://tools.ietf.org/html/rfc3168

Regards.
 

karthikbalaguru

I looked in the index of K&R and couldn't find anything on padding.
Could someone please explain what padding is in C programming and
illustrate it with some code. I heard it is often used in constructing
network packets.

Padding is used for boundary alignment with respect to the processor.
It is very important to take care of boundary alignment when designing
databases in C in an embedded environment, as it improves performance
and also saves memory. So database design is very important.

Karthik Balaguru
 

Richard Bos

Rainer Weikusat said:
It is nevertheless wrong: 'Padding' is additional storage beyond the
one necessary to hold, say, the data values of a C struct, which is
used to achieve some effect beyond what is specified in the
C-standard, typically, to conform to ABI-requirements regarding
alignment of objects of a particular size (for instance, '4-byte
integers must always be stored at addresses evenly divisible by four')
in order to be able to generate more efficient machine code (for
instance, because properly aligned 4-byte values can be manipulated
with machine instructions operating on 'words' of data). This is
something different than 'data members someone may consider to be
useless' (and hence, 'a waste of space').

What's more, padding is never a data member at all. The defining
characteristic of padding is that it exists because of the space it
takes, not because of the value it may or may even never have. Any data
member, even a data member only few people find useful, should at some
point in its life have a value that those people want to refer to.
Padding may change randomly or never at all, and you can blot over it
without affecting anyone.

Richard
 

Eric Sosman

Richard said:
What's more, padding is never a data member at all. The defining
characteristic of padding is that it exists because of the space it
takes, not because of the value it may or may even never have. Any data
member, even a data member only few people find useful, should at some
point in its life have a value that those people want to refer to.
Padding may change randomly or never at all, and you can blot over it
without affecting anyone.

So I guess you'd say "unused; must be zero" bits are
not padding?
 

Tim Rentsch

What's more, padding is never a data member at all. The defining
characteristic of padding is that it exists because of the space it
takes, not because of the value it may or may even never have. Any data
member, even a data member only few people find useful, should at some
point in its life have a value that those people want to refer to.
Padding may change randomly or never at all, and you can blot over it
without affecting anyone.

In

struct s {
unsigned foo : 4;
unsigned : 12;
unsigned bas : 16;
};

would you say the data member between foo and bas is there as
padding? If we have

struct s x, y;
memset( &y, 0, sizeof y );
x = y;

is the unnamed bit-field member guaranteed to hold zeroes, or
not? If it is not guaranteed to hold zeroes (because structure
assignments are not required to copy padding bits) does that mean
it's illegal to put an unnamed bit-field member at the start of a
structure (because structures are not allowed to have padding at
the beginning)? Or is this a case of a data member that exists
just because of the space it takes, yet is not padding? But if
the values of unnamed bit-fields are supposed to be useful (ie,
and not padding), why are they indeterminate even after
initialization? (6.7.8 p 9)
 

Eric Sosman

Mark said:
I believe the original context was with respect to structs, not unused
bits in a bitfield object.

The original question mentioned padding in connection
with "constructing network packets." True, it was only
in an "I heard ..." context, but a structs-only view of
the discussion seems a bit restrictive.
 

Richard Bos

Eric Sosman said:
So I guess you'd say "unused; must be zero" bits are
not padding?

That depends on whether that is a political or a technical "must". If
there is reasonable expectation that it may in the future be used, and
can then get other values, it's not. If it's only required to be zero
out of a show of future planning, it's padding. I admit that this may,
for an outsider, be difficult to judge.

Richard
 

Richard Bos

Tim Rentsch said:
In

struct s {
unsigned foo : 4;
unsigned : 12;
unsigned bas : 16;
};

would you say the data member between foo and bas is there as
padding?

Since you can't access its value at all (stupid tricks with unsigned
char pointers aside), and that value is therefore irrelevant, yes.
If we have

struct s x, y;
memset( &y, 0, sizeof y );
x = y;

is the unnamed bit-field member guaranteed to hold zeroes, or
not?

In y, yes, but they're irrelevant; in x, they're not even guaranteed to
be zero.
(because structures are not allowed to have padding at the beginning)?

Structures are not allowed to have _implementation-inserted_ padding
_bytes_ at the beginning. You, as the user-programmer, are allowed to
add as much padding of your own as pleases you. There is nothing in the
Standard to stop you from declaring

struct t {
unsigned char padding[37];
long int single_data_member;
unsigned char more_padding[51];
};

Do not be surprised to find extra padding added after _your_ member
called padding, and before or after more_padding.

Tell me, did you _really_ not know all this, or are you being an awkward
arsehole just to make the point that you _can_ be an awkward arsehole?

Richard
 

James Kanze

Boon said:
I wrote my article after referring to:

The diagram in this RFC does show three bytes of options and one
of padding at the end. The text, however, makes it clear that
the header may end immediately after the urgent pointer field,
that the size of the options field is variable, that the amount
of padding is also variable---the sum of the sizes of the
options and the padding must be a multiple of 4.
<snip>

Then it's possible that I'm out of date. (In fact, it's quite
probable.)

RFC 793 is still the basic definition of TCP, although there are
later RFC's which update it (e.g. by specifying additional
options or additional control bits).
 

James Kanze

Put another way, if you have a buffer
char buf[1024];
which has some received network packet in it, you can't do
struct packet_header *header=(struct packet_header *)buf;
and start reading the members of header and expect everything
to be fine. The packet has all members aligned according to
the network protocol, probably with very little padding, and
the compiler probably aligns for instance ints and chars
differently, inserting padding in between them or even
reordering them inside the struct.

And the network may use a different representation for negative
values, or even a different byte size. The RFC's specify
octets, the C standard bytes. Octets are eight bits, bytes may
be any number of bits, depending on the hardware (although I've
never heard of less than six, the C standard requires at least
eight, and Posix does require exactly eight). Posix also
requires 2's complement, so on a Posix compliant system, the
only difference in representation can be byte order.

In general, any time you're moving between network data and
internal data, you need some sort of marshalling code.
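As one possible shape for such marshalling code, here is a sketch for 16-bit values, assuming the wire format is two octets, most significant first, with negative values in 2's complement. The function names are invented for this example. The signed decode is written with arithmetic only, so it does not depend on how the host represents negative numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Store an unsigned 16-bit value as two octets, most significant first. */
void put_be16(unsigned char *out, uint16_t v)
{
    assert(out != 0);
    out[0] = (unsigned char)((v >> 8) & 0xFFu);
    out[1] = (unsigned char)(v & 0xFFu);
}

/* Read two octets, most significant first, as an unsigned 16-bit value. */
uint16_t get_be16(const unsigned char *in)
{
    assert(in != 0);
    return (uint16_t)((((unsigned)in[0] & 0xFFu) << 8) | ((unsigned)in[1] & 0xFFu));
}

/* Signed encode: conversion to uint16_t is defined modulo 65536, which
   yields the 2's complement bit pattern whatever the host uses. */
void put_be_s16(unsigned char *out, int16_t v)
{
    put_be16(out, (uint16_t)v);
}

/* Signed decode: map 0x8000..0xFFFF back to -32768..-1 by subtraction,
   avoiding an implementation-defined unsigned-to-signed conversion. */
int16_t get_be_s16(const unsigned char *in)
{
    uint16_t u = get_be16(in);
    return (u < 0x8000u) ? (int16_t)u : (int16_t)((int32_t)u - 65536);
}
```

Masking each octet with 0xFF also keeps the code correct on a machine whose bytes are wider than eight bits, as long as each array element carries one octet.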
 
