struct calcsize discrepancy?

Chris Angelico

Glen said:
In IPython:
16

This doesn't make sense to me.  Can anyone explain?

Same thing happens in CPython, and it looks to be the result of alignment.
12

The eight-byte integer is aligned on an eight-byte boundary, so when
it follows a four-byte string, you get four padding bytes inserted.
Put them in the other order, and the padding disappears.
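A quick way to check this is to compare both orderings. This is only a sketch: the format strings "4sQ" and "Q4s" are an assumption based on the description above (a four-byte string and an eight-byte unsigned integer), and the padded size depends on the platform's alignment rules.

```python
import struct

# Assumed formats: "4s" is a four-byte string, "Q" an unsigned long long.
# With no prefix character, struct uses native size and alignment.
sizes = {fmt: struct.calcsize(fmt) for fmt in ("4sQ", "Q4s")}

for fmt, size in sizes.items():
    print(fmt, size)
# Where "Q" is aligned to eight bytes, "4sQ" needs four pad bytes before
# the integer, while "Q4s" needs none between the two items.
```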

(Caveat: I don't use the struct module much, this is based on conjecture.)

ChrisA
 
Glen Rice

When you mix different types in a struct, padding can be inserted
between the items. In this case the 8-byte unsigned long long must always
start on an 8-byte boundary, so 4 padding bytes are inserted.

See http://docs.python.org/library/struct.html?highlight=struct#byte-order-size-and-alignment
and in particular the first sentence:

"By default, C types are represented in the machine's native format and
byte order, and properly aligned by skipping pad bytes if necessary
(according to the rules used by the C compiler)."
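To see the skipped pad bytes themselves, one can pack a value and look at what sits between the two fields. A rough sketch, assuming the format in question is "4sQ" and using made-up values (b"abcd" and 1):

```python
import struct

# Assumed format: a four-byte string followed by an unsigned long long,
# packed with native size and alignment (no prefix character).
packed = struct.pack("4sQ", b"abcd", 1)

# Everything between the string and the trailing integer is padding.
pad = packed[4:len(packed) - struct.calcsize("Q")]
text, number = struct.unpack("4sQ", packed)

print(len(packed), pad)   # total length and the pad bytes, if any
print(text, number)       # the round trip ignores the padding
```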

Chris / Duncan, Thanks. I missed that in the docs.
 
Peter Otten

Glen said:
In IPython:
16

This doesn't make sense to me. Can anyone explain?

A C compiler can insert padding bytes into a struct:

"""By default, the result of packing a given C struct includes pad bytes in
order to maintain proper alignment for the C types involved; similarly,
alignment is taken into account when unpacking. This behavior is chosen so
that the bytes of a packed struct correspond exactly to the layout in memory
of the corresponding C struct. To handle platform-independent data formats
or omit implicit pad bytes, use standard size and alignment instead of
native size and alignment: see Byte Order, Size, and Alignment for details.
"""

http://docs.python.org/library/struct.html#struct-alignment

You can avoid this by specifying a non-native byte order (little endian, big
endian, or "network"):
12
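A minimal sketch of the prefixes, assuming the format under discussion is "4sQ". Note that "=" keeps the native byte order but still switches to standard sizes with no alignment, so it drops the padding too.

```python
import struct

# Assumed format from the discussion: "4sQ".
# "<" little-endian, ">" big-endian, "!" network, "=" native byte order;
# all four use standard sizes and no alignment padding.
for prefix in "<>!=":
    print(prefix, struct.calcsize(prefix + "4sQ"))   # 12 each: 4 + 8

# "@" is the default: native size and alignment, so padding may reappear.
print("@", struct.calcsize("@4sQ"))
```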
 
Dave Angel

Chris said:
Same thing happens in CPython, and it looks to be the result of alignment.

12

The eight-byte integer is aligned on an eight-byte boundary, so when
it follows a four-byte string, you get four padding bytes inserted.
Put them in the other order, and the padding disappears.

NOT disappears. In C, the padding to the largest alignment occurs at
the end of a structure as well as between items. Otherwise, an array of
the struct would not be safely aligned. If you have an 8-byte item
followed by a 4-byte item, the total size is 16.
 
Chris Angelico

Dave said:
NOT disappears. In C, the padding to the largest alignment occurs at the
end of a structure as well as between items. Otherwise, an array of the
struct would not be safely aligned. If you have an 8-byte item followed by a
4-byte item, the total size is 16.

That's padding of the array, not of the structure. But you're right in
that removing padding from inside the structure will in this case
result in padding outside the structure. However, in more realistic
scenarios, it's often possible to truly eliminate padding by ordering
members appropriately.

ChrisA
 
Mark Dickinson

Chris said:
That's padding of the array, not of the structure.

That's a strange way to think of it, especially since the padding also
happens for a single struct object when there's no array present. I
find it cleaner to think of C as having no padding in arrays, but
padding at the end of a struct. See C99 6.7.2.1p15: 'There may be
unnamed padding at the end of a structure or union.' There's no
mention in the standard of padding for arrays.
 
Chris Angelico

Mark said:
That's a strange way to think of it, especially since the padding also
happens for a single struct object when there's no array present.  I
find it cleaner to think of C as having no padding in arrays, but
padding at the end of a struct.  See C99 6.7.2.1p15: 'There may be
unnamed padding at the end of a structure or union.'  There's no
mention in the standard of padding for arrays.

May be, yes, but since calcsize() is returning 12 when the elements
are put in the other order, it would seem to be not counting such
padding. The way I look at it, padding is always used to place the
beginning of something; in an array, it places the beginning of the
second element on a convenient boundary, rather than filling out the
first element to that boundary.

I tried a similar thing with a couple of C compilers, and both of them
gave the same sizeof() value for both orderings, which would imply
that they _do_ include such padding in the structure's end. My
statement that the padding was removed was based solely on
calcsize()'s different result.
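For what it's worth, the struct docs describe a way to ask for that trailing padding explicitly: end the format with a zero-repeat code for the type whose alignment you want. A small sketch, assuming the two formats being compared are "Q4s" and "4sQ":

```python
import struct

plain = struct.calcsize("Q4s")        # nothing added after the string
padded = struct.calcsize("Q4s0Q")     # "0Q" pads the end to Q's alignment
other_order = struct.calcsize("4sQ")  # padding sits between the items

print(plain, padded, other_order)
# With the trailing "0Q", both orderings report the same size, which is
# what the C compilers' sizeof() results suggest for the two structs.
```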

ChrisA
 
Nobody

Indeed. That's arguably a bug in the struct module,

There's no "arguably" about it. The documentation says:

Native size and alignment are determined using the C compiler’s sizeof
expression.

But given:

struct { unsigned long long a; char b[4]; } foo;
struct { char b[4]; unsigned long long a; } bar;

sizeof(foo) will always equal sizeof(bar). If long long is 8 bytes and has
8-byte alignment, both will be 16.

If you want consistency with the in-memory representation used by
C/C++ programs (and the on-disk representation used by C/C++ programs
which write the in-memory representation directly to file), use ctypes;
e.g.:
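>>> from ctypes import Structure, c_char, c_ulonglong, sizeof
>>> class Foo(Structure):
...     _fields_ = [
...         ("a", c_ulonglong),
...         ("b", c_char * 4)]
...
>>> sizeof(Foo)
16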
 
