struct size confusion:

Michael Yanowitz

Hello:

I am relatively new to Python and this is my first post on
this mailing list.

I am confused as to why I am getting size differences in the following
cases:
6

Why is it 8 bytes in the third case and why would it be only 6 bytes
in the last case if it is 8 in the previous?

I tried specifying big endian and little endian and they both have
the same results.

I suspect, there is some kind of padding involved, but it does not
seem to be done consistently or in a recognizable method.

I will be reading shorts and longs sent from C into Python
through a socket.

Suppose I know I am getting 34 bytes, and the last 6 bytes are a 2-byte
word followed by a 4-byte int, how can I be assured that it will be in that
format?

In a test, I am sending data in this format:
PACK_FORMAT = "HBBBBHBBBBBBBBBBBBBBBBBBBBHI"
which is 34 bytes
However, when I receive the data, I am using the format:
UNPACK_FORMAT = "!HBBBBHBBBBBBBBBBBBBBBBBBBBHHI"
which has the extra H in the second to last position to make them
compatible, but that makes it 36 bytes. I am trying to come up with
some explanation as to where the extra 2 bytes come from.

Thanks in advance:
 
Diez B. Roggisch

Michael said:
Why is it 8 bytes in the third case and why would it be only 6 bytes
in the last case if it is 8 in the previous?

From TFM:

"""
Native size and alignment are determined using the C compiler's sizeof
expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for
any type (so you have to use pad bytes); short is 2 bytes; int and long are
4 bytes; long long (__int64 on Windows) is 8 bytes; float and double are
32-bit and 64-bit IEEE floating point numbers, respectively
"""
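
The "pad bytes" remark in the quoted documentation can be illustrated with a minimal sketch. The values 1 and 2 are arbitrary sample data, not taken from the original post, and the `2x` padding is an assumption about how a typical C compiler would lay out a short followed by an int.

```python
import struct

# Standard size, no alignment: 'H' then 'I' occupy exactly 2 + 4 bytes.
packed = struct.pack('!HI', 1, 2)
print(len(packed), packed.hex())   # 6 000100000002

# Two explicit pad bytes ('2x') push the int to offset 4, which is what
# a C compiler typically does for a short followed by an int.
padded = struct.pack('!H2xI', 1, 2)
print(len(padded), padded.hex())   # 8 0001000000000002

# Pad bytes are skipped on unpacking; only the real fields come back.
print(struct.unpack('!H2xI', padded))   # (1, 2)
```

With a standard-size prefix such as `!`, padding only appears where the format string asks for it, so both ends of a connection can agree on the exact layout.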

Here is how to achieve the desired results (on my system at least):

Regards,

Diez
 
Kent Johnson

Michael said:
Hello:

I am relatively new to Python and this is my first post on
this mailing list.

I am confused as to why I am getting size differences in the following
cases:



6

Why is it 8 bytes in the third case and why would it be only 6 bytes
in the last case if it is 8 in the previous?

By default the struct module uses native byte-order and alignment which
may insert padding. In your case, the integer is forced to start on a
4-byte boundary so two pad bytes must be inserted between the short and
the int. When the int is first no padding is needed - the short starts
on a 2-byte boundary.

To eliminate the padding you should use any of the options that specify
'standard' alignment instead of native:

In [2]: struct.calcsize('I')
Out[2]: 4

In [3]: struct.calcsize('H')
Out[3]: 2

In [4]: struct.calcsize('HI')
Out[4]: 8

In [5]: struct.calcsize('IH')
Out[5]: 6

In [6]: struct.calcsize('!HI')
Out[6]: 6

In [7]: struct.calcsize('>HI')
Out[7]: 6

In [8]: struct.calcsize('<HI')
Out[8]: 6
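
The same comparison can be run as a plain script. This is only a sketch of the prefix behaviour described above: the size printed for `@` depends on the platform and compiler, while the other prefixes should report 6 everywhere.

```python
import struct

# '@' (the default) uses native sizes and alignment. The other prefixes
# use standard sizes with no alignment, so 'HI' is 2 + 4 = 6 bytes.
for prefix in ('@', '=', '<', '>', '!'):
    print(repr(prefix), struct.calcsize(prefix + 'HI'))

# Collect the standard-size results for a quick check.
standard_sizes = {p: struct.calcsize(p + 'HI') for p in ('=', '<', '>', '!')}
print(standard_sizes)
```

Omitting the prefix is the same as writing `@`, which is why a bare 'HI' can pick up pad bytes.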

Michael said:
I tried specifying big endian and little endian and they both have
the same results.

Are you sure? They should use standard alignment as in the example above.

Michael said:
I suspect, there is some kind of padding involved, but it does not
seem to be done consistently or in a recognizable method.

I will be reading shorts and longs sent from C into Python
through a socket.

Suppose I know I am getting 34 bytes, and the last 6 bytes are a 2-byte
word followed by a 4-byte int, how can I be assured that it will be in that
format?

In a test, I am sending data in this format:
PACK_FORMAT = "HBBBBHBBBBBBBBBBBBBBBBBBBBHI"
which is 34 bytes

Are you sure? Not for me:

In [9]: struct.calcsize('HBBBBHBBBBBBBBBBBBBBBBBBBBHI')
Out[9]: 36
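
To see where the two extra bytes come from, here is a rough sketch that walks the format and tracks offsets. It assumes a simplified model in which each field is aligned to its own size (B=1, H=2, I=4), which is typical but not guaranteed for native alignment. `ALIGN`, `FMT`, `WIRE` and `offsets_for` are names made up for this example, and the format is rebuilt with repeat counts rather than copied character by character.

```python
import struct

# Assumed alignment (and size) of each code in this simplified model.
ALIGN = {'B': 1, 'H': 2, 'I': 4}

# The format from the post: H, 4 B, H, 20 B, H, I.
FMT = 'H' + 'B' * 4 + 'H' + 'B' * 20 + 'HI'

def offsets_for(fmt):
    """Return ([(code, offset), ...], total_size) under the model above."""
    offsets = []
    pos = 0
    for code in fmt:
        size = ALIGN[code]
        pos += -pos % size          # round up to the next multiple of size
        offsets.append((code, pos))
        pos += size
    return offsets, pos

offsets, total = offsets_for(FMT)
print(offsets[-2:], total)          # [('H', 28), ('I', 32)] 36

# Standard size (no alignment) gives the 34 bytes the sender expected.
print(struct.calcsize('!' + FMT))   # 34

# Spelling out the two pad bytes makes the standard format 36 bytes too.
WIRE = '!' + FMT[:-1] + '2xI'
print(struct.calcsize(WIRE))        # 36
```

In this model the last H ends at offset 30, so the I is pushed to offset 32. Those two skipped bytes are what the extra H in the unpack format was absorbing.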

Kent
 
