Byte-oriented data types in Python

Ravi

I have the following packet format, which I have to send over Bluetooth.

packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
packet_data(variable)

How do I construct these using Python data types, given that int and
float have no limits and their sizes are not well defined?
 
Martin v. Löwis

packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
packet_data(variable)

How to construct these using python data types, as int and float have
no limits and their sizes are not well defined.

In Python 2.x, use the regular string type: chr(n) will create a single
byte, and the + operator will do the concatenation.

In Python 3.x, use the bytes type (bytes() instead of chr()).
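
A minimal Python 3 sketch of the packet from the question. Note that in Python 3, bytes(n) with an integer argument creates n zero bytes, so a single byte value is written as bytes([n]). The name build_packet and the range checks are assumptions made for illustration, not part of the original posts.

```python
def build_packet(packet_type, data):
    """Return type byte + length byte + payload as one bytes object."""
    if not 0 <= packet_type <= 255:
        raise ValueError("packet_type must fit in one unsigned byte")
    if len(data) > 255:
        raise ValueError("payload too long for a one-byte length field")
    # bytes([a, b]) builds a two-byte object from two small integers.
    return bytes([packet_type, len(data)]) + data

packet = build_packet(3, b"abcd")
print(packet)  # b'\x03\x04abcd'
```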

Regards,
Martin
 
Ravi

Take a look at the struct and ctypes modules.

struct is really not the right choice. It returns an expanded string of
the data, and this means larger latency over Bluetooth.

ctypes is basically for interfacing with libraries written in C
(this is what I read in the Python docs).
 
Ravi

In Python 2.x, use the regular string type: chr(n) will create a single
byte, and the + operator will do the concatenation.

In Python 3.x, use the bytes type (bytes() instead of chr()).

This looks really helpful thanks!
 
Steve Holden

Ravi said:
struct is really not the choice. it returns an expanded string of the
data and this means larger latency over bluetooth.

If you read the module documentation more carefully you will see that it
"converts" between the various native data types and character strings.
Thus each native data type occupies only as many bytes as are required
to store it in its native form (modulo any alignments needed).

Ravi said:
ctypes is basically for the interface with libraries written in C
(this I read from the python docs)

I believe it *is* the struct module you need.
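
A short Python 3 sketch of the point about sizes: with a standard byte-order prefix such as "!", struct.pack emits exactly the bytes of each field, with no padding. The values are made up for illustration.

```python
import struct

header = struct.pack("!BB", 3, 4)   # two unsigned bytes
print(header, len(header))          # b'\x03\x04' 2

# Standard sizes: no alignment padding between fields.
print(struct.calcsize("!BH"))       # 3

# Native mode ("@") may insert padding so that H is aligned;
# the result depends on the platform.
print(struct.calcsize("@BH"))
```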

regards
Steve
 
Martin v. Löwis

I don't know what you mean by "returns an expanded string of
the data".

I do know that struct does exactly what you requested.

I disagree. He has a format (type, length, value), with the
value being variable-sized. How do you do that in the struct
module?
It converts between Python objects and what is basically a C
"struct" where you specify the endianness of each field and
what sort of packing/padding you want.

Sure. However, in the specific case, there is really no C
struct that can reasonably represent the data. Hence you
cannot really use the struct module.
I use the struct module frequently to implement binary
communications protocols in Python. I've used Python/struct
with transport layers ranging from Ethernet (raw, TCP, and UDP)
to async serial, to CAN.

Do you use it for the fixed-size parts, or also for the variable-sized
data?

Regards,
Martin
 
Martin v. Löwis

I disagree. He has a format (type, length, value), with the
You construct a format string for the "value" portion based on
the type/length header.

Can you kindly provide example code on how to do this?
I don't see how that can be the case. There may not be a
single C struct that can represent all frames, but for every
frame you should be able to come up with a C struct that can
represent that frame.

Sure. You would normally have a struct such as

struct TLV {
    char type;
    char length;
    char *data;
};

However, the in-memory representation of that struct is *not*
meant to be sent over the wire. In particular, the character
pointer has no meaning outside the address space, and is thus
not to be sent.
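
The difference between the in-memory struct and the wire format can be sketched with ctypes in Python 3. The class below mirrors the C struct above purely for illustration; the payload value is made up.

```python
import ctypes

class TLV(ctypes.Structure):
    _fields_ = [
        ("type", ctypes.c_ubyte),
        ("length", ctypes.c_ubyte),
        ("data", ctypes.c_char_p),   # a pointer, meaningful only in this process
    ]

payload = b"abcd"
record = TLV(3, len(payload), payload)

# In memory: two bytes, padding, and a pointer (size is platform-dependent).
print(ctypes.sizeof(record))

# On the wire: the header bytes followed by the bytes the pointer refers to.
wire = bytes([record.type, record.length]) + record.data
print(wire)  # b'\x03\x04abcd'
```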
Both. For variable size/format stuff you decode the first few
bytes and use them to figure out what format/layout to use for
the next chunk of data. It's pretty much the same thing you do
in other languages.

In the example he gave, I would just avoid using the struct module
entirely, as it does not provide any additional value:

def encode(type, length, value):
    return chr(type)+chr(length)+value
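
For the receiving side, a minimal Python 3 sketch of the same layout, assuming the whole packet is already available as a bytes object. The name decode, the two-argument encode, and the error handling are assumptions made for illustration.

```python
def encode(ptype, value):
    # Python 3 counterpart of the function above; the length is derived.
    return bytes([ptype, len(value)]) + value

def decode(packet):
    """Split one packet into (type, value, leftover bytes)."""
    if len(packet) < 2:
        raise ValueError("need at least the two header bytes")
    ptype, length = packet[0], packet[1]   # indexing bytes yields ints
    if len(packet) < 2 + length:
        raise ValueError("truncated payload")
    return ptype, packet[2:2 + length], packet[2 + length:]

print(decode(encode(17, b"call me") + b"\x01\x00"))
# (17, b'call me', b'\x01\x00')
```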

Regards,
Martin
 
John Machin

This looks really helpful thanks!

Provided that you don't take Martin's last sentence too literally :)


| Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
| >>> p_data = b"abcd" # Omit the b prefix if using 2.5 or earlier
| >>> p_len = len(p_data)
| >>> p_type = 3
| >>> chr(p_type) + chr(p_len) + p_data
| '\x03\x04abcd'

| Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
| >>> p_data = b"abcd"
| >>> p_len = len(p_data)
| >>> p_type = 3
| >>> bytes(p_type) + bytes(p_len) + p_data # literal translation
| b'\x00\x00\x00\x00\x00\x00\x00abcd'
| >>> bytes(3)
| b'\x00\x00\x00'
| >>> bytes(10)
| b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
| >>> bytes([p_type]) + bytes([p_len]) + p_data
| b'\x03\x04abcd'
| >>> bytes([p_type, p_len]) + p_data
| b'\x03\x04abcd'

Am I missing a better way to translate chr(n) from 2.x to 3.x? The
meaning assigned to bytes(n) in 3.X is "interesting":

2.X:
nuls = '\0' * n
out_byte = chr(n)

3.X:
nuls = b'\0' * n
or
nuls = bytes(n)
out_byte = bytes([n])

Looks to me like there was already a reasonable way of getting a bytes
object containing a variable number of zero bytes. Any particular
reason why bytes(n) was given this specialised meaning? Can't be the
speed, because the speed of bytes(n) on my box is about 50% of the
speed of the * expression for n = 16 and about 65% for n = 1024.
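
A rough way to repeat this comparison on another machine, assuming Python 3.5 or later for the globals argument of timeit. The iteration count is arbitrary and the timings will differ between interpreters and systems, so no figures are shown.

```python
import timeit

for n in (16, 1024):
    # Both spellings produce n zero bytes.
    assert bytes(n) == b"\0" * n
    t_ctor = timeit.timeit("bytes(n)", globals={"n": n}, number=100000)
    t_mul = timeit.timeit("b'\\0' * n", globals={"n": n}, number=100000)
    print(n, t_ctor, t_mul)
```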

Cheers,
John
 
Martin v. Löwis

Looks to me like there was already a reasonable way of getting a bytes
object containing a variable number of zero bytes. Any particular
reason why bytes(n) was given this specialised meaning?

I think it was because bytes() was originally mutable, and you need a
way to create a buffer of n bytes. Now that bytes() ended up immutable
(and bytearray was added), it's perhaps not so useful anymore. Of
course, it would be confusing if bytes(4) created a sequence of one
byte, yet bytearray(4) created four bytes.
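
The parallel between the two constructors, as a small Python 3 sketch:

```python
zeros = bytes(4)        # immutable: four zero bytes
buf = bytearray(4)      # mutable buffer of four zero bytes
buf[0] = 0x11           # in-place assignment works on bytearray
one = bytes([4])        # a single byte with value 4

print(zeros)   # b'\x00\x00\x00\x00'
print(buf)     # bytearray(b'\x11\x00\x00\x00')
print(one)     # b'\x04'
```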

Regards,
Martin
 
Martin v. Löwis

import struct

dtype = ord(rawdata[0])
dcount, = struct.unpack("!H", rawdata[1:3])  # unpack returns a tuple
if dtype == 1:
    fmtstr = "!" + "H"*dcount
elif dtype == 2:
    fmtstr = "!" + "f"*dcount
rlen = struct.calcsize(fmtstr)

data = struct.unpack(fmtstr, rawdata[3:3+rlen])

leftover = rawdata[3+rlen:]

Unfortunately, that does not work in the example. We have
a message type (an integer), and a variable-length string.
So how do you compute the struct format for that?
Well, if it's not representing the layout of the data we're
trying to deal with, then it's irrelevant. We are talking
about how to convert Python objects to/from data in the
'on-the-wire' format, right?

Right: ON-THE-WIRE, not IN MEMORY. In memory, there is a
pointer. On the wire, there are no pointers.
Like this?

... return chr(type)+chr(length)+value
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in encode
TypeError: an integer is required

No:

py> CONNECT_REQUEST=17
py> payload="call me"
py> encode(CONNECT_REQUEST, len(payload), payload)
'\x11\x07call me'

Regards,
Martin
 
Martin v. Löwis

Unfortunately, that does not work in the example. We have
I'm confused. Are you asking for an introductory tutorial on
programming in Python?

Perhaps. I honestly do not know how to deal with variable-sized
strings in the struct module in a reasonable way, and thus believe
that this module is incapable of actually supporting them
(unless you use inappropriate trickery).

However, as you keep claiming that the struct module is what
should be used, I must be missing something about the struct
module.
I don't understand your point.


If all your data is comprised of 8-bit bytes, then you don't
need the struct module.

Go back to the original message of the OP. It says

# I have following packet format which I have to send over Bluetooth.
# packet_type (1 byte unsigned) || packet_length (1 byte unsigned) ||
# packet_data(variable)

So yes, all his data is comprised of 8-bit bytes, and yes, he doesn't
need the struct module. Hence I'm puzzled why people suggest that
he uses the struct module.

I think the key answer is "use the string type, it is appropriate
to represent byte oriented data in python" (also see the subject
of this thread)

Regards,
Martin
 
Martin v. Löwis

It deals with variable sized fields just fine:
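import struct

dtype = 18
dlength = 32
format = "!BB%ds" % dlength

# struct.pack takes the values as separate arguments, not as one tuple;
# data is the dlength-byte payload string.
rawdata = struct.pack(format, dtype, dlength, data)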

I wouldn't call this "just fine", though - it involves
a % operator to even compute the format string. IMO,
it is *much* better not to use the struct module for this
kind of problem, and instead rely on regular string
concatenation.
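
Both approaches produce the same bytes. A Python 3 sketch with made-up values:

```python
import struct

dtype = 18
data = b"x" * 32
dlength = len(data)

# struct with a computed format string
via_struct = struct.pack("!BB%ds" % dlength, dtype, dlength, data)

# plain concatenation of a two-byte header and the payload
via_concat = bytes([dtype, dlength]) + data

print(via_struct == via_concat)  # True
```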

Regards,
Martin
 
John Machin

I wouldn't call this "just fine", though - it involves
a % operator to even compute the format string. IMO,
it is *much* better not to use the struct module for this
kind of problem, and instead rely on regular string
concatenation.

IMO, it would be a good idea if struct.[un]pack supported a variable *
length operator that could appear anywhere that an integer constant
could appear, as in C's printf etc and Python's % formatting:

dlen = len(data)
rawdata = struct.pack("!BB*s", dtype, dlen, dlen, data)
# and on the other end of the wire:
dtype, dlen = struct.unpack("!BB", rawdata[:2])
data = struct.unpack("!*s", rawdata[2:], dlen)
# more than 1 count arg could be used if necessary
# *s would return a string
# *B, *H, *I, etc would return a tuple of ints in (3.X-speak)

I've worked with variable-length data that looked like
len1, len2, len3, data1, data2, data3
and the * gadget would have been very handy:
len1, len2, len3 = unpack('!BBB', raw[:3])
data1, data2, data3 = unpack('!*H*i*d', raw[3:], len1, len2, len3)

Note the semantics of '!*H*i*d' would be different from '!8H2i7d'
because otherwise you'd need to do:
bundle = unpack('!*H*i*d', raw[3:], len1, len2, len3)
data1 = bundle[:len1]
data2 = bundle[len1:len1+len2]
data3 = bundle[len1+len2:]
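
The struct module has no such operator, but the proposed semantics can be approximated in Python 3 with a small wrapper. The helper unpack_counted below is hypothetical: it substitutes each * with the next count argument, calls struct.unpack, and groups the values of each starred item into one tuple (or one bytes object for s).

```python
import re
import struct

_ITEM = re.compile(r"(\*|\d*)([A-Za-z?])")

def unpack_counted(fmt, data, *counts):
    """Hypothetical helper: emulate a '*' repeat count on top of struct.unpack."""
    order = fmt[0] if fmt and fmt[0] in "@=<>!" else ""
    pending = list(counts)
    concrete = [order]
    shape = []  # per format item: (number of unpacked values, group as tuple?)
    for rep, code in _ITEM.findall(fmt[len(order):]):
        starred = rep == "*"
        n = pending.pop(0) if starred else int(rep or 1)
        concrete.append("%d%s" % (n, code))
        if code in "sp":
            shape.append((1, False))      # one bytes object regardless of n
        elif code == "x":
            shape.append((0, False))      # pad bytes yield no values
        else:
            shape.append((n, starred))    # starred items come back as one tuple
    values = struct.unpack("".join(concrete), data)
    result, pos = [], 0
    for n, grouped in shape:
        chunk = values[pos:pos + n]
        pos += n
        if grouped:
            result.append(chunk)
        else:
            result.extend(chunk)
    return tuple(result)

raw = struct.pack("!BBB2H1i1d", 2, 1, 1, 10, 20, -5, 1.5)
len1, len2, len3 = struct.unpack("!BBB", raw[:3])
data1, data2, data3 = unpack_counted("!*H*i*d", raw[3:], len1, len2, len3)
print(data1, data2, data3)  # (10, 20) (-5,) (1.5,)
```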
 
