Struggling with struct.unpack() and "p" format specifier

Geoffrey · Nov 30, 2004

Hope someone can help.
I am trying to read data from a binary file and then unpack the
data into Python variables. Some of the data is stored like this:

xbuffer: '\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD'
# The above was printed using repr(xbuffer).
# Note that int(0x13) = 19, which is exactly the length of the visible text.

In the code I have the following statement:
x = st.unpack('>xxBBp',xbuffer)

This raises the following error:

x = st.unpack('>xxBBp',xbuffer)
error: unpack str size does not match format

As I read the documentation, the "p" format character seems to address
this situation, where the number of bytes in the string to read is the
first byte of the stored value, but I keep getting this error.

Am I missing something ?
Can the "p" format character be used to unpack this type of data ?

As I mentioned, I can parse the string and read it with multiple
statements, I am just looking for a more efficient solution.

Thanks.

Tim Peters · Nov 30, 2004

Geoffrey said:
I am trying to read data from a binary file and then unpack the
data into Python variables. Some of the data is stored like this:

xbuffer: '\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD'
# The above was printed using repr(xbuffer).
# Note that int(0x13) = 19, which is exactly the length of the visible text.

In the code I have the following statement:
x = st.unpack('>xxBBp',xbuffer)

This raises the following error:

x = st.unpack('>xxBBp',xbuffer)
error: unpack str size does not match format

As I read the documentation, the "p" format character seems to
address this situation, where the number of bytes in the string to
read is the first byte of the stored value, but I keep getting this
error.

....

Well, the docs mean it when they say:

Note that for unpack(), the "p" format character consumes count
bytes

You don't have an explicit count in front of your "p" code, so count
defaults to 1, so only one byte of xbuffer will get consumed.
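
A quick way to see the mismatch is to ask struct how many bytes each format expects. This is only a sketch, and it assumes Python 3, where the buffer is a bytes object rather than the Python 2 str used in this thread:

```python
import struct

xbuffer = b"\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD"

# A bare "p" is treated as "1p": one byte in total, length byte included.
bare_size = struct.calcsize(">xxBBp")   # 2 pad bytes + 2 unsigned bytes + 1
needed_size = len(xbuffer)              # what the data actually occupies

try:
    struct.unpack(">xxBBp", xbuffer)
    mismatch = False
except struct.error:
    # The format describes fewer bytes than the buffer holds.
    mismatch = True
```

Here bare_size is 5 while the buffer holds 24 bytes, which is why unpack() refuses it.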

This works, telling struct that this particular p field consumes 20
bytes (including the string-length byte):
(185, 2, 'EXCLUDE_CREDIT_CARD')
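
The statement that produced this output did not survive in the post as shown here. A likely reconstruction, assuming Python 3 (so both the buffer and the returned string are bytes) and the explicit count of 20 described above:

```python
import struct

xbuffer = b"\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD"

# "20p" = 1 length byte + up to 19 bytes of string data.
fixed = struct.unpack(">xxBB20p", xbuffer)
```

Under Python 3 the last element comes back as b'EXCLUDE_CREDIT_CARD' rather than a str.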

Or, a bit more generally, assuming your p field is always at the end,
and is preceded by 4 bytes:
(185, 2, 'EXCLUDE_CREDIT_CARD')
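
Again the statement itself is missing from the post as shown here. One way to write what is described, as a sketch that assumes Python 3 bytes and a 4-byte header in front of a trailing p field:

```python
import struct

xbuffer = b"\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD"

# Everything after the 4 header bytes is handed to the p field,
# so the count is computed from the buffer length.
p_count = len(xbuffer) - 4
general = struct.unpack(">xxBB%dp" % p_count, xbuffer)
```

This only holds while the p field really is the last thing in the buffer.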

Note that there's no direct support for any kind of variable-width
data in struct. The number of bytes involved has to be deducible from
the format string alone.

Peter Hansen · Nov 30, 2004

Geoffrey said:
I am trying to read data from a binary file and then unpack the
data into Python variables. Some of the data is stored like this: ....
As I read the documentation, the "p" format character seems to address
this situation, where the number of bytes in the string to read is the
first byte of the stored value, but I keep getting this error.

Am I missing something ?
Can the "p" format character be used to unpack this type of data ?

I've tried experimenting with "p" and cannot get any meaningful
results. In all cases pack() returns '\x00' while unpack()
with anything other than a one-byte string returns an exception
(unpack str size does not match format) while with a one-byte
string it always returns ('',).
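
For anyone who wants to repeat the experiment, here is roughly what it looks like in Python 3 (an assumption on my part; the results above were from Python 2.3/2.4, where the values are str instead of bytes):

```python
import struct

# With no count, "p" has room for the length byte only.
packed = struct.pack("p", b"abc")        # the string data is dropped
unpacked = struct.unpack("p", b"\x05")   # length byte says 5, but no room

try:
    struct.unpack("p", b"\x05hello")     # more than one byte supplied
    raised = False
except struct.error:
    raised = True
```

The pack() call yields a single zero byte, the one-byte unpack() yields an empty string, and anything longer raises struct.error.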

I would be inclined to say that the "p" format in struct (using
Python 2.4rc1 or Python 2.3.3) does not act as documented on
Windows XP SP2, at least...

I hope we've both just missed something obvious.

-Peter

Peter Hansen · Nov 30, 2004

Peter said:
I would be inclined to say that the "p" format in struct (using
Python 2.4rc1 or Python 2.3.3) does not act as documented on
Windows XP SP2, at least...

I hope we've both just missed something obvious.

Okay, we were certainly missing something, but I don't believe
I would call it obvious.

I can't deduce from the documentation the fact that the "p"
format requires a length *in front of the p in the format string*.

Furthermore, it assumes a length of 1 if one is not specified.

And there is no example that shows how to do it correctly.
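
Something like the following is the kind of example I had in mind. It is a sketch in Python 3 syntax (bytes instead of the str used elsewhere in this thread), with the count written in front of the p:

```python
import struct

# "10p" always occupies 10 bytes: 1 length byte + 9 bytes of data/padding.
short = struct.pack("10p", b"abc")
round_trip = struct.unpack("10p", short)

# A count of 20 is exactly enough for a 19-byte string.
exact = struct.pack("20p", b"EXCLUDE_CREDIT_CARD")
```

The first pack() pads with null bytes up to the count, and unpack() uses the stored length byte to strip that padding again.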

(I did Google searches and found examples, but by then I
was looking for a bug report and didn't even think to look
at the examples themselves. :-( )

Doc bug? Did anyone else find the documentation on "p"
to be clear and effective?

-Peter

Peter Hansen · Nov 30, 2004

Geoffrey said:
As I mentioned, I can parse the string and read it with multiple
statements, I am just looking for a more efficient solution.

This looks like about the best you can do, using the information
from Tim's reply:

>>> buf = '\0\0\xb9\x02\x13EXCLUDE_CREDIT_CARD'
>>> import struct
>>> x = struct.unpack('>xxBB%sp' % (ord(buf[4])+1), buf)
>>> x

(185, 2, 'EXCLUDE_CREDIT_CARD')

If you wanted to avoid hard-coding the 4, you would
be most correct to do this:

header = '>xxBB'
lenIndex = struct.calcsize(header)
x = struct.unpack('%s%dp' % (header, ord(buf[lenIndex])+1), buf)

.... though that doesn't exactly make it all that readable.
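
For reference, the same idea in Python 3 might look like the sketch below. The one real difference is that indexing a bytes object already gives an int, so the ord() call goes away:

```python
import struct

buf = b"\0\0\xb9\x02\x13EXCLUDE_CREDIT_CARD"

header = ">xxBB"
len_index = struct.calcsize(header)   # offset of the length byte
count = buf[len_index] + 1            # string length + the length byte itself
x = struct.unpack("%s%dp" % (header, count), buf)
```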

-Peter

Geoffrey · Dec 1, 2004

Thanks for your response.

I guess the documentation on the p format wasn't clear to me ... or
perhaps I was just hoping too much for an easy solution!

The data is part of a record structure that is written to a file with
a few "ints" and "longs" mixed in. The pattern repeats through the
file, sometimes with up to 2500 repetitions.

Clearly I can create a subroutine to read the records and extract
the fields. I was just hoping I could use the "struct" module and
create a pattern like 'LLHpHLpppH' which would unpack the data and
automatically give me the strings without needing to first determine
their lengths, as the length is already embedded in the data.
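
Such a subroutine could be sketched roughly as below. This is not anything struct provides: unpack_varlen and its field-list convention are hypothetical names made up for illustration, the code assumes Python 3 bytes, and it assumes an explicit byte order such as ">" so that splitting the format into chunks does not change any alignment padding.

```python
import struct

def unpack_varlen(fields, buf, offset=0, byte_order=">"):
    """Unpack a record mixing fixed fields and length-prefixed strings.

    fields is a list of tokens: either a plain struct format chunk
    (for example "LLH") or the literal "p", meaning one length byte
    followed by that many bytes of string data.
    Returns (values, offset_after_record).
    """
    values = []
    for token in fields:
        if token == "p":
            if offset >= len(buf):
                raise ValueError("missing length byte at offset %d" % offset)
            length = buf[offset]              # the embedded length byte
            end = offset + 1 + length
            if end > len(buf):
                raise ValueError("string at offset %d runs past the buffer" % offset)
            values.append(buf[offset + 1:end])
            offset = end
        else:
            chunk = struct.Struct(byte_order + token)
            values.extend(chunk.unpack_from(buf, offset))
            offset += chunk.size
    return tuple(values), offset

record = b"\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD"
values, next_offset = unpack_varlen(["xxBB", "p"], record)
```

A pattern like 'LLHpHLpppH' would then be passed as ["LLH", "p", "HL", "p", "p", "p", "H"], and the returned offset can be fed back in to read the next record in the file.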

Any suggestions on how to go about proposing that the struct module
gain the ability to read variable-length strings based on the
preceding byte value? It seems it would be a valuable addition, helping
with code clarity and readability and saving quite a few lines of code,
well, at least for me anyway!

Thanks again.

