Struggling with struct.unpack() and "p" format specifier

Discussion in 'Python' started by Geoffrey, Nov 30, 2004.

  1. Geoffrey

    Geoffrey Guest

    Hope someone can help.
    I am trying to read data from a binary file and then unpack the
    data into Python variables. Some of the data is stored like this:

    xbuffer: '\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD'
    # the above was printed using repr(xbuffer).
    # Note that 0x13 == 19, which is exactly the length of the visible text.

    In the code I have the following statement;
    x = st.unpack('>xxBBp',xbuffer)

    This raises the following error:

    x = st.unpack('>xxBBp',xbuffer)
    error: unpack str size does not match format

    As I read the documentation, the "p" format character seems to address
    this situation, where the number of bytes of the string to read is the
    first byte of the stored value, but I keep getting this error.

    Am I missing something?
    Can the "p" format character be used to unpack this type of data?

    As I mentioned, I can parse the string and read it with multiple
    statements, I am just looking for a more efficient solution.

    Thanks.
     
    Geoffrey, Nov 30, 2004
    #1

  2. Tim Peters

    Tim Peters Guest

    [Geoffrey]
    > I am trying to read data from a binary file and then unpack the
    > data into Python variables. Some of the data is stored like this:
    >
    > xbuffer: '\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD'
    > # the above was printed using repr(xbuffer).
    > # Note that 0x13 == 19, which is exactly the length of the visible text.
    >
    > In the code I have the following statement;
    > x = st.unpack('>xxBBp',xbuffer)
    >
    > This raises the following error:
    >
    > x = st.unpack('>xxBBp',xbuffer)
    > error: unpack str size does not match format
    >
    > As I read the documentation, the "p" format character seems to
    > address this situation, where the number of bytes of the string to
    > read is the first byte of the stored value, but I keep getting this
    > error.


    ....

    Well, the docs mean it when they say:

    Note that for unpack(), the "p" format character consumes count bytes

    You don't have an explicit count in front of your "p" code, so count
    defaults to 1, so only one byte of xbuffer will get consumed.

    This works, telling struct that this particular p field consumes 20
    bytes (including the string-length byte):

    >>> struct.unpack('>xxBB20p', xbuffer)
    (185, 2, 'EXCLUDE_CREDIT_CARD')

    Or, a bit more generally, assuming your p field is always at the end,
    and is preceded by 4 bytes:

    >>> struct.unpack('>xxBB%dp' % (len(xbuffer) - 4), xbuffer)
    (185, 2, 'EXCLUDE_CREDIT_CARD')

    Note that there's no direct support for any kind of variable-width
    data in struct. The number of bytes involved has to be deducible from
    the format string alone.
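The effect of the missing count can be checked with struct.calcsize(), which reports how many bytes a format string covers. The sketch below assumes Python 3, where the buffer is a bytes object and "p" yields bytes rather than str; the variable names are only illustrative.

```python
import struct

# The buffer from the original post, written as a Python 3 bytes literal.
xbuffer = b'\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD'

# A bare "p" means "1p": it covers only the length byte and no string data.
bare_size = struct.calcsize('>xxBBp')      # 2 pad bytes + 1 + 1 + 1
full_size = struct.calcsize('>xxBB20p')    # 2 pad bytes + 1 + 1 + 20

try:
    struct.unpack('>xxBBp', xbuffer)
    bare_error = None
except struct.error as exc:
    bare_error = exc                       # format size differs from buffer size

fields = struct.unpack('>xxBB20p', xbuffer)
print(bare_size, full_size, len(xbuffer))  # 5 24 24
print(fields)                              # (185, 2, b'EXCLUDE_CREDIT_CARD')
```

Only the format whose calculated size equals len(xbuffer) unpacks without an error.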
     
    Tim Peters, Nov 30, 2004
    #2

  3. Peter Hansen

    Peter Hansen Guest

    Geoffrey wrote:
    > I am trying to read data from a binary file and then unpack the
    > data into Python variables. Some of the data is stored like this:

    ....
    > As I read the documentation, the "p" format character seems to address
    > this situation, where the number of bytes of the string to read is the
    > first byte of the stored value, but I keep getting this error.
    >
    > Am I missing something?
    > Can the "p" format character be used to unpack this type of data?


    I've tried experimenting with "p" and cannot get any meaningful
    results. In all cases pack() returns '\x00'. unpack() raises an
    exception (unpack str size does not match format) for anything
    other than a one-byte string, and for a one-byte string it always
    returns ('',).

    I would be inclined to say that the "p" format in struct (using
    Python 2.4rc1 or Python 2.3.3) does not act as documented on
    Windows XP SP2, at least...

    I hope we've both just missed something obvious.

    -Peter
     
    Peter Hansen, Nov 30, 2004
    #3
  4. Peter Hansen

    Peter Hansen Guest

    Peter Hansen wrote:
    > I would be inclined to say that the "p" format in struct (using
    > Python 2.4rc1 or Python 2.3.3) does not act as documented on
    > Windows XP SP2, at least...
    >
    > I hope we've both just missed something obvious.


    Okay, we were certainly missing something, but I don't believe
    I would call it obvious.

    I can't deduce from the documentation the fact that the "p"
    format requires a length *in front of the p in the format string*.

    Furthermore, it assumes a length of 1 if one is not specified.

    And there is no example that shows how to do it correctly.
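For reference, a minimal example of "p" with and without a count might look like the following. It assumes Python 3 (bytes in, bytes out) and uses made-up sample data.

```python
import struct

no_count = struct.pack('p', b'abc')   # count defaults to 1: only the length byte fits
exact = struct.pack('4p', b'abc')     # 1 length byte + 3 data bytes
padded = struct.pack('6p', b'abc')    # unused trailing bytes are zero-filled

# unpack() consumes all 6 bytes but returns only the 3 that the length byte names.
(roundtrip,) = struct.unpack('6p', padded)

print(no_count)    # b'\x00'
print(exact)       # b'\x03abc'
print(padded)      # b'\x03abc\x00\x00'
print(roundtrip)   # b'abc'
```

The count in front of "p" is the total field width including the length byte, which is why a bare "p" can never hold any string data.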

    (I did Google searches and found examples, but by then I
    was looking for a bug report and didn't even think to look
    at the examples themselves. :-( )

    Doc bug? Did anyone else find the documentation on "p"
    to be clear and effective?

    -Peter
     
    Peter Hansen, Nov 30, 2004
    #4
  5. Peter Hansen

    Peter Hansen Guest

    Geoffrey wrote:
    > As I mentioned, I can parse the string and read it with multiple
    > statements, I am just looking for a more efficient solution.


    This looks like about the best you can do, using the information
    from Tim's reply:

    >>> buf = '\0\0\xb9\x02\x13EXCLUDE_CREDIT_CARD'
    >>> import struct
    >>> x = struct.unpack('>xxBB%sp' % (ord(buf[4])+1), buf)
    >>> x
    (185, 2, 'EXCLUDE_CREDIT_CARD')

    If you wanted to avoid hard-coding the 4, you would
    be most correct to do this:

    header = '>xxBB'
    lenIndex = struct.calcsize(header)
    x = struct.unpack('%s%dp' % (header, ord(buf[lenIndex])+1), buf)

    .... though that doesn't exactly make it all that readable.
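The same idea can be wrapped in a small function. The sketch below is a Python 3 adaptation, not part of the original posts: unpack_trailing_pstring is a hypothetical name, and it assumes the length-prefixed string is the last field in the buffer. In Python 3, indexing a bytes object already yields an int, so ord() is not needed.

```python
import struct

def unpack_trailing_pstring(header, buf):
    """Unpack the `header` fields followed by one length-prefixed string.

    Hypothetical helper: assumes the string is the final field in `buf`.
    """
    len_index = struct.calcsize(header)   # offset of the length byte
    count = buf[len_index] + 1            # string length plus the length byte itself
    return struct.unpack('%s%dp' % (header, count), buf)

buf = b'\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD'
x = unpack_trailing_pstring('>xxBB', buf)
print(x)   # (185, 2, b'EXCLUDE_CREDIT_CARD')
```

If anything follows the string in the buffer, struct.unpack() raises struct.error, because the format size no longer matches the buffer size.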

    -Peter
     
    Peter Hansen, Nov 30, 2004
    #5
  6. Geoffrey

    Geoffrey Guest

    Thanks for your response.

    I guess the documentation on the "p" format wasn't clear to me ... or
    perhaps I was just hoping too much for an easy solution!

    The data is part of a record structure that is written to a file with
    a few "int"s and "long"s mixed in. The pattern repeats through the
    file, with sometimes up to 2500 repetitions.

    Clearly I can create a subroutine to read the records and extract
    the fields. I was just hoping I could use the "struct" module and
    create a pattern like 'LLHpHLpppH' which would unpack the data and
    automatically give me the strings without needing to first determine
    their lengths, as the length is already embedded in the data.
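One way such a subroutine could look is sketched below. It is not a struct feature: unpack_var is a hypothetical helper that walks single-character format codes, hands the fixed-size ones to struct.unpack_from(), and treats "p" as "one length byte followed by that many bytes". It assumes Python 3, big-endian data by default, and no repeat counts in the pattern.

```python
import struct

def unpack_var(codes, buf, offset=0, byte_order='>'):
    """Unpack single-character struct codes from `buf`, starting at `offset`.

    Hypothetical helper: here 'p' reads a length byte and then that many
    bytes, which is NOT how struct's own 'p' code behaves.
    Returns (values, offset_after_record).
    """
    values = []
    for code in codes:
        if code == 'p':
            length = buf[offset]                  # length byte
            start = offset + 1
            end = start + length
            if end > len(buf):
                raise ValueError('buffer too short for string field')
            values.append(bytes(buf[start:end]))
            offset = end
        else:
            fmt = byte_order + code
            values.extend(struct.unpack_from(fmt, buf, offset))
            offset += struct.calcsize(fmt)
    return tuple(values), offset

buf = b'\x00\x00\xb9\x02\x13EXCLUDE_CREDIT_CARD'
fields, next_offset = unpack_var('xxBBp', buf)
print(fields, next_offset)   # (185, 2, b'EXCLUDE_CREDIT_CARD') 24
```

For a file of repeated records, the returned offset can be passed back in as the starting offset of the next call until it reaches len(buf).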

    Any suggestions on how to go about proposing that the struct module
    gain the ability to read variable-length strings based on the
    preceding byte value? It seems it would be a valuable addition,
    helping with code clarity and readability and saving quite a few
    lines of code - well, at least for me anyway!

    Thanks again.

     
    Geoffrey, Dec 1, 2004
    #6
