Reading binary data

Discussion in 'Python' started by Aaron Scott, Sep 10, 2008.

  1. Aaron Scott

    Aaron Scott Guest

    I've been trying to tackle this all morning, and so far I've been
    completely unsuccessful. I have a binary file whose structure I know,
    and I'd like to read it into Python. It's not a particularly
    complicated file. For instance:

    signature   char[3]     "GDE"
    version     uint32      2
    attr_count  uint32
    {
        attr_id         uint32
        attr_val_len    uint32
        attr_val        char[attr_val_len]
    } ... repeated attr_count times ...

    However, I can't find a way to bring it into Python. This is my code
    -- which I know is definitely wrong, but I had to start somewhere:

    import struct
    file = open("test.gde", "rb")
    output = file.read(3)
    print output
    version = struct.unpack("I", file.read(4))[0]
    print version
    attr_count = struct.unpack("I", file.read(4))[0]
    while attr_count:
        print "---"
        file.seek(4, 1)
        counter = int(struct.unpack("I", file.read(4))[0])
        print file.read(counter)
        attr_count -= 1
    file.close()

    Of course, this doesn't work at all. It produces:

    GDE
    2
    ---
    é
    ---
    ê Å

    I'm completely at a loss. If anyone could show me the correct way to
    do this (or at least point me in the right direction), I'd be
    extremely grateful.
     
    Aaron Scott, Sep 10, 2008
    #1

  2. Aaron Scott

    Jon Clements Guest


    What if we view the data as having an 11 byte header:
    signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))

    Then for the list of attr's:
    for idx in xrange(attr_count):
        attr_id, attr_val_len = struct.unpack('II', yourfile.read(8))
        attr_val = yourfile.read(attr_val_len)


    hth, or gives you a pointer anyway
    Jon.
     
    Jon Clements, Sep 10, 2008
    #2

  3. Aaron Scott

    Jon Clements Guest


    CORRECTION: '3cII' should be '3sII'.
     
    Jon Clements, Sep 10, 2008
    #3
  4. Aaron Scott

    Aaron Scott Guest

    > signature, version, attr_count = struct.unpack('3cII',
    > yourfile.read(11))
    >


    This line is giving me an error:

    Traceback (most recent call last):
      File "test.py", line 19, in <module>
        signature, version, attr_count = struct.unpack('3cII', file.read(12))
    ValueError: too many values to unpack
     
    Aaron Scott, Sep 10, 2008
    #4
  5. Aaron Scott

    Aaron Scott Guest

    > CORRECTION: '3cII' should be '3sII'.

    Even with the correction, I'm still getting the error.
     
    Aaron Scott, Sep 10, 2008
    #5
  6. Aaron Scott

    Jon Clements Guest

    On Sep 10, 6:45 pm, Aaron Scott <> wrote:
    > > CORRECTION: '3cII' should be '3sII'.

    >
    > Even with the correction, I'm still getting the error.


    Me being silly...

    Quick fix:
    signature = file.read(3)
    then the rest can stay the same. struct.calcsize('3sII') is 12, so
    unpack expects a 12-byte string, whereas you only really have 11
    (alignment and all that...)

    Jon.
     
    Jon Clements, Sep 10, 2008
    #6
  7. Aaron Scott

    Aaron Scott Guest

    Sorry, I had posted the wrong error. The error I am getting is:

    struct.error: unpack requires a string argument of length 12

    which doesn't make sense to me, since I'm specifically asking for 11.
    Just for kicks, if I change the line to

    print struct.unpack('3sII', file.read(12))

    I get the result

    ('GDE', 33554432, 16777216)

    .... which isn't even close, past the first three characters.
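
    Those two numbers are less random than they look. A minimal sketch in
    Python 3 syntax, assuming (hypothetically) that the file starts with an
    unpadded little-endian header of version 2 and attr_count 2, followed by
    an attr_id of 1. The byte string is made up to match that assumption; it
    is not taken from the real test.gde.

```python
import struct

# Hypothetical first 12 bytes: "GDE", version=2, attr_count=2, then the
# first byte of attr_id=1 (all little-endian, no padding).
data = b"GDE" + struct.pack("<III", 2, 2, 1)[:9]

# '<3sxII' mimics what native alignment does on a typical little-endian
# machine: 3 chars, one skipped pad byte ('x'), then two uint32s.
shifted = struct.unpack("<3sxII", data)
print(shifted)                        # (b'GDE', 33554432, 16777216)
print([hex(n) for n in shifted[1:]])  # ['0x2000000', '0x1000000']

# Without the pad byte, the first 11 bytes decode to the expected values.
unshifted = struct.unpack("<3sII", data[:11])
print(unshifted)                      # (b'GDE', 2, 2)
```

    Under that assumption the integers are not corrupt, just read one byte
    too late: the skipped pad byte is really the low byte of version.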
     
    Aaron Scott, Sep 10, 2008
    #7
  8. Aaron Scott

    Aaron Scott Guest

    Taking everything into consideration, my code is now:

    import struct
    file = open("test.gde", "rb")
    signature = file.read(3)
    version, attr_count = struct.unpack('II', file.read(8))
    print signature, version, attr_count
    for idx in xrange(attr_count):
        attr_id, attr_val_len = struct.unpack('II', file.read(8))
        attr_val = file.read(attr_val_len)
        print attr_id, attr_val_len, attr_val
    file.close()

    which gives a result of:

    GDE 2 2
    1 4 é
    2 4 ê Å

    Essentially, the same results I was originally getting :(
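
    For comparison, a minimal sketch of the same reader in Python 3 syntax
    with an explicit byte order. It assumes the file is little-endian with no
    padding, which the thread has not confirmed; read_gde and the sample
    bytes are hypothetical, made up for illustration.

```python
import io
import struct

def read_gde(stream):
    """Parse the layout from the first post, assuming little-endian, unpadded data."""
    # '<' selects standard sizes and no alignment, so this header is exactly 11 bytes.
    signature, version, attr_count = struct.unpack("<3sII", stream.read(11))
    attrs = []
    for _ in range(attr_count):
        attr_id, attr_val_len = struct.unpack("<II", stream.read(8))
        attrs.append((attr_id, stream.read(attr_val_len)))
    return signature, version, attrs

# Made-up sample data, not the real test.gde.
sample = (b"GDE" + struct.pack("<II", 2, 2)
          + struct.pack("<II", 1, 4) + b"\xe9\x00\x00\x00"
          + struct.pack("<II", 2, 3) + b"abc")

signature, version, attrs = read_gde(io.BytesIO(sample))
print(signature, version)
for attr_id, attr_val in attrs:
    # repr shows the raw bytes unambiguously, printable or not.
    print(attr_id, len(attr_val), repr(attr_val))
```

    Printing repr(attr_val) rather than the raw value makes it obvious
    whether an attribute holds text or binary data.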
     
    Aaron Scott, Sep 10, 2008
    #8
  9. Aaron Scott wrote:
    > Sorry, I had posted the wrong error. The error I am getting is:
    >
    > struct.error: unpack requires a string argument of length 12
    >
    > which doesn't make sense to me, since I'm specifically asking for 11.


    That's because of padding. According to the docs, "By default, C numbers
    are represented in the machine's native format and byte order, and
    properly aligned by skipping pad bytes if necessary (according to the
    rules used by the C compiler)". That means that struct.unpack() assumes
    one byte of padding between the 3-character string and the first
    unsigned int.
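
    The pad byte is easy to see by packing instead of unpacking. A small
    sketch in Python 3 syntax; the values are arbitrary, and the sizes in
    the comments assume a typical platform where unsigned int is 4 bytes
    and 4-byte aligned.

```python
import struct

native = struct.pack("3sI", b"GDE", 2)     # native mode: alignment applied
standard = struct.pack("=3sI", b"GDE", 2)  # '=': native byte order, no alignment

print(len(native), len(standard))  # typically 8 and 7
print(native[3:4])                 # the pad byte inserted after the 3-char string
```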

    --
    The saddest aspect of life right now is that science gathers knowledge
    faster than society gathers wisdom.
    -- Isaac Asimov

    Roel Schroeven
     
    Roel Schroeven, Sep 10, 2008
    #9
  10. Aaron Scott

    Jon Clements Guest

    On Sep 10, 7:16 pm, Aaron Scott <> wrote:
    > Taking everything into consideration, my code is now:
    >
    > import struct
    > file = open("test.gde", "rb")
    > signature = file.read(3)
    > version, attr_count = struct.unpack('II', file.read(8))
    > print signature, version, attr_count
    > for idx in xrange(attr_count):
    >         attr_id, attr_val_len = struct.unpack('II', file.read(8))
    >         attr_val = file.read(attr_val_len)
    >         print attr_id, attr_val_len, attr_val
    > file.close()
    >
    > which gives a result of:
    >
    > GDE 2 2
    > 1 4 é
    > 2 4 ê Å
    >
    > Essentially, the same results I was originally getting :(


    Umm, how about yourfile.read(100) (or some arbitrary value, just to see
    the data) and see what it returns? Does it return something that
    looks like values you'd expect in a char[]? I also find it odd that
    the attr_val_len appears to be 4.
     
    Jon Clements, Sep 10, 2008
    #10
  11. On Sep 10, 1:12 pm, Aaron Scott <> wrote:
    > Sorry, I had posted the wrong error. The error I am getting is:
    >
    >      struct.error: unpack requires a string argument of length 12
    >
    > which doesn't make sense to me, since I'm specifically asking for 11.
    > Just for kicks, if I change the line to
    >
    >      print struct.unpack('3sII', file.read(12))
    >
    > I get the result
    >
    >      ('GDE', 33554432, 16777216)
    >
    > ... which isn't even close, past the first three characters.


    Sometimes 'endian' order can cause this. Try '<3sII' and '>3sII' for
    your formats to differentiate.

    Also, if your file is not packed the way that 'struct' expects, you
    might need to read the string and integers separately.

    Example:

    >>> import struct
    >>> struct.Struct( '3s' ).size + struct.Struct( 'II' ).size
    11
    >>> struct.Struct( '3sII' ).size
    12
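
    To make the '<' versus '>' suggestion concrete, here is a sketch in
    Python 3 syntax on a made-up 11-byte header. The header is written
    little-endian purely as an assumption; the real file's byte order is
    not known at this point in the thread.

```python
import struct

# Hypothetical header: "GDE", version=2, attr_count=2, little-endian, unpadded.
header = b"GDE" + struct.pack("<II", 2, 2)

# Both prefixes disable padding, so each format is exactly 11 bytes.
little = struct.unpack("<3sII", header)
big = struct.unpack(">3sII", header)

print(little)  # (b'GDE', 2, 2)
print(big)     # (b'GDE', 33554432, 33554432)
```

    Whichever prefix yields small, plausible numbers is probably the byte
    order the file was written in.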
     
    Aaron \Castironpi\ Brady, Sep 10, 2008
    #11
  12. Aaron Scott

    nntpman68 Guest

    What I would do first is print the data byte by byte, each as a
    hexadecimal number.

    If you can, I would additionally populate the C structure with numbers
    that are easier to follow.

    Example:

    signature = "ABC" // same as 0x41 0x42 0x43
    version = 0x61626364
    attr_count = 0x65667678
    ...

    Assuming version == 2 (0x00000002):
    the first byte should be 'G' (0x47)
    if the 4th byte is 2, then you have unaligned uint32s and little endian
    if the 5th byte is 2, then you have 4-byte-aligned uint32s and little endian
    if the 7th byte is 2, then you have unaligned uint32s and big endian
    if the 8th byte is 2, then you have 4-byte-aligned uint32s and big endian
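
    The four cases above can be sketched as a small check, in Python 3
    syntax. guess_layout is a hypothetical helper, and it is only a
    heuristic: it assumes a 3-byte signature, a version equal to 2, and a
    pad byte (if any) that is not itself 2.

```python
import struct

def guess_layout(header):
    """Guess alignment and byte order from where the byte 0x02 first appears.

    Positions are 1-based, as in the post. Needs at least 8 bytes.
    """
    cases = {
        4: "unaligned, little endian",
        5: "4-byte aligned, little endian",
        7: "unaligned, big endian",
        8: "4-byte aligned, big endian",
    }
    # Hex dump of the first bytes, one byte at a time.
    print(" ".join("%02x" % b for b in header[:12]))
    # Check positions in ascending order so the version field wins over
    # any 2 that belongs to the following attr_count field.
    for position in sorted(cases):
        if header[position - 1] == 2:
            return cases[position]
    return "unknown"

# Made-up header for illustration: unpadded, little-endian.
print(guess_layout(b"GDE" + struct.pack("<II", 2, 2)))
```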


    bye

    N


     
    nntpman68, Sep 10, 2008
    #12
  13. Aaron Scott

    John Machin Guest

    On Sep 11, 4:16 am, Aaron Scott <> wrote:
    > Taking everything into consideration, my code is now:
    >
    > import struct
    > file = open("test.gde", "rb")
    > signature = file.read(3)
    > version, attr_count = struct.unpack('II', file.read(8))
    > print signature, version, attr_count
    > for idx in xrange(attr_count):
    >         attr_id, attr_val_len = struct.unpack('II', file.read(8))
    >         attr_val = file.read(attr_val_len)
    >         print attr_id, attr_val_len, attr_val
    > file.close()
    >
    > which gives a result of:
    >
    > GDE 2 2
    > 1 4 é
    > 2 4 ê Å
    >
    > Essentially, the same results I was originally getting :(


    Stop thrashing about, and do the following:
    (1) print repr(open('test.gde', 'rb').read(100))
    (2) tell us what you EXPECT to see in attr_val etc
    (3) tell us what platform the file was created on and what platform
    it's being read on
    (4) (on the reading platform, at least) import sys; print sys.byteorder

    When showing results, do print ..., repr(attr_val)
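
    As a sketch of why step (4) matters, in Python 3 syntax: the byte
    order reported by sys.byteorder is the one struct uses when no
    explicit '<' or '>' prefix is given. The value 2 is arbitrary.

```python
import struct
import sys

print(sys.byteorder)  # 'little' or 'big', for the machine running this

# '=' means native byte order with standard sizes and no padding, so it
# matches whichever explicit prefix sys.byteorder names.
prefix = "<" if sys.byteorder == "little" else ">"
same = struct.pack("=I", 2) == struct.pack(prefix + "I", 2)
print(same)  # True
```

    If the file was written on a machine with the other byte order, the
    explicit prefix for that order is needed when reading it.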
     
    John Machin, Sep 11, 2008
    #13
  14. Aaron Scott

    Terry Reedy Guest

    Aaron Scott wrote:
    > Taking everything into consideration, my code is now:
    >
    > import struct
    > file = open("test.gde", "rb")
    > signature = file.read(3)
    > version, attr_count = struct.unpack('II', file.read(8))
    > print signature, version, attr_count
    > for idx in xrange(attr_count):
    >         attr_id, attr_val_len = struct.unpack('II', file.read(8))
    >         attr_val = file.read(attr_val_len)
    >         print attr_id, attr_val_len, attr_val
    > file.close()
    >
    > which gives a result of:
    >
    > GDE 2 2
    > 1 4 é
    > 2 4 ê Å
    >
    > Essentially, the same results I was originally getting :(


    It appears that your 4-byte attribute values are not what you were
    expecting. Do you have separate info on the supposed contents? In any
    case, I would print repr(attr_val) and even for c in attr_val:
    print(ord(c)).
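
    A sketch of that inspection in Python 3 syntax. The 4-byte value is
    hypothetical, not taken from the real file, and reading it as a
    little-endian uint32 is only one possibility that nothing in the
    thread confirms.

```python
import struct

attr_val = b"\xe9\x00\x00\x00"  # hypothetical 4-byte value, not from the real file

print(repr(attr_val))  # shows every byte, printable or not

# In Python 3, iterating over bytes yields ints directly, so ord() is not needed.
byte_values = list(attr_val)
print(byte_values)     # [233, 0, 0, 0]

# If the 4 bytes are really a little-endian uint32 (an assumption), decode them.
(as_uint32,) = struct.unpack("<I", attr_val)
print(as_uint32)       # 233
```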

    tjr
     
    Terry Reedy, Sep 11, 2008
    #14
