Reading binary data


Aaron Scott

I've been trying to tackle this all morning, and so far I've been
completely unsuccessful. I have a binary file that I have the
structure to, and I'd like to read it into Python. It's not a
particularly complicated file. For instance:

signature   char[3]     "GDE"
version     uint32      2
attr_count  uint32
{
    attr_id         uint32
    attr_val_len    uint32
    attr_val        char[attr_val_len]
} ... repeated attr_count times ...
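To experiment without the real test.gde, a file with this layout can be built in memory. This is a sketch for Python 3, not the poster's code: the helper name build_gde and the attribute values are made up, and it assumes little-endian uint32s with no padding between fields, which the layout above does not actually state.

```python
import struct

def build_gde(version, attrs):
    """Hypothetical helper: lay out bytes per the structure above.

    Assumes little-endian uint32s and no padding ('<' in struct).
    attrs is a list of (attr_id, value_bytes) pairs.
    """
    parts = [b"GDE", struct.pack("<II", version, len(attrs))]
    for attr_id, value in attrs:
        parts.append(struct.pack("<II", attr_id, len(value)))  # attr_id, attr_val_len
        parts.append(value)                                     # attr_val
    return b"".join(parts)

sample = build_gde(2, [(1, b"abcd"), (2, b"wxyz")])
print(len(sample))  # 3 + 4 + 4 + 2 * (4 + 4 + 4) = 35
```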

However, I can't find a way to bring it into Python. This is my code
-- which I know is definitely wrong, but I had to start somewhere:

import struct
file = open("test.gde", "rb")
output = file.read(3)
print output
version = struct.unpack("I", file.read(4))[0]
print version
attr_count = struct.unpack("I", file.read(4))[0]
while attr_count:
    print "---"
    file.seek(4, 1)
    counter = int(struct.unpack("I", file.read(4))[0])
    print file.read(counter)
    attr_count -= 1
file.close()

Of course, this doesn't work at all. It produces:

GDE
2
---
é
---
ê Å

I'm completely at a loss. If anyone could show me the correct way to
do this (or at least point me in the right direction), I'd be
extremely grateful.
 

Jon Clements


What if we view the data as having an 11-byte header:

signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))

Then for the list of attrs:

for idx in xrange(attr_count):
    attr_id, attr_val_len = struct.unpack('II', yourfile.read(8))
    attr_val = yourfile.read(attr_val_len)
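The same idea as a runnable sketch for Python 3, with two changes that are assumptions rather than part of the reply: '3s' reads the signature as one 3-byte string, and the '<' prefix forces little-endian byte order with no alignment padding, so the header really is 11 bytes. The name read_gde and the in-memory data are made up for testing.

```python
import io
import struct

HEADER = struct.Struct("<3sII")  # signature, version, attr_count -> 11 bytes
ATTR = struct.Struct("<II")      # attr_id, attr_val_len -> 8 bytes

def read_gde(f):
    """Parse the header and attribute list from a binary file object."""
    signature, version, attr_count = HEADER.unpack(f.read(HEADER.size))
    attrs = []
    for _ in range(attr_count):
        attr_id, attr_val_len = ATTR.unpack(f.read(ATTR.size))
        attrs.append((attr_id, f.read(attr_val_len)))
    return signature, version, attrs

# Hypothetical two-attribute file, built in memory.
data = (b"GDE" + struct.pack("<II", 2, 2)
        + struct.pack("<II", 1, 4) + b"abcd"
        + struct.pack("<II", 2, 4) + b"wxyz")
print(read_gde(io.BytesIO(data)))
# (b'GDE', 2, [(1, b'abcd'), (2, b'wxyz')])
```

With the real file this would be called as `with open("test.gde", "rb") as f: read_gde(f)`; if the file turns out to be big-endian, both '<' prefixes become '>'.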


hth, or gives you a pointer anyway
Jon.
 

Jon Clements


CORRECTION: '3cII' should be '3sII'.
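Why the correction matters: in a struct format, '3c' means three separate one-byte items, while '3s' means a single 3-byte string. So '3cII' unpacks to five values, which is what triggers "too many values to unpack" when assigning to three names. This Python 3 sketch uses the '<' prefix (an assumption) so that padding does not get in the way; the header bytes are made up.

```python
import struct

raw = b"GDE" + struct.pack("<II", 2, 2)  # hypothetical 11-byte header

as_chars = struct.unpack("<3cII", raw)   # three 1-byte items + two ints
as_string = struct.unpack("<3sII", raw)  # one 3-byte string + two ints

print(len(as_chars), as_chars)    # 5 (b'G', b'D', b'E', 2, 2)
print(len(as_string), as_string)  # 3 (b'GDE', 2, 2)
```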
 

Aaron Scott

signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))

This line is giving me an error:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    signature, version, attr_count = struct.unpack('3cII', file.read(12))
ValueError: too many values to unpack
 

Aaron Scott

CORRECTION: '3cII' should be '3sII'.

Even with the correction, I'm still getting the error.
 

Jon Clements

Even with the correction, I'm still getting the error.

Me being silly...

Quick fix:

signature = file.read(3)

then read version and attr_count with
struct.unpack('II', file.read(8)) and the rest can stay the same.
struct.calcsize('3sII') is 12, so unpack expects a 12-byte string,
whereas you only really have 11 -- alignment and all that...
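The size difference can be checked directly. Without a prefix (native mode), struct aligns each uint32 the way the C compiler would, so a pad byte follows the 3-byte signature; with '<', '>', '=' or '!' it uses standard sizes and no padding. The 12 below assumes a platform where unsigned int is 4 bytes with 4-byte alignment, which is the common case.

```python
import struct

print(struct.calcsize("3sII"))   # 12 on common platforms: 3 + 1 pad + 4 + 4
print(struct.calcsize("=3sII"))  # 11: native byte order, but no padding
print(struct.calcsize("<3sII"))  # 11: little-endian, no padding
```

Reading the header with file.read(11) and '=3sII' or '<3sII' would avoid having to read the signature separately.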

Jon.
 

Aaron Scott

Sorry, I had posted the wrong error. The error I am getting is:

struct.error: unpack requires a string argument of length 12

which doesn't make sense to me, since I'm specifically asking for 11.
Just for kicks, if I change the line to

print struct.unpack('3sII', file.read(12))

I get the result

('GDE', 33554432, 16777216)

... which isn't even close, past the first three characters.
 

Aaron Scott

Taking everything into consideration, my code is now:

import struct
file = open("test.gde", "rb")
signature = file.read(3)
version, attr_count = struct.unpack('II', file.read(8))
print signature, version, attr_count
for idx in xrange(attr_count):
    attr_id, attr_val_len = struct.unpack('II', file.read(8))
    attr_val = file.read(attr_val_len)
    print attr_id, attr_val_len, attr_val
file.close()

which gives a result of:

GDE 2 2
1 4 é
2 4 ê Å

Essentially, the same results I was originally getting :(
 

Roel Schroeven

Aaron Scott wrote:
Sorry, I had posted the wrong error. The error I am getting is:

struct.error: unpack requires a string argument of length 12

which doesn't make sense to me, since I'm specifically asking for 11.

That's because of padding. According to the docs, "By default, C numbers
are represented in the machine's native format and byte order, and
properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler)". That means that struct.unpack() assumes
one byte of padding between the 3-character string and the first
unsigned int.
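This padding also accounts for the odd numbers from the 12-byte experiment above. Assuming the file is packed (no padding) and little-endian, with version 2, attr_count 2 and a first attr_id of 1, native mode on a little-endian machine treats the real first byte of version as "padding" and shifts everything by one byte. This Python 3 sketch simulates that with an explicit pad byte ('x') so it behaves the same on any machine; the bytes are hypothetical.

```python
import struct

# Hypothetical first 12 bytes of a packed little-endian file:
# "GDE", version = 2, attr_count = 2, then the first byte of attr_id = 1.
first12 = b"GDE" + struct.pack("<II", 2, 2) + b"\x01"

# '<3sxII' mimics native '3sII': 3-byte string, 1 pad byte, two uint32s.
shifted = struct.unpack("<3sxII", first12)
print(shifted)                           # (b'GDE', 33554432, 16777216)
print(hex(shifted[1]), hex(shifted[2]))  # 0x2000000 0x1000000
```

33554432 is 0x02000000 and 16777216 is 0x01000000: the expected 2 and 1, each off by one byte position. That fits a packed little-endian file, though only a dump of the real bytes can confirm it.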

 

Jon Clements


Umm, how about yourfile.read(100) (or some arbitrary value, just to see
the data) and see what it returns... does it return something that
looks like the values you'd expect in a char[]? I also find it odd that
attr_val_len appears to be 4.
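A small hex dump shows what the bytes actually are far better than printing them as text. A sketch for Python 3; the function name hexdump is made up, and the sample bytes only stand in for the first bytes of test.gde.

```python
def hexdump(data, width=16):
    """Return hex-dump lines: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)  # bytes iterate as ints in Python 3
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text))
    return lines

# Stand-in for the first bytes of test.gde (made up for illustration).
sample = b"GDE\x02\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00"
for line in hexdump(sample):
    print(line)
```

On the real file: `with open("test.gde", "rb") as f: print("\n".join(hexdump(f.read(100))))`.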
 

Aaron "Castironpi" Brady

Sorry, I had posted the wrong error. The error I am getting is:

     struct.error: unpack requires a string argument of length 12

which doesn't make sense to me, since I'm specifically asking for 11.
Just for kicks, if I change the line to

     print struct.unpack('3sII', file.read(12))

I get the result

     ('GDE', 33554432, 16777216)

... which isn't even close, past the first three characters.

Sometimes 'endian' order can cause this. Try '<3sII' and '>3sII' for
your formats to differentiate.

Also, if your file is not packed the way that 'struct' expects, you
might need to read the string and integers separately.
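Trying both byte orders on the same bytes shows which one yields the expected version of 2. The header bytes below are hypothetical (a packed little-endian file with version 2 and attr_count 2), so the output illustrates the method, not the real file; Python 3 syntax.

```python
import struct

first11 = b"GDE" + struct.pack("<II", 2, 2)  # hypothetical header bytes

results = {}
for fmt in ("<3sII", ">3sII"):
    results[fmt] = struct.unpack(fmt, first11)
    print(fmt, results[fmt])
# <3sII (b'GDE', 2, 2)
# >3sII (b'GDE', 33554432, 33554432)
```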

/Example:
12
 

nntpman68

What I would do first is print the data byte by byte, each byte as a
hexadecimal number.

If you can, I would also populate the C structure with numbers that
are easier to follow.

Example:

signature = "ABC" // same as 0x41 0x42 0x43
version = 0x61626364
attr_count = 0x65667678
...

Assuming version == 2 (0x00000002), the first byte should be 'G' (0x47), and:
if the 4th byte is 2, you have unaligned uint32s and little endian
if the 5th byte is 2, you have 4-byte aligned uint32s and little endian
if the 7th byte is 2, you have unaligned uint32s and big endian
if the 8th byte is 2, you have 4-byte aligned uint32s and big endian
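The same four hypotheses can be checked mechanically by decoding the whole version field under each layout, rather than looking at a single byte. This is a Python 3 sketch: the layout names and guess_layout are made up, the 'x' format code stands in for a compiler's pad byte, and it assumes version really is 2.

```python
import struct

# Candidate layouts for signature + version. 'x' stands in for the pad
# byte a compiler would insert to align the uint32 on a 4-byte boundary.
LAYOUTS = [
    ("unaligned, little endian", "<3sI"),
    ("4-byte aligned, little endian", "<3sxI"),
    ("unaligned, big endian", ">3sI"),
    ("4-byte aligned, big endian", ">3sxI"),
]

def guess_layout(data, expected_version=2):
    """Return every layout under which data starts with b'GDE' + expected_version."""
    matches = []
    for name, fmt in LAYOUTS:
        size = struct.calcsize(fmt)
        if len(data) < size:
            continue  # not enough bytes to test this layout
        signature, version = struct.unpack(fmt, data[:size])
        if signature == b"GDE" and version == expected_version:
            matches.append(name)
    return matches

# Hypothetical packed little-endian header with version 2, attr_count 5.
print(guess_layout(b"GDE" + struct.pack("<II", 2, 5)))  # ['unaligned, little endian']
```

More than one layout can match by coincidence: if attr_count is also 2, a packed little-endian header decodes as 4-byte aligned big endian too, which is why the function returns a list instead of a single answer.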


bye

N
 

John Machin

Taking everything into consideration, my code is now:

import struct
file = open("test.gde", "rb")
signature = file.read(3)
version, attr_count = struct.unpack('II', file.read(8))
print signature, version, attr_count
for idx in xrange(attr_count):
        attr_id, attr_val_len = struct.unpack('II', file.read(8))
        attr_val = file.read(attr_val_len)
        print attr_id, attr_val_len, attr_val
file.close()

which gives a result of:

GDE 2 2
1 4 é
2 4 ê Å

Essentially, the same results I was originally getting :(

Stop thrashing about, and do the following:
(1) print repr(open('test.gde', 'rb').read(100))
(2) tell us what you EXPECT to see in attr_val etc.
(3) tell us what platform the file was created on and what platform
it's being read on
(4) (on the reading platform, at least) import sys; print sys.byteorder

When showing results, do print ..., repr(attr_val)
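Steps (1) and (4) in Python 3 form, as a sketch: diagnose is a made-up name, and it takes any file object opened in binary mode so it can be tried on in-memory bytes first. Steps (2) and (3) are information only the poster has.

```python
import io
import sys

def diagnose(f, nbytes=100):
    """Print the first nbytes as a repr (non-printables escaped) and the
    byte order of the machine doing the reading."""
    head = f.read(nbytes)
    print(repr(head))      # step (1)
    print(sys.byteorder)   # step (4): 'little' or 'big'
    return head

head = diagnose(io.BytesIO(b"GDE\x02\x00\x00\x00"))
# With the real file:
# with open("test.gde", "rb") as f:
#     diagnose(f)
```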
 

Terry Reedy


It appears that your 4-byte attribute values are not what you were
expecting. Do you have separate info on the supposed contents? In any
case, I would print repr(attr_val) and even for c in attr_val:
print(ord(c)).
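A sketch of that check in Python 3, where iterating over a bytes object already yields integers, so ord() is not needed. The uint32 and float readings are only guesses at what a 4-byte attr_val might hold, assuming little-endian; nothing in the thread says what the attributes contain, and the sample value is made up.

```python
import struct

def interpret(attr_val):
    """Return a few candidate readings of an attribute value.

    Only the raw byte values follow the advice above; the uint32 and
    float readings are guesses, assuming a little-endian file.
    """
    result = {"repr": repr(attr_val), "bytes": list(attr_val)}
    if len(attr_val) == 4:
        result["uint32_le"] = struct.unpack("<I", attr_val)[0]
        result["float_le"] = struct.unpack("<f", attr_val)[0]
    return result

print(interpret(struct.pack("<f", 1.5)))
```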

tjr
 
