Fast reading and unpacking of binary data (struct module)

Daniel Platz

Hi,

I have a Python newbie question about reading data from a binary file.
I have a huge binary file from an external program. I want to read
and process the data in this file in a reasonable time. It turns out
that reading the data itself and processing it do not take
most of the time. However, when using the read(bytes) method Python
returns a string representing the binary information in hex. This
string I have to "cast/translate" into a number (in my case a signed
short). For this I am using the method struct.unpack from the struct
module. This unpacking part of the program takes by far the most time.
Is there a way to speed this up or to do the unpacking more
cleverly than with the struct module?

Thanks in advance.

With kind regards,

Daniel
 
Gabriel Genellina

On Tue, 21 Jul 2009 21:00:13 -0300, Daniel Platz wrote:
[...] For this I am using the method struct.unpack from the struct
module. This unpacking part of the program takes by far the most time.
Is there a way to speed this up or to do the unpacking more
cleverly than with the struct module?

Try creating a Struct object with your format and use its unpack() method.
http://docs.python.org/library/struct.html#struct-objects
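
A minimal sketch of this suggestion, assuming Python 3 and assuming the data are little-endian signed 16-bit integers (format "<h"); the byte order is a guess, since the thread does not state it. The function names are hypothetical. The first function reuses one precompiled Struct object; the second unpacks the whole buffer in a single call by putting a repeat count in the format string, which avoids one Python-level call per value.

```python
import struct

# Assumption: little-endian signed shorts; use ">h" for big-endian data.
SHORT = struct.Struct("<h")

def unpack_one_by_one(data):
    """Unpack every 2-byte value with the precompiled Struct object."""
    usable = len(data) - len(data) % SHORT.size  # ignore a trailing odd byte
    return [SHORT.unpack_from(data, offset)[0]
            for offset in range(0, usable, SHORT.size)]

def unpack_all_at_once(data):
    """Unpack the whole buffer with one call, using a repeat count."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[:count * 2]))

# Small in-memory sample standing in for a chunk returned by file.read().
sample = struct.pack("<4h", 1, -2, 300, -32768)
print(unpack_one_by_one(sample))
print(unpack_all_at_once(sample))
```

Both functions return the same list; for large buffers the single-call variant is usually the faster of the two, but that is worth measuring on the real data.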

If your format consists of just integers, an array is probably more
efficient:
http://docs.python.org/library/array.html#array.array.fromfile
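
A rough sketch of the array approach, assuming Python 3, a file that contains nothing but signed 16-bit integers, and little-endian byte order in the file (an assumption, adjustable through the hypothetical byteorder parameter). read_shorts is an illustrative name, not something from the thread. array.fromfile() fills the array directly from the file, so no per-value unpacking happens in Python code.

```python
import array
import os
import struct
import sys
import tempfile

def read_shorts(path, byteorder="little"):
    """Read a whole file of signed shorts into an array.array('h')."""
    values = array.array("h")
    # 'h' is the C short type; this sketch assumes it is 2 bytes wide.
    assert values.itemsize == 2
    count = os.path.getsize(path) // values.itemsize
    with open(path, "rb") as f:
        values.fromfile(f, count)  # raises EOFError if the file is too short
    if sys.byteorder != byteorder:
        values.byteswap()  # convert from the file's order to native order
    return values

# Demo with a small temporary file standing in for the real data file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sample.bin")
    with open(demo_path, "wb") as f:
        f.write(struct.pack("<4h", 1, -2, 300, -32768))
    print(read_shorts(demo_path).tolist())
```

The result behaves like a list of integers, so it can be indexed and iterated over directly.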
 
Neal Becker

Consider mmap
Consider numpy
Consider numpy+mmap
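
One way the mmap idea could look using only the standard library, assuming Python 3 and assuming the file stores signed shorts in the machine's native byte order (memoryview.cast() only supports native formats). sum_shorts_mmap is a hypothetical name, and summing is just a stand-in for the real processing. The file is mapped into memory and viewed as shorts, so nothing is unpacked or copied up front.

```python
import mmap
import os
import struct
import tempfile

def sum_shorts_mmap(path):
    """Map a file and iterate over it as native-order signed shorts."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            usable = len(mm) - len(mm) % 2  # ignore a trailing odd byte
            # The view must be released before the mmap is closed,
            # which the nested "with" takes care of.
            with memoryview(mm)[:usable].cast("h") as shorts:
                return sum(shorts)

# Demo with a small temporary file written in native byte order ("=").
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sample.bin")
    with open(demo_path, "wb") as f:
        f.write(struct.pack("=4h", 1, -2, 300, -4))
    print(sum_shorts_mmap(demo_path))
```

With numpy installed, numpy.fromfile and numpy.memmap with an explicit 16-bit integer dtype cover the same ground and also handle a non-native byte order.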
 
