How to efficiently read binary files?

David Lees

I want to process large binary files (>2GB) in Python. I have played
around with prototypes in pure Python and profiled the code. Most of
the time seems to be spent converting back and forth to and from strings
using the struct module. Is there a way to directly read into an array
of integers in Python?

TIA

David Lees
 
Grant Edwards

David Lees said:
I want to process large binary files (>2GB) in Python. I have played
around with prototypes in pure Python and profiled the code. Most of
the time seems to be spent converting back and forth to and from strings
using the struct module. Is there a way to directly read into an array
of integers in Python?

Perhaps the numarray module?
 
Robert Kern

Grant said:
Perhaps the numarray module?

numpy for new code, please. In particular, numarray is limited by 32-bit APIs
even on 64-bit platforms, so a >2GB file will be difficult to process even when
using mmap. numpy removed this restriction on 64-bit platforms. 32-bit users
will still have to split up the file into <2GB chunks, though.
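A minimal sketch of the chunked approach with numpy, assuming the file is a flat sequence of little-endian 32-bit signed integers (dtype `"<i4"`); the function name and chunk size are hypothetical choices, not something from this thread. `numpy.fromfile` with `count` reads at most that many items from the current position of an open file object, so memory use is bounded by the chunk size rather than the file size. On a 64-bit platform, `numpy.memmap` is the alternative if you would rather map the whole file at once.

```python
import numpy as np


def iter_int32_chunks(path, chunk_items=1 << 20):
    """Yield successive numpy arrays of little-endian int32 values from path.

    Assumption: the file is a flat sequence of 4-byte little-endian integers.
    Each np.fromfile call reads at most chunk_items values starting at the
    file object's current position, so only one chunk is in memory at a time.
    """
    with open(path, "rb") as f:
        while True:
            chunk = np.fromfile(f, dtype="<i4", count=chunk_items)
            if chunk.size == 0:  # nothing left to read
                break
            yield chunk
```

For example, `sum(int(c.sum(dtype=np.int64)) for c in iter_int32_chunks("data.bin"))` totals the file chunk by chunk (the file name is a placeholder); summing each chunk as int64 avoids int32 overflow.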

http://numeric.scipy.org/
https://lists.sourceforge.net/lists/listinfo/numpy-discussion

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Alex Martelli

David Lees said:
I want to process large binary files (>2GB) in Python. I have played
around with prototypes in pure Python and profiled the code. Most of
the time seems to be spent converting back and forth to and from strings
using the struct module. Is there a way to directly read into an array
of integers in Python?

Help on built-in function fromfile:

fromfile(...)
fromfile(f, n)

Read n objects from the file object f and append them to the end of
the array. Also called as read.
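A minimal sketch of the same idea with the standard library's `array` module in current Python 3, reading the file in fixed-size chunks so a >2GB file never has to fit in memory at once. It assumes native-endian integers of the platform's C `int` size (typecode `"i"`, usually 4 bytes) and a file length that is an exact multiple of the item size; the function name and chunk size are hypothetical. In Python 3, `fromfile` raises `EOFError` when fewer than `n` items remain but still appends the items it did read, and the `read` alias mentioned in the help text no longer exists.

```python
import array


def iter_int_chunks(path, typecode="i", chunk_items=1 << 20):
    """Yield array.array chunks of native-endian integers read from path.

    Assumptions: values are stored in native byte order with the size of
    the C type for `typecode`, and the file length is a multiple of the
    item size (a trailing partial item is not handled here).
    """
    with open(path, "rb") as f:
        while True:
            chunk = array.array(typecode)
            try:
                chunk.fromfile(f, chunk_items)
            except EOFError:
                # Fewer than chunk_items remained; fromfile() still appended
                # every item it managed to read before raising.
                pass
            if not chunk:  # end of file reached
                break
            yield chunk
```

Each chunk supports indexing, slicing, and `sum()` like a list; if the file was written on a machine with the opposite byte order, call `chunk.byteswap()` before using it.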


Alex
 
David Lees

Alex said:
Help on built-in function fromfile:

fromfile(...)
fromfile(f, n)

Read n objects from the file object f and append them to the end of
the array. Also called as read.


Alex

Thank you. This is exactly what I was looking for. I just tried it and
it works great.

David Lees
 
Robert Kern

Grant said:
So numarray and numpy were both written to replace numeric?

numpy was written to replace both Numeric and numarray. There is a good
explanation in Chapter 1 of _The Guide to NumPy_ included in the sample chapters:

http://numeric.scipy.org/scipybooksample.pdf

There were a number of features in Numeric that could not be replicated in
numarray, among them the ufunc C API and the speed for small arrays. numpy's
code base is closer to Numeric's but it incorporates nearly all (if not all) of
the features of numarray. The numarray developers fully support the migration to
numpy as the one array package.

--
Robert Kern
 
Robert Kern

Grant said:
too many batteries...

Too many cryptic complaints...

--
Robert Kern
 
Robert Kern

Grant said:
Too easy to miss the intended humor on Usenet...

Oh, I saw the smiley. I knew it was meant to be humorous. I just didn't
understand it. The only "batteries" reference I know of in this context is the
"batteries included" philosophy of the stdlib. Of course, none of Numeric,
numarray, or numpy have anything to do with the stdlib, so again I am confused.

--
Robert Kern
 
Grant Edwards

Robert Kern said:
Oh, I saw the smiley. I knew it was meant to be humorous. I
just didn't understand it. The only "batteries" reference I
know of in this context is the "batteries included" philosophy
of the stdlib. Of course, none of Numeric, numarray, or numpy
have anything to do with the stdlib, so again I am confused.

Sorry. I forgot about the distinction between the standard
library and "third-party" packages such as the scientific
ones.
 
