Looking for suggestions on improving numpy code

David Lees · Feb 23, 2008

I am starting to use numpy and have written a hack for reading in a
large data set that has 8 columns and millions of rows. I want to read
and process a single column. I have written the very ugly hack below,
but am sure there is a more efficient and pythonic way to do this. The
file is too big to read by brute force and select a column, so it is
read in chunks and the column selected. Things I don't like in the code:
1. Performing a transpose on a large array
2. Uncertainty about numpy append efficiency

Is there a way to directly read every n'th element from the file into an
array?

david

from numpy import *
from scipy.io.numpyio import fread

fd = open('testcase.bin', 'rb')
datatype = 'h'
byteswap = 0
M = 1000000
N = 8
size = M*N
shape = (M,N)
colNum = 2
sf = 1.645278e-04*10
z = array([])
for i in xrange(50):
    data = fread(fd, size, datatype, datatype, byteswap)
    data = data.reshape(shape)
    data = data.transpose()
    z = append(z, data[colNum]*sf)

print z.mean()

fd.close()

7stud · Feb 23, 2008

David said:
I want to read and process a single column.

Then why won't a list suffice?

Robert Kern · Feb 25, 2008

David said:
I am starting to use numpy and have written a hack for reading in a
large data set that has 8 columns and millions of rows. I want to read
and process a single column. I have written the very ugly hack below,
but am sure there is a more efficient and pythonic way to do this. The
file is too big to read by brute force and select a column, so it is
read in chunks and the column selected. Things I don't like in the code:
1. Performing a transpose on a large array

Transposition is trivially fast in numpy. It does not copy any memory.
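
One way to check this claim, as a minimal sketch: the small array below is a made-up stand-in for one chunk of the file, not the real data. The transposed array shares the original buffer, and only the shape and strides are swapped.

```python
import numpy as np

# Hypothetical stand-in for one chunk of the file: 4 rows, 8 int16 columns.
data = np.arange(32, dtype=np.int16).reshape(4, 8)
transposed = data.transpose()

# The transpose is a view: same buffer, swapped shape and strides.
shares_buffer = np.shares_memory(data, transposed)
print(shares_buffer)        # True
print(data.strides)         # (16, 2)
print(transposed.strides)   # (2, 16)

# Selecting the column directly gives the same values without any transpose.
column = data[:, 2]
print(np.array_equal(column, transposed[2]))
```

Since data[:, 2] and data.transpose()[2] address the same elements, the transpose step in the original loop is harmless but also unnecessary.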

2. Uncertainty about numpy append efficiency

Rest assured that it's slow. Appending to lists is fast since lists preallocate
memory according to a scheme such that the amortized cost of appending elements
is O(1). We don't quite have that luxury in numpy.
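
A common workaround, sketched here under assumptions, is to collect the per-chunk results in a Python list and call numpy.concatenate once at the end. The chunks below are synthetic and much smaller than the real ones; the names col_num, pieces and chunks are illustrative only.

```python
import numpy as np

N = 8                    # columns per row, as in the original post
col_num = 2              # column to extract
sf = 1.645278e-04 * 10   # scale factor from the original post

# Synthetic stand-in for the chunks read from the file (3 chunks of 2 rows).
chunks = [np.arange(i * 16, (i + 1) * 16, dtype=np.int16).reshape(-1, N)
          for i in range(3)]

pieces = []
for chunk in chunks:
    # list.append is amortized O(1); the growing result is never recopied.
    pieces.append(chunk[:, col_num] * sf)

# One allocation and one copy for the whole result.
z = np.concatenate(pieces)
print(z.shape)
```

By contrast, each call to numpy.append allocates a new array and copies everything accumulated so far, so the total copying grows quadratically with the number of chunks.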

Is there a way to directly read every n'th element from the file into an
array?

Since this is a regular binary file, you can memory map the file.

import numpy

M = 1000000
N = 8
column = 2
sf = 1.645278e-04*10

m = numpy.memmap('testcase.bin', dtype=numpy.int16, shape=(M,N))
z = m[:,column] * sf
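
Note that the snippet above maps only the first M rows, while the original loop reads 50 chunks of M rows each. A possible variant, as a sketch only, derives the row count from the file size. It assumes the file contains nothing but native-endian int16 values with no header, and it writes a tiny made-up file so that it runs on its own.

```python
import os
import tempfile
import numpy as np

N = 8
column = 2
sf = 1.645278e-04 * 10

# Write a small made-up file so the sketch is self-contained;
# with real data, point `path` at the actual file instead.
fd, path = tempfile.mkstemp(suffix='.bin')
os.close(fd)
np.arange(5 * N, dtype=np.int16).tofile(path)

# Derive the number of rows from the file size (assumes no header, int16 only).
rows = os.path.getsize(path) // (N * np.dtype(np.int16).itemsize)

# mode='r' maps the file read-only; data is paged in only when accessed.
m = np.memmap(path, dtype=np.int16, mode='r', shape=(rows, N))
z = m[:, column] * sf     # strided read of one column, scaled into a new array
col_mean = float(z.mean())
print(rows, col_mean)

del m                     # drop the mapping before removing the temp file
os.remove(path)
```

On a 32-bit system a very large file may not fit in the address space, in which case reading in chunks remains the fallback.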

You may want to ask future numpy questions on the numpy mailing list.

http://www.scipy.org/Mailing_Lists

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
