Looking for suggestions on improving numpy code

Discussion in 'Python' started by David Lees, Feb 23, 2008.

  1. David Lees

    David Lees Guest

    I am starting to use numpy and have written a hack for reading in a
    large data set that has 8 columns and millions of rows. I want to read
    and process a single column. I have written the very ugly hack below,
    but am sure there is a more efficient and pythonic way to do this. The
    file is too big to read by brute force and select a column, so it is
    read in chunks and the column selected. Things I don't like in the code:
    1. Performing a transpose on a large array
    2. Uncertainty about numpy append efficiency

    Is there a way to directly read every n'th element from the file into an
    array?

    david


    from numpy import *
    from scipy.io.numpyio import fread

    fd = open('testcase.bin', 'rb')
    datatype = 'h'          # 16-bit signed integers
    byteswap = 0            # 0 = no byte swapping on read
    M = 1000000             # rows per chunk
    N = 8                   # columns in the file
    size = M * N            # elements per chunk
    shape = (M, N)
    colNum = 2              # column to extract
    sf = 1.645278e-04 * 10  # scale factor
    z = array([])
    for i in xrange(50):
        data = fread(fd, size, datatype, datatype, byteswap)
        data = data.reshape(shape)
        data = data.transpose()
        z = append(z, data[colNum] * sf)

    print z.mean()

    fd.close()
     
    David Lees, Feb 23, 2008
    #1

  2. 7stud

    7stud Guest

    On Feb 22, 11:37 pm, David Lees <> wrote:
    > I want to read
    > and process a single column.


    Then why won't a list suffice?
     
    7stud, Feb 23, 2008
    #2

  3. Robert Kern

    Robert Kern Guest

    David Lees wrote:
    > I am starting to use numpy and have written a hack for reading in a
    > large data set that has 8 columns and millions of rows. I want to read
    > and process a single column. I have written the very ugly hack below,
    > but am sure there is a more efficient and pythonic way to do this. The
    > file is too big to read by brute force and select a column, so it is
    > read in chunks and the column selected. Things I don't like in the code:
    > 1. Performing a transpose on a large array


    Transposition is trivially fast in numpy. It does not copy any memory.
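
    A quick sketch showing that the transpose is just a view over the same buffer:

    import numpy

    a = numpy.zeros((3, 4))
    t = a.transpose()
    print t.base is a            # True: t shares a's memory, nothing was copied
    print a.strides, t.strides   # only the stride bookkeeping differs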

    > 2. Uncertainty about numpy append efficiency


    Rest assured that it's slow. Appending to lists is fast since lists preallocate
    memory according to a scheme such that the amortized cost of appending elements
    is O(1). We don't quite have that luxury in numpy.
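
    The usual workaround (sketched here reusing the names fd, size, datatype,
    byteswap, shape, colNum and sf from your script) is to collect each chunk's
    column in a Python list and pay for a single concatenation at the end:

    import numpy
    from scipy.io.numpyio import fread

    chunks = []
    for i in xrange(50):
        data = fread(fd, size, datatype, datatype, byteswap)
        # Slice out the column while the chunk is still around, then append:
        # list appends are O(1) amortized, unlike numpy.append, which copies
        # the whole accumulated array on every call.
        chunks.append(data.reshape(shape)[:, colNum] * sf)
    z = numpy.concatenate(chunks)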

    > Is there a way to directly read every n'th element from the file into an
    > array?


    Since this is a regular binary file, you can memory map the file.


    import numpy

    M = 1000000
    N = 8
    column = 2
    sf = 1.645278e-04 * 10

    # Map the file into memory; data are only paged in when actually touched.
    m = numpy.memmap('testcase.bin', dtype=numpy.int16, shape=(M,N))
    z = m[:,column] * sf
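
    Note that the loop in your script reads 50 chunks of M rows, so a map of the
    whole file would use shape=(50*M, N); the mean then falls out directly (a
    sketch under that assumption):

    rows = 50 * M    # the original loop read 50 chunks of M rows each
    m = numpy.memmap('testcase.bin', dtype=numpy.int16, shape=(rows, N))
    z = m[:, column] * sf
    print z.mean()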


    You may want to ask future numpy questions on the numpy mailing list.

    http://www.scipy.org/Mailing_Lists

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Feb 25, 2008
    #3
