Re: Strange array.array performance

Discussion in 'Python' started by Maxim Khitrov, Feb 19, 2009.

  1. On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <> wrote:
    > On 2009-02-19 12:52, Maxim Khitrov wrote:
    >>
    >> Hello all,
    >>
    >> I'm currently writing a Python <-> MATLAB interface with ctypes and the
    >> array.array class, through which I'll need to push large amounts of data
    >> to MATLAB.

    >
    > Have you taken a look at mlabwrap?
    >
    > http://mlabwrap.sourceforge.net/
    >
    > At the very least, you will probably want to use numpy arrays instead of
    > array.array.
    >
    > http://numpy.scipy.org/


    I have, but numpy is not currently available for Python 2.6, which is
    what I need for some other features, and I'm trying to keep the
    dependencies down in any case. The mlabwrap description doesn't mention
    whether it is thread-safe, and that's another one of my requirements.

    The only feature that I'm missing with array.array is the ability to
    quickly pre-allocate large chunks of memory. To do that right now I'm
    using array('d', (0,) * size). It would be nice if array accepted an
    int as the second argument indicating how much memory to allocate and
    initialize to 0.
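
    As a sketch, wrapped in a helper (the preallocate name is just for
    illustration):

    from array import array

    def preallocate(typecode, size):
        # Builds a size-element tuple of zeros first, which is the slow part
        # that a native integer "size" argument would avoid.
        return array(typecode, (0,) * size)

    buf = preallocate('d', 1000000)   # one million zeroed doubles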

    - Max
     
    Maxim Khitrov, Feb 19, 2009
    #1

  2. On Thu, Feb 19, 2009 at 7:01 PM, Scott David Daniels
    <> wrote:
    > Maxim Khitrov wrote:
    >>
    >> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <>
    >> wrote:
    >> I have, but numpy is not currently available for python 2.6, which is
    >> what I need for some other features, and I'm trying to keep the
    >> dependencies down in any case....
    >> The only feature that I'm missing with array.array is the ability to
    >> quickly pre-allocate large chunks of memory. To do that right now I'm
    >> using array('d', (0,) * size). It would be nice if array accepted an
    >> int as the second argument indicating how much memory to allocate and
    >> initialize to 0.

    >
    > In the meantime, you could write a function (to ease the shift to numpy)
    > and reduce your interface problem to a very small set of lines:
    > def zeroes_d(n):
    >     '''Allocate an n-element vector of 'd' elements'''
    >     vector = array.array('d')  # fromstring has no performance bug
    >     vector.fromstring(n * 8 * '\0')
    >     return vector
    > Once numpy is up and running on 2.6, this should be easy to convert
    > to a call to zeroes.


    If I do decide to transition at any point, it will require much
    greater modification. For example, to speed up retrieval of data from
    MATLAB, which is returned to me as an mxArray structure, I allocate an
    array.array for it and then use ctypes.memmove to copy the data directly
    into the array's buffer (address obtained through buffer_info()).
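
    For concreteness, here's a rough sketch of that receive path
    (copy_into_array and the plain ctypes buffer are only stand-ins for the
    real mxArray data pointer, not the actual interface code):

    import ctypes
    from array import array

    def copy_into_array(src_ptr, count):
        # Pre-allocate the destination, then blit the raw bytes into it.
        # src_ptr stands in here for the mxArray data pointer.
        dst = array('d', '\0' * 8 * count)
        addr, n = dst.buffer_info()                 # (buffer address, item count)
        ctypes.memmove(addr, src_ptr, n * dst.itemsize)
        return dst

    src = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)  # stand-in for MATLAB data
    print copy_into_array(src, 4)                    # array('d', [1.0, 2.0, 3.0, 4.0])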

    The same goes for sending data: rather than allocating a separate mxArray,
    copying the data, and then sending it, I create an empty mxArray and set
    its data pointer to the array's buffer. I'm sure there are equivalents in
    numpy, but the point is that the transition, which currently would not
    benefit my code in any significant way, will not be a quick change.

    On the other hand, I have to thank you for the fromstring example. For
    some reason, it never occurred to me that creating a string of nulls
    would be much faster than a tuple of zeros. In fact, you can pass the
    string to the constructor and it calls fromstring automatically. For
    an array of 1 million elements, using a string to initialize is 18x
    faster. :)
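
    A quick way to measure the difference on a given machine (the size is
    illustrative; timeit.timeit accepts a callable as of Python 2.6):

    from array import array
    from timeit import timeit

    size = 1000000
    t_tuple = timeit(lambda: array('d', (0,) * size), number=10)
    t_string = timeit(lambda: array('d', '\0' * 8 * size), number=10)
    print 'tuple: %.3fs  string: %.3fs' % (t_tuple, t_string)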

    - Max
     
    Maxim Khitrov, Feb 20, 2009
    #2

  3. On Thu, Feb 19, 2009 at 7:01 PM, Scott David Daniels
    <> wrote:
    > Maxim Khitrov wrote:
    >>
    >> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <>
    >> wrote:
    >> I have, but numpy is not currently available for python 2.6, which is
    >> what I need for some other features, and I'm trying to keep the
    >> dependencies down in any case....
    >> The only feature that I'm missing with array.array is the ability to
    >> quickly pre-allocate large chunks of memory. To do that right now I'm
    >> using array('d', (0,) * size). It would be nice if array accepted an
    >> int as the second argument indicating how much memory to allocate and
    >> initialize to 0.

    >
    > In the meantime, you could write a function (to ease the shift to numpy)
    > and reduce your interface problem to a very small set of lines:
    > def zeroes_d(n):
    >     '''Allocate an n-element vector of 'd' elements'''
    >     vector = array.array('d')  # fromstring has no performance bug
    >     vector.fromstring(n * 8 * '\0')
    >     return vector
    > Once numpy is up and running on 2.6, this should be easy to convert
    > to a call to zeroes.


    Here's the function that I'll be using from now on. It gives me
    exactly the behavior I need, with an int initializer being treated as
    array size. Still not as efficient as it could be if supported
    natively by array (one malloc instead of two + memmove + extra
    function call), but very good performance nevertheless:

    from array import array as _array

    # One zero-byte string per supported typecode, sized to that type's itemsize.
    array_null = dict((tc, '\0' * _array(tc).itemsize) for tc in 'cbBuhHiIlLfd')

    def array(typecode, init):
        # An int initializer is treated as the number of zeroed elements.
        if isinstance(init, int):
            return _array(typecode, array_null[typecode] * init)
        return _array(typecode, init)
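
    For example (values are illustrative):

    buf = array('d', 1000000)     # one million zeroed doubles via the string path
    vec = array('i', [1, 2, 3])   # non-int initializers pass straight through
    print len(buf), buf[0], vec.tolist()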

    - Max
     
    Maxim Khitrov, Feb 20, 2009
    #3
  4. On Feb 20, 6:53 am, Maxim Khitrov <> wrote:
    > On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <> wrote:
    > > On 2009-02-19 12:52, Maxim Khitrov wrote:

    >
    > >> Hello all,

    >
    > >> I'm currently writing a Python<->  MATLAB interface with ctypes and
    > >> array.array class, using which I'll need to push large amounts of data
    > >> to MATLAB.

    >
    > > Have you taken a look at mlabwrap?

    >
    > >  http://mlabwrap.sourceforge.net/

    >
    > > At the very least, you will probably want to use numpy arrays instead of
    > > array.array.

    >
    > >  http://numpy.scipy.org/

    >
    > I have, but numpy is not currently available for python 2.6, which is
    > what I need for some other features, and I'm trying to keep the
    > dependencies down in any case. Mlabwrap description doesn't mention if
    > it is thread-safe, and that's another one of my requirements.
    >
    > The only feature that I'm missing with array.array is the ability to
    > quickly pre-allocate large chunks of memory. To do that right now I'm
    > using array('d', (0,) * size).


    It would go somewhat faster if you gave it a float instead of an int.

    > It would be nice if array accepted an
    > int as the second argument indicating how much memory to allocate and
    > initialize to 0.


    While you're waiting for that to happen, you'll have to use the
    fromstring trick, or another gimmick that is faster and likely doesn't
    use an extra temporary 8 MB for a 1M-element array, as I presume
    fromstring does.

    [Python 2.6.1 on Windows XP SP3]
    [Processor: x86 Family 15 Model 36 Stepping 2 AuthenticAMD ~1994 Mhz]

    C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0,)*1000000)"
    10 loops, best of 3: 199 msec per loop

    C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0.,)*1000000)"
    10 loops, best of 3: 158 msec per loop

    C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d');x.fromstring('\0'*8*1000000)"
    10 loops, best of 3: 36 msec per loop

    C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d','\0'*8*1000000)"
    10 loops, best of 3: 35.7 msec per loop

    C:\junk>\python26\python -mtimeit -s"from array import array" "array('d',(0.,))*1000000"
    10 loops, best of 3: 19.5 msec per loop

    HTH,
    John
     
    John Machin, Feb 20, 2009
    #4
  5. On Thu, Feb 19, 2009 at 9:15 PM, John Machin <> wrote:
    > On Feb 20, 6:53 am, Maxim Khitrov <> wrote:
    >> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <> wrote:
    >> > On 2009-02-19 12:52, Maxim Khitrov wrote:

    >>
    >> >> Hello all,

    >>
    >> >> I'm currently writing a Python<-> MATLAB interface with ctypes and
    >> >> array.array class, using which I'll need to push large amounts of data
    >> >> to MATLAB.

    >>
    >> > Have you taken a look at mlabwrap?

    >>
    >> > http://mlabwrap.sourceforge.net/

    >>
    >> > At the very least, you will probably want to use numpy arrays instead of
    >> > array.array.

    >>
    >> > http://numpy.scipy.org/

    >>
    >> I have, but numpy is not currently available for python 2.6, which is
    >> what I need for some other features, and I'm trying to keep the
    >> dependencies down in any case. Mlabwrap description doesn't mention if
    >> it is thread-safe, and that's another one of my requirements.
    >>
    >> The only feature that I'm missing with array.array is the ability to
    >> quickly pre-allocate large chunks of memory. To do that right now I'm
    >> using array('d', (0,) * size).

    >
    > It would go somewhat faster if you gave it a float instead of an int.
    >
    >> It would be nice if array accepted an
    >> int as the second argument indicating how much memory to allocate and
    >> initialize to 0.

    >
    > While you're waiting for that to happen, you'll have to use the
    > fromstring trick, or another gimmick that is faster and is likely not
    > to use an extra temp 8Mb for a 1M-element array, as I presume the
    > fromstring does.
    >
    > [Python 2.6.1 on Windows XP SP3]
    > [Processor: x86 Family 15 Model 36 Stepping 2 AuthenticAMD ~1994 Mhz]
    >
    > C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0,)*1000000)"
    > 10 loops, best of 3: 199 msec per loop
    >
    > C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0.,)*1000000)"
    > 10 loops, best of 3: 158 msec per loop
    >
    > C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d');x.fromstring('\0'*8*1000000)"
    > 10 loops, best of 3: 36 msec per loop
    >
    > C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d','\0'*8*1000000)"
    > 10 loops, best of 3: 35.7 msec per loop
    >
    > C:\junk>\python26\python -mtimeit -s"from array import array" "array('d',(0.,))*1000000"
    > 10 loops, best of 3: 19.5 msec per loop


    Interesting, though I'm not able to replicate that last outcome. The
    string method is still the fastest on my machine. Furthermore, it
    looks like the order in which you do the multiplication also matters -
    (8 * size * '\0') is faster than ('\0' * 8 * size). Here is my test
    and outcome:

    ---
    from array import array
    from timeit import repeat

    print repeat(lambda: array('d', (0,) * 100000), number = 100)
    print repeat(lambda: array('d', (0.0,) * 100000), number = 100)
    print repeat(lambda: array('d', (0.0,)) * 100000, number = 100)
    print repeat(lambda: array('d', '\0' * 100000 * 8), number = 100)
    print repeat(lambda: array('d', '\0' * 8 * 100000), number = 100)
    print repeat(lambda: array('d', 8 * 100000 * '\0'), number = 100)
    ---

    [0.91048107424534941, 0.88766983642377162, 0.88312824645684618]
    [0.72164595848486179, 0.72038338197219343, 0.72346024633711981]
    [0.10763947529894136, 0.1047547164728595, 0.10461521722863232]
    [0.05856873793382178, 0.058508825334111947, 0.058361838698573365]
    [0.057632016342657799, 0.057521392119007864, 0.057227118035289237]
    [0.056006643320014149, 0.056331811311153501, 0.056187433215103333]

    The array('d', (0.0,)) * 100000 method is a good compromise between
    performance and amount of memory used, so maybe I'll use that instead.

    - Max
     
    Maxim Khitrov, Feb 20, 2009
    #5
  6. In article <>, Maxim Khitrov <> wrote:
    >
    >Interesting, though I'm not able to replicate that last outcome. The
    >string method is still the fastest on my machine. Furthermore, it
    >looks like the order in which you do the multiplication also matters -
    >(8 * size * '\0') is faster than ('\0' * 8 * size).


    That's not surprising -- the latter does two string multiplication
    operations, which I would expect to be slower than int multiplication.
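
    Spelled out (Python's * is left-associative, so each expression
    evaluates left to right):

    size = 1000000
    a = ('\0' * 8) * size   # '\0' * 8 * size: two string multiplications
    b = (8 * size) * '\0'   # 8 * size * '\0': one int multiply, then one string multiply
    assert a == b           # same 8 MB result either way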
    --
    Aahz () <*> http://www.pythoncraft.com/

    "All problems in computer science can be solved by another level of
    indirection." --Butler Lampson
     
    Aahz, Mar 12, 2009
    #6
