efficiently create and fill array.array from C code?

Thomas Jollans

Hi,

I'm writing some buffer-centric number-crunching routines in C for
Python code that uses array.array objects for storing/manipulating data.
I would like to:

1. allocate a buffer of a certain size
2. fill it
3. return it as an array.

I can't see any obvious way to do this with the array module, but I was
hoping somebody here might be able to help. My best shot would be to:

1. create a bytearray with PyByteArray_FromStringAndSize(NULL, byte_len)
2. fill its buffer
3. initialize an array from the bytearray.

The issue I have with this approach is that array will copy the data to
its own buffer. I'd much rather create an array of a certain size, get a
write buffer, and fill it directly -- is that possible?
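
For illustration, here is a rough Python-level sketch of the three steps above and of the copy they cause. The element count, the 'd' typecode and the use of struct.pack_into as a stand-in for the C fill routine are all assumptions made for this example, not part of the original code.

```python
import array
import struct

count = 4                                # hypothetical element count
itemsize = array.array('d').itemsize     # bytes per C double

# 1. allocate a zero-filled byte buffer of the right size
raw = bytearray(count * itemsize)

# 2. fill it (struct.pack_into stands in for the C routine here)
struct.pack_into('%dd' % count, raw, 0, 1.0, 2.0, 3.0, 4.0)

# 3. initialize an array from the buffer; the bytes are copied at this point
arr = array.array('d', raw)

# Overwriting the bytearray afterwards leaves the array untouched,
# which shows that the two objects do not share memory.
struct.pack_into('d', raw, 0, 99.0)
print(arr.tolist())
```

The array keeps its original first element after the bytearray is changed, so both buffers exist side by side.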

I expect that numpy allows this, but I don't really want to depend on
numpy, especially as they haven't released a py3k version yet.

-- Thomas
 
Martin

You want Numpy...

e.g.

import numpy as np
array = np.zeros(100, dtype=np.uint8)

then either something like this to fill it

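# overwrite every element in place
for i in range(len(array)):
    array[i] = 2

or

# start empty and grow; np.append returns a new array on every call,
# so this copies the data each time round the loop
array = np.zeros(0)
for i in range(100):
    array = np.append(array, 2)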


Mart
 
Hrvoje Niksic

Thomas Jollans said:
1. allocate a buffer of a certain size
2. fill it
3. return it as an array.

The fastest and most robust approach (that I'm aware of) is to use the
array.array('typecode', [0]) * size idiom to efficiently preallocate the
array, and then to get hold of the pointer into the array data
using the buffer interface.
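
A minimal Python-level sketch of this idiom, for illustration only. The size, the 'd' typecode and the fill values are assumptions; C code would obtain the same pointer through the buffer interface rather than through memoryview or ctypes.

```python
import array
import ctypes

size = 5                                 # hypothetical element count

# Preallocate by repeating a one-element array
arr = array.array('d', [0]) * size

# Option A: write through a memoryview, which exposes the array's own buffer
view = memoryview(arr)
for i in range(size):
    view[i] = i * 0.5
view.release()

# Option B: map a ctypes array onto the same memory, roughly what C code
# sees once it holds the pointer to the array data
c_view = (ctypes.c_double * size).from_buffer(arr)
c_view[0] = 42.0

print(arr.tolist())
```

Both views write straight into the array's storage, so no second full-size buffer is involved.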

Please send a message to (e-mail address removed), a SIG specializing in the
Python/C API, if you need more help with implementing this.
 
Thomas Jollans

Hrvoje Niksic said:
The fastest and most robust approach (that I'm aware of) is to use the
array.array('typecode', [0]) * size idiom to efficiently preallocate the
array, and then to get hold of the pointer into the array data
using the buffer interface.

Ah, create a single-element array, and multiply that. That's not a bad
approach, the overhead is probably equivalent to what I have now:
currently, I create an uninitialized(!) bytes of the correct size, fill
it myself, and initialize an array from that. Both approaches have the
overhead of creating one extra Python object (bytes/single-element
array) and either copying one element over and over, or memcpy'ing the
whole buffer.

Hrvoje Niksic said:
Please send a message to (e-mail address removed), a SIG specializing in the
Python/C API, if you need more help with implementing this.

I'll probably subscribe to that list, thanks for the hint.

-- Thomas
 
Hrvoje Niksic

Thomas Jollans said:
Ah, create a single-element array, and multiply that. That's not a bad
approach, the overhead is probably equivalent to what I have now:
currently, I create an uninitialized(!) bytes of the correct size, fill
it myself, and initialize an array from that. Both approaches have the
overhead of creating one extra Python object (bytes/single-element
array) and either copying one element over and over, or memcpy'ing the
whole buffer.

If I understand your approach correctly, it requires both the C buffer
and the full-size array.array to be present in memory at the same time,
so that you can memcpy the data from one to the other. Multiplying the
single-element array does needlessly copy the initial element over and
over (doing so in reasonably efficient C), but has the advantage that it
allows the larger array to be overwritten in-place.
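
To put rough numbers on that, here is a small sketch comparing the two footprints with sys.getsizeof. The element count and typecode are assumptions, and the figures depend on the CPython build, so treat the result as an estimate.

```python
import array
import sys

size = 100_000                           # hypothetical element count
itemsize = array.array('d').itemsize

# Copy approach: the source bytes object and the array built from it coexist
raw = bytes(size * itemsize)
copied = array.array('d', raw)
copy_total = sys.getsizeof(raw) + sys.getsizeof(copied)

# Multiply approach: only the full-size array is kept around
prealloc = array.array('d', [0]) * size
prealloc_total = sys.getsizeof(prealloc)

# Ratio of the two footprints
print(copy_total / prealloc_total)
```

On CPython the ratio should come out close to 2, since the copy approach holds the data twice until the source buffer is released.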

Numpy arrays allow for creation of arrays out of uninitialized memory,
which avoids the initial overhead - at the cost of depending on numpy,
of course.
 
Thomas Jollans

Hrvoje Niksic said:
If I understand your approach correctly, it requires both the C buffer
and the full-size array.array to be present in memory at the same time,
so that you can memcpy the data from one to the other. Multiplying the
single-element array does needlessly copy the initial element over and
over (doing so in reasonably efficient C), but has the advantage that it
allows the larger array to be overwritten in-place.

Ah yes, I didn't think of that. My approach uses about twice the memory
for sufficiently large buffers.

Hrvoje Niksic said:
Numpy arrays allow for creation of arrays out of uninitialized memory,
which avoids the initial overhead - at the cost of depending on numpy,
of course.

I've done some digging, and it appears there is code in Cython that
allows this with standard arrays as well [0], and there's no reason this
couldn't be adapted to work with pure-C extensions -- but this is risky
business as it appears there is no public API for the array module,
meaning the internal structure might just change from one minor version
to another without the change even being documented...

[0] <URL:http://trac.cython.org/cython_trac/ticket/314>
 
