efficiently create and fill array.array from C code?

Thomas Jollans · Jun 13, 2010

Hi,

I'm writing some buffer-centric number-crunching routines in C for
Python code that uses array.array objects for storing/manipulating data.
I would like to:

1. allocate a buffer of a certain size
2. fill it
3. return it as an array.

I can't see any obvious way to do this with the array module, but I was
hoping somebody here might be able to help. My best shot would be to:

1. create a bytearray with PyByteArray_FromStringAndSize(NULL, byte_len)
2. fill its buffer
3. initialize an array from the bytearray.

The issue I have with this approach is that array will copy the data to
its own buffer. I'd much rather create an array of a certain size, get a
write buffer, and fill it directly -- is that possible?
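
For reference, the copy-based approach described above can be sketched at the Python level. This is only an illustration, not the C code in question: the element count, the 'd' typecode, and the use of struct.pack_into to stand in for the C fill loop are all assumptions made for the example.

```python
from array import array
import struct

n = 4                                   # illustrative element count
itemsize = array('d').itemsize          # size of one native double

# Steps 1 and 2: allocate a zeroed byte buffer and fill it in place.
raw = bytearray(n * itemsize)
for i in range(n):
    struct.pack_into('d', raw, i * itemsize, float(i))

# Step 3: initialize an array from the buffer. frombytes() copies the data.
result = array('d')
result.frombytes(raw)

# Overwriting the source afterwards leaves the array untouched,
# which shows that the array owns a separate copy of the data.
struct.pack_into('d', raw, 0, 99.0)
```

While both objects are alive, the data exists twice in memory, which is the overhead the question is about.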

I expect that numpy allows this, but I don't really want to depend on
numpy, especially as they haven't released a py3k version yet.

-- Thomas

Martin · Jun 13, 2010

You want Numpy...

e.g.

import numpy as np
array = np.zeros(100, dtype=np.uint8)

then either something like this to fill it

for i in range(100):
    array[i] = 2

or

array = np.zeros(0)
for i in range(100):
    array = np.append(array, 2)

Mart

Hrvoje Niksic · Jun 14, 2010

Thomas Jollans said:
1. allocate a buffer of a certain size
2. fill it
3. return it as an array.

The fastest and more robust approach (I'm aware of) is to use the
array.array('typecode', [0]) * size idiom to efficiently preallocate the
array, and then to get hold of the pointer pointing into array data
using the buffer interface.
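
A minimal Python-level sketch of this idiom, assuming a 'd' typecode and a small illustrative size. The memoryview here stands in for what a C extension would obtain through the buffer protocol (for example with PyObject_GetBuffer and PyBuffer_Release); the C side itself is not shown.

```python
from array import array

size = 5                                 # illustrative element count
arr = array('d', [0]) * size             # preallocate by repeating one element

# View the array's own storage through the buffer interface; no copy is made.
with memoryview(arr) as view:
    for i in range(size):
        view[i] = i * 0.5                # writes land directly in arr's buffer

# The view is released on leaving the with block, so arr can be resized again.
```

The only temporary object is the single-element array, so the full-size buffer is never duplicated.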

Please send a message to (e-mail address removed), a SIG specializing in the
Python/C API, if you need more help with implementing this.

Thomas Jollans · Jun 14, 2010

Hrvoje Niksic said:
The fastest and more robust approach (I'm aware of) is to use the
array.array('typecode', [0]) * size idiom to efficiently preallocate the
array, and then to get hold of the pointer pointing into array data
using the buffer interface.

Ah, create a single-element array, and multiply that. That's not a bad
approach, the overhead is probably equivalent to what I have now:
currently, I create an uninitialized(!) bytes of the correct size, fill
it myself, and initialize an array from that. Both approaches have the
overhead of creating one extra Python object (bytes/single-element
array) and either copying one element over and over, or memcpy'ing the
whole buffer.

Hrvoje Niksic said:
Please send a message to (e-mail address removed), a SIG specializing in the
Python/C API, if you need more help with implementing this.

I'll probably subscribe to that list, thanks for the hint.

-- Thomas

Hrvoje Niksic · Jun 14, 2010

Thomas Jollans said:
Ah, create a single-element array, and multiply that. That's not a bad
approach, the overhead is probably equivalent to what I have now:
currently, I create an uninitialized(!) bytes of the correct size, fill
it myself, and initialize an array from that. Both approaches have the
overhead of creating one extra Python object (bytes/single-element
array) and either copying one element over and over, or memcpy'ing the
whole buffer.

If I understand your approach correctly, it requires both the C buffer
and the full-size array.array to be present in memory at the same time,
so that you can memcpy the data from one to the other. Multiplying the
single-element array does needlessly copy the initial element over and
over (doing so in reasonably efficient C), but has the advantage that it
allows the larger array to be overwritten in-place.

Numpy arrays allow for creation of arrays out of uninitialized memory,
which avoids the initial overhead - at the cost of depending on numpy,
of course.

Thomas Jollans · Jun 14, 2010

Hrvoje Niksic said:
If I understand your approach correctly, it requires both the C buffer
and the full-size array.array to be present in memory at the same time,
so that you can memcpy the data from one to the other. Multiplying the
single-element array does needlessly copy the initial element over and
over (doing so in reasonably efficient C), but has the advantage that it
allows the larger array to be overwritten in-place.

Ah yes, I didn't think of that. My approach uses about twice the memory
for sufficiently large buffers.

Hrvoje Niksic said:
Numpy arrays allow for creation of arrays out of uninitialized memory,
which avoids the initial overhead - at the cost of depending on numpy,
of course.

I've done some digging, and it appears there is code in Cython that
allows this with standard arrays as well [0], and there's no reason this
couldn't be adapted to work with pure-C extensions -- but this is risky
business as it appears there is no public API for the array module,
meaning the internal structure might just change from one minor version
to another without the change even being documented...

[0] <URL:http://trac.cython.org/cython_trac/ticket/314>
