exposing C array to python namespace: NumPy and array module.

Bo Peng

Dear list,

I am writing a Python extension module that needs a way to expose pieces
of a big C array to python. Currently, I am using NumPy like the following:

PyObject* res = PyArray_FromDimsAndData(1, int*dim, PyArray_DOUBLE,
char*buf);

Users will get a Numeric Array object and can change its values (and
actually change the underlying C array).

This works fine. However, when I deliver my module, I find NumPy is
unnecessarily large for this simple task. As a matter of fact, I had to
build from source NumPy, ATLAS etc on Solaris, Linux, Mac.... and if a
user would like to use my module, he has to do the same thing!

Python's array module is built-in, easy to use, but *without* a
FromLenAndData function! Even the buffer interface provides only 'get
buffer' but no 'set buffer' functions. Could anyone tell me how I can
create an array object from existing data? Some vague ideas might be
used: 1. PyCObject (I do not really understand the manual), 2. copy and
modify arraymodule.c to my project (doable at all? License issue?) 3.
Create an array object and hack it. (no api to do this.)

I would strongly suggest an arraymodule.h with Array_FromLenAndData.

Many thanks in advance.
Bo
 
Craig Ringer

Bo said:
Python's array module is built-in, easy to use, but *without* a
FromLenAndData function! Even the buffer interface provides only 'get
buffer' but no 'set buffer' functions. Could anyone tell me how I can
create an array object from existing data?

Python has no array objects in the core language, only lists. The
distinction is important when discussing numarray etc, because Python
lists and NumPy etc arrays are very different.

While you can build a Python list from a subsection of your C array,
changes made in Python won't be pushed back to the C array it was
created from. If this is OK, you can probably build the list using just
a for loop - I'm not sure if there are any more efficient methods for
variable length lists.
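
To illustrate the copy semantics described above, here is a minimal Python 3 sketch. The standard-library array object is only a stand-in for the C array (an assumption made so the example runs on its own); a real extension would fill the list from C, for instance with PyList_New and PyList_SetItem in a loop.

```python
from array import array

# Stand-in for the big C array; in a real extension this memory lives in C.
c_array = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])

# Building a list from a subsection copies the values out of the array.
piece = list(c_array[1:4])

# Changing the list only changes the copy.
piece[0] = 99.0

print(piece)      # the modified copy
print(c_array)    # the original data, untouched
```

The list is convenient but disconnected: nothing written to it reaches the original storage.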

If the Python user needs to be able to change the underlying array, I'd
probably drop the use of the built-in list class entirely and write my
own class that looks like a list (and smells like a list, and tastes
like a list - lucky we didn't step in it!). It can be pretty simple,
providing as few of the list protocol methods as:

__getitem__ (a PyList_GetItem equivalent)
__setitem__ (a PyList_SetItem equivalent)

and preferably:

__len__
__iter__

or as much of the list protocol as documented on the Python/C API page
as you need.

I'd probably implement the class in Python, and have my extension module
provide a couple of simple functions to the underlying C array. These
could be considered private to your list class. That'd make writing
things like the __iter__ method much nicer, while still letting you
implement __len__, __getitem__, __setitem__, etc in C. For example, I
might write:

class CArray(object):
    def __init__(self, ...):
        ...

    def __getitem__(self, index):
        return _carray_getitem(self, index)

    def __len__(self):
        return _carray_len(self)

    def __iter__(self):
        # build and return an iterator using Python
        ...


If you want to write part of your extension module in Python and part in
C, there are two main ways to do it. The usual way is to write a
'wrapper' in Python that imports the C parts, wraps them where necessary
or just pushes them into its own namespace, etc.

The less common way is to import __builtins__ and __main__ into your C
extension module's namespace then PyRun_String() python code in it to
set things up. I find this approach MUCH more useful when embedding
Python in an app and I only want to write small bits of my module in
Python.


The other alternative is to code your class entirely in C, implementing
all the __methods__ as C functions. Unattractive as far as I'm
concerned, but then I find constructing classes using Python's C API
irritating and less clear than it could be.



Here's the code -- hideously reformatted to avoid wrapping in the mail -
in my initmodule() function that I use to set up the module so that
Python code can execute in its namespace. You can ignore the
const_cast<> stuff, chances are your compiler will ignore the const
problems.

----
// 'd' is the dictionary of the extension module, as obtained
// with PyModule_GetDict(module)

PyObject* builtinModule = PyImport_ImportModuleEx(
    const_cast<char*>("__builtin__"),
    d, d, Py_BuildValue(const_cast<char*>("[]"))
);
if (builtinModule == NULL)
{
    // Error handling will not be shown; it'll depend on your module anyway.
}
PyDict_SetItemString(d, const_cast<char*>("__builtin__"),
                     builtinModule);

PyObject* exceptionsModule = PyImport_ImportModuleEx(
    const_cast<char*>("exceptions"), d, d,
    Py_BuildValue(const_cast<char*>("[]"))
);
if (exceptionsModule == NULL) {}
PyDict_SetItemString(d, const_cast<char*>("exceptions"),
                     exceptionsModule);

// We can now run Python code in the module's namespace. For
// example (untested), as my real examples wouldn't do you any
// good, they're too bound to the internal API of my module:

QString python_code = "";
python_code += "def sample_function():\n";
python_code += " print \"See, it worked\"\n";
// My app sets sysdefaultencoding to utf-8, hence:
// (the QCString is kept in a named variable so the char* stays valid)
QCString python_code_utf8 = python_code.utf8();
char* python_code_cstring = python_code_utf8.data();

// Note that we pass our module dictionary as both
// locals and globals. This makes the code effectively
// run "in" the extension module, as if it was being
// run during loading of a Python module after an
// 'import' statement.
PyObject* result = PyRun_String(python_code_cstring,
                                Py_file_input,
                                d, d);
if (result == NULL)
{
    qDebug("Python code to declare sample_function failed!");
    PyErr_Print(); // also clears the exception
}
// Because 'result' may be NULL, not a PyObject*, we must call
// Py_XDECREF, not Py_DECREF
Py_XDECREF(result);


--

Ugh - I'd forgotten how ugly C code limited to 80 cols and without
syntax highlighting really was. Especially when the reformatting is done
as badly as I've done it. I hope you can make some sense out of that,
anyway. Note that once the setup is done you can run as many python code
snippets as you want, for declaring variables, functions, classes, etc.
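
For readers who want to experiment with the same idea without compiling anything, the following Python 3 sketch mirrors the PyRun_String call from pure Python. It is an analogy, not the code from the post: the module name sample_extension is made up, and exec() stands in for PyRun_String with Py_file_input. Note that exec() inserts __builtins__ into the globals dictionary itself, so the explicit import step is not needed here.

```python
import types

# Hypothetical module standing in for the C extension module.
module = types.ModuleType("sample_extension")
d = module.__dict__   # plays the role of PyModule_GetDict(module)

python_code = (
    "def sample_function():\n"
    "    return 'See, it worked'\n"
)

# Passing d as both globals and locals makes the code run "in" the
# module, just as in the C version.
exec(compile(python_code, "<embedded>", "exec"), d, d)

print(module.sample_function())
```

Further snippets can be executed against the same dictionary to add more functions, classes or variables to the module.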

In my case, it's easier to execute snippets as shown above than it is to
worry about the module search path and wrapping things using a Python
module. If you're doing substantial amounts of Python coding for your
module, you'll almost certainly be better off writing a Python module
that uses your C module internally (see PIL for a good example of this).
 
Bo Peng

Craig said:
Python has no array objects in the core language, only lists. The
distinction is important when discussing numarray etc, because Python
lists and NumPy etc arrays are very different.

Thank you very much for the detailed reply!

Sorry if I was not clear enough. I was talking about the difference
between python array module
(http://docs.python.org/lib/module-array.html, Modules/arraymodule.c in
the source tree) and NumPy array. They both use C-style memory block
arrangement for efficient memory access. While NumPy has both, the array
module is designed to be used purely in Python so there is no header
file and no function to build an array from a pointer.
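
As a point of comparison, the ctypes module (part of the standard library since Python 2.5) can map an array type onto memory that already exists. The sketch below is not something the array module offers and is not the approach taken in this thread; the ctypes array named big is only a stand-in for the C array, and in a real extension the memory would come from C (ctypes also has from_address for a raw pointer value).

```python
import ctypes

# Stand-in for the big C array.
big = (ctypes.c_double * 10)(*range(10))

# Map a 4-element array onto elements 3..6 of the existing memory.
# from_buffer shares the memory instead of copying it.
offset = 3 * ctypes.sizeof(ctypes.c_double)
piece = (ctypes.c_double * 4).from_buffer(big, offset)

piece[0] = -1.0    # writes through to big[3]
big[5] = 50.0      # visible through piece[2]

print(list(piece))
```

The mapped object has a fixed length, so it avoids the resizing problem that a plain array object would have with borrowed memory.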

One of the methods you suggested (creating a new type) is already
implemented in arraymodule.c. I am not sure if it is appropriate to add
the file into my project and add a 'CreateFromLenAndBuf' function.

Bo
 
Craig Ringer

Bo said:
Sorry if I was not clear enough. I was talking about the difference
between python array module
(http://docs.python.org/lib/module-array.html, Modules/arraymodule.c in
the source tree) and NumPy array. They both use C-style memory block
arrangement for efficient memory access. While NumPy has both, the array
module is designed to be used purely in Python so there is no header
file and no function to build an array from a pointer.

Thanks for clarifying that - I had misunderstood your reference to
arraymodule.c .

I guess the core language doesn't have an array type, but as there's a
standard lib module that does (I'd forgotten it was there), it hardly
matters.

It would seem sensible to extend that module with a C API for mapping an
existing array. That would be a rather handy thing to have in the
standard library.

Bo said:
One of the methods you suggested (creating a new type) is already
implemented in arraymodule.c. I am not sure if it is appropriate to add
the file into my project and add a 'CreateFromLenAndBuf' function.

That sounds like a reasonable approach to me, but I'm hardly an expert.
The code's license permits you to do so, and it's hardly worth repeating
the work if you don't have to.
 
Raymond L. Buvel

Bo said:
Dear list,

I am writing a Python extension module that needs a way to expose pieces
of a big C array to python. Currently, I am using NumPy like the following:

PyObject* res = PyArray_FromDimsAndData(1, int*dim, PyArray_DOUBLE,
char*buf);

Users will get a Numeric Array object and can change its values (and
actually change the underlying C array).

This works fine. However, when I deliver my module, I find NumPy is
unnecessarily large for this simple task. As a matter of fact, I had to
build from source NumPy, ATLAS etc on Solaris, Linux, Mac.... and if a
user would like to use my module, he has to do the same thing!

Python's array module is built-in, easy to use, but *without* a
FromLenAndData function! Even the buffer interface provides only 'get
buffer' but no 'set buffer' functions. Could anyone tell me how I can
create an array object from existing data? Some vague ideas might be
used: 1. PyCObject (I do not really understand the manual), 2. copy and
modify arraymodule.c to my project (doable at all? License issue?) 3.
Create an array object and hack it. (no api to do this.)

I would strongly suggest an arraymodule.h with Array_FromLenAndData.

Many thanks in advance.
Bo

I don't know how much this will help, but when I am faced with a problem
like this, I use Pyrex and look at the generated code. All you need to
do in Pyrex is import the array module and create your array like you
would in Python. To get the data into the array you will need to use
the buffer interface and fill it in from your C code.
 
Scott David Daniels

Bo said:
Dear list,
I am writing a Python extension module that needs a way to expose pieces
of a big C array to python. Currently, I [use] NumPy.... Users ... actually
change the underlying C array.

Python's array module is built-in, easy to use, but *without* a
FromLenAndData function!

Python's array module is not built to do this well. It can re-size the
array, delete elements inside the array, and other things that don't
work very well with C-managed data. I wrote "blocks and views" to
overcome this problem. A "View" of data can be pointed at data, and
the "view" behaves much like a Python array (except that you cannot
affect the array's size). You can even take "slices" of the view,
which will produce a new view referring to the same base memory. There
are two kinds of views available, read-only views and writable views.

Have a look at:
http://members.dsl-only.net/~daniels/Block.html

to see if it addresses your problem. It is MIT-licensed (give credit,
but feel free to use). Let me know if it works OK, could use a tweak,
or is completely useless. I'll be more than happy to respond to
questions.
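
The behaviour of such views can be tried out with the built-in memoryview type in Python 3. To be clear, this is not the Block library linked above; it is a sketch using memoryview as a stand-in, with a standard-library array playing the part of the C-managed data. The toreadonly() method requires Python 3.8 or later.

```python
from array import array

# Stand-in for the C-managed block of data.
block = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

view = memoryview(block)   # writable view, no copy
sub = view[2:5]            # a slice is a new view on the same memory

sub[0] = 20.0              # writes through to block[2]

# A read-only view refuses writes.
readonly = view.toreadonly()
try:
    readonly[0] = 1.0
    write_refused = False
except TypeError:
    write_refused = True

# While views exist, the underlying array cannot be resized.
try:
    block.append(6.0)
    resize_refused = False
except BufferError:
    resize_refused = True

print(block[2], write_refused, resize_refused)
```

The last check shows the safety property discussed above: the size of viewed memory stays fixed for as long as a view refers to it.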

--Scott David Daniels
(e-mail address removed)
 
Bo Peng

Scott said:
Python's array module is not built to do this well. It can re-size the
array, delete elements inside the array, and other things that don't
work very well with C-managed data. I wrote "blocks and views" to
overcome this problem.

As is always the case, this problem has been encountered and solved! Your
code is exactly what I needed. However, I was too impatient to wait for
your reply. :) I have realized the problems of arraymodule.c and added
an ob_ownmem member in the object structure. Existing operations have
been modified (mostly unchanged) according to this flag and the new
module works well.
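
The idea of an ownership flag can be sketched in Python 3 as follows. This is only an illustration of the concept, not the modified arraymodule.c: the class name FlaggedArray, its methods, and the choice to raise BufferError are all assumptions, and the owns_memory attribute merely plays the role that ob_ownmem has in the C structure.

```python
from array import array


class FlaggedArray:
    """Array wrapper that refuses to resize memory it does not own."""

    def __init__(self, buffer, owns_memory):
        self._buffer = buffer            # array (owned) or memoryview (borrowed)
        self.owns_memory = owns_memory   # plays the role of ob_ownmem

    def __len__(self):
        return len(self._buffer)

    def __getitem__(self, index):
        return self._buffer[index]

    def __setitem__(self, index, value):
        # Element writes are safe either way: the size does not change.
        self._buffer[index] = value

    def append(self, value):
        # Resizing is only allowed when this object owns the memory.
        if not self.owns_memory:
            raise BufferError("cannot resize memory owned by C code")
        self._buffer.append(value)


c_side = array("d", [1.0, 2.0, 3.0])   # stand-in for the C array
borrowed = FlaggedArray(memoryview(c_side), owns_memory=False)
owned = FlaggedArray(array("d", [1.0]), owns_memory=True)

borrowed[0] = 10.0   # writes through to c_side
owned.append(2.0)    # allowed: this object owns its memory
```

Calling borrowed.append(4.0) raises BufferError, which is the kind of guard the flag makes possible.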

Thanks, everyone, for the help, especially on New Year's Day.

Bo
 
Scott David Daniels

Bo said:
I was too impatient to wait for your reply. :)

I call 21-hour turnaround over New Year's Eve pretty good. Clearly I
will never be quick enough for you ;-). Since I presented this at
the Vancouver Python Workshop last August, I'll claim a negative
five months response time (possibly a personal best).

--Scott David Daniels
(e-mail address removed)
 
