My extension code generator for C++

  • Thread starter Rouslan Korneychuk

Rouslan Korneychuk

It's still in the rough, but I wanted to give an update on my C++
extension generator. It's available at http://github.com/Rouslan/PyExpose

The documentation is a little slim right now but there is a
comprehensive set of examples in test/test_kompile.py (replace the k
with a c. For some reason, if I post this message with the correct name,
it doesn't show up). The program takes an input file like

<?xml version="1.0"?>
<module name="modulename" include="vector">
    <doc>module doc string</doc>

    <class name="DVector" type="std::vector&lt;double&gt;">
        <doc>class doc string</doc>
        <init overload=""/>
        <init overload="size_t,const double&amp;"/>
        <property name="size" get="size" set="resize"/>
        <def func="push_back"/>
        <def name="__sequence__getitem__" func="at" return-semantic="copy"/>
        <def name="__sequence__setitem__" assign-to="at"/>
    </class>
</module>

and generates the code for a Python extension.

The goal has been to generate code with zero overhead. In other words I
wanted to eliminate the tedium of creating an extension without
sacrificing anything. In addition to generating a code file, the
previous input would result in a header file with the following:

extern PyTypeObject obj_DVectorType;
inline PyTypeObject *get_obj_DVectorType() { return &obj_DVectorType; }
struct obj_DVector {
    PyObject_HEAD
    storage_mode mode;
    std::vector<double,std::allocator<double> > base;

    PY_MEM_NEW_DELETE
    obj_DVector() : base() {
        PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
        mode = CONTAINS;
    }
    obj_DVector(std::allocator<double> const & _0) : base(_0) {
        PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
        mode = CONTAINS;
    }
    obj_DVector(long unsigned int _0,double const & _1,std::allocator<double> const & _2)
        : base(_0,_1,_2) {
        PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
        mode = CONTAINS;
    }
    obj_DVector(std::vector<double,std::allocator<double> > const & _0)
        : base(_0) {
        PyObject_Init(reinterpret_cast<PyObject*>(this),get_obj_DVectorType());
        mode = CONTAINS;
    }
};

so the object can be allocated in your own code as a single block of
memory rather than having a PyObject contain a pointer to the exposed type.
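To make the single-block point concrete, here is a rough sketch that compiles without the Python headers. It is not PyExpose output: FakeHead is a hypothetical stand-in for PyObject_HEAD, and PointerWrapper, EmbeddedWrapper and lives_inside are names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for PyObject_HEAD: a reference count and a type pointer.
struct FakeHead {
    std::ptrdiff_t refcnt;
    const void *type;
};

// Pointer style: the exposed object lives in a second heap allocation.
struct PointerWrapper {
    FakeHead head;
    std::vector<double> *base;

    PointerWrapper() : head{1, nullptr}, base(new std::vector<double>()) {}
    ~PointerWrapper() { delete base; }
    PointerWrapper(const PointerWrapper &) = delete;
    PointerWrapper &operator=(const PointerWrapper &) = delete;
};

// Embedded style: the header and the exposed object share one block.
struct EmbeddedWrapper {
    FakeHead head;
    std::vector<double> base;

    EmbeddedWrapper() : head{1, nullptr}, base() {}
};

// True when address p falls inside the storage occupied by obj.
template <class T>
bool lives_inside(const T &obj, const void *p) {
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(&obj);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= begin && addr < begin + sizeof(T);
}
```

With the embedded layout, the address of base lies inside the wrapper, so creating the wrapper creates the vector object too; with the pointer layout, the vector object sits in a separate allocation that has to be created, followed and freed separately.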

storage_mode is an enumeration, adding very little to the size of the
Python object (or maybe nothing, depending on alignment), but if you add
new-initializes="true" to the <class> tag and the exposed type never
needs to be held by a pointer/reference (as is the case when the exposed
type is inside another class/struct), even that variable gets omitted.
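The size cost of the mode field can be measured with a small sketch like the one below. Everything in it is an assumption made for illustration: FakeHead stands in for PyObject_HEAD, only CONTAINS appears in the generated header above (the other enumerator names are guesses), and WithMode, WithoutMode and mode_overhead are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PyObject_HEAD.
struct FakeHead {
    std::ptrdiff_t refcnt;
    const void *type;
};

// Only CONTAINS is taken from the generated header; the rest are guesses.
enum storage_mode { UNINITIALIZED, CONTAINS, MANAGED_REF };

// Layout with the mode field, as in the generated struct.
template <class Base>
struct WithMode {
    FakeHead head;
    storage_mode mode;
    Base base;
};

// Layout when the mode field is omitted.
template <class Base>
struct WithoutMode {
    FakeHead head;
    Base base;
};

// Extra bytes the mode field costs for a given exposed type, padding included.
template <class Base>
constexpr std::size_t mode_overhead() {
    return sizeof(WithMode<Base>) - sizeof(WithoutMode<Base>);
}
```

The result depends on the platform ABI. Where the enumeration fits into padding that would exist anyway, mode_overhead comes out as zero; otherwise it is the size of the enumeration rounded up to the alignment of the exposed type.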

The code also never uses PyArg_ParseTuple or its variants. It converts
every argument using the appropriate PyX_FromY functions. I noticed
PyBindGen does the following when a conversion is needed for one argument:

py_retval = Py_BuildValue((char *) "(O)", value);
if (!PyArg_ParseTuple(py_retval, (char *) "i", &self->obj->y)) {
    Py_DECREF(py_retval);
    return -1;
}
Py_DECREF(py_retval);

On the other hand, here's the implementation for __sequence__getitem__:

PyObject * obj_DVector___sequence__getitem__(obj_DVector *self,Py_ssize_t index) {
    try {
        std::vector<double,std::allocator<double> > &base =
            cast_base_DVector(reinterpret_cast<PyObject*>(self));
        return PyFloat_FromDouble(base.at(py_ssize_t_to_ulong(index)));
    } EXCEPT_HANDLERS(0)
}

(cast_base_DVector checks that base is initialized and gets a reference
to it with regard to how it's stored in obj_DVector. If the class is
new-initialized and only needs one means of storage, its code will just
be "return obj_DVector->base;" and should be inlined by an optimizing
compiler.)
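The shape of that wrapper (convert the index, call the method, translate C++ exceptions at the boundary) can be sketched without the Python headers. This is a guess at the pattern, not the generated code: ssize_to_ulong is a hypothetical counterpart of py_ssize_t_to_ulong with std::ptrdiff_t standing in for Py_ssize_t, ErrorState stands in for the Python error indicator, and the catch blocks only approximate what EXCEPT_HANDLERS might expand to.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical counterpart of py_ssize_t_to_ulong: reject values that do not
// fit in unsigned long instead of letting them wrap around.
inline unsigned long ssize_to_ulong(std::ptrdiff_t index) {
    if (index < 0) throw std::out_of_range("index out of range");
    if (static_cast<unsigned long long>(index) >
        std::numeric_limits<unsigned long>::max())
        throw std::out_of_range("index out of range");
    return static_cast<unsigned long>(index);
}

// Stand-in for the Python error indicator.
struct ErrorState {
    bool set = false;
    std::string message;
};

// Same structure as the generated __sequence__getitem__: no C++ exception
// escapes; failure is reported through the return value and the error state.
bool get_item(const std::vector<double> &base, std::ptrdiff_t index,
              double &out, ErrorState &err) {
    try {
        out = base.at(ssize_to_ulong(index));
        return true;
    } catch (const std::out_of_range &e) {
        err.set = true;  // the real code would presumably raise IndexError
        err.message = e.what();
        return false;
    } catch (const std::exception &e) {
        err.set = true;  // catch-all for any other standard exception
        err.message = e.what();
        return false;
    }
}
```

Calling get_item with a valid index fills out and returns true; a negative or too-large index returns false with the error state set, which mirrors returning 0 to the interpreter after setting an exception.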


I'm really interested in what people think of this little project.
 

Thomas Jollans

> It's still in the rough, but I wanted to give an update on my C++
> extension generator. It's available at http://github.com/Rouslan/PyExpose

Question that pops to mind immediately: How does this differentiate
itself from SWIG? (I can't say I'm familiar with SWIG, but the question
had to be posed.)

> The documentation is a little slim right now but there is a
> comprehensive set of examples in test/test_kompile.py (replace the k
> with a c. For some reason, if I post this message with the correct name,
> it doesn't show up). The program takes an input file like
>
> <?xml version="1.0"?>
> <module name="modulename" include="vector">
>     <doc>module doc string</doc>
>
>     <class name="DVector" type="std::vector&lt;double&gt;">
>         <doc>class doc string</doc>
>         <init overload=""/>
>         <init overload="size_t,const double&amp;"/>
>         <property name="size" get="size" set="resize"/>
>         <def func="push_back"/>
>         <def name="__sequence__getitem__" func="at" return-semantic="copy"/>

func="operator[]" would also work, I assume?

>         <def name="__sequence__setitem__" assign-to="at"/>
>     </class>
> </module>
>
> and generates the code for a Python extension.

[snip]

> I'm really interested in what people think of this little project.

How does it deal with pointers? What if something returns a const
pointer - is const correctness enforced?

All in all, it looks rather neat.

Thomas
 

Rouslan Korneychuk

> Question that pops to mind immediately: How does this differentiate
> itself from SWIG? (I can't say I'm familiar with SWIG, but the question
> had to be posed.)

I have never tried SWIG, but as far as I understand, SWIG uses a layered
approach where part of the extension is defined in C/C++ and that is
wrapped in Python code. Mine implements the extension completely in C++.

> How does it deal with pointers? What if something returns a const
> pointer - is const correctness enforced?

When returning pointers or references, you either have to specify a
conversion explicitly or use the "return-semantic" attribute. The
current options are "copy", which dereferences the pointer and copies by
value, and "managed-ref", which is for exposed classes: the returned
PyObject stores the value as a reference and holds on to a
reference-counted pointer to the object that returned the value. (There
is also "self", which has nothing to do with returning pointers. With
"self", the return value of the wrapped method is ignored and a pointer
to the class is returned.)
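In plain C++ terms, the three options behave roughly like the sketch below. This is only an analogy under stated assumptions, not the generated code: std::shared_ptr stands in for a counted reference to the Python object, and Inner, Outer, ManagedRef and the wrap_* functions are hypothetical names.

```cpp
#include <cassert>
#include <memory>

struct Inner {
    int value;
};

struct Outer {
    Inner inner;
    Inner &get_inner() { return inner; }
    Outer &set(int v) { inner.value = v; return *this; }
};

// "copy": dereference the returned reference and copy the value.
Inner wrap_copy(Outer &o) {
    return o.get_inner();
}

// "managed-ref": keep a reference into the owner, plus a counted pointer
// that keeps the owner alive for as long as the reference exists.
struct ManagedRef {
    std::shared_ptr<Outer> owner;
    Inner *ref;
};

ManagedRef wrap_managed_ref(const std::shared_ptr<Outer> &o) {
    return ManagedRef{o, &o->get_inner()};
}

// "self": ignore what the wrapped method returns and hand back the object
// the method was called on.
std::shared_ptr<Outer> wrap_self(const std::shared_ptr<Outer> &o, int v) {
    o->set(v);
    return o;
}
```

A copy is unaffected by later changes to the owner, while a managed reference sees them and keeps the owner alive by raising its use count.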

I can easily add other options for "return-semantic", such as keeping a
pointer and deleting it upon destruction. I just implemented the ones I
need for the thing I'm working on.

As far as returning const pointers and const correctness, I'm not sure
exactly what you mean. If you mean is there a mechanism to hold on to
const objects and prevent them from being modified, the answer is no.
It's not something I need.

> All in all, it looks rather neat.
>
> Thomas

Thanks for the comment.
 

Rouslan Korneychuk

I missed one:

> func="operator[]" would also work, I assume?

Yes, you can also supply a function if the first parameter accepts the
type being wrapped (__rop__ methods will even accept the second
parameter taking the wrapped type).
 

Stefan Behnel

Rouslan Korneychuk, 03.07.2010 19:22:
> The code also never uses PyArg_ParseTuple or its variants. It converts
> every argument using the appropriate PyX_FromY functions. I noticed
> PyBindGen does the following when a conversion is needed for one argument:
>
> py_retval = Py_BuildValue((char *) "(O)", value);
> if (!PyArg_ParseTuple(py_retval, (char *) "i", &self->obj->y)) {
>     Py_DECREF(py_retval);
>     return -1;
> }
> Py_DECREF(py_retval);
>
> On the other hand, here's the implementation for __sequence__getitem__:
>
> PyObject * obj_DVector___sequence__getitem__(obj_DVector *self,Py_ssize_t index) {
>     try {
>         std::vector<double,std::allocator<double> > &base =
>             cast_base_DVector(reinterpret_cast<PyObject*>(self));
>         return PyFloat_FromDouble(base.at(py_ssize_t_to_ulong(index)));
>     } EXCEPT_HANDLERS(0)
> }

Check the code that Cython uses for these things. It generates specialised
type conversion code that has received a lot of careful benchmarking and
testing on different platforms.

Stefan
 
