PyCObject & malloc creating memory leak

Tom Conneely

I'm attempting to write a library for reading data via USB from a
device and processing the data to display graphs. I have already
implemented parts of this code in pure Python as a proof of concept,
but I have now moved on to implementing the functions in a C
extension.

My original plan was to have the data processing and data acquisition
functions running in separate processes, with a multiprocessing.Queue
for passing the raw data packets. The raw data is read in as a char*,
with a non-constant length, so I allocate memory using PyMem_Malloc
and return from the acquisition function a PyCObject containing a
pointer to this buffer, along with a destructor. The following code
shows a simple test function I've written (with some module/class
boilerplate removed) to demonstrate this.

static void p_destruct(void *p) {
    PyMem_Free((void*)p);
}

static PyObject *malloc_buffer(MyClass *k1) {
    PyObject *cobj;
    char *foo = PyMem_Malloc(1024 * sizeof(char));

    if (foo == NULL) {
        return NULL;
    }

    foo = "foo";
    cobj = PyCObject_FromVoidPtr(foo, p_destruct);

    return cobj;
}

static PyObject *retrieve_buffer(MyClass *k1, PyObject *args) {
    char *foo2;
    PyObject *cobj2;

    char *kwlist[] = {"foo1", NULL};

    if (!PyArg_ParseTuple(args, "O", &cobj2)) {
        return NULL;
    }

    foo2 = PyCObject_AsVoidPtr(cobj2);

    //Do something
    PySys_WriteStdout(foo2);

    Py_RETURN_NONE;
}

So if I call these functions in a loop, e.g. the following, which will
generate ~10GB of data:

x = MyClass()
for i in xrange(0, 10 * 2**20):
    c = x.malloc_buffer()
    x.retrieve_buffer(c)

All my memory disappears until Python crashes with a MemoryError. By
placing a print in the destructor function I know it's being called;
however, it's not actually freeing the memory. So in short, what am I
doing wrong?

This is the first time I've written a non-trivial Python C extension,
and I'm still getting my head around Py_INCREF/Py_DECREF and the
correct way to manage memory. I spent a while playing around with
incref/decref, but I left them out of the example above to keep what
I'm trying to achieve clearer.

Also, I'm aware PyCObject is deprecated in Python >= 2.7, but I'm
targeting Python 2.6 at the moment and will move on to capsules once
I've made the big jump with some other libraries. So if anything would
be hugely different using capsules, please point it out.
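
From what I can tell so far, the capsule API looks close to a drop-in
replacement; the main differences seem to be that a capsule carries a
name that is checked on retrieval, and that the destructor is passed
the capsule object rather than the raw pointer. A rough sketch of what
I expect the capsule version to look like (the name "mymodule.buffer"
is just a placeholder I've made up):

static void capsule_destruct(PyObject *capsule) {
    /* Unlike a PyCObject destructor, this receives the capsule itself,
     * so the pointer has to be fetched back out before freeing it. */
    void *p = PyCapsule_GetPointer(capsule, "mymodule.buffer");
    PyMem_Free(p);
}

static PyObject *malloc_buffer_capsule(MyClass *k1) {
    char *foo = PyMem_Malloc(1024 * sizeof(char));
    if (foo == NULL) {
        return PyErr_NoMemory();
    }
    /* The name passed here must match the one used in PyCapsule_GetPointer. */
    return PyCapsule_New(foo, "mymodule.buffer", capsule_destruct);
}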

I'm developing using:
Python 2.6.5
Windows XP (although Linux is a future target platform)
MSVC compiler

Cheers, any help would be greatly appreciated.
 
Antoine Pitrou

> My original plan was to have the data processing and data acquisition
> functions running in separate processes, with a multiprocessing.Queue
> for passing the raw data packets. The raw data is read in as a char*,
> with a non-constant length, so I allocate memory using PyMem_Malloc
> and return from the acquisition function a PyCObject containing a
> pointer to this buffer, along with a destructor.

That sounds overkill, and I also wonder how you plan to pass that
object in a multiprocessing Queue (which relies on objects being
pickleable). Why don't you simply create a PyString object instead?
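
Something along these lines (just a sketch, assuming the packet length
is known when it is read; acquire_packet is an illustrative name):

static PyObject *acquire_packet(const char *raw, Py_ssize_t len) {
    /* Copies the raw bytes into an ordinary Python string, which is
     * picklable and needs no custom destructor. */
    return PyString_FromStringAndSize(raw, len);
}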

> So if I call these functions in a loop, e.g. the following, which will
> generate ~10GB of data:
>
>     x = MyClass()
>     for i in xrange(0, 10 * 2**20):
>         c = x.malloc_buffer()
>         x.retrieve_buffer(c)
>
> All my memory disappears until Python crashes with a MemoryError. By
> placing a print in the destructor function I know it's being called;
> however, it's not actually freeing the memory. So in short, what am I
> doing wrong?

Python returns memory to the OS by calling free(). Not all OSes
actually relinquish memory when free() is called; some will simply set
it aside for the next allocation.
Another possible (and related) issue is memory fragmentation. Again, it
depends on the memory allocator.

Regards

Antoine.
 
Tom Conneely

Thanks for your reply; you've given me plenty to think about.

> That sounds overkill, and I also wonder how you plan to pass that
> object in a multiprocessing Queue (which relies on objects being
> pickleable). Why don't you simply create a PyString object instead?

Could you elaborate on why you feel this is overkill? Also, you're
right about passing the PyCObjects through a Queue, something I hadn't
really considered, so I've switched to using Python strings as you
suggested; an overhead I had hoped to avoid, but you can't win them
all, I suppose.

> Python returns memory to the OS by calling free(). Not all OSes
> actually relinquish memory when free() is called; some will simply set
> it aside for the next allocation.
> Another possible (and related) issue is memory fragmentation. Again, it
> depends on the memory allocator.

Yes, I know that's the case, but the "freed" memory should be reused
for the next allocation, or at least at some point before Python runs
out of memory. Anyway, this is beside the point, as I've switched to
using strings.

Again thanks for taking the time to help me out,
Tom Conneely
 
Tom Conneely

I'm posting this last message as I've found the source of my initial
memory leak problem; unfortunately, it was an embarrassingly basic
mistake. In my defence, I've got a horrible cold, but I'm just making
excuses.

I begin by mallocing the memory, which gives me a pointer "foo" to
that memory:

char *foo = PyMem_Malloc(1024 * sizeof(char));

Then I assign a value to it:

foo = "foo";

Of course, what this actually does is reassign the pointer to the
address of the string literal "foo". Hence, when I free the memory in
the PyCObject's destructor, the pointer passed in is the address of
the constant "foo", not the block I originally allocated, which is
never freed.

I only posted this to help people searching; sorry for the noise.

Tom Conneely
 
Antoine Pitrou

> Thanks for your reply; you've given me plenty to think about.
>
> Could you elaborate on why you feel this is overkill? Also, you're
> right about passing the PyCObjects through a Queue, something I hadn't
> really considered, so I've switched to using Python strings as you
> suggested; an overhead I had hoped to avoid, but you can't win them
> all, I suppose.

Well, there should be no overhead. Actually, a string should be cheaper
since:
- the string contents are allocated inline with the PyObject header
- while your PyCObject contents were allocated separately (two
allocations rather than one)
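
Concretely, the PyCObject route costs two heap blocks per packet,
whereas PyString_FromStringAndSize does it in one (raw, len and
p_destruct here stand in for the names used earlier in the thread):

static PyObject *as_cobject(const char *raw, Py_ssize_t len) {
    char *buf = PyMem_Malloc(len);                  /* allocation #1: the raw buffer */
    if (buf == NULL) {
        return PyErr_NoMemory();
    }
    memcpy(buf, raw, len);
    return PyCObject_FromVoidPtr(buf, p_destruct);  /* allocation #2: the PyCObject */
}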

Regards

Antoine.
 
