C API and memory allocation

Floris Bruynooghe

Hi

I'm slightly confused about some memory allocations in the C API.
Take the first example in the documentation:

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return Py_BuildValue("i", sts);
}

What I'm confused about is the memory usage of "command". As far as I
understand the compiler provides space for the size of the pointer, as
sizeof(command) would indicate. So I'm assuming PyArg_ParseTuple()
must allocate new memory for the returned string. However there is
nothing in the API that provides for freeing that allocated memory
again. So does this application leak memory then? Or am I
misunderstanding something fundamental?


Regards
Floris
 
Floris Bruynooghe

Hello again

> So I'm assuming PyArg_ParseTuple()
> must allocate new memory for the returned string.  However there is
> nothing in the API that provides for freeing that allocated memory
> again.

I've dug a little deeper into this and found that PyArg_ParseTuple
(and friends) end up using PyString_AS_STRING() (Python/getargs.c:793)
which according to the documentation returns a pointer to the internal
buffer of the string and not a copy and that because of this you
should not attempt to free this buffer.

But how can python now know how long to keep that buffer object in
memory for? When the reference count of the string object goes to
zero the object can be deallocated, I thought, and then your pointer
will point to something different all of a sudden. Does this mean you
always have to keep a reference to the original objects when you've
extracted information from them with PyArg_Parse*() functions? (At
least while you want to hang on to that information.)

Regards
Floris
 
Gabriel Genellina

On Wed, 17 Dec 2008 21:35:04 -0200, Floris Bruynooghe wrote:
> I've dug a little deeper into this and found that PyArg_ParseTuple
> (and friends) end up using PyString_AS_STRING() (Python/getargs.c:793)
> which according to the documentation returns a pointer to the internal
> buffer of the string and not a copy and that because of this you
> should not attempt to free this buffer.

Yes; but you don't have to dig into the implementation; from
http://docs.python.org/c-api/arg.html :

s (string or Unicode object) [const char *]
    Convert a Python string or Unicode object to a C pointer to a character
    string. You must not provide storage for the string itself; a pointer to
    an existing string is stored into the character pointer variable whose
    address you pass.

> But how can python now know how long to keep that buffer object in
> memory for?

It doesn't - *you* have to ensure that the original string object isn't
destroyed (for example, by incrementing its reference count for as long as
you keep the pointer), or copy the string contents into your own buffer.
 
Aaron Brady

Gabriel Genellina said:
> Yes; but you don't have to dig into the implementation; from
> http://docs.python.org/c-api/arg.html :
>
> s (string or Unicode object) [const char *]
>     Convert a Python string or Unicode object to a C pointer to a character
>     string. You must not provide storage for the string itself; a pointer to
>     an existing string is stored into the character pointer variable whose
>     address you pass.
>
> > But how can python now know how long to keep that buffer object in
> > memory for?
>
> It doesn't - *you* have to ensure that the original string object isn't
> destroyed (by example, incrementing its reference count as long as you
> keep the pointer), or copy the string contents into your own buffer.

I missed something. How did you get a reference to the original
string object, with which to increment its reference count? How do
you know its length to copy it into your own buffer?
 
Gabriel Genellina

> I missed something. How did you get a reference to the original
> string object, with which to increment its reference count?

From the original arguments to the function: the first argument you pass
to PyArg_ParseTuple & co.

> How do
> you know its length to copy it into your own buffer?

Use the "s#" format instead, which returns both a pointer to the string
contents and its length. Even if you're not going to copy the buffer, the
length is required whenever the string could contain a NUL byte.
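
A minimal sketch of the "copy into your own buffer" option, written as plain C so it stands alone without Python.h. The helper name copy_bytes is hypothetical; its two parameters stand in for the pointer and length that the "s#" format stores. It is only an illustration of the idea, not code from any extension module.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `len` bytes starting at `data` into a
 * freshly malloc'ed buffer that the caller owns and must free().
 * One extra NUL byte is appended for convenience.
 * Returns NULL if the allocation fails. */
static char *
copy_bytes(const char *data, size_t len)
{
    char *copy;

    assert(data != NULL);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    /* memcpy rather than strcpy, so bytes after an embedded NUL survive */
    memcpy(copy, data, len);
    copy[len] = '\0';
    return copy;
}
```

Once the copy exists, it no longer depends on the lifetime of the Python string object, but the caller becomes responsible for calling free() on it.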
 
Stefan Behnel

Floris said:
> I'm slightly confused about some memory allocations in the C API.

If you want to reduce the number of things you have to get your head
around, learn Cython instead of the raw C-API. It's basically Python, does
all the reference counting for you and also reduces the amount of memory
handling you have to care about.

http://cython.org/

Stefan
 
Ivan Illarionov

> How did you get a reference to the original
> string object, with which to increment its reference count?

Use the "O!" format instead of "s":

    PyObject *pystr;
    ... PyArg_ParseTuple(args, "O!", &PyString_Type, &pystr) ...

Then you can use PyString_AS_STRING explicitly, and control ref.
counts yourself.

> How do you know its length to copy it into your own buffer?

Use the "s#" format, as Gabriel has said.

Ivan
 
Aaron Brady

> Use the "O!" format instead of "s":
> PyObject *pystr;
> ... PyArg_ParseTuple(args, "O!", &PyString_Type, &pystr) ...
(edit: the type object to pass is &PyString_Type, not &PyStringObject)
>
> Then you can use PyString_AS_STRING explicitly, and control ref.
> counts yourself.
>
> Use the "s#" format, as Gabriel has said.
>
> Ivan

I see. Do I read correctly that 's' is only useful when the
argument's position is known? Otherwise you can't know its length or
change its reference count.
 
Stefan Behnel

Aaron said:
> I see. Do I read correctly that 's' is only useful when the
> argument's position is known?

I assume you meant "length".

> Otherwise you can't know its length or
> change its reference count.

The internal representation of Python byte strings is 0 terminated, so
strlen() will work.

Stefan
 
MRAB

Stefan said:
> I assume you meant "length".
>
> The internal representation of Python byte strings is 0 terminated, so
> strlen() will work.

But remember that a bytestring can contain a zero byte (chr(0) in Python
2.x).
 
Floris Bruynooghe

> If you want to reduce the number of things you have to get your head
> around, learn Cython instead of the raw C-API. It's basically Python, does
> all the reference counting for you and also reduces the amount of memory
> handling you have to care about.
>
> http://cython.org/

Sure, that is a good choice in some cases. Not in my case currently
though: it would mean another build dependency on all our build hosts,
and I'm just (trying to) stop an existing extension module from
leaking memory. No way I'm going to rewrite that from scratch.

But interesting discussion though, thanks!
Floris
 
Aaron Brady

> I assume you meant "length".

No, position in the argument list. Otherwise you can't change its
reference count; in which case, a pointer to the string object's
contents (a char*) is useless after control leaves the caller's scope.

> The internal representation of Python byte strings is 0 terminated, so
> strlen() will work.

As MRAB said, Python strings can contain null bytes, since they carry
their lengths. Therefore strlen will always succeed, but isn't always
right.
For a 7-byte string with a null byte at index 3, len() gives 7, but
'strlen' says '3'.

So, with 's', you are limited to the operations preceding null bytes
in the current scope (with the GIL held).
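
To make the len-versus-strlen gap concrete, here is a small plain C sketch. The sample bytes and the helper name bytes_hidden_from_strlen are made up for illustration; the helper assumes the buffer has a terminating NUL at index len, as Python 2 byte strings do internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative data: 7 payload bytes with a NUL at index 3.
 * Written as two adjacent literals so "\0" is not read together
 * with the characters that follow it. */
static const char sample[] = "abc\0" "def";
static const size_t sample_len = sizeof(sample) - 1;   /* 7 */

/* Hypothetical helper: how many of the `len` payload bytes a
 * strlen()-based consumer would never see. Requires data[len] == '\0'. */
static size_t
bytes_hidden_from_strlen(const char *data, size_t len)
{
    assert(data != NULL && data[len] == '\0');
    return len - strlen(data);
}
```

With this sample, strlen() stops at the first NUL, so the trailing NUL and "def" are invisible to any code that only has the char pointer.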

I hold this is strong enough to put the burden of proof on the
defenders of having 's'. What is its use case?
 
Stefan Behnel

Aaron said:
> As MRAB said, Python strings can contain null bytes,

Sure, they can. Most byte strings I've seen didn't, though. And if you know
that they don't contain any null bytes (UTF-8 serialised XML, for example,
or ASCII encoded text, or ...), 's' is just fine. If you need content *and*
length, use 's#'. Matter of use case, as usual.

Stefan
 
Hrvoje Niksic

Aaron Brady said:
> I hold this is strong enough to put the burden of proof on the
> defenders of having 's'. What is its use case?

Passing the string to a C API that can't handle (or doesn't care about)
embedded null chars anyway. Filename APIs are a typical example.
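
If code holds both a pointer and a length (as from "s#") and wants to hand the data to such a NUL-terminated API, one possible precaution is to check for embedded nulls first. The following is only a sketch in plain C; the function name has_embedded_nul is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if any of the first `len` bytes of
 * `data` is a NUL byte, 0 otherwise. */
static int
has_embedded_nul(const char *data, size_t len)
{
    assert(data != NULL);
    /* memchr scans exactly `len` bytes and does not stop early at a NUL */
    return memchr(data, '\0', len) != NULL;
}
```

A caller could reject the argument when this returns 1, rather than silently passing a truncated name to a filename API.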
 
