Py_INCREF() incomprehension

Ervin Hegedüs

Hello Python users,

I'm working on a Python module in C: a cryptographic module that uses a
third-party library from a provider (a bank).
This module will encrypt and decrypt the messages for the provider's web service.

Here is part of the source:

static PyObject*
mycrypt_encrypt(PyObject *self, PyObject *args)
{
    int cRes = 0;
    int OutLen = 0;

    char *url;
    char *path;
    char *outdata;    /* output buffer filled by ekiEncodeUrl() */

    if (!PyArg_ParseTuple(args, "ss", &url, &path)) {
        return NULL;
    }

    OutLen = strlen(url)*4;
    outdata = calloc(OutLen, sizeof(char));

    if (!outdata) {
        handle_err(UER_NOMEM);
        return NULL;
    }
    cRes = ekiEncodeUrl(url, strlen(url)+1, outdata, &OutLen, 1, path);

    if (cRes == 0) {
        return Py_BuildValue("s", outdata);
    } else {
        handle_err(cRes);
        return NULL;
    }

    Py_RETURN_NONE;   /* unreachable; a bare "return Py_None;" would also need Py_INCREF(Py_None) */
}

where ekiEncodeUrl comes from the third-party library.
I want to call this function from Python like this:

import mycrypt

message = "PID=IEB0001&MSGT=10&TRID=000000012345678"
crypted = mycrypt.encrypt(message, "/path/to/key")

Everything works fine, but sorry for the recurring question: where
should I use Py_INCREF()/Py_DECREF() in the code above?


Thank you,

cheers:

a.

Thomas Rachel

On 26.04.2011 11:48, Ervin Hegedüs wrote:
Everything works fine, but sorry for the recurring question: where
should I use Py_INCREF()/Py_DECREF() in the code above?

That depends on the functions being called; it should be stated in
the API description. The same goes for the incoming parameters (which
are borrowed AFAIR, but better have a look).

The most critical parts are

* the input parameters and
* Py_BuildValue().

Maybe you could also have a look at some example code.

Especially look at the concepts called "borrowed reference" vs. "owned
reference".

And, for your other question:
if (cRes == 0) {
    return Py_BuildValue("s", outdata);
}

You ask how to stop leaking memory? Well, simply by not leaking it :)

Just free the memory area. Py_BuildValue() has already copied the string into the new object:

PyObject* ret = NULL;
if (cRes == 0) {
    ret = Py_BuildValue("s", outdata);
} else {
    handle_err(cRes);
}
free(outdata);    /* release the buffer on the success and the error path */
return ret;
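Py_BuildValue() with "s" copies the bytes into a brand-new Python string, which is why outdata can be freed right after the call. The following is a minimal sketch that checks this from the interpreter. It assumes CPython and calls the real Py_BuildValue() through ctypes.pythonapi. Here create_string_buffer() stands in for the calloc()'d outdata, and overwriting the buffer stands in for free(outdata). The helper build_and_release() is hypothetical, for illustration only.

```python
import ctypes

# CPython only: ctypes.pythonapi exposes the interpreter's own C-API symbols.
api = ctypes.pythonapi
api.Py_BuildValue.restype = ctypes.py_object  # returns a new reference


def build_and_release(text: bytes) -> str:
    """Hypothetical helper mirroring the fixed C code above."""
    # Stand-in for: outdata = calloc(OutLen, 1);  (+1 leaves room for the NUL)
    outdata = ctypes.create_string_buffer(text, len(text) * 4 + 1)
    # Same call as in the C code: "s" copies the NUL-terminated bytes into a new str.
    ret = api.Py_BuildValue(b"s", outdata)
    # Stand-in for: free(outdata);  clobbering the buffer must not affect ret.
    outdata.raw = bytes(len(outdata))
    return ret


print(build_and_release(b"PID=IEB0001&MSGT=10"))
```

The string survives the "free" because Python owns its own copy. The C buffer was never Python's to collect.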

BTW: Is there any reason for using calloc()? malloc() would probably be
faster...


Thomas

Hegedüs Ervin

Hello,

thanks for the answer,
That depends on the functions being called; it should be stated
in the API description. The same goes for the incoming parameters
(which are borrowed AFAIR, but better have a look).

I've read the API doc (which you included in another mail), but
it's not clear to me. :(
The most critical parts are

* the input parameters and
* Py_BuildValue().

Maybe you could also have a look at some example code.

Especially look at the concepts called "borrowed reference" vs.
"owned reference".

And, for your other question:


You ask how to stop leaking memory? Well, simply by not leaking it :)

great! :)
Just free the memory area:

PyObject* ret = NULL;
if (cRes == 0) {
    ret = Py_BuildValue("s", outdata);
} else {
    handle_err(cRes);
}
free(outdata);    /* release the buffer on the success and the error path */
return ret;

So does that mean that when I implicitly allocate a new object (with
Py_BuildValue()), Python's GC will free that pointer when it
isn't needed anymore?
BTW: Is there any reason for using calloc()? malloc() would probably
be faster...

Maybe, I've never measured it... but calloc() gives zeroed
memory... :)


thanks:

a.

Thomas Rachel

On 26.04.2011 16:03, Hegedüs Ervin wrote:
I've read the API doc (which you included in another mail), but
it's not clear to me. :(

No problem, I'll go into detail now that I have read it again. (I didn't
want to answer from memory, as it has been some time since I worked with it,
and I didn't have time to read it before.)

The ownership rules say that the input parameters belong to the caller,
who holds them at least until we return. (We just "borrow" them.) So no
action is needed.


Py_BuildValue() "transfers ownership", as it is none of the borrowing
functions (PyTuple_GetItem(), PyList_GetItem(), PyDict_GetItem(),
PyDict_GetItemString()).

So the value it returns belongs to us, for now.

We transfer that ownership to our caller (implicitly, by returning the
object), so no action is required here either.

So does that mean that when I implicitly allocate a new object (with
Py_BuildValue()), Python's GC will free that pointer when it
isn't needed anymore?

In a way, yes. But you have to obey ownership: who owns the current
reference? If it is not ours and we need to keep it, we do Py_(X)INCREF().
If we own it but don't need it anymore, we do Py_(X)DECREF().
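The ownership rule can be watched from the interpreter. The following is a minimal sketch that assumes CPython. It calls Py_IncRef() and Py_DecRef(), the exported function forms of the Py_INCREF()/Py_DECREF() macros, through ctypes.pythonapi. sys.getrefcount() plays the role that printing Py_REFCNT(obj) would play in C. Absolute counts differ between Python versions, so the hypothetical helper refcount_delta() reports only differences from the starting count.

```python
import ctypes
import sys

# CPython only: function forms of the Py_INCREF()/Py_DECREF() macros.
api = ctypes.pythonapi
api.Py_IncRef.argtypes = [ctypes.py_object]
api.Py_IncRef.restype = None
api.Py_DecRef.argtypes = [ctypes.py_object]
api.Py_DecRef.restype = None


def refcount_delta(obj) -> list:
    """Hypothetical helper: refcount change after INCREF, then after DECREF."""
    start = sys.getrefcount(obj)
    api.Py_IncRef(obj)  # "not ours, but we need it": take our own reference
    after_incref = sys.getrefcount(obj) - start
    api.Py_DecRef(obj)  # "ours, but no longer needed": give it back
    after_decref = sys.getrefcount(obj) - start
    return [after_incref, after_decref]


print(refcount_delta([1, 2, 3]))
```

A fresh list is used on purpose. Some objects, such as None, are immortal in recent CPython versions and do not change their count.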

Maybe, I've never measured it... but calloc() gives zeroed
memory... :)

Ok. (But as sizeof(char) is, by C standard definition, always 1, you can
write it shorter.)


Thomas

Hegedüs Ervin

Dear Thomas,

thank you again,
The ownership rules say that the input parameters belong to the
caller, who holds them at least until we return. (We just "borrow"
them.) So no action is needed.

OK, it's clear, I understand.
Py_BuildValue() "transfers ownership", as it is none of the borrowing
functions (PyTuple_GetItem(), PyList_GetItem(), PyDict_GetItem(),
PyDict_GetItemString()).

So the value it returns belongs to us, for now.

We transfer that ownership to our caller (implicitly, by returning the
object), so no action is required here either.
also,


In a way, yes. But you have to obey ownership: who owns the
current reference? If it is not ours and we need to keep it, we do
Py_(X)INCREF(). If we own it but don't need it anymore, we do
Py_(X)DECREF().

right, it's clear again,
Ok. (But as sizeof(char) is, by C standard definition, always 1, you
can write it shorter.)

Oh well, thanks. I just wrote it "off the cuff" :), I only realize
it now... :)

Another question: here is another part of my code:

static PyObject*
mycrypt_decrypt(PyObject *self, PyObject *args)
{
    char *data;
    char *path;

    if (!PyArg_ParseTuple(args, "ss", &data, &path)) {
        return NULL;
    }

    /* ... */
}

When I call this function from Python with no arguments, or with more
than it expects, I get an exception, e.g.:
TypeError: function takes exactly 2 arguments (0 given)

But when I don't read the input arguments (no PyArg_ParseTuple()
call), there is no exception.

How does Python handle the number of arguments? I'm asking
because I don't set an error string with PyErr_SetString(), yet I get a
TypeError. How does Python know that this error was raised?

Hope you understand my question... :)

thanks for all:


a.

Thomas Rachel

On 26.04.2011 19:28, Hegedüs Ervin wrote:
Another question: here is another part of my code:

static PyObject*
mycrypt_decrypt(PyObject *self, PyObject *args)
{
    char *data;
    char *path;

    if (!PyArg_ParseTuple(args, "ss", &data, &path)) {
        return NULL;
    }

    /* ... */
}

When I call this function from Python with no arguments, or with more
than it expects, I get an exception, e.g.:
TypeError: function takes exactly 2 arguments (0 given)

But when I don't read the input arguments (no PyArg_ParseTuple()
call), there is no exception.

How does Python handle the number of arguments?

By what you tell it: with PyArg_ParseTuple() (see
http://docs.python.org/c-api/arg.html for this).

You give a format string (in your case: "ss", again: better use "s#s#"
if possible) which is parsed in order to get the (needed number of)
parameters.

If you call with () or only one arg, args points to a tuple with zero or
one element, but the parser wants two arguments -> bang.

If you call with more than two args, the parser notices that too:
otherwise the extra arguments would just be dropped, which is probably
not what is wanted.

If you call with two args, but of the wrong type, they don't match "s"
(=string) -> bang again.

Only when it is called with the correct number AND type of args does the
function say "ok".

Why is "s#" better than "s"? Simple: the former gives you the string length
as well. "s" means a 0-terminated string, which might not be what you
want, especially with binary data (which is what you have, I suppose).

If you pass e.g. "ab\0cd" where "s" is used, you get an exception as
well, as this string cannot be parsed completely. So better use "s#" and
get the length as well.

I'm asking because I don't set an error string with PyErr_SetString(),
yet I get a TypeError. How does Python know that this error was raised?

There is magic inside... :)
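The "magic" can be poked at directly. The following is a minimal sketch that assumes CPython on a platform where ctypes can call C variadic functions without argtypes (e.g. x86-64 or aarch64 Linux, but not Apple silicon). It calls the real PyArg_ParseTuple() with the same "ss" format through ctypes.pythonapi. When parsing fails, PyArg_ParseTuple() sets the exception itself and returns 0. The C function only has to pass that on by returning NULL, and ctypes.pythonapi likewise raises whatever error is left pending. The helper parse_ss() is hypothetical. Under Python 3 (3.5+), "s" reports an embedded NUL with ValueError rather than TypeError.

```python
import ctypes

# CPython only.  PyArg_ParseTuple is variadic, so argtypes stays unset and
# every argument is passed as a pointer.
_parse = ctypes.pythonapi.PyArg_ParseTuple
_parse.restype = ctypes.c_int


def parse_ss(args: tuple):
    """Hypothetical twin of PyArg_ParseTuple(args, "ss", &data, &path)."""
    data = ctypes.c_char_p()
    path = ctypes.c_char_p()
    # On failure PyArg_ParseTuple sets the error indicator and returns 0;
    # ctypes.pythonapi then raises that pending exception for us.
    _parse(ctypes.py_object(args), b"ss", ctypes.byref(data), ctypes.byref(path))
    # .value copies the bytes while `args` still keeps the strings alive.
    return data.value, path.value


print(parse_ss(("PID=IEB0001", "/path/to/key")))
for bad in [(), ("only one",), ("a", "b", "c"), ("a", 42), ("ab\0cd", "x")]:
    try:
        parse_ss(bad)
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, exc)
```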


Thomas

Hegedüs Ervin

Hello,
By what you tell it: with PyArg_ParseTuple() (see
http://docs.python.org/c-api/arg.html for this).

You give a format string (in your case: "ss", again: better use
"s#s#" if possible) which is parsed in order to get the (needed
number of) parameters.

If you call with () or only one arg, args points to a tuple with
zero or one element, but the parser wants two arguments -> bang.

If you call with more than two args, the parser notices that too:
otherwise the extra arguments would just be dropped, which is probably
not what is wanted.

If you call with two args, but of the wrong type, they don't match
"s" (=string) -> bang again.

Only when it is called with the correct number AND type of args does
the function say "ok".

Why is "s#" better than "s"? Simple: the former gives you the string
length as well. "s" means a 0-terminated string, which might not be
what you want, especially with binary data (which is what you have, I
suppose).

If you pass e.g. "ab\0cd" where "s" is used, you get an exception as
well, as this string cannot be parsed completely. So better use "s#"
and get the length as well.

So, if I'm right, when PyArg_ParseTuple() fails, _it_ raises the
TypeError exception... (?)

I think it's clear, thanks :)
There is magic inside... :)

wow :)

And (maybe) a final question: :)

I defined many exceptions:

static PyObject *cibcrypt_error_nokey;
static PyObject *cibcrypt_error_nofile;
static PyObject *cibcrypt_error_badpad;
...

void handle_err(int errcode) {
    switch(errcode) {
        case -1: PyErr_SetString(cibcrypt_error_nokey, "Can't find key.");
            break;
        ...
    }
...
cibcrypt_error_nokey = PyErr_NewException("cibcrypt.error_nokey", NULL, NULL);
...
PyModule_AddObject(o, "error", cibcrypt_error_nokey);

Am I right that no Py_INCREF()/Py_DECREF() action is needed here either,
based on this doc:
http://docs.python.org/c-api/arg.html

"Another useful function is PyErr_SetFromErrno(), which only
takes an exception argument and constructs the associated value
by inspection of the global variable errno. The most general
function is PyErr_SetObject(), which takes two object arguments,
the exception and its associated value. You don’t need to
Py_INCREF() the objects passed to any of these functions"


So, is this part of the code right?
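The quoted rule can be sanity-checked from the interpreter. The following is a minimal sketch that assumes CPython. It calls the real PyErr_SetString() through ctypes.pythonapi, which is the same call handle_err() makes. It then compares the exception class's reference count before and after. The helper raise_via_c_api() is hypothetical. PyErr_SetString() only borrows its exception-type argument, so the count should be unchanged once the exception has been handled. This covers handle_err() only, not the separate question of who keeps cibcrypt_error_nokey alive.

```python
import ctypes
import gc
import sys

# CPython only.  A call through ctypes.pythonapi raises whatever exception
# the C function leaves pending.
api = ctypes.pythonapi
api.PyErr_SetString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyErr_SetString.restype = None


def raise_via_c_api(exc_type, message: str) -> None:
    """Hypothetical helper: what handle_err() does with PyErr_SetString()."""
    # exc_type is only borrowed here, so no Py_INCREF() is needed.
    api.PyErr_SetString(exc_type, message.encode())


NoKeyException = type("NoKeyException", (Exception,), {})
gc.collect()
before = sys.getrefcount(NoKeyException)
try:
    raise_via_c_api(NoKeyException, "Can't find key.")
except NoKeyException as exc:
    print("caught:", exc)
gc.collect()
print(sys.getrefcount(NoKeyException) - before)
```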


thanks again:


a.

Thomas Rachel

On 26.04.2011 20:44, Hegedüs Ervin wrote:
and (maybe) final question: :)

I defined many exceptions:

static PyObject *cibcrypt_error_nokey;
static PyObject *cibcrypt_error_nofile;
static PyObject *cibcrypt_error_badpad;
...

void handle_err(int errcode) {
    switch(errcode) {
        case -1: PyErr_SetString(cibcrypt_error_nokey, "Can't find key.");
            break;
        ...
    }
...
cibcrypt_error_nokey = PyErr_NewException("cibcrypt.error_nokey", NULL, NULL);
...
PyModule_AddObject(o, "error", cibcrypt_error_nokey);

Then I would not use the name "error" here, but maybe "error_nokey" or
even better "NoKeyException".

Oops: there is an inconsistency in the docs: on the one hand, they say

There are exactly two important exceptions to this rule:
PyTuple_SetItem() and PyList_SetItem().

stating that these are the only ones that take over ownership.

But PyModule_AddObject() claims to "steal" a reference as well...

Am I right that no Py_INCREF()/Py_DECREF() action is needed here either,
based on this doc:
http://docs.python.org/c-api/arg.html

I'm not sure. On the one hand, you pass ownership of the error objects to
the module. There, one could think, they stay until the module is unloaded.

But what if someone does "del module.NoKeyException"? In this case, the
object could be destroyed while you are still using it -> BANG.

On the other hand, if you keep one reference internally, the module can
no longer be unloaded without a memory leak...


As already stated - you might want to have a look at some other C
modules and mimic their behaviour... (and hope they are doing it right...)


Thomas

Hegedüs Ervin

hello,

On 26.04.2011 20:44, Hegedüs Ervin wrote:


Then I would not use the name "error" here, but maybe "error_nokey"
or even better "NoKeyException".

Oops: there is an inconsistency in the docs: on the one hand, they say

There are exactly two important exceptions to this rule:
PyTuple_SetItem() and PyList_SetItem().

stating that these are the only ones that take over ownership.

But PyModule_AddObject() claims to "steal" a reference as well...



I'm not sure. On the one hand, you pass ownership of the error objects
to the module. There, one could think, they stay until the module
is unloaded.

But what if someone does "del module.NoKeyException"? In this case,
the object could be destroyed while you are still using it -> BANG.

On the other hand, if you keep one reference internally, the module
can no longer be unloaded without a memory leak...


As already stated - you might want to have a look at some other C
modules and mimic their behaviour... (and hope they are doing it
right...)

So, I've checked them: there wasn't any Py_INCREF() there, so I
calmed down.

But. :)

My module contains just 4 functions (in C), which wrap the third-party
lib for Python. Its name would be, for example, _mycrypt.so.

I wrapped it in a pure Python module named mycrypt.py.

Then I import the pure Python module in a main program, like
this:

=%=
mycrypt.py:

import _mycrypt
....
=%=

=%=
userapp.py:

import mycrypt
....
=%=

I left something out, and then instead of getting an exception,
I got a segfault. :(


I've put a Py_INCREF() after every PyModule_AddObject(), e.g.:

PyModule_AddObject(o, "error", cibcrypt_error_nokey);
Py_INCREF(cibcrypt_error_nokey);

and now, whenever an exception is expected, I get it.


Any explanation?


Thanks:


a.


PS: this is just a hobby of mine, but I would very, very much like to
understand it :)

Gregory Ewing

Hegedüs Ervin said:
I've put a Py_INCREF() after every PyModule_AddObject(), e.g.:

PyModule_AddObject(o, "error", cibcrypt_error_nokey);
Py_INCREF(cibcrypt_error_nokey);

That looks correct, because PyModule_AddObject is documented as
stealing a reference to the object.

By the way, it probably doesn't make a difference here, but
it's better style to do the Py_INCREF *before* calling
a function that steals a reference, to be sure that the
object can't get spuriously deallocated.
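Gregory's point can also be checked from the interpreter. The following is a minimal sketch that assumes CPython. It calls the real Py_IncRef() and PyModule_AddObject() through ctypes.pythonapi on a throwaway module made with types.ModuleType. The Python name no_key stands in for the C static pointer cibcrypt_error_nokey. The helper add_exception() and the module name cibcrypt_demo are hypothetical. The reference is taken before PyModule_AddObject() steals one (it is documented to steal only on success). As a result, deleting the module attribute later leaves our own reference intact. Python 3.10 added PyModule_AddObjectRef(), which does not steal at all.

```python
import ctypes
import sys
import types

# CPython only: call the real C-API functions through ctypes.pythonapi.
api = ctypes.pythonapi
api.Py_IncRef.argtypes = [ctypes.py_object]
api.Py_IncRef.restype = None
api.Py_DecRef.argtypes = [ctypes.py_object]
api.Py_DecRef.restype = None
api.PyModule_AddObject.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.py_object]
api.PyModule_AddObject.restype = ctypes.c_int


def add_exception(module, name: str, exc) -> None:
    """Hypothetical helper for: Py_INCREF(exc); PyModule_AddObject(m, name, exc);"""
    api.Py_IncRef(exc)  # the reference PyModule_AddObject is allowed to steal
    try:
        api.PyModule_AddObject(module, name.encode(), exc)
    except BaseException:
        api.Py_DecRef(exc)  # nothing was stolen on failure, so give it back
        raise


mod = types.ModuleType("cibcrypt_demo")
no_key = type("NoKeyException", (Exception,), {})  # our "static" reference

base = sys.getrefcount(no_key)
add_exception(mod, "NoKeyException", no_key)
print(sys.getrefcount(no_key) - base)  # the module dict's reference
del mod.NoKeyException                 # someone deletes the attribute...
print(sys.getrefcount(no_key) - base)  # ...and our own reference is untouched
```

Without the Py_IncRef() call, the module dict and the static pointer would share a single reference. After the del, the static pointer would then be left dangling.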
 
Thomas Rachel

On 01.05.2011 22:00, Hegedüs Ervin wrote:
My module contains just 4 functions (in C), which wrap the third-party
lib for Python. Its name would be, for example, _mycrypt.so.

I wrapped it in a pure Python module named mycrypt.py.

Then I import the pure Python module in a main program, like
this:

=%=
mycrypt.py:

import _mycrypt
...
=%=

=%=
userapp.py:

import mycrypt
...
=%=

AFAICS, it looks ok.

I left something out, and then instead of getting an exception,
I got a segfault. :(

I guess this is the point where you should start printf programming.

* What happens during module initialization?
* What happens in the functions?
* Where does the stuff fail?
* What are the reference counts of the involved objects?

etc.

I've put a Py_INCREF() after every PyModule_AddObject(), e.g.:

PyModule_AddObject(o, "error", cibcrypt_error_nokey);
Py_INCREF(cibcrypt_error_nokey);

and now, whenever an exception is expected, I get it.
Any explanation?

I don't have one. I would think that if the module object exists the
whole time, it would be enough to have one reference there.

But obviously it is not enough. Did you at any time del something
related to it? The module or one of its attributes?

Anyway, it seems safer to do INCREF here - so do it. (As Gregory already
stated - it looks cleaner if you do INCREF before AddObject.)

PS: this is just a hobby of mine, but I would very, very much like to
understand it :)

Understandable. That's what printf debugging of the refcounts is
good for, even if you don't actually have a problem.


Thomas

Hegedüs, Ervin

hello,

Thomas, Gregory,

thank you for your answers,
I guess this is the point where you should start printf programming.


Oh, already done :)
* What happens during module initialization?
successfully initialized,
* What happens in the functions?
* Where does the stuff fail?
* What are the reference counts of the involved objects?

Sorry for the dumb question: how can I check the number of
references in C?
I don't have one. I would think that if the module object exists
the whole time, it would be enough to have one reference there.

But obviously it is not enough. Did you at any time del something
related to it? The module or one of its attributes?

Anyway, it seems safer to do INCREF here - so do it. (As Gregory
already stated - it looks cleaner if you do INCREF before
AddObject.)
ok,


Understandable. That's what printf debugging of the refcounts
is good for, even if you don't actually have a problem.

thanks, I'll go and read the docs :)

bye:

a.