Safe to call Py_Initialize() frequently?

roschler

I've created a Python server that embeds Python 2.5 and runs Python
jobs. I want to be able to completely "flush" the interpreter between
each job. That means resetting all variables, stopping all user
created threads, and resetting the interpreter sys module path. If it
does not cause memory leaks, slowdowns, or other problems I would like
to call Py_Initialize() before running each job. I expect to run a
job about once a second. Are there any known issues with doing this
or anything else that would make this a bad approach?

If it is a safe approach, do I have to pair each Py_Initialize() call
with a Py_Finalize() call?

If it is not a safe approach, is there another way to get what I want?

Thanks.
 
Mark Hammond

> I've created a Python server that embeds Python 2.5 and runs Python
> jobs. I want to be able to completely "flush" the interpreter between
> each job. That means resetting all variables, stopping all user
> created threads, and resetting the interpreter sys module path. If it
> does not cause memory leaks, slowdowns, or other problems I would like
> to call Py_Initialize() before running each job. I expect to run a
> job about once a second. Are there any known issues with doing this
> or anything else that would make this a bad approach?

Calling Py_Initialize() multiple times has no effect. Calling
Py_Initialize and Py_Finalize multiple times does leak (Python 3 has
mechanisms so this need to always be true in the future, but it is true
now for non-trivial apps.

> If it is a safe approach, do I have to pair each Py_Initialize() call
> with a Py_Finalize() call?
>
> If it is not a safe approach, is there another way to get what I want?

Start a new process each time?

Cheers,

Mark
 
roschler

> On 21/03/2009 4:20 AM, roschler wrote:
>
> Calling Py_Initialize() multiple times has no effect.  Calling
> Py_Initialize and Py_Finalize multiple times does leak (Python 3 has
> mechanisms so this need to always be true in the future, but it is true
> now for non-trivial apps.
>
> Start a new process each time?
>
> Cheers,
>
> Mark

Hello Mark,

Thank you for your reply. I didn't know that Py_Initialize worked
like that.

How about using Py_NewInterpreter() and Py_EndInterpreter() with each
job? Any value in that approach? If not, is there at least a
reliable way to get a list of all active threads and terminate them
before starting the next job? Starting a new process each time seems
a bit heavy-handed.

Robert.
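
On the thread question above, a minimal sketch, assuming a modern Python 3 `threading` module rather than the 2.5 build discussed here. It shows that `threading.enumerate()` can list live threads that were started through `threading`, but the module offers no call to forcibly terminate them; a job's threads have to cooperate, here through a hypothetical `stop_requested` event. The names `worker`, `job_threads`, and `stop_requested` are illustrative only, and threads created directly by C extensions may not appear in the list.

```python
import threading

stop_requested = threading.Event()  # hypothetical shutdown flag for job threads


def worker():
    # A cooperative worker: it returns only once the flag has been set.
    stop_requested.wait()


def job_threads():
    """Return live threads known to the threading module, except the main one."""
    main = threading.main_thread()
    return [t for t in threading.enumerate() if t is not main]


if __name__ == "__main__":
    t = threading.Thread(target=worker, name="job-worker")
    t.start()
    print([th.name for th in job_threads()])  # the worker is still alive here

    # There is no threading call to kill a thread; ask it to stop, then wait.
    stop_requested.set()
    t.join()
    print([th.name for th in job_threads()])
```

A thread that never checks the flag cannot be stopped this way, which is one reason the replies in this thread lean towards a separate process per job.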
 
Graham Dumpleton

> Hello Mark,
>
> Thank you for your reply.  I didn't know that Py_Initialize worked
> like that.
>
> How about using Py_NewInterpreter() and Py_EndInterpreter() with each
> job?  Any value in that approach?  If not, is there at least a
> reliable way to get a list of all active threads and terminate them
> before starting the next job?  Starting a new process each time seems
> a bit heavy-handed.

Using Py_EndInterpreter() is even more fraught with danger. The first
problem is that some third party C extension modules will not work in
sub interpreters because they use the simplified GIL state API (the
PyGILState_* functions). The second problem is that third party C
extensions often don't cope well with the interpreter they were
initialised in being destroyed, and the module then being used again
in a new sub interpreter instance.

Given that it is one operation per second, creating a new process, be
it a completely fresh one or one forked from an existing Python
process, would be simpler.

Graham
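
As an illustration of the process-per-job suggestion above (not code from the thread), here is a minimal sketch that assumes the server side can itself be driven from Python 3.7 or later and that each job arrives as a string of source code. `run_job` is a hypothetical helper name. Each call starts a fresh interpreter, so variables, `sys.path` changes, and threads from one job cannot carry over into the next.

```python
import subprocess
import sys


def run_job(source, timeout=30.0):
    """Run one job's source in a brand-new interpreter process.

    run_job is a hypothetical helper, not an API mentioned in the thread.
    """
    completed = subprocess.run(
        [sys.executable, "-c", source],  # fresh interpreter for every job
        capture_output=True,             # collect stdout and stderr
        text=True,                       # decode output as str
        timeout=timeout,                 # the child is killed if it overruns
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # The first job changes sys.path and defines a global...
    run_job("import sys; sys.path.append('/job-one'); leaked = 1")
    # ...and the second job reports whether it can see either change.
    code, out, err = run_job(
        "import sys; print('/job-one' in sys.path, 'leaked' in globals())"
    )
    print(out.strip())
```

When the child exits, or is killed on timeout, the operating system reclaims its memory and ends any threads it started, so nothing depends on Py_Finalize() cleaning up. At roughly one job per second the process start-up cost is likely to be acceptable, though that is worth measuring on the target system.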
 
Graham Dumpleton

> Calling Py_Initialize and Py_Finalize multiple times does leak (Python 3
> has mechanisms so this need to always be true in the future, but it is
> true now for non-trivial apps.

Mark, can you please clarify this statement you are making. The
grammar used makes it a bit unclear.

Are you saying, that effectively by design, Python 3.0 will always
leak memory upon Py_Finalize() being called, or that it shouldn't leak
memory and that problems with older versions of Python have been fixed
up?

I know that some older versions of Python leaked memory on
Py_Finalize(), but if this is now guaranteed to always be the case and nothing
can be done about it, then the final death knell will have been rung
on mod_python and also embedded mode of mod_wsgi. This is because both
those systems rely on being able to call Py_Initialize()/Py_Finalize()
multiple times. At best they would have to change how they handle
initialisation of Python and defer it until sub processes have been
forked, but this will have some impact on performance and memory
usage.

So, more information appreciated.

Related link on mod_wsgi list about this at:

http://groups.google.com/group/modwsgi/browse_frm/thread/65305cfc798c088c?hl=en

Graham
 
Mark Hammond

> Mark, can you please clarify this statement you are making. The
> grammar used makes it a bit unclear.

Yes, sorry - s/this need to/this need not/

> Are you saying, that effectively by design, Python 3.0 will always
> leak memory upon Py_Finalize() being called, or that it shouldn't leak
> memory and that problems with older versions of Python have been fixed
> up?

The latter - kind of - py3k provides an enhanced API that *allows*
extensions to be 'safe' in this regard, but it doesn't enforce it.
Modules 'trivially' ported from py2k will not magically get this ability
- they must explicitly take advantage of it. pywin32 is yet to do so
(ie, it is a 'trivial' port...)

I hope this clarifies...

Mark
 
Graham Dumpleton

> Yes, sorry - s/this need to/this need not/
>
> The latter - kind of - py3k provides an enhanced API that *allows*
> extensions to be 'safe' in this regard, but it doesn't enforce it.
> Modules 'trivially' ported from py2k will not magically get this ability
> - they must explicitly take advantage of it.  pywin32 is yet to do so
> (ie, it is a 'trivial' port...)
>
> I hope this clarifies...

Yes, but ...

There may still be problems. The issue is old, but I suspect that the
comments in the issue:

http://bugs.python.org/issue1856

may still hold true.

That is, there are some things that Python doesn't free up which are
related to Python's simplified GIL state API. Normally this wouldn't
matter, as another call to Py_Initialize() would see the existing data
and reuse it. So it doesn't strictly leak memory in that sense.

In mod_wsgi however, Apache will completely unload the mod_wsgi module
on a restart. This means that the Python library is also unloaded from
memory. When both are reloaded, the global static variables that
pointed at the data left behind have been lost and nulled out. Thus
Python, when initialised again, will recreate the data it needs.

So, for the case where the Python library is unloaded, it looks like it
may well suffer a memory leak regardless.

As to third party C extension modules, they aren't really an issue,
because all that is done in the Apache parent process is
Py_Initialize() and Py_Finalize() and nothing else really. It is just
done to get the interpreter set up before forking child processes.

There is more detail on this analysis in that thread on mod_wsgi list
at:

Graham
 
Aahz

[p&e]

> There is more detail on this analysis in that thread on mod_wsgi list
> at:

Missing reference?
--
Aahz <*> http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." --Brian W. Kernighan
 
Graham Dumpleton

> > There is more detail on this analysis in that thread on mod_wsgi list
> > at:
>
> Missing reference?

It was in an earlier post. Yes, I knew I had forgotten to add it again,
but figured people would read the whole thread.

http://groups.google.com/group/modwsgi/browse_frm/thread/65305cfc798c088c

Graham
 
