Garbage collection of recursive inner function

  • Thread starter from.future.import
  • Start date
F

from.future.import

Hi,

I encountered garbage collection behaviour that I didn't expect when
using a recursive function inside another function: the definition of
the inner function seems to contain a circular reference, which means
it is only collected by the mark-and-sweep collector, not by reference
counting. Here is some code that demonstrates it:

===
def outer():

def inner(n):
if n == 0:
return 1
else:
return n * inner(n - 1)

return 42

import gc
gc.set_debug(gc.DEBUG_SAVEALL)
print outer()
gc.collect()
print gc.garbage
===

Output when executed:
$ python internal_func_gc.py
42
[<cell at 0xb7dec3ec: function object at 0xb7dbd3ac>, (<cell at
0xb7dec3ec: function object at 0xb7dbd3ac>,), <function inner at
0xb7dbd3ac>]

Note that the inner function is not called at all, it is only defined.
If the inner function is moved outside the scope of the outer
function, gc.garbage will be empty. If the inner function is inside
but not recursive, gc.garbage will also be empty. If the outer
function is called twice, there will be twice as many objects in
gc.garbage.

Is this expected behaviour? Collecting an object when its refcount
reaches zero is preferable to collecting it with mark-and-sweep, but
maybe there is a reason that a circular reference must exist in this
situation. I want to check that first so I don't report a bug for
something that is not a bug.

Bye,
Maarten
 
D

Diez B. Roggisch

Hi,

I encountered garbage collection behaviour that I didn't expect when
using a recursive function inside another function: the definition of
the inner function seems to contain a circular reference, which means
it is only collected by the mark-and-sweep collector, not by reference
counting. Here is some code that demonstrates it:

===
def outer():

def inner(n):
if n == 0:
return 1
else:
return n * inner(n - 1)

return 42

import gc
gc.set_debug(gc.DEBUG_SAVEALL)
print outer()
gc.collect()
print gc.garbage
===

Output when executed:
$ python internal_func_gc.py
42
[<cell at 0xb7dec3ec: function object at 0xb7dbd3ac>, (<cell at
0xb7dec3ec: function object at 0xb7dbd3ac>,), <function inner at
0xb7dbd3ac>]

Note that the inner function is not called at all, it is only defined.
If the inner function is moved outside the scope of the outer
function, gc.garbage will be empty. If the inner function is inside
but not recursive, gc.garbage will also be empty. If the outer
function is called twice, there will be twice as many objects in
gc.garbage.

Is this expected behaviour? Collecting an object when its refcount
reaches zero is preferable to collecting it with mark-and-sweep, but
maybe there is a reason that a circular reference must exist in this
situation. I want to check that first so I don't report a bug for
something that is not a bug.

The reference comes from the closure of inner. And inner is part of the
closure, so there is a circular reference.

I don't see a way to overcome this - consider the following code:

def outer():

def inner():
inner()

if random.random() > .5:
return inner


How is the GC/refcounting to know if it can create a reference or not?



Diez
 
T

Terry Reedy

I encountered garbage collection behaviour that I didn't expect when
using a recursive function inside another function:

To understand this, it helps to realize that Python functions are not,
in themselves, recursive. Recursiveness at any time is a property of a
function in an environment, which latter can change. More specifically,
a function call is recursive if the expression indicating the function
to call happens to indicate the function containing the call at the time
of evaluation just before the evaluation of the argument expressions.
See examples below.
> the definition of
the inner function seems to contain a circular reference, which means
it is only collected by the mark-and-sweep collector, not by reference
counting. Here is some code that demonstrates it:

The inner function is part of a circular reference that is originally
part of the outer function, but which may survive the call to outer
def outer():
def inner(n):
if n == 0:
return 1
else:
return n * inner(n - 1)

inner1 = inner
def inner(n): return 1
# original inner still exists but is no longer 'recursive'

def out2():
def inner1(n): return 1
def inner(n):
if n: return n*inner1(n-1)
else: return 1
# inner is obviously not recursive
inner1 = inner
# but now it is
If the inner function is moved outside the scope of the outer
function, gc.garbage will be empty.

With 'inner' in the global namespace, no (circular) closure is needed to
keep it alive past the outer lifetime.
> If the inner function is inside
but not recursive, gc.garbage will also be empty.

Not necessarily so. What matters is that inner has a non-local
reference to outer's local name 'inner'. Try
def inner(): return inner
which contains no calls, recursive or otherwise.
> If the outer function is called twice,
> there will be twice as many objects in gc.garbage.

And so on, until gc happens.
Is this expected behaviour? Collecting an object when its refcount
reaches zero is preferable to collecting it with mark-and-sweep, but

Adding 'inner = None' at the end of an outer function will break the
cycle and with CPython, all will be collected when outer exits.
Jython and IronPython do not, I believe, do reference counting.

Adding 'del inner' gives
SyntaxError: cannot delete variable 'inner' referenced in inner scope.
maybe there is a reason that a circular reference must exist in this
situation. I want to check that first so I don't report a bug for
something that is not a bug.

Not a bug, but an educational example and possibly useful to someone
running on CPython with gc turned off and making lots of calls to
functions with inner functions with recursive references. I learned a
bit answering this.

Terry Jan Reedy
 
F

from.future.import

To understand this, it helps to realize that Python functions are not,
in themselves, recursive. Recursiveness at any time is a property of a
function in an environment, which latter can change. More specifically,
a function call is recursive if the expression indicating the function
to call happens to indicate the function containing the call at the time
of evaluation just before the evaluation of the argument expressions.

I didn't realize that the function looks up itself in the outer
environment when it makes the recursive call, instead of at definition
time.
Adding 'inner = None' at the end of an outer function will break the
cycle and with CPython, all will be collected when outer exits.

I think I'll use that for inner functions that do need to access the
outer environment, but do not need to live longer than the call to the
outer function.
Not a bug, but an educational example and possibly useful to someone
running on CPython with gc turned off and making lots of calls to
functions with inner functions with recursive references. I learned a
bit answering this.

That describes our application: in some cases, we have several
gigabytes of small objects, in which case mark-and-sweep garbage
collection takes quite a long time, especially if some of the objects
have been pushed into the swap. I have broken all cycles in our own
data structures a while ago, but got an unexpected memory leak because
of these cyclic references from inner functions.

Thanks for your clear explanation!

Bye,
Maarten
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,076
Latest member
OrderKetoBeez

Latest Threads

Top