Some thoughts on garbage collection

Frank Millman

Hi all

I don't know whether this will be of interest, but I have just carried
out an exercise that I found useful, so I thought I would share it.

I am writing a multi-user business app, with a multi-threaded server
program to handle client connections. The server should 'serve
forever'. Each client connection is handled within its own thread, and
can run a variety of processes. Each process is a class, and is run by
instantiating it. As each client logs off, or the connection is broken,
the thread is terminated. I wanted to confirm positively that all
objects created by the thread are garbage collected. I was concerned
that if I left a superfluous reference dangling somewhere I might end
up with the equivalent of a memory leak.

I know from previous experience that you cannot confirm deletion of an
object simply by printing a message from its __del__() method: if the
object is involved in any cyclic references, defining __del__ prevents
it from being garbage collected at all. Therefore I use the
'delwatcher' class that Tim Evans explained in a post on this subject a
couple of years ago -

class DelWatcher:
    def __init__(self, obj):
        self.objrepr = repr(obj)
    def __del__(self):
        print '%s deleted' % self.objrepr

class a:
    def __init__(self):
        self._delwatcher = DelWatcher(self)

You can now see that an object of class a is deleted, even if it has
cyclic references.
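[For readers on current interpreters, here is a Python 3 sketch of the same trick; the class names and the `deleted` list are illustrative additions, not part of Tim Evans' original:]

```python
import gc

deleted = []  # records the repr of each watched object as it is freed

class DelWatcher:
    def __init__(self, obj):
        self.objrepr = repr(obj)

    def __del__(self):
        deleted.append(self.objrepr)
        print('%s deleted' % self.objrepr)

class A:
    def __init__(self):
        self._delwatcher = DelWatcher(self)
        self.cycle = self  # deliberate reference cycle

a = A()
del a          # the cycle keeps the A instance alive past this point
gc.collect()   # the cycle collector frees it, so the watcher fires
```

The watcher itself takes part in no cycle, so its __del__ runs reliably as soon as the watched object is actually freed.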

I used this technique, and I could see some objects being deleted at
the correct point. Others, however, did not get deleted until the
server program terminated. After some investigation, I found that I was
not generating enough 'dirty' objects to trigger the garbage collector
into running. I fixed that by putting the following lines at the top of
the server program -

import gc
gc.set_threshold(10) # on my system, the default is 700
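[A Python 3 sketch of the same idea; an alternative to lowering the threshold globally is to force a full collection at a known point with gc.collect():]

```python
import gc

print(gc.get_threshold())   # typically (700, 10, 10) in CPython

gc.set_threshold(10)        # run the collector after far fewer allocations
# ... exercise the code under test here ...

# Or simply force a full collection at a known point:
unreachable = gc.collect()
print('collected %d unreachable objects' % unreachable)

gc.set_threshold(700, 10, 10)  # put the usual default back afterwards
```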

Now I can see all objects being deleted at or close to the expected
point, and therefore I am confident that I do not have any leaks.
Obviously delwatcher and set_threshold are temporary measures to prove
a point - I have now removed them.

Is this a sensible approach, or are there easier ways to achieve this?

Frank Millman
 

Paul Rubin

Frank Millman said:
Is this a sensible approach, or are there easier ways to achieve this?

In general you're supposed to just let gc do its thing. Doing your
own storage management defeats the purpose of gc. At most I'd say
check for leaks by running some native extension to scan all the
in-memory objects to see if anything didn't get gc'd.
 

Frank Millman

Paul said:
In general you're supposed to just let gc do its thing. Doing your
own storage management defeats the purpose of gc.

In principle I agree. My concern was that I might have inadvertently
done something wrong (e.g. left a reference dangling) which would
prevent gc from removing all objects which I wanted to be removed.
At most I'd say
check for leaks by running some native extension to scan all the
in-memory objects to see if anything didn't get gc'd.

If I knew what you meant, I would agree with you :)

As all I really know is Python, the method I used was the best way I
could think of to accomplish what I wanted.

Frank
 

Martin v. Löwis

Frank said:
In principle I agree. My concern was that I might have inadvertently
done something wrong (e.g. left a reference dangling) which would
prevent gc from removing all objects which I wanted to be removed.

Depends on what it really is that you want to know. If you want
to know whether gc can release all garbage objects, you should
look at gc.garbage, after a gc.collect call immediately before
the end of the program.

The issue here is that objects implementing __del__ in a cycle will
never get collected (but added to gc.garbage); this is something
you need to be aware of.
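[This was the behaviour of Python 2; since Python 3.4 (PEP 442) cycles whose objects define __del__ are collected anyway, so gc.garbage normally stays empty there. A sketch of the gc.garbage check on a modern interpreter:]

```python
import gc

class Node:
    def __del__(self):
        pass

a, b = Node(), Node()
a.other, b.other = b, a   # a reference cycle of objects with __del__
del a, b

gc.collect()
# On Python 2 the two Node objects would appear in gc.garbage;
# since Python 3.4 (PEP 442) the cycle is simply collected.
print(gc.garbage)
```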

If you want to find out if there are objects which you hold onto
too long, you can look at len(gc.get_objects()) from time to
time. This won't be all objects, but just the container objects.
If you see the number growing over time, you have a leak.
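[A Python 3 sketch of that growth check; the `leaked` list stands in for a reference you forgot to drop:]

```python
import gc

def container_count():
    """Number of container objects the collector currently tracks."""
    gc.collect()              # drop collectable garbage first
    return len(gc.get_objects())

baseline = container_count()
leaked = []                   # stands in for a forgotten reference
for i in range(1000):
    leaked.append([i])        # each list is a tracked container object
grown = container_count()
print('grew by', grown - baseline)
```

Sampling this count at the same point in each request cycle makes a steady upward trend easy to spot.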

You could then also categorize this by type, e.g.

import gc, operator

frequency = {}
for o in gc.get_objects():
    o = o.__class__.__name__
    frequency[o] = frequency.get(o, 0) + 1
print sorted(frequency.iteritems(), key=operator.itemgetter(1),
             reverse=1)
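[On Python 3, where dict.iteritems is gone and print is a function, the same per-type census can be written with collections.Counter; a sketch:]

```python
import gc
from collections import Counter

# Tally the tracked container objects by type name.
frequency = Counter(type(o).__name__ for o in gc.get_objects())
print(frequency.most_common(10))   # the ten most numerous container types
```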

If you are interested in total object counts, you need to run
the debug version of Python.

HTH,
Martin
 

Frank Millman

Martin said:
Depends on what it really is that you want to know.

If you want to find out if there are objects which you hold onto
too long, you can look at len(gc.get_objects()) from time to
time. This won't be all objects, but just the container objects.
If you see the number growing over time, you have a leak.

Thank you - this is what I wanted to know.
You could then also categorize this by type, e.g.

frequency = {}
for o in gc.get_objects():
    o = o.__class__.__name__
    frequency[o] = frequency.get(o, 0) + 1
print sorted(frequency.iteritems(), key=operator.itemgetter(1),
             reverse=1)

Very useful. Thanks.

Frank
 
