Hunting a memory leak

Debian User

Hi,

I'm trying to track down a memory leak in a program of mine. I've tried
several approaches, but the leak still refuses to show itself.

First of all, I tried using the garbage collector to look for
uncollectable objects. I used the following:

# at the beginning of code
import gc
gc.enable()
gc.set_debug(gc.DEBUG_LEAK)

<snip>

# at the end of code:
print "\nGARBAGE:"
gc.collect()

print "\nGARBAGE OBJECTS:"
for x in gc.garbage:
    s = str(x)
    print type(x), "\n ", s

With that, I get no garbage objects.
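
A note on what an empty gc.garbage does and does not show: the collector only reports objects that have become unreachable through reference cycles. Objects that are still reachable from somewhere (a cache, a registry, a C structure) never show up there. The sketch below illustrates the difference in present-day Python 3 (the thread itself uses Python 2.3); the Node class and the keep_alive list are made up for the illustration.

```python
import gc

class Node:
    """Hypothetical class used only for this illustration."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a        # a <-> b cycle, unreachable after return

keep_alive = []                    # stands in for a cache that never shrinks

def make_lingering():
    keep_alive.append(Node())      # still reachable, so never reported

def count_nodes_in_garbage():
    return sum(1 for o in gc.garbage if isinstance(o, Node))

gc.collect()                       # start from a clean state
gc.set_debug(gc.DEBUG_SAVEALL)     # keep whatever the collector finds
make_cycle()
make_lingering()
gc.collect()
cyclic_nodes = count_nodes_in_garbage()
gc.set_debug(0)
gc.garbage.clear()

print("Node objects found in gc.garbage:", cyclic_nodes)
print("Node objects still reachable:", len(keep_alive))
```

Only the two objects in the cycle are reported; the one held by keep_alive is invisible to this technique, and so is memory allocated inside a C library.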

Then I took an approach that I saw on the python-dev list,
contributed by Walter Dörwald. It basically consists of building a
debug version of Python, writing a unittest around the leaking code, and
modifying unittest.py to report the increase in the total reference
count across that code (see
http://aspn.activestate.com/ASPN/Mail/Message/python-dev/1770868).

With that, I see that my reference count grows by one each time the
test executes. But the problem is: is there some way to look at the
object that is leaking (or make a memory dump of it)?

I've used valgrind (http://developer.kde.org/~sewardj/) to see if it
could detect the leak. In fact, it detects a bunch of them, but I am
afraid that they are not related to the leak I'm looking for. I
say that because, when I loop over my leaky code, valgrind always
reports the same amount of leaked memory, regardless of the number of
iterations (while top is telling me that memory use is growing!).
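
For the question of what was allocated between two points of a run, present-day Python (3.4 and later, so not the Python 2.3 used in this thread) ships tracemalloc, which can diff two snapshots. It only sees memory requested through Python's own allocators, so a C library that manages memory itself would stay invisible to it. A minimal sketch, with a made-up leaky_step function standing in for the real code:

```python
import tracemalloc

retained = []                      # hypothetical leak: references that pile up

def leaky_step():
    retained.append(bytearray(100_000))   # roughly 100 kB, never released

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(10):
    leaky_step()
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences grouped by source line, largest changes first.
stats = after.compare_to(before, "lineno")
grown = sum(s.size_diff for s in stats)

print("net growth in bytes:", grown)
for s in stats[:3]:
    print(s)
```

If the growth scales with the number of iterations, the top entries point at the allocating source lines.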

My code uses extension modules in C, which I'm afraid does not
make the problem any easier. I think all the mallocs are
correctly freed, but I can't be sure (however, valgrind does not
detect anything wrong in the extension).

I am sorry, but I cannot be more explicit about the code because it
is quite complex (it is the PyTables package, http://pytables.sf.net),
and I was unable to reduce it to a simple example that could be posted
here. However, if anyone is tempted to have a look at the code, you
can download it from
(http://sourceforge.net/project/showfiles.php?group_id=63486). I am
attaching a unittest that exposes the leak.

I am a bit desperate. Any hint?

Francesc Alted

--

# Unittest to expose the memory leak
import sys
import unittest
import os
import tempfile

from tables import *
# Next imports are only necessary for this test suite
#from tables import Group, Leaf, Table, Array

verbose = 0

class WideTreeTestCase(unittest.TestCase):

    def test00_Leafs(self):

        import time
        maxchilds = 2
        if verbose:
            print '\n', '-=' * 30
            print "Running %s.test00_wideTree..." % \
                  self.__class__.__name__
            print "Maximum number of childs tested :", maxchilds
        # Open a new empty HDF5 file
        file = tempfile.mktemp(".h5")
        #file = "test_widetree.h5"

        fileh = openFile(file, mode = "w")
        if verbose:
            print "Children writing progress: ",
        for child in range(maxchilds):
            if verbose:
                print "%3d," % (child),
            a = [1, 1]
            fileh.createGroup(fileh.root, 'group' + str(child),
                              "child: %d" % child)
            # Comment the createArray call to see the leak disappear
            fileh.createArray("/group" + str(child), 'array' + str(child),
                              a, "child: %d" % child)
        if verbose:
            print
        # Close the file
        fileh.close()


#----------------------------------------------------------------------

def suite():
    theSuite = unittest.TestSuite()
    theSuite.addTest(unittest.makeSuite(WideTreeTestCase))

    return theSuite


if __name__ == '__main__':
    unittest.main(defaultTest='suite')
 
Michael Hudson

Debian User said:
> I'm trying to track down a memory leak in a program of mine. I've tried
> several approaches, but the leak still refuses to show itself.
>
> First of all, I tried using the garbage collector to look for
> uncollectable objects.

[snip]

> Then I took an approach that I saw on the python-dev list,
> contributed by Walter Dörwald. It basically consists of building a
> debug version of Python, writing a unittest around the leaking code, and
> modifying unittest.py to report the increase in the total reference
> count across that code (see
> http://aspn.activestate.com/ASPN/Mail/Message/python-dev/1770868).

Well, somewhere in that same thread are various references to a
TrackRefs class. Have you tried using that? It should tell you what
type of object is leaking, which is a good start.

> With that, I see that my reference count grows by one each time the
> test executes. But the problem is: is there some way to look at the
> object that is leaking (or make a memory dump of it)?

See above :)

> I've used valgrind (http://developer.kde.org/~sewardj/) to see if it
> could detect the leak. In fact, it detects a bunch of them, but I am
> afraid that they are not related to the leak I'm looking for. I
> say that because, when I loop over my leaky code, valgrind always
> reports the same amount of leaked memory, regardless of the number of
> iterations (while top is telling me that memory use is growing!).

There are various things (interned strings, for example) that always tend
to be alive at the end of a Python program: these are only leaks in a
very warped sense.

I don't know if there's a way to get valgrind to tell you what's
allocated but not deallocated between two arbitrary points of program
execution.

> My code uses extension modules in C, which I'm afraid does not
> make the problem any easier.

Well, in all likelihood, the bug is IN the C extension module. Have
you tried stepping through the code in a debugger? Sometimes that's
a good way of spotting a logic error.

> I am sorry, but I cannot be more explicit about the code because it
> is quite complex (it is the PyTables package, http://pytables.sf.net),
> and I was unable to reduce it to a simple example that could be posted
> here. However, if anyone is tempted to have a look at the code, you
> can download it from
> (http://sourceforge.net/project/showfiles.php?group_id=63486). I am
> attaching a unittest that exposes the leak.
>
> I am a bit desperate. Any hint?

Not really. Try using TrackRefs.
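
The idea behind a TrackRefs-style helper can be sketched as follows. This is not the Zope class, only a simplified stand-in written for present-day Python 3: it counts gc-tracked objects per type and reports what changed between two calls. TypeCountTracker and Leaked are hypothetical names.

```python
import gc
from collections import Counter

class TypeCountTracker:
    """Simplified stand-in for TrackRefs: per-type counts of gc-tracked objects."""

    def __init__(self):
        self.previous = self._snapshot()

    @staticmethod
    def _snapshot():
        gc.collect()
        return Counter(type(o).__name__ for o in gc.get_objects())

    def update(self):
        """Return {type name: change in count} since the previous call."""
        current = self._snapshot()
        names = set(current) | set(self.previous)
        delta = {}
        for name in names:
            change = current.get(name, 0) - self.previous.get(name, 0)
            if change:
                delta[name] = change
        self.previous = current
        return delta

class Leaked:
    """Hypothetical type that the example leaks on purpose."""

registry = []                      # keeps the instances alive
tracker = TypeCountTracker()
for _ in range(3):
    registry.append(Leaked())

print(tracker.update().get("Leaked"))
```

Only objects the garbage collector tracks are counted, so plain ints and strings, as well as memory owned by a C library, do not appear.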

Cheers,
mwh
 
Edward K. Ream

> I'm trying to track down a memory leak in a program of mine. I've tried
> several approaches, but the leak still refuses to show itself.

First, single-stepping through C code is surprisingly effective. I heartily
recommend it.

Here are some ideas you might use if you are truly desperate. You will have
to do some work to make them useful in your situation.

1. Keep track of all newly-created objects. Warning: the id trick used in
this code is not proper because newly allocated objects can have the same
address as old objects, so you should devise a better way by creating a more
unique hash. Or just use the code as is and see whether the "invalid" code
tells you something ;-)

# (inside a function; needs "import gc" and an initial lastObjectsDict = {})
global lastObjectsDict
objects = gc.get_objects()

newObjects = [o for o in objects if not lastObjectsDict.has_key(id(o))]

lastObjectsDict = {}
for o in objects:
    lastObjectsDict[id(o)]=o

2. Keep track of the number of objects.

# needs "import gc, traceback"; debugGC and lastObjectCount are module globals
def printGc(message=None,onlyPrintChanges=False):

    if not debugGC: return None

    if not message:
        message = callerName(n=2) # Left as an exercise for the reader.

    global lastObjectCount

    try:
        n = len(gc.garbage)
        n2 = len(gc.get_objects())
        delta = n2-lastObjectCount
        if not onlyPrintChanges or delta:
            if n:
                print "garbage: %d, objects: %+6d =%7d %s" % (
                    n,delta,n2,message)
            else:
                print "objects: %+6d =%7d %s" % (
                    delta,n2,message)

        lastObjectCount = n2
        return delta
    except Exception:
        traceback.print_exc()
        return None

3. Print lots and lots of info...

def printGcRefs (verbose=True):

    refs = gc.get_referrers(app().windowList[0])
    print '-' * 30

    if verbose:
        print "refs of", app().windowList[0]
        for ref in refs:
            print type(ref)
            if 0: # very verbose
                if type(ref) == type({}):
                    keys = ref.keys()
                    keys.sort()
                    for key in keys:
                        val = ref[key]
                        if isinstance(val,leoFrame.LeoFrame): # change as needed
                            print key,ref[key]
    else:
        print "%d referrers" % len(refs)

Here app().windowList is a key data structure of my app. Substitute your
own as a new argument.
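
A generic version of the same idea, as a sketch in present-day Python 3: ask gc.get_referrers() who still points at a suspect object and, for dict referrers, report the keys involved. Window, cache and keys_referring_to are hypothetical names used only for the illustration.

```python
import gc

class Window:
    """Hypothetical object that should have been released."""

cache = {"main": Window()}         # the forgotten reference
suspect = cache["main"]

def keys_referring_to(obj):
    """Return the keys under which any gc-tracked dict holds obj as a value."""
    keys = []
    for ref in gc.get_referrers(obj):
        if isinstance(ref, dict):
            keys.extend(k for k, v in ref.items() if v is obj)
    return sorted(map(str, keys))

print(keys_referring_to(suspect))
```

The result lists "main" (the cache entry) alongside the name of whatever variable currently holds the suspect, which is usually enough of a hint to find the container that was never cleaned up.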

Basically, Python will give you all the information you need. The problem
is that there is way too much info, so you must experiment with filtering
it. Don't panic: you can do it.

4. A totally different approach. Consider this function:

def clearAllIvars (o):

    """Clear all ivars of o, a member of some class."""

    o.__dict__.clear()

This function will grind concrete walls into grains of sand. The GC will
then recover each grain separately.

My app contains several classes that refer to each other. Rather than
tracking all the interlocking references, when it comes time to delete the
main data structure my app simply calls clearAllIvars for the various
classes. Naturally, some care is needed to ensure that calls are made in
the proper order.
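
A small sketch of why this works, in present-day Python 3: once the attributes are cleared the cycle is broken, and plain reference counting frees the other object immediately, without help from the cycle collector. The Frame and Tree classes are hypothetical stand-ins for classes that refer to each other.

```python
import gc
import weakref

class Tree:
    def __init__(self, frame):
        self.frame = frame          # back-reference: Frame <-> Tree cycle

class Frame:
    def __init__(self):
        self.tree = Tree(self)

def clear_all_ivars(o):
    """Drop every instance attribute of o, breaking any cycle through it."""
    o.__dict__.clear()

gc.disable()                        # rule out the cycle collector for the demo
frame = Frame()
probe = weakref.ref(frame.tree)     # lets us observe when the Tree is freed
clear_all_ivars(frame)              # Frame no longer refers to the Tree
freed_by_refcount = probe() is None
gc.enable()

print("Tree freed without the cycle collector:", freed_by_refcount)
```

The weak reference is only there to observe the deallocation; it does not keep the Tree alive.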

HTH.

Edward
 
Francesc Alted

[Ooops. Something went wrong with my newsreader config ;-)]

Thanks for the responses! I started by fetching the TrackRefs() class
from http://cvs.zope.org/Zope3/test.py and pasting it into my local copy
of unittest.py. Then I modified the try: block of TestCase.__call__
from the original:


try:
    testMethod()
    ok = 1

to read:

try:
    rc1 = rc2 = None
    #Pre-heating
    for i in xrange(10):
        testMethod()
    gc.collect()
    rc1 = sys.gettotalrefcount()
    track = TrackRefs()
    # Second (first "valid") loop
    for i in xrange(10):
        testMethod()
    gc.collect()
    rc2 = sys.gettotalrefcount()
    print "First output of TrackRefs:"
    track.update()
    print >>sys.stderr, "%5d %s.%s.%s()" % (rc2-rc1,
        testMethod.__module__, testMethod.im_class.__name__,
        testMethod.im_func.__name__)
    # Third loop
    for i in xrange(10):
        testMethod()
    gc.collect()
    rc3 = sys.gettotalrefcount()
    print "Second output of TrackRefs:"
    track.update()
    print >>sys.stderr, "%5d %s.%s.%s()" % (rc3-rc2,
        testMethod.__module__, testMethod.im_class.__name__,
        testMethod.im_func.__name__)
    ok = 1

However, I'm not sure my implementation is right. My understanding is
that the first loop is for pre-heating (to avoid spurious reference
counts due to caching and the like). The second loop should already
give reliable reference counts, which is why I call track.update()
after it. Finally, I wanted to re-check the results of the second loop
with a third one, so I expected more or less the same results from the
second and third loops.

But... the results are different! Here are the results of this run:

$ python2.3 widetree3.py
First output of TrackRefs:
<type 'str'> 13032 85335
<type 'tuple'> 8969 38402
<type 'Cfunc'> 1761 11931
<type 'code'> 1215 4871
<type 'function'> 1180 5189
<type 'dict'> 841 4897
<type 'builtin_function_or_method'> 516 2781
<type 'int'> 331 3597
<type 'wrapper_descriptor'> 295 1180
<type 'method_descriptor'> 236 944
<type 'classobj'> 145 1092
<type 'module'> 107 734
<type 'list'> 94 440
<type 'type'> 86 1967
<type 'getset_descriptor'> 84 336
<type 'weakref'> 75 306
<type 'float'> 73 312
<type 'member_descriptor'> 70 280
<type 'ufunc'> 52 364
<type 'instance'> 42 435
<type 'instancemethod'> 41 164
<class 'numarray.ufunc._BinaryUFunc'> 25 187
<class 'numarray.ufunc._UnaryUFunc'> 24 173
<type 'frame'> 9 44
<type 'long'> 7 28
<type 'property'> 6 25
<type 'PyCObject'> 4 20
<class 'unittest.TestSuite'> 3 31
<type 'file'> 3 23
<type 'listiterator'> 3 12
<type 'bool'> 2 41
<class 'random.Random'> 2 30
<type '_sre.SRE_Pattern'> 2 9
<type 'complex'> 2 8
<type 'thread.lock'> 2 8
<type 'NoneType'> 1 2371
<class 'unittest._TextTestResult'> 1 16
<type 'ellipsis'> 1 12
<class '__main__.WideTreeTestCase'> 1 11
<class 'tables.IsDescription.metaIsDescription'> 1 10
<class 'unittest.TestProgram'> 1 9
<class 'numarray.ufunc._ChooseUFunc'> 1 8
<class 'unittest.TestLoader'> 1 7
<class 'unittest.TrackRefs'> 1 6
<class 'unittest.TextTestRunner'> 1 6
<type 'NotImplementedType'> 1 6
<class 'numarray.ufunc._PutUFunc'> 1 5
<class 'numarray.ufunc._TakeUFunc'> 1 5
<class 'unittest._WritelnDecorator'> 1 5
<type 'staticmethod'> 1 4
<type 'classmethod'> 1 4
<type 'classmethod_descriptor'> 1 4
<type 'unicode'> 1 4
7 __main__.WideTreeTestCase.test00_Leafs()
Second output of TrackRefs:
<type 'int'> 37 218
<type 'type'> 0 74
212 __main__.WideTreeTestCase.test00_Leafs()
..
----------------------------------------------------------------------
Ran 1 test in 0.689s

OK
[21397 refs]
$

As you can see, for the second loop (first output of TrackRefs), a lot
of objects appear, but after the third loop (second output of
TrackRefs), far fewer appear (only objects of type "int" and
"type"). Besides, the increase in total references for the second
loop is only 7, while for the third loop it is 212. Finally, to add even
more confusion, these numbers are *totally* independent of the number
of iterations I put in the loops. You see 10 in the code, but you can
try with 100 (in one or all of the loops) and you get exactly the same
figures.

I'm fairly sure that I have implemented the try: code block badly, but I
can't figure out what's going wrong.

I would appreciate some ideas.

Francesc Alted
 
Edward K. Ream

> I would appreciate some ideas.

I doubt many people will be willing to rummage through your app's code to do
your debugging for you. Here are two general ideas:

1. Try to simplify the problem. Pick something, no matter how small (and
the smaller the better) that doesn't seem to be correct and do what it takes
to find out why it isn't correct. If TrackRefs is Python code you can hack
that code to give you more (or less!) info. Once you discover the answer to
one mystery, the larger mysteries may become clearer. For example, you can
concentrate on one particular data structure, one particular data type or
one iteration of your test suite.

2. Try to enjoy the problem. The late great Earl Nightingale had roughly
this advice: Don't worry. Simply consider the problem calmly, and have
confidence that the solution will eventually come to you, probably when you
are least expecting it. I have found that this advice really works, and
it works for almost any problem. Finding "worthy" bugs is a creative
process, and creativity can be and should be highly enjoyable.

In this case, your problem is: "how to start finding my memory leaks".
Possible answers to this problem might be various strategies for getting
more (or more focused!) information. Then you have new problems: how to
implement the various strategies. In all cases, the advice to be calm and
patient applies. Solving this problem will be highly valuable to you, no
matter how long it takes :)

Edward

P.S. And don't hesitate to ask more questions, especially once you have more
concrete data or mysteries.

EKR
 
Francesc Alted

> I doubt many people will be willing to rummage through your app's code to do
> your debugging for you. Here are two general ideas:

Thanks for the words of encouragement. After the weekend I'll be fresher and
will try to follow your suggestions (and those of Earl Nightingale ;-).

Cheers,

Francesc Alted
 
Michael Hudson

Francesc Alted said:
> As you can see, for the second loop (first output of TrackRefs), a lot
> of objects appear, but after the third loop (second output of
> TrackRefs), far fewer appear (only objects of type "int" and
> "type"). Besides, the increase in total references for the second
> loop is only 7, while for the third loop it is 212. Finally, to add even
> more confusion, these numbers are *totally* independent of the number
> of iterations I put in the loops. You see 10 in the code, but you can
> try with 100 (in one or all of the loops) and you get exactly the same
> figures.
>
> I'm fairly sure that I have implemented the try: code block badly, but I
> can't figure out what's going wrong.
>
> I would appreciate some ideas.

In my experience of hunting these you want to call gc.collect() and
track.update() *inside* the loops. Other functions you might want to
call are things like sre.purge(), _strptime.clear_cache(),
linecache.clearcache()... there's a seemingly unbounded number of
caches around that can interfere.
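
A rough sketch of that measuring pattern in present-day Python 3, where sre.purge() is spelled re.purge(): collect and measure after every single call, so that a steady per-call growth stands out from one-off cache fills. The leaky and clean functions are made-up examples, and counting gc-tracked objects is only a stand-in for TrackRefs.

```python
import gc
import linecache
import re

def tracked_objects():
    """Empty a few well-known caches, collect, and count gc-tracked objects."""
    re.purge()
    linecache.clearcache()
    gc.collect()
    return len(gc.get_objects())

def per_iteration_deltas(func, iterations=5):
    """Call func repeatedly, measuring after every single call."""
    func()                                   # warm-up call, not measured
    deltas = []
    last = tracked_objects()
    for _ in range(iterations):
        func()
        now = tracked_objects()
        deltas.append(now - last)
        last = now
    return deltas

leaked = []                                  # hypothetical leak

def leaky():
    leaked.append({"payload": []})           # survives every call

def clean():
    temp = {"payload": []}                   # freed again before returning
    del temp

leaky_deltas = per_iteration_deltas(leaky)
clean_deltas = per_iteration_deltas(clean)
print("leaky:", leaky_deltas)
print("clean:", clean_deltas)
```

A leak shows up as the same positive delta on every iteration, while a cache that merely fills up once produces a single jump followed by zeros.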

Cheers,
mwh
 
Will Ware

Debian said:
> I'm trying to track down a memory leak in a program of mine...

Several years ago, I came up with a memory leak detector that I used for
C extensions with Python 1.5.2. This was before there were gc.* methods
available, and I'm guessing they probably do roughly the same things.
Still, in the unlikely event it's helpful:
http://www.faqts.com/knowledge_base/view.phtml/aid/6006

Now that I think of it, this might be helpful after all. With this
approach, you're checking the total refcount at various points in the
loop in your C code, rather than only in the Python code. Take a look
anyway.

Good luck
Will Ware
 
Francesc Alted

Edward said:
> Here are two general ideas:
>
> 1. Try to simplify the problem. Pick something, no matter how small (and
> the smaller the better) that doesn't seem to be correct and do what it
> takes to find out why it isn't correct.

Yeah... using this approach I was finally able to hunt down the leak!

The problem was hidden in C code that is used to access a C library. I'm
afraid that valgrind was unable to detect it because the underlying C
library does not call the standard malloc to create the leaking objects.

Of course, the Python reference counters were unable to detect it either
(some obscure points still remain, but they are not very important).

Anyway, thanks very much for the advice and encouragement!

Francesc Alted
 
Edward K. Ream

> Yeah... using this approach I was finally able to hunt down the leak!
> ....
> Anyway, thanks very much for the advice and encouragement!

You are welcome. IMO, if you can track down memory problems in C you can
debug just about anything, with the notable exception of numeric programs.
Debugging numeric calculations is hard, and will always remain so.

Edward
 
