How do you debug memory usage?

David

Hi list.

I've tried Googling for this and I checked the Python docs, but to no avail.

What is the best way to debug memory usage in a Python script?

I'd like to see what objects are using the most memory. Ideally this
would be aware of references too. So if I have a small list that
contains (or rather, references) some large variables, then I could
see that the global list variable was responsible for X MB that
couldn't be collected by the garbage collector.

It would be nice if it could be used in a similar way to the 'du'
Linux command. Example output:

A (type: list): 8 bytes, 10 MB
- a (type: string): 6 MB, 0 bytes
- b (type: string): 4 MB, 0 bytes
B (type: dict): 8 bytes, 5 MB
- d (type: string): 3 MB, 0 bytes
- c (type: string): 2 MB, 0 bytes

In the output above, the first size is the memory used by the object
itself, and the second size is the memory used by the objects it
refers to. A & B are global vars (a list and a dict); a, b, c, and d
are strings that were added to A & B at some point in the past, and
aren't referred to by anything else. Also, the output is in descending
order of size.
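
Just to make the two sizes concrete, here's the rough idea in code
(sys.getsizeof is new in Python 2.6 and only measures the shallow
size; the deep size would have to chase references, something like
this sketch - I don't have a real tool that does it):

import gc
import sys

def deep_size(obj, seen=None):
    # Shallow size of obj plus everything reachable from it.
    # seen guards against cycles and shared references.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    for ref in gc.get_referents(obj):
        total += deep_size(ref, seen)
    return total

A = ['x' * (6 * 1024 * 1024), 'y' * (4 * 1024 * 1024)]
print sys.getsizeof(A), "bytes shallow,", deep_size(A), "bytes deep"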

Are there any tools/modules/etc I can use like this?

David.
 
Noah

Hi list.
What is the best way to debug memory usage in a Python script?
...
Are there any tools/modules/etc I can use like this?
David.

You need to use the debug build of Python to get exact numbers,
but there are a few tricks you can use with the standard build
to detect memory leaks. The simplest is to watch the RSS column
in the output of `ps aux` (I'm assuming you are using UNIX).
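
If you would rather sample that number from inside the script than
eyeball ps, something like this works (my own sketch, Linux-specific,
since it just parses /proc/self/status):

def rss_kb():
    # Current resident set size of this process in kB (Linux only).
    # /proc/self/status contains a line like "VmRSS:    1234 kB".
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])

print "RSS:", rss_kb(), "kB"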

The other trick I got from Chris Siebenmann:
http://utcc.utoronto.ca/~cks/space/blog/python/GetAllObjects
I modified his example a little bit. It does not tell you how many
bytes of memory your running code is using, but it does show you the
number of live objects, and when hunting a leak, counting objects is
usually enough. For example, say you suspect a function is leaking
memory. You could call it in a loop like this and watch the count of
objects before and after each call:

while True:
    print "Number objects before:", len(get_all_objects())
    suspect_function()
    print "Number objects after:", len(get_all_objects())

Here is my modified version of Chris' get_all_objects() function.
All I did was force garbage collection using gc.collect().
This makes sure that you are not counting objects that Python has
left in memory, but plans on deleting at some point.

import gc

# Recursively expand slist's objects into olist,
# using seen to track already-processed objects.
def _getr(slist, olist, seen):
    for e in slist:
        if id(e) in seen:
            continue
        seen[id(e)] = None
        olist.append(e)
        tl = gc.get_referents(e)
        if tl:
            _getr(tl, olist, seen)

# The public function.
def get_all_objects():
    """Return a list of all live Python objects,
    not including the list itself."""
    gc.collect()
    gcl = gc.get_objects()
    olist = []
    seen = {}
    # Just in case:
    seen[id(gcl)] = None
    seen[id(olist)] = None
    seen[id(seen)] = None
    # _getr does the real work.
    _getr(gcl, olist, seen)
    return olist
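
As a quick usage sketch (my addition, not from Chris' page): tally
the live objects by type and diff the tallies across a call to
whatever function you suspect. The type whose count keeps growing
points at the leak.

from collections import defaultdict

def count_by_type():
    # Map type name -> number of live instances.
    counts = defaultdict(int)
    for obj in get_all_objects():
        counts[type(obj).__name__] += 1
    return counts

before = count_by_type()
suspect_function()    # the function you suspect of leaking
after = count_by_type()
for name in sorted(after, key=after.get, reverse=True):
    growth = after[name] - before.get(name, 0)
    if growth > 0:
        print name, "grew by", growth

The counts will jitter by a few dicts and strings just from taking
the snapshots, so look for types that grow steadily over repeated
calls rather than a one-off difference.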
 
David

Here is my modified version of Chris' get_all_objects() function.
All I did was force garbage collection using gc.collect().
This makes sure that you are not counting objects that Python has
left in memory, but plans on deleting at some point.

Thanks for the logic.

I want to debug rdiff-backup (a Python backup tool for Linux) - it's
using 2 GB of memory (1 GB RAM, 1 GB swap) on a backup server at work.

I'll use your method to find out why this is happening. Even if it
doesn't give me exact details, it should be enough info to go by.

David
 
Noah

I want to debug rdiff-backup (a Python backup tool for Linux) - it's
using 2 GB of memory (1 GB RAM, 1 GB swap) on a backup server at work.
...
David

Rsync is known to use a lot of memory:
http://www.samba.org/rsync/FAQ.html#4
rdiff-backup uses librsync rather than rsync itself. I'm not sure
whether rsync uses librsync too, but one could speculate that they
share some code and perhaps some of the same problems. Still, 2 GB
seems excessive unless you are dealing with millions of files; a
memory leak seems more likely.
 
David

Rsync is known to use a lot of memory:
http://www.samba.org/rsync/FAQ.html#4
rdiff-backup uses librsync rather than rsync itself. I'm not sure
whether rsync uses librsync too, but one could speculate that they
share some code and perhaps some of the same problems. Still, 2 GB
seems excessive unless you are dealing with millions of files; a
memory leak seems more likely.

Thanks for your input. In case you're interested, here's a link to my
post to a local Linux user group:

http://lists.clug.org.za/pipermail/clug-tech/2008-May/040532.html

David.
 
