Python Memory Usage

greg.novak

I am using Python to process particle data from a physics simulation.
There are about 15 MB of data associated with each simulation, but
there are many simulations. I read the data from each simulation into
Numpy arrays and do a simple calculation on them that involves a few
eigenvalues of small matrices and quite a number of temporary
arrays. I had assumed that generating lots of temporary arrays
would make my program run slowly, but I didn't think that it would
cause the program to consume all of the computer's memory, because I'm
only dealing with 10-20 MB at a time.

So, I have a function that reliably increases the virtual memory usage
by ~40 MB each time it's run. I'm measuring memory usage by looking
at the VmSize and VmRSS lines in the /proc/[pid]/status file on an
Ubuntu (edgy) system. This seems strange because I only have 15 MB of
data.
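For reference, this is roughly how I'm reading those numbers (a small sketch; Linux-only, since it parses /proc, and the helper name is mine):

```python
import os

def vm_usage(pid=None):
    """Return (VmSize, VmRSS) in kB by parsing /proc/<pid>/status (Linux only)."""
    if pid is None:
        pid = os.getpid()
    usage = {}
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            # Lines look like "VmSize:    12345 kB"
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value = line.split(":", 1)
                usage[key] = int(value.split()[0])
    return usage.get("VmSize"), usage.get("VmRSS")
```

Calling this before and after the function is what shows the ~40 MB jump.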

I started looking at the difference between what gc.get_objects()
returns before and after my function. I expected to see zillions of
temporary Numpy arrays that I was somehow unintentionally maintaining
references to. However, I found that only 27 additional objects were
in the list that comes from get_objects(), and all of them look
small. A few strings, a few small tuples, a few small dicts, and a
Frame object.
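The comparison I did is essentially this (a sketch; the wrapper name is mine, and since ids can be recycled the result is a hint rather than proof):

```python
import gc

def new_objects(func, *args, **kwargs):
    """Call func and return (result, objects tracked by gc afterwards
    that were not tracked before)."""
    gc.collect()
    before = set(id(o) for o in gc.get_objects())
    result = func(*args, **kwargs)
    gc.collect()
    after = [o for o in gc.get_objects() if id(o) not in before]
    return result, after
```

Run against my function, the `after` list is where I expected zillions of leaked arrays and instead found only those 27 small objects.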

I also found a tool called heapy (http://guppy-pe.sourceforge.net/)
which seems to be able to give useful information about memory usage
in Python. This seemed to confirm what I found from manual
inspection: only a few new objects are allocated by my function, and
they're small.
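In case it's useful to anyone else, the heapy session I used looked roughly like this (assuming guppy is installed; `setrelheap()` marks the current heap as a baseline so a later `heap()` shows only allocations made since):

```python
try:
    from guppy import hpy
except ImportError:
    hpy = None  # guppy not installed

def profile_allocations(func):
    """Return heapy's summary of objects allocated while func ran, or None."""
    if hpy is None:
        return None
    hp = hpy()
    hp.setrelheap()   # baseline: ignore everything allocated so far
    func()
    return hp.heap()  # objects allocated since the baseline
```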

I found Evan Jones's article about the Python 2.4 memory allocator never
freeing memory in certain circumstances: http://evanjones.ca/python-memory.html.
This sounds a lot like what's happening to me. However, his patch was
applied in Python 2.5 and I'm using Python 2.5. Nevertheless, it
looks an awful lot like Python doesn't think it's holding on to the
memory, but doesn't give it back to the operating system, either. Nor
does Python reuse the memory, since each successive call to my
function consumes an additional 40 MB. This continues until finally
the VM usage is gigabytes and I get a MemoryError.

I'm using Python 2.5 on an Ubuntu edgy box, and numpy 1.0.3. I'm also
using a few routines from scipy 0.5.2, but for this part of the code
it's just the eigenvalue routines.

It seems that the standard advice when someone has a bit of Python
code that progressively consumes all memory is to fork a process. I
guess that's not the worst thing in the world, but it certainly is
annoying. Given that others seem to have had this problem, is there a
slick package to do this? I envision:
value = call_in_separate_process(my_func, my_args)
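To be concrete, something like the following is what I have in mind (a sketch only, using the multiprocessing module, which appeared in Python 2.6; on 2.5 the same thing can be done with os.fork and pickle — the child process does the work and exits, so all its memory goes back to the OS):

```python
import multiprocessing

def _worker(queue, func, args, kwargs):
    queue.put(func(*args, **kwargs))

def call_in_separate_process(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a child process and return its result."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_worker,
                                   args=(queue, func, args, kwargs))
    proc.start()
    result = queue.get()  # read before join to avoid blocking on large results
    proc.join()
    return result
```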

Suggestions about how to proceed are welcome. Ideally I'd like to
know why this is going on and fix it. Short of that, workarounds that
are more clever than the "separate process" one are also welcome.

Thanks,
Greg
 

malkarouri

> [Greg's original message, quoted in full; trimmed here.]

I had almost the same problem. Will this do?

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474

Any comments are welcome (I wrote the recipe with Pythonistas' help).

Regards,
Muhammad Alkarouri
 
