Memory leak in Python

diffuser78

I have Python code running on a huge data set. After starting the
program, the computer becomes unstable and it gets very difficult even
to open Konsole to kill the process. What I am assuming is that I am
running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable? How should I address the memory leak problem, if any? I have
a gig of RAM.

Any help is appreciated.
 
vbgunz

How big is the set? 100 MB, more? What are you doing with the set? Do
you have a small example that can prove the set is causing the freeze?
I am not the sharpest tool in the shed, but it sounds like you might be
multiplying your set, directly or indirectly, either permanently or
temporarily, on purpose or by accident.
 
Sybren Stuvel

(e-mail address removed) enlightened us with:
I have Python code running on a huge data set. After starting the
program, the computer becomes unstable and it gets very difficult even
to open Konsole to kill the process. What I am assuming is that I am
running out of memory.

Before acting on your assumptions, you need to verify them. Run 'top'
and hit 'M' to sort by memory usage. After that, use 'ulimit' to limit
the allowed memory usage, run your program again, and see if it stops
at some point due to memory problems.
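
If you want the same kind of safety net from inside the script, the
standard resource module can impose a limit much like ulimit does. A
minimal sketch, with an arbitrary 512 MB cap as an example value:

import resource

# Cap the process's total address space (Unix only). Once the limit is
# reached, allocations raise MemoryError instead of dragging the whole
# machine into swap.
limit = 512 * 1024 * 1024   # example value, tune to your machine
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

That way the program dies with a traceback you can inspect instead of
freezing the desktop.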

Sybren
 
Peter Tillotson

1) Review your design - you say you are processing a large data set;
just make sure you are not trying to store three versions of it. If you
are missing a design, create a flow chart or something that is true to
the code you have produced. You could probably even post the design if
you are brave enough.

2) Check your implementation - make sure you manage lists, arrays etc.
correctly. You need to sever links (references) to objects for them to
get swept up (see the sketch after this list). I know it is obvious, but
it is an easy mistake to make in a hasty implementation.

3) Verify and test the problem's characteristics: profilers, top, etc.
It is hard for us to help you much without more info. Test your
assumptions.
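
A minimal sketch of what severing references means in practice (the
class and attribute names are invented for illustration):

class Node(object):
    def __init__(self, ident):
        self.ident = ident
        self.peers = []        # strong references to other Node objects

nodes = [Node(i) for i in range(2000)]

# An object stays alive as long as *anything* still references it.
done = nodes[0]
done.peers = []    # sever this node's links to other nodes
nodes[0] = None    # sever the container's link to the node
del done           # and drop our own local name as well

Only once every path to the object is cut can the memory be reclaimed.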

Problem solving and debugging is a process, not some mystic art. Though
sometimes the Gremlins disappear after a pint or two :)

p
 
Dennis Lee Bieber

I have Python code running on a huge data set. After starting the
program, the computer becomes unstable and it gets very difficult even
to open Konsole to kill the process. What I am assuming is that I am
running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable? How should I address the memory leak problem, if any? I have
a gig of RAM.
Does the memory come back after the process exits?

You don't show any sample of code or data... Nor do you mention what
OS/processor is involved.

Many systems do not return /allocated/ memory to the OS until the
top-level process exits, even if the memory is "freed" from the
viewpoint of the process.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
bruno at modulix

I have Python code running on a huge data set. After starting the
program, the computer becomes unstable and it gets very difficult even
to open Konsole to kill the process. What I am assuming is that I am
running out of memory.

What should I do to make sure that my code runs fine without becoming
unstable? How should I address the memory leak problem, if any? I have
a gig of RAM.

Any help is appreciated.

Just a hint: if you're trying to load your whole "huge data set" into
memory, you're in for trouble whatever the language - for example,
doing a 'buf = openedFile.read()' on a 100 gig file may not be a good
idea...
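
For example, reading in fixed-size chunks keeps memory usage flat no
matter how big the file is. A rough sketch (the file name, chunk size
and process() are placeholders):

def process(chunk):
    pass   # whatever work needs doing on each chunk

with open('huge.dat', 'rb') as f:
    while True:
        chunk = f.read(64 * 1024)   # 64 KB at a time, not the whole file
        if not chunk:
            break
        process(chunk)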
 
diffuser78

I am using Ubuntu Linux.

My program is a simulation program with four classes and it mimics
BitTorrent file sharing systems on 2000 nodes. Now, each node has a lot
of attributes and my program kind of tries to keep tabs on everything.
As I mentioned, it's a simulation program; it starts at time T=0 and
goes on until all nodes have received all parts of the file (the
BitTorrent concept). The ending time goes to thousands of seconds. In
each second I process all 2000 nodes.

Pseudo code:

Time = 0
while (True) {
    For all nodes in the system {
        Process + computation
    }
    Time++
    If (DownloadFinished == True) exit;
}
 
Dennis Lee Bieber

I am using Ubuntu Linux.

My program is a simulation program with four classes and it mimics
BitTorrent file sharing systems on 2000 nodes. Now, each node has a lot
of attributes and my program kind of tries to keep tabs on everything.
As I mentioned, it's a simulation program; it starts at time T=0 and
goes on until all nodes have received all parts of the file (the
BitTorrent concept). The ending time goes to thousands of seconds. In
each second I process all 2000 nodes.
Any chance each of your nodes is creating a whole allocation of the
same "data file" in memory, and those allocations are not being freed
at the end of the "transfer"?
Pseudo code:

Time = 0
while (True) {
    For all nodes in the system {
        Process + computation
    }
    Time++
    If (DownloadFinished == True) exit;
}
<eeek> C-code (or is it Java...) Given how many references refer to
Python as "executable pseudo-code" <G>

time = 0
while not downloadFinished:
    for eachNode in system:
        pass   # process + computation
    time += 1

<G>
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
diffuser78

The amount of data I read in is actually small.

If you see my algorithm above, it deals with 2000 nodes and each node
has a lot of attributes.

When I close the program my computer becomes stable and performs as
usual. I checked the performance in the performance monitor and using
"top": all of the memory is being used, and on top of that around half
a gig of swap is also being used.

Please give some helpful pointers to overcome such memory errors.

I revisited my code and found nothing so obvious that would let this
leak happen. How do I kill cross-references in the program? I am kind
of a newbie and not completely aware of how to fine-tune such a
program.

Thanks
 
Karthik Gurusamy

The amount of data I read in is actually small.

If you see my algorithm above, it deals with 2000 nodes and each node
has a lot of attributes.

When I close the program my computer becomes stable and performs as
usual. I checked the performance in the performance monitor and using
"top": all of the memory is being used, and on top of that around half
a gig of swap is also being used.

Please give some helpful pointers to overcome such memory errors.

I revisited my code and found nothing so obvious that would let this
leak happen. How do I kill cross-references in the program? I am kind
of a newbie and not completely aware of how to fine-tune such a
program.

I suspect you are trying to store each node's attributes in every other
node. Basically you have an O(N^2) algorithm (in space, and probably
worse in time). For N=2000, N^2 is pretty big, so you see memory
issues.

Try not to store O(N^2) information and see if you can scale the memory
requirements linearly in N. That is, see if you can store the
attributes of a node in only one place per node.

I'm just guessing at your implementation, but from what you say
(peer-to-peer), I suspect there is an O(N^2) requirement. Also try
experimenting with a small N (say, 100 nodes).
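
Purely as a sketch of the idea (this is a guess at your data layout,
with invented names): keep one table of attributes keyed by node id,
and let every node refer to its peers by id instead of holding copies.

N = 100   # experiment with a small N first

# O(N) in space: each node's attributes live in exactly one place ...
attrs = dict((i, {'pieces': set(), 'upload_rate': 0}) for i in range(N))

# ... and nodes refer to their peers by id, not by copying their data
peers = dict((i, [j for j in range(N) if j != i]) for i in range(N))

def pieces_of(node_id):
    # always look the data up in the shared table
    return attrs[node_id]['pieces']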

Thanks,
Karthik
 
Sybren Stuvel

(e-mail address removed) enlightened us with:
My program is a simulation program with four classes and it mimics
BitTorrent file sharing systems on 2000 nodes.

Wouldn't it be better to use an existing simulator? That way, you
won't have to do the stuff you don't want to think about, and can focus
on the more interesting parts. There are plenty of discrete-event and
discrete-time simulators to choose from.

Sybren
 
Serge Orlov

I am using Ubuntu Linux.

My program is a simulation program with four classes and it mimics
BitTorrent file sharing systems on 2000 nodes. Now, each node has a lot
of attributes and my program kind of tries to keep tabs on everything.
As I mentioned, it's a simulation program; it starts at time T=0 and
goes on until all nodes have received all parts of the file (the
BitTorrent concept). The ending time goes to thousands of seconds. In
each second I process all 2000 nodes.

Most likely you keep references to objects you don't need, so the
Python garbage collector cannot remove those objects. If you cannot
figure it out by looking at the source code, you can gather some
statistics to help you: for example, use the gc module to iterate over
all objects in your program (gc.get_objects()) and find out which types
of objects are growing with each iteration.
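
Something along these lines (a rough sketch) can be called once per
simulated second to see which object types keep growing:

import gc
from collections import defaultdict

def count_types():
    counts = defaultdict(int)
    for obj in gc.get_objects():
        counts[type(obj).__name__] += 1
    return counts

previous = {}

def report_growth():
    # print every type whose instance count went up since the last call
    global previous
    current = count_types()
    for name in sorted(current):
        if current[name] > previous.get(name, 0):
            print("%s: %d -> %d" % (name, previous.get(name, 0), current[name]))
    previous = current

The types that grow on every call are the ones whose references are
being kept around.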
 
bruno at modulix

The amount of data I read in is actually small.

So the problem is probably elsewhere... Sorry, since you were talking
about a huge dataset, the good old "read-whole-file-in-memory"
antipattern seemed an obvious guess.
If you see my algorithm above, it deals with 2000 nodes and each node
has a lot of attributes.

When I close the program my computer becomes stable and performs as
usual. I checked the performance in the performance monitor and using
"top": all of the memory is being used, and on top of that around half
a gig of swap is also being used.

Please give some helpful pointers to overcome such memory errors.

A real memory leak would cause the memory usage to keep increasing as
long as your program is running. If this is not the case, it's not a
"memory error", but a design/program error. FWIW, apps like Zope can end
up using a whole lot of memory, but there's no known memory-leak problem
AFAIK. And believe me, a Zope app can end up managing a *really huge
lot* of objects (>= many thousands).
I revisited my code and found nothing so obvious that would let this
leak happen. How do I kill cross-references in the program?

Using weakref and/or gc might help.

FWIW, the default memory management in Python is based on reference
counting. As long as anything keeps a reference to an object, that
object stays alive. If you have lots of cross-references and 2000+ big
objects, you may effectively end up eating all the RAM and more. The gc
module can detect and collect some cyclic references (obj A has a ref
to obj B, which has a ref to obj A). The weakref module provides
'proxy' references that let reference counting do its job (I guess the
doc will be much more explicit than me).
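
A tiny illustration of the difference (not taken from your program,
just a sketch):

import weakref

class Node(object):
    pass

a = Node()
b = Node()

# Strong back-references: a and b keep each other alive in a cycle that
# plain reference counting can never free; only gc can reclaim it.
a.peer = b
b.peer = a

# Weak back-reference: b.peer no longer keeps a alive, so as soon as
# the last strong reference to a disappears, a is reclaimed immediately.
b.peer = weakref.proxy(a)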

Another possible improvement could be to use the flyweight design
pattern to share memory for some attributes:

- a general (though somewhat Java-oriented) explanation:
http://www.exciton.cs.rice.edu/JavaResources/DesignPatterns/FlyweightPattern.htm

- two Python examples (the second being based on the first):
http://www.suttoncourtenay.org.uk/duncan/accu/pythonpatterns.html#flyweight
http://push.cx/2006/python-flyweights
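
As a rough sketch of the idea (the attribute is invented for
illustration): equal attribute values are created once and shared
between nodes instead of being duplicated in each one.

class PieceMap(object):
    """Immutable, shareable attribute value (flyweight)."""
    _cache = {}

    def __new__(cls, pieces):
        key = frozenset(pieces)
        obj = cls._cache.get(key)
        if obj is None:
            obj = object.__new__(cls)
            obj.pieces = key
            cls._cache[key] = obj
        return obj

# Two nodes holding the same set of pieces now share one object.
m1 = PieceMap([1, 2, 3])
m2 = PieceMap([3, 2, 1])
assert m1 is m2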

HTH
 
diffuser78

Sure. Are there any available simulators? Since I am modifying some
stuff I thought of creating one of my own, but if you know some
existing simulators, those could be of great help to me.

Thanks
 
diffuser78

I ran the simulation for 128 nodes and used the following:

oo = gc.get_objects()
print len(oo)

On every time step the number of objects increases. For 128 nodes I had
1,058,177 objects.

I think I need to revisit the code and remove the references... but how
do I do that? I am still a newbie coder and any help will be greatly
appreciated.

thanks
 
Sybren Stuvel

(e-mail address removed) enlightened us with:
Sure. Are there any available simulators? Since I am modifying some
stuff I thought of creating one of my own, but if you know some
existing simulators, those could be of great help to me.

Don't know any by name, but I'm sure you can find some on Google. Do
you need a discrete-event or a discrete-time simulator?

Sybren
 
