Debugging memory leaks

writeson

Hi all,

I've written a program using Twisted that uses SQLAlchemy to access a database via threads.deferToThread(...) and SQLAlchemy's scoped_session(...). The program runs for a long time, but slowly leaks memory to the point of needing to be restarted. I don't know that the SQLAlchemy/threads combination is the problem, but I thought I'd make you aware of it.

Anyway, my real question is how to go about debugging memory leak problems in Python, particularly for a long-running server process written with Twisted. I'm not sure how to use heapy or guppy, and objgraph doesn't tell me enough to locate the problem. If anyone has any suggestions or pointers, they would be very much appreciated!

Thanks in advance,
Doug

dieter

writeson said:
...
Anyway, my real question is how to go about debugging memory leak problems in Python, particularly for a long-running server process written with Twisted. I'm not sure how to use heapy or guppy, and objgraph doesn't tell me enough to locate the problem.

Analysing memory leaks is really difficult: huge amounts of data
are involved and usually it is almost impossible to determine which
of the reported objects are leaked and which are rightfully in use.
In addition, long-running Python processes usually show degrading
memory use - due to memory fragmentation. There is nothing you
can do against this.

Therefore: if the leak seems to be small, it may be much more
advisable to restart your process periodically (at times
when a restart does not hurt much) rather than to try to find
(and fix) the leaks. Only when the leak is large enough to
force too-frequent restarts is a deeper
analysis advisable (large leaks are easier to locate as well).


I have analysed memory leaks several times for Zope applications.

Zope helped me much by its "Top Refcount" functionality.
This uses the fact that an instance (in many cases)
holds a reference to its class/type
(it seems not to work for all elementary types).
Thus, looking at the refcount of a class/type gives you
an indication of how many instances of that class/type are around.
Zope presents this information sorted by count.
Then you send requests against Zope and reexamine the information:
You get something like (both snapshots taken June 13, 2013, 8:18 am):

    Class                    Before    After    Delta
    ....ApplicationManager       15       22       +7
    ....DebugManager              9       12       +3

In the case above, my requests have created 7 additional
"ApplicationManager" and 3 additional "DebugManager" instances.

If the leaking objects are Zope objects for which this functionality
works, then this is a powerful tool for detecting their classes.

You could implement something similar for your server.
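Outside Zope, the same trick can be reproduced with sys.getrefcount: since every instance holds a reference to its class, the change in the class's refcount approximates the number of live instances. Session below is a hypothetical application class used only for illustration:

```python
import sys

class Session:                     # hypothetical application class
    pass

base = sys.getrefcount(Session)    # refcount before creating instances

live = [Session() for _ in range(7)]

# Each instance holds one reference to its class, so the delta
# approximates the number of instances currently alive.
delta = sys.getrefcount(Session) - base
print("Session instances alive (approx.):", delta)
```

This mirrors the Delta column in the Zope listing above: sample the refcount before and after a batch of requests, and the difference tells you how many instances survived.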


As mentioned, the approach does not work for (many; all?) elementary
Python types. Thus, if the leak involves only those instances, it
cannot be detected this way.

Memory leaks are often introduced by C extensions - and do not
involve Python objects (but leak C level memory). Those, too,
cannot be analysed by Python level approaches.
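The same idea can be implemented for any server with nothing but the standard library: snapshot live-object counts by type and diff two snapshots. Widget here is a hypothetical stand-in for a leaking class, and - as noted above - objects not tracked by the garbage collector will not show up:

```python
import gc
from collections import Counter

def type_counts():
    """Count the live objects tracked by the garbage collector, by type name."""
    return Counter(type(o).__name__ for o in gc.get_objects())

def report_growth(before, after, top=10):
    """Return the types whose instance counts grew the most between snapshots."""
    growth = {name: after[name] - before[name]
              for name in after if after[name] > before[name]}
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:top]

class Widget:                                  # hypothetical leaking class
    pass

before = type_counts()
leaked = [Widget() for _ in range(100)]        # simulate a leak
after = type_counts()
for name, delta in report_growth(before, after):
    print("%-20s +%d" % (name, delta))
```

Called from a periodic task (in a Twisted server, for example via twisted.internet.task.LoopingCall), report_growth quickly points at the types whose counts only ever grow.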

writeson

Dieter,

Thanks for the response, and you're correct, debugging memory leaks is tough! So far I haven't had much luck beyond determining that I have a leak. I've used objgraph to see that objects are being created that don't seem to get cleaned up. What I can't figure out so far is why: they are local variable objects that "should" get cleaned up when they go out of scope.
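One stdlib way to answer the "why" is gc.get_referrers, which lists everything still holding a reference to a survivor - the hidden holder is typically a module-level cache or registry. Job and registry below are hypothetical stand-ins for that pattern:

```python
import gc

class Job:                  # hypothetical stand-in for the surviving objects
    pass

registry = []               # an accidental module-level cache

def handle():
    job = Job()             # looks like a local that dies at return...
    registry.append(job)    # ...but this hidden reference keeps it alive

handle()

# Find a survivor and ask the gc who still refers to it.
survivor = next(o for o in gc.get_objects() if isinstance(o, Job))
for referrer in gc.get_referrers(survivor):
    if isinstance(referrer, list):
        print("still referenced by a list of length", len(referrer))
```

objgraph.show_backrefs does essentially this, but draws the chain of referrers as a graph, which makes the holding container easier to spot.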

Ah well, I'll keep pushing!
Thanks again,
Doug

Dave Angel

writeson said:
Dieter,

Thanks for the response, and you're correct, debugging memory leaks is tough! So far I haven't had much luck beyond determining that I have a leak. I've used objgraph to see that objects are being created that don't seem to get cleaned up. What I can't figure out so far is why: they are local variable objects that "should" get cleaned up when they go out of scope.

Pure Python code shouldn't have any leaks, but instead can have what I
call stagnation. That's data that's no longer useful, but that the
program has fooled the system into hanging onto.

A leak happens in C code, when all the pointers to a given hunk of
memory have gone away, and there's no way to access it any longer.

In pure Python you don't work with pointers but with references, and
they are ref-counted. When the count goes to zero, the object is freed.
Periodically a gc sweep happens, which catches those circular
references whose counts never actually go to zero.
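That sweep is easy to watch directly; a minimal sketch of a cycle that refcounting alone can never free:

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner = b               # a -> b
b.partner = a               # b -> a: a reference cycle
del a, b                    # unreachable now, but both refcounts are still 1

found = gc.collect()        # the periodic sweep, triggered by hand here
print("unreachable objects collected:", found)
```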


So post a fragment of code that seems to cause the problem, and maybe
someone can explain why. The usual suspects for stagnation are:

1) objects with a __del__() method (in Python 2, cycles containing
   these are never collected and pile up in gc.garbage)
2) objects that are "cached" by some mechanism
3) objects that collectively represent a lot of data
4) objects that are exposed to buggy C code

Giorgos Tzampanakis

Analysing memory leaks is really difficult: huge amounts of data are
involved and usually it is almost impossible to determine which of the
reported objects are leaked and which are rightfully in use. In
addition, long-running Python processes usually show degrading memory
use - due to memory fragmentation. There is nothing you can do against
this.

Therefore: if the leak seems to be small, it may be much more advisable
to restart your process periodically (at times when a restart does
not hurt much) rather than to try to find (and fix) the leaks. Only
when the leak is large enough to force too-frequent restarts is a
deeper analysis advisable (large leaks are easier to
locate as well).


Am I the only one who thinks this is terrible advice?

Steve Simmons

Giorgos Tzampanakis said:
Am I the only one who thinks this is terrible advice?

No, you are not alone. Ignoring a bug is only sensible if you absolutely understand what is going wrong - and by the time you understand the problem that well, you probably understand enough to fix it. If tools are available (as the OP knows), then learn them and use them to find and fix the bug.
Steve S

Sent from a Galaxy far far away

Chris Angelico

Am I the only one who thinks this is terrible advice?

Definitely not alone there, but I'm biased; I like to keep systems and
processes running for ridiculous lengths of time. Up until we suffered
a simultaneous UPS failure and power outage, I had one process still
running from shortly after the system had been booted... over two
years previously. (That same program now has 20 weeks+ of uptime.)
Granted, that would be impractical in Python, since it's not easy to
edit code of a live system; but still, once your code is stable, you
wouldn't be restarting for that, and your uptime figures should be
able to reflect that.

ChrisA

Steven D'Aprano

Am I the only one who thinks this is terrible advice?


Sub-optimal, maybe, but terrible? Not even close. Terrible advice would
be "open up all the ports on your firewall, that will fix it!"

If it takes, say, 200 person-hours to track down this memory leak, and
another 200 person-hours to fix it, that's an awfully large expense. In
that case, it would surely be better to live with the inconvenience and
mess of having a nightly/weekly/monthly reboot. On the other hand, maybe
it will only take 1 hour to find, and fix, the leak. Who knows?

My advice is to give yourself a deadline:

"If I have not found the leak in one week, or found and fixed it in three
weeks, then I'll probably never fix it and I should just give up and
apply palliative reboots to work around the problem."

Either that or hire an expert at debugging memory leaks.

Chris Angelico

Sub-optimal, maybe, but terrible? Not even close. Terrible advice would
be "open up all the ports on your firewall, that will fix it!"
...

My advice is to give yourself a deadline:

"If I have not found the leak in one week, or found and fixed it in three
weeks, then I'll probably never fix it and I should just give up and
apply palliative reboots to work around the problem."

Either that or hire an expert at debugging memory leaks.

It's terrible advice in generality, because it encourages a sloppiness
of thinking: "Memory usage doesn't matter, we'll just instruct people
to reset everything now and then". When you have a problem on your
hands, you always have to set a deadline [1] but sometimes you have to
set the deadline the other way, too: "I'll just reboot it now, but if
it runs out of memory within a week, I *have* to find the problem".
Also, I think everyone should have at least one shot at a project that
has to stay up for multiple months, preferably a year. Even if you
never actually achieve a whole year of uptime, *think* that way. It'll
help you get things into perspective: "If I were running this all
year, that might be an issue, but who cares about a memory leak in a
script that's going to be finished in an hour!".

[1] cf http://www.gnu.org/fun/jokes/last.bug.html

ChrisA

rusi

Anyway, my real question is how to go about debugging memory leak problems in Python, particularly for a long-running
server process written with Twisted. I'm not sure how to use heapy or guppy, and objgraph doesn't tell me enough to
locate the problem. If anyone has any suggestions or pointers, they would be very much appreciated!

Can you explain in more detail what you get stuck with using heapy/
guppy?

rusi

Am I the only one who thinks this is terrible advice?

I would expect a typical desktop app to run for a couple of hours --
maybe a couple of days.
Living with a small (enough) leak there may be OK.
[In particular, I believe that most commercial apps will leak a bit
if run long enough.]

The case of something server-ish is quite different.
A server in principle runs forever.
And so if it leaks, it's not working.

Giorgos Tzampanakis

Sub-optimal, maybe, but terrible? Not even close. Terrible advice would
be "open up all the ports on your firewall, that will fix it!"

If it takes, say, 200 person-hours to track down this memory leak, and
another 200 person-hours to fix it, that's an awfully large expense. In
that case, it would surely be better to live with the inconvenience and
mess of having a nightly/weekly/monthly reboot. On the other hand, maybe
it will only take 1 hour to find, and fix, the leak. Who knows?

But having a memory leak in the first place is an indication that
something is very wrong with your program. Either you're keeping
references that you didn't mean to be keeping (which indicates that there
can be larger side-effects than just wasted memory) or a linked C library
is leaking memory, which is bad for reasons which I won't cover here since
they are self-evident.

Chris Angelico

Am I the only one who thinks this is terrible advice?

I would expect a typical desktop app to run for a couple of hours --
maybe a couple of days.
Living with a small (enough) leak there may be OK.
[In particular, I believe that most commercial apps will leak a bit
if run long enough.]

The case of something server-ish is quite different.
A server in principle runs forever.
And so if it leaks, it's not working.

I keep my clients running for months. My Windows laptop (let's not
even get started on my Linux boxes) got rebooted a few weeks ago
(can't remember why), but I've had it running for two months or more
at a time. And that's Windows XP, not the most stable OS ever
invented, and a computer that's used fairly constantly - two web
browsers, a MUD client that retains full history, music/movie playing
with VLC, SciTE, IDLE, BitTorrent, and a bunch of other stuff. And I
don't reboot it; I don't even restart applications if I can help it
(except VLC, I tend to close that when I'm done). Any memory leak in
any of the apps I use would be highly visible and extremely annoying;
and there *were* such leaks in the Flash players of yesterday.
Fortunately now I can leave browsers running constantly. (Either that,
or the plugins container gets restarted. Not sure.)

Just because it's a client doesn't mean it can't be treated seriously. :)

Of course, my style IS unusual. Most people don't do what I do.

ChrisA

Steven D'Aprano

But having a memory leak in the first place is an indication that
something is very wrong with your program. Either you're keeping
references that you didn't mean to be keeping (which indicates that
there can be larger side-effects than just wasted memory) or a linked C
library is leaking memory, which is bad for reasons which I won't cover
here since they are self-evident.

You mean a bug in your code is a sign that there is a bug in your code?
Who would have imagined! *wink*

Of course you are right that a memory leak is a bug. And like all bugs,
ideally you will want to fix it. But sometimes bugs are too difficult to
fix, and the inconvenience of restarting the app too minor to care. For
many applications, "restart once a month" is no big deal, especially
compared to "spend three months trying to track down this one bug,
instead of doing work that will actually pay the bills".

I'm not suggesting that living with a memory leak is *in and of itself* a
good thing, only that sometimes there are higher priorities.

rusi

Is a web browser a “typical desktop app”? A filesystem browser? An
instant messenger? A file transfer application? A podcatcher? All of
those typically run for months at a time on my desktop.

Any memory leak in any of those is going to cause trouble, please hunt
them all down with fire and exterminate with prejudice.

Oh well -- I guess I am an old geezer who shuts my machine down when
I am done!
Yeah, I know -- not so good for the disk, though it's good for the
planet!

dieter

Chris Angelico said:
...
It's terrible advice in generality, because it encourages a sloppiness
of thinking: "Memory usage doesn't matter, we'll just instruct people
to reset everything now and then".

"Memory usage" may matter. But if you lose 1 kB a day, your process
can run 3 years before you have lost 1 MB. Compare this to the
485 MB used when you start "firefox". The situation looks different
when you lose 10 MB a day.

Chris Angelico

"Memory usage" may matter. But if you lose 1 kB a day, your process
can run 3 years before you have lost 1 MB. Compare this to the
485 MB used when you start "firefox". The situation looks different
when you lose 10 MB a day.

Right. Everything needs to be scaled. Everything needs to be in
perspective. Losing 1 kilobit per day is indeed trivial; even losing
one kilobyte per day, which is what I assume you meant :), isn't
significant. But it's not usually per day, it's per leaking action.
Suppose your web browser leaks 1024 usable bytes of RAM every HTTP
request. Do you know how much that'll waste per day? CAN you know?

ChrisA

rusi

As do I. And when I power on the machine, it resumes exactly where it
left off: with the exact same contents of memory as when I pressed the
Suspend button.

That is, the memory leak will continue to accumulate as the run time of
the process continues.


You can have both: a continuous session, and stop consuming power while
not using your machine.

Suspend is low-power, hibernate is 0-power
http://www.unixmen.com/suspend-vs-hibernate-in-linux-what-is-the-difference/

And I keep having some issues with hibernate.

Chris Angelico

Suspend is low-power, hibernate is 0-power
http://www.unixmen.com/suspend-vs-hibernate-in-linux-what-is-the-difference/

And I keep having some issues with hibernate.

You can configure the Suspend button to hibernate the computer. Though
my personal preference, when hibernating a computer, is to trigger it
directly from software. Anyway, same difference; shut down a computer
without shutting down a process. I do the same with several of my VMs
- when I'm done with them, Save Machine State. (Except the one for
Magic: The Gathering Online. For some reason MTGO has problems if I
don't actually reboot it periodically, so that one I just shut down.)

ChrisA

dieter

Chris Angelico said:
...
Right. Everything needs to be scaled. Everything needs to be in
perspective. Losing 1 kilobit per day is indeed trivial; even losing
one kilobyte per day, which is what I assume you meant :), isn't
significant. But it's not usually per day, it's per leaking action.
Suppose your web browser leaks 1024 usable bytes of RAM every HTTP
request. Do you know how much that'll waste per day? CAN you know?

What I suggested to the original poster was that *he* check
whether *his* server leaks a really significant amount of memory
-- and start a (difficult) memory leak analysis only in that
case. If he can restart his server periodically, this may make
the analysis unnecessary.
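A cheap way to make that "really significant" check concrete is to log the process's peak resident set size over time and look at the slope. This sketch uses only the standard library (Unix only); the sampling interval is shortened for illustration, and in a real server you would sample every few minutes:

```python
import resource
import time

def rss_kb():
    """Peak resident set size: kilobytes on Linux, bytes on macOS."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

samples = []
for _ in range(3):
    samples.append(rss_kb())
    time.sleep(0.1)      # in a real server: sample every few minutes and log

print("peak RSS samples:", samples)
```

If the logged values climb by megabytes per day, a deeper analysis pays off; if the growth would take years to matter, a periodic restart is the pragmatic answer.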

I also reported that I have undertaken such an analysis several times
and what helped me in those cases. I know - by experience - how
difficult such analyses are. And there have been cases where I failed
despite much effort: the systems I work with are huge, consisting of
thousands of components, developed by various independent groups,
using different languages (Python, C, Java); each of those components
may leak memory; most components are "foreign" to me.
Surely you understand that in such a context a server restart
on a weekend night (leading to a service disruption of a
few seconds) seems an attractive alternative to trying to locate the leaks.

Things would change drastically if the leak is big enough to force a restart
every few hours. But big leaks are *much* easier to detect
and locate than small leaks.