Self healthcheck

Asaf Las · Jan 22, 2014

Hi

When designing a long-running background process,
is it feasible to monitor object/memory leakage caused
by improper programming?
If it were possible to write a module that monitors and
records trends in live objects, an event could be
generated and logged if the number of "zombie" objects
keeps increasing over the longer run.

Would gc.get_count() serve for such a purpose?
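For reference, gc.get_count() returns the collector's per-generation counters (a tuple), not the number of live objects. A rough live-object figure can be taken from len(gc.get_objects()), with the caveat that it only includes container objects tracked by the cyclic collector (plain ints and strings, for example, are not tracked). A minimal sketch, assuming CPython 3; live_object_snapshot() is a hypothetical helper name:

```python
import gc


def live_object_snapshot():
    """Return (gc counters, number of objects tracked by the cyclic GC)."""
    counters = gc.get_count()        # per-generation allocation/collection counters
    tracked = len(gc.get_objects())  # container objects the collector currently tracks
    return counters, tracked


if __name__ == "__main__":
    counters, tracked = live_object_snapshot()
    print("gc.get_count():", counters)
    print("gc-tracked objects:", tracked)
```

Logging the second number periodically gives a crude trend line; the first mostly says how close the next automatic collection is.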

Thanks

Asaf

Chris Angelico · Jan 22, 2014

When designing long running background process
is it feasible to monitor object/memory leakage due
to improper programming?

I assume you're talking about pure Python code, running under CPython.
(If you're writing an extension module, say in C, there are completely
different ways to detect reference leaks; and other Pythons will
behave slightly differently.) There's no way to detect truly
unreferenced objects, because they simply won't exist - not after a
garbage collection run, and usually sooner than that. But if you want
to find objects that you're somehow not using and yet still have live
references to, you'll need to define "using" in a way that makes
sense. Generally there aren't many places where that can happen, and
those few places are candidates for a weak reference system. For
example, maybe you map a name to the "master object" representing that
thing, and you can recreate the master object from disk; when nothing
else is referring to it, you can happily flush it out, so that mapping
is a good candidate for weak references.
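The weak-reference mapping described above can be sketched with the standard library's weakref.WeakValueDictionary. MasterObject, its load() method and Registry are hypothetical names; load() merely stands in for recreating the object from disk.

```python
import weakref


class MasterObject:
    """Hypothetical stand-in for the 'master object' representing a thing."""

    def __init__(self, name):
        self.name = name

    @classmethod
    def load(cls, name):
        # Placeholder for recreating the object from disk.
        return cls(name)


class Registry:
    """Maps names to master objects without keeping them alive."""

    def __init__(self):
        self._cache = weakref.WeakValueDictionary()

    def get(self, name):
        obj = self._cache.get(name)
        if obj is None:
            # Nothing else was holding it, so it was flushed; rebuild it.
            obj = MasterObject.load(name)
            self._cache[name] = obj
        return obj

    def __len__(self):
        return len(self._cache)
```

In CPython an entry vanishes as soon as the last strong reference elsewhere is dropped; other implementations may only remove it after a garbage collection.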

But for most programs, don't bother. CPython is pretty good at keeping
track of its own references, so chances are you don't need to - and if
you're seeing the process's memory usage going up, it's entirely
possible you can neither detect nor correct the problem in Python code
(e.g. heap fragmentation).

ChrisA

Asaf Las · Jan 22, 2014

Hi Chris

Yes, the question was about CPython. But I am not after leaks in CPython
itself (though detecting those would be good); I am after my own mistakes
that lead to data accumulating in mutable structures.
There will be a few standalone processes running Python code and
communicating across servers. Every activity will be spread over time, so
I have to keep a persistent record of each activity and remove it later,
when the activity is finished. In addition to checking objects directly, I
would also like to analyze the app's health indirectly, by checking the
amount of data it holds. Let's say there is a steady 100 activities per
second and the typical object count is 1000 (in abstract units, averaged
over a long enough time window). I would then check throughput and memory
to see whether my program is healthy in terms of leaking resources, and
log an event if it is not.
The input to such a module will be traffic events (any event significant
to object creation).
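A minimal sketch of the indirect health check described above, assuming activity records live in one place the monitor can see. ActivityMonitor, its methods and the baseline/tolerance/window parameters are all hypothetical: start() and finish() are fed from traffic events, sample() is called periodically and logs a warning when the averaged number of open activities drifts well above the expected steady-state figure, and stale() lists records that have been open suspiciously long.

```python
import logging
import time
from collections import deque

log = logging.getLogger("healthcheck")  # hypothetical logger name


class ActivityMonitor:
    """Hypothetical health monitor for long-running activities.

    baseline:  expected number of open activities in a healthy steady state
    tolerance: factor above baseline the windowed average may reach
    window:    number of recent samples that are averaged
    """

    def __init__(self, baseline, tolerance=1.5, window=10, clock=time.monotonic):
        self.baseline = baseline
        self.tolerance = tolerance
        self.samples = deque(maxlen=window)
        self.open = {}          # activity id -> start time
        self.completed = 0
        self.clock = clock

    def start(self, activity_id):
        """Call on the traffic event that creates an activity record."""
        self.open[activity_id] = self.clock()

    def finish(self, activity_id):
        """Call when the activity ends; a duplicate finish is ignored."""
        if self.open.pop(activity_id, None) is not None:
            self.completed += 1

    def sample(self):
        """Record the open count; return False (and log) if it looks unhealthy."""
        self.samples.append(len(self.open))
        average = sum(self.samples) / len(self.samples)
        healthy = average <= self.baseline * self.tolerance
        if not healthy:
            log.warning("open activities averaging %.1f (baseline %s)",
                        average, self.baseline)
        return healthy

    def stale(self, max_age):
        """Return ids of activities that have been open longer than max_age seconds."""
        now = self.clock()
        return [aid for aid, t0 in self.open.items() if now - t0 > max_age]
```

The baseline would come from observing the process in a known-good state (the 1000 figure above, say); an average creeping past it while throughput stays flat is the "leak" signal, and stale() points at the records that were never removed.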
So I am looking for a proper way to measure the memory held by a CPython
app. It would also be good if that memory could be broken down by
object/class name, so the culprit could be identified and reported.
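For a breakdown by class name, one rough approach (a sketch, not a profiler) is to walk gc.get_objects() and group by type, adding sys.getsizeof() for each object. Note that getsizeof() is shallow: a list's figure does not include the objects it contains, and objects the collector does not track are not counted at all. objects_by_type() is a hypothetical helper name.

```python
import gc
import sys
from collections import Counter


def objects_by_type(limit=10):
    """Count gc-tracked objects per type, with their total shallow size in bytes.

    Returns (type name, count, shallow bytes) tuples, most numerous first;
    limit=None returns every type.
    """
    counts = Counter()
    sizes = Counter()
    for obj in gc.get_objects():
        cls = type(obj)
        name = "{}.{}".format(getattr(cls, "__module__", "?"), cls.__qualname__)
        counts[name] += 1
        # Shallow size only: objects this one refers to are not included.
        sizes[name] += sys.getsizeof(obj, 0)
    return [(name, n, sizes[name]) for name, n in counts.most_common(limit)]
```

Taking two of these reports some time apart and diffing the counts is usually more telling than a single report; the walk is expensive on a large heap, so it suits an occasional health check rather than every request.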

Thanks

Asaf

Nicholas Cole · Jan 22, 2014

There are some good tools recommended here:

http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended
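On Python 3.4 and later (released after this thread), the standard library's tracemalloc module can attribute allocated memory to the source lines that made it, which gets close to the per-class breakdown asked about above. A minimal sketch comparing two snapshots; top_allocations() and compare() are hypothetical helpers around the real Snapshot.statistics() and Snapshot.compare_to() methods.

```python
import tracemalloc


def top_allocations(snapshot, limit=5):
    """Biggest allocation sites in one snapshot, grouped by source line."""
    return snapshot.statistics("lineno")[:limit]


def compare(before, after, limit=5):
    """Source lines whose allocated memory changed most between two snapshots."""
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    hoard = [bytearray(1024) for _ in range(1000)]  # simulated accumulation
    after = tracemalloc.take_snapshot()
    for stat in compare(before, after):
        print(stat)
    for stat in top_allocations(after, limit=3):
        print(stat)
    tracemalloc.stop()
```

Tracing adds overhead, so it is usually switched on only while investigating.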

But in general, my advice would be to use weak references wherever
possible. They not only avoid strong reference cycles but will also
highlight the kinds of bugs in your code that are likely to cause the
sort of problem you are worried about.

Frank Millman · Jan 22, 2014

Asaf Las said:
Yes the question was about CPython. But i am not after CPython leaks
though detecting these would be good, but my own mistakes leading to
accumulation of data in mutable structures.
there will be few processes running python code standalone communicating
across servers and every activity will be spread over time so
i have to persistently keep record of activity and remove it later when
activity is finished.

I had a similar concern. My main worry, which turned out to be well-founded,
was that I would create an object as a result of some user input, but when
the user had finished with it, and in theory it could be garbage-collected,
in practice it would not be due to some obscure circular reference
somewhere.

For short-running tasks this is not a cause for concern, but for a
long-running server these can build up over time and end up causing a
problem.
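When the suspicion is an obscure circular reference, the gc module can show which objects only the cycle collector was able to free. A debugging sketch (find_cyclic_garbage() is a hypothetical name): gc.DEBUG_SAVEALL makes the collector append unreachable objects to gc.garbage instead of freeing them, so they can be inspected. It changes global collector state, so it is not something to leave switched on in a server.

```python
import gc


def find_cyclic_garbage():
    """Return the objects the next collection finds unreachable (i.e. in cycles).

    Anything found here was kept alive only by reference cycles; plain
    reference counting would already have freed everything else.
    """
    old_flags = gc.get_debug()
    del gc.garbage[:]  # start from an empty list
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        gc.collect()
        found = list(gc.garbage)
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]  # the returned list is now the only holder
    return found
```

Grouping the result by type (or printing a few repr()s) usually shows which class is taking part in the cycle.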

My solution was to log every time an object was created, with some
self-identifying piece of information, and then log when it was deleted,
with the same identifier. After running the program for a while I could then
analyse the log and ensure that each creation had a corresponding deletion.
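The log analysis step might look like the following sketch. It assumes log lines contain messages in the "MainObject: id=42 created" / "... deleted" format produced by the delwatcher code below; check_log() is a hypothetical name. Because events are processed in order, an id that is deleted and later reused for a new object does not produce a false alarm.

```python
import re

# Matches e.g. "MainObject: id=42 created", even after a logging prefix.
LINE = re.compile(r"(?P<type>\w+): id=(?P<ident>\S+) (?P<event>created|deleted)")


def check_log(lines):
    """Return (still_live, problems) from a sequence of log lines.

    still_live maps (type, id) -> line number of the unmatched 'created'.
    problems lists anomalies such as a deletion with no prior creation.
    """
    live = {}
    problems = []
    for lineno, line in enumerate(lines, 1):
        m = LINE.search(line)
        if not m:
            continue
        key = (m.group("type"), m.group("ident"))
        if m.group("event") == "created":
            if key in live:
                problems.append(("created twice", key, lineno))
            live[key] = lineno
        elif live.pop(key, None) is None:
            problems.append(("deleted without creation", key, lineno))
    return live, problems
```

Whatever is left in still_live after a long run (and after the objects should have gone away) is the list of suspects.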

The tricky bit was logging the deletion. It is a known gotcha in Python that
you cannot rely on the __del__ method; worse, on Python versions before 3.4
an object that defines __del__ and is part of a reference cycle is never
garbage-collected at all (the collector leaves such cycles in gc.garbage).
I found a solution somewhere which explained the use of a 'delwatcher'
class. This is how it works:

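import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__).info  # stand-in: the original log() was not shown

class MainObject:
    def __init__(self, identifier):
        # Only the watcher defines __del__, and it holds no reference
        # back to this object.
        self._del = delwatcher('MainObject', identifier)

class delwatcher:
    def __init__(self, obj_type, identifier):
        self.obj_type = obj_type
        self.identifier = identifier
        log('{}: id={} created'.format(self.obj_type, self.identifier))

    def __del__(self):
        log('{}: id={} deleted'.format(self.obj_type, self.identifier))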

In this case relying on __del__() is safe, because the watcher holds no
reference to the main object.
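On Python 3.4 and later, weakref.finalize provides the same "log on deletion" hook without a helper class or __del__. A sketch, assuming the same log message format and a hypothetical logger name: the callback and its arguments must not refer to the object itself, otherwise the object would never die. By default any finalizers still alive are also run when the interpreter exits.

```python
import logging
import weakref

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("delwatch")  # hypothetical logger name


class MainObject:
    def __init__(self, identifier):
        self.identifier = identifier
        log.info("MainObject: id=%s created", identifier)
        # finalize() holds only a weak reference to self; the callback
        # (log.info) and its arguments never mention self, so the object
        # can still be freed normally.
        self._finalizer = weakref.finalize(
            self, log.info, "MainObject: id=%s deleted", identifier)
```

Creating MainObject(42) and then dropping the last reference to it should log the "deleted" line as soon as CPython frees the object.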

If you do find that an object is not being deleted, it is then
trial-and-error to find the problem and fix it. It is probably a circular
reference.
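To cut down on the trial and error, gc.get_referrers() lists the objects that directly refer to a given object, which often points straight at the forgotten dict or list holding it. A sketch (referrers() is a hypothetical helper); it only sees referrers tracked by the cyclic collector and can be slow on a large heap, so it is a debugging tool rather than something for production paths.

```python
import gc
import types


def referrers(obj):
    """Objects that directly hold a reference to obj.

    Frame objects are dropped, since the caller's own frame usually shows up.
    Only referrers tracked by the cyclic collector can be found.
    """
    return [r for r in gc.get_referrers(obj)
            if not isinstance(r, types.FrameType)]
```

Printing type(r) and a truncated repr(r) for each result is usually enough to recognise the container that should have let go.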

HTH

Frank Millman

Asaf Las · Jan 22, 2014

There are some good tools recommended here:
http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended
But in general: use weak references wherever possible would be
my advice. They not only prevent cycles but will highlight the
kinds of bug in your code that is likely to cause the sort of
problem you are worried about.

Thanks! I will look into these!

Asaf Las · Jan 22, 2014

Thanks Frank. Good approach!

One question: you could do this instead:

class MainObject:
    def __init__(self, identifier):
        self._del = delwatcher(self)

and then, in delwatcher:

class delwatcher:
    def __init__(self, tobject):
        self.obj_type = type(tobject)
        self.identifier = id(tobject)
        ...

so the type and id are taken from the object itself when the delwatcher
is created. Was there a special reason not to do that? Is it because
memory is reused when objects are deleted and created again, so the same
id could end up belonging to objects created at different times?

Thanks

Asaf

Dave Angel · Jan 22, 2014

I couldn't make sense of most of that. But an ID only uniquely
identifies an object while that object still exists. The
system may, and will, reuse IDs constantly.
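If identifiers that never repeat are preferred, one option is to pair the type name with a serial number from itertools.count(), keeping id() only as an extra hint. make_identifier() and the module-level counter are hypothetical names for this sketch.

```python
import itertools

_serial_numbers = itertools.count(1)  # hypothetical process-wide counter


def make_identifier(tobject):
    """Build a label that stays unique even after id() values are reused."""
    return "{}#{} (id={})".format(
        type(tobject).__name__, next(_serial_numbers), id(tobject))
```

The string holds no reference to the object, so it is as safe to store in a watcher as the plain id.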

Frank Millman · Jan 23, 2014

I read Dave's reply, and he is correct in saying that ids are frequently
re-used in Python.

However, in this particular case, I think you are right: it is safe to use
the id to identify the object. An id can only be re-used once the original
object has been deleted, and detecting that deletion is the whole point of
this exercise. We expect to see the id come up in a 'created' message, and
then the same id appear in a 'deleted' message. If this happens, we are not
concerned if the same id reappears in a subsequent 'created' message.

Frank
