Reference Counts

R

raghu

Hi All,

I am a new user of Python and am having a bit of problem understanding
the Reference counting and memory leakage issues.

Requesting help from experienced users....

I wrote the following simple program.


#!/usr/bin/python

import sys
global a

print "Total Reference count at the start =",sys.gettotalrefcount()
a=1
print "a ref count =",sys.getrefcount(a)
b=a
print "a ref count =",sys.getrefcount(a)

del a
del b

print "Total Reference count at the end =",sys.gettotalrefcount()


I executed it. I am seeing the following.

Total Reference count at the start = 16538
a ref count = 49
a ref count = 50
Total Reference count at the end = 16540
[6416 refs]

There are a few questions that I am having on this.

(1) Why should 'a' reference count be 49 before I even made an
assignment ?
(2) The Total Reference count at the end has increased by 2 . Why ? Am
I leaking memory ?
(3) I have read somewhere that an increase in sys.gettotalrefcount() is
indicative of a memory leak ? Aint that correct ?

Thanks for the help.

Bye,
raghavan V
 
H

Heiko Wundram

Am Donnerstag 18 Mai 2006 08:28 schrieb raghu:
#!/usr/bin/python

import sys
global a

print "Total Reference count at the start =",sys.gettotalrefcount()
a=1
print "a ref count =",sys.getrefcount(a)
b=a
print "a ref count =",sys.getrefcount(a)

del a
del b

print "Total Reference count at the end =",sys.gettotalrefcount()
...
Total Reference count at the start = 16538
a ref count = 49
a ref count = 50
Total Reference count at the end = 16540
[6416 refs]

There are a few questions that I am having on this.

(1) Why should 'a' reference count be 49 before I even made an
assignment ?

Because "1" is a special integer. Small integers (-1..100, but this depends on
the Python version) are interned, similar to strings, so there are already
references to the integer object before you assign it to "a" (at least one; 1
is such a "magic" constant that you can guess that there are already other
references to it in other places of the stdlib, which has loaded when your
script runs, so it's not hard to imagine that 1 already has 48 references
outside of your program).
(2) The Total Reference count at the end has increased by 2 . Why ? Am
I leaking memory ?

No. I'd guess that the names "a" and "b" were interned as strings (as they are
used as dict lookup keys in the globals() dict), and you have one reference
to each interned object.
(3) I have read somewhere that an increase in sys.gettotalrefcount() is
indicative of a memory leak ? Aint that correct ?

Yes. It is correct if consecutive runs of your algorithm always yield a higher
sys.gettotalrefcount() for each run. In this case (where you run
your "algorithm" only once), it isn't. It just shows you some of the innards
of the Python runtime machinery.

Execute the following script to see the result of a memory leak:
import sys

x = {}
i = 0
def test():
global x, i
x = "test"
i += 1
# Forget to clean up x... LEAK a reference to "test"!

for j in xrange(10000):
print "Before", j, ":", sys.gettotalrefcount()
test()
print "After", j, ":", sys.gettotalrefcount()
And, the following (slightly altered) program doesn't exhibit this memory
leak:
import sys

x = {}
i = 0
def test():
global x, i
x = "test"
i += 1
del x[i-1] # Properly clean up x.

for j in xrange(10000):
print "Before", j, ":", sys.gettotalrefcount()
test()
print "After", j, ":", sys.gettotalrefcount()
I don't have a debug build of Python at hand, so I can't run them now. But, if
you're interested in the results, you'll certainly do that yourself. ;-)

--- Heiko.
 
R

raghu

Heiko,

Thanks for the explanation. I understood the idea of 1 being interned.
Also I understood the globals vars having a reference in the internal
dict.

I ran the "leaky" version of the program and yes...it showed a
progressively increasing totalrefcount as below.
Before 0 : 16579
After 0 : 16581
Before 1 : 16581
After 1 : 16583
Before 2 : 16583
After 2 : 16585
Before 3 : 16585
After 3 : 16587
Before 4 : 16587
After 4 : 16589
Before 5 : 16589
After 5 : 16591
Before 6 : 16591
After 6 : 16593
Before 7 : 16593
After 7 : 16595
Before 8 : 16595
After 8 : 16597
Before 9 : 16597
After 9 : 16599
Before 10 : 16599
After 10 : 16601
Before 11 : 16601
After 11 : 16603
Before 12 : 16603
After 12 : 16605
Before 13 : 16605
After 13 : 16607
Before 14 : 16607
After 14 : 16609
Before 15 : 16609

However, the 'non-leaky' one showed a funny trend ...it kept increasing
the totalrefcount for five iterations (see 1 thru 5) and then dropped
down by 5 ( See Before 5 : 16584
After 5 : 16580 ) suddenly and again increase as shown below. However,
at the time when the script finsished execution, we were not too far
from the starting totalrefcount (16584 from 16579),


Before 0 : 16579
After 0 : 16580
Before 1 : 16580
After 1 : 16581
Before 2 : 16581
After 2 : 16582
Before 3 : 16582
After 3 : 16583
Before 4 : 16583
After 4 : 16584
Before 5 : 16584
After 5 : 16580
Before 6 : 16580
After 6 : 16581
Before 7 : 16581
After 7 : 16582
Before 8 : 16582
After 8 : 16583
Before 9 : 16583
After 9 : 16584
Before 10 : 16584
After 10 : 16580
Before 11 : 16580
After 11 : 16581
Before 12 : 16581
After 12 : 16582
Before 13 : 16582
After 13 : 16583
Before 14 : 16583
After 14 : 16584
Before 15 : 16584

What is the Mystery behind the increase and the subsequent drop ?

Thanks.

Raghavan V
 
H

Heiko Wundram

Am Donnerstag 18 Mai 2006 09:33 schrieb raghu:
However, the 'non-leaky' one showed a funny trend ...it kept increasing
the totalrefcount for five iterations (see 1 thru 5) and then dropped
down by 5 ( See Before 5 : 16584
After 5 : 16580 ) suddenly and again increase as shown below. However,
at the time when the script finsished execution, we were not too far
from the starting totalrefcount (16584 from 16579),

The cyclic garbage collector isn't run after every byte-code instruction, but
only after several have executed (because of performance issues). That's why
you see an increase in reference counts, until the interpreter calls the
garbage collector, which frees the object cycles, and so forth. I don't
exactly know what the "magic constant" (i.E. number of byte-code instructions
between subsequent runs of the garbage collector) is, but I presume it's
somewhere in the order of 100 bytecode instructions.

Why you need the cyclic gc to clean up the data structures my sample creates
is beyond be, but I'd guess it has something to do with the internal
structure of dicts.

Anyway, you can easily test this hypothesis by calling

gc.collect()

explicitly in the main loop after test() has run (remember to import
gc... ;-)). This forces a run of the cyclic gc. If funny pattern still
remains I wouldn't know of any other explanation... ;-) But, as long as
references aren't being leaked (you don't see the drop in references after
every x instructions), there's nothing to worry about.

--- Heiko.
 
R

raghu

Hmm...

I tried the gc.collect(). It aint helping. The reference count still
keeps growing till 5 after it which it drops. As you said, it is not
gonna hurt right away.

The only downside in that mysterious up and down thingie is that , we
could get to a wrong conclusion about a leak, if we ran the algo for
say 3 times. Right ?

Thanks Heiko for all the help. And in case, you get to decode the
mystery behind the increase before the decrease ..kindly let me know.

Bye,
Raghavan V
 
T

Tim Peters

[raghu, on Heiko Wundram's test program:

import sys

x = {}
i = 0
def test():
global x, i
x = "test"
i += 1
del x[i-1] # Properly clean up x.

for j in xrange(10000):
print "Before", j, ":", sys.gettotalrefcount()
test()
print "After", j, ":", sys.gettotalrefcount()
]
Hmm...

I tried the gc.collect(). It aint helping. The reference count still
keeps growing till 5 after it which it drops. As you said, it is not
gonna hurt right away.

The only downside in that mysterious up and down thingie is that , we
could get to a wrong conclusion about a leak, if we ran the algo for
say 3 times. Right ?

Thanks Heiko for all the help. And in case, you get to decode the
mystery behind the increase before the decrease ..kindly let me know.

It has nothing to do with cyclic gc. It has to do with this:

del x[i-1] # Properly clean up x.

When you remove a dictionary key, under the covers the dictionary slot
that contained the deleted <key, value> pair has its value portion set
to a special "dummy" object. This is invisible from the Python level
(there's no way you can get at this object), but the dummy object is a
real Python object, and like all Python objects has a refcount. Each
time the dummy object gets used, its refcount is bumped, and that
shows up in gettotalrefcount(). From time to time the dict resizes,
and then the number of references to the dummy dict value changes
sharply.

Except in the upcoming 2.5 release. The refcount on the dummy dict
value (and the similar dummy set value) confuses people so much that
gettotalrefcount() has been changed in 2.5 to subtract the refcounts
on the dummy dict and set values. As a result, running under current
2.5 trunk Heiko's program is steady as a rock:

Before 0 : 25632
After 0 : 25632
Before 1 : 25632
After 1 : 25632
Before 2 : 25632
After 2 : 25632
Before 3 : 25632
After 3 : 25632
Before 4 : 25632
After 4 : 25632
Before 5 : 25632
After 5 : 25632
Before 6 : 25632
After 6 : 25632
Before 7 : 25632
After 7 : 25632
Before 8 : 25632
After 8 : 25632
Before 9 : 25632
After 9 : 25632
Before 10 : 25632
After 10 : 25632
....
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,262
Messages
2,571,044
Members
48,769
Latest member
Clifft

Latest Threads

Top