Memory usage steadily going up while pickling objects

  • Thread starter Giorgos Tzampanakis

Giorgos Tzampanakis

I have a program that saves lots (about 800k) of objects into a shelve
database (I'm using sqlite3dbm for this since all the default python dbm
packages seem to be unreliable and effectively unusable, but this is
another discussion).

The process takes about 10-15 minutes. During that time I see memory usage
steadily rising, sometimes resulting in a MemoryError. Now, there is a
chance that my code is keeping unneeded references to the stored objects,
but I have debugged it thoroughly and haven't found any.

So I'm beginning to suspect that the pickle module might be keeping an
internal cache of objects being pickled. Is this true?
 

Dave Angel

I have a program that saves lots (about 800k) of objects into a shelve
database (I'm using sqlite3dbm for this since all the default python dbm
packages seem to be unreliable and effectively unusable, but this is
another discussion).

The process takes about 10-15 minutes. During that time I see memory usage
steadily rising, sometimes resulting in a MemoryError. Now, there is a
chance that my code is keeping unneeded references to the stored objects,
but I have debugged it thoroughly and haven't found any.

So I'm beginning to suspect that the pickle module might be keeping an
internal cache of objects being pickled. Is this true?

You can learn quite a bit by using the sys.getrefcount() function. If
you think a variable has only one reference (if it had none, it'd be
very hard to test), and you call sys.getrefcount(), you can check if
your assumption is right.

Note that if the object is part of a complex object, there may be
several mutual references, so the count may be more than you expect.
But you can still check the count before and after calling the pickle
stuff, and see if it has increased.

Note that even if it has not, that doesn't prove you don't have a problem.

Could the problem be the sqlite stuff? Can you disable that part of the
logic, and see whether just creating the data still produces the leak?
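A minimal sketch of that refcount check (the class and object names here are just illustrative, not from the original program):

```python
import sys
import pickle

class Node:
    pass

obj = Node()
# sys.getrefcount() itself adds one temporary reference (its argument),
# so the absolute number matters less than the before/after delta.
before = sys.getrefcount(obj)
data = pickle.dumps(obj)       # pickle the object
after = sys.getrefcount(obj)
# If pickling kept an internal reference to obj, after would exceed before.
print(before, after)
```

If the two counts match, pickling released everything it touched; a higher count afterwards would point at a lingering internal reference.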
 

Peter Otten

Giorgos said:
I have a program that saves lots (about 800k) of objects into a shelve
database (I'm using sqlite3dbm for this since all the default python dbm
packages seem to be unreliable and effectively unusable, but this is
another discussion).

The process takes about 10-15 minutes. During that time I see memory usage
steadily rising, sometimes resulting in a MemoryError. Now, there is a
chance that my code is keeping unneeded references to the stored objects,
but I have debugged it thoroughly and haven't found any.

So I'm beginning to suspect that the pickle module might be keeping an
internal cache of objects being pickled. Is this true?

Pickler/Unpickler objects use a cache to maintain object identity, but at
least shelve in the standard library uses a new Pickler/Unpickler for each
set/get operation.

I don't have sqlite3dbm, but you can try the following:

>>> import shelve
>>> class A: pass
...
>>> a = A()
>>> s = shelve.open("tmp.shelve")
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
False

If you are getting True, there must be a cache. One way to enable a cache
yourself is writeback:

>>> s = shelve.open("tmp.shelve", writeback=True)
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
True

You didn't do that, I guess?
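For the record, writeback works by holding every stored entry in an in-memory cache until sync() or close() writes it back, which with many objects can look exactly like a leak. A small illustration (the file name is just an example, and Shelf.cache is an internal attribute, used here only to show the effect):

```python
import shelve

# With writeback=True, each assignment also lands in Shelf.cache,
# an in-memory dict that grows until sync()/close() flushes it.
s = shelve.open("tmp_writeback.shelve", writeback=True)
for i in range(1000):
    s["key%d" % i] = list(range(10))

cached_before = len(s.cache)   # all 1000 entries still held in memory
s.sync()                       # writes entries back and empties the cache
cached_after = len(s.cache)    # 0: memory can be reclaimed
print(cached_before, cached_after)
s.close()
```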
 

Giorgos Tzampanakis

Giorgos said:
I have a program that saves lots (about 800k) of objects into a shelve
database (I'm using sqlite3dbm for this since all the default python dbm
packages seem to be unreliable and effectively unusable, but this is
another discussion).

The process takes about 10-15 minutes. During that time I see memory usage
steadily rising, sometimes resulting in a MemoryError. Now, there is a
chance that my code is keeping unneeded references to the stored objects,
but I have debugged it thoroughly and haven't found any.

So I'm beginning to suspect that the pickle module might be keeping an
internal cache of objects being pickled. Is this true?

Pickler/Unpickler objects use a cache to maintain object identity, but at
least shelve in the standard library uses a new Pickler/Unpickler for each
set/get operation.

I don't have sqlite3dbm, but you can try the following:

>>> import shelve
>>> class A: pass
...
>>> a = A()
>>> s = shelve.open("tmp.shelve")
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
False

This returns False in my case.
If you are getting True, there must be a cache. One way to enable a cache
yourself is writeback:

No, I haven't enabled writeback.
 

Giorgos Tzampanakis

You can learn quite a bit by using the sys.getrefcount() function. If
you think a variable has only one reference (if it had none, it'd be
very hard to test), and you call sys.getrefcount(), you can check if
your assumption is right.

Note that if the object is part of a complex object, there may be
several mutual references, so the count may be more than you expect.
But you can still check the count before and after calling the pickle
stuff, and see if it has increased.

Note that even if it has not, that doesn't prove you don't have a problem.

Could the problem be the sqlite stuff? Can you disable that part of the
logic, and see whether just creating the data still produces the leak?

I tried both with the standard shelve and with sqlite3dbm, and
sys.getrefcount() of the stored object (and of any of the objects it
references) does not seem to go up after it's stored. I also tried
closing the shelve after storing each object and re-opening it right away
with the "n" flag (which instructs it to start with a new, empty database),
and the memory still rises at the same rate.

So it seems that the pickle module does keep some internal cache or
something like that. I don't want to resort to reading the pickle source
code, but it seems I will have to...
 

Peter Otten

Giorgos said:
So it seems that the pickle module does keep some internal cache or
something like that.

I don't think there's a global cache. The Pickler/Unpickler has a per-
instance cache (the memo dict) that you can clear with the clear_memo()
method, but that doesn't matter here.
I don't want to resort to reading the pickle source
code, but it seems I will have to...

I'd look somewhere else...
 

Giorgos Tzampanakis

I don't think there's a global cache. The Pickler/Unpickler has a per-
instance cache (the memo dict) that you can clear with the clear_memo()
method, but that doesn't matter here.


I'd look somewhere else...

Indeed. The problem was in my code after all. Still, thanks to all for the
memory debugging tips!
 

dieter

Giorgos Tzampanakis said:
...
So it seems that the pickle module does keep some internal cache or
something like that.

This is highly unlikely: the ZODB (Zope object database)
uses pickle (actually "cPickle", the C implementation
of the "pickle" module) for serialization. The ZODB is
used in long-running Zope processes. If pickling caused
significant memory leakage, this would have been observed
(and reported).
 