Pickle MemoryError - any ideas?

Peter

I have created a class that contains a list of files (their contents, as
binary data), so it uses a LOT of memory.

When I first pickle.dump the list it creates a 1.9GByte file on the
disk. I can load the contents back again, but when I attempt to dump
it again (with or without additions), I get the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python26\Lib\pickle.py", line 1362, in dump
    Pickler(file, protocol).dump(obj)
  File "c:\Python26\Lib\pickle.py", line 224, in dump
    self.save(obj)
  File "c:\Python26\Lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "c:\Python26\Lib\pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "c:\Python26\Lib\pickle.py", line 615, in _batch_appends
    save(x)
  File "c:\Python26\Lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "c:\Python26\Lib\pickle.py", line 488, in save_string
    self.write(STRING + repr(obj) + '\n')
MemoryError

I get this error whether I attempt to dump the entire list or dump it
in "segments". The list is 2229 elements long, so from the command
line I tried using pickle to dump individual parts of the list into
separate files, i.e. every 500 elements were saved to their own file,
but I still get the same error.

I used the following sequence when attempting to dump the list in
segments - X and Y were index positions 500 apart; the sequence fails
on [1000:1500]:

f = open('archive-1', 'wb', 2)     # third argument is the buffer size
pickle.dump(mylist[X:Y], f)        # X and Y as described above
f.close()

I am assuming that available memory has been exhausted, so I tried
"waiting" between dumps in the hopes that garbage collection might
free some memory - but that doesn't help at all.

In summary:

1. The list is originally created from various sources
2. The list can be dumped successfully
3. The program restarts and successfully loads the list
4. The list cannot be (re)dumped without getting a MemoryError

This seems like a bug in pickle?

Any ideas (other than the obvious - don't save all of these files'
contents into a list! Although that is the only "answer" I can see at
the moment :)).

Thanks
Peter
 

Carl Banks

> I have created a class that contains a list of files (their contents, as
> binary data), so it uses a LOT of memory.
>
> When I first pickle.dump the list it creates a 1.9GByte file on the
> disk. I can load the contents back again, but when I attempt to dump
> it again (with or without additions), I get the following:

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "c:\Python26\Lib\pickle.py", line 1362, in dump
>     Pickler(file, protocol).dump(obj)
>   File "c:\Python26\Lib\pickle.py", line 224, in dump
>     self.save(obj)
>   File "c:\Python26\Lib\pickle.py", line 286, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "c:\Python26\Lib\pickle.py", line 600, in save_list
>     self._batch_appends(iter(obj))
>   File "c:\Python26\Lib\pickle.py", line 615, in _batch_appends
>     save(x)
>   File "c:\Python26\Lib\pickle.py", line 286, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "c:\Python26\Lib\pickle.py", line 488, in save_string
>     self.write(STRING + repr(obj) + '\n')
> MemoryError

(Aside) Wow, pickle concatenates strings like this?

> I get this error whether I attempt to dump the entire list or dump it
> in "segments". The list is 2229 elements long, so from the command
> line I tried using pickle to dump individual parts of the list into
> separate files, i.e. every 500 elements were saved to their own file,
> but I still get the same error.
>
> I used the following sequence when attempting to dump the list in
> segments - X and Y were index positions 500 apart; the sequence fails
> on [1000:1500]:
>
> f = open('archive-1', 'wb', 2)     # third argument is the buffer size
> pickle.dump(mylist[X:Y], f)        # X and Y as described above
> f.close()

The first thing to try is the cPickle module instead of pickle.
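
For instance (a minimal sketch, assuming the list is still called
mylist as in your snippet; protocol 2 is pickle's binary format, which
skips the repr() call visible in the save_string frame of your
traceback):

import cPickle

f = open('archive-1', 'wb')
# Protocol 2 writes length-prefixed binary data rather than a
# repr()-escaped text copy of each string, so it avoids building a
# second, escaped copy of every large string in memory.
cPickle.dump(mylist, f, 2)
f.close()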

> I am assuming that available memory has been exhausted, so I tried
> "waiting" between dumps in the hopes that garbage collection might
> free some memory - but that doesn't help at all.

Waiting won't trigger a garbage collection. First of all, it's not
really garbage collection but cycle collection (objects that aren't
part of reference cycles are collected immediately when their
reference count drops to zero, at least in CPython), and given that
your items are all binary data, I doubt there are many reference
cycles in your data.

Anyway, cycle collection is triggered when object creation/deletion
counts meet certain thresholds (which won't happen while you are just
waiting), but you can call gc.collect() to force a cycle collection.
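
For example (a minimal sketch; gc.collect() returns the number of
unreachable objects it found, which gives you a quick check on whether
anything was actually collected):

import gc

print gc.collect()   # force a full cycle collection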

> In summary:
>
> 1. The list is originally created from various sources
> 2. The list can be dumped successfully
> 3. The program restarts and successfully loads the list
> 4. The list cannot be (re)dumped without getting a MemoryError
>
> This seems like a bug in pickle?
No.


> Any ideas (other than the obvious - don't save all of these files'
> contents into a list! Although that is the only "answer" I can see at
> the moment :)).

You should at least consider if one of the dbm-style databases (dbm,
gdbm, or dbhash) meets your needs.
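
A minimal sketch using anydbm (which selects whichever dbm flavour is
available on your platform; "files" here is just a placeholder for
however you iterate over your name/contents pairs):

import anydbm

db = anydbm.open('archive.db', 'c')   # 'c' creates the file if needed
for name, contents in files:          # placeholder: your (name, data) pairs
    # Keys and values are plain byte strings, so binary contents fit
    # directly, and each entry is written to disk as you store it
    # instead of serializing the whole list in one go.
    db[name] = contents
db.close()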


Carl Banks
 

John Nagle

> I have created a class that contains a list of files (their contents, as
> binary data), so it uses a LOT of memory.
>
> When I first pickle.dump the list it creates a 1.9GByte file on the
> disk. I can load the contents back again, but when I attempt to dump
> it again (with or without additions), I get the following:

Be sure to destroy the pickle object when you're done with it.
Don't reuse it. Pickle keeps a cache (its "memo"): it records every
object pickled, and if the same object shows up more than once, the
later occurrences are written as references to the cached entry. This
can fill memory unnecessarily.

See
"http://groups.google.com/group/comp.lang.python/browse_thread/thread/3f8b999c25af263a"

John Nagle
 
