Unpickling crashing my machine

  • Thread starter Pierre-Frédéric Caillaud

Pierre-Frédéric Caillaud

No response, so I'm reposting this, as it seems an "interesting" problem...

I have a huge dataset which contains a lot of individual records
represented by class instances.

I pickle this to a file :

way #1 :
for object in objects :
    cPickle.dump( object, myfile, -1 )

way #2 :
p = cPickle.Pickler( myfile, -1 )
for object in objects :
    p.dump( object )

When I try to unpickle this big file :

p = cPickle.Unpickler( open( ... ))
many times p.load()... display a progress counter...
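
A minimal version of that load loop (hypothetical file name and progress
interval, stopping when load() raises EOFError at the end of the stream)
would look like this :

import cPickle

myfile = open( "data.pickle", "rb" )
p = cPickle.Unpickler( myfile )
count = 0
while True :
    try :
        record = p.load()       # one record per load() call
    except EOFError :
        break                   # end of the pickle stream
    count += 1
    if count % 10000 == 0 :     # crude progress counter
        print count
myfile.close()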

Loading the file generated by #1 works fine, with linear speed.
Loading the file generated by #2 :
- the progress counter runs as fast as #1
- eats all memory, then swap
- when eating swap, the progress counter slows down a lot (of course)
- and the process must be killed to save the machine.

I'm talking lots of memory here. The pickled file is about 80 MB; when
loaded it fits into RAM with no problem.
However, I killed the #2 process when it had already hogged about 700 MB
of RAM and showed no sign of wanting to stop.

What's the problem ?
 

Peter Otten

Pierre-Frédéric Caillaud said:
[quoted original message snipped]

I have just tried to pickle the same object twice using both methods you
describe. The file created using the Pickler is shorter than the one
written by dump(), which I suppose creates a new pickler for every call.
That means that the pickler keeps a cache of objects already written and
therefore the Unpickler must do the same. I believe that what you see is
the excessive growth of that cache.
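
For illustration, a sketch of that experiment (hypothetical record;
cStringIO is used only so the two output sizes can be compared) :

import cPickle
from cStringIO import StringIO

record = { "id" : 1, "payload" : "x" * 100 }

# way #1 : dump() builds a fresh Pickler (and a fresh memo) on every call,
# so the record is written out in full both times
buf1 = StringIO()
for i in range( 2 ) :
    cPickle.dump( record, buf1, -1 )

# way #2 : one shared Pickler; after the first dump the record is in the
# memo, so the second dump writes only a short back-reference
buf2 = StringIO()
p = cPickle.Pickler( buf2, -1 )
for i in range( 2 ) :
    p.dump( record )

print len( buf1.getvalue() ), len( buf2.getvalue() )   # way #2 is shorter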

Peter
 

Skip Montanaro

...

Peter> I believe that what you see is the excessive growth of that
Peter> cache.

Correct. If each object dumped is independent of all the other objects
being dumped, you should clear the memo after each dump() call in the
second case:

p = cPickle.Pickler( myfile, -1 )
for object in objects :
    p.dump( object )
    p.clear_memo()

The first case doesn't suffer from this problem because each call to
cPickle.dump() creates a new Pickler, and thus a new memo.
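
Putting that together, a complete round trip along those lines (hypothetical
file name and record data) :

import cPickle

records = [ { "id" : i, "name" : "record-%d" % i } for i in range( 100000 ) ]

# write : one shared Pickler, memo cleared after every record so that each
# record's pickle is self-contained and the memo cannot grow without bound
out = open( "records.pickle", "wb" )
p = cPickle.Pickler( out, -1 )
for record in records :
    p.dump( record )
    p.clear_memo()
out.close()

# read : repeated load() calls until the stream is exhausted
inp = open( "records.pickle", "rb" )
u = cPickle.Unpickler( inp )
loaded = []
while True :
    try :
        loaded.append( u.load() )
    except EOFError :
        break
inp.close()

assert len( loaded ) == len( records )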

Skip
 
