Segmenting a pickle stream without unpickling


Boris Borcic

Assuming that the items of my_stream share no content (they are
dumps of db cursor fetches), is there a simple way to do the
equivalent of

def pickles(my_stream):
    from cPickle import load, dumps
    while 1:
        yield dumps(load(my_stream))

without the overhead associated with unpickling objects
just to pickle them again ?

TIA, Boris Borcic
 

Paul Rubin

Boris Borcic said:
def pickles(my_stream):
    from cPickle import load, dumps
    while 1:
        yield dumps(load(my_stream))

without the overhead associated with unpickling objects
just to pickle them again ?

I think you'd have to write something special. The unpickler parses
as it goes along, and all the dispatch actions build up objects.
You'd have to write a set of actions that just read past the
representations. I think there's no way to know where an object ends
without parsing it, including parsing any objects nested inside it.
 

Tim Peters

[Boris Borcic]
Assuming that the items of my_stream share no content (they are
dumps of db cursor fetches), is there a simple way to do the
equivalent of

def pickles(my_stream):
    from cPickle import load, dumps
    while 1:
        yield dumps(load(my_stream))

without the overhead associated with unpickling objects
just to pickle them again ?

cPickle (but not pickle.py) Unpickler objects have a barely documented
noload() method. This "acts like" load(), except doesn't import
modules or construct objects of user-defined classes. The return
value of noload() is undocumented and usually useless. ZODB uses it a
lot ;-)

Anyway, that can go much faster than load(), and works even if the
classes and modules referenced by pickles aren't available in the
unpickling environment. It doesn't return the individual pickle
strings, but they're easy to get at by paying attention to the file
position between noload() calls. For example,

import cPickle as pickle
import os

# Build a pickle file with 4 pickles.

PICKLEFILE = "temp.pck"

class C:
    pass

f = open(PICKLEFILE, "wb")
p = pickle.Pickler(f, 1)

p.dump(2)
p.dump([3, 4])
p.dump(C())
p.dump("all done")

f.close()

# Now use noload() to extract the 4 pickle
# strings in that file.

f = open(PICKLEFILE, "rb")
limit = os.path.getsize(PICKLEFILE)
u = pickle.Unpickler(f)
pickles = []
pos = 0
while pos < limit:
    u.noload()
    thispos = f.tell()
    f.seek(pos)
    pickles.append(f.read(thispos - pos))
    pos = thispos

from pprint import pprint
pprint(pickles)

That prints a list containing the 4 pickle strings:

['K\x02.',
 ']q\x01(K\x03K\x04e.',
 '(c__main__\nC\nq\x02o}q\x03b.',
 'U\x08all doneq\x04.']
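
To see why the boundaries can only be found by parsing, it may help to walk one of those strings opcode by opcode. The sketch below is not part of the original example: it assumes Python 3 and feeds the second pickle string from the output above to pickletools.genops() as a bytes literal. Nothing in the stream gives the pickle's length up front; the end is known only when the STOP opcode is reached.

```python
import pickletools

# The second pickle string from the output above, as a bytes literal.
data = b']q\x01(K\x03K\x04e.'

# genops() yields an (opcode, arg, pos) triple per opcode and builds no objects.
ops = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(data)]

for name, arg, pos in ops:
    print(pos, name, arg)
```

The last triple is the STOP opcode, and its position plus one is the length of the pickle.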

You could do much the same by calling pickletools.dis() and ignoring
its output, but that's likely to be slower.
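
A rough sketch of that pickletools route, under some assumptions not in the original post: Python 3 (where cPickle no longer exists and, as far as I know, the standard Unpickler has no noload()), and a seekable binary stream. It uses pickletools.genops() rather than dis(), since genops() walks the opcodes without formatting any output. The name split_pickles is a hypothetical helper, not a library function.

```python
import io
import pickle
import pickletools


def split_pickles(stream):
    """Yield each pickle in a seekable binary stream as raw bytes.

    Opcodes are parsed but no objects are constructed.
    """
    while True:
        start = stream.tell()
        # Peek one byte to detect a clean end of stream.
        if not stream.read(1):
            return
        stream.seek(start)
        # genops() stops right after the STOP opcode of the current pickle.
        for _ in pickletools.genops(stream):
            pass
        end = stream.tell()
        stream.seek(start)
        yield stream.read(end - start)


# Build a stream holding several independent pickles, then split it.
objs = [2, [3, 4], {"k": "v"}, "all done"]
buf = io.BytesIO()
for obj in objs:
    pickle.dump(obj, buf)
buf.seek(0)

chunks = list(split_pickles(buf))
```

Each element of chunks can be handed to pickle.loads() later, or stored as is. Since genops() is pure Python, this is likely to be slower than the noload() approach where that method is available.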
 
