pickling a subclass of tuple

fedor

Hi all, happy new year,

I was trying to pickle an instance of a subclass of tuple when I ran
into a problem. Pickling doesn't work with HIGHEST_PROTOCOL. How should
I rewrite my class so I can pickle it?

Thanks,

Fedor

#!/usr/bin/env python
import pickle

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

a = A(1, 2)
print "no pickle", a
print "normal pickle", pickle.loads(pickle.dumps(a))
print "highest protocol", pickle.loads(pickle.dumps(a, pickle.HIGHEST_PROTOCOL))

This is the output:
'''
no pickle (1, 2)
normal pickle (1, 2)
highest protocol
Traceback (most recent call last):
  File "./test.py", line 9, in ?
    print "highest protocol", pickle.loads(pickle.dumps(a, pickle.HIGHEST_PROTOCOL))
  File "/usr/lib/python2.3/pickle.py", line 1394, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.3/pickle.py", line 872, in load
    dispatch[key](self)
  File "/usr/lib/python2.3/pickle.py", line 1097, in load_newobj
    obj = cls.__new__(cls, *args)
TypeError: __new__() takes exactly 3 arguments (2 given)
'''
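(For readers on current Python: the same failure reproduces on Python 3 — the traceback details differ, but the load still dies the same way in the NEWOBJ path. A sketch in Python 3 syntax:)

```python
import pickle

class A(tuple):
    # same signature change as in the original example, py3 syntax
    def __new__(cls, arg1, arg2):
        return super().__new__(cls, (arg1, arg2))

a = A(1, 2)
print("no pickle:", a)
print("protocol 1:", pickle.loads(pickle.dumps(a, 1)))

# dumping succeeds; it's the *load* that blows up, inside NEWOBJ handling
data = pickle.dumps(a, pickle.HIGHEST_PROTOCOL)
try:
    pickle.loads(data)
except TypeError as exc:
    print("highest protocol:", exc)
```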
 

Alex Martelli

fedor said:
Hi all, happy new year,

I was trying to pickle a instance of a subclass of a tuple when I ran
into a problem. Pickling doesn't work with HIGHEST_PROTOCOL. How should
I rewrite my class so I can pickle it?

You're falling afoul of an optimization in pickle's protocol 2, which is
documented in pickle.py as follows:

# A __reduce__ implementation can direct protocol 2 to
# use the more efficient NEWOBJ opcode, while still
# allowing protocol 0 and 1 to work normally. For this to
# work, the function returned by __reduce__ should be
# called __newobj__, and its first argument should be a
# new-style class. The implementation for __newobj__
# should be as follows, although pickle has no way to
# verify this:
#
#     def __newobj__(cls, *args):
#         return cls.__new__(cls, *args)
#
# Protocols 0 and 1 will pickle a reference to __newobj__,
# while protocol 2 (and above) will pickle a reference to
# cls, the remaining args tuple, and the NEWOBJ code,
# which calls cls.__new__(cls, *args) at unpickling time
# (see load_newobj below). If __reduce__ returns a
# three-tuple, the state from the third tuple item will be
# pickled regardless of the protocol, calling __setstate__
# at unpickling time (see load_build below).

Essentially, and simplifying just a little...: you're inheriting
__reduce_ex__ (because you're not overriding it), but you ARE overriding
__new__ *and changing its signature* -- so, the inherited __reduce_ex__
is used, and, with this protocol 2 optimization, it essentially assumes
that __new__ is similarly used -- or, at least, that a __new__ is used
which does not arbitrarily change the signature!

So, if you want to change __new__'s signature, and yet be picklable by
protocol 2, you have to override __reduce_ex__ to return the right
"args"... those your class's __new__ expects!


For example, you could consider something like...:

def __newobj__(cls, *args):
    return cls.__new__(cls, *args)

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

    def __reduce_ex__(self, proto=0):
        if proto >= 2:
            return __newobj__, (A, self[0], self[1])
        else:
            return super(A, self).__reduce_ex__(proto)
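(In Python 3 syntax the same override round-trips at every protocol — a sketch; note `__newobj__` stays at module level so protocols 0 and 1 could pickle a reference to it:)

```python
import pickle

def __newobj__(cls, *args):
    return cls.__new__(cls, *args)

class A(tuple):
    def __new__(cls, arg1, arg2):
        return super().__new__(cls, (arg1, arg2))

    def __reduce_ex__(self, proto=0):
        if proto >= 2:
            # hand pickle the args *our* __new__ expects: (1, 2), not ((1, 2),)
            return __newobj__, (A, self[0], self[1])
        return super().__reduce_ex__(proto)

a = A(1, 2)
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    restored = pickle.loads(pickle.dumps(a, proto))
    print(proto, restored, type(restored).__name__)
```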

Note the key difference in A's __reduce_ex__ (for proto=2) wrt tuple's
(which is the same as object's) -- that's after an "import a" where a.py
has this code as well as an 'a = A(1, 2)'...:

Apart from the additional tuple items (not relevant here), tuple's
reduce returns args as (<class 'a.A'>, (1, 2)) -- two items: the class
and the tuplevalue; so with protocol 2 this ends up calling A.__new__(A,
(1,2))... BOOM, because, differently from tuple.__new__, YOUR override
doesn't accept this signature! So, I suggest tweaking A's reduce so it
returns args as (<class 'a.A'>, 1, 2)... apparently the only signature
you're willing to accept in your A.__new__ method.
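(You can see that two-item args tuple directly by calling the inherited reduce yourself; the shape is the same on Python 3, sketched here in py3 syntax:)

```python
class A(tuple):
    def __new__(cls, arg1, arg2):
        return super().__new__(cls, (arg1, arg2))

a = A(1, 2)
func, args = a.__reduce_ex__(2)[:2]
# args is (A, (1, 2)): the class, then the whole tuple value as ONE
# argument -- so unpickling would call A.__new__(A, (1, 2)), one arg short
print(args)
```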

Of course, if A.__new__ can have some flexibility, you COULD have it
accept the same signature as tuple.__new__ and then you wouldn't have to
override __reduce_ex__. Or, you could override __reduce_ex__ in other
ways, say:

def __reduce_ex__(self, proto=0):
    if proto >= 2:
        proto = 1
    return super(A, self).__reduce_ex__(proto)

this would avoid the specific optimization that's tripping you up due to
your signature-change in __new__.
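(That downgrade trick also still works on Python 3 — sketch in py3 syntax; the cost is giving up protocol 2+'s more compact NEWOBJ encoding for instances of this class:)

```python
import pickle

class A(tuple):
    def __new__(cls, arg1, arg2):
        return super().__new__(cls, (arg1, arg2))

    def __reduce_ex__(self, proto=0):
        # never let pickle take the NEWOBJ fast path for this class
        return super().__reduce_ex__(min(proto, 1))

a = pickle.loads(pickle.dumps(A(1, 2), pickle.HIGHEST_PROTOCOL))
print(a, type(a).__name__)
```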

The best solution may be to forget __reduce_ex__ and take advantage of
the underdocumented special method __getnewargs__ ...:

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

    def __getnewargs__(self):
        return self[0], self[1]
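(This remains the idiomatic fix today; __getnewargs__ is by now documented in the Python 3 pickle module docs, and in py3 syntax the round-trip works at every protocol:)

```python
import pickle

class A(tuple):
    def __new__(cls, arg1, arg2):
        return super().__new__(cls, (arg1, arg2))

    def __getnewargs__(self):
        # the exact positional args A.__new__ should receive at unpickle time
        return self[0], self[1]

a = A(1, 2)
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    print(proto, pickle.loads(pickle.dumps(a, proto)))
```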

This way, you're essentially choosing to explicitly tell the "normal"
__reduce_ex__ about the particular arguments you want to be used for the
__new__ call needed to reconstruct your object on unpickling! This
highlights even better the crucial difference, due strictly to the
change in __new__'s signature: the __getnewargs__ inherited from tuple
returns ((1, 2),) -- the whole tuple value as a single argument -- while
this override returns (1, 2), the two separate arguments your __new__
expects.



It IS, I guess, somewhat unfortunate that you have to understand
pickling in some depth to let you change __new__'s signature and yet
fully support pickling... on the other hand, when you're overriding
__new__ you ARE messing with some rather deep infrastructure,
particularly if you alter its signature so that it doesn't accept
"normal" calls any more, so it's not _absurd_ that compensatory depth of
understanding is required;-).


Alex
 
