Pickling limitation with instances defining __cmp__/__hash__?

Discussion in 'Python' started by Erik Max Francis, Jun 28, 2005.

  1. I've come across a limitation in unpickling certain types of complex
    data structures which involve instances that override __hash__, and was
    wondering if it was known (basic searches didn't seem to come up with
    anything similar) and if there is a workaround for it short of
    restructuring the data structures in question.

    The fundamental issue rests with defining classes which override __cmp__
    and __hash__ in order to be used as keys in dictionaries (and elements
    of sets). __cmp__ and __hash__ are defined to manipulate a single
    attribute of the class, which never changes for the lifetime of an
    object. In a simplified form:

    class C:

    def __init__(self, x):
    self.x = x

    def __cmp__(self, other):
    return cmp(self.x, other.x)

    def __hash__(self):
    return hash(self.x)

    Even if C contains other members which are manipulated, making it
    technically mutable, since the one attribute (in this example, x) which
    is used for __cmp__ and __hash__ is never changed after the creation of
    the object, it is legal to use as a dictionary key. (Formally, the
    atrribute in question is a name which is guaranteed to be unique.)

    The difficulty arises when the data structures that are built up in C
    contain a circular reference to itself as a dictionary key. In my
    particular case the situation is rather involved, but the simplest
    example which reproduces the problem (using C) would be:

    c = C(1)
    c.m = {c: '1'}

    So far this is fine and behaves as expected. Pickling the object c
    results in no problems. Unpickling it, however, results in an error:

    data = pickle.dumps(c)
    d = pickle.loads(data) # line 25

    Traceback (most recent call last):
    File "/home/max/tmp/hash.py", line 25, in ?
    d = pickle.loads(data)
    File "/usr/local/lib/python2.4/pickle.py", line 1394, in loads
    return Unpickler(file).load()
    File "/usr/local/lib/python2.4/pickle.py", line 872, in load
    dispatch[key](self)
    File "/usr/local/lib/python2.4/pickle.py", line 1218, in load_setitem
    dict[key] = value
    File "/home/max/tmp/hash.py", line 15, in __hash__
    return hash(self.x)
    AttributeError: C instance has no attribute 'x'

    By poking around, one can see that the error is occurring because the
    unpickler algorithm is trying to use the instance as a key in a
    dictionary before the instance has been completely initialized (in fact,
    the __dict__ of this object is the empty dictionary!).

    The error happens regardless of whether pickle or cPickle is used (so I
    used pickle to give a more meaningful traceback above), nor whether the
    protocol is 0 or HIGHEST_PROTOCOL.

    Is this issue known? I don't see any mention of this kind of
    circularity in the Python Library Reference 3.14.4. Second, is there
    any reasonably straightforward workaround to this limitation, short of
    reworking things so that these self-referenced objects aren't used as
    dictionary keys?

    --
    Erik Max Francis && && http://www.alcyone.com/max/
    San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
    You'll learn / Life is worth it / Watch the tables turn
    -- TLC
    Erik Max Francis, Jun 28, 2005
    #1
    1. Advertising

  2. Erik Max Francis wrote:

    > I've come across a limitation in unpickling certain types of complex
    > data structures which involve instances that override __hash__, and was
    > wondering if it was known (basic searches didn't seem to come up with
    > anything similar) and if there is a workaround for it short of
    > restructuring the data structures in question.


    Replying to my own (old) post here, I finally got back to this and found
    the best solution was to define surrogate set and dictionary classes
    that internally used the IDs as keys, eliminating the circular
    dependency. Examples of SafeSet and SafeDict serving this purpose
    follow, though note that I only defined the methods that I used, rather
    than the full and complete interfaces for sets and dictionaries (though
    it should serve as an example for those who need to do more):

    class SafeSet(_ReprMixin):

    @staticmethod
    def ider(thing):
    return thing.id

    def __init__(self, ider=None):
    if ider is not None:
    self.ider = ider
    self._map = {} # id -> thing

    def __len__(self):
    return len(self._map)

    def __contains__(self, thing):
    return self.ider(thing) in self._map

    def add(self, thing):
    key = self.ider(thing)
    if self._map.has_key(key):
    assert self._map[key] is thing
    self._map[key] = thing

    def remove(self, thing):
    del self._map[self.ider(thing)]

    def pop(self):
    iterator = self._map.iterkeys()
    next = iterator.next()
    return self._map.pop(next)

    def clear(self):
    self._map.clear()

    def copy(self):
    return copy.copy(self)

    def update(self, sequence):
    for thing in sequence:
    self.add(thing)

    def difference(self, other):
    thisSet = set(self._map.iterkeys())
    otherSet = set(other._map.iterkeys())
    newSet = thisSet.difference(otherSet)
    safeSet = SafeSet()
    for key in newSet:
    safeSet.add(self._map[key])
    return safeSet

    def __iter__(self):
    return self._map.itervalues()

    def __str__(self):
    return 'set(' + str(self._map.keys()) + ')'


    class SafeDict(_ReprMixin):

    @staticmethod
    def ider(thing):
    return thing.id

    def __init__(self, ider=None):
    if ider is not None:
    self.ider = ider
    self._keys = {} # id -> key
    self._values = {} # id -> value

    def __len__(self):
    return len(self._keys)

    def __contains__(self, thing):
    return self.ider(thing) in self._keys

    def __getitem__(self, thing):
    return self._values[self.ider(thing)]

    def __setitem__(self, thing, value):
    key = self.ider(thing)
    self._keys[key] = thing
    self._values[key] = value

    def __delitem__(self, thing, value):
    key = self.ider(thing)
    del self._keys[key]
    del self._values[key]

    def keys(self):
    return self._keys.values()

    def iterkeys(self):
    return self._keys.itervalues()

    def values(self):
    return self._values.values()

    def itervalues(self):
    return self._values.itervalues()

    def items(self):
    return [(self._keys[x], self._values[x]) for x in self._keys]

    def iteritems(self):
    return ((self._keys[x], self._values[x]) for x in self._keys)

    def clear(self):
    self._keys.clear()
    self._values.clear()

    def copy(self):
    return copy.copy(self)

    def update(self, mapping):
    for key, value in mapping.iteritems():
    self[key] = value

    def has_key(self, thing):
    return self._keys.has_key(self.ider(thing))

    def get(self, thing, default=None):
    return self._values.get(self.ider(thing), default)

    def setdefault(self, thing, default):
    key = self.ider(thing)
    if key in self._keys:
    return self._values[key]
    else:
    self._keys[key] = thing
    self._values[key] = default

    def __iter__(self):
    return self._keys.itervalues()

    def __str__(self):
    return str(self._values)



    --
    Erik Max Francis && && http://www.alcyone.com/max/
    San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
    The only completely consistent people are the dead.
    -- Aldous Huxley
    Erik Max Francis, Aug 9, 2005
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?ISO-8859-1?Q?Jan-Erik_Meyer-L=FCtgens?=

    dictionary keys, __hash__, __cmp__

    =?ISO-8859-1?Q?Jan-Erik_Meyer-L=FCtgens?=, Nov 4, 2003, in forum: Python
    Replies:
    8
    Views:
    365
    Michael Hudson
    Nov 5, 2003
  2. Alex
    Replies:
    1
    Views:
    319
    Alex Martelli
    Oct 29, 2005
  3. Nicolas M. Thiéry

    Pickling classes (not class instances)

    Nicolas M. Thiéry, Jan 10, 2009, in forum: Python
    Replies:
    2
    Views:
    347
    Aaron Brady
    Feb 14, 2009
  4. Tim Delaney
    Replies:
    0
    Views:
    102
    Tim Delaney
    Feb 18, 2013
  5. Terry Reedy
    Replies:
    0
    Views:
    81
    Terry Reedy
    Feb 18, 2013
Loading...

Share This Page