frozendict (v0.1)

K

kj

Following a suggestion from MRAB, I attempted to implement a
frozendict class. My implementation took a lot more work than
something this simple should take, and it still sucks. So I'm
hoping someone can show me a better way. Specifically, I'm hoping
that there is a "recipe" for building off standard classes that
cover all the bases with a minimum of tedious repetitive work.

Here's my little monster:

ass frozendict():
_DESTRUCTIVE = set(('__delitem__ __setitem__ clear pop popitem setdefault '
'update').split())

_NON_DESTRUCTIVE = set(('__contains__ __format__ __getitem__ __hash__ '
'__init__ __iter__ __len__ __repr__ __sizeof__ '
'__str__ copy fromkeys get has_key items iteritems '
'iterkeys itervalues keys values'.split()))

_COMPARISONS = set(('__cmp__ __eq__ __ge__ __gt__ __le__ __lt__ '
'__ne__').split())


def __init__(self, iterable=(), **kwargs):
self._dict = dict(iterable, **kwargs)


def __hash__(self):
return hash(tuple(self.items()))


def __getattr__(self, attrib):
class_ = self.__class__
dict_ = self._dict
if attrib in class_._COMPARISONS:
return lambda x: dict_.__getattribute__(attrib)(x._dict)
elif attrib in class_._NON_DESTRUCTIVE:
return dict_.__getattribute__(attrib)
else:
if attrib in class_._DESTRUCTIVE:
raise TypeError("'%s' object is not mutable" % class_.__name__)
else:
raise AttributeError("'%s' object has no attribute '%s'" %
(class_.__name__, attrib))



I didn't implement this as a subclass of dict to avoid having to
write a dumb little "blocking" method for every destructive dict
method. (I couldn't figure out how to write a loop to define these
overriding methods programmatically, because their signatures are
all over the place.)

I didn't implement it as a subclass of object with an internal dict
delegate, because I couldn't figure a reasonable way to pass certain
object methods to the delegate (since in this case frozendict.__getattr__
wouldn't be called).

The handling of comparison methods is particularly horrific and
inefficient.

If "Beautiful is better than ugly", I sure how there's another way
that is a lot more beautiful than this one.

TIA!

~kj
 
A

Arnaud Delobelle

kj said:
Following a suggestion from MRAB, I attempted to implement a
frozendict class. My implementation took a lot more work than
something this simple should take, and it still sucks. So I'm
hoping someone can show me a better way. Specifically, I'm hoping
that there is a "recipe" for building off standard classes that
cover all the bases with a minimum of tedious repetitive work.

Here's my little monster:

class frozendict(): [...]
def __hash__(self):
return hash(tuple(self.items()))

There is a problem with this hash function stemming from the fact that
the hash value of a tuple is different depending on the order of its
elements, i.e. hash((1, 2)) is not equal to hash((2, 1)).

Now, if there is a hash collision in the keys of the dictionary, then
the order of enumeration of its items will depend on the order in which
they were inserted into the dictionary. Here is a simple example below
((-468864544, 2), ('a', 1))

A simple fix is to use hash(frozenset(self.items())) instead.
 
A

Arnaud Delobelle

Oops I sent off my reply before I had finished!

kj said:
Following a suggestion from MRAB, I attempted to implement a
frozendict class. My implementation took a lot more work than
something this simple should take, and it still sucks. So I'm
hoping someone can show me a better way. Specifically, I'm hoping
that there is a "recipe" for building off standard classes that
cover all the bases with a minimum of tedious repetitive work.

Here's my little monster:

ass frozendict():
_DESTRUCTIVE = set(('__delitem__ __setitem__ clear pop popitem setdefault '
'update').split()) [...]
def __hash__(self):
return hash(tuple(self.items()))

Also, you'll probably want to cache your hash value.
def __getattr__(self, attrib):
class_ = self.__class__
dict_ = self._dict
if attrib in class_._COMPARISONS:
return lambda x: dict_.__getattribute__(attrib)(x._dict)
elif attrib in class_._NON_DESTRUCTIVE:
return dict_.__getattribute__(attrib)
else:
if attrib in class_._DESTRUCTIVE:
raise TypeError("'%s' object is not mutable" % class_.__name__)
else:
raise AttributeError("'%s' object has no attribute '%s'" %
(class_.__name__, attrib))



I didn't implement this as a subclass of dict to avoid having to
write a dumb little "blocking" method for every destructive dict
method. (I couldn't figure out how to write a loop to define these
overriding methods programmatically, because their signatures are
all over the place.)

You could subclass dict and have something like:

class frozendict(dict):
DESTRUCTIVE = ...
for meth in DESTRUCTIVE:
exec """def %s(s, *a, **k): raise TypeError("not mutable")"""
del DESTRUCTIVE
...
 
K

kj

In said:
A simple fix is to use hash(frozenset(self.items())) instead.

Thanks for pointing out the hash bug. It was an oversight: I meant
to write

def __hash__(self):
return hash(sorted(tuple(self.items())))

I imagine that frozenset is better than sorted(tuple(...)) here,
but it's not obvious to me why.

At any rate, using your suggestions in this and your other post,
the current implementation of frozendict stands at:

class frozendict(dict):
for method in ('__delitem__ __setitem__ clear pop popitem setdefault '
'update').split():
exec """
def %s(self, *a, **k):
cn = self.__class__.__name__
raise TypeError("'%%s' object is not mutable" %% cn)
""" % method

def __hash__(self):
return hash(frozenset(self.items()))

....which is a lot nicer!

Thanks!

~kj
 
S

Steven D'Aprano

Thanks for pointing out the hash bug. It was an oversight: I meant to
write

def __hash__(self):
return hash(sorted(tuple(self.items())))

I imagine that frozenset is better than sorted(tuple(...)) here, but
it's not obvious to me why.


Because it's always better to use a well-written, fast, efficient,
correct, well-tested wheel than to invent your own slow, incorrect
wheel :)
 
A

Arnaud Delobelle

kj said:
Thanks for pointing out the hash bug. It was an oversight: I meant
to write

def __hash__(self):
return hash(sorted(tuple(self.items())))

But sorted returns a list!

so you need rather: hash(tuple(sorted(self.items())))

However, it still won't work because some keys may be incomparable.

E.g., try with {1:'a', 1j:'b'}
 
J

Jonas H.

I imagine that frozenset is better than sorted(tuple(...)) here,
but it's not obvious to me why.

dicts are unsorted. That means their item-order is undefined. So are sets.

If you want a hash that is independent from the order of items, you
could ensure the items are always in the same order when you do the
hashing; or you could use a hashing algorithm that ignore item order.

As creating a `frozenset` is probably more efficient than sorting, that
is the preferred solution.

Here's my implementation suggestion:

class frozendict(dict):
def _immutable_error(self, *args, **kwargs):
raise TypeError("%r object is immutable" % self.__class__.__name__)

__setitem__ = __delitem__ = clear = pop \
= popitem = setdefault = update = _immutable_error

def __hash__(self):
return hash(frozenset(self.iteritems()))

Only 9 lines :)

Jonas
 
K

kj

Because it's always better to use a well-written, fast, efficient,
correct, well-tested wheel than to invent your own slow, incorrect
wheel :)

IOW, "don't you worry your little head about why."
 
K

kj

In said:
E.g., try with {1:'a', 1j:'b'}

I see. Thanks for this clarification. I learned a lot from it.

I guess that frozenset must have some way of canonicalizing the
order of its elements that is dependent on their Python values but
not on their comparability. My first thought was that they are
ordered according to their hash values, but this theory doesn't
hold:
abc = ('a', 'b', 'c')
sorted(map(hash, abc)) [-468864544, -340864157, -212863774]
map(hash, frozenset(abc))
[-468864544, -212863774, -340864157]

I.e. the ordering of the elements in the frozenset does not correspond
to the ordering of their hashes in either direction. Hmmm.

I tried to understand this by looking at the C source but I gave
up after 10 fruitless minutes. (This has been invariably the
outcome of all my attempts at finding my way through the Python C
source.)

I guess the take-home message is that frozenset is a more general
way to canonicalize an iterable object than sorting, even though
the reasons for this still remain a mystery to me... Then again,
just looking at the voodoo that goes into algorithms for computing
hashes fills me with despair. As much as I dislike it, sooner or
later I'll have to go on faith.

~kj
 
K

kj

In said:
At any rate, using your [i.e. Arnaud's] suggestions in this and
your other post, the current implementation of frozendict stands
at:
class frozendict(dict):
for method in ('__delitem__ __setitem__ clear pop popitem setdefault '
'update').split():
exec """
def %s(self, *a, **k):
cn = self.__class__.__name__
raise TypeError("'%%s' object is not mutable" %% cn)
""" % method
def __hash__(self):
return hash(frozenset(self.items()))
...which is a lot nicer!

As a side comment on my own post, this is the second time in less
than a week that I find myself having to resort to exec'ing some
code generated from a template. This one is worse, because there's
nothing runtime-dependent about the code being exec'd. It sticks
in my craw somehow. It just doesn't seem quite right that at the
time of doing something as bread-and-butter as implementing a
subclass, one has to choose between explicitly writing a whole
bunch of identical methods, and exec-based hacks like what I'm
doing above. There's got to be a "better" to tell Python: "override
all these superclass methods with this one."

Or maybe the problem here is that, in a perfect world, frozendict
would be in the standard library, as a superclass of dict lacking
all of dict's destructive methods. :)

~kj
 
K

kj

In said:
On 10/08/2010 02:23 AM, kj wrote:
Here's my implementation suggestion:
class frozendict(dict):
def _immutable_error(self, *args, **kwargs):
raise TypeError("%r object is immutable" % self.__class__.__name__)
__setitem__ = __delitem__ = clear = pop \
= popitem = setdefault = update = _immutable_error
def __hash__(self):
return hash(frozenset(self.iteritems()))
Only 9 lines :)

Thanks, you just answered the question I just posted, while I was
still writing it!

~kj
 
J

Jonas H.

I tried to understand this by looking at the C source but I gave
up after 10 fruitless minutes. (This has been invariably the
outcome of all my attempts at finding my way through the Python C
source.)

It's not you. CPython's code is ... [censored]

Anyway, you don't even need to read C code to understand how sets are
implemented.

There is a (now deprecated) Python module, Libs/set.py, that has
implementations for `Set` and `ImmutableSet` (nowadays `set` and
`frozenset`).

The implementation strategy you can see there is quite simple. The code
uses dictionary keys to store the set items and "ignores" the dictionary
values, so that `.add(value)` is implemented as `._dict[value] =
some_value_nobody_cares_about`.

Here comes a very limited example set implementation using a dict:

class PrimitiveSet(object):
def __init__(self):
self._dict = {}

def add(self, value):
self._dict[value] = True

def __contains__(self, value):
return value in self._dict

def __repr__(self):
return 'PrimitiveSet(%r)' % self._dict.keys()
>>> s = PrimitiveSet()
>>> 'hello' in s False
>>> s.add('hello')
>>> 'hello' in s True
>>> s PrimitiveSet(['hello'])
>>> s.add(tuple(xrange(10)))
>>> s PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 'hello'])
>>> s.add(xrange(5))
>>> s
PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), xrange(5), 'hello'])

This has a few implications for sets:
* dict keys are unordered/sorted. so are sets.
* dict keys are unique. same for set values.
* dict keys have to be hashable (immutable). same for sets values.

So far our implementation is not hashable, and we need a custom
implementation for __hash__ (because dicts aren't hashable, so we cannot
re-use dictionary methods).
There is one requirement for set hashes: they have to be independent of
the item order (there *is* an order in memory of course, and it may vary
depending on the order assignments to our dict are done).

Here is an extract from the Python set implementation,
`BaseSet._compute_hash`:

def _compute_hash(self):
# Calculate hash code for a set by xor'ing the hash codes of
# the elements. This ensures that the hash code does not depend
# on the order in which elements are added to the set. [...]
result = 0
for elt in self:
result ^= hash(elt)
return result

Hope this helps :)

Jonas
 
A

Arnaud Delobelle

kj said:
In said:
E.g., try with {1:'a', 1j:'b'}

I see. Thanks for this clarification. I learned a lot from it.

I guess that frozenset must have some way of canonicalizing the
order of its elements that is dependent on their Python values but
not on their comparability. My first thought was that they are
ordered according to their hash values, but this theory doesn't
hold:
abc = ('a', 'b', 'c')
sorted(map(hash, abc)) [-468864544, -340864157, -212863774]
map(hash, frozenset(abc))
[-468864544, -212863774, -340864157]

I.e. the ordering of the elements in the frozenset does not correspond
to the ordering of their hashes in either direction. Hmmm.

I tried to understand this by looking at the C source but I gave
up after 10 fruitless minutes. (This has been invariably the
outcome of all my attempts at finding my way through the Python C
source.)

I guess the take-home message is that frozenset is a more general
way to canonicalize an iterable object than sorting, even though
the reasons for this still remain a mystery to me... Then again,
just looking at the voodoo that goes into algorithms for computing
hashes fills me with despair. As much as I dislike it, sooner or
later I'll have to go on faith.

Computing the hash value of a container object usually involves
accumulating the hash values of its components, i.e. something like

def hash(container):
hashval = initial_value
for x in container:
hashval = f(hashval, hash(x))
return hashval

Now if one chooses f such that:
* f(x, y) == f(y, x)
* f(x, f(y, z)) = f(f(x, y), z)

(IOW, f defines a commutative and associative operation)

Then it doesn't matter in what order the elements of container are
enumerated, the resulting hash value will always be the same. This
avoids having to find a canonical order to iterate over then elements
in.
 
K

kj

In said:
Hope this helps :)


It did! Thanks! For one thing now I see that I was barking up
the wrong tree in focusing on a canonical order, when, as the code
you posted shows, it is actually not required for hashing.

In fact, I'd come to the conclusion that frozensets had a consistent
order (i.e. frozensets that were equal according to '==' would be
iterated over in the same order), but now I'm not sure that this
is the case. (Granted, semantically, there's nothing in the
definition of a frozenset that would imply a consistent iteration
order.)

Thanks again!

~kj
 
S

Steven D'Aprano

In <[email protected]> Steven D'Aprano


IOW, "don't you worry your little head about why."

Shame on you for deleting the context of my answer.

You said:

"I imagine that frozenset is better than sorted(tuple(...)) here, but
it's not obvious to me why."

Because frozenset already works. Because somebody else has done the work,
and debugged it, and tested it, and profiled it, and optimized it, and
ensured that it is correct and fast. That saves you a lot of effort, and
frees you from having to duplicate the same functionality, and debugging
it, testing it, profiling it, and optimizing it. Every line of code that
you can avoid writing is a win.

If you're a carpenter, it is better to buy a hammer than it is to go out
and dig up your own iron ore, smelt the metal, and forge it into a
hammer: unless you're in the business of making hammers, you've got
better things to do with your time. And if you're a programmer, you've
got better things to do than write a slower, buggier version of code that
already exists.

If you want to know *what* the frozenset does that is better, then go
read the source code. For all I know you might discover that it does the
same thing as your code. It's *still* better to use frozenset, simply
because it already exists. Unless you profile your code and discover that
frozenset is too slow and you can do better, there is no reason not to
use it, and many reasons to use it.

This has nothing to do with "your little head", but about code reuse.
Almost the entire history of programming, from the invention of sub-
routines to the open source movement, is about code reuse.
 
S

Steven D'Aprano

In said:
At any rate, using your [i.e. Arnaud's] suggestions in this and your
other post, the current implementation of frozendict stands at:
class frozendict(dict):
for method in ('__delitem__ __setitem__ clear pop popitem'
'setdefault update').split():
exec """
def %s(self, *a, **k):
cn = self.__class__.__name__
raise TypeError("'%%s' object is not mutable" %% cn)
""" % method
def __hash__(self):
return hash(frozenset(self.items()))
...which is a lot nicer!

As a side comment on my own post, this is the second time in less than a
week that I find myself having to resort to exec'ing some code generated
from a template. This one is worse, because there's nothing
runtime-dependent about the code being exec'd.

Er, *all* Python classes are created at runtime. The class statement is
executed at runtime, not compile time. Not that it really matters.

But in any case, it's easy enough to avoid exec with a factory function.
The following is untested, but should work:


def no_mutate_factory(name):
def inner(self, *args, **kwargs):
cn = self.__class__.__name__
raise TypeError('%s instance is not mutable' % cn)
inner.__name__ = name
return inner


class FrozenDict(dict):
update = no_mutate_factory('update')
clear = no_mutate_factory('clear')
# ...


It's a bit messy to have to list the name of the method twice. But you
can inject the appropriate methods into the class by adding them from
outside the class block:

class FrozenDict(dict):
pass

for name in 'update clear'.split() # and the others
setattr(FrozenDict, name, no_mutate_factory(name))

del name # Avoid namespace pollution, if you care.
 
J

John Nagle

Following a suggestion from MRAB, I attempted to implement a
frozendict class.

That really should be built into the language. "dict" is the
last built-in type that doesn't have an immutable form.

John Nagle
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,009
Latest member
GidgetGamb

Latest Threads

Top