sorteddict PEP proposal [started off as orderedict]

M

Mark Summerfield

I disagree with your reason for introducing a sorteddict - the main
reason should be if it makes code that uses it simpler and more
readable, with efficiency (within reasonable limits) being of
secondary importance. Unfortunately for sorteddict, your code is
simpler and more explicit for most obvious use cases.

But, with Bill's cached sorted key list it'll be reasonably efficient,
and it's likely possible to update the sorted cache more efficiently
than resorting if there's only been a small number of insertions since
the last sort, but I haven't the time to do a thorough job.

I think that keeping a sorted list of keys is worthwhile, but because
the key or cmp functions could access keys or values or both you have
to invalidate whenever a key or value is added or changed. Here's my
version (v. similar to yours of course, but sorting when necessary):

class sorteddict(dict):

def __init__(self, iterable=None, cmp=None, key=None,
reverse=False):
if iterable is None:
iterable = []
dict.__init__(self, iterable)
self.__cmp = cmp
self.__key = key
self.__reverse = reverse
self.__keys = None

def update(self, *args, **kwargs):
dict.update(self, *args, **kwargs)
self.__keys = None

@classmethod
def fromkeys(cls, iterable, value=None):
dictionary = cls()
for key in iterable:
dictionary[key] = value
return dictionary

def copy(self):
dictionary = sorteddict(dict.copy(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
dictionary.__keys = None if self.__keys is None else
self.__keys[:]
return dictionary

def clear(self):
self.__keys = None
dict.clear(self)

def setdefault(self, key, value):
self.__keys = None
return dict.setdefault(self, key, value)

def pop(self, key, value=None):
if key not in self:
return value
self.__keys = None
return dict.pop(self, key, value)

def popitem(self):
self.__keys = None
return dict.popitem(self)

def keys(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
return self.__keys[:]

def values(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
return [self[key] for key in self.__keys]

def items(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
return [(key, self[key]) for key in self.__keys]

def __iter__(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
return iter(self.__keys)

def iterkeys(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
return iter(self.__keys)

def itervalues(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
for key in self.__keys:
yield self[key]

def iteritems(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
for key in self.__keys:
yield key, self[key]

def __delitem__(self, key):
self.__keys = None
dict.__delitem__(self, key)

def __setitem__(self, key, value):
self.__keys = None
dict.__setitem__(self, key, value)

def __repr__(self):
return str(self)

def __str__(self):
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)
return "{%s}" % ", ".join(
["%r: %r" % (key, self[key]) for key in self.__keys])
 
P

Paul Hankin

...
class sorteddict(dict):

...
if self.__keys is None:
self.__keys = sorted(dict.keys(self), cmp=self.__cmp,
key=self.__key,
reverse=self.__reverse)

You'd be better defining __keys like this:

def __getkeys(self):
if self.__keycache is None:
self.__keycache = dict.keys(self)
self.__keycache.sort(cmp=...)
return self.__keycache

__keys = property(__getkeys)

and where you have 'self.__keys = None', either replace with
'self.__keycache = None' or more cleanly with a call to:

def __invalidate_key_cache(self):
self.__keycache = None

to improve the code quality.
 
M

Mark Summerfield

You'd be better defining __keys like this:

def __getkeys(self):
if self.__keycache is None:
self.__keycache = dict.keys(self)
self.__keycache.sort(cmp=...)
return self.__keycache

__keys = property(__getkeys)

and where you have 'self.__keys = None', either replace with
'self.__keycache = None' or more cleanly with a call to:

def __invalidate_key_cache(self):
self.__keycache = None

to improve the code quality.

Yes, that's much better.

As you've no doubt realised, this particular implementation gives best
performance when the pattern of use is: lots of edits, lots of
lookups, ..., and gives worst performance when the pattern of use is:
edit, lookup, edit, lookup (in which case using a dict and sorted() is
probably better).

So there is lots of scope for someone to do a version that has good
performance for all patterns of use:)

I've put a full version with doctests & docstrings on PyPI:
http://pypi.python.org/pypi/sorteddict

Here's the stripped version (just 92 lines):

class sorteddict(dict):

def __init__(self, iterable=None, cmp=None, key=None,
reverse=False):
if iterable is None:
iterable = []
dict.__init__(self, iterable)
self.__cmp = cmp
self.__key = key
self.__reverse = reverse
self.__keycache = None

@property
def __keys(self):
if self.__keycache is None:
self.__keycache = dict.keys(self)
self.__keycache.sort(cmp=self.__cmp, key=self.__key,
reverse=self.__reverse)
return self.__keycache

def __invalidate_key_cache(self):
self.__keycache = None

def update(self, *args, **kwargs):
self.__invalidate_key_cache()
dict.update(self, *args, **kwargs)

@classmethod
def fromkeys(cls, iterable, value=None):
dictionary = cls()
for key in iterable:
dictionary[key] = value
return dictionary

def copy(self):
return sorteddict(dict.copy(self), cmp=self.__cmp,
key=self.__key, reverse=self.__reverse)

def clear(self):
self.__invalidate_key_cache()
dict.clear(self)

def setdefault(self, key, value):
self.__invalidate_key_cache()
return dict.setdefault(self, key, value)

def pop(self, key, value=None):
if key not in self:
return value
self.__invalidate_key_cache()
return dict.pop(self, key, value)

def popitem(self):
self.__invalidate_key_cache()
return dict.popitem(self)

def keys(self):
return self.__keys[:]

def values(self):
return [self[key] for key in self.__keys]

def items(self):
return [(key, self[key]) for key in self.__keys]

def __iter__(self):
return iter(self.__keys)

def iterkeys(self):
return iter(self.__keys)

def itervalues(self):
for key in self.__keys:
yield self[key]

def iteritems(self):
for key in self.__keys:
yield key, self[key]

def __delitem__(self, key):
self.__invalidate_key_cache()
dict.__delitem__(self, key)

def __setitem__(self, key, value):
self.__invalidate_key_cache()
dict.__setitem__(self, key, value)

def __repr__(self):
raise NotImplementedError()

def __str__(self):
return "{%s}" % ", ".join(
["%r: %r" % (key, self[key]) for key in self.__keys])
 
D

Duncan Booth

Mark Summerfield said:
As you've no doubt realised, this particular implementation gives best
performance when the pattern of use is: lots of edits, lots of
lookups, ..., and gives worst performance when the pattern of use is:
edit, lookup, edit, lookup (in which case using a dict and sorted() is
probably better).

So there is lots of scope for someone to do a version that has good
performance for all patterns of use:)

I that's the point though: you can't write one implementation that has good
performance for all patterns of use: you have to either compromise on
performance somewhere, or use an implementation specifically matched to
your use case.

For example, if the use case was some updates followed by iterating over
the first few keys only, it may be faster to use a heap for iteration.

I suspect that for many use patterns you could improve performance
significantly by having two levels of invalidation for the the key list: in
__setitem__, if the key already exists you don't need to discard the list
in __keycache (although if a key or cmp function was provided it may no
longer be sorted). In that case it might be worthwhile keeping __keycache
but flagging it as needing to be sorted again next time it is used. Given
that Python's sort is very fast on sorted or nearly sorted lists this could
provide a worthwhile speedup for cases where the values change but the keys
don't change often.
 
H

Hrvoje Niksic

Duncan Booth said:
I that's the point though: you can't write one implementation that has good
performance for all patterns of use

An implementation of sorted dict using a balanced tree as the
underlying data structure would give decent performance in all the
mentioned use cases. For example, red-black trees search, insert, and
delete in O(log n) time.
 
M

Mark Summerfield

An implementation of sorted dict using a balanced tree as the
underlying data structure would give decent performance in all the
mentioned use cases. For example, red-black trees search, insert, and
delete in O(log n) time.


Basically, as implemented, I have to invalidate if there is any change
to a key _or_ to a value because if cmp or key functions are specified
they could be using the key, the value, or even both. (If key and cmp
were both None and reverse was False, I could use the bisect module
and maintain the keys in sorted order at all times in that case, but
then performance would vary considerably depending on use which I
think would be v. confusing for users.)

You are quite right that using a balanced tree (red-black, AVL, or
b*tree) would provide decent performance in all the use cases. There
appear to be two on PyPI (rbtree and pyavl), but both seem to be C
extensions. Antoon Pardon has a pure Python AVL tree, but whether he
could or would produce a version that had the sorteddict API, I don't
know.
 
H

Hrvoje Niksic

Mark Summerfield said:
An implementation of sorted dict using a balanced tree as the
underlying data structure would give decent performance in all the
mentioned use cases. For example, red-black trees search, insert, and
delete in O(log n) time.

Basically, as implemented, I have to invalidate if there is any
change [...]

No argument here, as long as the limitation is understood to be a
consequence of the current implementation model. Seriously proposing
a sorteddict that is a mere syntactic sugar over dict dooms the PEP to
rejection.

Major programming language libraries have included sorted mapping and
set types for a while now, making the performance and complexity
constraints generally well understood. We should make use of that
knowledge when designing sorteddict.
 
M

Mark Summerfield

Basically, as implemented, I have to invalidate if there is any
change [...]

No argument here, as long as the limitation is understood to be a
consequence of the current implementation model. Seriously proposing
a sorteddict that is a mere syntactic sugar over dict dooms the PEP to
rejection.

On the Py3K list GvR made it clear that sorteddict wouldn't be in
Python until an implementation had been in popular use for some
time... so I will not put forward a PEP for it.
Major programming language libraries have included sorted mapping and
set types for a while now, making the performance and complexity
constraints generally well understood. We should make use of that
knowledge when designing sorteddict.

I know! I use Python and C++ and Qt, so what with the STL and Qt's
containers I'm used to having far more containers to choose from than
Python offers. I think Python could do with both sorteddict and
sortedset in the collections module, but to get decent performance
will require a balanced tree (or a skiplist) implementation, and I
don't have the time to do that. Apart from Antoon Pardon's AVL tree,
there's also Chris Gonnerman's RBTree, also pure Python, with a dict
API and that seems to accept a cmp function, but again, I don't know
whether it could be adapted to the sorteddict((mapping | sequence |
nothing), cmp=None, key=None, reverse=None) API that Paul Hankin
described.
 
P

Paul Hankin

I that's the point though: you can't write one implementation that has good
performance for all patterns of use: you have to either compromise on
performance somewhere, or use an implementation specifically matched to
your use case.

For example, if the use case was some updates followed by iterating over
the first few keys only, it may be faster to use a heap for iteration.

I suspect that for many use patterns you could improve performance
significantly by having two levels of invalidation for the the key list: in
__setitem__, if the key already exists you don't need to discard the list
in __keycache (although if a key or cmp function was provided it may no
longer be sorted). In that case it might be worthwhile keeping __keycache
but flagging it as needing to be sorted again next time it is used. Given
that Python's sort is very fast on sorted or nearly sorted lists this could
provide a worthwhile speedup for cases where the values change but the keys
don't change often.

More flexibly, keep a set of inserted keys that haven't yet been
included in the sorted list, and a set of deleted keys that haven't
yet been removed from the sorted list. The cache is invalid if either
of these sets are empty - and to make it valid you can choose what to
do based on the sizes of the two sets (and the sorted list). For
instance, if there's just been one insertion you're probably better
doing an insertion rather than a full resort. Of course, there's a few
nasty cases here but it's always possible to just throw away the
sorted list and reconstruct it from the dict keys when things get too
hairy (eg, the user deletes a key that's in the inserted-but-not-yet-
sorted set).
 
P

Paul Hankin

An implementation of sorted dict using a balanced tree as the
underlying data structure would give decent performance in all the
mentioned use cases. For example, red-black trees search, insert, and
delete in O(log n) time.

But dicts do search, insert and delete in O(1) time, so using some
variety of balanced tree will give you much worse performance when
you're doing regular dict operations.
 
A

Antoon Pardon

Mark Summerfield said:
I that's the point though: you can't write one implementation
that has good performance for all patterns of use
An implementation of sorted dict using a balanced tree as the
underlying data structure would give decent performance in all the
mentioned use cases. For example, red-black trees search, insert, and
delete in O(log n) time.
Basically, as implemented, I have to invalidate if there is any
change [...]

No argument here, as long as the limitation is understood to be a
consequence of the current implementation model. Seriously proposing
a sorteddict that is a mere syntactic sugar over dict dooms the PEP to
rejection.

On the Py3K list GvR made it clear that sorteddict wouldn't be in
Python until an implementation had been in popular use for some
time... so I will not put forward a PEP for it.
Major programming language libraries have included sorted mapping and
set types for a while now, making the performance and complexity
constraints generally well understood. We should make use of that
knowledge when designing sorteddict.

I know! I use Python and C++ and Qt, so what with the STL and Qt's
containers I'm used to having far more containers to choose from than
Python offers. I think Python could do with both sorteddict and
sortedset in the collections module, but to get decent performance
will require a balanced tree (or a skiplist) implementation, and I
don't have the time to do that. Apart from Antoon Pardon's AVL tree,
there's also Chris Gonnerman's RBTree, also pure Python, with a dict
API and that seems to accept a cmp function, but again, I don't know
whether it could be adapted to the sorteddict((mapping | sequence |
nothing), cmp=None, key=None, reverse=None) API that Paul Hankin
described.

Well you should decide what you want. In a previous exchange one of the
things that was wanted was that you could already seed a tree by using
key word arguments so that you could do the following:

t=Tree(a=15, b=30)

which would be equivallent to:

t=Tree()
t.update(a=15,b=30)

But with the above proposal

t=Tree(cmp=cmp)

is not equivallent to

t=Tree()
t.update(cmp=cmp)

So it seems the API needs some more sorting out.
 
H

Hrvoje Niksic

Paul Hankin said:
But dicts do search, insert and delete in O(1) time, so using some
variety of balanced tree will give you much worse performance when
you're doing regular dict operations.

I wouldn't call it "much worse"; while O(log(n)) is worse than O(1),
it's still very fast, which is why popular programming language
libraries have an ordered mapping type based on balanced trees. Also
note that dict performance can degrade with hash collisions, while
trees can maintain complexity guarantees on all operations.

In the end, it's a tradeoff. Hash tables offer O(1) access, but lack
ordering. Balanced trees offer ordering at the price of O(log n)
access. Both have their uses, but neither is syntactic sugar for the
other.
 
M

Mark Summerfield

I that's the point though: you can't write one implementation
that has good performance for all patterns of use
An implementation of sorted dict using a balanced tree as the
underlying data structure would give decent performance in all the
mentioned use cases. For example, red-black trees search, insert, and
delete in O(log n) time.
Basically, as implemented, I have to invalidate if there is any
change [...]
No argument here, as long as the limitation is understood to be a
consequence of the current implementation model. Seriously proposing
a sorteddict that is a mere syntactic sugar over dict dooms the PEP to
rejection.
On the Py3K list GvR made it clear that sorteddict wouldn't be in
Python until an implementation had been in popular use for some
time... so I will not put forward a PEP for it.
I know! I use Python and C++ and Qt, so what with the STL and Qt's
containers I'm used to having far more containers to choose from than
Python offers. I think Python could do with both sorteddict and
sortedset in the collections module, but to get decent performance
will require a balanced tree (or a skiplist) implementation, and I
don't have the time to do that. Apart from Antoon Pardon's AVL tree,
there's also Chris Gonnerman's RBTree, also pure Python, with a dict
API and that seems to accept a cmp function, but again, I don't know
whether it could be adapted to the sorteddict((mapping | sequence |
nothing), cmp=None, key=None, reverse=None) API that Paul Hankin
described.

Well you should decide what you want. In a previous exchange one of the
things that was wanted was that you could already seed a tree by using
key word arguments so that you could do the following:

t=Tree(a=15, b=30)

which would be equivallent to:

t=Tree()
t.update(a=15,b=30)

But with the above proposal

t=Tree(cmp=cmp)

is not equivallent to

t=Tree()
t.update(cmp=cmp)

So it seems the API needs some more sorting out.

I think you missed some of the previous postings.

The sorteddict API that has emerged so far is (1) apart from the
constructor, everything is identical to dict, (2) the constructor
takes the same args as sorted(), so if you want to seed with a dict or
with keywords you write sorteddict(dict(a=1,b=2), ...), (or you could
create a sorteddict and use update() since that takes the same args as
dict's constructor).

Could your AVLTree support cmp and keys and reverse?
 
D

Duncan Booth

Paul Hankin said:
More flexibly, keep a set of inserted keys that haven't yet been
included in the sorted list, and a set of deleted keys that haven't
yet been removed from the sorted list. The cache is invalid if either
of these sets are empty - and to make it valid you can choose what to
do based on the sizes of the two sets (and the sorted list). For
instance, if there's just been one insertion you're probably better
doing an insertion rather than a full resort. Of course, there's a few
nasty cases here but it's always possible to just throw away the
sorted list and reconstruct it from the dict keys when things get too
hairy (eg, the user deletes a key that's in the inserted-but-not-yet-
sorted set).

Yes that sounds good. Did you mean 'The cache is invalid if either of
these sets is not empty'?

If you delete a key which is in the inserted set you can simply delete
it from the inserted set. The inserted set can be added to sorted list
simply by appending and sorting, and the deleted keys can be removed
with a list comprehension. The only other thing you need is to catch
when the sort key has changed: a public method to request a resort would
handle this (but of course it wouldn't actually resort until it needed
the sorted keys again.)

Mark's sortteddict.py modified to avoid calling __invalidate_key (and
__repr__ also implemented):

------------ sorteddict.py -------------
#!/usr/bin/env python
# Copyright (c) 2007 Qtrac Ltd. All rights reserved.
# This module is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
# The API and implementation are based on ideas from Paul Hankin writing
# on comp.lang.python.

"""
A dictionary that is sorted by key or by the given cmp or key function.

Provides a dictionary with the same methods and behavior as a standard
dict and that can be used as a drop-in replacement for a dict (apart
from the constructor), but which always returns iterators and lists
(whether of keys or values) in sorted order. It does not matter when
items are inserted or when removed, the items in the sorteddict are
always returned in sorted order. The ordering is implicitly based on the
key's __lt__() (or failing that __cmp__()) method if no cmp or key
function is given.

The main benefit of sorteddicts is that you never have to explicitly
sort.

One use case is where you have a set of objects that you want to make
available in various sorted orders. You could, of course, sort on demand,
but if the number of objects is very large (and if you're prepared to
sacrifice some memory), it may be faster to use a sorteddict. For
example:
.... def __init__(self, forename, surname, elected):
.... self.forename = forename
.... self.surname = surname
.... self.elected = elected.... ("Ramsay", "MacDonald", "1924-01-22"),
.... ("Arthur", "Balfour", "1902-07-11"),
.... ("Herbert Henry", "Asquith", "1908-04-07"),
.... ("Stanley", "Baldwin", "1924-11-04"),
.... ("David", "Lloyd George", "1916-12-07"),
.... ("Andrew", "Bonar Law", "1922-10-23"),
.... ("Henry", "Campbell-Bannerman", "1905-12-05"),
.... ("Stanley", "Baldwin", "1923-05-23"),
.... ("Ramsay", "MacDonald", "1929-06-05")):
.... pm = PrimeMinister(forename, surname, elected)
.... byForename[pm] = pm
.... bySurname[pm] = pm
.... byName[pm] = pm
.... byElected[pm] = pm
[pm.forename for pm in byForename.values()]
['Andrew', 'Arthur', 'David', 'Henry', 'Herbert Henry', 'Ramsay', \
'Ramsay', 'Stanley', 'Stanley']
[pm.surname for pm in bySurname.values()]
['Asquith', 'Baldwin', 'Baldwin', 'Balfour', 'Bonar Law', \
'Campbell-Bannerman', 'Lloyd George', 'MacDonald', 'MacDonald']
["%s %s %s" % (pm.forename, pm.surname, pm.elected) \
for pm in byName.values()]
['Ramsay MacDonald 1929-06-05', 'Ramsay MacDonald 1924-01-22', \
'David Lloyd George 1916-12-07', 'Henry Campbell-Bannerman 1905-12-05', \
'Andrew Bonar Law 1922-10-23', 'Arthur Balfour 1902-07-11', \
'Stanley Baldwin 1924-11-04', 'Stanley Baldwin 1923-05-23', \
'Herbert Henry Asquith 1908-04-07']
["%s %s %s" % (pm.forename, pm.surname, pm.elected) \
for pm in byElected.values()]
['Arthur Balfour 1902-07-11', 'Henry Campbell-Bannerman 1905-12-05', \
'Herbert Henry Asquith 1908-04-07', 'David Lloyd George 1916-12-07', \
'Andrew Bonar Law 1922-10-23', 'Stanley Baldwin 1923-05-23', \
'Ramsay MacDonald 1924-01-22', 'Stanley Baldwin 1924-11-04', \
'Ramsay MacDonald 1929-06-05']

Thanks to Python's object references, even though there are four
sorteddicts referring to the same PrimeMinister objects, only one instance
of each object is held in memory.
files = ["README.txt", "readme", "MANIFEST", "test.py"]
d = sorteddict([(name, name) for name in files], .... cmp=lambda a, b: cmp(a.lower(), b.lower()))
d.keys()
['MANIFEST', 'readme', 'README.txt', 'test.py']

Here are a few tests for some of the base class methods that are not
reimplemented:
d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
d.get("X", 21) 21
d.get("i") 4
d.has_key("a") True
d.has_key("x") False
"a" in d True
"x" in d False
len(d) 6
del d["n"]
del d["y"]
len(d) 4
d.clear()
len(d) 0
d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
d["i"] 4
d["y"] 6
d["z"]
Traceback (most recent call last):
....
KeyError: 'z'
"""

__author__ = "Mark Summerfield"
__version__ = "1.1.0"


class sorteddict(dict):
"""A dictionary that is sorted by key or by the given cmp or key function.

The sorteddict always returns any list or iterator in sorted order.
For best performance prefer a key argument to a cmp one. If neither
is given __lt__() (falling back to __cmp__()) will be used for
ordering.

This particular implementation has reasonable performance if the
pattern of use is: lots of edits, lots of lookups, ..., but gives
its worst performance if the pattern of use is: edit, lookup, edit,
lookup, ..., in which case using a plain dict and sorted() will
probably be better.

If you want to initialize with a dict, either use
sorteddict(dict(...), ...), or create the sorteddict and then call
update with the arguments normally passed to a dict constructor.
"""

def __init__(self, iterable=None, cmp=None, key=None, reverse=False):
"""Initializes the sorteddict using the same arguments as sorted()
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> d.items() [('a', 2), ('i', 4), ('n', 3), ('s', 1), ('t', 5), ('y', 6)]
>>> str(sorteddict()) '{}'
>>> e = sorteddict(d)
>>> e.items()
[('a', 2), ('i', 4), ('n', 3), ('s', 1), ('t', 5), ('y', 6)]
"""
if iterable is None:
iterable = []
dict.__init__(self, iterable)
self.__cmp = cmp
self.__key = key
self.__reverse = reverse
self.__keycache = None
self.__addkeys = set()
self.__delkeys = set()

@property
def __keys(self):
if self.__keycache is None:
self.__keycache = dict.keys(self)
self.__addkeys = set()
self.__delkeys = set()
self.__keycache.sort(cmp=self.__cmp, key=self.__key,
reverse=self.__reverse)
else:
if self.__delkeys:
delkeys = self.__delkeys
self.__keycache = [key for key in self.__keycache if key not in delkeys]
self.__delkeys = set()

if self.__addkeys:
self.__keycache += list(self.__addkeys)
self.__addkeys = set()
self.__keycache.sort(cmp=self.__cmp, key=self.__key,
reverse=self.__reverse)
return self.__keycache


def __invalidate_key_cache(self):
self.__keycache = None
self.__addkeys = set()
self.__delkeys = set()

def __addkey(self, key):
if key in self.__delkeys:
del self.__delkeys[key]
else:
self.__addkeys.add(key)

def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
else:
self.__delkeys.add(key)

def update(self, *args, **kwargs):
"""Updates the sorteddict using the same arguments as dict
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5))
>>> d.update(a=4, z=-4)
>>> d.items() [('a', 4), ('i', 4), ('n', 3), ('s', 1), ('t', 5), ('z', -4)]
>>> del d["a"]
>>> del d["i"]
>>> d.update({'g': 9}, a=1, z=3)
>>> d.items() [('a', 1), ('g', 9), ('n', 3), ('s', 1), ('t', 5), ('z', 3)]
>>> e = sorteddict(dict(p=4, q=5))
>>> del d["a"]
>>> del d["n"]
>>> e.update(d)
>>> e.items()
[('g', 9), ('p', 4), ('q', 5), ('s', 1), ('t', 5), ('z', 3)]
"""
self.__invalidate_key_cache()
dict.update(self, *args, **kwargs)


@classmethod
def fromkeys(cls, iterable, value=None):
"""A class method that returns an sorteddict whose keys are
from the iterable and each of whose values is value
>>> d = sorteddict()
>>> e = d.fromkeys("KYLIE", 21)
>>> e.items() [('E', 21), ('I', 21), ('K', 21), ('L', 21), ('Y', 21)]
>>> e = sorteddict.fromkeys("KYLIE", 21)
>>> e.items()
[('E', 21), ('I', 21), ('K', 21), ('L', 21), ('Y', 21)]
"""
dictionary = cls()
for key in iterable:
dictionary[key] = value
return dictionary


def copy(self):
"""Returns a shallow copy of this sorteddict
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> e = d.copy()
>>> e.items() [('a', 2), ('i', 4), ('n', 3), ('s', 1), ('t', 5), ('y', 6)]
>>> d = sorteddict()
>>> e = d.copy()
>>> e.items()
[]
"""
return sorteddict(dict.copy(self), cmp=self.__cmp,
key=self.__key, reverse=self.__reverse)


def clear(self):
"""Removes every item from this sorteddict
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> len(d) 6
>>> d.clear()
>>> len(d) 0
>>> d["m"] = 3
>>> d["a"] = 5
>>> d["z"] = 7
>>> d["e"] = 9
>>> d.keys()
['a', 'e', 'm', 'z']
"""
self.__invalidate_key_cache()
dict.clear(self)


def setdefault(self, key, value):
"""If key is in the dictionary, returns its value;
otherwise adds the key with the given value which is also
returned
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> d.setdefault("n", 99) 3
>>> d.values() [2, 4, 3, 1, 5, 6]
>>> d.setdefault("r", -20) -20
>>> d.items()[2:] [('n', 3), ('r', -20), ('s', 1), ('t', 5), ('y', 6)]
>>> d.setdefault("@", -11) -11
>>> d.setdefault("z", 99) 99
>>> d.setdefault("m", 50) 50
>>> d.keys()
['@', 'a', 'i', 'm', 'n', 'r', 's', 't', 'y', 'z']
"""
if key not in self:
self.__addkey(key)
return dict.setdefault(self, key, value)


def pop(self, key, value=None):
"""If key is in the dictionary, returns its value and removes it
from the dictionary; otherwise returns the given value
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> d.pop("n") 3
>>> "n" in d False
>>> d.pop("q", 41) 41
>>> d.keys() ['a', 'i', 's', 't', 'y']
>>> d.pop("a") 2
>>> d.pop("t") 5
>>> d.keys()
['i', 's', 'y']
"""
if key not in self:
return value
self.__removekey(key)
return dict.pop(self, key, value)


def popitem(self):
"""Returns and removes an arbitrary item from the dictionary
3
"""
item = dict.popitem(self)
self.__removekey(item[0])
return item


def keys(self):
"""Returns the dictionary's keys in sorted order
['a', 'i', 'n', 's', 't', 'y']
"""
return self.__keys[:]


def values(self):
"""Returns the dictionary's values in key order
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> d.values() [2, 4, 3, 1, 5, 6]
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6), ... reverse=True)
>>> d.values() [6, 5, 1, 3, 4, 2]
>>> d = sorteddict(dict(S=1, a=2, N=3, i=4, T=5, y=6), ... cmp=lambda a, b: cmp(a.lower(), b.lower()))
>>> d.keys()
['a', 'i', 'N', 'S', 'T', 'y']
"""
return [self[key] for key in self.__keys]


def items(self):
"""Returns the dictionary's items in key order
[('a', 2), ('i', 4), ('n', 3), ('s', 1), ('t', 5), ('y', 6)]
"""
return [(key, self[key]) for key in self.__keys]


def __iter__(self):
"""Returns an iterator over the dictionary's keys
['a', 'i', 'n', 's', 't', 'y']
"""
return iter(self.__keys)


def iterkeys(self):
"""Returns an iterator over the dictionary's keys
['a', 'i', 'n', 's', 't', 'y']
"""
return iter(self.__keys)


def itervalues(self):
"""Returns an iterator over the dictionary's values in key order
[2, 4, 3, 1, 5, 6]
"""
for key in self.__keys:
yield self[key]


def iteritems(self):
"""Returns an iterator over the dictionary's values in key order
[('a', 2), ('i', 4), ('n', 3), ('s', 1), ('t', 5), ('y', 6)]
"""
for key in self.__keys:
yield key, self[key]


def __delitem__(self, key):
"""Deletes the item with the given key from the dictionary
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> del d["s"]
>>> del d["y"]
>>> del d["a"]
>>> d.keys()
['i', 'n', 't']
"""
self.__removekey(key)
dict.__delitem__(self, key)


def __setitem__(self, key, value):
"""If key is in the dictionary, sets its value to value;
otherwise adds the key to the dictionary with the given value
>>> d = sorteddict(dict(s=1, a=2, n=3, i=4, t=5, y=6))
>>> d["t"] = -17
>>> d["z"] = 43
>>> d["@"] = -11
>>> d["m"] = 22
>>> d["r"] = 5
>>> d.keys()
['@', 'a', 'i', 'm', 'n', 'r', 's', 't', 'y', 'z']
"""
if key not in self:
self.__addkey(key)
dict.__setitem__(self, key, value)


def __repr__(self):
""" sorteddict({'a': 1}, key=<function keyfn at ...>, reverse=True)
"""
args = []
if self:

args.append(dict.__repr__(self))
if self.__cmp is not None:
args.append('cmp=%r' % self.__cmp)
if self.__key is not None:
args.append('key=%r' % self.__key)
if self.__reverse is not False:
args.append('reverse=%r' % self.__reverse)
return "%s(%s)" % (self.__class__.__name__, ', '.join(args))


def __str__(self):
"""Returns a human readable string representation of the
dictionary

The returned string is proportional in size to the number of
items so could be very large.
"{1: 'x', 2: 'a', 3: 'm'}"
"""
return "{%s}" % ", ".join(
["%r: %r" % (key, self[key]) for key in self.__keys])


if __name__ == "__main__":
import doctest
doctest.testmod()

----------------------------------------
 
P

Paul Hankin

Yes that sounds good. Did you mean 'The cache is invalid if either of
these sets is not empty'?

Yes :)
If you delete a key which is in the inserted set you can simply delete
it from the inserted set.

No, in case it was already in the sorted list before the insert. You
have to remove it from the inserted set AND add it to the deleted set.
 
M

Mark Summerfield

Yes :)


No, in case it was already in the sorted list before the insert. You
have to remove it from the inserted set AND add it to the deleted set.

So should Duncan's

def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
else:
self.__delkeys.add(key)

be changed to:

def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
self.__delkeys.add(key)

?

(BTW I'm happy to put up a new version on PyPI, but I'm v. conscious
of the fact that the key ideas and code have come from you and Duncan,
so if either of you would like to adopt it, I will gladly step aside.
I just want a sorteddict in Python, and I don't mind who does it:)
 
S

Steve Holden

Mark said:
I think you missed some of the previous postings.

The sorteddict API that has emerged so far is (1) apart from the
constructor, everything is identical to dict, (2) the constructor
takes the same args as sorted(), so if you want to seed with a dict or
with keywords you write sorteddict(dict(a=1,b=2), ...), (or you could
create a sorteddict and use update() since that takes the same args as
dict's constructor).

Could your AVLTree support cmp and keys and reverse?
I don't see how you can guarantee ordering of the seeded components,
since you are creating an inherently unpredictable dict as the component
to specify the initial contents of the sorteddict.

Or is an arbitrary initial ordering acceptable?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline
 
P

Paul Hankin

No, in case it was already in the sorted list before the insert. You
have to remove it from the inserted set AND add it to the deleted set.

So should Duncan's

def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
else:
self.__delkeys.add(key)

be changed to:

def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
self.__delkeys.add(key)

Yes, and the same in __addkey: if it's in __delkeys it should be
removed from there, and added to __addkeys. There's an invariant: any
key is in at most one of __addkeys and __delkeys.
(BTW I'm happy to put up a new version on PyPI, but I'm v. conscious
of the fact that the key ideas and code have come from you and Duncan,
so if either of you would like to adopt it, I will gladly step aside.
I just want a sorteddict in Python, and I don't mind who does it:)

Speaking for myself, I'm happy to have helped out, but you should
carry on.
 
J

Jeremy Sanders

Mark said:
The sorteddict API that has emerged so far is (1) apart from the
constructor, everything is identical to dict, (2) the constructor
takes the same args as sorted(), so if you want to seed with a dict or
with keywords you write sorteddict(dict(a=1,b=2), ...), (or you could
create a sorteddict and use update() since that takes the same args as
dict's constructor).

first() and last() would also be nice on a sorted dict.

Jeremy
 
M

Mark Summerfield

So should Duncan's
def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
else:
self.__delkeys.add(key)
be changed to:
def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
self.__delkeys.add(key)

Yes, and the same in __addkey: if it's in __delkeys it should be
removed from there, and added to __addkeys. There's an invariant: any
key is in at most one of __addkeys and __delkeys.

OK, I have changed both:

def __addkey(self, key):
if key in self.__delkeys:
del self.__delkeys[key]
self.__addkeys.add(key)


def __removekey(self, key):
if key in self.__addkeys:
del self.__addkeys[key]
self.__delkeys.add(key)
Speaking for myself, I'm happy to have helped out, but you should
carry on.

OK, I will if Duncan doesn't want it:)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,143
Latest member
SterlingLa
Top