New Ordered Dictionery to Criticise

F

Fuzzyman

Sorry for this hurried message - I've done a new implementation of out
ordered dict. This comes out of the discussion on this newsgroup (see
blog entry for link to archive of discussion).

See the latest blog entry to get at it :
http://www.voidspace.org.uk/python/weblog/index.shtml

Criticism solicited (honestly) :)

We (Nicola Larosa and I) haven't yet made any optimisations - but there
are two implementations to play with.

One allows you to access the keys attribute as if it was a sequence (as
well as a method).

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
?

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

Fuzzyman said:
Criticism solicited (honestly) :)

A couple of minor points:
- I would drop 2.2 compatibility
- self=self isn't needed in the functions, because of
nested scopes
- popitem(self) can be rewritten as

def popitem(self):
try:
key = self._sequence.pop()
except IndexError:
# always try to give the same exception string as
# dict
raise KeyError("popitem(): dictionary is empty")
return key, self.pop(key)

- One problem with the FancyDict is that it allows
d.keys.append(100)

Regards,
Martin
 
C

Christoph Zwerschke

Fuzzyman said:
Sorry for this hurried message - I've done a new implementation of out
ordered dict. This comes out of the discussion on this newsgroup (see
blog entry for link to archive of discussion).

Thanks. I'll try to check it out and put my oar in over the next
weekend. One thing I already noticed: str() and repr() both output the
OrderedDict as if it was an ordinary dictionary. For str() this may be
acceptable, but repr() should output "a valid Python expression that
could be used to recreate an object with the same value".

-- Christoph
 
F

Fuzzyman

Martin said:
A couple of minor points:
- I would drop 2.2 compatibility

There are a lot of cheap hosting accounts where Python 2.2 is all that
is available. I would only drop support if there is some *compelling*
reason to do so.
- self=self isn't needed in the functions, because of
nested scopes

Cool, I'll test this out.
- popitem(self) can be rewritten as

def popitem(self):
try:
key = self._sequence.pop()
except IndexError:
# always try to give the same exception string as
# dict
raise KeyError("popitem(): dictionary is empty")
return key, self.pop(key)

Yup, that's nicer - thanks.
- One problem with the FancyDict is that it allows
d.keys.append(100)

Are you sure ?

No file given.
Alias System Command
------------------------------
cls cls
copy copy
ddir dir /ad /on
dir dir /on
echo echo
ldir dir /ad /on
ls dir /on
mkdir mkdir
ren ren
rmdir rmdir
------------------------------
Total number of aliases: 10
Movable Python
IPython Interactive Shell. See the manual for a list of features and
tips.
Ctrl-D to exit.
Out[3]:{}
----------------------> d.keys()
Out[4]:[]
---------------------------------------------------------------------------
exceptions.TypeError Traceback (most
recent call
last)

\configobj4\pythonutils\odict.py in append(self, item)
713 def __iadd__(self, other): raise TypeError('Can\'t add in
place to keys')
714 def __imul__(self, n): raise TypeError('Can\'t multiply
keys in place')
--> 715 def append(self, item): raise TypeError('Can\'t append
items to keys')
716 def insert(self, i, item): raise TypeError('Can\'t insert
items into keys')
717 def pop(self, i=-1): raise TypeError('Can\'t pop items from
keys')

TypeError: Can't append items to keys

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
F

Fuzzyman

Christoph said:
Thanks. I'll try to check it out and put my oar in over the next
weekend.

Cool, thanks.

Someone has commented on my blog that passing lists to ``keys``,
``values``, and ``items`` to assign to them "smells funny".

They've suggested new methods ``setkeys``, ``setitems``, ``setvalues``
instead. Any opinions ?
One thing I already noticed: str() and repr() both output the
OrderedDict as if it was an ordinary dictionary. For str() this may be
acceptable, but repr() should output "a valid Python expression that
could be used to recreate an object with the same value".

Yup. If you look in the code, there is already a commented out version
of ``__repr__`` that does this.

*Unfortunately*, it means changing *all* the doctests - so it will have
to wait until one of us can be bothered. :)

It will happen in the course of time I guess.

All the best,


Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
?

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

Fuzzyman said:
Are you sure ?

Not at all. This was from inspection only; I propably
misinterpreted the code.

Regards,
Martin
 
F

Fuzzyman

Fuzzyman said:
Sorry for this hurried message - I've done a new implementation of out
ordered dict. This comes out of the discussion on this newsgroup (see
blog entry for link to archive of discussion).

See the latest blog entry to get at it :
http://www.voidspace.org.uk/python/weblog/index.shtml

Hello all,

I've just done a new "beta 2" version. It has a full version of
FancyODict with the custome "callable sequence objects" for keys,
values and items. They are almost completely covered by tests.

You can download the new(er) version from :
http://www.voidspace.org.uk/cgi-bin/voidspace/downman.py?file=odictbeta2.py

Full discussion of the remaining issues below, or at :
http://www.voidspace.org.uk/python/weblog/arch_d7_2005_11_26.shtml#e147

Progress on the updated implementation of dict continues. (I hestitate
to say *new* version, as it's just a heavy makeover for the old code -
which was basically sound).

``FancyODict`` is now a full implementation of an Ordered Dictionary,
with custom *callable sequence objects* for ``keys``, ``values``, and
``items``. These can be called like normal methods, but can also be
accessed directly as sequence objects. This includes assigning to,
indexing, and slicing - as well as all the other relevant sequence
methods. {sm;:)}

I've also added an optional index to ``OrderedDict.popitem``.

I'm sure there are lots of ways this can be optimised for efficiency -
but the new objects have got pretty full test coverage.

You can download the new version (for testing) from `odict Beta 2
<http://www.voidspace.org.uk/cgi-bin/voidspace/downman.py?file=odictbeta2.py>`_

The following issues still remain :

* ``FancyOdict`` is a separate class from ``OrderedDict``.

Because this version is *undoubtably* less efficient than
OrderedDict, my current thinking is that I should leave them separate
(and have both available). Being able to operate on the
keys/values/items as sequences is for convenience only.

Anyone got a suggestion for a better name than ``FancyODict`` ?

* You can no longer access the key order directly. The old ``sequence``
attribute is depracated and will eventually go away.

You can currently alter the order (of keys, values *and* items) by
passing an iterable into those methods.

Someone has suggested that this "smells bad" - and it ought to be
done through separate `setkeys``, ``setvalues``, and ``setitems``
methods.

I'm *inclined* to agree, but I don't feel strongly about it. Anyone
else got any opinions ?

* ``repr`` ought to return a value that ``eval`` could use to turn back
into an OrderedDict.

I have actually done an implementation of this, but it would mean
that *all* the doctests need to be changed. I *will* do this at some
point.

* Slice assignment.

The semantics for slice assignment are fiddly.

For example, what do you do if in a slice assignment a key collides
with an existing key ?

My current implementation does what an ordinary dictionary does,
the new value overwrites the previous one. This means that the
dictionary can reduce in size as the assignment progresses. {sm;:?}

I think this is preferable to raising an error and preventing
assignment. It does prevent an optimisation whereby I calculate the
indexes of all the new items in advance.

It also means you can't rely on the index of a key from a slice
assignment, unless you know that there will be no key collisions.

In general I'm *against* preventing programmers from doing things,
so long as the documentation carries an appropriate warning.

An example will probably help illustrate this :

... raw:: html

{+coloring}

d = OrderedDict()
d[1] = 1
d[2] = 2
d[3] = 3
d[4] = 4
d.keys()
[1, 2, 3]

# fetching every other key
# using an extended slice
# this actually returns an OrderedDict
d[::2]
{1: 1, 3: 3}

# we can assign to every other key
# using an ordered dict
d[::2] = OrderedDict([(2, 9), (4, 8)])
len(d) == 4
False

d
{2: 9, 4: 8}

"""
Because of the key collisions the length of
d has changed - it now only has two keys instead
of four.
"""

{-coloring}
 
B

Bengt Richter

Hello all,

I've just done a new "beta 2" version. It has a full version of
FancyODict with the custome "callable sequence objects" for keys,
values and items. They are almost completely covered by tests.

You can download the new(er) version from :
http://www.voidspace.org.uk/cgi-bin/voidspace/downman.py?file=odictbeta2.py
Ran my tests on it: this one may be of interest:
__________________________ entrypoint: test_popitem ___________________________

def test_popitem():
d = CandidateDict([(k,i*100) for i,k in enumerate('abcde')])
> assert d.popitem() == ('e', 400)

[C:\pywk\ut\test\test_odicts.py:228]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
...
...
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

def __getattr__(self, name):
"""
Implemented so that access to ``sequence`` raises a warning.
[]
"""
if name == 'sequence':
warn('use of sequence attribute is deprecated. Use keys method '
'instead.', DeprecationWarning)
# NOTE: Still (currently) returns a direct reference. Need to
# because code that uses sequence will expect to be able to
# mutate it in place.
return self._sequence
else:
# NOTE: strictly unnecessary, but it raises the appropriate error
> getattr(self, name)

[c:\pywk\clp\odictbeta2.py:309]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Recursion detected (same locals & position)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============= tests finished: 19 passed, 9 failed in 1.57 seconds =============
Full discussion of the remaining issues below, or at :
http://www.voidspace.org.uk/python/weblog/arch_d7_2005_11_26.shtml#e147

Progress on the updated implementation of dict continues. (I hestitate
to say *new* version, as it's just a heavy makeover for the old code -
which was basically sound).

``FancyODict`` is now a full implementation of an Ordered Dictionary,
with custom *callable sequence objects* for ``keys``, ``values``, and
``items``. These can be called like normal methods, but can also be
accessed directly as sequence objects. This includes assigning to,
indexing, and slicing - as well as all the other relevant sequence
methods. {sm;:)}

I've also added an optional index to ``OrderedDict.popitem``.

I'm sure there are lots of ways this can be optimised for efficiency -
but the new objects have got pretty full test coverage.

You can download the new version (for testing) from `odict Beta 2
<http://www.voidspace.org.uk/cgi-bin/voidspace/downman.py?file=odictbeta2.py>`_

The following issues still remain :

* ``FancyOdict`` is a separate class from ``OrderedDict``.

Because this version is *undoubtably* less efficient than
OrderedDict, my current thinking is that I should leave them separate
(and have both available). Being able to operate on the
keys/values/items as sequences is for convenience only.

Anyone got a suggestion for a better name than ``FancyODict`` ?

* You can no longer access the key order directly. The old ``sequence``
attribute is depracated and will eventually go away.

You can currently alter the order (of keys, values *and* items) by
passing an iterable into those methods.
I'm playing this game, but I wonder how much of it really makes sense ;-)
Someone has suggested that this "smells bad" - and it ought to be
done through separate `setkeys``, ``setvalues``, and ``setitems``
methods.

I'm *inclined* to agree, but I don't feel strongly about it. Anyone
else got any opinions ?
I don't see a big difference. What's important, if you ask me, is
whether you get an idea of what will happen when you pass the iterable as
an argument. OTTOMH that suggests that keys/values/items[slice] = iterable
have more expectation-generating precedent, and in fact perhaps should define
what is to happen if no other constraints are violated.
* ``repr`` ought to return a value that ``eval`` could use to turn back
into an OrderedDict.

I have actually done an implementation of this, but it would mean
that *all* the doctests need to be changed. I *will* do this at some
point.
Point in favor of py.test's separation of test/tested I guess.
* Slice assignment.

The semantics for slice assignment are fiddly.

For example, what do you do if in a slice assignment a key collides
with an existing key ?
IMO, a slice defines the sliced-out as well as the untouched, and I think
it is an error to collide with the unouched part, since it requires either
modifying the "untouched" or ignoring incoming slice data.
Incoming can go by the rules of a plain dict constuctor, except preserving order,
and the rest should really be untouched IMO. BTW, I think new keys in the
incoming slice data should NOT wind up at the end of the overall dict, since
that violates the "untouched" part.
My current implementation does what an ordinary dictionary does,
the new value overwrites the previous one. This means that the
dictionary can reduce in size as the assignment progresses. {sm;:?}
That part is ok, though I wonder if some might want to have a strict-replacement
mode for an odict, so slices can't change sizes.
BTW just thought of a name: itemdict -- to suggest the item-list-ordered semantics.
I think this is preferable to raising an error and preventing
assignment. It does prevent an optimisation whereby I calculate the
indexes of all the new items in advance.
a strict_replacement=True kwyord arg to the descriptor could let you
do both.
It also means you can't rely on the index of a key from a slice
assignment, unless you know that there will be no key collisions.
Well, if you guarantee no name collisions in the source items by using
an ordered dict as source, and make collisions with the "untouched"
parts of a slice assignment an error, then you can rely on it.
IMO that would be a useful middle way.
In general I'm *against* preventing programmers from doing things,
so long as the documentation carries an appropriate warning. Me too, in general ;-)

An example will probably help illustrate this :

.. raw:: html

{+coloring}

d = OrderedDict()
d[1] = 1
d[2] = 2
d[3] = 3
d[4] = 4
d.keys()
[1, 2, 3]
Where's the 4?
(I.e., why not just take your examples off an interactive screen? ;-)
# fetching every other key
# using an extended slice
# this actually returns an OrderedDict
d[::2]
{1: 1, 3: 3}
OrderedDict([(1, 1), (3, 3)]) # in future, right?
(i.e. '%s(%s)' %( type(d).__name__. d.items()) # whatever the final name ;-)
# we can assign to every other key
# using an ordered dict
Why the restriction to ordered dict, other than as a convenient
way to pre-normalize the item list? Doesn't update accept a plain item list?
d[::2] = OrderedDict([(2, 9), (4, 8)])
i.e., why not accept plain [(2, 9), (4, 8)] here?
len(d) == 4
False
BTW, this one I don't do yet. I reject aslice.step not in (None, 1)
I seemed a bit too weird. But I guess I could just let the programmer decide ;-)
d
{2: 9, 4: 8}

"""
Because of the key collisions the length of
d has changed - it now only has two keys instead
of four.
"""
IMO the above collision is an error that should raise an exception,
because the collision is with a key outside the (albeit interlaced)
target slice, so you are effectively assigning outside the slice.

BTW, what about slices of keys and values? E.g. does

d.keys[:] = ['prefix_'+ key for key in d.keys()]

make sense? Note that it is different from d[:] = something.

Similarly d.values, thoug d.items[sliceinst] might be the same
as d[sliceinst] for assignment. d[sliceinst] returns an ordered dict though,
where as d.items[sliceinst] returns the same item list slice as
d.items()[sliceinst].

Regards,
Bengt Richter
 
F

Fuzzyman

Hello Bengt,

Bengt said:
Hello all,

I've just done a new "beta 2" version. It has a full version of
FancyODict with the custome "callable sequence objects" for keys,
values and items. They are almost completely covered by tests.

You can download the new(er) version from :
http://www.voidspace.org.uk/cgi-bin/voidspace/downman.py?file=odictbeta2.py
Ran my tests on it: this one may be of interest:
__________________________ entrypoint: test_popitem ___________________________

def test_popitem():
d = CandidateDict([(k,i*100) for i,k in enumerate('abcde')])
assert d.popitem() == ('e', 400)

[C:\pywk\ut\test\test_odicts.py:228]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
...
...
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

def __getattr__(self, name):
"""
Implemented so that access to ``sequence`` raises a warning.
[]
"""
if name == 'sequence':
warn('use of sequence attribute is deprecated. Use keys method '
'instead.', DeprecationWarning)
# NOTE: Still (currently) returns a direct reference. Need to
# because code that uses sequence will expect to be able to
# mutate it in place.
return self._sequence
else:
# NOTE: strictly unnecessary, but it raises the appropriate error
getattr(self, name)

[c:\pywk\clp\odictbeta2.py:309]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Recursion detected (same locals & position)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============= tests finished: 19 passed, 9 failed in 1.57 seconds =============

Yep the line :

getattr(self, name)

is a bug.

I've replaced it now (not yet "published") with
object.__getattr__(self, name)
It could equally just be replaced with a raise AttributeError

You've picked up on another typo in the code, whichis odd because
``popitem`` is tested.

Oh well - works now.

[snip..]
I'm playing this game, but I wonder how much of it really makes sense ;-)

Can you explain what you mean ?

Someone has suggested that this "smells bad" - and it ought to be
done through separate `setkeys``, ``setvalues``, and ``setitems``
methods.

I'm *inclined* to agree, but I don't feel strongly about it. Anyone
else got any opinions ?
I don't see a big difference. What's important, if you ask me, is
whether you get an idea of what will happen when you pass the iterable as
an argument. OTTOMH that suggests that keys/values/items[slice] = iterable
have more expectation-generating precedent, and in fact perhaps should define
what is to happen if no other constraints are violated.

Don't completely follow.

Are you saying that

d = OrderedDict(some_iterable)
d.keys[slice] = iterable

is more *understandable* ("expectation-generating precedent" WTF) ?

The alternative being : d.keys(iterable)

In which case I'm inclined to agree. *However* - having custom objects
for items/keys/values is undoubtably less efficient.

Therefore I'm supplying two classes OrderedDict and FancyODict (name
not yet fixed).

OrderedDict needs a way of changing the keys - so it needs either

d.keys(iterable)

*or*

d.setkeys(iterable)

Which is preferable, is my question.

I think you're saying you don't care :)
In which case I will leave it as it presently is.
Point in favor of py.test's separation of test/tested I guess.

Probably. :)

doctest is part of the standard library - which is a point in it's
favour.

IMO, a slice defines the sliced-out as well as the untouched, and I think
it is an error to collide with the unouched part, since it requires either
modifying the "untouched" or ignoring incoming slice data.
Incoming can go by the rules of a plain dict constuctor, except preserving order,
and the rest should really be untouched IMO. BTW, I think new keys in the
incoming slice data should NOT wind up at the end of the overall dict, since
that violates the "untouched" part.

"new keys in the incoming slice data should NOT wind up at the end of
the overall dict"

I'm not a 100% certain I understand you, and I possibly didn't explain
myself properly.

If you assign to a slice, and there is a key collision. The old
(pre-existing) key is removed and the new inserted at the expected
index - so it doesn't end up at the "end".

That part is ok, though I wonder if some might want to have a strict-replacement
mode for an odict, so slices can't change sizes.
BTW just thought of a name: itemdict -- to suggest the item-list-ordered semantics.
a strict_replacement=True kwyord arg to the descriptor could let you
do both.

Then I have to implement *both* ways of doing it ! I'll consider it.
Well, if you guarantee no name collisions in the source items by using
an ordered dict as source, and make collisions with the "untouched"
parts of a slice assignment an error, then you can rely on it.
IMO that would be a useful middle way.

I'll consider that as well. As I said I don't like raising errors when
I don't have to. If the programmer wants to guarantee there are no
collisions then he can check himself. :)
In general I'm *against* preventing programmers from doing things,
so long as the documentation carries an appropriate warning.
Me too, in general ;-) [snip..]
# we can assign to every other key
# using an ordered dict
Why the restriction to ordered dict, other than as a convenient
way to pre-normalize the item list? Doesn't update accept a plain item list?
Allow slice assignment of a list of tuples like update... hmmm...

Maybe if anyone asks for it "in the wild".

d[::2] = OrderedDict([(2, 9), (4, 8)])
i.e., why not accept plain [(2, 9), (4, 8)] here?
len(d) == 4
False
BTW, this one I don't do yet. I reject aslice.step not in (None, 1)
I seemed a bit too weird. But I guess I could just let the programmer decide ;-)

I didn't see any reason to restrict it, seeing as for my (unoptimised)
implementation it doesn't make any difference.
IMO the above collision is an error that should raise an exception,
because the collision is with a key outside the (albeit interlaced)
target slice, so you are effectively assigning outside the slice.

Yeah - maybe. Still not sure about this.
BTW, what about slices of keys and values? E.g. does

d.keys[:] = ['prefix_'+ key for key in d.keys()]

make sense? Note that it is different from d[:] = something.

Yep - it allows you to change the keys in place. (Retaining the
position in the sequence)
Similarly d.values, thoug d.items[sliceinst] might be the same
as d[sliceinst] for assignment. d[sliceinst] returns an ordered dict though,
where as d.items[sliceinst] returns the same item list slice as
d.items()[sliceinst].

Yep. Lots to consider. Only a few minor tweaks and my implementation is
basically working enough to release.

All the best,


Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,158
Latest member
Vinay_Kumar Nevatia
Top