PEP 372 -- Adding an ordered directory to collections

A

Armin Ronacher

Abstract
========

This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short. The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.


Rationale
=========

In current Python versions, the widely used built-in dict type does
not specify an order for the key/value pairs stored. This makes it
hard to use dictionaries as data storage for some specific use cases.

Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
certain order on iteration. In those languages, and existing Python
ordered-dict implementations, the ordering of items is defined by the
time of insertion of the key. New keys are appended at the end, keys
that are overwritten and not moved.

The following example shows the behavior for simple assignments:
d = odict()
d['parrot'] = 'dead'
d['penguin'] = 'exploded'
d.items()
[('parrot', 'dead'), ('penguin', 'exploded')]

That the ordering is preserved makes an odict useful for a couple of
situations:

- XML/HTML processing libraries currently drop the ordering of
attributes, use a list instead of a dict which makes filtering
cumbersome, or implement their own ordered dictionary. This affects
ElementTree, html5lib, Genshi and many more libraries.

- There are many ordererd dict implementations in various libraries
and applications, most of them subtly incompatible with each other.
Furthermore, subclassing dict is a non-trivial task and many
implementations don't override all the methods properly which can
lead to unexpected results.

Additionally, many ordered dicts are implemented in an inefficient
way, making many operations more complex then they have to be.

- PEP 3115 allows metaclasses to change the mapping object used for
the class body. An ordered dict could be used to create ordered
member declarations similar to C structs. This could be useful, for
example, for future ``ctypes`` releases as well as ORMs that define
database tables as classes, like the one the Django framework ships.
Django currently uses an ugly hack to restore the ordering of
members in database models.

- Code ported from other programming languages such as PHP often
depends on a ordered dict. Having an implementation of an
ordering-preserving dictionary in the standard library could ease
the transition and improve the compatibility of different libraries.


Ordered Dict API
================

The ordered dict API would be mostly compatible with dict and existing
ordered dicts. (Note: this PEP refers to the Python 2.x dictionary
API; the transfer to the 3.x API is trivial.)

The constructor and ``update()`` both accept iterables of tuples as
well as mappings like a dict does. The ordering however is preserved
for the first case:
d = odict([('a', 'b'), ('c', 'd')])
d.update({'foo': 'bar'})
d
collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])

If ordered dicts are updated from regular dicts, the ordering of new
keys is of course undefined again unless ``sort()`` is called.

All iteration methods as well as ``keys()``, ``values()`` and
``items()`` return the values ordered by the the time the key-value
pair was inserted:
d['spam'] = 'eggs'
d.keys() ['a', 'c', 'foo', 'spam']
d.values() ['b', 'd', 'bar', 'eggs']
d.items()
[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]

New methods not available on dict:

``odict.byindex(index)``

Index-based lookup is supported by ``byindex()`` which returns
the key/value pair for an index, that is, the "position" of a
key in the ordered dict. 0 is the first key/value pair, -1
the last.
('foo', 'bar')

``odict.sort(cmp=None, key=None, reverse=False)``

Sorts the odict in place by cmp or key. This works exactly
like ``list.sort()``, but the comparison functions are passed
a key/value tuple, not only the value.
>>> d = odict([(42, 1), (1, 4), (23, 7)])
>>> d.sort()
>>> d
collections.odict([(1, 4), (23, 7), (42, 1)])

``odict.reverse()``

Reverses the odict in place.

``odict.__reverse__()``

Supports reverse iteration by key.


Questions and Answers
=====================

What happens if an existing key is reassigned?

The key is not moved but assigned a new value in place. This is
consistent with existing implementations and allows subclasses to
change the behavior easily::

class movingcollections.odict):
def __setitem__(self, key, value):
self.pop(key, None)
odict.__setitem__(self, key, value)

What happens if keys appear multiple times in the list passed to the
constructor?

The same as for regular dicts: The latter item overrides the
former. This has the side-effect that the position of the first
key is used because the key is actually overwritten:
>>> odict([('a', 1), ('b', 2), ('a', 3)])
collections.odict([('a', 3), ('b', 2)])

This behavior is consistent with existing implementations in
Python, the PHP array and the hashmap in Ruby 1.9.

Why is there no ``odict.insert()``?

There are few situations where you really want to insert a key at
an specified index. To avoid API complication, the proposed
solution for this situation is creating a list of items,
manipulating that and converting it back into an odict:
>>> d = odict([('a', 42), ('b', 23), ('c', 19)])
>>> l = d.items()
>>> l.insert(1, ('x', 0))
>>> odict(l)
collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)])


Example Implementation
======================

A poorly performing example implementation of the odict written in
Python is available:

`odict.py <http://dev.pocoo.org/hg/sandbox/raw-file/tip/
odict.py>`_

The version for ``collections`` should be implemented in C and use a
linked list internally.

Other implementations of ordered dicts in various Python projects or
standalone libraries, that inspired the API proposed here, are:

- `odict in Babel`_
- `OrderedDict in Django`_
- `The odict module`_
- `ordereddict`_ (a C implementation of the odict module)
- `StableDict`_
- `Armin Rigo's OrderedDict`_


... _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
... _OrderedDict in Django:
http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53
... _The odict module: http://www.voidspace.org.uk/python/odict.html
... _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
... _StableDict: http://pypi.python.org/pypi/StableDict/0.2
... _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py


Future Directions
=================

With the availability of an ordered dict in the standard library,
other libraries may take advantage of that. For example, ElementTree
could return odicts in the future that retain the attribute ordering
of the source file.


Copyright
=========

This document has been placed in the public domain.
 
M

Michele Simionato

Abstract
========

This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short.  The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.

+1, I have been waiting for an odict in the library for at least 5
years.


Michele Simionato
 
W

Wolfgang Grafen

Armin said:
Other implementations of ordered dicts in various Python projects or
standalone libraries, that inspired the API proposed here, are:

- `odict in Babel`_
- `OrderedDict in Django`_
- `The odict module`_
- `ordereddict`_ (a C implementation of the odict module)
- `StableDict`_
- `Armin Rigo's OrderedDict`_


.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
.. _OrderedDict in Django:
http://code.djangoproject.com/browser/django/trunk/django/utils/datastructures.py?rev=7140#L53
.. _The odict module: http://www.voidspace.org.uk/python/odict.html
.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2
.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py
I want add to this list my seqdict package, maybe the first
implementation of an ordered dict in Python?
http://home.arcor.de/wolfgang.grafen/Python/Modules/Modules.html

I have never seen a real world dictionary which wasn't in order, so I
ever felt the Python dictionary was not complete. I support the idea of
an ordered dictionary in Python.

Best regards

Wolfgang
 
B

bearophileHUGS

Oh, very good, better late than never.
This is my pure Python version, it performs get, set and del
operations too in O(1):
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498195
Working with C such data structure becomes much faster, because it can
use true pointers.

Then another data structure that may be named SortedDict can be
created, that keeps items in their natural order (instead of the
insertion order).

Bye,
bearophile
 
J

Jeroen Ruigrok van der Werven

-On [20080616 10:41] said:
This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short.

I fully support adding this. In my work I have had to create this datatype a
few times already.

The suggested design choices seem logical to me.

+1
 
C

Calvin Spealman

This gets my +1, for what its worth.

I don't really see a good reason not to include the insert() method,
however. I don't see that it would complicate things much, if at all.
>>> d = odict([('a', 42), ('b', 23)])
>>> d.insert(1, ('c', 19))
>>> d
collections.odict([('a', 42), ('c', 19), ('b', 23)])

Abstract
========

This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short. The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.


Rationale
=========

In current Python versions, the widely used built-in dict type does
not specify an order for the key/value pairs stored. This makes it
hard to use dictionaries as data storage for some specific use cases.

Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
certain order on iteration. In those languages, and existing Python
ordered-dict implementations, the ordering of items is defined by the
time of insertion of the key. New keys are appended at the end, keys
that are overwritten and not moved.

The following example shows the behavior for simple assignments:
d = odict()
d['parrot'] = 'dead'
d['penguin'] = 'exploded'
d.items()
[('parrot', 'dead'), ('penguin', 'exploded')]

That the ordering is preserved makes an odict useful for a couple of
situations:

- XML/HTML processing libraries currently drop the ordering of
attributes, use a list instead of a dict which makes filtering
cumbersome, or implement their own ordered dictionary. This affects
ElementTree, html5lib, Genshi and many more libraries.

- There are many ordererd dict implementations in various libraries
and applications, most of them subtly incompatible with each other.
Furthermore, subclassing dict is a non-trivial task and many
implementations don't override all the methods properly which can
lead to unexpected results.

Additionally, many ordered dicts are implemented in an inefficient
way, making many operations more complex then they have to be.

- PEP 3115 allows metaclasses to change the mapping object used for
the class body. An ordered dict could be used to create ordered
member declarations similar to C structs. This could be useful, for
example, for future ``ctypes`` releases as well as ORMs that define
database tables as classes, like the one the Django framework ships.
Django currently uses an ugly hack to restore the ordering of
members in database models.

- Code ported from other programming languages such as PHP often
depends on a ordered dict. Having an implementation of an
ordering-preserving dictionary in the standard library could ease
the transition and improve the compatibility of different libraries.


Ordered Dict API
================

The ordered dict API would be mostly compatible with dict and existing
ordered dicts. (Note: this PEP refers to the Python 2.x dictionary
API; the transfer to the 3.x API is trivial.)

The constructor and ``update()`` both accept iterables of tuples as
well as mappings like a dict does. The ordering however is preserved
for the first case:
d = odict([('a', 'b'), ('c', 'd')])
d.update({'foo': 'bar'})
d
collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])

If ordered dicts are updated from regular dicts, the ordering of new
keys is of course undefined again unless ``sort()`` is called.

All iteration methods as well as ``keys()``, ``values()`` and
``items()`` return the values ordered by the the time the key-value
pair was inserted:
d['spam'] = 'eggs'
d.keys() ['a', 'c', 'foo', 'spam']
d.values() ['b', 'd', 'bar', 'eggs']
d.items()
[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]

New methods not available on dict:

``odict.byindex(index)``

Index-based lookup is supported by ``byindex()`` which returns
the key/value pair for an index, that is, the "position" of a
key in the ordered dict. 0 is the first key/value pair, -1
the last.
('foo', 'bar')

``odict.sort(cmp=None, key=None, reverse=False)``

Sorts the odict in place by cmp or key. This works exactly
like ``list.sort()``, but the comparison functions are passed
a key/value tuple, not only the value.
d = odict([(42, 1), (1, 4), (23, 7)])
d.sort()
d
collections.odict([(1, 4), (23, 7), (42, 1)])

``odict.reverse()``

Reverses the odict in place.

``odict.__reverse__()``

Supports reverse iteration by key.


Questions and Answers
=====================

What happens if an existing key is reassigned?

The key is not moved but assigned a new value in place. This is
consistent with existing implementations and allows subclasses to
change the behavior easily::

class movingcollections.odict):
def __setitem__(self, key, value):
self.pop(key, None)
odict.__setitem__(self, key, value)

What happens if keys appear multiple times in the list passed to the
constructor?

The same as for regular dicts: The latter item overrides the
former. This has the side-effect that the position of the first
key is used because the key is actually overwritten:
odict([('a', 1), ('b', 2), ('a', 3)])
collections.odict([('a', 3), ('b', 2)])

This behavior is consistent with existing implementations in
Python, the PHP array and the hashmap in Ruby 1.9.

Why is there no ``odict.insert()``?

There are few situations where you really want to insert a key at
an specified index. To avoid API complication, the proposed
solution for this situation is creating a list of items,
manipulating that and converting it back into an odict:
d = odict([('a', 42), ('b', 23), ('c', 19)])
l = d.items()
l.insert(1, ('x', 0))
odict(l)
collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)])


Example Implementation
======================

A poorly performing example implementation of the odict written in
Python is available:

`odict.py <http://dev.pocoo.org/hg/sandbox/raw-file/tip/
odict.py>`_

The version for ``collections`` should be implemented in C and use a
linked list internally.

Other implementations of ordered dicts in various Python projects or
standalone libraries, that inspired the API proposed here, are:

- `odict in Babel`_
- `OrderedDict in Django`_
- `The odict module`_
- `ordereddict`_ (a C implementation of the odict module)
- `StableDict`_
- `Armin Rigo's OrderedDict`_


.. _odict in Babel: http://babel.edgewall.org/browser/trunk/babel/
util.py?rev=374#L178
.. _OrderedDict in Django:
http://code.djangoproject.com/browser/django/trunk/django/utils/
datastructures.py?rev=7140#L53
.. _The odict module: http://www.voidspace.org.uk/python/odict.html
.. _ordereddict: http://www.xs4all.nl/~anthon/Python/ordereddict/
.. _StableDict: http://pypi.python.org/pypi/StableDict/0.2
.. _Armin Rigo's OrderedDict: http://codespeak.net/svn/user/arigo/
hack/pyfuse/OrderedDict.py


Future Directions
=================

With the availability of an ordered dict in the standard library,
other libraries may take advantage of that. For example, ElementTree
could return odicts in the future that retain the attribute ordering
of the source file.


Copyright
=========

This document has been placed in the public domain.
 
P

Paul McGuire

1. With the current dict, the following code

a = { "A" : 1, "B" : 2 }
b = { "B" : 2, "A" : 1 }

a==b evaluates to True. I assume that if these were odicts, this
would evaluate to False.

2. Will odict.byindex support slicing?

3. Will odict inherit from dict?

4. The current dict API (as of Python 2.5) is given by dir as:

['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__',
'__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setitem__', '__str__', 'clear', 'copy',
'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys',
'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update',
'values']

If I read your PEP correctly, you propose adding:

byindex
sort
reverse
__reverse__ (should be '__reversed__', to match list's reverse
iterator)

(This isn't really a question, your PEP was not completely explicit
about what odict's API would be, so I thought enumerating what dict
does currently might illuminate some other unresolved aspects of
odict. For instance, since odict is somewhat list-like in its notion
of keys and values, will pop support list-like popping as well as what
dict currently provides? Will you need a 'popbyindex' for the same
reason you need 'byindex', so that you have unambiguous lookup of
items by index vs. by key. Perhaps doing dir(list) will help you to
see other items to resolve/clarify.)

5. The more I think and write about this, the more struck I am at the
similarity of odict and the ParseResults class in pyparsing. For
instance, in ParseResults, there is also support for dict-style and
list-style item referencing, and I chose to restrict some cases so
that using [] notation would not be ambiguous. You might want to add
pyparsing.ParseResults as another reference of current "odicts in the
wild" (although ParseResults implements quite a bit of additional
behavior that would not be required or necessarily appropriate for
odict).

I vote +0 on this PEP - I've never personally needed an odict, but I
can see how some of the XML and ORM coding would be simplified by one.

-- Paul

(I could go either way on the name 'odict' or 'ordereddict'. Given
the desire for this to evenutally become a built-in, keep the name
short. On the other hand, collections already has 'defaultdict', so
is 'ordereddict' so bad?)
 
P

Paul McGuire

:)

Yes, I thought about that even as I was writing that post. But I also said,
"ParseResults implements quite a bit of additional behavior that would not
be required or necessarily appropriate for odict." Even if odict existed,
I think I would have needed ParseResults anyway (but using an odict
internally might have simplified things for me, instead of the combined list
and dict that I have now).

-- Paul


-----Original Message-----
From: Shane Geiger [mailto:[email protected]]
Sent: Monday, June 16, 2008 9:20 AM
To: Paul McGuire
Cc: (e-mail address removed)
Subject: Re: PEP 372 -- Adding an ordered directory to collections

Paul,

You seem to be contradicting yourself. You may have never needed an odict,
yet you seem to have implemented one on your own. Maybe you needed one but
did not realize it?

Shane
5. The more I think and write about this, the more struck I am at the
similarity of odict and the ParseResults class in pyparsing. For
instance, in ParseResults, there is also support for dict-style and
list-style item referencing, and I chose to restrict some cases so
that using [] notation would not be ambiguous. You might want to add
pyparsing.ParseResults as another reference of current "odicts in the
wild" (although ParseResults implements quite a bit of additional
behavior that would not be required or necessarily appropriate for
odict).

I vote +0 on this PEP - I've never personally needed an odict, but I
can see how some of the XML and ORM coding would be simplified by one.

--
Shane Geiger
IT Director
National Council on Economic Education
(e-mail address removed) | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy
 
G

Guest

I'm very happy to see this PEP. I have needed to use ordered
dictionaries many times, and this has always felt to me like a
surprising omission from Python.
 
B

bruno.desthuilliers

Abstract
========

This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short. The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.
(snip)

+1

"insertion-ordered" dicts are something I often need, and while there
are usable workarounds, they either require boilerplate code,
introduce external dependancies to (possibly unoptimised) code or make
you reinvent the (square) wheel, all of which is (IMHO) below average
Python's standard regarding data structures.
 
M

Martin v. Löwis

``odict.byindex(index)``
Index-based lookup is supported by ``byindex()`` which returns
the key/value pair for an index, that is, the "position" of a
key in the ordered dict. 0 is the first key/value pair, -1
the last.

('foo', 'bar')

For this API, I think it's important to make some performance
guarantees. It seems fairly difficult to make byindex O(1), and
simultaneously also make insertion/deletion better than O(n).

IOW, the PEP should somehow specify which operations are efficient,
and which ones aren't.

Regards,
Martin
 
B

bearophileHUGS

Martin v. L.:
For this API, I think it's important to make some performance guarantees.

I may appreciate them for all Python collections :)

It seems fairly difficult to make byindex O(1), and
simultaneously also make insertion/deletion better than O(n).

It may be possible to make both of them O(log n), but I may appreciate
byindex O(n) and insertion/deletion O(1).

Bye,
bearophile
 
P

Paul McGuire

For this API, I think it's important to make some performance
guarantees. It seems fairly difficult to make byindex O(1), and
simultaneously also make insertion/deletion better than O(n).

IOW, the PEP should somehow specify which operations are efficient,
and which ones aren't.

Regards,
Martin

My guess is that the two main memory allocate/deallocate cases are 1)
appending a new item to the end, and 2) GC'ing the entire data
structure. I would optimize these 2 at the expense of all others.

-- Paul
 
M

Martin v. Löwis

For this API, I think it's important to make some performance guarantees.
I may appreciate them for all Python collections :)
See

http://wiki.python.org/moin/TimeComplexity


It may be possible to make both of them O(log n), but I may appreciate
byindex O(n) and insertion/deletion O(1).

If so, what's the advantage of using that method over d.items[n]?

Regards,
Martin
 
M

Martin v. Löwis

My guess is that the two main memory allocate/deallocate cases are 1)
appending a new item to the end, and 2) GC'ing the entire data
structure. I would optimize these 2 at the expense of all others.

Does that include dictionary lookups?

Regards,
Martin
 
P

Paul McGuire

Does that include dictionary lookups?

Regards,
Martin


Well, I did (try to) qualify my guess as pertaining specifically to
memory allocate/deallocate cases, the other topics in the nearby posts
were talking about updates to the odict, so I hope I can be forgiven
this oversight. But you're right, the keyed access is one of the main
reasons this is being done with an odict instead of just a list of
tuples.

The point I was trying to make was that sometimes, the GC case occurs
far more frequently than other update methods, yet is overlooked
because it happens invisibly to most Python users.

I worked on a project a while ago where there was a huge performance
penalty in releasing a list structure, because each node was freed
individually, causing surrounding freelist entries to be updated for
every node. In typical worst-case scenario fashion, a second list was
built at the same time as the first, and the default system memory
allocator interleaved the list nodes. The truly frustrating part was
that at this point in the program, we were trying to free *all* the
nodes in *both* lists, and would have preferred to just release the
whole chunk of memory, if it had just been allocated in a large chunk
instead of a node at a time. I implemented a zone-based memory
allocator, so that nodes were allocated within a memory zone, but when
we were done, we just released the entire zone, with tremendous
improvement in system performance.

But what kind of keyed access is likely to be used on an odict? For
those cases where all of the items are desired, then I would recommend
that the odict iterator be preferred, for this type of retrieval:

for k,v in odictvar.iteritems():
# do something with key k and value v

and discourage this usage:

for k in odictvar.keys(): # or iterkeys()
# do something with key k and value odictvar[k]

How about this as a simple first approach? Let's assume that the
nominal usage pattern for an odict is 1) create all entries in the
odict, probably using append, 2) access some or all of the entries,
either by iterating over the keys or values, or by keyed access to
specific values, and 3) delete the odict. Have the odict keep an
internal dict representing the keyed lookups, and have this internal
dict set to null whenever the odict is updated. When the odict is
accessed by specific key, the internal dict is used - if it null, it
is built using dict(odict.iteritems()). In the nominal usage, this
dict will only be built once, since the keyed accesses wont begin
until after all of the key-value pairs have been added to the odict.

Or as a first-first approach (to avoid premature optimization), just
implement the odict as a list of tuples, and do linear search for
matching key if accessed by key.

I would bias any performance choices about keyed access toward cases
where the queried key exists in the odict - I think that in those
cases where odicts are used, such as ORMs and XML, the keys for a
particular odict are known and expected to exist. Those applications
where the keys are not known are likely to be meta-type apps, like
debuggers and introspection GUIs. I contend that the meat-and-
potatoes production apps will be those like database queries - when I
get an odict back after querying an EMPLOYEES table, I really should
reasonably expect odictvar["EMPNO"] to exist.

And what would be the conditions of an abusive form of odict? How
about an application log kept in an odict, keyed by timestamp? This
could have many 1000's of entries, and yet, I would guess that an
odict would be the wrong data structure for this, and that a list of
tuples or LogMessage objects would be more appropriate. But someone
is likely to do it - if they do, what will happen? Perhaps trying to
anticipate "abusive" or degenerate uses of an odict might shed some
light on ways to avoid, discourage, or maybe even accommodate these
uses in odict.

Well, this has become something of a rant, and a speculative one at
that, but I think the "delete the entire data structure" memory case
should be given reasonably high design attention. (And maybe this is
already the norm in Python development - I'm really quite ignorant of
this process, not being in the Python development circle.)

Always nice to hear from you Martin - cheers!

-- Paul
 
B

bearophileHUGS

Martin v. L.:

Thank you, I think that's not a list of guarantees, while a list of
how things are now in CPython.

If so, what's the advantage of using that method over d.items[n]?

I think I have lost the thread here, sorry. So I explain again what I
mean. I think for this data structure it's important to keep all the
normal dict operations at the same speed. If you use a C
implementation vaguely similar to my pure python recipe you can
perform the del in O(1) too, because pairs are joined in (double)
linked list. But such data structure is O(n) to find the n-th item
inserted into the sequence.

Bye,
bearophile
 
M

Martin v. Löwis

I think I have lost the thread here, sorry. So I explain again what I
mean. I think for this data structure it's important to keep all the
normal dict operations at the same speed. If you use a C
implementation vaguely similar to my pure python recipe you can
perform the del in O(1) too, because pairs are joined in (double)
linked list. But such data structure is O(n) to find the n-th item
inserted into the sequence.

Right. So byindex(n) would be O(n) then, right? If so, what's the
purpose of having that method in the first place?

The PEP doesn't give a rationale, but just proposes that the method
be there. My guess is that it includes it for performance reasons.
However, I think the PEP (author) is misguided in assuming that
making byindex() a method of odict, you get better performance than
directly doing .items()[n] - which, as you say, you won't.

Regards,
Martin
 
M

Martin v. Löwis

Well, this has become something of a rant,

Indeed - and I was only asking about .byindex(n) :)

I don't know why that method is included in the PEP. Speculating
myself, I assume that the PEP author wants it to be O(1). As
bearophile explains, that's not possible/not a good idea.

As for releasing the odict - that will likely have mostly the
same cost as releasing a regular dictionary, except that you
might have to look at each key twice (once in the regular dict,
once in the structure preserving the order). It might be
that the odict uses a linked-list style approach, which indeed
would nto be so good from a allocation/deallocation overhead
point of view. With the approach bearophile suggests (i.e.
each slot in the dict has a reference to the predecessor and
successor key also), a pure Python implementation would normally
allocate additional memory per key, whereas a pure C implementation
wouldn't (adding the space for the prev/next pointer right into
the slot's struct definition.

Regards,
Martin
 
B

bearophileHUGS

Martin v. L.:
However, I think the PEP (author) is misguided in assuming that
making byindex() a method of odict, you get better performance than
directly doing .items()[n] - which, as you say, you won't.

In Python 2.5 .items()[n] creates a whole list, and then takes one
item of such list.
An O(n) byindex() just scans the items to return the n-th.
So while being both O(n) in time, the .items()[n] may allocate quite
more memory.

Bye,
bearophile
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top