Ordered Sets

Discussion in 'Python' started by Raymond Hettinger, Mar 19, 2009.

  1. Raymond Hettinger, Mar 19, 2009
    #1
    1. Advertising

  2. Raymond Hettinger

    Aahz Guest

    In article <>,
    Raymond Hettinger <> wrote:
    >
    >Here's a new, fun recipe for you guys:
    >
    >http://code.activestate.com/recipes/576694/


    That is *sick* and perverted.
    --
    Aahz () <*> http://www.pythoncraft.com/

    "Programming language design is not a rational science. Most reasoning
    about it is at best rationalization of gut feelings, and at worst plain
    wrong." --GvR, python-ideas, 2009-3-1
    Aahz, Mar 19, 2009
    #2
    1. Advertising

  3. Raymond Hettinger

    Nigel Rantor Guest

    Aahz wrote:
    > In article <>,
    > Raymond Hettinger <> wrote:
    >> Here's a new, fun recipe for you guys:
    >>
    >> http://code.activestate.com/recipes/576694/

    >
    > That is *sick* and perverted.


    I'm not sure why.

    Would it be less sick if it had been called "UniqueList" ?

    n
    Nigel Rantor, Mar 20, 2009
    #3
  4. Raymond Hettinger

    Aahz Guest

    In article <>,
    Nigel Rantor <> wrote:
    >Aahz wrote:
    >> In article <>,
    >> Raymond Hettinger <> wrote:
    >>>
    >>> Here's a new, fun recipe for you guys:
    >>>
    >>> http://code.activestate.com/recipes/576694/

    >>
    >> That is *sick* and perverted.

    >
    >I'm not sure why.
    >
    >Would it be less sick if it had been called "UniqueList" ?


    The doubly-linked list part is what's sick and perverted.
    --
    Aahz () <*> http://www.pythoncraft.com/

    "Programming language design is not a rational science. Most reasoning
    about it is at best rationalization of gut feelings, and at worst plain
    wrong." --GvR, python-ideas, 2009-3-1
    Aahz, Mar 20, 2009
    #4
  5. [Aahz]
    > The doubly-linked list part is what's sick and perverted.


    The doubly-linked list part is what gives it big-oh running
    times the same as regular sets. If the underlying sequence
    is stored as a list, then deletion goes from O(1) to O(n).
    If the insertion times are recorded in a dictionary, then
    the insertion order has to be sorted prior to iteration
    which makes iteration go from O(n) to O(n log n).


    Raymond
    Raymond Hettinger, Mar 20, 2009
    #5
  6. [Scott David Daniels]
    > The double-linked list part could be done with weakrefs and
    > perhaps Aahz would relax a bit.


    Am thinking that the __del__() takes care of business.
    The clear() operation removes the links and all references
    to them (including circular).


    Raymond
    Raymond Hettinger, Mar 20, 2009
    #6
  7. Raymond Hettinger

    Aaron Brady Guest

    On Mar 20, 1:50 pm, Scott David Daniels <> wrote:
    > Raymond Hettinger wrote:
    > > [Aahz]
    > >> The doubly-linked list part is what's sick and perverted.

    >
    > > The doubly-linked list part is what gives it big-oh running
    > > times the same as regular sets.  If the underlying sequence
    > > is stored as a list, then deletion goes from O(1) to O(n).
    > > If the insertion times are recorded in a dictionary, then
    > > the insertion order has to be sorted prior to iteration
    > > which makes iteration go from O(n) to O(n log n).

    >
    > The double-linked list part could be done with weakrefs and
    > perhaps Aahz would relax a bit.
    >
    > --Scott David Daniels
    >


    Whether the structure holds one reference to its "containees" or three
    doesn't matter. But you're right, weakrefs are relaxing.

    I pronounce it the perfect data structure. Tell your CS 200 professor
    immediately. Shout it from the rooftops.

    IINM if I'm not mistaken, an O(1) 'multiadd' method wouldn't be hard
    either. If the item exists already, insert it in place in the linked
    list.
    Aaron Brady, Mar 21, 2009
    #7
  8. On Fri, 20 Mar 2009 11:50:28 -0700, Scott David Daniels wrote:

    > Raymond Hettinger wrote:
    >> [Aahz]
    >>> The doubly-linked list part is what's sick and perverted.

    >>
    >> The doubly-linked list part is what gives it big-oh running times the
    >> same as regular sets. If the underlying sequence is stored as a list,
    >> then deletion goes from O(1) to O(n). If the insertion times are
    >> recorded in a dictionary, then the insertion order has to be sorted
    >> prior to iteration which makes iteration go from O(n) to O(n log n).

    >
    > The double-linked list part could be done with weakrefs and perhaps Aahz
    > would relax a bit.


    I'm not sure whether Aahz's description is meant to be taken seriously,
    or if there is a smiley hidden in his post. After all, doubly-linked
    lists are a standard building block in data structure design.

    But assuming it is meant as a serious criticism, I'm not sure that I like
    the idea of using weakrefs. I think it is a feature, not a bug, that
    sets, lists and dicts keep a reference to the items in them. I think it
    would be bizarre if putting an item into an OrderSet meant that you
    needed to put it somewhere else as well to prevent it being garbage
    collected.

    Or have I misunderstood the consequences of using weakrefs?



    --
    Steven
    Steven D'Aprano, Mar 21, 2009
    #8
  9. Raymond Hettinger

    Terry Reedy Guest

    Steven D'Aprano wrote:
    > On Fri, 20 Mar 2009 11:50:28 -0700, Scott David Daniels wrote:
    >
    >> Raymond Hettinger wrote:
    >>> [Aahz]
    >>>> The doubly-linked list part is what's sick and perverted.
    >>> The doubly-linked list part is what gives it big-oh running times the
    >>> same as regular sets. If the underlying sequence is stored as a list,
    >>> then deletion goes from O(1) to O(n). If the insertion times are
    >>> recorded in a dictionary, then the insertion order has to be sorted
    >>> prior to iteration which makes iteration go from O(n) to O(n log n).

    >> The double-linked list part could be done with weakrefs and perhaps Aahz
    >> would relax a bit.

    >
    > I'm not sure whether Aahz's description is meant to be taken seriously,
    > or if there is a smiley hidden in his post. After all, doubly-linked
    > lists are a standard building block in data structure design.
    >
    > But assuming it is meant as a serious criticism, I'm not sure that I like
    > the idea of using weakrefs. I think it is a feature, not a bug, that
    > sets, lists and dicts keep a reference to the items in them. I think it
    > would be bizarre if putting an item into an OrderSet meant that you
    > needed to put it somewhere else as well to prevent it being garbage
    > collected.
    >
    > Or have I misunderstood the consequences of using weakrefs?


    I think the idea was 2 weakrefs and 1 normal ref instead of 3 normal refs.
    Terry Reedy, Mar 21, 2009
    #9
  10. [Terry Reedy]
    > I think the idea was 2 weakrefs and 1 normal ref instead of 3 normal refs.


    Right. Here's a link to a weakref version (though I think the
    previous __del__ version also does the trick):

    http://code.activestate.com/recipes/576696/


    Raymond
    Raymond Hettinger, Mar 21, 2009
    #10
  11. Raymond Hettinger

    pataphor Guest

    Raymond Hettinger wrote:

    > Right. Here's a link to a weakref version (though I think the
    > previous __del__ version also does the trick):
    >
    > http://code.activestate.com/recipes/576696/


    Interesting. But how about this one? Does it also have the reference
    problem? By the way collections.MutableSet needs python >= 2.6

    P.

    import collections

    class OrderedSet(collections.MutableSet):
    'Set that remembers the order elements were added'
    # Big-O running times for all methods are the same as for regular sets.

    def __init__(self, iterable=None):
    self.__map = {} # key --> [prev,key,next]
    if iterable is not None:
    self |= iterable

    def __len__(self):
    return len(self.__map)

    def __contains__(self, key):
    return key in self.__map

    def add(self, key):
    if not self.__map:
    self.start = key
    self.__map = {key: [key,key,key]}
    elif key not in self.__map:
    a,b,c = self.__map[self.start]
    self.__map[a][2] = key
    self.__map[0] = key
    self.__map[key] = [a,key,b]

    def discard(self, key):
    if key in self.__map:
    a,b,c = self.__map.pop(key)
    if self.__map:
    self.__map[a][2] = c
    self.__map[c][0] = a
    if b == self.start:
    self.start = c

    def __iter__(self):
    if self.__map:
    a,b,c = self.__map[self.start]
    while c is not self.start:
    yield b
    a,b,c = self.__map[c]
    yield b

    def __reversed__(self):
    if self.__map:
    a,b,c = self.__map[self.start]
    while a is not self.start:
    yield a
    a,b,c = self.__map[a]
    yield a

    def pop(self, last=True):
    if not self:
    raise KeyError('set is empty')
    key = next(reversed(self)) if last else next(iter(self))
    self.discard(key)
    return key

    def __repr__(self):
    if not self:
    return '%s()' % (self.__class__.__name__,)
    return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
    if isinstance(other, OrderedSet):
    return len(self) == len(other) and list(self) == list(other)
    return not self.isdisjoint(other)

    def test():
    D = OrderedSet('abcd')
    R = OrderedSet()
    for x in list(D):
    print(D,R)
    R.add(D.pop(last = False))
    print(D,R)

    if __name__ == '__main__':
    test()
    pataphor, Mar 26, 2009
    #11
  12. En Thu, 26 Mar 2009 12:20:17 -0300, Scott David Daniels
    <> escribió:

    > (2) Why, oh why, do people feel so comforted adding double_underscores
    > to data structures? If I want to inherit from your mapping in order
    > to log the attempts to discard a key not actually in the set, I
    > have to either us the nasty name mangling to get at self.__map, or
    > name my subclass OrderedSet and take advantage of a not-guaranteed
    > name mangling. What on earth makes you unwilling to go with "_map"
    > and credit your code's consumers with a bit of common sense?
    >
    > Sorry, this latter rant has been building up for a while (spurred on by
    > a sojourn into the unittest code where they have the private disease in
    > spades.


    My commiserations.

    --
    Gabriel Genellina
    Gabriel Genellina, Mar 28, 2009
    #12
  13. Raymond Hettinger

    Aahz Guest

    In article <01d457aa$0$17208$>,
    Steven D'Aprano <> wrote:
    >On Fri, 20 Mar 2009 11:50:28 -0700, Scott David Daniels wrote:
    >> Raymond Hettinger wrote:
    >>> [Aahz]
    >>>>
    >>>> The doubly-linked list part is what's sick and perverted.
    >>>
    >>> The doubly-linked list part is what gives it big-oh running times the
    >>> same as regular sets. If the underlying sequence is stored as a list,
    >>> then deletion goes from O(1) to O(n). If the insertion times are
    >>> recorded in a dictionary, then the insertion order has to be sorted
    >>> prior to iteration which makes iteration go from O(n) to O(n log n).

    >>
    >> The double-linked list part could be done with weakrefs and perhaps Aahz
    >> would relax a bit.

    >
    >I'm not sure whether Aahz's description is meant to be taken seriously,
    >or if there is a smiley hidden in his post. After all, doubly-linked
    >lists are a standard building block in data structure design.


    It's a half-smiley. I find the trick of using a Python list to store the
    doubly-linked list difficult to understand (as opposed to the usual
    mechanism of a node class). I understand why it was done (time and space
    efficiency), but I also still feel emotionally that it's somewhat sick
    and perverted. I probably would feel more comfortable if the
    doubly-linked list were abstracted out and commented. Heck, the whole
    thing needs unit tests.

    I needed to look up the code again, so I might as well repeat the URL to
    make it handy for everyone else:

    http://code.activestate.com/recipes/576694/
    --
    Aahz () <*> http://www.pythoncraft.com/

    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are, by
    definition, not smart enough to debug it." --Brian W. Kernighan
    Aahz, Mar 29, 2009
    #13
  14. Raymond Hettinger

    CTO Guest

    On Mar 28, 3:33 am, "Gabriel Genellina" <>
    wrote:
    > En Thu, 26 Mar 2009 12:20:17 -0300, Scott David Daniels  
    > <> escribió:
    >
    > > (2) Why, oh why, do people feel so comforted adding double_underscores
    > >      to data structures?


    Probably because other authors feel too comfortable using single
    underscores for
    things the end user should mess with, as in namedtuple.
    CTO, Mar 29, 2009
    #14
  15. Raymond Hettinger

    pataphor Guest

    On Mon, 30 Mar 2009 03:30:04 -0500
    Nick Craig-Wood <> wrote:

    > >>> class Node(object):

    > ... __slots__ = ["prev", "next", "this"]
    > ... def __init__(self, prev, next, this):
    > ... self.prev = prev
    > ... self.next = next
    > ... self.this = this


    [...]

    > So the Node class actually takes less memory 38 Mbytes vs 53 Mbytes
    > for the list.


    Well OK, but if we're going to start optimizing, it seems we don't need
    to store the key itself in the Node or the list. Here's a version of my
    script that stores only prev and next. Another twist is that it is now
    possible to arbitrarily set the starting point for the iteration. (Just
    in case someone needed a cue for further ranting)

    P.

    import collections

    class OrderedSet(collections.MutableSet):
    'Set that remembers the order elements were added'
    # Big-O running times for all methods are the same as for regular
    sets.
    def __init__(self, iterable=None):
    self.links = {} # key --> [prev,next]
    if iterable is not None:
    self |= iterable

    def __len__(self):
    return len(self.links)

    def __contains__(self, key):
    return key in self.links

    def add(self, key):
    if not self:
    self.start = key
    self.links = {key: [key,key]}
    elif key not in self.links:
    links = self.links
    start = self.start
    prev, next = links[start]
    links[prev][1] = key
    links[start][0] = key
    links[key] = [prev,start]

    def discard(self, key):
    links = self.links
    if key in links:
    prev,next = links.pop(key)
    if self.links:
    links[prev][1] = next
    links[next][0] = prev
    if key == self.start:
    self.start = next
    else:
    del self.start

    def __iter__(self):
    links = self.links
    start = self.start
    if links:
    yield start
    prev,next = links[start]
    while next != start:
    yield next
    prev,next = links[next]

    def __reversed__(self):
    links = self.links
    start = self.start
    if links:
    prev,next = links[start]
    while prev != start:
    yield prev
    prev,next = self.links[prev]
    yield start

    def pop(self, last=True):
    if not self:
    raise KeyError('set is empty')
    key = next(reversed(self)) if last else next(iter(self))
    self.discard(key)
    return key

    def __repr__(self):
    if not self:
    return '%s()' % (self.__class__.__name__,)
    return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
    if isinstance(other, OrderedSet):
    return len(self) == len(other) and list(self) == list(other)
    return not self.isdisjoint(other)

    def test():
    D = OrderedSet('abcd')
    R = OrderedSet()
    for x in list(D):
    print(D,R)
    R.add(D.pop(last = False))
    print(D,R)
    print()
    R.start = 'c'
    print(list(reversed(R)))

    if __name__ == '__main__':
    test()
    pataphor, Mar 30, 2009
    #15
  16. Raymond Hettinger

    pataphor Guest

    On Mon, 30 Mar 2009 19:57:39 -0700 (PDT)
    Alex_Gaynor <> wrote:

    > I really like the Ordered Set class(I've been thinking about one ever
    > since ordered dict hit the std lib), is there any argument against
    > adding one to the collections module? I'd be willing to write a PEP
    > up for it.


    Suppose the implementation would use a circular linked list. Then the
    iteration could start from any point, the thing is symmetric after all.
    But this would also mean we could add items to the left of that new
    starting point, since that would now be the 'end' of the circle. This
    is something different from merely remembering insertion order. How do
    you feel about that?

    P.
    pataphor, Mar 31, 2009
    #16
  17. Raymond Hettinger

    pataphor Guest

    On Tue, 31 Mar 2009 10:33:26 +0200
    pataphor <> wrote:

    > On Mon, 30 Mar 2009 19:57:39 -0700 (PDT)
    > Alex_Gaynor <> wrote:
    >
    > > I really like the Ordered Set class(I've been thinking about one
    > > ever since ordered dict hit the std lib), is there any argument
    > > against adding one to the collections module? I'd be willing to
    > > write a PEP up for it.

    >
    > Suppose the implementation would use a circular linked list. Then the
    > iteration could start from any point, the thing is symmetric after
    > all. But this would also mean we could add items to the left of that
    > new starting point, since that would now be the 'end' of the circle.
    > This is something different from merely remembering insertion order.
    > How do you feel about that?


    And in case that didn't confuse you enough, how about this method?

    def move(self,key1,key2):
    #self ==> key1,(key2 ... end), (key1+1... key2-1)
    links = self.links
    if set([key1,key2]) and self :
    start = self.start
    a = links[key1][1]
    b = links[key2][0]
    c = links[start][0]
    links[key1][1] = key2
    links[key2][0] = key1
    links[a][0] = c
    links[c][1] = a
    links[1] = start
    links[start][0] = b

    This takes [key2:] (isn't pseudo slice notation wonderful?) and
    inserts it after key1.

    for example:

    R = OrderedSet(range(10))
    print(list(R))
    R.move(3,7)
    print(list(R))

    gives:

    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    [0, 1, 2, 3, 7, 8, 9, 4, 5, 6]

    All in O(1) of course.

    P.
    pataphor, Mar 31, 2009
    #17
  18. Raymond Hettinger

    pataphor Guest

    On Tue, 31 Mar 2009 06:03:13 -0700 (PDT)
    Alex_Gaynor <> wrote:

    > My inclination would be to more or less *just* have it implement the
    > set API, the way ordered dict does in 2.7/3.1.


    As far as I can tell all that would be needed is read/write access to
    two key variables: The iterator start position and the underlying map.
    There is no need for more than basic set API since people can use
    those two variables to subclass their own iterators.

    P.
    pataphor, Mar 31, 2009
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Newbie
    Replies:
    1
    Views:
    509
    Andrew Thompson
    Apr 7, 2004
  2. Arvind Ganesan

    ordered list (OL) tag with tables

    Arvind Ganesan, Sep 6, 2003, in forum: HTML
    Replies:
    10
    Views:
    10,457
    Nico Schuyt
    Sep 6, 2003
  3. Amit Khemka

    ordered sets operations on lists..

    Amit Khemka, Feb 10, 2006, in forum: Python
    Replies:
    13
    Views:
    586
    Kay Schluehr
    Feb 13, 2006
  4. kelvSYC

    Ordered Sets

    kelvSYC, Dec 19, 2007, in forum: Java
    Replies:
    14
    Views:
    912
    kelvSYC
    Dec 20, 2007
  5. DL

    Ordered list inside ordered list

    DL, Nov 9, 2009, in forum: Javascript
    Replies:
    6
    Views:
    307
    Dr J R Stockton
    Nov 21, 2009
Loading...

Share This Page