Ordering python sets

Discussion in 'Python' started by Mr.SpOOn, Oct 22, 2008.

  1. Mr.SpOOn

    Mr.SpOOn Guest

    Hi,
    I need a structure to represent a set of integers. I also need to
    perform on this set some basic set operations, such as adding or
    removing elements, joining with other sets and checking for the
    presence of specific elements.

    I think that using Python sets would be the best choice, but I also
    need integers to be ordered inside the set and I've just found out
    that, instead, Python sets are unordered collections.

    Is there another convenient structure or shall I use lists and define
    the operations I need?

    Thanks.
    Mr.SpOOn, Oct 22, 2008
    #1
    1. Advertising

  2. Mr.SpOOn

    Guest

    Mr.SpOOn:
    > Is there another convenient structure or shall I use lists and define
    > the operations I need?


    <musings>
    As Python becomes accepted for more and more "serious" projects some
    more data structures can eventually be added to the collections
    module:
    - SortedSet, SortedDict: can be based on red-black trees. Require
    items to be sortable (them being hashable isn't required, but it's
    probably more safe that they are immutable).
    - Odict: a dict ordered according to insertion order.
    - Bidict: a unordered dict that allows O(1) retrieval on both keys and
    values (that are both unique).
    - Graph: a directed unsorted data structure like mine may be
    acceptable too.
    - Bitset: dynamically re-sizable and efficient in space and time, easy
    to implement in C.
    - Chain: simulates a double linked list, but its implementation can be
    similar to the current Deque but allowing not completely filled blocks
    in the middle too. (I haven't named it just List because there's a
    name clash with the list()).
    - I use all those data structures in Python programs, plus some more,
    like interval map, trie (and a dawg), persistent dict and persistent
    list, kd-tree, BK-tree, Fibonacci Heap, a rank & select, a disjoint-
    set, and maybe more. But those are uncommon enough to be left out of a
    standard library.
    - A problem with the Chain data structure is how to represent
    iterators in Python. I think this is a big problem, that I don't know
    how to solve yet. A possible solution is to make them owned by the
    Chain itself, but this makes the code slow down linearly in accord to
    the number of the iterators. If someone has a solution I'm all ears.
    </musings>

    Bye,
    bearophile
    , Oct 22, 2008
    #2
    1. Advertising

  3. Mr.SpOOn

    Lie Ryan Guest

    On Wed, 22 Oct 2008 10:43:35 -0700, bearophileHUGS wrote:

    > Mr.SpOOn:
    >> Is there another convenient structure or shall I use lists and define
    >> the operations I need?

    >
    > <musings>
    > As Python becomes accepted for more and more "serious" projects some
    > more data structures can eventually be added to the collections module:
    > - SortedSet, SortedDict: can be based on red-black trees. Require items
    > to be sortable (them being hashable isn't required, but it's probably
    > more safe that they are immutable). - Odict: a dict ordered according to
    > insertion order. - Bidict: a unordered dict that allows O(1) retrieval
    > on both keys and values (that are both unique).
    > - Graph: a directed unsorted data structure like mine may be acceptable
    > too.
    > - Bitset: dynamically re-sizable and efficient in space and time, easy
    > to implement in C.
    > - Chain: simulates a double linked list, but its implementation can be
    > similar to the current Deque but allowing not completely filled blocks
    > in the middle too. (I haven't named it just List because there's a name
    > clash with the list()).
    > - I use all those data structures in Python programs, plus some more,
    > like interval map, trie (and a dawg), persistent dict and persistent
    > list, kd-tree, BK-tree, Fibonacci Heap, a rank & select, a disjoint-
    > set, and maybe more. But those are uncommon enough to be left out of a
    > standard library.
    > - A problem with the Chain data structure is how to represent iterators
    > in Python. I think this is a big problem, that I don't know how to solve
    > yet. A possible solution is to make them owned by the Chain itself, but
    > this makes the code slow down linearly in accord to the number of the
    > iterators. If someone has a solution I'm all ears. </musings>
    >
    > Bye,
    > bearophile


    <anotherrandommusing>
    Since python is dynamic language, I think it should be possible to do
    something like this:

    a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
    b = dict({'a': 'A'}, implementation = 'binarytree')
    c = dict({'a': 'A'}, implementation = 'binarytree')

    i.e. basically since a data structure can have different implementations,
    and different implementations have different performance characteristics,
    it should be possible to dynamically change the implementation used.

    In the far future, the data structure and its implementation could be
    abstracted even further:

    a = list() # ordered list
    b = set() # unordered list
    c = dict() # unordered dictionary
    d = sorteddict() # ordered dictionary

    Each of the implementations would share a common subset of methods and
    possibly a few implementation dependent method that could only work on
    certain implementations (or is extremely slow except in the correct
    implementation).
    </anotherrandommusing>
    Lie Ryan, Oct 25, 2008
    #3
  4. On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:

    > <anotherrandommusing>
    > Since python is dynamic language, I think it should be possible to do
    > something like this:
    >
    > a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
    > b = dict({'a': 'A'}, implementation = 'binarytree')
    > c = dict({'a': 'A'}, implementation = 'binarytree')


    Oh I hope not. I think you have mistaken "dynamic" for "chaotic".

    When I see a dict, I want to know that any two dicts work the same way. I
    don't want to have to search the entire project's source code to find out
    if it is a dict implemented as a hash table with O(1) lookups, or a dict
    implemented as a binary tree with O(log N) lookups, or a dict implemented
    as a linear array with O(N) lookups.

    If I wanted that sort of nightmare, I can already do it by shadowing the
    builtin:

    dict = binarytree
    D = dict({'a': 'A'}) # make a binary tree

    There is no possible good that come from this suggestion. The beauty of
    Python is that the built-in data structures (list, dict, set) are
    powerful enough for 99% of uses[1], and for the other 1%, you can easily
    and explicitly use something else.

    But *explicitly* is the point. There's never any time where you can do
    this:

    type(mydict) is dict

    and not know exactly what performance characteristics mydict will have.
    (Unless you shadow dict or type, or otherwise do something that breaks
    the rules.) You never need to ask, "Okay, it's a dict. What sort of dict?"

    If you want a binary tree, ask for a binary tree.






    [1] Your mileage may vary.


    --
    Steven
    Steven D'Aprano, Oct 25, 2008
    #4
  5. On approximately 10/25/2008 2:21 AM, came the following characters from
    the keyboard of Steven D'Aprano:
    > If you want a binary tree, ask for a binary tree.


    OK, where can I find a binary tree? One that works quickly and
    efficiently on at least Linux, Mac, and Windows?

    I don't find it in the standard library. I'd actually probably rather
    have a red-black tree, can't find that either. I found a pure-python
    tree, but it's performance was abysmal. I found some others that only
    claimed to work on some platforms.

    While on the topic of ordered keys, it seems that sorted's cmp feature
    is omitted from 3.0. That might be fine, but how does one create a key
    that corresponds to ascending integer followed by descending character
    string?

    --
    Glenn -- http://nevcal.com/
    ===========================
    A protocol is complete when there is nothing left to remove.
    -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
    Glenn Linderman, Oct 25, 2008
    #5
  6. Mr.SpOOn

    Lie Ryan Guest

    On Sat, 25 Oct 2008 09:21:05 +0000, Steven D'Aprano wrote:

    > On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
    >
    >> <anotherrandommusing>
    >> Since python is dynamic language, I think it should be possible to do
    >> something like this:
    >>
    >> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b = dict({'a':
    >> 'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
    >> implementation = 'binarytree')

    >
    > Oh I hope not. I think you have mistaken "dynamic" for "chaotic".
    >
    > When I see a dict, I want to know that any two dicts work the same way.


    Oh no, the two dict implementation would work _exactly_ the same from the
    outside, they are transparently interchangeable. Only the performance
    characteristic differs because of the different implementation. Actually
    I got this idea from a book about algorithm and data structure, that book
    said that an abstract data type (e.g. dict, set, list) have several
    competing implementations or data structures (e.g. binary tree dict,
    hashed dict, array dict). A data's implementation and interface can be
    separated so that we can switch the data structure's implementation
    without changing the rest of the code. The book is Algorithm Design
    Manual by Skiena.

    hint: read the PS below

    > I don't want to have to search the entire project's source code to find
    > out if it is a dict implemented as a hash table with O(1) lookups, or a
    > dict implemented as a binary tree with O(log N) lookups, or a dict
    > implemented as a linear array with O(N) lookups.


    No, you'd only need to look at the dict's creation point (or actually
    much better at projects docs, but not everyone create good docs). The
    alternative you mentioned below, by shadowing built-in, is both a hack
    and is much more likely to be missed.

    > If I wanted that sort of nightmare, I can already do it by shadowing the
    > builtin:
    >
    > dict = binarytree
    > D = dict({'a': 'A'}) # make a binary tree


    I DON'T want THAT sort of nightmare you mentioned...
    And it'd be impossible to have two dictionary that have two different
    implementations.

    > There is no possible good that come from this suggestion. The beauty of
    > Python is that the built-in data structures (list, dict, set) are
    > powerful enough for 99% of uses[1], and for the other 1%, you can easily
    > and explicitly use something else.


    Oh really? As far as I know, python's list is extremely bad if you're
    inserting data at the beginning of the list (e.g. lst.insert(0) requires
    the whole array be "copied" one address away). This is because python's
    list uses array data structure, making addressing (e.g. a[2]) fast, but
    insertion slow. If, on the other hand, it is implemented using binary
    tree, it'd make insertion O(log n) but addressing is a bit tricky on it.

    The keyword is "tradeoffs".

    > But *explicitly* is the point. There's never any time where you can do
    > this:


    Yes, true, explicitly IS the point. How more explicit can you be than:
    dict({foo: bar}, implementation = 'binarytree')

    > type(mydict) is dict


    If my memory serves right, binary tree dict and hashed table dict is both
    a dict right? (Duck Typing)
    Only their implementation differs. Implementation is... well,
    "implementation detail".

    > and not know exactly what performance characteristics mydict will have.


    Oh... why do I need to know what the performance characteristic of mydict
    is? Unless I know what I'm doing.

    > (Unless you shadow dict or type, or otherwise do something that breaks
    > the rules.) You never need to ask, "Okay, it's a dict. What sort of
    > dict?"


    Okay, it's a dict. What sort of dict? Who the hell cares? I don't need to
    know, they all looks and behave the same (Duck Typing)... at least until
    I profile them (since profiling is a deep black magic by itself, it
    cannot be used to discredit switching implementations).

    Sometimes we need a data type to use a specific data structure that have
    some specific performance characteristic, because we know we'll be doing
    a specific operation a lot more than other operations.

    If you actually need to know what the implementation is currently being
    used, you could implement a dict.implementation property.

    > If you want a binary tree, ask for a binary tree.


    Yeah, you ask for binary tree EXPLICITLY:
    bintreedict = dict({a:b}, implementation = 'binarytree')

    this:
    regularhasheddict = dict({a:b})

    would have a reasonable default.


    PS: I do admit I have used the wrong terms in the last post. I used the
    term "data structure", instead it should be "abstract data type", "data
    structure" is a synonym for "implementation". In this post, I hope I've
    corrected all of the usage.
    Lie Ryan, Oct 25, 2008
    #6
  7. Mr.SpOOn

    Terry Reedy Guest

    Lie Ryan wrote:

    >
    > <anotherrandommusing>
    > Since python is dynamic language, I think it should be possible to do
    > something like this:
    >
    > a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')


    For this to work, the abstract list would have to know about all
    implementations of the abstraction.

    > b = dict({'a': 'A'}, implementation = 'binarytree')
    > c = dict({'a': 'A'}, implementation = 'binarytree')
    >
    > i.e. basically since a data structure can have different implementations,
    > and different implementations have different performance characteristics,
    > it should be possible to dynamically change the implementation used.
    >
    > In the far future, the data structure and its implementation could be
    > abstracted even further:
    >
    > a = list() # ordered list
    > b = set() # unordered list
    > c = dict() # unordered dictionary
    > d = sorteddict() # ordered dictionary
    >
    > Each of the implementations would share a common subset of methods and
    > possibly a few implementation dependent method that could only work on
    > certain implementations (or is extremely slow except in the correct
    > implementation).
    > </anotherrandommusing>


    The future is 3.0, at least in part, with Abstract Base Classes.
    There are 16 in the collectons module.
    "In addition to containers, the collections module provides some ABCs
    (abstract base classes) that can be used to test whether a class
    provides a particular interface, for example, is it hashable or a
    mapping, and some of them can also be used as mixin classes."

    The ABCs for numbers are in the numbers module.

    tjr
    Terry Reedy, Oct 25, 2008
    #7
  8. Mr.SpOOn

    Lie Ryan Guest

    On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:

    > Lie Ryan wrote:
    >
    >
    >> <anotherrandommusing>
    >> Since python is dynamic language, I think it should be possible to do
    >> something like this:
    >>
    >> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')

    >
    > For this to work, the abstract list would have to know about all
    > implementations of the abstraction.


    /the exact syntax isn't really important/
    /abstract type and implementation separation is the important point/

    Actually, if I want to force it, that syntax could work using the same
    magic used by event-based syst
    Lie Ryan, Oct 25, 2008
    #8
  9. Mr.SpOOn

    Lie Ryan Guest

    On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:

    > Lie Ryan wrote:
    >
    >
    >> <anotherrandommusing>
    >> Since python is dynamic language, I think it should be possible to do
    >> something like this:
    >>
    >> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')

    >
    > For this to work, the abstract list would have to know about all
    > implementations of the abstraction.


    # Sorry the last message is truncated because of an "accident"

    /the exact syntax isn't really important/
    /abstract type and implementation separation is the important point/

    Actually, if I want to force it, that syntax could work using the same
    magic used by event-based systems: registration. Although I agree it
    might be a bit cumbersome to do registration for something like this, but
    as I've said before, exact syntax is not really important.
    Lie Ryan, Oct 25, 2008
    #9
  10. On Sat, 25 Oct 2008 21:53:10 +0000, Lie Ryan wrote:

    > On Sat, 25 Oct 2008 09:21:05 +0000, Steven D'Aprano wrote:
    >
    >> On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
    >>
    >>> <anotherrandommusing>
    >>> Since python is dynamic language, I think it should be possible to do
    >>> something like this:
    >>>
    >>> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b =
    >>> dict({'a': 'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
    >>> implementation = 'binarytree')

    >>
    >> Oh I hope not. I think you have mistaken "dynamic" for "chaotic".
    >>
    >> When I see a dict, I want to know that any two dicts work the same way.

    >
    > Oh no, the two dict implementation would work _exactly_ the same from
    > the outside, they are transparently interchangeable. Only the
    > performance characteristic differs because of the different
    > implementation.


    Exactly. That was my point.



    [...]
    >> I don't want to have to search the entire project's source code to find
    >> out if it is a dict implemented as a hash table with O(1) lookups, or a
    >> dict implemented as a binary tree with O(log N) lookups, or a dict
    >> implemented as a linear array with O(N) lookups.

    >
    > No, you'd only need to look at the dict's creation point (or actually
    > much better at projects docs, but not everyone create good docs).


    And how do you find an arbitrary object's creation point without
    searching the project's source code?



    >> If I wanted that sort of nightmare, I can already do it by shadowing
    >> the builtin:
    >>
    >> dict = binarytree
    >> D = dict({'a': 'A'}) # make a binary tree

    >
    > I DON'T want THAT sort of nightmare you mentioned... And it'd be
    > impossible to have two dictionary that have two different
    > implementations.


    Nonsense.

    dict = binarytree
    D1 = dict({'a': 'A'}) # make a binary tree "dict"
    dict = __builtin__.dict
    D2 = dict({'a': 'A'}) # make a standard dict
    dict = someothertype
    D3 = dict({'a': 'A'})

    I'm not suggesting this is a good idea. This is a terrible idea. But it
    is not much worse than your idea:

    D1 = dict({'a': 'A'}, implementation='binarytree')
    D2 = dict({'a': 'A'}, implementation='dict')
    D3 = dict({'a': 'A'}, implementation='someothertype')


    >> There is no possible good that come from this suggestion. The beauty of
    >> Python is that the built-in data structures (list, dict, set) are
    >> powerful enough for 99% of uses[1], and for the other 1%, you can
    >> easily and explicitly use something else.

    >
    > Oh really? As far as I know, python's list is extremely bad if you're
    > inserting data at the beginning of the list


    And how often do you do that?

    And when you do, use a deque. Just call it a deque.


    [...]
    >> But *explicitly* is the point. There's never any time where you can do
    >> this:

    >
    > Yes, true, explicitly IS the point. How more explicit can you be than:
    > dict({foo: bar}, implementation = 'binarytree')
    >
    >> type(mydict) is dict



    You miss the point. With your plan, you can do this:

    D1 = dict({foo: bar}, implementation = 'binarytree')
    D2 = dict({foo: bar}, implementation = 'dict')
    type(D1) is type(D2)

    and yet D1 and D2 have UTTERLY different performance characteristics. So
    now you need to add ANOTHER test to distinguish dicts-which-are-dicts
    from dicts-which-are-binary-trees:

    D1.implementation != D2.implementation

    And why? So you can avoid calling a goose a goose, and call it a duck
    instead.


    > If my memory serves right, binary tree dict and hashed table dict is
    > both a dict right? (Duck Typing)
    > Only their implementation differs. Implementation is... well,
    > "implementation detail".


    Duck typing refers to *interface*, not implementation. I have no problem
    with you using a type with the same interface as a dict. That's what duck
    typing is all about. Just don't call it a dict!


    >> and not know exactly what performance characteristics mydict will have.

    >
    > Oh... why do I need to know what the performance characteristic of
    > mydict is? Unless I know what I'm doing.


    http://www.joelonsoftware.com/articles/fog0000000319.html


    Because when you do this:

    mydict[key] = 1

    it's important whether each dict lookup is O(1), O(log N) or O(N). For a
    dict with one million items, that means that an implementation based on a
    binary tree does O(20) times more processing than a dict, and an
    implementation based on linear searching does O(1000000) times more
    processing.

    If you think implementation details don't matter, try this:

    s1 = 'c'*(10**6)

    versus

    s2 = ''
    for i in xrange(10**6):
    s2 = 'c' + s2 # defeat optimizer


    >> (Unless you shadow dict or type, or otherwise do something that breaks
    >> the rules.) You never need to ask, "Okay, it's a dict. What sort of
    >> dict?"

    >
    > Okay, it's a dict. What sort of dict? Who the hell cares?


    If you don't care, then why are you specifying the implementation type?

    mydict = dict({'foo': 'bar'}, implementation="surprise me!")

    You can't have it both ways. If you care, then you know enough to want a
    hash table based dict (the standard) or a binary tree or something else.
    So go ahead and use whatever data structure you want. Just don't call it
    a dict.

    But if you don't care, then just use the Python standard data types.


    > I don't need
    > to know, they all looks and behave the same (Duck Typing)...


    Until you try a simple operation like len(mydict) and it takes three
    minutes because the implementation you choose is really inefficient.


    > Sometimes we need a data type to use a specific data structure that have
    > some specific performance characteristic, because we know we'll be doing
    > a specific operation a lot more than other operations.


    I'm not arguing that you should never use any other data structure. Go
    right ahead and use them all you like.


    >> If you want a binary tree, ask for a binary tree.

    >
    > Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
    > implementation = 'binarytree')



    Or you could do it the right way:

    D1 = binarytree(data)
    D2 = dict(data)
    type(D1), type(D2)
    => returns binarytree, dict

    versus:

    D1 = dict(data, implementation='binarytree')
    D2 = dict(data)
    type(D1), type(D2)
    => returns dict, dict



    --
    Steven
    Steven D'Aprano, Oct 26, 2008
    #10
  11. Mr.SpOOn

    Terry Reedy Guest

    Lie Ryan wrote:
    > On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:


    >>> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')

    >> For this to work, the abstract list would have to know about all
    >> implementations of the abstraction.

    >
    > /the exact syntax isn't really important/
    > /abstract type and implementation separation is the important point/
    >
    > Actually, if I want to force it, that syntax could work using the same
    > magic used by event-based systems: registration.


    ABCs have registration method. The builtin ABCs have appropriate
    builtin classes preregistered.
    >>> import collections as co
    >>> mu = co.MutableSequence
    >>> issubclass(list, mu)

    True

    I believe user classes that inherit from an ABC are also registered, and
    other can be registered explicitly.

    >Although I agree it
    > might be a bit cumbersome to do registration for something like this, but
    > as I've said before, exact syntax is not really important.


    Then why do you object to current
    mylist = linkedlist(data)
    and request the harder to write and implement
    mylist = list(data, implementation = 'linkedlist')
    ?

    tjr
    Terry Reedy, Oct 26, 2008
    #11
  12. On Sat, 25 Oct 2008 21:53:10 +0000, Lie Ryan wrote:

    > Oh no, the two dict implementation would work _exactly_ the same from
    > the outside, they are transparently interchangeable. Only the
    > performance characteristic differs because of the different
    > implementation.


    They are not 100% interchangeably. Dictionaries as implemented need keys
    to be hashable objects and sorted dictionaries need the keys to have an
    order defined. So switching mapping types may fail unless the key
    objects already have the needed magic methods implemented to do the right
    thing.

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Oct 26, 2008
    #12
  13. Mr.SpOOn

    Lie Ryan Guest

    On Sun, 26 Oct 2008 00:53:18 +0000, Steven D'Aprano wrote:
    [...]
    > And how do you find an arbitrary object's creation point without
    > searching the project's source code?


    How is it better using the current way?
    Asking the .implementation field isn't much harder than asking the type
    (), and is much more obvious that we're asking the "implementation" being
    used.

    >>> If I wanted that sort of nightmare, I can already do it by shadowing
    >>> the builtin:
    >>>
    >>> dict = binarytree
    >>> D = dict({'a': 'A'}) # make a binary tree

    >>
    >> I DON'T want THAT sort of nightmare you mentioned... And it'd be
    >> impossible to have two dictionary that have two different
    >> implementations.

    >
    > Nonsense.
    >

    <snip>

    There is two good arguments for using a plain string to specify the
    implementation used: 1) Plain-text Configuration, 2) argument passing.

    1) The current way requires you have a map of the configuration text to a
    function (i.e. imps = {'linkedlist': linkedlist}), than assign the
    function to _a global name_ (i.e. lista = imps[configuration[confkey]] --
    namespace pollution), then instantiate an object using that (ls = lista
    ([...]))). The implementation way, only requires ls = list([...],
    implementation = configuration[confkey])

    2) Easily said, plain text passing is safer and easier than passing
    function object.

    >>> There is no possible good that come from this suggestion. The beauty
    >>> of Python is that the built-in data structures (list, dict, set) are
    >>> powerful enough for 99% of uses[1], and for the other 1%, you can
    >>> easily and explicitly use something else.

    >>
    >> Oh really? As far as I know, python's list is extremely bad if you're
    >> inserting data at the beginning of the list

    >
    > And how often do you do that?


    It's not even a deque yet, a regular queue is slow using list in python,
    and any insertion sort requires you to insert elements into arbitrary
    position. This makes using array for it a O(n**2). On the other hand,
    using binary tree is a O(log n).

    > [...]
    >>> But *explicitly* is the point. There's never any time where you can do
    >>> this:

    >>
    >> Yes, true, explicitly IS the point. How more explicit can you be than:
    >> dict({foo: bar}, implementation = 'binarytree')
    >>
    >>> type(mydict) is dict

    >
    >
    > You miss the point. With your plan, you can do this:
    >
    > D1 = dict({foo: bar}, implementation = 'binarytree') D2 = dict({foo:
    > bar}, implementation = 'dict') type(D1) is type(D2)
    >
    > and yet D1 and D2 have UTTERLY different performance characteristics.


    It does not matter that it have different performance characteristic.

    > So
    > now you need to add ANOTHER test to distinguish dicts-which-are-dicts
    > from dicts-which-are-binary-trees:


    How often would you need to care about the specific implementation used?
    Compare with how often you want to know how to work with the object. Most
    of the time, you only need to know how to work with it (i.e. knowing its
    interface), only _sometimes_ when it DOES matter speed-wise, you'd need
    to know and affect its implementation.

    > D1.implementation != D2.implementation
    >
    > And why? So you can avoid calling a goose a goose, and call it a duck
    > instead.


    I think we have an utterly different view here.
    I wished that list() becomes an abstract type, instead of a data
    structure. (binarytreelist and arraylist is some property of a list)

    While you insisted that list() is a data structure, and abstract type is
    some obscureness somewhere unknown.

    >> If my memory serves right, binary tree dict and hashed table dict is
    >> both a dict right? (Duck Typing)
    >> Only their implementation differs. Implementation is... well,
    >> "implementation detail".

    >
    > Duck typing refers to *interface*, not implementation.


    Well said, you've just supported my view.

    > I have no problem
    > with you using a type with the same interface as a dict. That's what
    > duck typing is all about. Just don't call it a dict!


    why do you think binarytreedict is "less dict" than hasheddict?

    >>> and not know exactly what performance characteristics mydict will
    >>> have.

    >>
    >> Oh... why do I need to know what the performance characteristic of
    >> mydict is? Unless I know what I'm doing.

    >
    > http://www.joelonsoftware.com/articles/fog0000000319.html


    Sometimes it is good to have control of everything. Sometimes (most of
    the time) I know I can dump some of the details to think on more
    important matters. To know which parts need more thinking by doing
    profiling.

    > Because when you do this:
    >
    > mydict[key] = 1
    >
    > it's important whether each dict lookup is O(1), O(log N) or O(N). For a
    > dict with one million items, that means that an implementation based on
    > a binary tree does O(20) times more processing than a dict, and an
    > implementation based on linear searching does O(1000000) times more
    > processing.


    Well, I don't need to know how the dict works if I'm only using it to
    store twenty items right? (I do know how python's dict/set/list/etc is
    implemented, but unless that particular dict is central to my program,
    I'm not going to create my programs around how dict is implemented, I'm
    going to create it on how my mind works).

    I don't care how print statement works, unless I'm trying to do something
    fancy with print. I don't care that print takes a series of electrical
    signal, then convert it into a series of bytes and pass it into the
    terminal stdout then the terminal will figure out how to draw the string
    using the appropriate font at the appropriate place and pass itself to a
    window manager which would compose the desktop into an image then pass
    itself to the graphic driver which would convert the image to electrical
    signal to be sent to the graphic card which would do some fancy tricks
    before sending it to the monitor, when what I need is only to show the
    user a simple welcome message.

    The reason we use high-level language is that we know we don't need to
    care about some details. Only when the detail matters then we take care
    of it. If I do care about all the details, I'd use C, no, no, lower, I'd
    use assembly, no, lower..., I'd design my own CPU, no even lower, I'd
    design my own non-von-neumann architecture. I don't think I can change
    the rules of physics though, but if I can... <some more pseudo random
    bytes>

    >> I don't need
    >> to know, they all looks and behave the same (Duck Typing)...

    >
    > Until you try a simple operation like len(mydict) and it takes three
    > minutes because the implementation you choose is really inefficient.


    .... then I know I have chosen the wrong data structure, and I know I need
    to change it.

    >>> If you want a binary tree, ask for a binary tree.

    >>
    >> Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
    >> implementation = 'binarytree')

    >
    > Or you could do it the right way:

    <snip>

    For example, if I have a certain unknown object. What is easier?

    >>> type(URO) # Unidentified Round Object

    => returns (*&%$^%*

    or

    >>> type(URO)

    => returns diningplate

    only after three thousand years of continuous and laborious researching,
    at last I found out that (*&%$^%* is an alien dining plate, and it does
    have all the features and functionality as earth dining plate, although
    due to it being made of CClKO32C32CH42U it is much more resistant to
    acidity but would melt when you put carrot on it.

    Only when I want to eat carrot, would I care that it is an alien dining
    plate. And only when I want to eat something extremely acidic, I'd chose
    the alien dining plate over earth dining plate.
    Lie Ryan, Oct 26, 2008
    #13
  14. Mr.SpOOn

    Lie Ryan Guest

    On Sat, 25 Oct 2008 21:50:36 -0400, Terry Reedy wrote:

    > Lie Ryan wrote:
    >> On Sat, 25 Oct 2008 18:20:46 -0400, Terry Reedy wrote:

    > Then why do you object to current
    > mylist = linkedlist(data)
    > and request the harder to write and implement
    > mylist = list(data, implementation = 'linkedlist')


    I don't wholly object it. I think it's fine but can be improved.

    I tend to be more inclined on this side though:

    mylist = list.linkedlist(data) # or list<othersymbol>linkedlist(data)
    as syntax sugar for
    mylist = list(data, implementation = 'linkedlist')

    this kinds of syntax magnify the fact that linkedlist is a choice of
    implementation for list abstract type. The current syntax makes no
    indication whatsoever that linkedlist is anything related to list.
    Lie Ryan, Oct 26, 2008
    #14
  15. Mr.SpOOn

    Guest

    Glenn Linderman:

    > how does one create a key that corresponds to ascending integer followed by descending character string?


    (Others may have already answered you because Google groups is very
    slow.)

    >>> seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
    >>> sorted(seq, key=lambda (n,s): (-n, s), reverse=True)

    [(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]

    A little harder question is how to create a key that corresponds to
    ascending string followed by descending string?

    To do that you can sort the data two times, relying on the stable
    nature of the Python sort.

    Another (silly?) solution may be to invent a "negate" function for
    strings too:

    >>> strneg = lambda txt: txt.translate(itable)
    >>> itable = "".join(chr(i) for i in xrange(255, -1, -1))
    >>> pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
    >>> sorted(pairs, key=lambda (s1,s2): (s1, strneg(s2)))

    [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

    Bye,
    bearophile
    , Oct 27, 2008
    #15
  16. Mr.SpOOn

    Peter Otten Guest

    wrote:

    > Glenn Linderman:
    >
    >> how does one create a key that corresponds to ascending integer followed
    >> by descending character string?

    >
    > (Others may have already answered you because Google groups is very
    > slow.)
    >
    >>>> seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
    >>>> sorted(seq, key=lambda (n,s): (-n, s), reverse=True)

    > [(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]
    >
    > A little harder question is how to create a key that corresponds to
    > ascending string followed by descending string?
    >
    > To do that you can sort the data two times, relying on the stable
    > nature of the Python sort.
    >
    > Another (silly?) solution may be to invent a "negate" function for
    > strings too:
    >
    >>>> strneg = lambda txt: txt.translate(itable)
    >>>> itable = "".join(chr(i) for i in xrange(255, -1, -1))
    >>>> pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
    >>>> sorted(pairs, key=lambda (s1,s2): (s1, strneg(s2)))

    > [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]


    Here's a class that can negate arbitrary values:

    >>> class Neg(object):

    .... def __init__(self, value):
    .... self.value = value
    .... def __cmp__(self, other):
    .... return -cmp(self.value, other.value)
    ....
    >>> pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
    >>> sorted(pairs, key=lambda (s, t): (s, Neg(t)))

    [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]
    Peter Otten, Oct 27, 2008
    #16
  17. Mr.SpOOn

    Guest

    Lie Ryan:

    >Oh no, the two dict implementation would work _exactly_ the same from the outside, they are transparently interchangeable. Only the performance characteristic differs because of the different implementation.<


    I don't agree with the general idea. If the operations done by your
    data structure have different computational complexity, then they are
    fit for different usages. When you program you must know what
    computational complexity has each of the operations of your data
    structyre, otherwise there's no way to know the complexity of your
    whole program, so instead of programming you are just become a mage
    that tries magical spells and hopes for the better. So I don't accept
    so much different data structures to have the same name. That's why
    I'll never appreciate the Python list type to be named list instead of
    array, despite it supports more or less all the functions you expect
    from an abstract list type.

    Said that, for a high-level language like Python I can see another
    possible solution. To have a type that contains several kinds of data
    structures, for example a "dict" that contains a hash implementation,
    a red-black tree, etc. Such data structure can use a small part of the
    computational power given to it to collect statistics usages of each
    object (or each variable, that may contain in succession several
    ojects of the same type). Such data structure can then at run time
    adapt dynamically, chosing to use the implementation fitter for the
    specific usage of each object or variable (the programmer can give
    hints of course, or sometimes even coerce the usage a specific
    implementation). (such statistics can also be saved on disk to be used
    for the successive run of the program, to help them work well from the
    start too, and not just after a while). If this capability works well
    in practice, then it can solve the problem you were talking about, I
    think.

    I presume data structures in future high-level languages will be quite
    more able to adapt themselves to their usages. Today it's the second
    time I talk about future programming languages :)

    Bye,
    bearophile
    , Oct 27, 2008
    #17
  18. Mr.SpOOn

    Carl Banks Guest

    On Oct 25, 4:58 am, Lie Ryan <> wrote:
    > On Wed, 22 Oct 2008 10:43:35 -0700, bearophileHUGS wrote:
    > > Mr.SpOOn:
    > >> Is there another convenient structure or shall I use lists and define
    > >> the operations I need?

    >
    > > <musings>
    > > As Python becomes accepted for more and more "serious" projects some
    > > more data structures can eventually be added to the collections module:
    > > - SortedSet, SortedDict: can be based on red-black trees. Require items
    > > to be sortable (them being hashable isn't required, but it's probably
    > > more safe that they are immutable). - Odict: a dict ordered according to
    > > insertion order. - Bidict: a unordered dict that allows O(1) retrieval
    > > on both keys and values (that are both unique).
    > > - Graph: a directed unsorted data structure like mine may be acceptable
    > > too.
    > > - Bitset: dynamically re-sizable and efficient in space and time, easy
    > > to implement in C.
    > > - Chain: simulates a double linked list, but its implementation can be
    > > similar to the current Deque but allowing not completely filled blocks
    > > in the middle too. (I haven't named it just List because there's a name
    > > clash with the list()).
    > > - I use all those data structures in Python programs, plus some more,
    > > like interval map, trie (and a dawg), persistent dict and persistent
    > > list, kd-tree, BK-tree, Fibonacci Heap, a rank & select, a disjoint-
    > > set, and maybe more. But those are uncommon enough to be left out of a
    > > standard library.
    > > - A problem with the Chain data structure is how to represent iterators
    > > in Python. I think this is a big problem, that I don't know how to solve
    > > yet. A possible solution is to make them owned by the Chain itself, but
    > > this makes the code slow down linearly in accord to the number of the
    > > iterators. If someone has a solution I'm all ears. </musings>

    >
    > > Bye,
    > > bearophile

    >
    > <anotherrandommusing>
    > Since python is dynamic language, I think it should be possible to do
    > something like this:
    >
    > a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
    > b = dict({'a': 'A'}, implementation = 'binarytree')
    > c = dict({'a': 'A'}, implementation = 'binarytree')


    -1

    This doesn't buy you anything over simply implementing different types
    altogether.

    a = collections.linkedlist([1,2,3,4,5])
    b = collections.btree({'a': 'A'})

    In fact, it adds a whole lot of complexity to the built-in types.


    Carl Banks
    Carl Banks, Oct 27, 2008
    #18
  19. On approximately 10/27/2008 10:27 AM, came the following characters from
    the keyboard of Peter Otten:
    > wrote
    >> Glenn Linderman
    >>> how does one create a key that corresponds to ascending integer followed
    >>> by descending character string?
    >>>

    >> (Others may have already answered you because Google groups is very
    >> slow.)
    >>
    >>>>> seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
    >>>>> sorted(seq, key=lambda (n,s): (-n, s), reverse=True)
    >>>>>

    >> [(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]
    >>
    >> A little harder question is how to create a key that corresponds to
    >> ascending string followed by descending string?
    >>
    >> To do that you can sort the data two times, relying on the stable
    >> nature of the Python sort.
    >>


    Ick. Costs twice as much.

    >> Another (silly?) solution may be to invent a "negate" function for
    >> strings too:
    >>
    >>>>> strneg = lambda txt: txt.translate(itable)
    >>>>> itable = "".join(chr(i) for i in xrange(255, -1, -1))
    >>>>> pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
    >>>>> sorted(pairs, key=lambda (s1,s2): (s1, strneg(s2)))
    >>>>>

    >> [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]
    >>



    This only works for byte strings, not Unicode strings. Remember, it is
    3.0 that eliminates the cmp option to sorted, and its strings are
    Unicode strings.

    > Here's a class that can negate arbitrary values
    >>>> class Neg(object):
    >>>>

    > ... def __init__(self, value):
    > ... self.value = value
    > ... def __cmp__(self, other):
    > ... return -cmp(self.value, other.value)
    > ...
    >
    >>>> pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
    >>>> sorted(pairs, key=lambda (s, t): (s, Neg(t)))
    >>>>

    > [('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]


    This solution seems to be the solution I was looking for. I think "Neg"
    is a poor name, but all the good names are significantly longer... Given
    its usage, though, in a sorted call, "Rev" would be a better poor name!
    ReversedCompare might be a better long name.

    How can I suggest adding it to the documentation for Python 3.0 ? It is
    much preferable to the "obvious" solution of sorting twice, and doing
    that is a trap which many people might fall into. I posted here
    instead, and thank you for the solution.

    --
    Glenn -- http://nevcal.com/
    ===========================
    A protocol is complete when there is nothing left to remove.
    -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
    Glenn Linderman, Oct 28, 2008
    #19
  20. Mr.SpOOn

    greg Guest

    On approximately 10/27/2008 10:27 AM, came the following characters from
    the keyboard of Peter Otten:

    > Here's a class that can negate arbitrary values
    >
    > ... def __init__(self, value):
    > ... self.value = value
    > ... def __cmp__(self, other):
    > ... return -cmp(self.value, other.value)


    Unfortunately, the __cmp__ method has been eliminated from
    3.0 as well, so this won't work as-is. You would need to
    override __lt__, __eq__, etc.

    --
    Greg
    greg, Oct 28, 2008
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Grzegorz Dostatni

    Python sets.

    Grzegorz Dostatni, May 4, 2004, in forum: Python
    Replies:
    4
    Views:
    1,204
    Dave Reed
    May 5, 2004
  2. sapsi

    Sets in Python

    sapsi, Sep 19, 2007, in forum: Python
    Replies:
    33
    Views:
    890
    sapsi
    Sep 22, 2007
  3. Tim Chase

    Re: Ordering python sets

    Tim Chase, Oct 22, 2008, in forum: Python
    Replies:
    8
    Views:
    495
    Peter Otten
    Oct 22, 2008
  4. nbigaouette

    Z-Ordering (Morton ordering) question

    nbigaouette, Nov 5, 2009, in forum: C Programming
    Replies:
    2
    Views:
    2,095
  5. Eduardo Schettino
    Replies:
    3
    Views:
    252
    Lie Ryan
    Apr 25, 2010
Loading...

Share This Page