Ordering python sets

Mr.SpOOn · Oct 22, 2008

Hi,
I need a structure to represent a set of integers. I also need to
perform on this set some basic set operations, such as adding or
removing elements, joining with other sets and checking for the
presence of specific elements.

I think that using Python sets would be the best choice, but I also
need integers to be ordered inside the set and I've just found out
that, instead, Python sets are unordered collections.

Is there another convenient structure or shall I use lists and define
the operations I need?

Thanks.

bearophileHUGS · Oct 22, 2008

Mr.SpOOn:

Is there another convenient structure or shall I use lists and define
the operations I need?

<musings>
As Python becomes accepted for more and more "serious" projects some
more data structures can eventually be added to the collections
module:
- SortedSet, SortedDict: can be based on red-black trees. Require
items to be sortable (them being hashable isn't required, but it's
probably more safe that they are immutable).
- Odict: a dict ordered according to insertion order.
- Bidict: a unordered dict that allows O(1) retrieval on both keys and
values (that are both unique).
- Graph: a directed unsorted data structure like mine may be
acceptable too.
- Bitset: dynamically re-sizable and efficient in space and time, easy
to implement in C.
- Chain: simulates a double linked list, but its implementation can be
similar to the current Deque but allowing not completely filled blocks
in the middle too. (I haven't named it just List because there's a
name clash with the list()).
- I use all those data structures in Python programs, plus some more,
like interval map, trie (and a dawg), persistent dict and persistent
list, kd-tree, BK-tree, Fibonacci Heap, a rank & select, a disjoint-
set, and maybe more. But those are uncommon enough to be left out of a
standard library.
- A problem with the Chain data structure is how to represent
iterators in Python. I think this is a big problem, that I don't know
how to solve yet. A possible solution is to make them owned by the
Chain itself, but this makes the code slow down linearly in accord to
the number of the iterators. If someone has a solution I'm all ears.
</musings>

Bye,
bearophile

Lie Ryan · Oct 25, 2008

Mr.SpOOn:

<musings>
As Python becomes accepted for more and more "serious" projects some
more data structures can eventually be added to the collections module:
- SortedSet, SortedDict: can be based on red-black trees. Require items
to be sortable (them being hashable isn't required, but it's probably
more safe that they are immutable). - Odict: a dict ordered according to
insertion order. - Bidict: a unordered dict that allows O(1) retrieval
on both keys and values (that are both unique).
- Graph: a directed unsorted data structure like mine may be acceptable
too.
- Bitset: dynamically re-sizable and efficient in space and time, easy
to implement in C.
- Chain: simulates a double linked list, but its implementation can be
similar to the current Deque but allowing not completely filled blocks
in the middle too. (I haven't named it just List because there's a name
clash with the list()).
- I use all those data structures in Python programs, plus some more,
like interval map, trie (and a dawg), persistent dict and persistent
list, kd-tree, BK-tree, Fibonacci Heap, a rank & select, a disjoint-
set, and maybe more. But those are uncommon enough to be left out of a
standard library.
- A problem with the Chain data structure is how to represent iterators
in Python. I think this is a big problem, that I don't know how to solve
yet. A possible solution is to make them owned by the Chain itself, but
this makes the code slow down linearly in accord to the number of the
iterators. If someone has a solution I'm all ears. </musings>

Bye,
bearophile

<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
b = dict({'a': 'A'}, implementation = 'binarytree')
c = dict({'a': 'A'}, implementation = 'binarytree')

i.e. basically since a data structure can have different implementations,
and different implementations have different performance characteristics,
it should be possible to dynamically change the implementation used.

In the far future, the data structure and its implementation could be
abstracted even further:

a = list() # ordered list
b = set() # unordered list
c = dict() # unordered dictionary
d = sorteddict() # ordered dictionary

Each of the implementations would share a common subset of methods and
possibly a few implementation dependent method that could only work on
certain implementations (or is extremely slow except in the correct
implementation).
</anotherrandommusing>

Steven D'Aprano · Oct 25, 2008

<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
b = dict({'a': 'A'}, implementation = 'binarytree')
c = dict({'a': 'A'}, implementation = 'binarytree')

Oh I hope not. I think you have mistaken "dynamic" for "chaotic".

When I see a dict, I want to know that any two dicts work the same way. I
don't want to have to search the entire project's source code to find out
if it is a dict implemented as a hash table with O(1) lookups, or a dict
implemented as a binary tree with O(log N) lookups, or a dict implemented
as a linear array with O(N) lookups.

If I wanted that sort of nightmare, I can already do it by shadowing the
builtin:

dict = binarytree
D = dict({'a': 'A'}) # make a binary tree

There is no possible good that come from this suggestion. The beauty of
Python is that the built-in data structures (list, dict, set) are
powerful enough for 99% of uses[1], and for the other 1%, you can easily
and explicitly use something else.

But *explicitly* is the point. There's never any time where you can do
this:

type(mydict) is dict

and not know exactly what performance characteristics mydict will have.
(Unless you shadow dict or type, or otherwise do something that breaks
the rules.) You never need to ask, "Okay, it's a dict. What sort of dict?"

If you want a binary tree, ask for a binary tree.

[1] Your mileage may vary.

Glenn Linderman · Oct 25, 2008

If you want a binary tree, ask for a binary tree.

OK, where can I find a binary tree? One that works quickly and
efficiently on at least Linux, Mac, and Windows?

I don't find it in the standard library. I'd actually probably rather
have a red-black tree, can't find that either. I found a pure-python
tree, but it's performance was abysmal. I found some others that only
claimed to work on some platforms.

While on the topic of ordered keys, it seems that sorted's cmp feature
is omitted from 3.0. That might be fine, but how does one create a key
that corresponds to ascending integer followed by descending character
string?

Lie Ryan · Oct 25, 2008

<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b = dict({'a':
'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
implementation = 'binarytree')

Click to expand...

Oh I hope not. I think you have mistaken "dynamic" for "chaotic".

When I see a dict, I want to know that any two dicts work the same way.

Oh no, the two dict implementation would work _exactly_ the same from the
outside, they are transparently interchangeable. Only the performance
characteristic differs because of the different implementation. Actually
I got this idea from a book about algorithm and data structure, that book
said that an abstract data type (e.g. dict, set, list) have several
competing implementations or data structures (e.g. binary tree dict,
hashed dict, array dict). A data's implementation and interface can be
separated so that we can switch the data structure's implementation
without changing the rest of the code. The book is Algorithm Design
Manual by Skiena.

hint: read the PS below

I don't want to have to search the entire project's source code to find
out if it is a dict implemented as a hash table with O(1) lookups, or a
dict implemented as a binary tree with O(log N) lookups, or a dict
implemented as a linear array with O(N) lookups.

No, you'd only need to look at the dict's creation point (or actually
much better at projects docs, but not everyone create good docs). The
alternative you mentioned below, by shadowing built-in, is both a hack
and is much more likely to be missed.

If I wanted that sort of nightmare, I can already do it by shadowing the
builtin:

dict = binarytree
D = dict({'a': 'A'}) # make a binary tree

I DON'T want THAT sort of nightmare you mentioned...
And it'd be impossible to have two dictionary that have two different
implementations.

There is no possible good that come from this suggestion. The beauty of
Python is that the built-in data structures (list, dict, set) are
powerful enough for 99% of uses[1], and for the other 1%, you can easily
and explicitly use something else.

Oh really? As far as I know, python's list is extremely bad if you're
inserting data at the beginning of the list (e.g. lst.insert(0) requires
the whole array be "copied" one address away). This is because python's
list uses array data structure, making addressing (e.g. a[2]) fast, but
insertion slow. If, on the other hand, it is implemented using binary
tree, it'd make insertion O(log n) but addressing is a bit tricky on it.

The keyword is "tradeoffs".

But *explicitly* is the point. There's never any time where you can do
this:

Yes, true, explicitly IS the point. How more explicit can you be than:
dict({foo: bar}, implementation = 'binarytree')

type(mydict) is dict

If my memory serves right, binary tree dict and hashed table dict is both
a dict right? (Duck Typing)
Only their implementation differs. Implementation is... well,
"implementation detail".

and not know exactly what performance characteristics mydict will have.

Oh... why do I need to know what the performance characteristic of mydict
is? Unless I know what I'm doing.

(Unless you shadow dict or type, or otherwise do something that breaks
the rules.) You never need to ask, "Okay, it's a dict. What sort of
dict?"

Okay, it's a dict. What sort of dict? Who the hell cares? I don't need to
know, they all looks and behave the same (Duck Typing)... at least until
I profile them (since profiling is a deep black magic by itself, it
cannot be used to discredit switching implementations).

Sometimes we need a data type to use a specific data structure that have
some specific performance characteristic, because we know we'll be doing
a specific operation a lot more than other operations.

If you actually need to know what the implementation is currently being
used, you could implement a dict.implementation property.

If you want a binary tree, ask for a binary tree.

Yeah, you ask for binary tree EXPLICITLY:
bintreedict = dict({a:b}, implementation = 'binarytree')

this:
regularhasheddict = dict({a:b})

would have a reasonable default.

PS: I do admit I have used the wrong terms in the last post. I used the
term "data structure", instead it should be "abstract data type", "data
structure" is a synonym for "implementation". In this post, I hope I've
corrected all of the usage.

Terry Reedy · Oct 25, 2008

Lie said:
<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')

For this to work, the abstract list would have to know about all
implementations of the abstraction.

b = dict({'a': 'A'}, implementation = 'binarytree')
c = dict({'a': 'A'}, implementation = 'binarytree')

i.e. basically since a data structure can have different implementations,
and different implementations have different performance characteristics,
it should be possible to dynamically change the implementation used.

In the far future, the data structure and its implementation could be
abstracted even further:

a = list() # ordered list
b = set() # unordered list
c = dict() # unordered dictionary
d = sorteddict() # ordered dictionary

Each of the implementations would share a common subset of methods and
possibly a few implementation dependent method that could only work on
certain implementations (or is extremely slow except in the correct
implementation).
</anotherrandommusing>

The future is 3.0, at least in part, with Abstract Base Classes.
There are 16 in the collectons module.
"In addition to containers, the collections module provides some ABCs
(abstract base classes) that can be used to test whether a class
provides a particular interface, for example, is it hashable or a
mapping, and some of them can also be used as mixin classes."

The ABCs for numbers are in the numbers module.

tjr

Lie Ryan · Oct 25, 2008

Lie Ryan wrote:

<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')

Click to expand...

For this to work, the abstract list would have to know about all
implementations of the abstraction.

/the exact syntax isn't really important/
/abstract type and implementation separation is the important point/

Actually, if I want to force it, that syntax could work using the same
magic used by event-based syst

Lie Ryan · Oct 25, 2008

Lie Ryan wrote:

<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')

Click to expand...

For this to work, the abstract list would have to know about all
implementations of the abstraction.

# Sorry the last message is truncated because of an "accident"

/the exact syntax isn't really important/
/abstract type and implementation separation is the important point/

Actually, if I want to force it, that syntax could work using the same
magic used by event-based systems: registration. Although I agree it
might be a bit cumbersome to do registration for something like this, but
as I've said before, exact syntax is not really important.

Steven D'Aprano · Oct 26, 2008

<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b =
dict({'a': 'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
implementation = 'binarytree')

Click to expand...

Oh I hope not. I think you have mistaken "dynamic" for "chaotic".

When I see a dict, I want to know that any two dicts work the same way.

Click to expand...

Oh no, the two dict implementation would work _exactly_ the same from
the outside, they are transparently interchangeable. Only the
performance characteristic differs because of the different
implementation.

Exactly. That was my point.

[...]

No, you'd only need to look at the dict's creation point (or actually
much better at projects docs, but not everyone create good docs).

And how do you find an arbitrary object's creation point without
searching the project's source code?

I DON'T want THAT sort of nightmare you mentioned... And it'd be
impossible to have two dictionary that have two different
implementations.

Nonsense.

dict = binarytree
D1 = dict({'a': 'A'}) # make a binary tree "dict"
dict = __builtin__.dict
D2 = dict({'a': 'A'}) # make a standard dict
dict = someothertype
D3 = dict({'a': 'A'})

I'm not suggesting this is a good idea. This is a terrible idea. But it
is not much worse than your idea:

D1 = dict({'a': 'A'}, implementation='binarytree')
D2 = dict({'a': 'A'}, implementation='dict')
D3 = dict({'a': 'A'}, implementation='someothertype')

There is no possible good that come from this suggestion. The beauty of
Python is that the built-in data structures (list, dict, set) are
powerful enough for 99% of uses[1], and for the other 1%, you can
easily and explicitly use something else.

Click to expand...

Oh really? As far as I know, python's list is extremely bad if you're
inserting data at the beginning of the list

And how often do you do that?

And when you do, use a deque. Just call it a deque.

[...]

Yes, true, explicitly IS the point. How more explicit can you be than:
dict({foo: bar}, implementation = 'binarytree')

You miss the point. With your plan, you can do this:

D1 = dict({foo: bar}, implementation = 'binarytree')
D2 = dict({foo: bar}, implementation = 'dict')
type(D1) is type(D2)

and yet D1 and D2 have UTTERLY different performance characteristics. So
now you need to add ANOTHER test to distinguish dicts-which-are-dicts
from dicts-which-are-binary-trees:

D1.implementation != D2.implementation

And why? So you can avoid calling a goose a goose, and call it a duck
instead.

If my memory serves right, binary tree dict and hashed table dict is
both a dict right? (Duck Typing)
Only their implementation differs. Implementation is... well,
"implementation detail".

Duck typing refers to *interface*, not implementation. I have no problem
with you using a type with the same interface as a dict. That's what duck
typing is all about. Just don't call it a dict!

Oh... why do I need to know what the performance characteristic of
mydict is? Unless I know what I'm doing.

http://www.joelonsoftware.com/articles/fog0000000319.html

Because when you do this:

mydict[key] = 1

it's important whether each dict lookup is O(1), O(log N) or O(N). For a
dict with one million items, that means that an implementation based on a
binary tree does O(20) times more processing than a dict, and an
implementation based on linear searching does O(1000000) times more
processing.

If you think implementation details don't matter, try this:

s1 = 'c'*(10**6)

versus

s2 = ''
for i in xrange(10**6):
s2 = 'c' + s2 # defeat optimizer

Okay, it's a dict. What sort of dict? Who the hell cares?

If you don't care, then why are you specifying the implementation type?

mydict = dict({'foo': 'bar'}, implementation="surprise me!")

You can't have it both ways. If you care, then you know enough to want a
hash table based dict (the standard) or a binary tree or something else.
So go ahead and use whatever data structure you want. Just don't call it
a dict.

But if you don't care, then just use the Python standard data types.

I don't need
to know, they all looks and behave the same (Duck Typing)...

Until you try a simple operation like len(mydict) and it takes three
minutes because the implementation you choose is really inefficient.

Sometimes we need a data type to use a specific data structure that have
some specific performance characteristic, because we know we'll be doing
a specific operation a lot more than other operations.

I'm not arguing that you should never use any other data structure. Go
right ahead and use them all you like.

Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
implementation = 'binarytree')

Or you could do it the right way:

D1 = binarytree(data)
D2 = dict(data)
type(D1), type(D2)
=> returns binarytree, dict

versus:

D1 = dict(data, implementation='binarytree')
D2 = dict(data)
type(D1), type(D2)
=> returns dict, dict

Terry Reedy · Oct 26, 2008

Lie said:
a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')

Click to expand...

For this to work, the abstract list would have to know about all
implementations of the abstraction.

Click to expand...

/the exact syntax isn't really important/
/abstract type and implementation separation is the important point/

Actually, if I want to force it, that syntax could work using the same
magic used by event-based systems: registration.

ABCs have registration method. The builtin ABCs have appropriate
builtin classes preregistered.True

I believe user classes that inherit from an ABC are also registered, and
other can be registered explicitly.

Although I agree it
might be a bit cumbersome to do registration for something like this, but
as I've said before, exact syntax is not really important.

Then why do you object to current
mylist = linkedlist(data)
and request the harder to write and implement
mylist = list(data, implementation = 'linkedlist')
?

tjr

Marc 'BlackJack' Rintsch · Oct 26, 2008

Oh no, the two dict implementation would work _exactly_ the same from
the outside, they are transparently interchangeable. Only the
performance characteristic differs because of the different
implementation.

They are not 100% interchangeably. Dictionaries as implemented need keys
to be hashable objects and sorted dictionaries need the keys to have an
order defined. So switching mapping types may fail unless the key
objects already have the needed magic methods implemented to do the right
thing.

Ciao,
Marc 'BlackJack' Rintsch

Lie Ryan · Oct 26, 2008

]

And how do you find an arbitrary object's creation point without
searching the project's source code?

How is it better using the current way?
Asking the .implementation field isn't much harder than asking the type
(), and is much more obvious that we're asking the "implementation" being
used.

Nonsense.

<snip>

There is two good arguments for using a plain string to specify the
implementation used: 1) Plain-text Configuration, 2) argument passing.

1) The current way requires you have a map of the configuration text to a
function (i.e. imps = {'linkedlist': linkedlist}), than assign the
function to _a global name_ (i.e. lista = imps[configuration[confkey]] --
namespace pollution), then instantiate an object using that (ls = lista
([...]))). The implementation way, only requires ls = list([...],
implementation = configuration[confkey])

2) Easily said, plain text passing is safer and easier than passing
function object.

There is no possible good that come from this suggestion. The beauty
of Python is that the built-in data structures (list, dict, set) are
powerful enough for 99% of uses[1], and for the other 1%, you can
easily and explicitly use something else.

Click to expand...

Oh really? As far as I know, python's list is extremely bad if you're
inserting data at the beginning of the list

Click to expand...

And how often do you do that?

It's not even a deque yet, a regular queue is slow using list in python,
and any insertion sort requires you to insert elements into arbitrary
position. This makes using array for it a O(n**2). On the other hand,
using binary tree is a O(log n).

[...]

Yes, true, explicitly IS the point. How more explicit can you be than:
dict({foo: bar}, implementation = 'binarytree')

Click to expand...

You miss the point. With your plan, you can do this:

D1 = dict({foo: bar}, implementation = 'binarytree') D2 = dict({foo:
bar}, implementation = 'dict') type(D1) is type(D2)

and yet D1 and D2 have UTTERLY different performance characteristics.

It does not matter that it have different performance characteristic.

So
now you need to add ANOTHER test to distinguish dicts-which-are-dicts
from dicts-which-are-binary-trees:

How often would you need to care about the specific implementation used?
Compare with how often you want to know how to work with the object. Most
of the time, you only need to know how to work with it (i.e. knowing its
interface), only _sometimes_ when it DOES matter speed-wise, you'd need
to know and affect its implementation.

D1.implementation != D2.implementation

And why? So you can avoid calling a goose a goose, and call it a duck
instead.

I think we have an utterly different view here.
I wished that list() becomes an abstract type, instead of a data
structure. (binarytreelist and arraylist is some property of a list)

While you insisted that list() is a data structure, and abstract type is
some obscureness somewhere unknown.

Duck typing refers to *interface*, not implementation.

Well said, you've just supported my view.

I have no problem
with you using a type with the same interface as a dict. That's what
duck typing is all about. Just don't call it a dict!

why do you think binarytreedict is "less dict" than hasheddict?

http://www.joelonsoftware.com/articles/fog0000000319.html

Sometimes it is good to have control of everything. Sometimes (most of
the time) I know I can dump some of the details to think on more
important matters. To know which parts need more thinking by doing
profiling.

Because when you do this:

mydict[key] = 1

it's important whether each dict lookup is O(1), O(log N) or O(N). For a
dict with one million items, that means that an implementation based on
a binary tree does O(20) times more processing than a dict, and an
implementation based on linear searching does O(1000000) times more
processing.

Well, I don't need to know how the dict works if I'm only using it to
store twenty items right? (I do know how python's dict/set/list/etc is
implemented, but unless that particular dict is central to my program,
I'm not going to create my programs around how dict is implemented, I'm
going to create it on how my mind works).

I don't care how print statement works, unless I'm trying to do something
fancy with print. I don't care that print takes a series of electrical
signal, then convert it into a series of bytes and pass it into the
terminal stdout then the terminal will figure out how to draw the string
using the appropriate font at the appropriate place and pass itself to a
window manager which would compose the desktop into an image then pass
itself to the graphic driver which would convert the image to electrical
signal to be sent to the graphic card which would do some fancy tricks
before sending it to the monitor, when what I need is only to show the
user a simple welcome message.

The reason we use high-level language is that we know we don't need to
care about some details. Only when the detail matters then we take care
of it. If I do care about all the details, I'd use C, no, no, lower, I'd
use assembly, no, lower..., I'd design my own CPU, no even lower, I'd
design my own non-von-neumann architecture. I don't think I can change
the rules of physics though, but if I can... <some more pseudo random
bytes>

Until you try a simple operation like len(mydict) and it takes three
minutes because the implementation you choose is really inefficient.

.... then I know I have chosen the wrong data structure, and I know I need
to change it.

Or you could do it the right way:

<snip>

For example, if I have a certain unknown object. What is easier?
=> returns (*&%$^%*

or
=> returns diningplate

only after three thousand years of continuous and laborious researching,
at last I found out that (*&%$^%* is an alien dining plate, and it does
have all the features and functionality as earth dining plate, although
due to it being made of CClKO32C32CH42U it is much more resistant to
acidity but would melt when you put carrot on it.

Only when I want to eat carrot, would I care that it is an alien dining
plate. And only when I want to eat something extremely acidic, I'd chose
the alien dining plate over earth dining plate.

Lie Ryan · Oct 26, 2008

Then why do you object to current
mylist = linkedlist(data)
and request the harder to write and implement
mylist = list(data, implementation = 'linkedlist')

I don't wholly object it. I think it's fine but can be improved.

I tend to be more inclined on this side though:

mylist = list.linkedlist(data) # or list<othersymbol>linkedlist(data)
as syntax sugar for
mylist = list(data, implementation = 'linkedlist')

this kinds of syntax magnify the fact that linkedlist is a choice of
implementation for list abstract type. The current syntax makes no
indication whatsoever that linkedlist is anything related to list.

bearophileHUGS · Oct 27, 2008

Glenn Linderman:

how does one create a key that corresponds to ascending integer followed by descending character string?

(Others may have already answered you because Google groups is very
slow.)

seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
sorted(seq, key=lambda (n,s): (-n, s), reverse=True)

Click to expand...

Click to expand...

[(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]

A little harder question is how to create a key that corresponds to
ascending string followed by descending string?

To do that you can sort the data two times, relying on the stable
nature of the Python sort.

Another (silly?) solution may be to invent a "negate" function for
strings too:

strneg = lambda txt: txt.translate(itable)
itable = "".join(chr(i) for i in xrange(255, -1, -1))
pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
sorted(pairs, key=lambda (s1,s2): (s1, strneg(s2)))

Click to expand...

Click to expand...

[('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

Bye,
bearophile

Peter Otten · Oct 27, 2008

Glenn Linderman:

how does one create a key that corresponds to ascending integer followed
by descending character string?

Click to expand...

(Others may have already answered you because Google groups is very
slow.)

seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
sorted(seq, key=lambda (n,s): (-n, s), reverse=True)

Click to expand...

Click to expand...

[(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]

A little harder question is how to create a key that corresponds to
ascending string followed by descending string?

To do that you can sort the data two times, relying on the stable
nature of the Python sort.

Another (silly?) solution may be to invent a "negate" function for
strings too:

strneg = lambda txt: txt.translate(itable)
itable = "".join(chr(i) for i in xrange(255, -1, -1))
pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
sorted(pairs, key=lambda (s1,s2): (s1, strneg(s2)))

Click to expand...

Click to expand...

[('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

Here's a class that can negate arbitrary values:
.... def __init__(self, value):
.... self.value = value
.... def __cmp__(self, other):
.... return -cmp(self.value, other.value)
....

pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
sorted(pairs, key=lambda (s, t): (s, Neg(t)))

Click to expand...

Click to expand...

[('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

bearophileHUGS · Oct 27, 2008

Lie Ryan:

Oh no, the two dict implementation would work _exactly_ the same from the outside, they are transparently interchangeable. Only the performance characteristic differs because of the different implementation.<

I don't agree with the general idea. If the operations done by your
data structure have different computational complexity, then they are
fit for different usages. When you program you must know what
computational complexity has each of the operations of your data
structyre, otherwise there's no way to know the complexity of your
whole program, so instead of programming you are just become a mage
that tries magical spells and hopes for the better. So I don't accept
so much different data structures to have the same name. That's why
I'll never appreciate the Python list type to be named list instead of
array, despite it supports more or less all the functions you expect
from an abstract list type.

Said that, for a high-level language like Python I can see another
possible solution. To have a type that contains several kinds of data
structures, for example a "dict" that contains a hash implementation,
a red-black tree, etc. Such data structure can use a small part of the
computational power given to it to collect statistics usages of each
object (or each variable, that may contain in succession several
ojects of the same type). Such data structure can then at run time
adapt dynamically, chosing to use the implementation fitter for the
specific usage of each object or variable (the programmer can give
hints of course, or sometimes even coerce the usage a specific
implementation). (such statistics can also be saved on disk to be used
for the successive run of the program, to help them work well from the
start too, and not just after a while). If this capability works well
in practice, then it can solve the problem you were talking about, I
think.

I presume data structures in future high-level languages will be quite
more able to adapt themselves to their usages. Today it's the second
time I talk about future programming languages

Bye,
bearophile

Carl Banks · Oct 27, 2008

Mr.SpOOn:

Click to expand...

<musings>
As Python becomes accepted for more and more "serious" projects some
more data structures can eventually be added to the collections module:
- SortedSet, SortedDict: can be based on red-black trees. Require items
to be sortable (them being hashable isn't required, but it's probably
more safe that they are immutable). - Odict: a dict ordered according to
insertion order. - Bidict: a unordered dict that allows O(1) retrieval
on both keys and values (that are both unique).
- Graph: a directed unsorted data structure like mine may be acceptable
too.
- Bitset: dynamically re-sizable and efficient in space and time, easy
to implement in C.
- Chain: simulates a double linked list, but its implementation can be
similar to the current Deque but allowing not completely filled blocks
in the middle too. (I haven't named it just List because there's a name
clash with the list()).
- I use all those data structures in Python programs, plus some more,
like interval map, trie (and a dawg), persistent dict and persistent
list, kd-tree, BK-tree, Fibonacci Heap, a rank & select, a disjoint-
set, and maybe more. But those are uncommon enough to be left out of a
standard library.
- A problem with the Chain data structure is how to represent iterators
in Python. I think this is a big problem, that I don't know how to solve
yet. A possible solution is to make them owned by the Chain itself, but
this makes the code slow down linearly in accord to the number of the
iterators. If someone has a solution I'm all ears. </musings>

Click to expand...

Bye,
bearophile

Click to expand...

<anotherrandommusing>
Since python is dynamic language, I think it should be possible to do
something like this:

a = list([1, 2, 3, 4, 5], implementation = 'linkedlist')
b = dict({'a': 'A'}, implementation = 'binarytree')
c = dict({'a': 'A'}, implementation = 'binarytree')

-1

This doesn't buy you anything over simply implementing different types
altogether.

a = collections.linkedlist([1,2,3,4,5])
b = collections.btree({'a': 'A'})

In fact, it adds a whole lot of complexity to the built-in types.

Carl Banks

Glenn Linderman · Oct 28, 2008

(e-mail address removed) wrote

Glenn Linderman

how does one create a key that corresponds to ascending integer followed
by descending character string?

Click to expand...

(Others may have already answered you because Google groups is very
slow.)

seq = [(10, "abb"), (5, "zul"), (5, "hal"), (2, "of")]
sorted(seq, key=lambda (n,s): (-n, s), reverse=True)

Click to expand...

[(2, 'of'), (5, 'zul'), (5, 'hal'), (10, 'abb')]

A little harder question is how to create a key that corresponds to
ascending string followed by descending string?

To do that you can sort the data two times, relying on the stable
nature of the Python sort.

Click to expand...

Ick. Costs twice as much.

Another (silly?) solution may be to invent a "negate" function for
strings too:

strneg = lambda txt: txt.translate(itable)
itable = "".join(chr(i) for i in xrange(255, -1, -1))
pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
sorted(pairs, key=lambda (s1,s2): (s1, strneg(s2)))

Click to expand...

[('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

Click to expand...

This only works for byte strings, not Unicode strings. Remember, it is
3.0 that eliminates the cmp option to sorted, and its strings are
Unicode strings.

Here's a class that can negate arbitrary values... def __init__(self, value):
... self.value = value
... def __cmp__(self, other):
... return -cmp(self.value, other.value)
...

pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
sorted(pairs, key=lambda (s, t): (s, Neg(t)))

Click to expand...

Click to expand...

[('al', 'den'), ('al', 'boo'), ('zul', 'o'), ('zul', 'ao')]

This solution seems to be the solution I was looking for. I think "Neg"
is a poor name, but all the good names are significantly longer... Given
its usage, though, in a sorted call, "Rev" would be a better poor name!
ReversedCompare might be a better long name.

How can I suggest adding it to the documentation for Python 3.0 ? It is
much preferable to the "obvious" solution of sorting twice, and doing
that is a trap which many people might fall into. I posted here
instead, and thank you for the solution.

greg · Oct 28, 2008

Here's a class that can negate arbitrary values

... def __init__(self, value):
... self.value = value
... def __cmp__(self, other):
... return -cmp(self.value, other.value)

Unfortunately, the __cmp__ method has been eliminated from
3.0 as well, so this won't work as-is. You would need to
override __lt__, __eq__, etc.

Python sets which support multiple same elements	2	May 20, 2011
Algorithm for generating pitch-class sets in prime form	1	Jan 28, 2011
sparse sets of integers?	6	Apr 4, 2005
overlapping sets	6	Mar 24, 2006
Z-Ordering (Morton ordering) question	2	Nov 5, 2009
Feedback on Sets, and Partitions	7	Apr 30, 2004
Python sets.	4	May 4, 2004
Problem involving sets...	4	Apr 14, 2006

Ordering python sets

Mr.SpOOn

bearophileHUGS

Lie Ryan

Steven D'Aprano

Glenn Linderman

Lie Ryan

Terry Reedy

Lie Ryan

Lie Ryan

Steven D'Aprano

Terry Reedy

Marc 'BlackJack' Rintsch

Lie Ryan

Lie Ryan

bearophileHUGS

Peter Otten

bearophileHUGS

Carl Banks

Glenn Linderman

greg

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads