Q: sort's key and cmp parameters

R

Raymond Hettinger

[Paul Rubin]
The idea was that you have a list of trees that you want to sort, and
an ordering relation between trees:

   def gt(tree1, tree2): ...

Are the trees user defined classes? Can the gt() function be added
incorporated as __lt__ method so that you can just run a plain sort:

sort(list_of_trees)

Is the recursive search order something you can turn into a straight
sequence:

sort(list_of_trees, key=flattener)

IOW, if there is an ordering relation between the trees, why can't
it be either part of the tree API or collapsable into a list of
successive nodes to be compared.

From the sound of it, the trees are static during the sort and
would get a nice O(n log n) --> O(n) speed-up if a key function
were allowed to flatten them in a single pass.


Raymond
 
P

Paul Rubin

Raymond Hettinger said:
Can you give an example of a list of trees and a cmp function
that recursively compares them?

Example of list of trees (nested dicts). In practice you could get
such a list from the simplejson module:

list_of_trees = [{'value':1, 'left':{'value':3,'left':None,'right':None},
'right':{'value':7,'left':{'value':5, ...}}},
{'value':19, 'left':{'value':23', ...}},
...
]

Example comparison function:

def compare(tree1, tree2):
c = cmp(tree1['value'], tree2['value'])
if c != 0: return c
c = cmp(tree1['left'], tree2['left'])
if c != 0: return c
return cmp(tree1['right'], tree2['right])
Are the trees user defined classes?

Not in general. They might be nested tuples, lists, or dictionaries.
Or they could come from a non-extensible library-defined class, like
from cElementTree.
From the sound of it, the trees are static during the sort and
would get a nice O(n log n) --> O(n) speed-up if a key function
were allowed to flatten them in a single pass.

But the key function has to do all those comparisons on the internal
nodes.
 
P

Paul Rubin

Paul Rubin said:
Example comparison function:

def compare(tree1, tree2):
c = cmp(tree1['value'], tree2['value'])
if c != 0: return c
c = cmp(tree1['left'], tree2['left'])
if c != 0: return c
return cmp(tree1['right'], tree2['right])

Sorry, meant recursive comparison.

def compare(tree1, tree2):
c = cmp(tree1['value'], tree2['value'])
if c != 0: return c
c = compare(tree1['left'], tree2['left'])
if c != 0: return c
return compare(tree1['right'], tree2['right])
 
P

Paul Rubin

Paul Rubin said:
c = compare(tree1['left'], tree2['left'])

Of course this recursive call crashes if either branch is None.
Oh well, I'll stop trying to correct it since I'm sure you get the idea.
 
K

kj

In said:
Python 2.x provides two ways
and you can use whichever one fits the application better. I have
never understood why Python 3.x finds it necessary to break one of
them. Maybe I can migrate to Haskell by the time Python 2.x becomes
deprecated.

!!!

Maybe Haskell is much handier than I give it credit for, but it's
hard for me to imagine that it is as convenient as Python 3, even
without the cmp sort option... (And I agree with you that getting
rid of sort's cmp in Python 3 was a bad idea.)

What's going on here? Our lab recently hired a new postdoc who,
to our disbelief, works almost exclusively in OCaml. And I hear
all this talk about switching to Haskell or Scheme. I don't get
it. Despite the elegance of these languages, the libraries are
not there. It seems to me it would take forever to get the simplest
things done in these languages...

Confused.

kj
 
P

Paul Rubin

kj said:
!!!

Maybe Haskell is much handier than I give it credit for, but it's
hard for me to imagine that it is as convenient as Python 3, even
without the cmp sort option...

Heh, yeah, I was being a bit snarky/grouchy. Haskell has a very steep
learning curve and will never be as convenient as Python for banging
out some small script. It's worth considering for larger or more
serious programs.
What's going on here? Our lab recently hired a new postdoc who,
to our disbelief, works almost exclusively in OCaml. And I hear
all this talk about switching to Haskell or Scheme. I don't get
it. Despite the elegance of these languages, the libraries are
not there. It seems to me it would take forever to get the simplest
things done in these languages...

Haskell's library is growing very rapidly, more so than Python's I'd
say. Take a look at

http://donsbot.wordpress.com/2009/03/16/visualising-the-haskell-universe/

if you're willing to count Hackage (sort of the equivalent of the
Python cheese shop). The Haskell Platform (counterpart to Python
stdlib) is also very actively expanding.

Ocaml and Scheme both seem to me to be sort of stagnant. Scheme is an
elegant fossil. Some people find Ocaml to be at a sweet spot,
combining Python's convenience and the more important aspects of
Haskell's expressiveness. I haven't used it myself. It seems to me
that Haskell is attracting all the most advanced development
attention.
 
R

Raymond Hettinger

[Paul Rubin]
Example of list of trees (nested dicts). In practice you could get
such a list from the simplejson module:

list_of_trees = [{'value':1, 'left':{'value':3,'left':None,'right':None},
'right':{'value':7,'left':{'value':5, ...}}},
{'value':19, 'left':{'value':23', ...}},
...
]

So, it looks like the relevant comparison values can be stored in
nested lists:

list_of_lists = \
[[1, [3, [], []],
[7, [5, [], []], []]],
[19, [23, [], []],
[]],
]

Here I've used a transformation where a tree is stored as:

[tree['value'], tree['left'], tree['right']]

and the None value indicating an empty tree transformed to:

[]
Example comparison function:

   def compare(tree1, tree2):
      c = cmp(tree1['value'], tree2['value'])
      if c != 0: return c
      c = compare(tree1['left'], tree2['left'])
      if c != 0: return c
      return compare(tree1['right'], tree2['right])

Hmm, that recursive comparison order looks like how Python
already recursively compares lists. So, perhaps the
lists can be compared directly:

if list_of_lists[0] < list_of_lists[1]:
...

Not in general. They might be nested tuples, lists, or dictionaries.
Or they could come from a non-extensible library-defined class, like
from cElementTree

Are there any of these trees that cannot be transformed
(by a key function) into an equivalent nested list of lists
and values so that the trees can be directly compared to one another
using Python's normal built-in recursive comparison order for lists?


Raymond
 
P

Paul Rubin

Raymond Hettinger said:
So, it looks like the relevant comparison values can be stored in
nested lists:

list_of_lists = \
[[1, [3, [], []],
[7, [5, [], []], []]],
[19, [23, [], []],
[]],
]

Now you've copied all the data out of the original tree, which is even
worse than putting a class wrapper around it. The comparison remains
(asymptotically) as expensive as before, too.
Are there any of these trees that cannot be transformed
(by a key function) into an equivalent nested list of lists
and values so that the trees can be directly compared to one another
using Python's normal built-in recursive comparison order for lists?

I think so. Say you want to change the numeric comparisons so that
even numbers always sort before odd numbers, ie.
-4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...
I don't think there is a way to re-encode the numbers so they
can be compared with direct integer comparison. Even if there is,
I think there are reasonable orderings on the trees themselves that
can't possibly be embedded in the standard python comparison order.
I might be able to construct an example using ordinal diagrams from
proof theory.
 
R

Raymond Hettinger

So, it looks like the relevant comparison values can be stored in
nested lists:
list_of_lists = \
[[1, [3, [], []],
[7, [5, [], []], []]],
[19, [23, [], []],
[]],
]

Now you've copied all the data out of the original tree, which is even
worse than putting a class wrapper around it. The comparison remains
(asymptotically) as expensive as before, too.

FWIW, you could also just flatten it to: [(1,3,7,5), (19,23)].

The point is that the tree comparisons you presented have a
predetermined
order of values to compare, so they either be recast a flattened list
of comparison values or the tree itself can be cast in a list of lists
form.
Either way, the O(n log n) step of doing the actual comparisons runs
much
faster than if you coded a recursive cmp function written in pure
python.

Say you want to change the numeric comparisons so that
even numbers always sort before odd numbers, ie.
-4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...

This is too easy:
>>> s = [-2, 4, 2, -1, -3, 1, -4, 0, 3]
>>> s.sort()
>>> s.sort(key=lambda x: x%2)
>>> s
[-4, -2, 0, 2, 4, -3, -1, 1, 3]

The use cases you've presented so far aren't convincing,
but I'm sure that if you keep making-up random, weird use cases,
there will be one where a cmp function is a better approach.

I think there are reasonable orderings on the trees themselves that
can't possibly be embedded in the standard python comparison order.
I might be able to construct an example using ordinal diagrams from
proof theory.

As long as the values of a tree get compared in a predetermined order,
there will always be a flattened list equivalent that works faster
using a key function. If you're going to concoct something isn't
easily transformable to a key function, I think your best bet is to
create a comparison where the trees have an interaction other than
comparing values at identical node positions; otherwise, the tree
shape hasn't made the problem any more difficult than sorting a list
of successive values. The concocted comparison function needs to
exploit some aspect of the tree shape than can't be precomputed
in a key function (perhaps involving some complex interaction between
the two compared items like a tree merge or somesuch).

Instead of trying to create a hard comparison from first principles,
there may already be some research on the subject in the SQL world.
It seems to me that SQL's ORDER BY clauses are essentially just
key functions, so maybe there are known cases where a query cannot
be sorted with just the tools provided by SQL.


Raymond

P.S. I accept than you hate sorting in Py3.x. There's no need to
convince me of that. I was just curious about your one real-world
use case and whether I could find a straight-forward key function
that would get the job done.
 
P

Paul Rubin

Raymond Hettinger said:
FWIW, you could also just flatten it to: [(1,3,7,5), (19,23)].

The point is that the tree comparisons you presented have a
predetermined order of values to compare, so they either be recast a
flattened list of comparison values or the tree itself can be cast
in a list of lists form. Either way, the O(n log n) step of doing
the actual comparisons runs much faster than if you coded a
recursive cmp function written in pure python.

Possibly so, but asymptotically it's the same, and anyway
Say you want to change the numeric comparisons so that
even numbers always sort before odd numbers, ie.
-4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...

This is too easy:
s = [-2, 4, 2, -1, -3, 1, -4, 0, 3]
s.sort()
s.sort(key=lambda x: x%2)
s
[-4, -2, 0, 2, 4, -3, -1, 1, 3]

I don't think so:
>>> s = [0, 2, -2, 3, 1, -1, 4, -4, -3]
>>> s.sort(key=lambda x:x%2)
>>> s
[0, 2, -2, 4, -4, 3, 1, -1, -3]

s.sort(key=lambda x:(x%2,x)) might work but why are you so opposed
to being able to write one's natural thought pattern if that pattern
happens to be a comparison function? Who wants to concoct these
ad-hoc encodings for every ordering?
As long as the values of a tree get compared in a predetermined order,
there will always be a flattened list equivalent that works faster
using a key function. If you're going to concoct something isn't
easily transformable to a key function, I think your best bet is to
create a comparison where the trees have an interaction other than
comparing values at identical node positions;

How about this: gen_e, gen_pi, gen_sqrt2, etc. all generate
representations of various real numbers as possibly infinite sequences
of digits:

def gen_pi():
yield 3
yield 1
yield 4
# ... compute infinite stream one digit at a time

Comparison is obvious if (hmmm....) you can guarantee that you're
sorting unique elements and that the sort function won't compare an
element with itself (otherwise I guess you could cut off comparison at
a million digits or whatever):

def cmp(gen1, gen2):
for d,e in izip(gen1(), gen2()):
c = cmp(d,e)
if c: return c

I don't see any clean way to handle that with key=, though of course
maybe I'm missing something.
P.S. I accept than you hate sorting in Py3.x. There's no need to
convince me of that. I was just curious about your one real-world
use case and whether I could find a straight-forward key function
that would get the job done.

I don't hate sorting in py3.x. I understand the concept that py3.x
introduces incompatibilities to make things better. What I don't like
is the approach of breaking stuff gratutiously with no benefit. If
the positional arg for cmp is a kludge, why not switch it to a keyword
arg instead of eliminating it?

I don't subscribe to the cult of the real-world use case. I'm not
smart enough to anticipate every way anyone might ever want to use
code that I write. So I try to identify the obvious "basis vectors"
of sensible functionality, implement those, and make sure that the
entire space of combinations works the way it should, without worrying
about whether anyone will care about any specific combination. There
is no real-world use case for the constant assignment

a = 0x36b20b32c927eaf68bb40d87d251049db98da0c03a0e8c791f04ee486ec2f7ee

which I'm sure nobody has ever seen before (I just made it from
os.urandom). But if Python somehow failed to parse that number (and
only that number), that would be considered an unacceptable bug,
because a straightforward combination of primitives had failed to work
as it should.

Any book on sorting spends most of its time on comparison sorting,
which is the primitive operation. key= is very useful but it is
a derived operation, not the other way around.
 
R

Raymond Hettinger

 Say you want to change the numeric comparisons so that
This is too easy:
    >>> s = [-2, 4, 2, -1, -3, 1, -4, 0, 3]
    >>> s.sort()
    >>> s.sort(key=lambda x: x%2)
    >>> s
    [-4, -2, 0, 2, 4, -3, -1, 1, 3]

I don't think so:

    >>> s = [0, 2, -2, 3, 1, -1, 4, -4, -3]
    >>> s.sort(key=lambda x:x%2)
    >>> s
    [0, 2, -2, 4, -4, 3, 1, -1, -3]

There were two sorts in my post and only one in yours.
That's why you didn't get the same answer.

s.sort(key=lambda x:(x%2,x)) might work but why are you so opposed
to being able to write one's natural thought pattern if that pattern
happens to be a comparison function?  Who wants to concoct these
ad-hoc encodings for every ordering?

Your (x%2, x) idea seems like a good way to do it in a single pass.
Also, it seems to be a natural translation of the problem statement,
"even numbers always sort before odd numbers", into a primary key
and secondary sort key.

How about this: gen_e, gen_pi, gen_sqrt2, etc. all generate
representations of various real numbers as possibly infinite sequences
of digits:

   def gen_pi():
      yield 3
      yield 1
      yield 4
      # ... compute infinite stream one digit at a time

Comparison is obvious if (hmmm....) you can guarantee that you're
sorting unique elements and that the sort function won't compare an
element with itself (otherwise I guess you could cut off comparison at
a million digits or whatever):

   def cmp(gen1, gen2):
     for d,e in izip(gen1(), gen2()):
        c = cmp(d,e)
        if c: return c

I don't see any clean way to handle that with key=, though of course
maybe I'm missing something.  

Right. I would be forced to use a Cmp2Key wrapper for this one.

If sorting lazily evaluated infinite sequences is a common use
case for you, then you're going to love Haskell ;-)

I don't hate sorting in py3.x.

That's good to hear.

 I understand the concept that py3.x
introduces incompatibilities to make things better.  What I don't like
is the approach of breaking stuff gratutiously with no benefit.  If
the positional arg for cmp is a kludge, why not switch it to a keyword
arg instead of eliminating it?

When Guido made the final call, I'm sure he was balancing
a number of goals including language simplification and
one-way-to-do-it. I'm also sure that sorting lazily
evaluated infinite sequences was not on his radar screen :)

 key= is very useful but it is
a derived operation, not the other way around.

As you showed with infinite sequence example, it may well
be the case that cmp is more general and can more readily
handle an exotic case.

Psychologically, the thing that I find to be interesting is
that beginners and intermediate users seem to take to key
functions more readily than old timers. The key function seems
to be an easy thing to teach (perhaps because that's the
way Excel sorts and the way SQL sorts, or perhaps it is because
problem statements seem to arise in the form of "sort by this,
then by that" instead of "here's how two objects should be
compared").

In contrast, some people who have have had deep experience with
cmp functions may tend to first think of cmp solutions and then
have a harder time seeing solutions with key functions. If you
grew-up on C's qsort() like I did, then a key function may not
be the first thing that pops into your head.

One other interesting data point: Python uses key functions
in min(), max(), heapq.nsmallest(), heapq.nlargest, and
itertools.groupby(). Those functions never supported a
cmp function argument, nor has one ever been requested.


Raymond
 
P

Paul Rubin

Raymond Hettinger said:
There were two sorts in my post and only one in yours.
That's why you didn't get the same answer.

Whoops, missed that.
When Guido made the final call, I'm sure he was balancing
a number of goals including language simplification and
one-way-to-do-it. I'm also sure that sorting lazily
evaluated infinite sequences was not on his radar screen :)

I can't help wondering if there will be feature requests for a cmp
argument for the rest of eternity until it eventually makes its way
back in.
Psychologically, the thing that I find to be interesting is
that beginners and intermediate users seem to take to key
functions more readily than old timers.

I think key is preferable in at least 90% of the cases. I also think
a general purpose language should be able to handle 100% of the cases,
so if key is all you have, there can be a gap to fill.
In contrast, some people who have have had deep experience with
cmp functions may tend to first think of cmp solutions and then
have a harder time seeing solutions with key functions. If you
grew-up on C's qsort() like I did, then a key function may not
be the first thing that pops into your head.

That could be.
One other interesting data point: Python uses key functions
in min(), max(), heapq.nsmallest(), heapq.nlargest, and
itertools.groupby(). Those functions never supported a
cmp function argument, nor has one ever been requested.

Interesting. Maybe someday...
 
S

Steven D'Aprano

I can't help wondering if there will be feature requests for a cmp
argument for the rest of eternity until it eventually makes its way back
in.

When I learned that cmp was being dropped, I was horrified. That lasted
for about fifteen minutes of time actually using key :)

Occasionally I come across sorting problems where it isn't obvious how to
write the key function, but I've never come across one where it wasn't
possible at all. I no longer miss cmp.

I think key is preferable in at least 90% of the cases. I also think a
general purpose language should be able to handle 100% of the cases, so
if key is all you have, there can be a gap to fill.

I don't think it's possible to enumerate 100% of the cases, let alone
support them.

Here's one thing I've sometimes wanted to do: sort multiple lists at
once, according to the contents of one of them, with a single call. You
can get the same result with a key function and sorting each list
separately, but if you've got a lot of large lists, that becomes
expensive.

Python's sort is a memory-sort -- you can't sort a list too big to fit
into memory. Sorting 400GB of data using a single function call is not
something I see as fundamental to a general purpose language (although of
course you should be able to write a disk-sort utility using Python).

Nor does sort() support the case where the value of the comparisons
differs according to where the elements are in the list, e.g. given:

[a, b, c, d, a, b]

the result of comparing a and b at the start of the list is different
from comparing them at the end of the list. I don't think there's an
actual use-case for this feature, and cmp would have just as much trouble
solving this (pretend) problem as key does, but I don't think languages
need to supply this as a fundamental operation. If you need it, write
your own sort function :)

Which of course would be a solution to your problem. The source code for
timsort is available, and stable. You could fork it, re-enable the
support for comparison functions, and put it on the Cheeseshop. If there
is sufficient interest, maybe you'll get cmp support re-added to the
built-in in a version or two.
 
B

Bearophile

Raymond Hettinger:
Psychologically, the thing that I find to be interesting is
that beginners and intermediate users seem to take to key
functions more readily than old timers.  The key function seems
to be an easy thing to teach (perhaps because that's the
way Excel sorts and the way SQL sorts, or perhaps it is because
problem statements seem to arise in the form of "sort by this,
then by that" instead of "here's how two objects should be
compared").

In contrast, some people who have have had deep experience with
cmp functions may tend to first think of cmp solutions and then
have a harder time seeing solutions with key functions.  If you
grew-up on C's qsort() like I did, then a key function may not
be the first thing that pops into your head.

I love the 'key', it makes my code simpler and it's simpler to
understand. I am not opposed to the idea of keeping cmp, that in some
rare cases may be useful or more natural.

The problem is that if you allow to use the cmp, lot of programmers
will use it straight away, not even bothering to know what that
strange 'key' argument may be useful for. And they miss the how much
handy 'key' is.

I am having a very hard type (despite my implementation, explanations,
etc) explaining to D programmers why for a flexible general-purpose
function a key function argument is often better. They in the end have
added something like that in the std lib, but with a weird name like
SchwartzSort that's surely not the first standard sort they look
for... this is bad.

So a funny solution to this situation may be the following: to keep
only 'key' in the sort/sorted for few years, so the community of
Python programmers gets used to 'key' only, and then put 'cmp' back
in, for the special cases ;-)

Bye,
bearophile
 
P

Paul Rubin

Bearophile said:
The problem is that if you allow to use the cmp, lot of programmers
will use it straight away, not even bothering to know what that
strange 'key' argument may be useful for. And they miss the how much
handy 'key' is.

Given how often we hear "consenting adults" as justification for any
number of gaps in Python error checking, the argument above is
singularly unpersuasive.
I am having a very hard type (despite my implementation, explanations,
etc) explaining to D programmers why for a flexible general-purpose
function a key function argument is often better. They in the end have
added something like that in the std lib, but with a weird name like
SchwartzSort that's surely not the first standard sort they look
for... this is bad.

"key" and the DSU (Schwartz) sorting pattern makes lots of sense in
scripting languages like Python that are primarily used on relatively
small data sets. The best reason for writing something in a lower
level language like D is because you want to push the limits of your
availiable machine resources. Therefore, the DSU strategy's extra
memory consumption will be what you want less of the time than it is
in Python.
 
R

Raymond Hettinger

[bearophile]
I love the 'key', it makes my code simpler and it's simpler to
understand. I am not opposed to the idea of keeping cmp, that in some
rare cases may be useful or more natural.

The problem is that if you allow to use the cmp, lot of programmers
will use it straight away, not even bothering to know what that
strange 'key' argument may be useful for. And they miss the how much
handy 'key' is. ....
So a funny solution to this situation may be the following: to keep
only 'key' in the sort/sorted for few years, so the community of
Python programmers gets used to 'key' only, and then put 'cmp' back
in, for the special cases ;-)

FWIW, sorting is only a small part of the picture. The notion of
three-way cmp functions was removed throughout the language. The
built-in cmp() function and the __cmp__ magic method and the
corresponding C slot were all excised. None of them played nicely
with rich comparisons. It was a case where having two-ways-to-do-it
was complicating everyone's lives. AFAICT, very few people understood
exactly how the two interacted -- the under-the-hood C code for it
was complex, unattractive, and hard to understand.

I am having a very hard type (despite my implementation,
explanations, etc) explaining to D programmers why for a
flexible general-purpose function a key function argument
is often better.

Last week, I learned a new term, "Code Prion", that referred to
a code technique such as cmp-functions which cause the proteins
of the brain to become contagiously misfolded so that the victim
1) always prefers that technique, 2) easily infects others, and
3) is resistant to sterilization (using other techniques) ;-)

Once Kernighan and Ritchie put C's qsort() into the food supply,
we were doomed.

FWIW, I think the standard library's unittest module is also a
code prion -- there's no other explanation for its thriving
despite competition from the vastly superior syntax of py.test
or nose.


Raymond
 
P

Paul Rubin

Raymond Hettinger said:
Once Kernighan and Ritchie put C's qsort() into the food supply,
we were doomed.

It was already too late. Knuth vol 3 came out in 1973(?) and its
sorting half is mostly about comparison sorting.
FWIW, I think the standard library's unittest module is also a
code prion -- there's no other explanation for its thriving
despite competition from the vastly superior syntax of py.test
or nose.

unittest is sort of a clone of JUnit if I understand correctly.
py.test and nose are obscure because they're not in the stdlib. I
have used unittest a little bit but I mostly use doctest these days.
I have the impression that py.test and/or nose are better in some ways
but I haven't bothered checking further.
 
B

Bearophile

Paul Rubin:
bearophile:
"key" and the DSU (Schwartz) sorting pattern makes lots of sense in
scripting languages like Python that are primarily used on relatively
small data sets. The best reason for writing something in a lower
level language like D is because you want to push the limits of your
availiable machine resources. Therefore, the DSU strategy's extra
memory consumption will be what you want less of the time than it is
in Python.

I think you are quite wrong.

In D dynamic arrays are built-in, they are used almost as Python lists
(but they are single-typed, unless you create an array of variants),
among other things they have a built-in sort. Such sort is used for
small situations too. Even in a language like D *very* often what you
need is to sort a small amount of data in a very flexible way. In such
situations what you need isn't to squeeze the last byte or CPU cycle
out of the PC, but to have a handy and short syntax, a very flexible
sorting, and something that's surely not bug-prone. In such situation
having a 'key' argument is *better*. Such sort can be stable. That's
why the main sort/sorted routines in my dlibs accept an optional key
callable, and they are very handy. I can show you examples.

On the other hand once in a while in a D program you may need to sort
a very large amount of data, or you may need something faster (and
unstable, like a refined introsort based on a 2-pivot QS), or even
more special than the things allowed by the built in sort. In such
situation you can import a special sorting function from the std lib
and use it, such sort can have something equivalent to a cmp (but
lower level, it can accept a function template alias, to inline the
comparison operation).

So the idea is: in a multi-level language you very often have to sort
a limited amount of data in complex ways. Because in most programs a
quite large percentage of the code isn't performance-critical. In such
large parts of the code what you want is something very handy, safe
and flexible. So having a key function is positive. In the few spots
where you need high performance, you can import (or even implement
with your hands) and use a specialized sorting routine. D is a general
purpose language, it's not used like C in Python projects to write
performance-critical spots only.

This is also visible in other parts of the language, for example there
are built-in associative arrays. Their syntax is handy, and even if
they aren't high performance (they are sometimes slower than Python
dicts despite being O(1), but they never have a O(n^2) performance
profile, as may happen to Python dicts, so they are safer) you can use
them in a very quick way, even if you have to create a 10-items
associative array. The end result is that in D programs you can find
more hashes than in C++ code, where I think programmers use them only
where simpler solutions (like iterating on an array) aren't good
enough. (So D AAs have to be fast even in situations where you put 8
items inside them, like in Python dicts). This free usage of AAs makes
the code stronger too (because once in a while you may put 1000 items
in such AA, and it will keeps working efficiently, while a linear scan
in a list of 1000 items starts to become not efficient).

Bye,
bearophile
 
S

Steven D'Aprano

[bearophile]
I love the 'key', it makes my code simpler and it's simpler to
understand. I am not opposed to the idea of keeping cmp, that in some
rare cases may be useful or more natural.

The problem is that if you allow to use the cmp, lot of programmers
will use it straight away, not even bothering to know what that strange
'key' argument may be useful for. And they miss the how much handy
'key' is. ...
So a funny solution to this situation may be the following: to keep
only 'key' in the sort/sorted for few years, so the community of Python
programmers gets used to 'key' only, and then put 'cmp' back in, for
the special cases ;-)

FWIW, sorting is only a small part of the picture. The notion of
three-way cmp functions was removed throughout the language. The
built-in cmp() function and the __cmp__ magic method and the
corresponding C slot were all excised. None of them played nicely with
rich comparisons. It was a case where having two-ways-to-do-it was
complicating everyone's lives. AFAICT, very few people understood
exactly how the two interacted -- the under-the-hood C code for it was
complex, unattractive, and hard to understand.

There's no reason why a hypothetical comparison function for sort() needs
to be a three-way comparison function. All you need to implement sorting
is a function to implement less-than -- I believe that sort() and min()
are already implemented in such a way as to only require the objects
implements __lt__.

Hypothetically, sort(), min() and heapq.nsmallest() could take an
optional less-than comparison function, and max() and heapq.nlargest()
could use a greater-than comparison function.

Of course you can do this now with a cost of O(N) and a helper class:

class SortHelper:
def __init__(self, x):
self.x = x
def __lt__(self, other):
return self.x < other.x + 42 # or whatever...

# Decorate, Sort, Undecorate.
mylist = map(SortHelper, mylist)
mylist.sort()
mylist = [obj.x for obj in mylist]


If your list is huge, you can do the decoration in place:

for i, x in enumerate(mylist):
mylist = SortHelper(x)
mylist.sort()
for i, obj in enumerate(mylist):
mylist = obj.x



but since this is going to especially useful for really big lists, paying
that extra O(N) cost twice will hurt.
 
P

Paul Rubin

Bearophile said:
sorting, and something that's surely not bug-prone. In such situation
having a 'key' argument is *better*. Such sort can be stable.

Nothing stops comparison sorting from being stable. Since the rest of
your post seems premised on the opposite, I hope that clears things up
for you.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,066
Latest member
VytoKetoReviews

Latest Threads

Top