Q: sort's key and cmp parameters

S

Steven D'Aprano

Nothing stops comparison sorting from being stable. Since the rest of
your post seems premised on the opposite, I hope that clears things up
for you.

I'm sure Paul already knows this, but key-based sorts are comparison
sorts.

There are two basic types of sorts: comparison based, where the routine
has to compare items (usually with the < operator), and non-comparison
sorts, like bucket sort, pigeon-hole sort and radix sort. These sorts
require special knowledge of the items being sorted, and don't need to
compare two items. General purpose sorts like Python's sort() do,
regardless of whether you pass a key function, a three-way comparison
function, or something else.
 
B

Bearophile

Paul Rubin:
Bearophile:

Nothing stops comparison sorting from being stable.  Since the rest of
your post seems premised on the opposite, I hope that clears things up
for you.

When I have written that post I was partially unfocused, I am sorry.

What I meant is that a general sorting routine, even in D, is better
to be first of all flexible. So I think it's better for the D built-in
sort to be stable, because such extra invariant allows you to use the
sort in more situations.

Bye,
bearophile
 
H

Hrvoje Niksic

Bearophile said:
What I meant is that a general sorting routine, even in D, is better
to be first of all flexible. So I think it's better for the D built-in
sort to be stable, because such extra invariant allows you to use the
sort in more situations.

Note that stable sort has additional memory requirements. In situations
where you don't need stability, but do need memory-efficient in-place
sorting, an unstable sort might well be preferred. This is why
libraries such as C++'s STL offer both.
 
B

Bearophile

Hrvoje Niksic:
Note that stable sort has additional memory requirements.  In situations
where you don't need stability, but do need memory-efficient in-place
sorting, an unstable sort might well be preferred.  This is why
libraries such as C++'s STL offer both.

There are stable sorts that need no additional memory.

In a largish program written in a general-purpose multi-level
language, like D (this is probably true for C++ too), often 80% of the
code takes a very little time to run. Such code may need to process
few small arrays, or small data sets, or to manage the GUI, etc.

So optimizing those parts of the code for performance (both in memory
used and CPU cycles used) is stupid. What you want in such large parts
of the code is:
- to write code quickly
- to have code that's short, readable, easy to fix, and most important
of all that is the less bug-prone as possible.

So what you need in such large parts of the code is something that's
very flexible and safe, even if it's not top performance (both in
memory and CPU).

This is why for example in D there are built-in associative arrays,
that have a handy syntax, built-in methods, and they are never O(n^2),
even if they are not the most efficient ones where you need max
performance, or where you need minimal memory used, or where you need
to perform unusual operations. Such built-ins are useful to reduce
both code length and bug count, because they are quite safe and easy
to use.

Stable sorts are a little safer, because they don't scramble your data
in certain ways, so they can be used in more situations. This is why
having the Timsort in Python is good.

When you run profile the D code and you see your program uses too much
RAM or wastes too much time in the built-in sort, then you can switch
to using special sorts from the std lib. You can even write your own
hash or sort for special situations (and you don't have to drop down
to use another language for that, you keep using the same, even if you
may want to change your programming stile, and use a more C-like
style, with no dynamic allocations, some bit twidding, pointers, more
structs, 16 bit integers, unsigned integers, unions, compiler
intrinsics, even inlined assembly that uses SSE3, etc). In normal code
you may want to adopt a more Java-like programming style, or
templates, or even using a large library I have written, you may
program it almost as a curious Python-like, with lazyness too.

This is why I think the built-in sort has to be flexible, because you
are supposed to use it most of the times, and most of the times you
don't need top performance. Different parts of a program have to be
optimized for different things, computer performance, or programmer
performance.

Bye,
bearophile
 
L

Luis Zarrabeitia

Given how often we hear "consenting adults" as justification for any
number of gaps in Python error checking, the argument above is
singularly unpersuasive.

Well, as long as you consider them "gaps" in need of a "justification", of
course that argument (and the one about the "gaps" themselves) will of course
seem singularly unpersuasive.

But if you see them as a "feature" (that may sometimes, albeit rarely,
missfire), then you would have no problem with /either/ argument.
 
R

Raymond Hettinger

[Hrvoje Niksic]
Note that stable sort has additional memory requirements.  In situations
where you don't need stability, but do need memory-efficient in-place
sorting, an unstable sort might well be preferred.  This is why
libraries such as C++'s STL offer both.

FWIW, the "additional memory requirements" are typically a set of
pointers to the objects being sorted, so the memory overhead is
typically very small relative to the size of the objects being sorted.
IOW, this isn't much of a consideration in most Python apps.


Raymond
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,022
Latest member
MaybelleMa

Latest Threads

Top