Guido rethinking removal of cmp from sort method

Terry Reedy · Apr 1, 2011

Removing cmp certainly isn't the most disruptive change of Python 3,

That was almost certainly the ascii to unicode switch for strings. It is
still not quite complete in 3.2 but should be pretty well ironed out in 3.3.

but it seems like the one with the least benefit.

Since the set of changes was finite, there *must* be (at least) one with
the lowest benefit/cost ratio. The removal of list.sort(cmp) may well be
that one. Certainly, reasonable people can disagree as to whether the
ratio is above or below 1.0.

If cmp had not been removed, some other change would have been the worst
in this respect, and possibly the subject of a thread like this.

Terry Reedy · Apr 1, 2011

When I speak of implementation vs interface I am speaking from a
strictly object oriented philosophy, as pedigree, from Grady Booch, whom
I consider to be the father of Object Oriented Analysis and Design
(Booch, OO A&D with apps, 1994).

Python is object based but not object oriented in the Booch sense.

The Class interface holds a "special" firmness which fosters the client
relationship of trust and assumption, without which OOA&D is pointless.
The interface of a Class must not change once advertised, and once in
production. This is specific to OOA&D.

Right, and Python is not OOA&D based.

Never change an advertised Class interface.

In Python, class interfaces are no more sacred than module or function
interfaces. If one takes 'no interface change' literally, then Python
would have to be frozen. Even bug fixes change a de facto interface. If
you want frozen, stick with one particular release. There are lots
available.

Terry Reedy · Apr 1, 2011

What happens then is you define a new interface.

Like key= versus cmp=

In Microsoft-speak if
the IWhatever interface needs an incompatible extension like new
parameters, they introduce IWhatever2 which supports the new parameters.
They change the implementation of IWhatever so it becomes a wrapper for
IWhatever2 setting the new parameters to default values, to keep
implementing the old behavior.

If cmp had been left (or were reinstated) but its implementation changed
internally to use cmp_to_key to make it a wrapper for key, you would be
happy? 2to3 could probably gain a fixer to change

.sort(cmp=f) # to

from functools import cmp_to_key
.sort(key=cmp_to_key(f))

I know some would not like this because interface change is not their
real concern.
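
As a rough sketch of what the output of such a fixer would look like (the fixer itself is hypothetical, and by_length is a made-up comparison function used only for illustration):

```python
from functools import cmp_to_key

def by_length(a, b):
    # Old-style comparison function: negative, zero, or positive result.
    return (len(a) > len(b)) - (len(a) < len(b))

words = ["pear", "fig", "banana", "kiwi"]

# Python 2 spelling:  words.sort(cmp=by_length)
# Python 3 spelling after the mechanical rewrite:
words.sort(key=cmp_to_key(by_length))
```

The comparison function itself is untouched; only the call site changes.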

Terry Reedy · Apr 1, 2011

I appreciate the spirit of your arguments overall, and I do not
necessarily disagree with much of what you are saying. I would like to
challenge you to see this from a little different perspective, if I may.

I have, but I consider your perspective, as I understand it, unrealistic.

There are two distinct ways for looking at this "mild code breakage,"
and it might be good to think about how we approach changes in the
future based on an understanding of both perspectives, and consideration
for the clients.

Guido especially and the other developers, generally, try to balance
benefit and cost. What some may not know is that we consider benefits
over several years, and benefits to future users as well as current
users. We hope and expect the former to outnumber the latter in not too
many years.

The decision, about the time of 2.2, to put off most non-bugfix
code-breaking changes until 3.0, rather than spread them over the
remainder of 2.x, was in large part based on the expressed wishes of at
least some users. (I myself would have preferred sooner and more spread
out.)

In the possible perspective of the Python language developers 3.x changes
are mild

Compared to the changes to both the language and implementation Guido
once considered, they are! But the Perl 6 fiasco and an article by Joel
Spolsky advocating evolutionary, not revolutionary, software change
prompted him toward minimal change that would meet objectives. Many
proposed changes were rejected.

The perspective of the Class client is something quite different.

There is no 'Class' in Python. Module and function clients have the same
perspective. Python classes are just instances of class 'type'. Modules,
instances of class 'module', are sometimes regarded as singleton
classes. Functions are instances of various classes. Methods are
functions until bound as instances of bound-method classes. Functions
are sometimes an alternative to classes, one favored by those of a more
functional rather than strictly OOP bent.
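
These relationships can be checked directly. A small sketch, assuming Python 3 (the class Example and the dict name checks are invented for the illustration):

```python
import types

class Example:
    def method(self):
        return 42

checks = {
    # A class is an instance of class 'type'.
    "class is instance of type": type(Example) is type,
    # A module is an instance of class 'module'.
    "module is instance of module": type(types) is types.ModuleType,
    # Accessed on the class, a method is still a plain function.
    "method on class is function": type(Example.method) is types.FunctionType,
    # Accessed on an instance, it becomes a bound-method object.
    "method on instance is bound method": type(Example().method) is types.MethodType,
    # Built-in functions belong to a different class than def functions.
    "builtin differs from def": type(len) is not types.FunctionType,
}
```

Every value in checks should come out True on a current CPython 3.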

In any case, you seem to include the interface of class attributes (such
as the list.sort function) as part of the class interface. Since
anything can be a class attribute, freezing 'class interfaces' freezes
everything.

When you get ready to change an advertised Class interface in the
future, please consider my interface rules

Taken strictly, they pretty much prohibit anything but implementation
changes. If we allow the addition of numbered versions of modules,
classes, and functions, with nothing ever deleted, then we would have
massive bloat, with attendant maintenance and documentation problems.
Some modules might be up to v20 by now. Certainly, the switch from ascii
to unicode as the default text encoding (and I am not sure your rules
would allow that) changed the interface of nearly every module and a
large fraction of classes.

Let me reiterate that PSF and Python are not Microsoft and Office.
Resources are primarily voluntary and limited, but code and even
binaries are available indefinitely, so no one is forced by *us* to
switch any particular Python program to any other version of the language.

If you think your rules are workable, try developing a fork that obeys
them.... but wait, maybe there already is one: 2.7, which will only get
bugfixes and invisible implementation changes for several years. Of
course, implementation changes must be screened carefully because in the
real world, as opposed to the imagined Booch world, it is all too easy
to accidentally introduce an interface change for some corner case.

Also, an implementation change that exchanges space for time will be
seen as an interface change by somebody. Do you include space-time
behavior in *your* definition of interface (that should not be changed)?

Indeed, some object to the removal of cmp precisely on this basis, not
on the fairly trivial code rewrite it entails. This was the case with
the Google example that prompted this thread. The Googler may well have
written fresh code, as code breakage was *not* the issue. If the
list.sort change had been purely an implementation change, if cmp
remained but its implementation had been changed to use cmp_to_key internally, there
would have been many of the same objections expressed on this thread anyway.
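
To make the space-time point concrete, here is a minimal sketch of what "cmp kept, but implemented through key" could look like. Both names, cmp_to_key_sketch and sort_with_cmp, are hypothetical; the first is only a stand-in written for this illustration, not the actual functools source.

```python
def cmp_to_key_sketch(cmp):
    """Illustrative stand-in for functools.cmp_to_key, not the stdlib source."""
    class Key:
        __slots__ = ("obj",)

        def __init__(self, obj):
            self.obj = obj

        def __lt__(self, other):
            # list.sort only ever asks "is a < b?", so one method is enough here.
            return cmp(self.obj, other.obj) < 0

    return Key


def sort_with_cmp(items, cmp=None, reverse=False):
    """Hypothetical cmp-accepting front end that delegates to list.sort(key=...)."""
    key = cmp_to_key_sketch(cmp) if cmp is not None else None
    # One Key wrapper is allocated per element: extra space compared with
    # sorting the bare elements.
    items.sort(key=key, reverse=reverse)
```

The call signature the client sees is the old one, yet memory use and timing differ, which is exactly the kind of change that some would still count as an interface change.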

So why is there a problem with cmp? Because there are people who want
most of the changes that break your rules, but not this particular one.

Paul Rubin · Apr 1, 2011

Terry Reedy said:
Like key= versus cmp=

Well, in an untyped language like Python, adding a feature to an
interface doesn't require defining a new interface unless you change
something incompatibly. key= is very useful but it can be added without
breaking cmp= .

2to3 could probably gain a fixer to change
.sort(cmp=f) # to

from functools import cmp_to_key
.sort(key=cmp_to_key(f))

I know some would not like this because interface change is not their
real concern.

Looks like a good idea. There is an efficiency hit from the above in
some situations, but at least it prevents code from breaking, so unless
there's some drawback I'm not currently spotting, it's better than
nothing and I'd endorse adding such a wrapper. 2to3 should show some
kind of diagnostic and maybe put a comment into the output code,
when it does that particular transformation, since most of the
time there's probably a better way to write the key function.
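
A small sketch of the difference, using invented data (the list people and the function cmp_last_first are illustrative only):

```python
from functools import cmp_to_key

people = [("Ada", "Lovelace"), ("Alan", "Turing"),
          ("Grace", "Hopper"), ("Alan", "Kay")]

def cmp_last_first(a, b):
    # Python 2 style comparison: by last name, then first name.
    ka, kb = (a[1], a[0]), (b[1], b[0])
    return (ka > kb) - (ka < kb)

# Mechanical conversion: the comparison function runs once per comparison.
via_cmp = sorted(people, key=cmp_to_key(cmp_last_first))

# Hand-written key: computed once per element.
via_key = sorted(people, key=lambda p: (p[1], p[0]))
```

Both calls give the same ordering; the second is what a diagnostic from 2to3 would ideally nudge people toward.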

Paul Rubin · Apr 1, 2011

Terry Reedy said:
In Python, class interfaces are no more sacred than module or function
interfaces. If one takes 'no interface change' literally, then Python
would have to be frozen. Even bug fixes change a de facto interface.

Oh come on, an interface is advertised if it's documented in the manual,
which cmp is. There are some undocumented (what you call de facto)
interfaces that you sometimes have to be pragmatically a bit careful
about messing with because people rely on them, but it's almost always
more legitimate to break an undocumented interface than a documented
one.

Terry Reedy · Apr 1, 2011

What happens then is you define a new interface. In Microsoft-speak if
the IWhatever interface needs an incompatible extension like new
parameters, they introduce IWhatever2 which supports the new parameters.
They change the implementation of IWhatever so it becomes a wrapper for
IWhatever2 setting the new parameters to default values, to keep
implementing the old behavior.

Now you have two versions, and eventually many more, to maintain and
document. That takes resources we currently do not have.

Some problems in addition to the benefits of this approach:
1. The users of IWhatever will not gain the benefits of IWhatever2.
2. Upgrading to IWhatever2 requires a change of name as well as of
parameters.
3. If only some users are upgraded, the IWhatever and IWhatever2 users
may become incompatible even though they were before, thus breaking code
without changing the old interface.
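
In Python terms, the arrangement described in the quoted text might look like the sketch below. The functions whatever and whatever2 and the parameters scale and offset are invented for illustration; they stand in for IWhatever and IWhatever2.

```python
def whatever2(data, scale=1, offset=0):
    """Newer interface (hypothetical) with the extra parameters."""
    return [x * scale + offset for x in data]


def whatever(data):
    """Older interface, kept as a thin wrapper with the new parameters defaulted."""
    return whatever2(data, scale=1, offset=0)
```

Callers of whatever keep working but never see scale or offset, which is problem 1 in the list above, and both functions now have to be documented and maintained.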

Example: Python2 added str2 (= unicode) on top of str. This had all the
problems listed above. Since CPython uses str internally, in particular
for identifiers, CPython users were stuck with the limitations of str.
Rebinding str to mean str2 has many benefits, especially in the future,
in addition to current costs.

John Bokma · Apr 1, 2011

Terry Reedy said:
But the Perl 6 fiasco

Perl 6 a complete failure? Wow, must be coming from a clueless Python
zealot. If Perl 6 is a fiasco, so is Python 3. Both are not considered
production ready, and both can be downloaded and used today:

http://rakudo.org/

Next release is planned for later this month.

Did Perl 6 take a long time? Sure. But confusing it with Python 2 ->
Python 3 is just plainly stupid. It's a complete rewrite of Perl, and
it's much easier to think of it as a complete new language instead of
similar to Perl 4 -> 5 and comparable to Python 2 -> 3.

But if you had any idea what you were talking about, you already knew
that.

geremy condra · Apr 1, 2011

On Wed, Mar 30, 2011 at 7:13 PM, Steven D'Aprano wrote:

Or, an alternative approach would be for one of the cmp-supporters to
take the code for Python's sort routine, and implement your own sort-with-
cmp (in C, of course, a pure Python solution will likely be unusable) and
offer it as a download. For anyone who knows how to do C extensions, this
shouldn't be hard: just grab the code in Python 2.7 and make it a stand-
alone function that can be imported.

If you get lots of community interest in this, that is a good sign that
the solution is useful and practical, and then you can push to have it
included in the standard library or even as a built-in.

And if not, well, at least you will be able to continue using cmp in your
own code.

I don't have a horse in this race, but I do wonder how much of Python
could actually survive this test. My first (uneducated) guess is "not
very much"- we would almost certainly lose large pieces of the string
API and other builtins, and I have no doubt at all that a really
significant chunk of the standard library would vanish as well. In
fact, looking at the data I took from PyPI a while back, it's pretty
clear that Python's feature set would look very different overall if
we applied this test to everything.

Geremy Condra

Paul Rubin · Apr 1, 2011

Terry Reedy said:
Now you have two versions, and eventually many more, to maintain and
document.

In the case of cmp= there's not two interfaces needed. Python2 does a
perfectly good job supporting cmp and key with one interface. We do
have urllib and urllib2 as separate interfaces (with separate
implementations), and we keep some other legacy interfaces around too,
like the sha and md5 modules (both of which are now subsumed by
hashlib).

That takes resources we currently do not have.

Well ok, but now you're making excuses for instability, which seems like
a problem given Python's self-marketing as a stable, production-class
system that competes with better-funded efforts like Java.

Example: Python2 added str2 (= unicode) on top of str. This had all
the problems listed above. Since CPython uses str internally, in
particular for identifiers, CPython users were stuck with the
limitations of str. Rebinding str to mean str2 has many benefits,
especially in the future, in addition to current costs.

That really is a massive change, but a welcome one, because Python2
programs broke all the time due to missed conversions between str and
unicode. It is the type of thing that a language transition (Python2 to
Python3) is supposed to bring, that goes beyond Interface2 to
Interface3. There's been a lot of experience gained in the decades
since Python's creation and it's fine to do an overhaul after all these
years. The issues are 1) don't break stuff unless there's a substantial
benefit; and 2) don't do these major incompatible releases too often.
There should not be any incompatible Python4 before 2020 or so.
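
To illustrate the kind of missed conversion in question: Python 2 would often coerce silently between str and unicode and fail only when non-ASCII data showed up, whereas Python 3 refuses to mix text and bytes at all. A minimal sketch, assuming Python 3 (the variable names are arbitrary):

```python
text = "caf\u00e9"            # str: a sequence of Unicode code points
data = text.encode("utf-8")   # bytes: the encoded form

try:
    combined = text + data    # mixing the two is rejected outright
except TypeError:
    combined = None

# The conversion has to be spelled out explicitly.
round_trip = data.decode("utf-8")
```

The error appears at the first mixed operation rather than on whichever input happens to contain a non-ASCII character.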

I actually think Python3 didn't go far enough in fixing
Python2. I'd have frankly preferred delaying it by a few years, to
allow PyPy to come to maturity and serve as the new main Python
implementation, and have that drive the language change decisions.
Instead we're going to have to give up a lot of possible improvements we
could have gotten from the new implementation.

Steven D'Aprano · Apr 1, 2011

geremy condra said:

I don't have a horse in this race, but I do wonder how much of Python
could actually survive this test. My first (uneducated) guess is "not
very much"- we would almost certainly lose large pieces of the string
API and other builtins, and I have no doubt at all that a really
significant chunk of the standard library would vanish as well. In fact,
looking at the data I took from PyPI a while back, it's pretty clear
that Python's feature set would look very different overall if we
applied this test to everything.

I don't understand what you mean by "this test".

I'm certainly not suggesting that we strip every built-in of all methods
and make everything a third-party C extension. That would be insane.

Nor do I mean that every feature in the standard library should be forced
to prove itself or be removed. The features removed from Python 3 were
deliberately few and conservative, and it was a one-off change (at least
until Python 4000 in the indefinite future). If something is in Python 3
*now*, you can assume that it won't be removed any time soon.

What I'm saying is this: cmp is already removed from sorting, and we
can't change the past. Regardless of whether this was a mistake or not,
the fact is that it is gone, and therefore re-adding it is a new feature
request. Those who want cmp functionality in Python 3 have three broad
choices:

(1) suck it up and give up the fight; the battle is lost, move on;

(2) keep arguing until they either wear down the Python developers or get
kill-filed; never give up, never surrender;

(3) port the feature that they want into a third-party module, so that
they can actually use it in code, and then when they have evidence that
the community needs and/or wants this feature, then try to have it re-
added to the language.

I'm suggesting that #3 is a more practical, useful approach than writing
another hundred thousand words complaining about what a terrible mistake
it was. Having to do:

from sorting import csort

as a prerequisite for using a comparison function is not an onerous
requirement for developers. If fans of functional programming can live
with "from functools import reduce", fans of cmp can live with that.

Steven D'Aprano · Apr 1, 2011

I actually think Python3 didn't go far enough in fixing
Python2. I'd have frankly preferred delaying it by a few years, to
allow PyPy to come to maturity and serve as the new main Python
implementation, and have that drive the language change decisions.
Instead we're going to have to give up a lot of possible improvements we
could have gotten from the new implementation.

There's always Python 4000

Steven D'Aprano · Apr 1, 2011

Perl 6 a complete failure?

"Fiasco" does not mean "complete failure". It is a debacle, an
embarrassing, serious failure, (and also an Italian wine bottle with a
rounded bottom), but not necessarily complete. It does not imply that a
fiasco cannot, with great effort and/or time, be eventually recovered
from. Netscape Navigator 6 was a fiasco, which directly led to the
dominance of Internet Explorer in the browser market, but today the heir
of Netscape, Mozilla's Firefox browser, has regained a majority of the
browser market in Europe and is catching up on IE world-wide. Those who
are old enough will remember that Microsoft Word 3.0 was a fiasco, but
there's no doubt that Word has well recovered to virtually own the entire
word processing market.

Wow, must be coming from a clueless Python zealot.

Thanks for sharing.

If Perl 6 is a fiasco, so is Python 3. Both are not considered
production ready, and both can be downloaded and used today:

This is FUD about Python 3. Python 3 is absolutely production ready. The
only reason to avoid Python 3 is if your software relies on a specific
third-party library that does not yet support Python 3.

On the other hand, the PerlFAQ still describes Perl 6 as not ready:

http://faq.perl.org/perlfaq1.html#What_are_Perl_4_Perl
http://faq.perl.org/perlfaq1.html#What_is_Perl_6_

"Perl 6 is the next major version of Perl, although it's not intended to
replace Perl 5. It's still in development in both its syntax and design.
The work started in 2002 and is still ongoing. ..."

"Perl 6 is not scheduled for release yet ..."

Nine years after Perl 6 was started, neither the syntax nor the design is
settled.

The initial PEP for Python 3000 development was in 2006; the first final
release of Python 3 was in 2008, but I don't count that because Python
3.0 was fatally flawed and is no longer supported. The first production
ready release of Python 3.1 was 2009: three years from talking to a
production-ready version.

Did Perl 6 take a long time? Sure. But confusing it with Python 2 ->
Python 3 is just plainly stupid. It's a complete rewrite of Perl, and
it's much easier to think of it as a complete new language instead of
similar to Perl 4 -> 5 and comparable to Python 2 -> 3.

What you have described is not a reason for rejecting the claim that Perl
6 was a fiasco, but the reason for *why* it was a fiasco.

geremy condra · Apr 1, 2011

I don't understand what you mean by "this test".

I mean testing whether a feature should be in Python based on whether
it can meet some undefined standard of popularity if implemented as a
third-party module or extension.

I'm certainly not suggesting that we strip every built-in of all methods
and make everything a third-party C extension. That would be insane.

Granted, but I think the implication is clear: that only those
features which could be successful if implemented and distributed by a
third party should be in Python. My argument is that there are many
features currently in Python that I doubt would pass that test, but
which should probably be in anyway. The conclusion I draw from that is
that this isn't a particularly good way to determine whether something
should be in standard Python.

Nor do I mean that every feature in the standard library should be forced
to prove itself or be removed. The features removed from Python 3 were
deliberately few and conservative, and it was a one-off change (at least
until Python 4000 in the indefinite future). If something is in Python 3
*now*, you can assume that it won't be removed any time soon.

I may have been unclear, so let me reiterate: I'm not under the
impression that you're advocating this as a course of action. I'm just
pointing out that the standard for inclusion you're advocating is
probably not a particularly good one, especially in this case, and
engaging in a bit of a thought experiment about what would happen if
other parts of Python were similarly scrutinized.

What I'm saying is this: cmp is already removed from sorting, and we
can't change the past. Regardless of whether this was a mistake or not,
the fact is that it is gone, and therefore re-adding it is a new feature
request. Those who want cmp functionality in Python 3 have three broad
choices:

I might quibble over whether re-adding is the same as a new feature
request, but as I said- I don't care about cmp.

(1) suck it up and give up the fight; the battle is lost, move on;

(2) keep arguing until they either wear down the Python developers or get
kill-filed; never give up, never surrender;

(3) port the feature that they want into a third-party module, so that
they can actually use it in code, and then when they have evidence that
the community needs and/or wants this feature, then try to have it re-
added to the language.

I'm suggesting that #3 is a more practical, useful approach than writing
another hundred thousand words complaining about what a terrible mistake
it was. Having to do:

from sorting import csort

as a prerequisite for using a comparison function is not an onerous
requirement for developers. If fans of functional programming can live
with "from functools import reduce", fans of cmp can live with that.

And that's fine, as I said I don't have a horse in this race. My point
is just that I don't think the standard you're using is a good one-
ISTM that if it *had* been applied evenly we would have wound up with
a much less complete (and much less awesome) Python than we have
today. That indicates that there are a reasonable number of real-world
cases where it hasn't and shouldn't apply.

Geremy Condra

Terry Reedy · Apr 1, 2011

Looks like a good idea. There is an efficiency hit from the above in
some situations, but at least it prevents code from breaking, so unless
there's some drawback I'm not currently spotting, it's better than
nothing and I'd endorse adding such a wrapper. 2to3 should show some
kind of diagnostic and maybe put a comment into the output code,
when it does that particular transformation, since most of the
time there's probably a better way to write the key function.

rewriting cmp_to_key in C is underway

http://bugs.python.org/issue11707

Paul Rubin · Apr 1, 2011

Steven D'Aprano said:
What I'm saying is this: cmp is already removed from sorting, and we
can't change the past. Regardless of whether this was a mistake or
not,

No it's not already removed, I just tried it (in Python 2.6, which is
called "Python" for short) and it still works. It's not "removed" from
Python until basically all Python users have migrated and "Python"
essentially always means "Python 3". Until that happens, for Python 2
users, Python 3 is just a fork of Python with some stuff added and some
stuff broken, that might get its act together someday. I see in the
subject of this thread, "Guido rethinking removal of cmp from sort
method" which gives hope that one particular bit of breakage might get
fixed.

the fact is that it is gone, and therefore re-adding it is a new feature
request. Those who want cmp functionality in Python 3 have three broad
choices: ...
(3) port the feature that they want into a third-party module, ...
I'm suggesting that #3 is a more practical, useful approach ...
Having to do:
from sorting import csort ...
If fans of functional programming can live
with "from functools import reduce", fans of cmp can live with that.

If "sorting" is in the stdlib like functools is, then the similarity
makes sense and the suggestion isn't so bad. But you're proposing a 3rd
party module, which is not the same thing at all. "Batteries included"
actually means something, namely that you don't have to write your
critical applications using a library base written with a Wikipedia-like
development model where anybody can ship anything, where you're expected
to examine every module yourself before you can trust it. Stuff in the
stdlib occasionally has bugs or gaps, but it has a generally consistent
quality level, is documented, and has been reviewed and reasonably
sanity checked by a central development group that knows what it's
doing. Stuff in 3rd party libraries has none of the above. There are
too many places for it to go wrong and I've generally found it best to
stick with stdlib modules instead of occasionally superior modules that
have the disadvantage of coming from a third party.

Paul Rubin · Apr 1, 2011

Steven D'Aprano said:
There's always Python 4000

Is that on the boards yet?

Chris Angelico · Apr 1, 2011

If "sorting" is in the stdlib like functools is, then the similarity
makes sense and the suggestion isn't so bad. But you're proposing a 3rd
party module, which is not the same thing at all. "Batteries included"
actually means something...

To me, "batteries included" means that I can:
1) Write a Python script on any Ubuntu laptop that I put my hands on,
and expect it to work.
2) Put a shebang on it, chmod +x it, and give it to someone, and
expect it to work on his system.
3) Post it to my web site along with the comment "You will need a
Python interpreter to run this", and expect it to work.

Every third-party library I need weakens that. Sure, situation 1 isn't
too hard; but the other two end up becoming a bit awkward. The
Yosemite Project requires a support module on Windows, making it that
bit harder to share with people; but I accept that, because it's doing
some rather unusual things (simulating keypresses on another window).
Sorting a list is not unusual enough to justify a third-party module.

ChrisA

Benjamin Peterson · Apr 1, 2011

Paul Rubin said:
I actually think Python3 didn't go far enough in fixing
Python2. I'd have frankly preferred delaying it by a few years, to
allow PyPy to come to maturity and serve as the new main Python
implementation, and have that drive the language change decisions.
Instead we're going to have to give up a lot of possible improvements we
could have gotten from the new implementation.

Why would having PyPy as the reference implementation have made these design
decisions turn out better?

Paul Rubin · Apr 1, 2011

Benjamin Peterson said:
Why would having PyPy as the reference implementation have made these design
decisions turn out better?

A fair amount of Python 2's design was influenced by what was convenient
or efficient to implement in CPython. There's nothing wrong with that
and it's a perfectly normal and sensible strategy. Anyone writing
Python code in a serious way has to maintain some awareness of how
CPython works, so CPython's influence finds its way into Python user
programs too. With PyPy as the reference implementation, the designers
may find they can take the language in cool new directions that were
impossible with CPython, or alternatively, they might find that adding
minor restrictions (that would count as "breaking more stuff") would give
big advantages under PyPy that weren't significant in CPython. What
kinds of stuff and is any of it a sure thing? Unknown. That's why the
idea was: first get more experience with PyPy, then figure out how it
should affect the language.
