no more comparisons

Alan Isaac · Mar 12, 2008

I was surprised to see that
comparison is slated for death
in Python 3000.

For example:
http://www.python.org/dev/peps/pep-3100/
list.sort() and builtin.sorted() methods: eliminate cmp parameter [27] [done]

But there is a rumor of a PEP to restore comparisons.
http://mail.python.org/pipermail/python-3000/2008-January/011764.html

Is that going anywhere?

Also, what is the core motivation for removing this functionality?

Alan Isaac

Carl Banks · Mar 12, 2008

I was surprised to see that
comparison is slated for death
in Python 3000.

For example:http://www.python.org/dev/peps/pep-3100/
list.sort() and builtin.sorted() methods: eliminate cmp parameter [27] [done]

Hmm, wasn't aware they were taking it that far. You should almost
always avoid using the cmp parameter because it's very inefficient;
instead, look at the key parameter which is a function that maps
objects in a your sequence to a sort key and sorts on that.

So instead of (for a simple example):

s.sort(cmp=lambda a,b: cmp(a.get_id(),b.get_id()))

You would use:

s.sort(key=lambda a:a.get_id())

(However, there are rare cases where you can't easily map your items
to a sortable builtin. I suppose in those cases you'll have to use a
custom comparison proxy. But I digress.)

But there is a rumor of a PEP to restore comparisons.http://mail.python.org/pipermail/python-3000/2008-January/011764.html

Is that going anywhere?
No.

Also, what is the core motivation for removing this functionality?

The basically replaced it with a better one. Instead of the cmp
methods above, use the key method. Instead of __cmp__ method for
overriding object operators, use the rich comparison methods: __lt__,
__gt__, and so on.

Python 2.x currently implements both cmp and rich comparisons at the
same time, but that creates a lot of underlying complexity, so they
got rid of it.

Carl Banks

Paul Rubin · Mar 12, 2008

Carl Banks said:
For example:http://www.python.org/dev/peps/pep-3100/
list.sort() and builtin.sorted() methods: eliminate cmp parameter [27] [done]

Click to expand...

Hmm, wasn't aware they were taking it that far. You should almost
always avoid using the cmp parameter because it's very inefficient;

I don't see what's so inefficient about it necessarily. The key
argument invokes a DSU scheme that allocates and releases potentially
a lot of memory, which both uses time and increases the program's
memory region. The cmp option should not be removed. However,
requiring it to be specified as a keyword parameter instead of
just passed as an unlabelled arg is fine.

Alan Isaac · Mar 12, 2008

Paul said:
The cmp option should not be removed. However, requiring

it to be specified as a keyword parameter instead of just

passed as an unlabelled arg is fine.

Sure; I would have no problem with that.

But that is not what is happening.

As for Carl's suggestion to use ``key``:
this is already possible when it is convenient,
but it is not always convenient. (Even aside
from memory considerations.)

By the way, I even saw mention of even removing the

``cmp`` built-in.

Cheers,

Alan Isaac

Terry Reedy · Mar 13, 2008

| > Hmm, wasn't aware they were taking it that far. You should almost
| > always avoid using the cmp parameter because it's very inefficient;
|
| I don't see what's so inefficient about it necessarily.

The key function is called once per list item, for n calls total. The
comparision function is called once per comparision. There are at least
n-1 such calls and typically something on the order of n * lg2(n) calls.

Dan Bishop · Mar 13, 2008

Sure; I would have no problem with that.

But that is not what is happening.

As for Carl's suggestion to use ``key``:
this is already possible when it is convenient,
but it is not always convenient. (Even aside
from memory considerations.)

def cmp_key(cmp_fn):
class CmpWrapper(object):
def __init__(self, obj):
self.obj = obj
def __cmp__(self, other):
return cmp_fn(self.obj, other.obj)
return CmpWrapper

Paul Rubin · Mar 13, 2008

Terry Reedy said:
| I don't see what's so inefficient about it necessarily.

The key function is called once per list item, for n calls total. The
comparision function is called once per comparision. There are at least
n-1 such calls and typically something on the order of n * lg2(n) calls.

Right. Depending on the application, calling the comparison function
lg(n) times may be faster than calling the key function once.

Example: you want to sort a list of strings case-independently, each
string containing a few pages worth of text. There are 10000 strings,
each maybe 10000 chars long. Then (untested):

x.sort(xs, key=lower)

makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
then releases the temporary strings and tuples. At least 100
megabytes of memory allocated and 100 million chars copied, even
though any give pair of strings will usually differ somewhere in the
first few characters and there's no need to look past those.

from string import lower
from operator import eq
from itertools import *

def xcmp(a,b):
for x,y in izip(a,b)):
c = cmp(x.lower() y.lower())
if c: return c
return 0

x.sort(xcmp)

runs the comparison function about 14000 times, looking at only a few
bytes on most of the calls, and using only a small constant amount of
auxiliary storage, and is likely to be much faster than all that
copying.

Hmm, there is a slight problem with xcmp-- it will fail if a is a
prefix of b. That's fixable but anyway it still makes its point as
written.

Terry Reedy · Mar 13, 2008

| > | I don't see what's so inefficient about it necessarily.
| >
| > The key function is called once per list item, for n calls total. The
| > comparision function is called once per comparision. There are at
least
| > n-1 such calls and typically something on the order of n * lg2(n)
calls.
|
| Right. Depending on the application, calling the comparison function
| lg(n) times may be faster than calling the key function once.
|
| Example: you want to sort a list of strings case-independently, each
| string containing a few pages worth of text. There are 10000 strings,
| each maybe 10000 chars long.

If the strings are pulled in from a file, or stored off to a file, as I
would expect in and real app of this sort, the in-memory data to be sorted
could/should be tuples of the lowercased strings and files positions. Then
no key param is needed.

| Then (untested):
|
| x.sort(xs, key=lower)
|
| makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
| then releases the temporary strings and tuples. At least 100
| megabytes of memory allocated and 100 million chars copied, even
| though any give pair of strings will usually differ somewhere in the
| first few characters and there's no need to look past those.
|
| from string import lower
| from operator import eq
| from itertools import *
|
| def xcmp(a,b):
| for x,y in izip(a,b)):
| c = cmp(x.lower() y.lower())
| if c: return c
| return 0
|
| x.sort(xcmp)
|
| runs the comparison function about 14000 times,

14 * 10000 = 140000, not 14000. For each of these, you have a call to izip
and multiple calls to .next, cmp, and .lower.

That said, this use case of where the potentially needed key is long
whereas the actually needed key is much shorter is a better use case than
anything I have seen so far.

Still, as I figure it, the initial part of each text is lowercased around
28 times, on average. So there can only be a speed saving if the text is
more than 28 times the effective actual key.

To actually do such a thing, at least more than once, I would be tempted to
have the key function take just the first k chars, for k perhaps 50. If
the texts average, say, 1500 chars, then the key array would be 1/30 *
perhaps 2 (for object overhead) = 1/15 of the original array space. Then
run over the 'sorted' array to see if there are any cases where the first
50 chars match and if so, check more to see if a swap needs to be made.
Easiest would be to do a simple bubble or insert sort with the full
compare.

tjr

Duncan Booth · Mar 13, 2008

Paul Rubin said:
Right. Depending on the application, calling the comparison function
lg(n) times may be faster than calling the key function once.

Example: you want to sort a list of strings case-independently, each
string containing a few pages worth of text. There are 10000 strings,
each maybe 10000 chars long. Then (untested):

x.sort(xs, key=lower)

makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
then releases the temporary strings and tuples. At least 100
megabytes of memory allocated and 100 million chars copied, even
though any give pair of strings will usually differ somewhere in the
first few characters and there's no need to look past those.

Not tuples actually, there's a special sortwrapper type which is used.
This is slightly better as it stops the comparison leaking through to
the original values without having to store an index value.

As Dan has pointed out, for those rare cases where it actually is better
to use cmp than key you can easily create a wrapper. The only change is
that you now have to work at using cmp instead of it being the first
thing you think of. So here you could use his wrapper and your
comparison function together:

x.sort(key=cmp_key(xcmp))

I'd agree though that the comparison wrapper should be reasonably well
publicised: I'd remembered that there was an easy way to implement cmp
in terms of key but not what it was.

Alan Isaac · Mar 13, 2008

Dan said:
def cmp_key(cmp_fn):

class CmpWrapper(object):

def __init__(self, obj):

self.obj = obj

def __cmp__(self, other):

return cmp_fn(self.obj, other.obj)

return CmpWrapper

Apparently I'm overlooking something obvious ...

how is this supposed to work if __cmp__ is no longer

being called? (Which was my understanding.)

Thank you,

Alan Isaac

Carl Banks · Mar 13, 2008

Apparently I'm overlooking something obvious ...

how is this supposed to work if __cmp__ is no longer

being called? (Which was my understanding.)

It won't. In Python 3.0 you'd have to write this class in terms of
rich comparisons (__lt__, __gt__, etc.).

Carl Banks

Alan Isaac · Mar 13, 2008

Dan said:
On Mar 13, 12:38 pm, Alan Isaac wrote:

Carl said:
It won't. In Python 3.0 you'd have to write this class in terms of

rich comparisons (__lt__, __gt__, etc.).

Exactly. So something simple (define an anonymous function)

has become a bit of a pain.

On the other hand, I've looked through my extant code and

have not found a use of ``cmp`` that I cannot work around.

So maybe this is not as bad as I feared. What are some use

cases that will clearly be harder (i.e., at least require

a slightly elaborate wrapper) after this change?

Cheers,

Alan Isaac

Mark Dickinson · Mar 13, 2008

So maybe this is not as bad as I feared. What are some use

cases that will clearly be harder (i.e., at least require

a slightly elaborate wrapper) after this change?

Sorting tuples, where the second item in the tuple should
have the opposite ordering to the first is going to be a bit
of a pain. Or even worse, where the ordering of the second
item depends on the value of the first item in the tuple.

For example, suppose that (for whatever contrived reason)
you're representing integers in (sign, magnitude) format
by tuples (s, i), where s = 0 or 1 (0 for positive, 1 for
negative) and i is a string representing the absolute
value of the integer. So

(0, '3') represents 3
(1, '14') represents 14

and you want to sort according to numeric value. One
can certainly come up with a key that works, but it's
not quite as straightforward as writing a custom __cmp__.

This isn't a totally contrived example: the internal
Decimal format isn't so different from this.

Mark

Mark Dickinson · Mar 13, 2008

(1, '14') represents 14

should be -14, of course.

Alan Isaac · Mar 14, 2008

Mark said:
Sorting tuples, where the second item in the tuple should

have the opposite ordering to the first is going to be

a bit of a pain. Or even worse, where the ordering of the

second item depends on the value of the first item in the

tuple.

This is like some examples where I had used cmp,

but if I understand correctly I think it is not a problem.

For example, suppose that (for whatever contrived reason)

you're representing integers in (sign, magnitude) format

by tuples (s, i), where s = 0 or 1 (0 for positive, 1 for

negative) and i is a string representing the absolute

value of the integer. So

Does this do it? ::

key= lambda x: (-x[1],int(x2))

Here I am depending on the lexicographic sorting of tuples.
Without that there would be real trouble.

Cheers,

Alan Isaac

Dan Bishop · Mar 14, 2008

Mark said:
Mark said:

Sorting tuples, where the second item in the tuple should
have the opposite ordering to the first is going to be
a bit of a pain. Or even worse, where the ordering of the
second item depends on the value of the first item in the
tuple.

Click to expand...

This is like some examples where I had used cmp,

but if I understand correctly I think it is not a problem.

For example, suppose that (for whatever contrived reason)
you're representing integers in (sign, magnitude) format
by tuples (s, i), where s = 0 or 1 (0 for positive, 1 for
negative) and i is a string representing the absolute
value of the integer. So

Click to expand...

Does this do it? ::

key= lambda x: (-x[1],int(x2))

Here I am depending on the lexicographic sorting of tuples.
Without that there would be real trouble.

Even assuming you meant (-x[0], int(x[1])), that sorts negative
numbers in the wrong order. You want

key = lambda x: (-1 if x[0] else 1) * int(x[1])

Mark Dickinson · Mar 14, 2008

Does this do it? ::

key= lambda x: (-x[1],int(x2))

Here I am depending on the lexicographic sorting of tuples.
Without that there would be real trouble.

Close.

I think it needs to be something like:

key = lambda x: (-x[0], int(x[1]) if x[0] == 0 else -int(x[1]))

Relying on lexicographic sorting of tuples seems okay to me.

What bugs me more about this example is that one has to rely
on the existence of a way to negate the usual ordering. This
is fine for numbers (where you can use the unary - operator)
or for numeric strings (where you can convert to int and then
use -), but it's not too much of a stretch to imagine cases
where neither of those options is available (e.g. non-numeric
strings), and where one ends up moving further and further
from readable code and into the realm of arcane trickery...

Mark

Alan Isaac · Mar 14, 2008

Dan said:
Even assuming you meant (-x[0], int(x[1])), that sorts negative

numbers in the wrong order. You want

key = lambda x: (-1 if x[0] else 1) * int(x[1])

Sorry, was speed typing (very badly) and the question

actually had two different problem descriptions.

As you suggest, what I meant was::

key= lambda x: (-x[0], int(x[1]))

which was meant to address:

"Sorting tuples, where the second item in the tuple

should have the opposite ordering to the first is going

to be a bit of a pain."

But you are quite right, the problem was then specified to

numerical order, for which I like your solution. Or even::

key= lambda x: -int(x[1]) if x[0] else int(x[1])

The key point being, if you will, that this use case does

not support the idea that relying on ``key`` will be so bad.

So, what is a case that is really uncomfortable?

Thanks,

Alan Isaac

Chanelling Guido - dict subclasses	11	Jan 15, 2014
cmp and sorting non-symmetric types	5	Nov 13, 2007
python-dev Summary for 2006-05-01 through 2006-05-15	3	Jun 14, 2006
python-dev Summary for 2006-08-01 through 2006-08-15	0	Oct 9, 2006
python-dev Summary for 2005-06-16 through 2005-06-30	2	Jul 9, 2005
Removal of tkinter from python 3.0? [was: Fate of the repr module inPy3.0]	13	Mar 20, 2008
PEP 3107 Function Annotations for review and comment	4	Dec 29, 2006
relative imports improve program organization... suggestions?	4	Aug 8, 2008

no more comparisons

Alan Isaac

Carl Banks

Paul Rubin

Alan Isaac

Terry Reedy

Dan Bishop

Paul Rubin

Terry Reedy

Duncan Booth

Alan Isaac

Carl Banks

Alan Isaac

Mark Dickinson

Mark Dickinson

Alan Isaac

Dan Bishop

Mark Dickinson

Alan Isaac

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads