python 3's adoption

M

mdj

What would annoy me if I used Python 3.x would be the apparent lack of
the __cmp__ method for conveniently defining comparisons between
instances of my own classes. Having to define all the rich comparison
methods frequently isn't even as much fun as it sounds.

OT, but you can always define the other operators in terms of a cmp
and mix it in, restoring the original behaviour. Unfortunately it
won't restore the original performance until someone comes to their
senses and restores __cmp__

Matt
 
S

Steven D'Aprano

OT, but you can always define the other operators in terms of a cmp and
mix it in, restoring the original behaviour. Unfortunately it won't
restore the original performance until someone comes to their senses and
restores __cmp__

"Comes to their senses"?

There's nothing you can do with __cmp__ that you can't do better with
rich comparisons, and plenty that rich comparisons can do that __cmp__ is
utterly incapable of dealing with. __cmp__ is crippled since it can only
be used for defining classes where the operators < etc return flags. It
can't be used if you want to implement some other behaviour for the
operators. E.g. here's a silly example:
.... def __init__(self):
.... self.link = None
.... def __gt__(self, other):
.... self.link = other
....<__main__.X object at 0xb7cda74c>


More importantly, __cmp__ is only suitable for classes that implement
total ordering. If you have a data type that does not have total
ordering, for example sets, you can't implement it using __cmp__.

E.g.:
s = set([1, 2, 3, 4])
t = set([3, 4, 5, 6])
s < t False
s > t False
s == t False
cmp(s, t)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot compare sets using cmp()


Sets have partial ordering, and __cmp__ is simply not up to the job of
dealing with it.

Having two mechanisms for implementing comparisons is unnecessary. It
adds complications to the language that we are better off without. The
only advantage of the obsolete __cmp__ is that lazy programmers only need
to write one method instead of six. This is an advantage, I accept that
(hey, I'm a lazy programmer too, that's why I use Python!) but it's not a
big advantage. If you really care about it you can create a mixin class,
a decorator, or a metaclass to simplify creation of the methods. For
example, a quick and dirty decorator:

.... cls.__gt__ = lambda self, other: self.__cmp__(other) == 1
.... cls.__ge__ = lambda self, other: self.__cmp__(other) >= 0
.... cls.__eq__ = lambda self, other: self.__cmp__(other) == 0
.... cls.__ne__ = lambda self, other: self.__cmp__(other) != 0
.... cls.__le__ = lambda self, other: self.__cmp__(other) <= 0
.... cls.__lt__ = lambda self, other: self.__cmp__(other) == -1
.... return cls
........ class BiggerThanEverything(object):
.... def __cmp__(self, other):
.... return 1
....False
 
R

Roy Smith

Steven D'Aprano said:
In any case, while such a idiom works in code, it plays havoc with the
interactive interpreter:

... "pass"
...
'pass'
'pass'
'pass'
'pass'
'pass'

It's not particularly useful without the quotes either :)
 
S

Steve Holden

Steven said:
OT, but you can always define the other operators in terms of a cmp and
mix it in, restoring the original behaviour. Unfortunately it won't
restore the original performance until someone comes to their senses and
restores __cmp__

"Comes to their senses"?

There's nothing you can do with __cmp__ that you can't do better with
rich comparisons, and plenty that rich comparisons can do that __cmp__ is
utterly incapable of dealing with. __cmp__ is crippled since it can only
be used for defining classes where the operators < etc return flags. It
can't be used if you want to implement some other behaviour for the
operators. E.g. here's a silly example:
... def __init__(self):
... self.link = None
... def __gt__(self, other):
... self.link = other
...<__main__.X object at 0xb7cda74c>


More importantly, __cmp__ is only suitable for classes that implement
total ordering. If you have a data type that does not have total
ordering, for example sets, you can't implement it using __cmp__.

E.g.:
s = set([1, 2, 3, 4])
t = set([3, 4, 5, 6])
s < t False
s > t False
s == t False
cmp(s, t)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot compare sets using cmp()


Sets have partial ordering, and __cmp__ is simply not up to the job of
dealing with it.

Having two mechanisms for implementing comparisons is unnecessary. It
adds complications to the language that we are better off without. The
only advantage of the obsolete __cmp__ is that lazy programmers only need
to write one method instead of six. This is an advantage, I accept that
(hey, I'm a lazy programmer too, that's why I use Python!) but it's not a
big advantage. If you really care about it you can create a mixin class,
a decorator, or a metaclass to simplify creation of the methods. For
example, a quick and dirty decorator:

... cls.__gt__ = lambda self, other: self.__cmp__(other) == 1
... cls.__ge__ = lambda self, other: self.__cmp__(other) >= 0
... cls.__eq__ = lambda self, other: self.__cmp__(other) == 0
... cls.__ne__ = lambda self, other: self.__cmp__(other) != 0
... cls.__le__ = lambda self, other: self.__cmp__(other) <= 0
... cls.__lt__ = lambda self, other: self.__cmp__(other) == -1
... return cls
...... class BiggerThanEverything(object):
... def __cmp__(self, other):
... return 1
...False
While I am fully aware that premature optimization, etc., but I cannot
resist an appeal to efficiency if it finally kills off this idea that
"they took 'cmp()' away" is a bad thing.

Passing a cmp= argument to sort provides the interpreter with a function
that will be called each time any pair of items have to be compared. The
key= argument, however, specifies a transformation from [x0, x1, ...,
xN] to [(key(x0), x0), (key(x1), x1), ..., (key(xN), xN)] (which calls
the key function precisely once per sortable item).
From a C routine like sort() [in CPython, anyway] calling out from C to
a Python function to make a low-level decision like "is A less than B?"
turns out to be disastrous for execution efficiency (unlike the built-in
default comparison, which can be called directly from C in CPython).

If your data structures have a few hundred items in them it isn't going
to make a huge difference. If they have a few million thenit is already
starting to affect performance ;-)

regards
Steve
 
M

mdj

"Comes to their senses"?

There's nothing you can do with __cmp__ that you can't do better with
rich comparisons, and plenty that rich comparisons can do that __cmp__ is
utterly incapable of dealing with. __cmp__ is crippled since it can only
be used for defining classes where the operators < etc return flags. It
can't be used if you want to implement some other behaviour for the
operators. E.g. here's a silly example:

[example snipped]

Okay, but I think that overloading to change the semantic meaning of
operators is another issue, and one that there's a great deal of
difference of opinion on. Personally, I find arbitrarily redefining
operators with well understood semantics to mean something totally
different to be poor style, and those who disagree would find my style
poor :)
Having two mechanisms for implementing comparisons is unnecessary. It
adds complications to the language that we are better off without. The
only advantage of the obsolete __cmp__ is that lazy programmers only need
to write one method instead of six. This is an advantage, I accept that
(hey, I'm a lazy programmer too, that's why I use Python!) but it's not a
big advantage. If you really care about it you can create a mixin class,
a decorator, or a metaclass to simplify creation of the methods. For
example, a quick and dirty decorator:

I agree it's unnecessary, but deleting a time tested idiom in the name
of consistency seems excessive to me. Is there not a compromise? For
those of us who prefer limiting operator overloads to semantically
compatible cases the extra flexibility is seldom necessary, and the
price extracted in terms of inconvenience is therefore quite high. You
can (as I mentioned and you demonstrated) work around the problem, but
I'd hate to see 50 different implementations of a common idiom wasting
brain space in future projects.

Matt

(Note I remove the crosspost to comp.lang.lisp where I saw this - one
of the reasons I love lisp is that discussions like this become
totally unnecessary)
 
S

Steven D'Aprano

It's not particularly useful without the quotes either :)



Of course "while 1: pass" is useless, but it doesn't fill the terminal
with thousands of lines of "pass". The point I was making is that the
statement pass does nothing, while in the interactive interpreter a
string or other object prints to the screen.
 
T

Terry Reedy

What would annoy me if I used Python 3.x would be the apparent lack of
the __cmp__ method for conveniently defining comparisons between
instances of my own classes. Having to define all the rich comparison
methods frequently isn't even as much fun as it sounds.

You define __eq__, which automatically generates __ne__.
You define __lt__, which is all sort and heap need.
This should be about as easier than __eq__, which is really needed, and
__cmp__. If you need the other 3, either copy the recipe in the
Cookbook, or, even faster

def __ge__(s,o): return o.__lt__(s)
def __le__(s,o): return s < o or s == o
def __ge__(s,o): return s > o or s == o

All done.

Terry Jan Reedy
 
A

Alf P. Steinbach

* Steve Holden:
While I am fully aware that premature optimization, etc., but I cannot
resist an appeal to efficiency if it finally kills off this idea that
"they took 'cmp()' away" is a bad thing.

Passing a cmp= argument to sort provides the interpreter with a function
that will be called each time any pair of items have to be compared. The
key= argument, however, specifies a transformation from [x0, x1, ...,
xN] to [(key(x0), x0), (key(x1), x1), ..., (key(xN), xN)] (which calls
the key function precisely once per sortable item).
From a C routine like sort() [in CPython, anyway] calling out from C to
a Python function to make a low-level decision like "is A less than B?"
turns out to be disastrous for execution efficiency (unlike the built-in
default comparison, which can be called directly from C in CPython).

If your data structures have a few hundred items in them it isn't going
to make a huge difference. If they have a few million thenit is already
starting to affect performance ;-)

It's not either/or, it's do programmers still need the cmp functionality?

Consider that *correctness* is a bit more important than efficiency, and that
sorting strings is quite common...

Possibly you can show me the way forward towards sorting these strings (shown
below) correctly for a Norwegian locale. Possibly you can't. But one thing is
for sure, if there was a cmp functionality it would not be a problem.


<example>
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> L = ["æ", "ø", "å"] # This is in SORTED ORDER in Norwegian
>>> L ['æ', 'ø', 'å']
>>> L.sort()
>>> L ['å', 'æ', 'ø']
>>>
>>> import locale
>>> locale.getdefaultlocale() ('nb_NO', 'cp1252')
>>> locale.setlocale( locale.LC_ALL ) # Just checking... 'C'
>>> locale.setlocale( locale.LC_ALL, "" ) # Setting default locale, Norwgian. 'Norwegian (Bokmål)_Norway.1252'
>>> locale.strxfrm( "æøå" ) 'æøå'
>>> L.sort( key = locale.strxfrm )
>>> L ['å', 'æ', 'ø']
>>> locale.strcoll( "å", "æ" ) 1
>>> locale.strcoll( "æ", "ø" ) -1
>>>
</example>


Note that strcoll correctly orders the strings as ["æ", "ø", "å"], that is, it
would have if it could have been used as cmp function to sort (or better, to a
separate routine named e.g. custom_sort).

And strcoll can be so used in 2.x:


<example>
C:\Documents and Settings\Alf\test> py2
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information..... print "[" + ", ".join( list ) + "]"
....
>>> L = [u"æ", u"ø", u"å"]
>>> show( L ) [æ, ø, å]
>>> L.sort()
>>> show( L ) [å, æ, ø]
>>> import locale
>>> locale.setlocale( locale.LC_ALL, "" ) 'Norwegian (Bokm\xe5l)_Norway.1252'
>>> L.sort( cmp = locale.strcoll )
>>> show( L ) [æ, ø, å]
>>> L [u'\xe6', u'\xf8', u'\xe5']
>>> _
</example>


The above may just be a bug in the 3.x stxfrm. But it illustrates that sometimes
you have your sort order defined by a comparision function. Transforming that
into a key can be practically impossible (it can also be quite inefficient).


Cheers & hth.,

- Alf
 
S

Steven D'Aprano

L = ["æ", "ø", "å"] # This is in SORTED ORDER in Norwegian L
[...]
L.sort( key = locale.strxfrm )
L ['å', 'æ', 'ø']
locale.strcoll( "å", "æ" ) 1
locale.strcoll( "æ", "ø" )
-1

Note that strcoll correctly orders the strings as ["æ", "ø", "å"], that
is, it would have if it could have been used as cmp function to sort (or
better, to a separate routine named e.g. custom_sort).

This is in Python2.5, so I'm explicitly specifying unicode strings:
L = [u"æ", u"ø", u"å"]
assert sorted(L) == [u'å', u'æ', u'ø']

The default C-locale sorting of L does not equal to L. Now let's change
to Norwegian locale:
æøå

So far so good, we've confirmed that in Python 2.5 we can sort in a
locale-aware form. Now, can we do this with a key function? Thanks to
Raymond Hettinger's recipe here:

http://code.activestate.com/recipes/576653/

æøå


Success!

Let's try it in Python 3.1:
L = ["æ", "ø", "å"]
assert sorted(L) == ['å', 'æ', 'ø']

import locale
locale.setlocale(locale.LC_ALL, 'nb_NO') 'nb_NO'
''.join(sorted(L, key=CmpToKey(locale.strcoll)))
'æøå'


The definition of CmpToKey can be found at the URL above. It's not very
exciting, but here it is:


def CmpToKey(mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) == -1
def __gt__(self, other):
return mycmp(self.obj, other.obj) == 1
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) != 1
def __ge__(self, other):
return mycmp(self.obj, other.obj) != -1
def __ne__(self, other):
return mycmp(self.obj, other.obj) != 0
return K


If that's too verbose for you, stick this as a helper function in your
application:


def CmpToKey(mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
__lt__ = lambda s, o: mycmp(s.obj, o.obj) == -1
__gt__ = lambda s, o: mycmp(s.obj, o.obj) == 1
__eq__ = lambda s, o: mycmp(s.obj, o.obj) == 0
__le__ = lambda s, o: mycmp(s.obj, o.obj) != 1
__ge__ = lambda s, o: mycmp(s.obj, o.obj) != -1
__ne__ = lambda s, o: mycmp(s.obj, o.obj) != 0
return K


[...]
The above may just be a bug in the 3.x stxfrm. But it illustrates that
sometimes you have your sort order defined by a comparision function.
Transforming that into a key can be practically impossible (it can also
be quite inefficient).

This might be true, but you haven't demonstrated it. With one little
helper function, you should be able to convert any comparison function
into a key function, with relatively little overhead.
 
P

Paul Boddie

You define __eq__, which automatically generates __ne__.

From the Python 3.1 language reference:

"There are no implied relationships among the comparison operators.
The truth of x==y does not imply that x!=y is false. Accordingly, when
defining __eq__(), one should also define __ne__() so that the
operators will behave as expected."

Maybe Python 3.1 plugs a default method in, anyway.
You define __lt__, which is all sort and heap need.
This should be about as easier than __eq__, which is really needed, and
__cmp__. If you need the other 3, either copy the recipe in the
Cookbook, or, even faster

def __ge__(s,o): return o.__lt__(s)
def __le__(s,o): return s < o or s == o
def __ge__(s,o): return s > o or s == o

Spot the error in the above. ;-) Of course, this could be defined in a
base class and be inherited everywhere, but the case that made me
raise this issue actually involved instances of a class delegating
comparison/sorting to another object. With __cmp__ this can be done
concisely and only involve the delegation of one method - not so with
the rich comparison "protocol".

Paul
 
A

Alf P. Steinbach

* Steven D'Aprano:
L = ["æ", "ø", "å"] # This is in SORTED ORDER in Norwegian L
[...]
L.sort( key = locale.strxfrm )
L ['å', 'æ', 'ø']
locale.strcoll( "å", "æ" ) 1
locale.strcoll( "æ", "ø" )
-1

Note that strcoll correctly orders the strings as ["æ", "ø", "å"], that
is, it would have if it could have been used as cmp function to sort (or
better, to a separate routine named e.g. custom_sort).

This is in Python2.5, so I'm explicitly specifying unicode strings:
L = [u"æ", u"ø", u"å"]
assert sorted(L) == [u'å', u'æ', u'ø']

The default C-locale sorting of L does not equal to L. Now let's change
to Norwegian locale:
æøå

So far so good, we've confirmed that in Python 2.5 we can sort in a
locale-aware form. Now, can we do this with a key function? Thanks to
Raymond Hettinger's recipe here:

http://code.activestate.com/recipes/576653/

æøå


Success!

Let's try it in Python 3.1:
L = ["æ", "ø", "å"]
assert sorted(L) == ['å', 'æ', 'ø']

import locale
locale.setlocale(locale.LC_ALL, 'nb_NO')
'nb_NO'


Hm. A bit off-topic, but...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\cpython\python31\lib\locale.py", line 527, in
return _setlocale(category, locale)
locale.Error: unsupported locale setting
This on a machine where the Python default locale is Norwegian.

'æøå'


The definition of CmpToKey can be found at the URL above. It's not very
exciting, but here it is:


def CmpToKey(mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) == -1
def __gt__(self, other):
return mycmp(self.obj, other.obj) == 1
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) != 1
def __ge__(self, other):
return mycmp(self.obj, other.obj) != -1
def __ne__(self, other):
return mycmp(self.obj, other.obj) != 0
return K

This is pretty smart as a generic solution.

Thanks! :)

I was thinking more of sticking those comparisions in some custom string class
or such, which would be rather ugly...

The above does have quite high overhead though!

That is, it's /inefficient/ compared to using a 'cmp' function directly.

If that's too verbose for you, stick this as a helper function in your
application:


def CmpToKey(mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
__lt__ = lambda s, o: mycmp(s.obj, o.obj) == -1
__gt__ = lambda s, o: mycmp(s.obj, o.obj) == 1
__eq__ = lambda s, o: mycmp(s.obj, o.obj) == 0
__le__ = lambda s, o: mycmp(s.obj, o.obj) != 1
__ge__ = lambda s, o: mycmp(s.obj, o.obj) != -1
__ne__ = lambda s, o: mycmp(s.obj, o.obj) != 0
return K


[...]
The above may just be a bug in the 3.x stxfrm. But it illustrates that
sometimes you have your sort order defined by a comparision function.
Transforming that into a key can be practically impossible (it can also
be quite inefficient).

This might be true, but you haven't demonstrated it.

The "can be ... practically impossible" was hogwash. I posted late, sorry. The
"quite inefficient" holds.

With one little
helper function, you should be able to convert any comparison function
into a key function, with relatively little overhead.

I wouldn't call an extra Python method call per comparision "relatively little
overhead".

Note that the same number of comparisions as with a 'cmp' based sort is performed.

But with the wrapper every comparision is indirected through a Python method
call (I think, but not sure, that the creation of those comparision objects does
not introduce significant overhead, but the calls must).


<code>
# -*- coding: utf-8 -*-
from __future__ import print_function
from __future__ import unicode_literals
try:
range = xrange
except:
pass


import locale
import sys
import random
import timeit

def CmpToKey( mycmp ):
"Convert a cmp= function into a key= function"
class K( object ):
def __init__( self, obj, *args ):
self.obj = obj
def __lt__( self, other ):
return mycmp( self.obj, other.obj ) == -1
def __gt__( self, other ):
return mycmp( self.obj, other.obj ) == 1
def __eq__( self, other ):
return mycmp( self.obj, other.obj ) == 0
def __le__( self, other ):
return mycmp( self.obj, other.obj ) != 1
def __ge__( self, other ):
return mycmp( self.obj, other.obj ) != -1
def __ne__( self, other ):
return mycmp( self.obj, other.obj ) != 0
return K

def elapsed_time_for( f ):
n_calls = 1
return timeit.timeit( f, number = n_calls )

def major_pyversion():
return sys.version_info[0]

def sorting_of( string_list ):
def sort2x():
string_list.sort( cmp = locale.strcoll )
def sort3x():
string_list.sort( key = CmpToKey( locale.strcoll ) )
return sort2x if major_pyversion() <= 2 else sort3x


alphabet = "abcdefghijklmnopqrstuvwxyzæøå"
n_strings = int( 1e5 )
nbno_locale = "" # "nb_NO" raises exception on my machine.

def rnd_string():
len = random.randint( 1, 80 )
s = ""
for i in range( len ):
s = s + random.choice( alphabet )
return s

locale.setlocale( locale.LC_ALL, nbno_locale )
random.seed( 666 )
assert( rnd_string() == "æmoxqudxlszåaåcteærmuocøjrdæpbvyabwhn" )

print( "Creating array of pseudo-random strings..." )
L = [rnd_string() for i in range( n_strings )]
print( "Sorting..." )
time = elapsed_time_for( sorting_of( L ) )
print( "Last string is '{0}'".format( L[-1] ) )
print( "{0}".format( time ) )
assert( L[-1][0] == "Ã¥" )
</code>


<example>
C:\Documents and Settings\Alf\test> py3 sort.py
Creating array of pseudo-random strings...
Sorting...
Last string is
'åååywxuybrxgvnstoyvcmdåjdlnxwwfsbnzsinxncmtrxuxgtqtduzvåixqovtjxiflfesuvfa'
8.45987772189

C:\Documents and Settings\Alf\test> py2 sort.py
Creating array of pseudo-random strings...
Sorting...
Last string is
'åååywxuybrxgvnstoyvcmdåjdlnxwwfsbnzsinxncmtrxuxgtqtduzvåixqovtjxiflfesuvfa'
3.41336790011

C:\Documents and Settings\Alf\test> _
</example>


In the run above Python 3.x was only roughly 2.5 times slower. On my machine it
ranges from 2.5 times slower to 3 times slower. Most often around 2.5 x slower.

Of course this does not measure how much faster a Python 3.x cmp/"<" based sort
/could be/, only how slow, sluggish and snail-like it is compared to 2.x. :)

In other words, that's the speed price, 2.5 to 3 times slower sorting, for
ditching cmp instead of e.g. replacing it with something better.


Cheers & hth.,

- Alf
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top