no more comparisons

Discussion in 'Python' started by Alan Isaac, Mar 12, 2008.

  1. Alan Isaac

    Alan Isaac Guest

    Alan Isaac, Mar 12, 2008
    #1
    1. Advertising

  2. Alan Isaac

    Carl Banks Guest

    On Mar 12, 4:51 pm, Alan Isaac <> wrote:
    > I was surprised to see that
    > comparison is slated for death
    > in Python 3000.
    >
    > For example:http://www.python.org/dev/peps/pep-3100/
    > list.sort() and builtin.sorted() methods: eliminate cmp parameter [27] [done]



    Hmm, wasn't aware they were taking it that far. You should almost
    always avoid using the cmp parameter because it's very inefficient;
    instead, look at the key parameter which is a function that maps
    objects in a your sequence to a sort key and sorts on that.

    So instead of (for a simple example):

    s.sort(cmp=lambda a,b: cmp(a.get_id(),b.get_id()))

    You would use:

    s.sort(key=lambda a:a.get_id())

    (However, there are rare cases where you can't easily map your items
    to a sortable builtin. I suppose in those cases you'll have to use a
    custom comparison proxy. But I digress.)


    > But there is a rumor of a PEP to restore comparisons.http://mail.python.org/pipermail/python-3000/2008-January/011764.html
    >
    > Is that going anywhere?


    No.


    > Also, what is the core motivation for removing this functionality?


    The basically replaced it with a better one. Instead of the cmp
    methods above, use the key method. Instead of __cmp__ method for
    overriding object operators, use the rich comparison methods: __lt__,
    __gt__, and so on.

    Python 2.x currently implements both cmp and rich comparisons at the
    same time, but that creates a lot of underlying complexity, so they
    got rid of it.


    Carl Banks
     
    Carl Banks, Mar 12, 2008
    #2
    1. Advertising

  3. Alan Isaac

    Paul Rubin Guest

    Carl Banks <> writes:
    > > For example:http://www.python.org/dev/peps/pep-3100/
    > > list.sort() and builtin.sorted() methods: eliminate cmp parameter [27] [done]

    >
    > Hmm, wasn't aware they were taking it that far. You should almost
    > always avoid using the cmp parameter because it's very inefficient;


    I don't see what's so inefficient about it necessarily. The key
    argument invokes a DSU scheme that allocates and releases potentially
    a lot of memory, which both uses time and increases the program's
    memory region. The cmp option should not be removed. However,
    requiring it to be specified as a keyword parameter instead of
    just passed as an unlabelled arg is fine.
     
    Paul Rubin, Mar 12, 2008
    #3
  4. Alan Isaac

    Alan Isaac Guest

    Paul Rubin wrote:

    > The cmp option should not be removed. However, requiring


    > it to be specified as a keyword parameter instead of just


    > passed as an unlabelled arg is fine.




    Sure; I would have no problem with that.

    But that is not what is happening.

    As for Carl's suggestion to use ``key``:
    this is already possible when it is convenient,
    but it is not always convenient. (Even aside
    from memory considerations.)



    By the way, I even saw mention of even removing the

    ``cmp`` built-in.



    Cheers,

    Alan Isaac
     
    Alan Isaac, Mar 12, 2008
    #4
  5. Alan Isaac

    Terry Reedy Guest

    | > Hmm, wasn't aware they were taking it that far. You should almost
    | > always avoid using the cmp parameter because it's very inefficient;
    |
    | I don't see what's so inefficient about it necessarily.

    The key function is called once per list item, for n calls total. The
    comparision function is called once per comparision. There are at least
    n-1 such calls and typically something on the order of n * lg2(n) calls.
     
    Terry Reedy, Mar 13, 2008
    #5
  6. Alan Isaac

    Dan Bishop Guest

    On Mar 12, 6:52 pm, Alan Isaac <> wrote:
    > Paul Rubin wrote:
    > > The cmp option should not be removed. However, requiring
    > > it to be specified as a keyword parameter instead of just
    > > passed as an unlabelled arg is fine.

    >
    > Sure; I would have no problem with that.
    >
    > But that is not what is happening.
    >
    > As for Carl's suggestion to use ``key``:
    > this is already possible when it is convenient,
    > but it is not always convenient. (Even aside
    > from memory considerations.)


    def cmp_key(cmp_fn):
    class CmpWrapper(object):
    def __init__(self, obj):
    self.obj = obj
    def __cmp__(self, other):
    return cmp_fn(self.obj, other.obj)
    return CmpWrapper
     
    Dan Bishop, Mar 13, 2008
    #6
  7. Alan Isaac

    Paul Rubin Guest

    "Terry Reedy" <> writes:
    > | I don't see what's so inefficient about it necessarily.
    >
    > The key function is called once per list item, for n calls total. The
    > comparision function is called once per comparision. There are at least
    > n-1 such calls and typically something on the order of n * lg2(n) calls.


    Right. Depending on the application, calling the comparison function
    lg(n) times may be faster than calling the key function once.

    Example: you want to sort a list of strings case-independently, each
    string containing a few pages worth of text. There are 10000 strings,
    each maybe 10000 chars long. Then (untested):

    x.sort(xs, key=lower)

    makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
    then releases the temporary strings and tuples. At least 100
    megabytes of memory allocated and 100 million chars copied, even
    though any give pair of strings will usually differ somewhere in the
    first few characters and there's no need to look past those.

    from string import lower
    from operator import eq
    from itertools import *

    def xcmp(a,b):
    for x,y in izip(a,b)):
    c = cmp(x.lower() y.lower())
    if c: return c
    return 0

    x.sort(xcmp)

    runs the comparison function about 14000 times, looking at only a few
    bytes on most of the calls, and using only a small constant amount of
    auxiliary storage, and is likely to be much faster than all that
    copying.

    Hmm, there is a slight problem with xcmp-- it will fail if a is a
    prefix of b. That's fixable but anyway it still makes its point as
    written.
     
    Paul Rubin, Mar 13, 2008
    #7
  8. Alan Isaac

    Terry Reedy Guest

    "Paul Rubin" <"http://phr.cx"@NOSPAM.invalid> wrote in message
    news:...
    | "Terry Reedy" <> writes:
    | > | I don't see what's so inefficient about it necessarily.
    | >
    | > The key function is called once per list item, for n calls total. The
    | > comparision function is called once per comparision. There are at
    least
    | > n-1 such calls and typically something on the order of n * lg2(n)
    calls.
    |
    | Right. Depending on the application, calling the comparison function
    | lg(n) times may be faster than calling the key function once.
    |
    | Example: you want to sort a list of strings case-independently, each
    | string containing a few pages worth of text. There are 10000 strings,
    | each maybe 10000 chars long.

    If the strings are pulled in from a file, or stored off to a file, as I
    would expect in and real app of this sort, the in-memory data to be sorted
    could/should be tuples of the lowercased strings and files positions. Then
    no key param is needed.

    | Then (untested):
    |
    | x.sort(xs, key=lower)
    |
    | makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
    | then releases the temporary strings and tuples. At least 100
    | megabytes of memory allocated and 100 million chars copied, even
    | though any give pair of strings will usually differ somewhere in the
    | first few characters and there's no need to look past those.
    |
    | from string import lower
    | from operator import eq
    | from itertools import *
    |
    | def xcmp(a,b):
    | for x,y in izip(a,b)):
    | c = cmp(x.lower() y.lower())
    | if c: return c
    | return 0
    |
    | x.sort(xcmp)
    |
    | runs the comparison function about 14000 times,

    14 * 10000 = 140000, not 14000. For each of these, you have a call to izip
    and multiple calls to .next, cmp, and .lower.

    That said, this use case of where the potentially needed key is long
    whereas the actually needed key is much shorter is a better use case than
    anything I have seen so far.

    Still, as I figure it, the initial part of each text is lowercased around
    28 times, on average. So there can only be a speed saving if the text is
    more than 28 times the effective actual key.

    To actually do such a thing, at least more than once, I would be tempted to
    have the key function take just the first k chars, for k perhaps 50. If
    the texts average, say, 1500 chars, then the key array would be 1/30 *
    perhaps 2 (for object overhead) = 1/15 of the original array space. Then
    run over the 'sorted' array to see if there are any cases where the first
    50 chars match and if so, check more to see if a swap needs to be made.
    Easiest would be to do a simple bubble or insert sort with the full
    compare.

    tjr
     
    Terry Reedy, Mar 13, 2008
    #8
  9. Alan Isaac

    Duncan Booth Guest

    Paul Rubin <http://> wrote:

    > "Terry Reedy" <> writes:
    >> | I don't see what's so inefficient about it necessarily.
    >>
    >> The key function is called once per list item, for n calls total.
    >> The comparision function is called once per comparision. There are
    >> at least n-1 such calls and typically something on the order of n *
    >> lg2(n) calls.

    >
    > Right. Depending on the application, calling the comparison function
    > lg(n) times may be faster than calling the key function once.
    >
    > Example: you want to sort a list of strings case-independently, each
    > string containing a few pages worth of text. There are 10000 strings,
    > each maybe 10000 chars long. Then (untested):
    >
    > x.sort(xs, key=lower)
    >
    > makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
    > then releases the temporary strings and tuples. At least 100
    > megabytes of memory allocated and 100 million chars copied, even
    > though any give pair of strings will usually differ somewhere in the
    > first few characters and there's no need to look past those.


    Not tuples actually, there's a special sortwrapper type which is used.
    This is slightly better as it stops the comparison leaking through to
    the original values without having to store an index value.

    As Dan has pointed out, for those rare cases where it actually is better
    to use cmp than key you can easily create a wrapper. The only change is
    that you now have to work at using cmp instead of it being the first
    thing you think of. So here you could use his wrapper and your
    comparison function together:

    x.sort(key=cmp_key(xcmp))

    I'd agree though that the comparison wrapper should be reasonably well
    publicised: I'd remembered that there was an easy way to implement cmp
    in terms of key but not what it was.
     
    Duncan Booth, Mar 13, 2008
    #9
  10. Alan Isaac

    Alan Isaac Guest

    Dan Bishop wrote:

    > def cmp_key(cmp_fn):


    > class CmpWrapper(object):


    > def __init__(self, obj):


    > self.obj = obj


    > def __cmp__(self, other):


    > return cmp_fn(self.obj, other.obj)


    > return CmpWrapper




    Apparently I'm overlooking something obvious ...

    how is this supposed to work if __cmp__ is no longer

    being called? (Which was my understanding.)



    Thank you,

    Alan Isaac
     
    Alan Isaac, Mar 13, 2008
    #10
  11. Alan Isaac

    Carl Banks Guest

    On Mar 13, 12:38 pm, Alan Isaac <> wrote:
    > Dan Bishop wrote:
    > > def cmp_key(cmp_fn):
    > > class CmpWrapper(object):
    > > def __init__(self, obj):
    > > self.obj = obj
    > > def __cmp__(self, other):
    > > return cmp_fn(self.obj, other.obj)
    > > return CmpWrapper

    >
    > Apparently I'm overlooking something obvious ...
    >
    > how is this supposed to work if __cmp__ is no longer
    >
    > being called? (Which was my understanding.)



    It won't. In Python 3.0 you'd have to write this class in terms of
    rich comparisons (__lt__, __gt__, etc.).


    Carl Banks
     
    Carl Banks, Mar 13, 2008
    #11
  12. Alan Isaac

    Alan Isaac Guest

    >> Dan Bishop wrote:

    >>> def cmp_key(cmp_fn):


    >>> class CmpWrapper(object):


    >>> def __init__(self, obj):


    >>> self.obj = obj


    >>> def __cmp__(self, other):


    >>> return cmp_fn(self.obj, other.obj)


    >>> return CmpWrapper








    > On Mar 13, 12:38 pm, Alan Isaac wrote:


    >> how is this supposed to work if __cmp__ is no longer


    >> being called? (Which was my understanding.)






    Carl Banks wrote:

    > It won't. In Python 3.0 you'd have to write this class in terms of


    > rich comparisons (__lt__, __gt__, etc.).






    Exactly. So something simple (define an anonymous function)

    has become a bit of a pain.



    On the other hand, I've looked through my extant code and

    have not found a use of ``cmp`` that I cannot work around.

    So maybe this is not as bad as I feared. What are some use

    cases that will clearly be harder (i.e., at least require

    a slightly elaborate wrapper) after this change?



    Cheers,

    Alan Isaac
     
    Alan Isaac, Mar 13, 2008
    #12
  13. On Mar 13, 3:48 pm, Alan Isaac <> wrote:
    > So maybe this is not as bad as I feared.  What are some use
    >
    > cases that will clearly be harder (i.e., at least require
    >
    > a slightly elaborate wrapper) after this change?


    Sorting tuples, where the second item in the tuple should
    have the opposite ordering to the first is going to be a bit
    of a pain. Or even worse, where the ordering of the second
    item depends on the value of the first item in the tuple.

    For example, suppose that (for whatever contrived reason)
    you're representing integers in (sign, magnitude) format
    by tuples (s, i), where s = 0 or 1 (0 for positive, 1 for
    negative) and i is a string representing the absolute
    value of the integer. So

    (0, '3') represents 3
    (1, '14') represents 14

    and you want to sort according to numeric value. One
    can certainly come up with a key that works, but it's
    not quite as straightforward as writing a custom __cmp__.

    This isn't a totally contrived example: the internal
    Decimal format isn't so different from this.

    Mark
     
    Mark Dickinson, Mar 13, 2008
    #13
  14. On Mar 13, 5:10 pm, Mark Dickinson <> wrote:

    > (1, '14') represents 14


    should be -14, of course.
     
    Mark Dickinson, Mar 13, 2008
    #14
  15. Alan Isaac

    Alan Isaac Guest

    Mark Dickinson wrote:

    > Sorting tuples, where the second item in the tuple should


    > have the opposite ordering to the first is going to be


    > a bit of a pain. Or even worse, where the ordering of the


    > second item depends on the value of the first item in the


    > tuple.




    This is like some examples where I had used cmp,

    but if I understand correctly I think it is not a problem.



    > For example, suppose that (for whatever contrived reason)


    > you're representing integers in (sign, magnitude) format


    > by tuples (s, i), where s = 0 or 1 (0 for positive, 1 for


    > negative) and i is a string representing the absolute


    > value of the integer. So




    Does this do it? ::



    key= lambda x: (-x[1],int(x2))



    Here I am depending on the lexicographic sorting of tuples.
    Without that there would be real trouble.



    Cheers,

    Alan Isaac
     
    Alan Isaac, Mar 14, 2008
    #15
  16. Alan Isaac

    Dan Bishop Guest

    On Mar 13, 7:38 pm, Alan Isaac <> wrote:
    > Mark Dickinson wrote:
    > > Sorting tuples, where the second item in the tuple should
    > > have the opposite ordering to the first is going to be
    > > a bit of a pain. Or even worse, where the ordering of the
    > > second item depends on the value of the first item in the
    > > tuple.

    >
    > This is like some examples where I had used cmp,
    >
    > but if I understand correctly I think it is not a problem.
    >
    > > For example, suppose that (for whatever contrived reason)
    > > you're representing integers in (sign, magnitude) format
    > > by tuples (s, i), where s = 0 or 1 (0 for positive, 1 for
    > > negative) and i is a string representing the absolute
    > > value of the integer. So

    >
    > Does this do it? ::
    >
    > key= lambda x: (-x[1],int(x2))
    >
    > Here I am depending on the lexicographic sorting of tuples.
    > Without that there would be real trouble.


    Even assuming you meant (-x[0], int(x[1])), that sorts negative
    numbers in the wrong order. You want

    key = lambda x: (-1 if x[0] else 1) * int(x[1])
     
    Dan Bishop, Mar 14, 2008
    #16
  17. On Mar 13, 8:38 pm, Alan Isaac <> wrote:
    > Does this do it? ::
    >
    >     key= lambda x: (-x[1],int(x2))
    >
    > Here I am depending on the lexicographic sorting of tuples.
    > Without that there would be real trouble.


    Close. :)

    I think it needs to be something like:

    key = lambda x: (-x[0], int(x[1]) if x[0] == 0 else -int(x[1]))

    Relying on lexicographic sorting of tuples seems okay to me.

    What bugs me more about this example is that one has to rely
    on the existence of a way to negate the usual ordering. This
    is fine for numbers (where you can use the unary - operator)
    or for numeric strings (where you can convert to int and then
    use -), but it's not too much of a stretch to imagine cases
    where neither of those options is available (e.g. non-numeric
    strings), and where one ends up moving further and further
    from readable code and into the realm of arcane trickery...

    Mark
     
    Mark Dickinson, Mar 14, 2008
    #17
  18. Alan Isaac

    Alan Isaac Guest

    Dan Bishop wrote:

    > Even assuming you meant (-x[0], int(x[1])), that sorts negative


    > numbers in the wrong order. You want


    > key = lambda x: (-1 if x[0] else 1) * int(x[1])






    Sorry, was speed typing (very badly) and the question

    actually had two different problem descriptions.

    As you suggest, what I meant was::



    key= lambda x: (-x[0], int(x[1]))



    which was meant to address:



    "Sorting tuples, where the second item in the tuple

    should have the opposite ordering to the first is going

    to be a bit of a pain."



    But you are quite right, the problem was then specified to

    numerical order, for which I like your solution. Or even::



    key= lambda x: -int(x[1]) if x[0] else int(x[1])





    The key point being, if you will, that this use case does

    not support the idea that relying on ``key`` will be so bad.

    So, what is a case that is really uncomfortable?



    Thanks,

    Alan Isaac
     
    Alan Isaac, Mar 14, 2008
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Chad Z. Hower aka Kudzu

    ASP.net comparisons (ANN?)

    Chad Z. Hower aka Kudzu, Feb 16, 2004, in forum: ASP .Net
    Replies:
    6
    Views:
    368
    Chad Z. Hower aka Kudzu
    Feb 17, 2004
  2. VB Programmer

    Question: Time comparisons

    VB Programmer, Jul 14, 2004, in forum: ASP .Net
    Replies:
    1
    Views:
    315
    Lucas Tam
    Jul 14, 2004
  3. Michael
    Replies:
    4
    Views:
    418
    Matt Hammond
    Jun 26, 2006
  4. Ioannis Vranos

    Proposal: More flexible comparisons

    Ioannis Vranos, Jun 25, 2009, in forum: C++
    Replies:
    22
    Views:
    701
    Juha Nieminen
    Jun 29, 2009
  5. Robert Klemme

    With a Ruby Yell: more, more more!

    Robert Klemme, Sep 28, 2005, in forum: Ruby
    Replies:
    5
    Views:
    218
    Jeff Wood
    Sep 29, 2005
Loading...

Share This Page