Sorting: too different times. Why?

Discussion in 'Python' started by n00m, Nov 22, 2009.

  1. n00m

    n00m Guest

    Any comment:

    class Vector:
    def __init__(self, x, y):
    self.x = x
    self.y = y
    def __cmp__(self, v):
    if self.x < v.x and self.y > v.y:
    return -1
    return 0

    def v_cmp(v1, v2):
    if v1.x < v2.x and v1.y > v2.y:
    return -1
    return 0

    from random import randint
    from time import time

    a = []
    for i in range(200000):
    a += [Vector(randint(0, 500000), randint(0, 500000))]
    b = a[:]
    c = a[:]

    print 'Sorting...'

    t = time()
    b.sort(cmp=v_cmp)
    print time() - t

    t = time()
    c.sort()
    print time() - t

    print b == c



    >>> ===================================== RESTART ======
    >>>

    Sorting...
    0.906000137329
    6.57799983025
    True
    n00m, Nov 22, 2009
    #1
    1. Advertising

  2. n00m

    n00m Guest

    I was expecting the 1st method would be *slower* than the 2nd one :)
    Or at least equal... Just random ("intuitive") expectations
    n00m, Nov 22, 2009
    #2
    1. Advertising

  3. In the subject line, you write "too different times". You actually want
    "two", the number, not "too" as in "too many", "too much". Lots of native
    English speakers get this wrong too :)

    On Sun, 22 Nov 2009 01:21:42 -0800, n00m wrote:

    > Any comment:
    >
    > class Vector:
    > def __init__(self, x, y):
    > self.x = x
    > self.y = y
    > def __cmp__(self, v):
    > if self.x < v.x and self.y > v.y:
    > return -1
    > return 0



    Modern versions of Python (since 2.2 I think?) use __lt__ or __gt__ for
    sorting. If the class doesn't have a __lt__ method, Python falls back on
    __cmp__.

    > b.sort(cmp=v_cmp)


    This is relatively fast, because you pass a comparison function directly,
    so Python doesn't need to look for a __lt__ method then fall back to
    __cmp__. It just uses v_cmp, every time.


    > c.sort()


    This is slower, because every comparison looks up the __lt__ and fails,
    then tries the __cmp__.

    If you change the definition of Vector to include rich comparison methods
    as detailed here:

    http://docs.python.org/reference/datamodel.html#object.__lt__

    sorting will probably be significantly faster still.



    --
    Steven
    Steven D'Aprano, Nov 22, 2009
    #3
  4. n00m schrieb:
    > Any comment:
    >
    > class Vector:
    > def __init__(self, x, y):
    > self.x = x
    > self.y = y
    > def __cmp__(self, v):
    > if self.x < v.x and self.y > v.y:
    > return -1
    > return 0
    >
    > def v_cmp(v1, v2):
    > if v1.x < v2.x and v1.y > v2.y:
    > return -1
    > return 0
    >
    > from random import randint
    > from time import time
    >
    > a = []
    > for i in range(200000):


    Use xrange instead (unless you are under python3), because for loops you
    don't need the full list range creates - xrange is just a generator.

    > a += [Vector(randint(0, 500000), randint(0, 500000))]


    Better use .append here, looks nicer and should also be a bit faster.

    > b = a[:]
    > c = a[:]
    >
    > print 'Sorting...'
    >
    > t = time()
    > b.sort(cmp=v_cmp)
    > print time() - t
    >
    > t = time()
    > c.sort()
    > print time() - t
    >
    > print b == c
    >
    >
    >
    >>>> ===================================== RESTART ======
    >>>>

    > Sorting...
    > 0.906000137329
    > 6.57799983025


    I think the main reason is that method-dispatch is more expensive than
    function-dispatch. The former must create a bound method before calling,
    the latter just works out of the box.

    Things get better if you do this:

    t = time()
    c.sort(cmp=Vector.__cmp__)
    print time() - t


    Although not the exact same performance - I get

    Sorting...
    0.677843093872
    1.4283311367
    True

    Diez
    Diez B. Roggisch, Nov 22, 2009
    #4
  5. n00m

    Chris Rebert Guest

    On Sun, Nov 22, 2009 at 2:56 AM, n00m <> wrote:
    > I was expecting the 1st method would be *slower* than the 2nd one :)
    > Or at least equal... Just random ("intuitive") expectations


    The second method repeatedly looks up left_item.__class__.__cmp__
    (i.e. Vector.__cmp__) when doing the necessary comparisons between the
    list items; while these lookups are optimized and are fast, they are
    not free and cannot be skipped because Python doesn't know the list
    contains only Vectors.
    The first method uses the single provided comparison function and thus
    does no such lookups; hence, it's faster.

    That's my guess anyway.

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Nov 22, 2009
    #5
  6. On Nov 22, 9:21 am, n00m <> wrote:
    > Any comment:
    >
    > class Vector:
    >     def __init__(self, x, y):
    >         self.x = x
    >         self.y = y
    >     def __cmp__(self, v):
    >         if self.x < v.x and self.y > v.y:
    >             return -1
    >         return 0
    >
    > def v_cmp(v1, v2):
    >     if v1.x < v2.x and v1.y > v2.y:
    >         return -1
    >     return 0
    >
    > from random import randint
    > from time import time
    >
    > a = []
    > for i in range(200000):
    >     a += [Vector(randint(0, 500000), randint(0, 500000))]
    > b = a[:]
    > c = a[:]
    >
    > print 'Sorting...'
    >
    > t = time()
    > b.sort(cmp=v_cmp)
    > print time() - t
    >
    > t = time()
    > c.sort()
    > print time() - t
    >
    > print b == c
    >
    > >>> ===================================== RESTART ======

    >
    > Sorting...
    > 0.906000137329
    > 6.57799983025
    > True


    Do you get the same magnitude difference if you make Vector a new-
    style
    class? (I.e., use "class Vector(object)" instead of "class Vector
    ()".)

    Mark
    Mark Dickinson, Nov 22, 2009
    #6
  7. n00m

    Dave Angel Guest

    n00m wrote:
    > Any comment:
    >
    > <snip>
    > def v_cmp(v1, v2):
    > if v1.x < v2.x and v1.y > v2.y:
    > return -1
    > return 0
    >
    >
    >

    The second part of the compound if is backwards. So if this is headed
    for production code, it better get fixed.

    DaveA
    Dave Angel, Nov 22, 2009
    #7
  8. n00m

    n00m Guest

    > Do you get the same magnitude difference
    > if you make Vector a new-style class?


    Yes (I mean "No"): new-style's much faster

    And now it's elephants instead of vectors.
    Def: an elephant is smarter than another one IIF
    its size is strictly less but its IQ is strictly
    greater

    I.e. you can't compare (2, 8) to (20, 50)
    or let count them as equally smart elephants.
    ================================================

    class Elephant(object):
    def __init__(self, size, iq):
    self.size = size
    self.iq = iq
    def __cmp__(self, e):
    if self.size < e.size and self.iq > e.iq:
    return -1
    if self.size > e.size and self.iq < e.iq:
    return 1
    return 0

    def e_cmp(e1, e2):
    if e1.size < e2.size and e1.iq > e2.iq:
    return -1
    if e1.size > e2.size and e1.iq < e2.iq:
    return 1
    return 0

    from random import randint
    from time import time

    a = []
    for i in xrange(200000):
    a.append(Elephant(randint(1, 50000), randint(1, 50000)))
    b = a[:]
    c = a[:]

    print 'Sorting...'

    t = time()
    b.sort(cmp=e_cmp)
    print time() - t

    t = time()
    c.sort()
    print time() - t

    print b == c



    >>> ===================================== RESTART =====
    >>>

    Sorting...
    1.56299996376
    1.95300006866
    True
    n00m, Nov 22, 2009
    #8
  9. n00m

    n00m Guest


    > The second part of the compound if is backwards.  So if this is headed
    > for production code, it better get fixed.
    >
    > DaveA


    Not sure I'm understanding your remark.
    n00m, Nov 22, 2009
    #9
  10. n00m

    MRAB Guest

    Steven D'Aprano wrote:
    > In the subject line, you write "too different times". You actually want
    > "two", the number, not "too" as in "too many", "too much". Lots of native
    > English speakers get this wrong too :)
    >

    [snip]
    It could mean that the times are not just different, they're _too_
    different, ie a lot more than they are expected to be.
    MRAB, Nov 22, 2009
    #10
  11. n00m

    n00m Guest

    Here "meaningful order" is:
    if
    elephant "a" is smarter than elephant "a[j]"
    then "i" must be strictly less than "j"

    Of course, to the same effect we could sort them simply
    by sizes, but then time of sorting would increase by ~
    2 times -- due to decreasing of number of equally smart
    things.

    But here it does not matter -- for my initial question.
    I like all above explanations. Especially that by Chris Rebert.
    n00m, Nov 22, 2009
    #11
  12. n00m

    n00m Guest

    :) Of course, by "too" I meant "too", as in "tooooo much"
    n00m, Nov 22, 2009
    #12
  13. n00m

    MRAB Guest

    n00m wrote:
    > :) Of course, by "too" I meant "too", as in "tooooo much"


    Although it's OK in English to say "too much x" or "too many x", it's
    somewhat unnatural to say "too different xs"; it would have to be "the
    xs are too different". Nobody said English was logical! :)
    MRAB, Nov 22, 2009
    #13
  14. n00m

    n00m Guest

    > it's somewhat unnatural to say "too different xs"

    Aha. Thanks.
    PS
    For years I thought that song's title "No Woman No Cry" by Bob Marley
    means "No Woman -- No Cry". As if a man got rid of his woman and
    stopped
    crying, out of her bad behaviour etc.
    It turned out to mean "No, woman,.. no cry..."

    Or take "Drips" by Eminem. What on earth do the drips mean?

    Album: The Eminem Show
    Song: Drips

    [Eminem] Obie.. yo
    [Trice] {*coughing*} I'm sick
    [Eminem] Damn, you straight dog?

    [Chorus]
    That's why I ain't got no time
    for these games and stupid tricks
    or these bitches on my dick
    That's how dudes be gettin sick
    That's how dicks be gettin drips
    Fallin victims to this shit...
    ....
    ....
    n00m, Nov 22, 2009
    #14
  15. n00m

    Lie Ryan Guest

    n00m wrote:
    >> The second part of the compound if is backwards. So if this is headed
    >> for production code, it better get fixed.
    >>
    >> DaveA

    >
    > Not sure I'm understanding your remark.


    Maybe he meant, that this:
    if v1.x < v2.x and v1.y > v2.y
    should be:
    if v1.x < v2.x and v1.y < v2.y
    ?
    Lie Ryan, Nov 22, 2009
    #15
  16. n00m

    Dave Angel Guest

    n00m wrote:
    >> The second part of the compound if is backwards. So if this is headed
    >> for production code, it better get fixed.
    >>
    >> DaveA
    >>

    >
    > Not sure I'm understanding your remark.
    >
    >

    Well, others in the thread have observed the same thing, so maybe it
    doesn't matter. But the quoted code had only one if statement:

    >>def v_cmp(v1, v2):
    >> if v1.x < v2.x and v1.y > v2.y:
    >> return -1
    >> return 0



    And the first part of the compound if is a "<" comparison, while the
    second part is a ">" comparison

    This produces a different sort order than the default tuple comparison.
    So it needs fixing.

    DaveA
    Dave Angel, Nov 22, 2009
    #16
  17. n00m

    Mel Guest

    MRAB wrote:

    > n00m wrote:
    >> :) Of course, by "too" I meant "too", as in "tooooo much"

    >
    > Although it's OK in English to say "too much x" or "too many x", it's
    > somewhat unnatural to say "too different xs"; it would have to be "the
    > xs are too different". Nobody said English was logical! :)


    Now that James Joyce has written _Finnegans Wake_ there are no grammar
    errors and there are no spelling errors. There are only unexpected
    meanings.

    Mel.
    Mel, Nov 22, 2009
    #17
  18. On Sun, 22 Nov 2009 15:08:28 +0000, Duncan Booth wrote:

    > n00m <> wrote:
    >
    >> And now it's elephants instead of vectors. Def: an elephant is smarter
    >> than another one IIF its size is strictly less but its IQ is strictly
    >> greater
    >>
    >> I.e. you can't compare (2, 8) to (20, 50) or let count them as equally
    >> smart elephants.

    >
    > and that still isn't a relationship where you can get any meaningful
    > order out of sorting them:


    Not everything has, or need have, a total order. There are relationships
    which are only partial (i.e. not all the items are comparable), or
    missing transitivity.

    A real-world example is pecking order in chickens, or social hierarchies
    in general. Using the > operator to mean "higher ranking", you often get
    non-transitive hierarchies like the following:

    A > B, C, D, E
    B > C, E
    C > D, E
    D > B, E

    That is, A > B > C > D > E except that D > B also.

    Economic preference is also non-transitive: people may prefer X to Y, and
    prefer Y to Z, but prefer Z to X.

    It is perfectly legitimate to sort a non-total ordered list, provided you
    understand the limitations, including that the order you get will depend
    on the order you started with.


    --
    Steven
    Steven D'Aprano, Nov 22, 2009
    #18
  19. On Sun, 22 Nov 2009 09:15:57 -0800, n00m wrote:

    > Or take "Drips" by Eminem. What on earth do the drips mean?


    When you have a cold or flu, your nose drips.

    Some sexually transmitted diseases make your genitals drip.


    --
    Steven
    Steven D'Aprano, Nov 22, 2009
    #19
  20. n00m

    n00m Guest

    > Some sexually transmitted diseases make your genitals drip.

    I suspected this :) Eminem is a famous misogynist
    n00m, Nov 23, 2009
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?bWF2cmlja18xMDE=?=

    SetAuthCookie works some times and fails some times?

    =?Utf-8?B?bWF2cmlja18xMDE=?=, Mar 23, 2006, in forum: ASP .Net
    Replies:
    0
    Views:
    491
    =?Utf-8?B?bWF2cmlja18xMDE=?=
    Mar 23, 2006
  2. =?Utf-8?B?bWF2cmlja18xMDE=?=

    Forms Authentication Fails some times and not some times???

    =?Utf-8?B?bWF2cmlja18xMDE=?=, Mar 28, 2006, in forum: ASP .Net
    Replies:
    0
    Views:
    476
    =?Utf-8?B?bWF2cmlja18xMDE=?=
    Mar 28, 2006
  3. djskrill
    Replies:
    9
    Views:
    683
    djskrill
    Oct 1, 2003
  4. Mr. SweatyFinger

    why why why why why

    Mr. SweatyFinger, Nov 28, 2006, in forum: ASP .Net
    Replies:
    4
    Views:
    838
    Mark Rae
    Dec 21, 2006
  5. Mr. SweatyFinger
    Replies:
    2
    Views:
    1,668
    Smokey Grindel
    Dec 2, 2006
Loading...

Share This Page