Multiple "cmp"s chained one after another

Volker Grabsch · May 14, 2005

Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01

So I have to write the comparison on my own, e.g.

def cmp_birthdays(d1,d2):
if d1.month > d2.month: return 1
if d1.month < d2.month: return -1
if d1.day > d2.day: return 1
if d1.day < d2.day: return -1
return 0

...
birthdays.sort(cmp_birthdays)

This implementation of cmp_birthdays is very ugly. Image you want to
chain more than 2 values in that "cmp_birthdays". I also want to use the
builtin "cmp" function, not ">" and "<".

After thinking some minutes about it, I found a very nice solution:
I have some "cmp"s one aftter another. If one if them return 1 oder -1,
it sould be returned. If it returns 0, the next "cmp" is used. In other
words: I have a sequence of numbers, and want to get the first one that
is not 0. (or return 0, if all numbers were 0)

But this is exactly what the "or" operator does, due to short-circuit
evaluation. In this example, that means:

def cmp_bithdays(d1,d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

The generic pattern is:

return cmp(...) or cmp (...) or cmp(...) or ...

I'm not sure whether this pattern is already a "common recipe", but
I found it to be a very nice idea.

Any opinions?

Greets,

Volker

vincent wehren · May 14, 2005

Newsbeitrag | Hello!
|
| Ich just found a very nice 'pythonic' solution for an often appearing
| problem. I didn't find it documented anywhere, so I'm posting it here.
| If this isn't new in any way, I'd really like to get to know it.
|
| Example problem:
| I have some "datetime" objects and want to sort them, as here:
|
| birthdays = [d1,d2,d3,d4]
| birthdays.sort()
|
| However, I don't want to sort them the default way. These are birthdays,
| so only the month and day do matter, not the year. E.g.:
|
| 2003-01-01 should be smaller than 1984-05-01
|
| So I have to write the comparison on my own, e.g.
|
| def cmp_birthdays(d1,d2):
| if d1.month > d2.month: return 1
| if d1.month < d2.month: return -1
| if d1.day > d2.day: return 1
| if d1.day < d2.day: return -1
| return 0
|
| ...
| birthdays.sort(cmp_birthdays)

If you don't care about the year, why not just "normalize" the year
to all be the same using the replace method of the date instance?
Something like:

d1 = datetime.date(2004, 12, 2)
d2 = datetime.date(2001, 12, 3)
d3 = datetime.date(2002, 12, 6)
d4 = datetime.date(1977, 12, 7)
dates =[d1,d2,d3,d4]
datesNorm = [obj.replace(year=1900) for obj in (dates)]
datesNorm.sort()
print datesNorm # etcetera

HTH,

---
Vincent

|
| This implementation of cmp_birthdays is very ugly. Image you want to
| chain more than 2 values in that "cmp_birthdays". I also want to use the
| builtin "cmp" function, not ">" and "<".
|
| After thinking some minutes about it, I found a very nice solution:
| I have some "cmp"s one aftter another. If one if them return 1 oder -1,
| it sould be returned. If it returns 0, the next "cmp" is used. In other
| words: I have a sequence of numbers, and want to get the first one that
| is not 0. (or return 0, if all numbers were 0)
|
| But this is exactly what the "or" operator does, due to short-circuit
| evaluation. In this example, that means:
|
| def cmp_bithdays(d1,d2):
| return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)
|
| The generic pattern is:
|
| return cmp(...) or cmp (...) or cmp(...) or ...
|
| I'm not sure whether this pattern is already a "common recipe", but
| I found it to be a very nice idea.

|
| Any opinions?
|
|
| Greets,
|
| Volker
|
| --
| Volker Grabsch
| ---<<(())>>---
| \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
| [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Robert Kern · May 14, 2005

Volker said:
Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01 [snip]
Any opinions?

I find that using the "key" argument to sort is much nicer than "cmp"
for these tasks.

In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15),
datetime.date(1954,1,1)]

In [7]:L.sort(key=lambda x: (x.month, x.day))

In [8]:L
Out[8]:
[datetime.date(1954, 1, 1),
datetime.date(2005, 5, 2),
datetime.date(1984, 12, 15)]

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Peter Hansen · May 14, 2005

vincent said:
Newsbeitrag | However, I don't want to sort them the default way. These are birthdays,
| so only the month and day do matter, not the year. E.g.:
| ....
If you don't care about the year, why not just "normalize" the year
to all be the same using the replace method of the date instance?
Something like:

d1 = datetime.date(2004, 12, 2)
d2 = datetime.date(2001, 12, 3)
d3 = datetime.date(2002, 12, 6)
d4 = datetime.date(1977, 12, 7)
dates =[d1,d2,d3,d4]
datesNorm = [obj.replace(year=1900) for obj in (dates)]
datesNorm.sort()
print datesNorm # etcetera

Or just use the .timetuple() method on datetime objects and sort on the
8th element of the 9-element tuple, which is the day-of-the-year.

-Peter

Steven Bethard · May 14, 2005

Robert said:
I find that using the "key" argument to sort is much nicer than "cmp"
for these tasks.

In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15),
datetime.date(1954,1,1)]

In [7]:L.sort(key=lambda x: (x.month, x.day))

In [8]:L
Out[8]:
[datetime.date(1954, 1, 1),
datetime.date(2005, 5, 2),
datetime.date(1984, 12, 15)]

Yes, definitely. Also worth noting in Robert Kern's solution is that
instead of writing:

def mycmp(d1, d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

you can write:

def mycmp(d1, d2):
return cmp((d1.month, d1.day), (d2.month, d2.day))

or if you're using the key= argument (like you probably should):

def mykey(d):
return (d.month, d.day)

The point here is that rather than chaining cmp() calls with ors, you
should just use a tuple -- the standard comparison order in tuples is
exactly what you're looking for.

STeVe

Volker Grabsch · May 14, 2005

Steven said:
Robert Kern wrote:
def mykey(d):
return (d.month, d.day)

The point here is that rather than chaining cmp() calls with ors, you
should just use a tuple -- the standard comparison order in tuples is
exactly what you're looking for.

That's an excellent idea! Thanks a lot. I really didn't think of the "key="
argument.

Greets,

Volker Grabsch · May 14, 2005

vincent said:
If you don't care about the year, why not just "normalize" the year
to all be the same using the replace method of the date instance?

That's a very bad idea. In my example, this would work, but in "reality"
I don't sort datetime objects, of course! (Is there any real application
where you want to do that?)

Instead, I'm sorting "Person" objects using a "birthday" attribute.
Since I use these Person objects also in other places, they should never
be modified without just to be sorted. In general, the number side effects
should always be minimized.

datesNorm = [obj.replace(year=1900) for obj in (dates)]
datesNorm.sort()

This code would go bad especially in my situation, where my "Person"
objects are SQLObjects, thus the "normalisation" would be written into
the database. Okay, one could use transactions and rollback", but I
think, my point is clear now.

Nevertheless, I think you idea is very interesting. Is there any "real"
application where normalizing just for sorting would be reasonable?

Greets,

Volker Grabsch · May 14, 2005

Peter said:
Or just use the .timetuple() method on datetime objects and sort on the
8th element of the 9-element tuple, which is the day-of-the-year.

An interesting idea. But wouldn't sorting by (dd.month,dd.day) be more
effective?

In other words: Does .timetuple() create a tuple, or does it just return
a tuple which is present anyway?

Greets,

vincent wehren · May 14, 2005

Newsbeitrag | vincent wehren wrote:
| >
| > If you don't care about the year, why not just "normalize" the year
| > to all be the same using the replace method of the date instance?
|
| That's a very bad idea. In my example, this would work, but in "reality"
| I don't sort datetime objects, of course! (Is there any real application
| where you want to do that?)
|
| Instead, I'm sorting "Person" objects using a "birthday" attribute.
| Since I use these Person objects also in other places, they should never
| be modified without just to be sorted. In general, the number side effects
| should always be minimized.

Can you explain where you see a modification to the orginal object
happening?
(or in any of the other solutions proposed for that matter...)

Not here I hope:

|
| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
| > datesNorm.sort()

If you print datesNorm, you'll see:

[datetime.date(1900, 12, 2), datetime.date(1900, 12, 3), datetime.date(1900,
12, 6), datetime.date(1900, 12, 7)]

However, dates is still the same:

[datetime.date(2004, 12, 2), datetime.date(2001, 12, 3), datetime.date(2002,
12, 6), datetime.date(1977, 12, 7)]

|
| This code would go bad especially in my situation, where my "Person"
| objects are SQLObjects, thus the "normalisation" would be written into
| the database. Okay, one could use transactions and rollback", but I
| think, my point is clear now.

Since you wouldn't need to change the attribute object to
perform a sort (on the instances) using it (or portions of it) as key, it's
not.
Or is there something fundamental I am missing about your particular use
case?

| Nevertheless, I think you idea is very interesting. Is there any "real"
| application where normalizing just for sorting would be reasonable?

How about a case-insensitive sort of strings? (uppering being the
normalization step)
Or getting rid of accented / special characters before sorting.
These sound like fairly straight-forward use cases to me

Anyway, I was doing some SQL this morning where getting rid of portions of
'datetime' fields in
a WHERE clause is sometimes just the ticket - hence the connection.

--

Vincent

|
| --
| Volker Grabsch
| ---<<(())>>---
| \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
| [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch · May 14, 2005

vincent said:
| > If you don't care about the year, why not just "normalize" the year
| > to all be the same using the replace method of the date instance?
|
| That's a very bad idea. In my example, this would work, but in "reality"
| I don't sort datetime objects, of course! (Is there any real application
| where you want to do that?)
|
| Instead, I'm sorting "Person" objects using a "birthday" attribute.
| Since I use these Person objects also in other places, they should never
| be modified without just to be sorted. In general, the number side effects
| should always be minimized.

Can you explain where you see a modification to the orginal object
happening?
(or in any of the other solutions proposed for that matter...)

Sorry, my fault. I didn't read carefully enough. X-)

Not here I hope:

| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
| > datesNorm.sort()

While you don't change the original objects, there's still a problem since
you're sorting the normalized values. However, I want to sort the original
list (i.e. the list of "Person" objects).

But that's not a real problem if one normalizes in a key function:

def key_birthday(d):
return d.replace(year=1900)
...
dates.sort(key=key_birthday)

...as suggested in other followups of my posting.

| Nevertheless, I think you idea is very interesting. Is there any "real"
| application where normalizing just for sorting would be reasonable?

How about a case-insensitive sort of strings? (uppering being the
normalization step)
Or getting rid of accented / special characters before sorting.
These sound like fairly straight-forward use cases to me

For your solution these are good examples. But my question was, whether
normalizing first, and just sorting the normalized values (not the original
values) is reasonable.

I.e., when I sort some strings case-insensitive, I don't want my resulting
(sorted) list to contain only lowercase string. But that's what I would
get if I used the algorithm you described above.

Greets,

Peter Hansen · May 15, 2005

Volker said:
An interesting idea. But wouldn't sorting by (dd.month,dd.day) be more
effective?

Depending on the meaning of "effective", I suppose so.

In other words: Does .timetuple() create a tuple, or does it just return
a tuple which is present anyway?

I don't know what it does. Normally one shouldn't care about that kind
of thing. In this case, you seem to be assuming that .month and .day
are values that are "present anyway", but we don't really know that
either, since they could just as well be properties. If someone who
knows or someone who checks the source (for a given implementation of
course... it could change) cares to share, we'd know more. What good
that would do us is still a mystery to me. ;-)

-Peter

Glauco Silva · May 16, 2005

You can try this :

l = [(2001, 5, 2),(2111,3,3),(1984, 3, 1), (2001, 1, 1)]
l.sort(lambda x = l[0],y = l[1] : cmp((x[1],x[2]),(y[1],y[2])))

Click to expand...

Click to expand...

----- Original Message -----
From: "Volker Grabsch" <[email protected]>
To: <[email protected]>
Sent: Saturday, May 14, 2005 7:09 AM
Subject: Multiple "cmp"s chained one after another

Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01

So I have to write the comparison on my own, e.g.

def cmp_birthdays(d1,d2):
if d1.month > d2.month: return 1
if d1.month < d2.month: return -1
if d1.day > d2.day: return 1
if d1.day < d2.day: return -1
return 0

....
birthdays.sort(cmp_birthdays)

This implementation of cmp_birthdays is very ugly. Image you want to
chain more than 2 values in that "cmp_birthdays". I also want to use the
builtin "cmp" function, not ">" and "<".

After thinking some minutes about it, I found a very nice solution:
I have some "cmp"s one aftter another. If one if them return 1 oder -1,
it sould be returned. If it returns 0, the next "cmp" is used. In other
words: I have a sequence of numbers, and want to get the first one that
is not 0. (or return 0, if all numbers were 0)

But this is exactly what the "or" operator does, due to short-circuit
evaluation. In this example, that means:

def cmp_bithdays(d1,d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

The generic pattern is:

return cmp(...) or cmp (...) or cmp(...) or ...

I'm not sure whether this pattern is already a "common recipe", but
I found it to be a very nice idea.

Any opinions?

Greets,

Volker

Tic Tac Toe Game	2	Mar 10, 2024
C# How to convert date into en-US when thread culture is ar-SA	1	Feb 18, 2021
Awsome Python - chained exceptions	41	Feb 12, 2013
file data => array(s)	6	Dec 14, 2011
Taskcproblem calendar	4	Aug 31, 2023
Surprising timeit result	2	Oct 27, 2009
Collect Excel Data from Website	5	Apr 30, 2022
Help with code plsss	0	Aug 30, 2023

Multiple "cmp"s chained one after another

Volker Grabsch

vincent wehren

Robert Kern

Peter Hansen

Steven Bethard

Volker Grabsch

Volker Grabsch

Volker Grabsch

vincent wehren

Volker Grabsch

Peter Hansen

Glauco Silva

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads