# Multiple "cmp"s chained one after another

Discussion in 'Python' started by Volker Grabsch, May 14, 2005.

1. ### Volker GrabschGuest

Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01

So I have to write the comparison on my own, e.g.

def cmp_birthdays(d1,d2):
if d1.month > d2.month: return 1
if d1.month < d2.month: return -1
if d1.day > d2.day: return 1
if d1.day < d2.day: return -1
return 0

...
birthdays.sort(cmp_birthdays)

This implementation of cmp_birthdays is very ugly. Image you want to
chain more than 2 values in that "cmp_birthdays". I also want to use the
builtin "cmp" function, not ">" and "<".

After thinking some minutes about it, I found a very nice solution:
I have some "cmp"s one aftter another. If one if them return 1 oder -1,
it sould be returned. If it returns 0, the next "cmp" is used. In other
words: I have a sequence of numbers, and want to get the first one that
is not 0. (or return 0, if all numbers were 0)

But this is exactly what the "or" operator does, due to short-circuit
evaluation. In this example, that means:

def cmp_bithdays(d1,d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

The generic pattern is:

return cmp(...) or cmp (...) or cmp(...) or ...

I'm not sure whether this pattern is already a "common recipe", but
I found it to be a very nice idea.

Any opinions?

Greets,

Volker

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch, May 14, 2005

2. ### vincent wehrenGuest

"Volker Grabsch" <> schrieb im
Newsbeitrag news:d64io9$s6o$05$-online.com... | Hello! | | Ich just found a very nice 'pythonic' solution for an often appearing | problem. I didn't find it documented anywhere, so I'm posting it here. | If this isn't new in any way, I'd really like to get to know it. | | Example problem: | I have some "datetime" objects and want to sort them, as here: | | birthdays = [d1,d2,d3,d4] | birthdays.sort() | | However, I don't want to sort them the default way. These are birthdays, | so only the month and day do matter, not the year. E.g.: | | 2003-01-01 should be smaller than 1984-05-01 | | So I have to write the comparison on my own, e.g. | | def cmp_birthdays(d1,d2): | if d1.month > d2.month: return 1 | if d1.month < d2.month: return -1 | if d1.day > d2.day: return 1 | if d1.day < d2.day: return -1 | return 0 | | ... | birthdays.sort(cmp_birthdays) If you don't care about the year, why not just "normalize" the year to all be the same using the replace method of the date instance? Something like: d1 = datetime.date(2004, 12, 2) d2 = datetime.date(2001, 12, 3) d3 = datetime.date(2002, 12, 6) d4 = datetime.date(1977, 12, 7) dates =[d1,d2,d3,d4] datesNorm = [obj.replace(year=1900) for obj in (dates)] datesNorm.sort() print datesNorm # etcetera HTH, --- Vincent | | This implementation of cmp_birthdays is very ugly. Image you want to | chain more than 2 values in that "cmp_birthdays". I also want to use the | builtin "cmp" function, not ">" and "<". | | After thinking some minutes about it, I found a very nice solution: | I have some "cmp"s one aftter another. If one if them return 1 oder -1, | it sould be returned. If it returns 0, the next "cmp" is used. In other | words: I have a sequence of numbers, and want to get the first one that | is not 0. (or return 0, if all numbers were 0) | | But this is exactly what the "or" operator does, due to short-circuit | evaluation. In this example, that means: | | def cmp_bithdays(d1,d2): | return cmp(d1.month,d2.month) or cmp(d1.day,d2.day) | | The generic pattern is: | | return cmp(...) or cmp (...) or cmp(...) or ... | | I'm not sure whether this pattern is already a "common recipe", but | I found it to be a very nice idea. | | Any opinions? | | | Greets, | | Volker | | -- | Volker Grabsch | ---<<(())>>--- | \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt | [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}} vincent wehren, May 14, 2005 1. ### Advertisements 3. ### Robert KernGuest Volker Grabsch wrote: > Hello! > > Ich just found a very nice 'pythonic' solution for an often appearing > problem. I didn't find it documented anywhere, so I'm posting it here. > If this isn't new in any way, I'd really like to get to know it. > > Example problem: > I have some "datetime" objects and want to sort them, as here: > > birthdays = [d1,d2,d3,d4] > birthdays.sort() > > However, I don't want to sort them the default way. These are birthdays, > so only the month and day do matter, not the year. E.g.: > > 2003-01-01 should be smaller than 1984-05-01 [snip] > Any opinions? I find that using the "key" argument to sort is much nicer than "cmp" for these tasks. In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15), datetime.date(1954,1,1)] In [7]:L.sort(key=lambda x: (x.month, x.day)) In [8]:L Out[8]: [datetime.date(1954, 1, 1), datetime.date(2005, 5, 2), datetime.date(1984, 12, 15)] -- Robert Kern "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter Robert Kern, May 14, 2005 4. ### Peter HansenGuest vincent wehren wrote: > "Volker Grabsch" <> schrieb im > Newsbeitrag news:d64io9$s6o$05$-online.com...
> | However, I don't want to sort them the default way. These are birthdays,
> | so only the month and day do matter, not the year. E.g.:
> |

....
> If you don't care about the year, why not just "normalize" the year
> to all be the same using the replace method of the date instance?
> Something like:
>
> d1 = datetime.date(2004, 12, 2)
> d2 = datetime.date(2001, 12, 3)
> d3 = datetime.date(2002, 12, 6)
> d4 = datetime.date(1977, 12, 7)
> dates =[d1,d2,d3,d4]
> datesNorm = [obj.replace(year=1900) for obj in (dates)]
> datesNorm.sort()
> print datesNorm # etcetera

Or just use the .timetuple() method on datetime objects and sort on the
8th element of the 9-element tuple, which is the day-of-the-year.

-Peter

Peter Hansen, May 14, 2005
5. ### Steven BethardGuest

Robert Kern wrote:
> I find that using the "key" argument to sort is much nicer than "cmp"
>
> In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15),
> datetime.date(1954,1,1)]
>
> In [7]:L.sort(key=lambda x: (x.month, x.day))
>
> In [8]:L
> Out[8]:
> [datetime.date(1954, 1, 1),
> datetime.date(2005, 5, 2),
> datetime.date(1984, 12, 15)]

Yes, definitely. Also worth noting in Robert Kern's solution is that

def mycmp(d1, d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

you can write:

def mycmp(d1, d2):
return cmp((d1.month, d1.day), (d2.month, d2.day))

or if you're using the key= argument (like you probably should):

def mykey(d):
return (d.month, d.day)

The point here is that rather than chaining cmp() calls with ors, you
should just use a tuple -- the standard comparison order in tuples is
exactly what you're looking for.

STeVe

Steven Bethard, May 14, 2005
6. ### Volker GrabschGuest

Steven Bethard wrote:
> Robert Kern wrote:
> def mykey(d):
> return (d.month, d.day)
>
> The point here is that rather than chaining cmp() calls with ors, you
> should just use a tuple -- the standard comparison order in tuples is
> exactly what you're looking for.

That's an excellent idea! Thanks a lot. I really didn't think of the "key="
argument.

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch, May 14, 2005
7. ### Volker GrabschGuest

vincent wehren wrote:
>
> If you don't care about the year, why not just "normalize" the year
> to all be the same using the replace method of the date instance?

That's a very bad idea. In my example, this would work, but in "reality"
I don't sort datetime objects, of course! (Is there any real application
where you want to do that?)

Instead, I'm sorting "Person" objects using a "birthday" attribute.
Since I use these Person objects also in other places, they should never
be modified without just to be sorted. In general, the number side effects
should always be minimized.

> datesNorm = [obj.replace(year=1900) for obj in (dates)]
> datesNorm.sort()

This code would go bad especially in my situation, where my "Person"
objects are SQLObjects, thus the "normalisation" would be written into
the database. Okay, one could use transactions and rollback", but I
think, my point is clear now.

Nevertheless, I think you idea is very interesting. Is there any "real"
application where normalizing just for sorting would be reasonable?

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch, May 14, 2005
8. ### Volker GrabschGuest

Peter Hansen wrote:
>
> Or just use the .timetuple() method on datetime objects and sort on the
> 8th element of the 9-element tuple, which is the day-of-the-year.

An interesting idea. But wouldn't sorting by (dd.month,dd.day) be more
effective?

In other words: Does .timetuple() create a tuple, or does it just return
a tuple which is present anyway?

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch, May 14, 2005
9. ### vincent wehrenGuest

"Volker Grabsch" <> schrieb im
Newsbeitrag news:d65h73$fgr$02\$-online.com...
| vincent wehren wrote:
| >
| > If you don't care about the year, why not just "normalize" the year
| > to all be the same using the replace method of the date instance?
|
| That's a very bad idea. In my example, this would work, but in "reality"
| I don't sort datetime objects, of course! (Is there any real application
| where you want to do that?)
|
| Instead, I'm sorting "Person" objects using a "birthday" attribute.
| Since I use these Person objects also in other places, they should never
| be modified without just to be sorted. In general, the number side effects
| should always be minimized.

Can you explain where you see a modification to the orginal object
happening?
(or in any of the other solutions proposed for that matter...)

Not here I hope:

|
| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
| > datesNorm.sort()

If you print datesNorm, you'll see:

[datetime.date(1900, 12, 2), datetime.date(1900, 12, 3), datetime.date(1900,
12, 6), datetime.date(1900, 12, 7)]

However, dates is still the same:

[datetime.date(2004, 12, 2), datetime.date(2001, 12, 3), datetime.date(2002,
12, 6), datetime.date(1977, 12, 7)]

|
| This code would go bad especially in my situation, where my "Person"
| objects are SQLObjects, thus the "normalisation" would be written into
| the database. Okay, one could use transactions and rollback", but I
| think, my point is clear now.

Since you wouldn't need to change the attribute object to
perform a sort (on the instances) using it (or portions of it) as key, it's
not.
Or is there something fundamental I am missing about your particular use
case?

| Nevertheless, I think you idea is very interesting. Is there any "real"
| application where normalizing just for sorting would be reasonable?

How about a case-insensitive sort of strings? (uppering being the
normalization step)
Or getting rid of accented / special characters before sorting.
These sound like fairly straight-forward use cases to me

Anyway, I was doing some SQL this morning where getting rid of portions of
'datetime' fields in
a WHERE clause is sometimes just the ticket - hence the connection.

--

Vincent

|
| --
| Volker Grabsch
| ---<<(())>>---
| \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
| [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

vincent wehren, May 14, 2005
10. ### Volker GrabschGuest

vincent wehren wrote:
>
>| > If you don't care about the year, why not just "normalize" the year
>| > to all be the same using the replace method of the date instance?
>|
>| That's a very bad idea. In my example, this would work, but in "reality"
>| I don't sort datetime objects, of course! (Is there any real application
>| where you want to do that?)
>|
>| Instead, I'm sorting "Person" objects using a "birthday" attribute.
>| Since I use these Person objects also in other places, they should never
>| be modified without just to be sorted. In general, the number side effects
>| should always be minimized.
>
> Can you explain where you see a modification to the orginal object
> happening?
> (or in any of the other solutions proposed for that matter...)

Sorry, my fault. I didn't read carefully enough. X-)

> Not here I hope:
>
>| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
>| > datesNorm.sort()

While you don't change the original objects, there's still a problem since
you're sorting the normalized values. However, I want to sort the original
list (i.e. the list of "Person" objects).

But that's not a real problem if one normalizes in a key function:

def key_birthday(d):
return d.replace(year=1900)
...
dates.sort(key=key_birthday)

...as suggested in other followups of my posting.

>| Nevertheless, I think you idea is very interesting. Is there any "real"
>| application where normalizing just for sorting would be reasonable?
>
> How about a case-insensitive sort of strings? (uppering being the
> normalization step)
> Or getting rid of accented / special characters before sorting.
> These sound like fairly straight-forward use cases to me

For your solution these are good examples. But my question was, whether
normalizing first, and just sorting the normalized values (not the original
values) is reasonable.

I.e., when I sort some strings case-insensitive, I don't want my resulting
(sorted) list to contain only lowercase string. But that's what I would
get if I used the algorithm you described above.

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch, May 14, 2005
11. ### Peter HansenGuest

Volker Grabsch wrote:
> Peter Hansen wrote:
>
>>Or just use the .timetuple() method on datetime objects and sort on the
>>8th element of the 9-element tuple, which is the day-of-the-year.

>
> An interesting idea. But wouldn't sorting by (dd.month,dd.day) be more
> effective?

Depending on the meaning of "effective", I suppose so.

> In other words: Does .timetuple() create a tuple, or does it just return
> a tuple which is present anyway?

I don't know what it does. Normally one shouldn't care about that kind
of thing. In this case, you seem to be assuming that .month and .day
are values that are "present anyway", but we don't really know that
either, since they could just as well be properties. If someone who
knows or someone who checks the source (for a given implementation of
course... it could change) cares to share, we'd know more. What good
that would do us is still a mystery to me. ;-)

-Peter

Peter Hansen, May 15, 2005
12. ### Glauco SilvaGuest

You can try this :
>>> l = [(2001, 5, 2),(2111,3,3),(1984, 3, 1), (2001, 1, 1)]
>>> l.sort(lambda x = l[0],y = l[1] : cmp((x[1],x[2]),(y[1],y[2])))

----- Original Message -----
From: "Volker Grabsch" <>
To: <>
Sent: Saturday, May 14, 2005 7:09 AM
Subject: Multiple "cmp"s chained one after another

Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01

So I have to write the comparison on my own, e.g.

def cmp_birthdays(d1,d2):
if d1.month > d2.month: return 1
if d1.month < d2.month: return -1
if d1.day > d2.day: return 1
if d1.day < d2.day: return -1
return 0

....
birthdays.sort(cmp_birthdays)

This implementation of cmp_birthdays is very ugly. Image you want to
chain more than 2 values in that "cmp_birthdays". I also want to use the
builtin "cmp" function, not ">" and "<".

After thinking some minutes about it, I found a very nice solution:
I have some "cmp"s one aftter another. If one if them return 1 oder -1,
it sould be returned. If it returns 0, the next "cmp" is used. In other
words: I have a sequence of numbers, and want to get the first one that
is not 0. (or return 0, if all numbers were 0)

But this is exactly what the "or" operator does, due to short-circuit
evaluation. In this example, that means:

def cmp_bithdays(d1,d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

The generic pattern is:

return cmp(...) or cmp (...) or cmp(...) or ...

I'm not sure whether this pattern is already a "common recipe", but
I found it to be a very nice idea.

Any opinions?

Greets,

Volker

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.308 / Virus Database: 266.11.10 - Release Date: 13/5/2005

Glauco Silva, May 16, 2005