Multiple "cmp"s chained one after another

V

Volker Grabsch

Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01

So I have to write the comparison on my own, e.g.

def cmp_birthdays(d1,d2):
if d1.month > d2.month: return 1
if d1.month < d2.month: return -1
if d1.day > d2.day: return 1
if d1.day < d2.day: return -1
return 0

...
birthdays.sort(cmp_birthdays)

This implementation of cmp_birthdays is very ugly. Image you want to
chain more than 2 values in that "cmp_birthdays". I also want to use the
builtin "cmp" function, not ">" and "<".

After thinking some minutes about it, I found a very nice solution:
I have some "cmp"s one aftter another. If one if them return 1 oder -1,
it sould be returned. If it returns 0, the next "cmp" is used. In other
words: I have a sequence of numbers, and want to get the first one that
is not 0. (or return 0, if all numbers were 0)

But this is exactly what the "or" operator does, due to short-circuit
evaluation. In this example, that means:

def cmp_bithdays(d1,d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

The generic pattern is:

return cmp(...) or cmp (...) or cmp(...) or ...

I'm not sure whether this pattern is already a "common recipe", but
I found it to be a very nice idea. :)

Any opinions?


Greets,

Volker
 
V

vincent wehren

Newsbeitrag | Hello!
|
| Ich just found a very nice 'pythonic' solution for an often appearing
| problem. I didn't find it documented anywhere, so I'm posting it here.
| If this isn't new in any way, I'd really like to get to know it.
|
| Example problem:
| I have some "datetime" objects and want to sort them, as here:
|
| birthdays = [d1,d2,d3,d4]
| birthdays.sort()
|
| However, I don't want to sort them the default way. These are birthdays,
| so only the month and day do matter, not the year. E.g.:
|
| 2003-01-01 should be smaller than 1984-05-01
|
| So I have to write the comparison on my own, e.g.
|
| def cmp_birthdays(d1,d2):
| if d1.month > d2.month: return 1
| if d1.month < d2.month: return -1
| if d1.day > d2.day: return 1
| if d1.day < d2.day: return -1
| return 0
|
| ...
| birthdays.sort(cmp_birthdays)


If you don't care about the year, why not just "normalize" the year
to all be the same using the replace method of the date instance?
Something like:

d1 = datetime.date(2004, 12, 2)
d2 = datetime.date(2001, 12, 3)
d3 = datetime.date(2002, 12, 6)
d4 = datetime.date(1977, 12, 7)
dates =[d1,d2,d3,d4]
datesNorm = [obj.replace(year=1900) for obj in (dates)]
datesNorm.sort()
print datesNorm # etcetera


HTH,

---
Vincent




|
| This implementation of cmp_birthdays is very ugly. Image you want to
| chain more than 2 values in that "cmp_birthdays". I also want to use the
| builtin "cmp" function, not ">" and "<".
|
| After thinking some minutes about it, I found a very nice solution:
| I have some "cmp"s one aftter another. If one if them return 1 oder -1,
| it sould be returned. If it returns 0, the next "cmp" is used. In other
| words: I have a sequence of numbers, and want to get the first one that
| is not 0. (or return 0, if all numbers were 0)
|
| But this is exactly what the "or" operator does, due to short-circuit
| evaluation. In this example, that means:
|
| def cmp_bithdays(d1,d2):
| return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)
|
| The generic pattern is:
|
| return cmp(...) or cmp (...) or cmp(...) or ...
|
| I'm not sure whether this pattern is already a "common recipe", but
| I found it to be a very nice idea. :)
|
| Any opinions?
|
|
| Greets,
|
| Volker
|
| --
| Volker Grabsch
| ---<<(())>>---
| \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
| [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}
 
R

Robert Kern

Volker said:
Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01 [snip]
Any opinions?

I find that using the "key" argument to sort is much nicer than "cmp"
for these tasks.

In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15),
datetime.date(1954,1,1)]

In [7]:L.sort(key=lambda x: (x.month, x.day))

In [8]:L
Out[8]:
[datetime.date(1954, 1, 1),
datetime.date(2005, 5, 2),
datetime.date(1984, 12, 15)]

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
P

Peter Hansen

vincent said:
Newsbeitrag | However, I don't want to sort them the default way. These are birthdays,
| so only the month and day do matter, not the year. E.g.:
| ....
If you don't care about the year, why not just "normalize" the year
to all be the same using the replace method of the date instance?
Something like:

d1 = datetime.date(2004, 12, 2)
d2 = datetime.date(2001, 12, 3)
d3 = datetime.date(2002, 12, 6)
d4 = datetime.date(1977, 12, 7)
dates =[d1,d2,d3,d4]
datesNorm = [obj.replace(year=1900) for obj in (dates)]
datesNorm.sort()
print datesNorm # etcetera

Or just use the .timetuple() method on datetime objects and sort on the
8th element of the 9-element tuple, which is the day-of-the-year.

-Peter
 
S

Steven Bethard

Robert said:
I find that using the "key" argument to sort is much nicer than "cmp"
for these tasks.

In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15),
datetime.date(1954,1,1)]

In [7]:L.sort(key=lambda x: (x.month, x.day))

In [8]:L
Out[8]:
[datetime.date(1954, 1, 1),
datetime.date(2005, 5, 2),
datetime.date(1984, 12, 15)]

Yes, definitely. Also worth noting in Robert Kern's solution is that
instead of writing:

def mycmp(d1, d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

you can write:

def mycmp(d1, d2):
return cmp((d1.month, d1.day), (d2.month, d2.day))

or if you're using the key= argument (like you probably should):

def mykey(d):
return (d.month, d.day)

The point here is that rather than chaining cmp() calls with ors, you
should just use a tuple -- the standard comparison order in tuples is
exactly what you're looking for.

STeVe
 
V

Volker Grabsch

Steven said:
Robert Kern wrote:
def mykey(d):
return (d.month, d.day)

The point here is that rather than chaining cmp() calls with ors, you
should just use a tuple -- the standard comparison order in tuples is
exactly what you're looking for.

That's an excellent idea! Thanks a lot. I really didn't think of the "key="
argument.


Greets,
 
V

Volker Grabsch

vincent said:
If you don't care about the year, why not just "normalize" the year
to all be the same using the replace method of the date instance?

That's a very bad idea. In my example, this would work, but in "reality"
I don't sort datetime objects, of course! (Is there any real application
where you want to do that?)

Instead, I'm sorting "Person" objects using a "birthday" attribute.
Since I use these Person objects also in other places, they should never
be modified without just to be sorted. In general, the number side effects
should always be minimized.
datesNorm = [obj.replace(year=1900) for obj in (dates)]
datesNorm.sort()

This code would go bad especially in my situation, where my "Person"
objects are SQLObjects, thus the "normalisation" would be written into
the database. Okay, one could use transactions and rollback", but I
think, my point is clear now.

Nevertheless, I think you idea is very interesting. Is there any "real"
application where normalizing just for sorting would be reasonable?


Greets,
 
V

Volker Grabsch

Peter said:
Or just use the .timetuple() method on datetime objects and sort on the
8th element of the 9-element tuple, which is the day-of-the-year.

An interesting idea. But wouldn't sorting by (dd.month,dd.day) be more
effective?

In other words: Does .timetuple() create a tuple, or does it just return
a tuple which is present anyway?


Greets,
 
V

vincent wehren

Newsbeitrag | vincent wehren wrote:
| >
| > If you don't care about the year, why not just "normalize" the year
| > to all be the same using the replace method of the date instance?
|
| That's a very bad idea. In my example, this would work, but in "reality"
| I don't sort datetime objects, of course! (Is there any real application
| where you want to do that?)
|
| Instead, I'm sorting "Person" objects using a "birthday" attribute.
| Since I use these Person objects also in other places, they should never
| be modified without just to be sorted. In general, the number side effects
| should always be minimized.

Can you explain where you see a modification to the orginal object
happening?
(or in any of the other solutions proposed for that matter...)

Not here I hope:

|
| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
| > datesNorm.sort()

If you print datesNorm, you'll see:

[datetime.date(1900, 12, 2), datetime.date(1900, 12, 3), datetime.date(1900,
12, 6), datetime.date(1900, 12, 7)]

However, dates is still the same:

[datetime.date(2004, 12, 2), datetime.date(2001, 12, 3), datetime.date(2002,
12, 6), datetime.date(1977, 12, 7)]

|
| This code would go bad especially in my situation, where my "Person"
| objects are SQLObjects, thus the "normalisation" would be written into
| the database. Okay, one could use transactions and rollback", but I
| think, my point is clear now.

Since you wouldn't need to change the attribute object to
perform a sort (on the instances) using it (or portions of it) as key, it's
not.
Or is there something fundamental I am missing about your particular use
case?

| Nevertheless, I think you idea is very interesting. Is there any "real"
| application where normalizing just for sorting would be reasonable?

How about a case-insensitive sort of strings? (uppering being the
normalization step)
Or getting rid of accented / special characters before sorting.
These sound like fairly straight-forward use cases to me ;)

Anyway, I was doing some SQL this morning where getting rid of portions of
'datetime' fields in
a WHERE clause is sometimes just the ticket - hence the connection.


--

Vincent


|
| --
| Volker Grabsch
| ---<<(())>>---
| \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\}\right|}{\sqrt
| [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}
 
V

Volker Grabsch

vincent said:
| > If you don't care about the year, why not just "normalize" the year
| > to all be the same using the replace method of the date instance?
|
| That's a very bad idea. In my example, this would work, but in "reality"
| I don't sort datetime objects, of course! (Is there any real application
| where you want to do that?)
|
| Instead, I'm sorting "Person" objects using a "birthday" attribute.
| Since I use these Person objects also in other places, they should never
| be modified without just to be sorted. In general, the number side effects
| should always be minimized.

Can you explain where you see a modification to the orginal object
happening?
(or in any of the other solutions proposed for that matter...)

Sorry, my fault. I didn't read carefully enough. X-)
Not here I hope:

| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
| > datesNorm.sort()

While you don't change the original objects, there's still a problem since
you're sorting the normalized values. However, I want to sort the original
list (i.e. the list of "Person" objects).

But that's not a real problem if one normalizes in a key function:

def key_birthday(d):
return d.replace(year=1900)
...
dates.sort(key=key_birthday)

...as suggested in other followups of my posting.
| Nevertheless, I think you idea is very interesting. Is there any "real"
| application where normalizing just for sorting would be reasonable?

How about a case-insensitive sort of strings? (uppering being the
normalization step)
Or getting rid of accented / special characters before sorting.
These sound like fairly straight-forward use cases to me ;)

For your solution these are good examples. But my question was, whether
normalizing first, and just sorting the normalized values (not the original
values) is reasonable.

I.e., when I sort some strings case-insensitive, I don't want my resulting
(sorted) list to contain only lowercase string. But that's what I would
get if I used the algorithm you described above.


Greets,
 
P

Peter Hansen

Volker said:
An interesting idea. But wouldn't sorting by (dd.month,dd.day) be more
effective?

Depending on the meaning of "effective", I suppose so.
In other words: Does .timetuple() create a tuple, or does it just return
a tuple which is present anyway?

I don't know what it does. Normally one shouldn't care about that kind
of thing. In this case, you seem to be assuming that .month and .day
are values that are "present anyway", but we don't really know that
either, since they could just as well be properties. If someone who
knows or someone who checks the source (for a given implementation of
course... it could change) cares to share, we'd know more. What good
that would do us is still a mystery to me. ;-)

-Peter
 
G

Glauco Silva

You can try this :
l = [(2001, 5, 2),(2111,3,3),(1984, 3, 1), (2001, 1, 1)]
l.sort(lambda x = l[0],y = l[1] : cmp((x[1],x[2]),(y[1],y[2])))

----- Original Message -----
From: "Volker Grabsch" <[email protected]>
To: <[email protected]>
Sent: Saturday, May 14, 2005 7:09 AM
Subject: Multiple "cmp"s chained one after another


Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01

So I have to write the comparison on my own, e.g.

def cmp_birthdays(d1,d2):
if d1.month > d2.month: return 1
if d1.month < d2.month: return -1
if d1.day > d2.day: return 1
if d1.day < d2.day: return -1
return 0

....
birthdays.sort(cmp_birthdays)

This implementation of cmp_birthdays is very ugly. Image you want to
chain more than 2 values in that "cmp_birthdays". I also want to use the
builtin "cmp" function, not ">" and "<".

After thinking some minutes about it, I found a very nice solution:
I have some "cmp"s one aftter another. If one if them return 1 oder -1,
it sould be returned. If it returns 0, the next "cmp" is used. In other
words: I have a sequence of numbers, and want to get the first one that
is not 0. (or return 0, if all numbers were 0)

But this is exactly what the "or" operator does, due to short-circuit
evaluation. In this example, that means:

def cmp_bithdays(d1,d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

The generic pattern is:

return cmp(...) or cmp (...) or cmp(...) or ...

I'm not sure whether this pattern is already a "common recipe", but
I found it to be a very nice idea. :)

Any opinions?


Greets,

Volker
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top