Replacing cmp with key for sorting

George Sakkis

I want to sort sequences of strings lexicographically, except that a
string should come before any of its proper prefixes (the longer one
first), e.g. for s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab'], the sorted
sequence is ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']. Currently I do it with:

s.sort(cmp=lambda x,y: 0 if x==y else
                       -1 if x.startswith(y) else
                       +1 if y.startswith(x) else
                       cmp(x,y))

Can this be done with an equivalent key function instead of cmp?

George
 
bearophileHUGS

Your input and output:

s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
r = ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

To me your lambda looks like an abuse of the inline if expression, so
I suggest replacing it with a real function, which is more readable:

def mycmp(x, y):
    if x == y:
        return 0
    elif x.startswith(y):
        return -1
    elif y.startswith(x):
        return +1
    else:
        return cmp(x, y)

print sorted(s, cmp=mycmp)

It's a peculiar cmp function; I'm still thinking about which situations
it could be useful in.

To use the key argument given a cmp function I use the simple code
written by Hettinger:

def cmp2key(mycmp):
    "Converts a cmp= function into a key= function"
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __cmp__(self, other):
            return mycmp(self.obj, other.obj)
    return K

print sorted(s, key=cmp2key(mycmp))
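
For readers on Python 3, where the cmp= argument and the cmp() builtin no longer exist, the same wrapping idea is available in the standard library as functools.cmp_to_key. The following is only a sketch under that assumption; prefix_cmp is an illustrative name, and (x > y) - (x < y) stands in for the old cmp(x, y).

```python
from functools import cmp_to_key


def prefix_cmp(x, y):
    """Lexicographic order, except a string sorts before its own prefixes."""
    if x == y:
        return 0
    if x.startswith(y):
        return -1
    if y.startswith(x):
        return 1
    # Python 3 replacement for the removed cmp(x, y) builtin.
    return (x > y) - (x < y)


s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
result = sorted(s, key=cmp_to_key(prefix_cmp))
print(result)  # ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']
```

This still calls a Python-level comparison for every pair the sort inspects, so it is a convenience rather than a speedup.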

Now I'll look for simpler solutions...

Bye,
bearophile
 
Arnaud Delobelle

Here's an idea:
['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

The 3 above is the length of the longest string in the list

Here's another idea, probably more practical:
['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

HTH
 
bearophileHUGS

Alan G Isaac:
Probably not what you had in mind ...
...
     >>> def k(si): return si+'z'*(maxlen-len(si))

This looks a little better:

maxlen = max(len(p) for p in s)
assert all(isinstance(p, str) for p in s)
sorted(s, key=lambda p: p.ljust(maxlen, "\xff"))

If the strings are unicode, that may no longer work.
I don't know if there are better solutions.
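
In Python 3, where every str is Unicode, the padding idea can still be sketched by using the highest code point as the filler. This is a sketch only: it assumes no input string contains U+10FFFF, and PAD and pad_key are illustrative names that do not come from the thread.

```python
s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']

# Assumption: U+10FFFF never occurs in the data, so it compares
# greater than every real character.
PAD = chr(0x10FFFF)
maxlen = max(len(x) for x in s)


def pad_key(x):
    # Right-pad to the longest length so that a shorter string
    # sorts after any longer string it is a prefix of.
    return x + PAD * (maxlen - len(x))


result = sorted(s, key=pad_key)
print(result)  # ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']
```

The cost is that maxlen must be known before sorting, which means one extra pass over the data.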

Bye,
bearophile
 
bearophileHUGS

Arnaud Delobelle:
Here's another idea, probably more practical:

Nice.
A variant that probably works with unicode strings too:

print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)
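
A Python 3 rendering of the same one-liner, with the intermediate step spelled out, may make the trick easier to follow. Negating each ordinal reverses the letter order but still puts a prefix before its extensions; reversing the whole result then restores the letter order and flips the prefix rule. The name neg_ord_key is illustrative.

```python
def neg_ord_key(x):
    # Negated code points: larger letters now compare as smaller.
    return [-ord(ch) for ch in x]


s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']

# Step 1: ascending by negated ordinals (letters reversed, prefixes first).
step1 = sorted(s, key=neg_ord_key)
print(step1)   # ['bd', 'bc', 'bcb', 'ba', 'a', 'ab']

# Step 2: reverse=True flips both effects at once.
result = sorted(s, key=neg_ord_key, reverse=True)
print(result)  # ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']
```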

Bye,
bearophile
 
Arnaud Delobelle

Of course that's better! (although mine will work with unicode if yours
does). It's funny how the obvious escapes me so often. Still I think
the idea of the 'double reverse' (one letterwise, the other listwise)
was quite good.
 
bearophileHUGS

Arnaud Delobelle:
It's funny how the obvious escapes me so often.

In this case it's a well-known cognitive effect: the human mind clings
to the first good/working solution, which gets in the way of its final
tuning. To do that you may need to think about something else for a short
time, and then look at your solution with a little "fresher" mind.

This (ugly) translation into D + my functional-style libs shows why
Python syntax is a good idea:

import d.all;

void main() {
    auto txt = "a bc bd bcb ba ab".split();
    putr( sorted(txt, (string s){ return map((char c){ return -cast(int)c; }, s); }).reverse );
}

Long Live To Python! :)

Bye,
bearophile
 
George Sakkis

Awesome! I tested it on a sample list of ~61K words [1] and it's
about 40% faster, dropping from ~1.05s to ~0.62s. That's still >15
times slower than the default sort (0.04s), but I guess there's not
much more room for improvement.
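
A comparison along these lines can be sketched with timeit. This is not the original measurement: it uses randomly generated lowercase words rather than the word list in [1], runs on Python 3 with functools.cmp_to_key in place of cmp=, and the names and sizes are illustrative, so the absolute numbers will differ.

```python
import random
import string
import timeit
from functools import cmp_to_key


def prefix_cmp(x, y):
    if x == y:
        return 0
    if x.startswith(y):
        return -1
    if y.startswith(x):
        return 1
    return (x > y) - (x < y)


def neg_ord_key(x):
    return [-ord(ch) for ch in x]


# Assumption: synthetic data standing in for a real word list.
random.seed(0)
words = [''.join(random.choices(string.ascii_lowercase, k=random.randint(1, 8)))
         for _ in range(5000)]

by_cmp = sorted(words, key=cmp_to_key(prefix_cmp))
by_key = sorted(words, key=neg_ord_key, reverse=True)

candidates = [
    ("cmp_to_key", lambda: sorted(words, key=cmp_to_key(prefix_cmp))),
    ("neg_ord_key", lambda: sorted(words, key=neg_ord_key, reverse=True)),
    ("default sort", lambda: sorted(words)),
]
for label, fn in candidates:
    # Total seconds for 5 runs; compare the ratios, not the raw values.
    print(label, round(timeit.timeit(fn, number=5), 4))
```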

George

[1] http://www.cs.pitt.edu/~kirk/cs1501/Pruhs/Spring2006/assignments/boggle/5desk.txt
 
bearophileHUGS

George Sakkis:
but I guess there's not much more room for improvement.

That's nonsense. Python is a high-level language, so there's nearly
always room for improvement (even in programs written in assembly you
can generally find faster solutions).
If speed is what you're looking for and your strings are ASCII, then
this is much faster:

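# 256-character table mapping each character c to chr(255 - ord(c))
tab = "".join(map(chr, xrange(256)))[::-1]
# translate() builds each key in C instead of in a Python-level loop
s.sort(key=lambda x: x.translate(tab), reverse=True)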
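
On Python 3, str.translate takes a mapping of code points rather than a 256-character string, so the same trick might be sketched as below. It assumes every character in the input has a code point below 256 (characters outside the table are left unchanged, which would break the ordering); TAB and mirrored_key are illustrative names.

```python
# Map each code point 0..255 to its mirror image 255..0.
# Assumption: all characters in the data fall inside this range.
TAB = {i: 255 - i for i in range(256)}


def mirrored_key(x):
    # One C-level pass per string, no per-character Python loop.
    return x.translate(TAB)


s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
result = sorted(s, key=mirrored_key, reverse=True)
print(result)  # ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']
```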

Bye,
bearophile
 
