Code snippet: Natural string sorting

C

C. Barnes

Summary:

Sorts strings in a way that seems natural to humans.
If the
strings contain integers, then the integers are
ordered
numerically. For example, sorts ['Team 11', 'Team 3',
'Team 1']
into the order ['Team 1', 'Team 3', 'Team 11'].

Code:

#
---------------------------------------------------------
# natsort.py: Natural string sorting.
#
---------------------------------------------------------

# By Seo Sanghyeon. Some changes by Connelly Barnes.

def try_int(s):
"Convert to integer if possible."
try: return int(s)
except: return s

def natsort_key(s):
"Used internally to get a tuple by which s is
sorted."
import re
return map(try_int, re.findall(r'(\d+|\D+)', s))

def natcmp(a, b):
"Natural string comparison, case sensitive."
return cmp(natsort_key(a), natsort_key(b))

def natcasecmp(a, b):
"Natural string comparison, ignores case."
return natcmp(a.lower(), b.lower())

def natsort(seq, cmp=natcmp):
"In-place natural string sort."
seq.sort(cmp)

def natsorted(seq, cmp=natcmp):
"Returns a copy of seq, sorted by natural string
sort."
import copy
temp = copy.copy(seq)
natsort(temp, cmp)
return temp

Examples:

You can use this code to sort tarball filenames:
natsorted(['ver-1.3.12', 'ver-1.3.3', 'ver-1.2.5',
'ver-1.2.15', 'ver-1.2.3', 'ver-1.2.1'])
['ver-1.2.1', 'ver-1.2.3', 'ver-1.2.5', 'ver-1.2.15',
'ver-1.3.3', 'ver-1.3.12']

Chemical elements:
natsorted(['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N',
'C3H6'])
['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N', 'C3H6']

Teams:
natsorted(['Team 101', 'Team 58', 'Team 30', 'Team
1'])
['Team 1', 'Team 30', 'Team 58', 'Team 101']

Pass natcasecmp as a second argument for
case-insensitive sorting:
natsorted(['a5', 'A7', 'a15', 'a9', 'A8'],
natcasecmp)
['a5', 'A7', 'A8', 'a9', 'a15']

Enjoy!


Message and source code are in the public domain.




__________________________________
Do you Yahoo!?
Friends. Fun. Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/
 
M

Matteo Dell'Amico

C. Barnes said:
Summary:

Sorts strings in a way that seems natural to humans.
If the
strings contain integers, then the integers are
ordered
numerically. For example, sorts ['Team 11', 'Team 3',
'Team 1']
into the order ['Team 1', 'Team 3', 'Team 11'].
[snip]

Nice. Can be written (and, hopefully, made a bit more efficient) using
the Decorate-Sort-Undecorate pattern too:

import re

_expr = re.compile(r'(\d+|\D+)')

def try_int(s):
if s.isdigit():
return int(s)
else:
return s

def nat_key(s):
return [try_int(e) for e in _expr.findall(s)]

def nat_nocase_key(s):
return nat_key(s.lower())

def decsorted(seq, key):
decorated = [(key(item), item) for item in seq]
decorated.sort()
return [item for k, item in decorated]

Then you could use decsorted(seq, nat_key) or decsorted(seq,
nat_nocase_key).

If someone is found of one-liners, he can just write nat_key as

def nat_key(s):
return map(lambda s: s.isdigit() and int(s) or s, _expr.findall(s))

Note that 2.4's sorted() and list.sort() will have built-in support for
decorate-sort-undecorate, so you'll just have to throw nat_key or
nat_nocase_key to them, with no need for decsorted.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

Forum statistics

Threads
473,756
Messages
2,569,533
Members
45,007
Latest member
OrderFitnessKetoCapsules

Latest Threads

Top