Custom alphabetical sort

Pander Musubi · Dec 24, 2012

Hi all,

I would like to sort according to this order:

(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

How can I do this? The default sorted() does not give the desired result.

Thanks,

Pander

Thomas Bach · Dec 24, 2012

I would like to sort according to this order:

(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

One option is to use sorted's key parameter with an appropriate
mapping in a dictionary:
'5aAàÀåBCçËÉíÎLÖøquùx'
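
The code from this reply did not survive in the thread, so the following is only a minimal sketch of the idea, not necessarily what was originally posted. It assumes the alphabet from the question is written out as one string, and the names ALPHABET and rank are made up for illustration:

```python
# Alphabet copied from the question, in the desired sort order.
ALPHABET = " .'-0123456789aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhHiIïÏíÍîÎìÌjJkKlLmMnñNÑoOöÖóÓôÔòÒøØpPqQrRsStTuUüÜúÚûÛùÙvVwWxXyYzZ"

# Hypothetical helper: maps each character to its position in ALPHABET.
rank = {ch: i for i, ch in enumerate(ALPHABET)}

# Sort single characters by their rank instead of by code point.
chars = list("xùuqøÖLÎíÉËçCBåÀàAa5")
ordered = "".join(sorted(chars, key=rank.get))
print(ordered)
```

This only orders individual characters; whole words need a key built from every character of the word.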

Regards,
Thomas.

Pander Musubi · Dec 24, 2012

One option is to use sorted's key parameter with an appropriate
mapping in a dictionary:

'5aAàÀåBCçËÉíÎLÖøquùx'

This doesn't work for words with more than one character:
['\xc3\xb8asdf', '\xc3\xa1\xc3\xa1', 'aa', 'a123', '\xc3\xa11234', 'Aaa']

Ian Kelly · Dec 24, 2012

This doesn't work for words with more than one character:

Try this instead:

def collate(x):
    # Turn each character of x into its rank from the dictionary d.
    return list(map(d.get, x))

sorted(data, key=collate)

I would also probably change "d.get" to "d.__getitem__" for a clearer error
message in the case the string contains characters that it doesn't know how
to sort.
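
A self-contained sketch of this suggestion, under the assumption that d maps each character of the alphabet from the question to its position (the thread does not show how d was built, so ALPHABET and the construction of d are illustrative):

```python
# Alphabet copied from the question, in the desired sort order.
ALPHABET = " .'-0123456789aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhHiIïÏíÍîÎìÌjJkKlLmMnñNÑoOöÖóÓôÔòÒøØpPqQrRsStTuUüÜúÚûÛùÙvVwWxXyYzZ"

# Assumed construction of d: character -> position in ALPHABET.
d = {ch: i for i, ch in enumerate(ALPHABET)}

def collate(word):
    # d.__getitem__ raises KeyError for an unknown character,
    # whereas d.get would silently put None into the key.
    return list(map(d.__getitem__, word))

data = ['øasdf', 'áá', 'aa', 'a123', 'á1234', 'Aaa']
result = sorted(data, key=collate)
print(result)

try:
    collate("a€")  # '€' is not in ALPHABET
except KeyError as exc:
    print("unknown character:", exc)
```

Lists of integers compare element by element, so words are ordered by their first differing character, and a word that is a prefix of another sorts first.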

wxjmfauth · Dec 27, 2012

On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi wrote:

Hi all,

I would like to sort according to this order:

(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i','I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q','r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û','Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

How can I do this? The default sorted() does not give the desired result.

-----

One way is to create a list of 2-lists / 2-tuples, like

[(modified_word_1, word_1), (modified_word_2, word_2), ...]

and to use the native sorting, which will use the first element of
each pair (the modified word) as the primary key.

The task lies in the creation of the primary keys.

I did it once for French (seriously) and for German (less
seriously) scripts. (Only as an exercise for fun).

E.g.

rob = ['noduleux', 'noël', 'noèse', 'noétique', .... 'nœud', 'noir', 'noirâtre']
z = list(rob)
random.shuffle(z)
z

['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique', 'noduleux']
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
True
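
The (modified_word, word) approach described above could look roughly like this. It is only a sketch: make_key and its small substitution table are hypothetical stand-ins, not the algorithm used in the post, and a real table would need to cover far more characters:

```python
import random

# Hypothetical substitution table, just large enough for the sample words.
SUBSTITUTIONS = str.maketrans({"é": "e", "è": "e", "ë": "e", "â": "a", "œ": "oe"})

def make_key(word):
    """Return an assumed 'modified word' that is used only for comparison."""
    return word.translate(SUBSTITUTIONS)

words = ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
shuffled = list(words)
random.shuffle(shuffled)

# Decorate: pair every word with its modified form.
decorated = [(make_key(w), w) for w in shuffled]
# Sort: tuples compare on the modified word first, then on the original word.
decorated.sort()
# Undecorate: keep only the original words.
result = [w for _, w in decorated]
print(result)
```

When two words produce the same modified form, the tuple comparison falls back to the original spelling, so the order stays deterministic.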

PS Py 3.3 warranty: ~30% slower than Py 3.2

jmf

Terry Reedy · Dec 27, 2012

On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi wrote:

One way is to create a list of 2-lists / 2-tuples, like

[(modified_word_1, word_1), (modified_word_2, word_2), ...]

and to use the native sorting, which will use the first element of
each pair (the modified word) as the primary key.

The task lies in the creation of the primary keys.

I did it once for French (seriously) and for German (less
seriously) scripts. (Only as an exercise for fun).

rob = ['noduleux', 'noël', 'noèse', 'noétique', ... 'nœud', 'noir', 'noirâtre']
z = list(rob)
random.shuffle(z)
z

['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique', 'noduleux']
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
True

PS Py 3.3 warranty: ~30% slower than Py 3.2

Do you have any actual timing data to back up that claim?
If so, please give specifics, including build, os, system, timing code,
and result.

Ian Kelly · Dec 27, 2012

Do you have any actual timing data to back up that claim?
If so, please give specifics, including build, os, system, timing code, and
result.

There was another thread about this one a while back. Using IDLE on Windows XP:

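import timeit, locale
li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'

# Python 3.2
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.1581226105552531

# Python 3.3.0
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.4595282361305697

1.460 / 1.158 = 1.261

1.233450899485831
1.5793845307155152

1.579 / 1.233 = 1.281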

So about 26% slower for sorting a short list of French words and about
28% slower for a longer list. Replacing the strings with ASCII and
removing the 'key' argument gives a comparable result for the long
list but more like a 40% slowdown for the short list.

wxjmfauth · Dec 28, 2012

On Friday, 28 December 2012 00:17:53 UTC+1, Ian wrote:

Do you have any actual timing data to back up that claim?
If so, please give specifics, including build, os, system, timing code, and
result.

There was another thread about this one a while back. Using IDLE on Windows XP:

import timeit, locale
li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'

# Python 3.2
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.1581226105552531

# Python 3.3.0
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.4595282361305697

1.460 / 1.158 = 1.261

1.233450899485831
1.5793845307155152

1.579 / 1.233 = 1.281

So about 26% slower for sorting a short list of French words and about
28% slower for a longer list. Replacing the strings with ASCII and
removing the 'key' argument gives a comparable result for the long
list but more like a 40% slowdown for the short list.

----

Not related to this thread, for information.

My sorting algorithm does a little more than "locale.strxfrm".
locale.strxfrm works fine with the list I gave as an example, but it
fails in many other cases. One of the sticking points is the "œ",
which must be treated as "oe". This is not the place to discuss
this kind of linguistic aspect.

My algorithm does not use unicodedata or Unicode normalization.
It relies mainly on many character and substring substitutions to
create the primary keys.
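
To illustrate why the "œ" matters, here is a small sketch that compares plain code-point sorting with a key built from substitutions. The REPLACEMENTS list and primary_key are hypothetical and far smaller than what a real French collation would need. They are not the algorithm described above:

```python
words = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']

# Plain sorted() compares code points, so 'œ' (U+0153) sorts after every 'no...' word.
by_code_point = sorted(words)

# Hypothetical substitutions, applied in order; 'œ' becomes the two letters 'oe'.
REPLACEMENTS = [("œ", "oe"), ("é", "e"), ("è", "e"), ("ë", "e"), ("â", "a")]

def primary_key(word):
    """Build an assumed primary key by plain substring substitution."""
    for old, new in REPLACEMENTS:
        word = word.replace(old, new)
    return word

by_key = sorted(words, key=primary_key)

print(by_code_point)
print(by_key)
```

With the substitution key, 'nœud' is compared as 'noeud' and lands between 'noétique' and 'noir' instead of at the end of the list.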

jmf
