Custom alphabetical sort

Pander Musubi

Hi all,

I would like to sort according to this order:

(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

How can I do this? The default sorted() does not give the desired result.

Thanks,

Pander
 
Thomas Bach

I would like to sort according to this order:

(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

One option is to use sorted's key parameter with an appropriate
mapping in a dictionary:
'5aAàÀåBCçËÉíÎLÖøquùx'
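
For readers following along, here is a minimal sketch of what such a mapping could look like, assuming Python 3 (where str is Unicode). The names ORDER, RANK and chars are illustrative and not taken from the original post; ORDER is simply the requested alphabet written as one string.

```python
# The requested alphabet as one string, in the same order as the tuple above.
ORDER = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdD"
    "eEëËéÉêÊèÈfFgGhH"
    "iIïÏíÍîÎìÌjJkKlLmM"
    "nñNÑoOöÖóÓôÔòÒøØ"
    "pPqQrRsStTuUüÜúÚûÛùÙ"
    "vVwWxXyYzZ"
)

# Map each character to its position in the custom alphabet.
RANK = {ch: i for i, ch in enumerate(ORDER)}

# Sort single characters: the key of a character is its rank.
chars = "xùuqøÖLÎíÉËçCBåÀàAa5"
result = "".join(sorted(chars, key=RANK.get))
print(result)  # 5aAàÀåBCçËÉíÎLÖøquùx
```

Note that RANK.get returns None for a character missing from ORDER, which Python 3 cannot compare with integers, so unknown characters would raise a TypeError during sorting.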

Regards,
Thomas.
 
Pander Musubi

One option is to use sorted's key parameter with an appropriate
mapping in a dictionary:

'5aAàÀåBCçËÉíÎLÖøquùx'

This doesn't work for words with more than one character:
['\xc3\xb8asdf', '\xc3\xa1\xc3\xa1', 'aa', 'a123', '\xc3\xa11234', 'Aaa']
 
Ian Kelly

This doesn't work for words with more than one character:

Try this instead:

def collate(x):
    return list(map(d.get, x))

sorted(data, key=collate)

I would also probably change "d.get" to "d.__getitem__" for a clearer error
message in the case the string contains characters that it doesn't know how
to sort.
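
Putting the pieces together, a sketch under the assumption of Python 3, with d built from the requested alphabet (the name ORDER is illustrative). It uses d[ch] in a list comprehension, which behaves like d.__getitem__ and raises KeyError on an unknown character.

```python
# The requested alphabet as one string, in the order given in the question.
ORDER = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdD"
    "eEëËéÉêÊèÈfFgGhH"
    "iIïÏíÍîÎìÌjJkKlLmM"
    "nñNÑoOöÖóÓôÔòÒøØ"
    "pPqQrRsStTuUüÜúÚûÛùÙ"
    "vVwWxXyYzZ"
)
d = {ch: i for i, ch in enumerate(ORDER)}


def collate(word):
    """Return the word as a list of ranks; lists compare element by element."""
    return [d[ch] for ch in word]


# The same words as in the failing example, written as Unicode strings.
data = ['øasdf', 'áá', 'aa', 'a123', 'á1234', 'Aaa']
result = sorted(data, key=collate)
print(result)  # ['a123', 'aa', 'Aaa', 'á1234', 'áá', 'øasdf']
```

In Python 2, byte strings such as '\xc3\xb8asdf' would need to be decoded to unicode first, because each non-ASCII character occupies more than one byte in UTF-8 and the per-character lookup would otherwise see the individual bytes.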
 
wxjmfauth

On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi wrote:
Hi all,

I would like to sort according to this order:

(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

How can I do this? The default sorted() does not give the desired result.

-----

One way is to create a list of 2-lists / 2-tuples, like

[(modified_word_1, word_1), (modified_word_2, word_2), ...]

and to use the native sorting, which will use the first element of
each pair (modified_word_1, modified_word_2, ...) as the primary key.

The task lies in the creation of the primary keys.
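
The pairing idea described above can be sketched as follows. This is only an illustration of the general pattern (often called decorate-sort-undecorate); the names sort_with_keys and make_key are hypothetical, and str.lower stands in for a real key-building function.

```python
def sort_with_keys(words, make_key):
    """Sort words by make_key(word) using explicit (key, word) pairs."""
    # Decorate: pair each word with its primary key.
    pairs = [(make_key(word), word) for word in words]
    # Sort: tuples compare on the key first, then on the word itself for ties.
    pairs.sort()
    # Undecorate: drop the keys again.
    return [word for _, word in pairs]


words = ['banana', 'Apple', 'cherry', 'apple']
result = sort_with_keys(words, str.lower)
print(result)  # ['Apple', 'apple', 'banana', 'cherry']
```

sorted(words, key=make_key) achieves much the same with less code; the difference is that it keeps words with equal keys in their original order instead of comparing the words themselves.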

I did it once for French (seriously) and for German (less
seriously) scripts. (Only as an exercise for fun).

E.g.
rob = ['noduleux', 'noël', 'noèse', 'noétique', .... 'nœud', 'noir', 'noirâtre']
z = list(rob)
random.shuffle(z)
z
['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique', 'noduleux']
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
True

PS Py 3.3 warranty: ~30% slower than Py 3.2

jmf
 
Terry Reedy

On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi wrote:
One way is to create a list of 2-lists / 2-tuples, like

[(modified_word_1, word_1), (modified_word_2, word_2), ...]

and to use the native sorting, which will use the first element of
each pair (modified_word_1, modified_word_2, ...) as the primary key.

The task lies in the creation of the primary keys.

I did it once for French (seriously) and for German (less
seriously) scripts. (Only as an exercise for fun).
rob = ['noduleux', 'noël', 'noèse', 'noétique', ... 'nœud', 'noir', 'noirâtre']
z = list(rob)
random.shuffle(z)
z
['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique', 'noduleux']
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']

PS Py 3.3 warranty: ~30% slower than Py 3.2

Do you have any actual timing data to back up that claim?
If so, please give specifics, including build, os, system, timing code,
and result.
 
Ian Kelly

Do you have any actual timing data to back up that claim?
If so, please give specifics, including build, os, system, timing code, and
result.

There was another thread about this one a while back. Using IDLE on Windows XP:
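import timeit, locale
li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'

# Python 3.2
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.1581226105552531

# Python 3.3.0
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.4595282361305697

1.460 / 1.158 = 1.261

1.233450899485831
1.5793845307155152

1.579 / 1.233 = 1.281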

So about 26% slower for sorting a short list of French words and about
28% slower for a longer list. Replacing the strings with ASCII and
removing the 'key' argument gives a comparable result for the long
list but more like a 40% slowdown for the short list.
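
The percentages quoted above follow directly from the ratios. A small sketch of the arithmetic, where slowdown_percent is a hypothetical helper and the second pair of timings is assumed to be in the same order as the first (Python 3.2, then Python 3.3.0):

```python
def slowdown_percent(old_seconds, new_seconds):
    """Relative slowdown of new versus old, in percent."""
    return (new_seconds / old_seconds - 1.0) * 100.0


# Best-of-repeat timings from the post: short list, then longer list.
short_list = slowdown_percent(1.1581226105552531, 1.4595282361305697)
long_list = slowdown_percent(1.233450899485831, 1.5793845307155152)

print(round(short_list))  # 26
print(round(long_list))   # 28
```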
 
wxjmfauth

On Friday, 28 December 2012 00:17:53 UTC+1, Ian wrote:
Do you have any actual timing data to back up that claim?
If so, please give specifics, including build, os, system, timing code, and
result.

There was another thread about this one a while back. Using IDLE on Windows XP:

import timeit, locale
li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'

# Python 3.2
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.1581226105552531

# Python 3.3.0
min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale;from __main__ import li", number=100000))
1.4595282361305697

1.460 / 1.158 = 1.261

1.233450899485831
1.5793845307155152

1.579 / 1.233 = 1.281

So about 26% slower for sorting a short list of French words and about
28% slower for a longer list. Replacing the strings with ASCII and
removing the 'key' argument gives a comparable result for the long
list but more like a 40% slowdown for the short list.

----

Not related to this thread, for information.

My sorting algorithm does a little more than
"locale.strxfrm". locale.strxfrm works fine with
the list I gave as an example, but it fails in many other cases. One
of the stumbling blocks is "œ", which must be treated as "oe".
This is not the place to discuss this kind of linguistic
detail.

My algorithm does not use unicodedata or Unicode normalization.
It relies mainly on many character / substring substitutions for the
creation of the primary keys.
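
To give an idea of what substitution-based primary keys can look like, here is a deliberately small sketch. It is not the algorithm mentioned above; the table SUBSTITUTIONS and the function primary_key are hypothetical, cover only a handful of French characters, and ignore the finer rules of French collation (such as how accents break ties).

```python
# Hypothetical substitution table: accented letters map to their base letter,
# and the ligature "œ" expands to the two letters "oe".
SUBSTITUTIONS = {
    "œ": "oe",
    "à": "a", "â": "a",
    "ç": "c",
    "é": "e", "è": "e", "ê": "e", "ë": "e",
    "î": "i", "ï": "i",
    "ô": "o",
    "ù": "u", "û": "u", "ü": "u",
}


def primary_key(word):
    """Build a sort key by lowercasing and applying the substitutions."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in word.lower())


words = ['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique', 'noduleux']
result = sorted(words, key=primary_key)
print(result)
# ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
```

Words that differ only in accents or case get the same key here, so sorted() leaves them in their input order; a fuller solution would add a secondary key to break such ties.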

jmf
 