Custom alphabetical sort

Discussion in 'Python' started by Pander Musubi, Dec 24, 2012.

  1. Hi all,

    I would like to sort according to this order:

    (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

    How can I do this? The default sorted() does not give the desired result.
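The default sorted() disagrees with this order because Python compares strings character by character by Unicode code point. As a result, uppercase ASCII letters sort before lowercase ones, and accented letters land after 'z'. A minimal illustration:

```python
# Default string comparison orders characters by Unicode code point:
# 'A' (U+0041) < 'B' (U+0042) < 'a' (U+0061) < 'b' (U+0062) < 'ä' (U+00E4).
# Case variants and accented variants are therefore not grouped together.
words = ['b', 'A', 'ä', 'a', 'B']
print(sorted(words))  # ['A', 'B', 'a', 'b', 'ä']

# The order requested above would instead give: a, A, ä, b, B
```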

    Thanks,

    Pander
     
    Pander Musubi, Dec 24, 2012
    #1

  2. Pander Musubi

    Thomas Bach Guest

    On Mon, Dec 24, 2012 at 07:32:56AM -0800, Pander Musubi wrote:
    > I would like to sort according to this order:
    >
    > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
    >


    One option is to use sorted's key parameter with an appropriate
    mapping in a dictionary:

    >>> cs = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
    >>> d = { k: v for v, k in enumerate(cs) }
    >>> import random
    >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
    '5aAàÀåBCçËÉíÎLÖøquùx'

    Regards,
    Thomas.
     
    Thomas Bach, Dec 24, 2012
    #2

  3. On Monday, December 24, 2012 5:11:03 PM UTC+1, Thomas Bach wrote:
    > On Mon, Dec 24, 2012 at 07:32:56AM -0800, Pander Musubi wrote:
    > > I would like to sort according to this order:
    > >
    > > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
    > >
    >
    > One option is to use sorted's key parameter with an appropriate
    > mapping in a dictionary:
    >
    > >>> cs = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
    > >>> d = { k: v for v, k in enumerate(cs) }
    > >>> import random
    > >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
    > '5aAàÀåBCçËÉíÎLÖøquùx'

    This doesn't work for words with more than one character:

    >>> test=('øasdf', 'áá', 'aa', 'a123','á1234', 'Aaa', )
    >>> sorted(test, key=d.get)

    ['\xc3\xb8asdf', '\xc3\xa1\xc3\xa1', 'aa', 'a123', '\xc3\xa11234', 'Aaa']
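Why the single-character table fails here: d.get receives the whole word, and because every key of d is a single character, each multi-character word maps to None. A small sketch under Python 3, using a shortened rank table (an assumption made for brevity):

```python
# Shortened rank table (assumption: only a few characters of the full
# ordering are needed to show the problem).
order = " .'-0123456789aAäÄáÁ"
d = {ch: rank for rank, ch in enumerate(order)}

test = ('øasdf', 'áá', 'aa', 'a123', 'á1234', 'Aaa')

# d.get receives the whole word; no multi-character word is a key of d,
# so every word gets the same sort key: None.
keys = [d.get(word) for word in test]
print(keys)  # [None, None, None, None, None, None]

# Python 3 cannot order None against None, so sorting raises TypeError.
py3_error = None
try:
    sorted(test, key=d.get)
except TypeError as exc:
    py3_error = exc
print(type(py3_error).__name__)  # TypeError
```

Under Python 2, the all-None keys compared as equal, so the stable sort returned the words in their input order. That is exactly the output shown above.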


    > Regards,
    > Thomas.
     
    Pander Musubi, Dec 24, 2012
    #3
  5. Pander Musubi

    Ian Kelly Guest

    On Dec 24, 2012 9:37 AM, "Pander Musubi" <> wrote:

    > > >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
    > > '5aAàÀåBCçËÉíÎLÖøquùx'
    >
    > This doesn't work for words with more than one character:


    Try this instead:

    def collate(x):
        return list(map(d.get, x))

    sorted(data, key=collate)

    I would also probably change "d.get" to "d.__getitem__" for a clearer error
    message in the case the string contains characters that it doesn't know how
    to sort.
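A self-contained Python 3 version of the collate() approach, using the full ordering from the question and the d.__getitem__ variant suggested above. Writing the ordering as one string rather than a tuple is an assumption made for brevity; each character corresponds to one entry of the original tuple.

```python
# Full ordering from the original question, one character per tuple entry.
ORDER = (" .'-0123456789"
         "aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhHiIïÏíÍîÎìÌ"
         "jJkKlLmMnñNÑoOöÖóÓôÔòÒøØpPqQrRsStTuUüÜúÚûÛùÙ"
         "vVwWxXyYzZ")
d = {ch: rank for rank, ch in enumerate(ORDER)}

def collate(word):
    # One rank per character. Lists compare element by element, so words
    # are ordered by first character, then second, and so on, and a
    # prefix sorts before any longer word that starts with it.
    # d.__getitem__ raises KeyError for characters missing from ORDER,
    # instead of silently returning None as d.get would.
    return list(map(d.__getitem__, word))

test = ('øasdf', 'áá', 'aa', 'a123', 'á1234', 'Aaa')
print(sorted(test, key=collate))
# ['a123', 'aa', 'Aaa', 'á1234', 'áá', 'øasdf']

try:
    collate('a?b')
except KeyError as exc:
    print('unknown character:', exc)  # unknown character: '?'
```

Digits rank before letters in this ordering, which is why 'a123' comes before 'aa'.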
     
    Ian Kelly, Dec 24, 2012
    #5
  6. Pander Musubi

    Guest

    On Monday, December 24, 2012 16:32:56 UTC+1, Pander Musubi wrote:
    > Hi all,
    >
    > I would like to sort according to this order:
    >
    > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
    >
    > How can I do this? The default sorted() does not give the desired result.


    -----

    One way is to create a list of 2-lists / 2-tuples, like

    [(modified_word_1, word_1), (modified_word_2, word_2), ...]

    and to use the native sorting, which will use the first element of
    each pair (modified_word_1, modified_word_2, ...) as the primary key.

    The task lies in the creation of the primary keys.
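The decorate-sort-undecorate pattern described above can be sketched as follows. primary_key() is a hypothetical stand-in (lowercasing plus expanding 'œ' to 'oe'); the actual key construction used in libfrancais.sortfr was not posted.

```python
# Hypothetical primary key: fold case and expand the 'œ' ligature.
# (The real key construction in libfrancais.sortfr was not posted.)
def primary_key(word):
    return word.lower().replace('œ', 'oe')

def dsu_sort(words):
    # Decorate: pair each word with its primary key.
    decorated = [(primary_key(w), w) for w in words]
    # Sort: tuples compare by primary key first; the original word only
    # breaks ties between equal keys.
    decorated.sort()
    # Undecorate: keep only the original words.
    return [w for _, w in decorated]

words = ['Noir', 'nœud', 'noduleux', 'noir']
print(dsu_sort(words))  # ['noduleux', 'nœud', 'Noir', 'noir']
```

Without the 'œ' expansion, 'nœud' would sort after 'noir', because 'œ' (U+0153) has a higher code point than 'o'. With the key parameter, sorted(words, key=lambda w: (primary_key(w), w)) gives the same result and performs the decoration internally.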

    I did it once for French (seriously) and for German (less
    seriously) scripts. (Only as an exercise for fun).

    E.g.:

    >>> rob = ['noduleux', 'noël', 'noèse', 'noétique',
    ...        'nœud', 'noir', 'noirâtre']
    >>> z = list(rob)
    >>> random.shuffle(z)
    >>> z
    ['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique',
    'noduleux']
    >>> zo = libfrancais.sortfr(z)
    >>> zo
    ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
    'noirâtre']
    >>> zo == rob
    True

    PS Py 3.3 warranty: ~30% slower than Py 3.2

    jmf
     
    , Dec 27, 2012
    #6
  7. Pander Musubi

    Terry Reedy Guest

    On 12/27/2012 1:17 PM, wrote:
    > On Monday, December 24, 2012 16:32:56 UTC+1, Pander Musubi wrote:
    >> I would like to sort according to this order:
    >> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

    > One way is to create a list of 2-lists / 2-tuples, like
    >
    > [(modified_word_1, word_1), (modified_word_2, word_2), ...]
    >
    > and to use the native sorting, which will use the first element of
    > each pair (modified_word_1, modified_word_2, ...) as the primary key.
    >
    > The task lies in the creation of the primary keys.
    >
    > I did it once for French (seriously) and for German (less
    > seriously) scripts. (Only as an exercise for fun).

    > >>> rob = ['noduleux', 'noël', 'noèse', 'noétique',
    > ...        'nœud', 'noir', 'noirâtre']
    > >>> z = list(rob)
    > >>> random.shuffle(z)
    > >>> z
    > ['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique',
    > 'noduleux']
    > >>> zo = libfrancais.sortfr(z)
    > >>> zo
    > ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
    > 'noirâtre']
    > >>> zo == rob
    > True


    > PS Py 3.3 warranty: ~30% slower than Py 3.2


    Do you have any actual timing data to back up that claim?
    If so, please give specifics, including build, os, system, timing code,
    and result.

    --
    Terry Jan Reedy
     
    Terry Reedy, Dec 27, 2012
    #7
  8. Pander Musubi

    Ian Kelly Guest

    On Thu, Dec 27, 2012 at 3:17 PM, Terry Reedy <> wrote:
    >> PS Py 3.3 warranty: ~30% slower than Py 3.2
    >
    > Do you have any actual timing data to back up that claim?
    > If so, please give specifics, including build, os, system, timing code, and
    > result.


    There was another thread about this one a while back. Using IDLE on Windows XP:

    >>> import timeit, locale
    >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
    >>> locale.setlocale(locale.LC_ALL, 'French_France')
    'French_France.1252'
    >>> # Python 3.2
    >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=100000))
    1.1581226105552531
    >>> # Python 3.3.0
    >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=100000))
    1.4595282361305697

    1.460 / 1.158 = 1.261

    >>> li = li * 100
    >>> import random
    >>> random.shuffle(li)
    >>> # Python 3.2
    >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=1000))
    1.233450899485831
    >>> # Python 3.3.0
    >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=1000))
    1.5793845307155152

    1.579 / 1.233 = 1.281

    So about 26% slower for sorting a short list of French words and about
    28% slower for a longer list. Replacing the strings with ASCII and
    removing the 'key' argument gives a comparable result for the long
    list but more like a 40% slowdown for the short list.
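To reproduce this kind of comparison, a script along these lines can be run unchanged under each interpreter, and the best times divided. It is a sketch with stated assumptions. It keeps the process's default locale; to mirror the session above, call locale.setlocale(locale.LC_ALL, 'French_France') first on Windows (locale names differ by platform). It also uses smaller iteration counts than the session above so that it finishes quickly.

```python
import locale
import random
import sys
import timeit

# Word list from the measurements above.
WORDS = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']

# Assumption: the default locale is used. Call locale.setlocale(...) here
# with a platform-specific French locale name to match the session above.

def best_time(data, number):
    """Fastest of 5 repeats, each sorting `data` `number` times."""
    def run():
        return sorted(data, key=locale.strxfrm)
    # The minimum over repeats is the least noisy estimate.
    return min(timeit.repeat(run, repeat=5, number=number))

short_list = list(WORDS)
long_list = WORDS * 100
random.shuffle(long_list)

print(sys.version.split()[0])
print(f"short list: {best_time(short_list, 10_000):.4f} s")
print(f"long list:  {best_time(long_list, 100):.4f} s")
```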
     
    Ian Kelly, Dec 27, 2012
    #8
  9. Pander Musubi

    Guest

    On Friday, December 28, 2012 00:17:53 UTC+1, Ian wrote:
    > On Thu, Dec 27, 2012 at 3:17 PM, Terry Reedy <> wrote:
    > >> PS Py 3.3 warranty: ~30% slower than Py 3.2
    > >
    > > Do you have any actual timing data to back up that claim?
    > > If so, please give specifics, including build, os, system, timing code, and
    > > result.
    >
    > There was another thread about this one a while back. Using IDLE on Windows XP:
    >
    > >>> import timeit, locale
    > >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
    > >>> locale.setlocale(locale.LC_ALL, 'French_France')
    > 'French_France.1252'
    > >>> # Python 3.2
    > >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=100000))
    > 1.1581226105552531
    > >>> # Python 3.3.0
    > >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=100000))
    > 1.4595282361305697
    >
    > 1.460 / 1.158 = 1.261
    >
    > >>> li = li * 100
    > >>> import random
    > >>> random.shuffle(li)
    > >>> # Python 3.2
    > >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=1000))
    > 1.233450899485831
    > >>> # Python 3.3.0
    > >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)", "import locale; from __main__ import li", number=1000))
    > 1.5793845307155152
    >
    > 1.579 / 1.233 = 1.281
    >
    > So about 26% slower for sorting a short list of French words and about
    > 28% slower for a longer list. Replacing the strings with ASCII and
    > removing the 'key' argument gives a comparable result for the long
    > list but more like a 40% slowdown for the short list.


    ----

    Not related to this thread, just for information.

    My sorting algorithm does a little more than locale.strxfrm.
    locale.strxfrm works fine with the list I gave as an example,
    but it fails in many cases. One of the stumbling blocks is "œ",
    which must be treated as "oe". This is not the place to discuss
    these linguistic aspects.

    My algorithm does not use unicodedata or Unicode normalization.
    It mainly performs many character/substring substitutions to
    build the primary keys.
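A minimal sketch of a substitution-based primary key in the spirit described above, using no unicodedata and no normalization. The SUBSTITUTIONS table is hypothetical and deliberately tiny. This is not jmf's algorithm, which was not posted, and it ignores finer French rules such as secondary ordering by accents. It does reproduce the example list from earlier in the thread:

```python
# Hypothetical, deliberately tiny substitution table (lowercase only,
# since words are case-folded first). A real French key needs many more.
SUBSTITUTIONS = {
    'œ': 'oe', 'æ': 'ae',
    'à': 'a', 'â': 'a', 'ç': 'c', 'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
    'î': 'i', 'ï': 'i', 'ô': 'o', 'ù': 'u', 'û': 'u', 'ü': 'u',
}
# str.maketrans accepts a dict mapping single characters to replacement
# strings of any length, so 'œ' can expand to two letters.
TABLE = str.maketrans(SUBSTITUTIONS)

def primary_key(word):
    """Fold case, then expand ligatures and drop accents."""
    return word.lower().translate(TABLE)

def sort_fr(words):
    # Sort on (primary key, original word): words with equal primary keys
    # fall back to plain code-point order of the original word.
    return sorted(words, key=lambda w: (primary_key(w), w))

rob = ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
z = ['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique', 'noduleux']
print(sort_fr(z) == rob)  # True
print(sorted(z) == rob)   # False: plain sorting puts 'nœud' last
```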

    jmf
     
    , Dec 28, 2012
    #9