true alphabetic sort...

Discussion in 'Javascript' started by Ian Richardson, Apr 24, 2004.

  1. At the moment I'm using a quicksort algorithm to sort a list of
    countries in alphabetic order. This worked wonderfully until someone
    came up with the Åland Islands... and this is at the end of the list.

    I'm not sure it's supposed to be.

    Now I could just alter my comparison so it ignores the top bit, but this
    would then put it at the top of the list, even before Albania...
    Alternatively, should I put Å after A?

    In short, is there a preferred way of ordering these?

    Thanks,

    Ian
     
    Ian Richardson, Apr 24, 2004
    #1

  2. Ian Richardson wrote:

    >At the moment I'm using a quicksort algorithm to sort a list of
    >countries in alphabetic order. This worked wonderfully until someone
    >came up with the Åland Islands... and this is at the end of the list.


    Yes, and it's correct.

    In Swedish, Danish and Norwegian, "Å" is the last letter of the
    alphabet.
    --
    Knud
     
    Knud Gert Ellentoft, Apr 24, 2004
    #2

  3. Evertjan. (Guest)

    Knud Gert Ellentoft wrote on 24 apr 2004 in comp.lang.javascript:
    > In Swedish, Danish and Norwegian, "Å" is the last letter of the
    > alphabet.


    Just curious:

    This will write "å" over here:

    document.write('Å'.toLowerCase())

    Does this work for all European alphabets?

    =============================

    When should I use:

    document.write('Å'.toLocaleLowerCase())

    ?

    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Apr 24, 2004
    #3
  4. "Evertjan." writes:

    > Just curious:
    >
    > This will write "å" over here:
    >
    > document.write('Å'.toLowerCase())
    >
    > Does this work for all European alphabets?


    It works for any Unicode letter, using the Unicode character database
    for the translation.

    > =============================
    >
    > When should I use:
    >
    > document.write('Å'.toLocaleLowerCase())


    Never, for the letter "Å".
    In ECMA 262, section 15.5.4.17, the reason given for using
    toLocaleLowerCase is languages whose rules conflict with the
    regular Unicode mapping. Turkish is given as an example.
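
    A minimal sketch of the difference, assuming a modern engine that accepts
    a locale tag as an argument to toLocaleLowerCase (an ECMA-402 addition
    that postdates this thread). The variable names are illustrative only.

```javascript
// Locale-independent mapping: uses the default Unicode case mapping.
var plain = 'Å'.toLowerCase();

// Locale-aware mapping with an explicit Turkish locale tag ('tr').
// In Turkish, capital I lowercases to dotless ı (U+0131); an engine
// without Turkish locale data may fall back to the default mapping.
var turkish = 'I'.toLocaleLowerCase('tr');

// The default mapping of capital I is always the dotted "i".
var fallback = 'I'.toLowerCase();

console.log(plain, turkish, fallback);
```

    For "Å" both methods give the same result, which is why the locale
    variant is only worth reaching for with languages like Turkish.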

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Apr 25, 2004
    #4
  5. Ivo (Guest)

    "Knud Gert Ellentoft" wrote
    > Ian Richardson wrote:
    >
    > >At the moment I'm using a quicksort algorithm to sort a list of
    > >countries in alphabetic order. This worked wonderfully until someone
    > >came up with the Åland Islands... and this is at the end of the list.

    >
    > Yes, and it's correct.
    > In Swedish, Danish and Norwegian, "Å" is the last letter of the
    > alphabet.


    This is interesting. It may be that the Å follows Z in those languages, but
    this is new to me and probably to the rest of the world. In a long
    alphabetical list, the OP and I would look for Å after A, so I think that in
    a web environment it should probably be put there. Where do the French put
    the character ç in the French alphabet? Where do the Germans put the ß? I
    would look for it after the B.

    As for a javascript solution, the easiest would probably be replacing all
    occurrences of ÀÁÂÃÄÅ and perhaps Æ with an A prior to sorting the list. This
    would result in a mix of accented and normal A's, which is not perfect. Åland
    must come after Aruba but before Bermuda, so we must write our own comparison.
    It involves

    var abc = 'AÀÁÂÃÄÅBßCÇDÐEÈÉÊËFGHIÌÍÎÏJ' +
    'KLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ';

    and abc.toLowerCase() and testing for indexOf, but I'm not quite sure how.
    The following covers first letters only:

    function compare(a, b) {
        // Rank each string by where its first letter appears in abc.
        var ia = abc.indexOf(a.charAt(0));
        var ib = abc.indexOf(b.charAt(0));
        if (ia < ib) {
            return -1;
        }
        if (ia > ib) {
            return 1;
        }
        return 0;
    }
    var islands = ['Curaçao', 'Bonaire', 'Åland', 'Aruba'];
    alert(islands.sort(compare));
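
    One way the first-letter idea might be extended to whole words is
    sketched below. This is only an illustration: ORDER, rank and compareWords
    are made-up names, the table is just one possible ordering, and
    characters missing from the table are assumed to sort last.

```javascript
// Hypothetical ordering table, modelled on the abc string above.
var ORDER = 'AÀÁÂÃÄÅBßCÇDÐEÈÉÊËFGHIÌÍÎÏJ' +
            'KLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ';

// Rank of one character: try it as written, then its upper-case form.
// Characters that are not in the table at all are ranked after everything.
function rank(ch) {
    var i = ORDER.indexOf(ch);
    if (i === -1) {
        i = ORDER.indexOf(ch.toUpperCase());
    }
    return i === -1 ? ORDER.length : i;
}

// Compare two words character by character using the table ranks.
function compareWords(a, b) {
    var n = Math.min(a.length, b.length);
    for (var i = 0; i < n; i++) {
        var d = rank(a.charAt(i)) - rank(b.charAt(i));
        if (d !== 0) {
            return d;
        }
    }
    // One word is a prefix of the other: the shorter one goes first.
    return a.length - b.length;
}

var places = ['Curaçao', 'Bonaire', 'Åland', 'Aruba', 'Albania'];
var sorted = places.slice().sort(compareWords);
console.log(sorted.join(', '));
// Albania, Aruba, Åland, Bonaire, Curaçao
```

    With this table Åland lands after every plain-A name and before the B's,
    which is the placement argued for above; a different table gives a
    different order.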

    HTH
    Ìvð
     
    Ivo, Apr 25, 2004
    #5
  6. "Ivo" wrote:

    >This is interesting. It may be that the Å follows Z in those languages, but
    >this is new for me and probably the rest of the world. In a long
    >alphabetical list, I and the OP would look for Å after A, and so I think in
    >a web-environment it probably should be put there. Where do the French put
    >the character ç in the French alphabet? Where do the Germans put the ß? I
    >would look for it after the B.


    I only know the Scandinavian languages, and a Scandinavian would
    look for "Å" (and æ, ø, ä and ö) at the end of the alphabet, so
    I would therefore leave it as the last letter.
    --
    Knud
     
    Knud Gert Ellentoft, Apr 25, 2004
    #6
  7. "Ivo" writes:

    > This is interesting. It may be that the Å follows Z in those languages,


    That would be all languages that actually have "Å" as a letter.

    > but this is new for me and probably the rest of the world.


    Hard to say. Microsoft seems to know it. When they alphabetize Danish
    words, the double-A, the original form which was turned into the new
    letter "Å", comes last (with predictably incorrect results for the
    foreign word Aardvark).

    > In a long alphabetical list, I and the OP would look for Å after A,
    > and so I think in a web-environment it probably should be put
    > there.


    That depends entirely on the language. If you are sorting words from
    different languages, I can see the problem, but would probably prefer
    to have it last anyway. It is a letter in its own right, not just a
    letter with an accent.

    > Where do the French put the character ç in the French alphabet?


    It's a c-cedilla, that is, a "c" with an accent. It is not a separate
    letter.

    > Where do the Germans put the ß? I would look for it after the B.


    That would be a weird place to look for a sharp S. It is *not* a beta
    (it is an s-z-ligature).

    > As for a javascript solution, the easiest would probably be replacing all
    > occurrences of ÀÁÂÃÄÅ and perhaps Æ with an A prior to sorting the list.


    That's one choice. Since you cannot fix one language to work with, I
    don't think there is an official way to alphabetize.
    I would probably expand Æ (the a-e-ligature) to AE.

    > This would result in a mix of accented and normal A's which is not
    > perfect.


    Alas, perfect does not exist.
    The closest to perfect for my tastes is to alphabetize letters according
    to the language they come from, so Aalborg (Danish city using old spelling)
    would be after Zaire, but Aardvark would be under "A".

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Apr 25, 2004
    #7
  8. Evertjan. (Guest)

    Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:

    > In ECMA 262, section 15.5.4.17, the reason given for using
    > toLocaleLowerCase is languages whose rules conflict with the
    > regular Unicode mapping. Turkish is given as an example.


    Not in
    <http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
    from 1997, which stops at 15.5.4.12

    There should be a 3rd edition, but I cannot find it on the web.

    Do you have a URL?


    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Apr 25, 2004
    #8
  9. "Evertjan." writes:

    > Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:
    >
    >> In ECMA 262, section 15.5.4.17, the reason given for using
    >> toLocaleLowerCase is languages whose rules conflict with the
    >> regular Unicode mapping. Turkish is given as an example.

    >
    > Not in
    > <http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
    > from 1997, which stops at 15.5.4.12
    >
    > There should be a 3rd edition, but I cannot find it on the web.


    > Do you have an URL?


    I use this one:
    <URL:http://www.mozilla.org/js/language/E262-3.pdf>
    It seems to be more recent, and better formatted, than the official
    version from ECMA itself. I fail to imagine an explanation for that :)
    <URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Apr 25, 2004
    #9
  10. JRS: In article <c6eega$ap7bp$-berlin.de>, seen in
    news:comp.lang.javascript, Ian Richardson <> posted
    at Sat, 24 Apr 2004 20:17:30 :
    >At the moment I'm using a quicksort algorithm to sort a list of
    >countries in alphabetic order. This worked wonderfully until someone
    >came up with the Åland Islands... and this is at the end of the list.
    >
    >I'm not sure it's supposed to be.
    >
    >Now I could just alter my comparison so it ignores the top bit, but this
    >would then put it at the top of the list, even before Albania...
    >Alternatively, should I put Å after A?
    >
    >In short, is there a preferred way of ordering these?



    I don't think those Islands *are* a country, but ICBW; are they not
    loose bits of Finland - or are they a country in the same sense as Wales
    & Scotland are? I have enough difficulty in determining which parts of
    the globe are in the EU, or associated, or whatever, for
    <URL:http://www.merlyn.demon.co.uk/european.htm>.


    However, while Å may well sort to the end of the alphabet in all
    languages that use it, that does not necessarily mean that all letters
    of the extended Roman Alphabet sort to identical positions in all
    countries that use them. It is possible that Potaniland sorts Æ
    between A & B, while Erewhon puts it at the end.

    I think all likely extended-roman letters can be mapped in an obvious
    manner to one or two English letters; it is probably best to use that,
    then sort. After all, even foreigners will probably not know the proper
    sort order for languages other than their own; but they will be used to
    what the Anglos do with their names. My fair-sized atlas indexes those
    Islands as "Aland", in the middle of the "A" section.
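
    A rough sketch of that map-then-sort idea follows. It assumes an engine
    with String.prototype.normalize (ES2015, so not available when this was
    posted); SPECIAL, foldToAscii and compareFolded are illustrative names,
    and the replacement table is deliberately incomplete.

```javascript
// Letters with no canonical decomposition need explicit replacements.
// This small table is an assumption, not a complete list.
var SPECIAL = {
    'Æ': 'AE', 'æ': 'ae', 'Ø': 'O', 'ø': 'o',
    'ß': 'ss', 'Ð': 'D', 'ð': 'd'
};

// Map a name to plain English letters: split accented letters into
// base letter + combining mark, drop the marks, then apply SPECIAL.
function foldToAscii(s) {
    return s.normalize('NFD')
        .replace(/[\u0300-\u036f]/g, '')
        .replace(/[^\u0000-\u007f]/g, function (ch) {
            return SPECIAL.hasOwnProperty(ch) ? SPECIAL[ch] : ch;
        });
}

// Sort on the folded, lower-cased form; the original names are kept.
function compareFolded(a, b) {
    var fa = foldToAscii(a).toLowerCase();
    var fb = foldToAscii(b).toLowerCase();
    return fa < fb ? -1 : fa > fb ? 1 : 0;
}

var names = ['Algeria', 'Åland', 'Albania', 'Zaire'];
var folded = names.slice().sort(compareFolded);
console.log(folded.join(', '));
// Åland, Albania, Algeria, Zaire
```

    Here Åland is treated exactly like "Aland", so it sorts among the A's by
    its second and third letters rather than after them.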

    Remember that the proper names of Asian and North African countries need
    transliteration to be readable by the average Anglo - and may be quite
    different too : one does not necessarily seek Bharat or Nippon among the
    B or N sections.

    <URL:http://www.merlyn.demon.co.uk/quotes.htm#FredHoyle> :)

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
    <URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
    <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
    <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
     
    Dr John Stockton, Apr 25, 2004
    #10
  11. Evertjan. (Guest)

    Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:

    > I use this one:
    > <URL:http://www.mozilla.org/js/language/E262-3.pdf>


    tnx,

    Interesting reading.


    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)
     
    Evertjan., Apr 25, 2004
    #11
  12. Dr John Stockton wrote:

    > JRS: In article <c6eega$ap7bp$-berlin.de>, seen in
    > news:comp.lang.javascript, Ian Richardson <> posted
    > at Sat, 24 Apr 2004 20:17:30 :
    >
    >>At the moment I'm using a quicksort algorithm to sort a list of
    >>countries in alphabetic order. This worked wonderfully until someone
    >>came up with the Åland Islands... and this is at the end of the list.
    >>
    >>I'm not sure it's supposed to be.


    <snip>

    > I don't think those Islands *are* a country, but ICBW


    <snip>

    According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.

    <snip>

    I guess what I'm looking for is a language-specific dictionary sort, if
    such a thing exists, defaulting to a Unicode or some other default order
    if not.
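
    For readers arriving later: such a thing now exists as Intl.Collator
    (ECMA-402), which was not available in 2004. The sketch below assumes an
    engine that ships Swedish and English collation data; makeComparer and
    the sample list are illustrative only.

```javascript
// Build a comparison function for a locale tag such as 'sv' or 'en'.
// Intl.Collator falls back to the engine's default locale when the
// requested locale is not supported.
function makeComparer(locale) {
    return new Intl.Collator(locale).compare;
}

var countries = ['Zambia', 'Åland Islands', 'Albania', 'Sweden'];

// Same data, two language-specific dictionary orders.
var swedish = countries.slice().sort(makeComparer('sv'));
var english = countries.slice().sort(makeComparer('en'));

console.log('sv: ' + swedish.join(', '));
console.log('en: ' + english.join(', '));
```

    With full locale data, the Swedish order should put Åland Islands after
    Zambia, while the English order should treat Å as an accented A and
    place it among the A's.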

    Ian
     
    Ian Richardson, Apr 25, 2004
    #12
  13. optimistx (Guest)

    Ian Richardson wrote:

    >
    > According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.
    >
    > <snip>
    >
    > I guess what I'm looking for is a language-specific dictionary sort, if
    > such a thing exists, defaulting to a Unicode or some other default order
    > if not.
    >
    > Ian


    Åland is part of Finland, and Finland is an independent country. Member
    of UN.
     
    optimistx, Apr 25, 2004
    #13
  14. Lasse Reichstein Nielsen wrote:

    > "Evertjan." <> writes:
    >> There should be a 3rd edition, but I cannot find it on the web.
    >>
    >> Do you have an URL?

    >
    > I use this one:
    > <URL:http://www.mozilla.org/js/language/E262-3.pdf>
    > It seems to be more recent, and better formatted, than the official
    > version from ECMA itself. I fail to imagine an explanation for that :)
    > <URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>


    Well, Netscape is (was?) developing the next version of JavaScript (v2.0)
    which should (have?) become the next edition of ECMAScript (ed. 4). Since
    AOLTW (apparently only temporarily) closed the Netscape browser division[1]
    and consequently Netscape is (currently) no longer a member of ECMA and
    AOLTW is neither, that might be a reason.


    PointedEars
    ___________
    [1] <http://www.holgermetzger.de/Netscape_History.html>
     
    Thomas 'PointedEars' Lahn, May 5, 2004
    #14
