true alphabetic sort...

Ian Richardson

At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.

I'm not sure it's supposed to be.

Now I could just alter my comparison so it ignores the top bit, but this
would then put it at the top of the list, even before Albania...
Alternatively, should I put Å after A?

In short, is there a preferred way of ordering these?

Thanks,

Ian
 
Knud Gert Ellentoft

Ian Richardson said:
At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.

Yes, and it's correct.

In Swedish, Danish and Norwegian, "Å" comes at the end of the
alphabet, after Z.
 
Evertjan.

Knud Gert Ellentoft wrote on 24 apr 2004 in comp.lang.javascript:
In Swedish, Danish and Norwegian, "Å" comes at the end of the
alphabet, after Z.

Just curious:

This will write "å" over here:

document.write('Å'.toLowerCase())

Does this work for all European alphabets?

=============================

When should I use:

document.write('Å'.toLocaleLowerCase())

?
 
Lasse Reichstein Nielsen

Evertjan. said:
Just curious:

This will write "å" over here:

document.write('Å'.toLowerCase())

Does this work for all European alphabets?

It works for any Unicode letter, using the Unicode character database
for the translation.

=============================

When should I use:

document.write('Å'.toLocaleLowerCase())

Never, for the letter "Å".
In ECMA 262, section 15.5.4.17, the reason given for using
toLocaleLowerCase is that the rules of some languages conflict
with the regular Unicode mapping. Turkish is given as an example.
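
For illustration, a minimal sketch of the difference. It assumes a modern engine in which toLocaleLowerCase accepts a locale tag, which is a later addition and not part of the edition of ECMA 262 cited here. The tag 'tr' (Turkish) and the variable names are my own choices.

```javascript
// Default Unicode mapping: the same in every locale.
const plain = 'Å'.toLowerCase();      // lowercase a-ring
const defaultI = 'I'.toLowerCase();   // ordinary dotted "i"

// Turkish rules: capital I lowercases to the dotless ı (U+0131).
// Assumes the engine ships locale data for Turkish.
const turkishI = 'I'.toLocaleLowerCase('tr');

console.log(plain, defaultI, turkishI);
```

So for "Å" the two methods should agree, and the locale version only matters for the handful of languages with special casing rules.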

/L
 
Ivo

Knud Gert Ellentoft said:
Yes, and it's correct.
In Swedish, Danish and Norwegian, "Å" comes at the end of the
alphabet, after Z.

This is interesting. It may be that the Å follows Z in those languages, but
this is new for me and probably the rest of the world. In a long
alphabetical list, I and the OP would look for Å after A, and so I think in
a web-environment it probably should be put there. Where do the French put
the character ç in the French alphabet? Where do the Germans put the ß? I
would look for it after the B.

As for a javascript solution, the easiest would probably be replacing all
occurrences of ÀÁÂÃÄÅ and perhaps Æ with an A prior to sorting the list. This
would result in a mix of accented and normal A's which is not perfect. Åland
must come after Aruba but before Bermuda. We must write our own comparison.
It involves

var abc = 'AÀÁÂÃÄÅBßCÇDÐEÈÉÊËFGHIÌÍÎÏJ' +
'KLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ';

and abc.toLowerCase(), and testing with indexOf, but I'm not quite sure how.
The following covers first letters only:

function compare(a, b) {
    // Position of each word's first letter in the custom alphabet.
    var x = abc.indexOf(a.charAt(0));
    var y = abc.indexOf(b.charAt(0));
    if (x < y) {
        return -1;
    }
    if (x > y) {
        return 1;
    }
    return 0;
}
var islands = ['Curaçao', 'Bonaire', 'Åland', 'Aruba'];
alert(islands.sort(compare));
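
To go beyond first letters, here is a minimal sketch of a whole-word comparison along the same lines. The names ORDER, rank and compareWords are made up for this illustration, and ß is left out of ORDER because toUpperCase() turns it into "SS". Note that this treats Å as its own letter between A and B, so Åland lands after every plain-A name.

```javascript
// Hypothetical ordering string: accented letters follow their base letter.
const ORDER = 'AÀÁÂÃÄÅBCÇDÐEÈÉÊËFGHIÌÍÎÏJKLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ';

// Rank of one character: its position in ORDER (case-insensitive),
// or a value past the end of ORDER for anything not listed.
function rank(ch) {
  const i = ORDER.indexOf(ch.toUpperCase());
  return i === -1 ? ORDER.length + ch.charCodeAt(0) : i;
}

// Compare two words character by character using rank().
function compareWords(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const d = rank(a.charAt(i)) - rank(b.charAt(i));
    if (d !== 0) return d;
  }
  return a.length - b.length; // on a shared prefix, the shorter word first
}

const sorted = ['Curaçao', 'Bonaire', 'Åland', 'Aruba', 'Albania'].sort(compareWords);
console.log(sorted);
```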

HTH
Ìvð
 
Knud Gert Ellentoft

Ivo said:
This is interesting. It may be that the Å follows Z in those languages, but
this is new for me and probably the rest of the world. In a long
alphabetical list, I and the OP would look for Å after A, and so I think in
a web-environment it probably should be put there. Where do the French put
the character ç in the French alphabet? Where do the Germans put the ß? I
would look for it after the B.

I know only the Scandinavian languages, and a Scandinavian would
look for "Å" (and æ, ø, ä and ö) at the end of the alphabet, so
I would therefore keep it as the last letter.
 
Lasse Reichstein Nielsen

Ivo said:
This is interesting. It may be that the Å follows Z in those languages,

That would be all languages that actually have "Å" as a letter.

but this is new for me and probably the rest of the world.

Hard to say. Microsoft seems to know it. When they alphabetize Danish
words, the double-A, the original form which was turned into the new
letter "Å", comes last (with predictably incorrect results for the
foreign word Aardvark).

In a long alphabetical list, I and the OP would look for Å after A,
and so I think in a web-environment it probably should be put
there.

That entirely depends on the language. If you are sorting words from
different languages, I can see the problem, but would probably prefer
to have it last anyway. It is a letter in its own right, not just a
letter with an accent.

Where do the French put the character ç in the French alphabet?

It's a c-cedilla, that is, a "c" with an accent. It is not a separate
letter.

Where do the Germans put the ß? I would look for it after the B.

That would be a weird place to look for a sharp S. It is *not* a beta
(it is an s-z-ligature).

As for a javascript solution, the easiest would probably be replacing all
occurrences of ÀÁÂÃÄÅ and perhaps Æ with an A prior to sorting the list.

That's one choice. Since you cannot fix one language to work with, I
don't think there is an official way to alphabetize.
I would probably expand Æ (the a-e-ligature) to AE.

This would result in a mix of accented and normal A's which is not
perfect.

Alas, perfect does not exist.
The closest to perfect for my tastes is to alphabetize letters according
to the language they come from, so Aalborg (Danish city using old spelling)
would be after Zaire, but Aardvark would be under "A".

/L
 
Evertjan.

Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:
In ECMA 262, section 15.5.4.17, the reason given for using
toLocaleLowerCase is that the rules of some languages conflict
with the regular Unicode mapping. Turkish is given as an example.

Not in
<http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
from 1997, which stops at 15.5.4.12

There should be a 3rd edition, but I cannot find it on the web.

Do you have a URL?
 
Lasse Reichstein Nielsen

Evertjan. said:
Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:


Not in
<http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
from 1997, which stops at 15.5.4.12

There should be a 3rd edition, but I cannot find it on the web.
Do you have a URL?

I use this one:
<URL:http://www.mozilla.org/js/language/E262-3.pdf>
It seems to be more recent, and better formatted, than the official
version from ECMA itself. I fail to imagine an explanation for that :)
<URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>

/L
 
Dr John Stockton

JRS: In article <[email protected]>, seen in
news:comp.lang.javascript, Ian Richardson said:
At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.

I'm not sure it's supposed to be.

Now I could just alter my comparison so it ignores the top bit, but this
would then put it at the top of the list, even before Albania...
Alternatively, should I put Å after A?

In short, is there a preferred way of ordering these?


I don't think those Islands *are* a country, but ICBW; are they not
loose bits of Finland - or are they a country in the same sense as Wales
& Scotland are? I have enough difficulty in determining which parts of
the globe are in the EU, or associated, or whatever, for
<URL:http://www.merlyn.demon.co.uk/european.htm>.


However, while Å may well sort to the end of the alphabet in all
languages that use it, that does not necessarily mean that all letters
of the extended Roman Alphabet sort to identical positions in all
countries that use them. It is possible that Potaniland sorts Æ
between A & B, while Erewhon puts it at the end.

I think all likely extended-roman letters can be mapped in an obvious
manner to one or two English letters; it is probably best to use that,
then sort. After all, even foreigners will probably not know the proper
sort order for languages other than their own; but they will be used to
what the Anglos do with their names. My fair-sized atlas indexes those
Islands as "Aland", in the middle of the "A" section.
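
A minimal sketch of that mapping, for illustration only. It assumes an engine with String.prototype.normalize, which is newer than this thread. The EXPANSIONS table and the function names are hypothetical, and the choice of replacement letters is one reasonable guess, not an official table.

```javascript
// Hypothetical expansions for letters that do not decompose into
// a base letter plus a combining accent.
const EXPANSIONS = {
  'Æ': 'AE', 'æ': 'ae', 'Ø': 'O', 'ø': 'o', 'ß': 'ss',
  'Ð': 'D', 'ð': 'd', 'Þ': 'Th', 'þ': 'th'
};

function toEnglishLetters(s) {
  return s
    .replace(/[ÆæØøßÐðÞþ]/g, (ch) => EXPANSIONS[ch])
    .normalize('NFD')                  // split "Å" into "A" + combining ring
    .replace(/[\u0300-\u036f]/g, '');  // drop the combining marks
}

// Sort by the folded, lowercased form; the original strings are kept as-is.
function compareFolded(a, b) {
  const x = toEnglishLetters(a).toLowerCase();
  const y = toEnglishLetters(b).toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

const folded = ['Zaire', 'Åland', 'Albania', 'Aruba'].sort(compareFolded);
console.log(folded);
```

With this folding, "Åland" is compared as "aland" and so ends up among the A names, as in the atlas index.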

Remember that the proper names of Asian and North African countries need
transliteration to be readable by the average Anglo - and may be quite
different too : one does not necessarily seek Bharat or Nippon among the
B or N sections.

<URL:http://www.merlyn.demon.co.uk/quotes.htm#FredHoyle> :)
 
Ian Richardson

Dr John Stockton said:

I don't think those Islands *are* a country, but ICBW

<snip>

According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.

<snip>

I guess what I'm looking for is a language-specific dictionary sort, if
such a thing exists, defaulting to a Unicode or some other default order
if not.
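
Something along those lines could be sketched with Intl.Collator from the ECMAScript Internationalization API, which postdates this thread. The helper name makeComparer and the sample list are made up for the example, and the result depends on the engine shipping locale data for the requested language.

```javascript
// Return a comparison function for the given language tag, falling back
// to plain code-unit order when the engine has no data for that language.
function makeComparer(tag) {
  // supportedLocalesOf returns the subset of the tags the engine supports.
  if (Intl.Collator.supportedLocalesOf([tag]).length > 0) {
    return new Intl.Collator(tag).compare;
  }
  return (a, b) => (a < b ? -1 : a > b ? 1 : 0);
}

const names = ['Zaire', 'Åland', 'Albania', 'Aruba'];

// Swedish rules: Å is a separate letter that sorts after Z.
const swedish = [...names].sort(makeComparer('sv'));
// English rules: Å is treated as an accented A.
const english = [...names].sort(makeComparer('en'));

console.log(swedish);
console.log(english);
```

With full locale data, the Swedish comparer should put Åland last and the English one should put it among the A names, which is the disagreement this thread is about.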

Ian
 
optimistx

Ian said:
According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.

<snip>

I guess what I'm looking for is a language-specific dictionary sort, if
such a thing exists, defaulting to a Unicode or some other default order
if not.

Ian

Åland is part of Finland, and Finland is an independent country and a
member of the UN.
 
Thomas 'PointedEars' Lahn

Lasse said:
I use this one:
<URL:http://www.mozilla.org/js/language/E262-3.pdf>
It seems to be more recent, and better formatted, than the official
version from ECMA itself. I fail to imagine an explanation for that :)
<URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>

Well, Netscape is (was?) developing the next version of JavaScript (v2.0)
which should (have?) become the next edition of ECMAScript (ed. 4). Since
AOLTW (apparently only temporarily) closed the Netscape browser division[1]
and consequently Netscape is (currently) no longer a member of ECMA and
AOLTW is neither, that might be a reason.


PointedEars
___________
[1] <http://www.holgermetzger.de/Netscape_History.html>
 
