Locale not working with Unicode strings in Perl 5.8?

John

After an upgrade from Perl 5.6 to Perl 5.8, strings with character semantics
stopped sorting according to the locale settings. In the following script,
I sort a Latin-1-encoded string and a UTF-8-encoded string. The first sorts
correctly, the second doesn't. In Perl 5.6, strings that had character
semantics sorted just fine as long as I used 'locale'.

I found a bug notice in perlunicode.html, under "Interaction with Locales",
discouraging the use of locales with Unicode in Perl 5.8. So, if I want to
correctly sort a list of Spanish words encoded in UTF-8, what do I do? Do I
have to convert back to Latin-1 every time I want to do a collating
operation?

Or does someone out there know of a better solution?


use locale;
use charnames ':full';
use Encode qw (from_to);


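### Latin-1 encoded literal (byte semantics); assumes this file is saved as Latin-1.
my @data1 = split //, "eáú";
### Same characters via \N{...}, which gives a string with character semantics.
my @data2 = split //, "e\N{LATIN SMALL LETTER A WITH ACUTE}\N{LATIN SMALL LETTER U WITH ACUTE}";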

print "Data 1: ".join(', ', sort {$a cmp $b} @data1)."\n";
print "Data 2: ".join(', ', sort {$a cmp $b} @data2)."\n";



OUTPUT:

Data 1: á, e, ú
Data 2: á, ú, e
 
Rich

Unicode::Collate will do the job, but it's heavyweight and some tailoring
will probably be required.
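
For reference, a minimal sketch of the Unicode::Collate approach, assuming the module is installed with its default collation table (it ships with Perl 5.8). It uses the same three letters as the script above and compares plain cmp (no 'use locale') with the collator. The variable names are only illustrative, and no Spanish tailoring is applied here.

```perl
use strict;
use warnings;
use charnames ':full';
use Unicode::Collate;

# Emit UTF-8 so the accented characters print correctly.
binmode STDOUT, ':encoding(UTF-8)';

# Strings with character semantics, like @data2 in the original script.
my @letters = (
    "\N{LATIN SMALL LETTER U WITH ACUTE}",
    "e",
    "\N{LATIN SMALL LETTER A WITH ACUTE}",
);

# Default collator: follows the Unicode Collation Algorithm's default
# table and does not consult the process locale.
my $collator = Unicode::Collate->new;

my @by_codepoint = sort { $a cmp $b } @letters;  # plain code point order
my @by_uca       = $collator->sort(@letters);    # accents are a secondary difference

print "cmp: ", join(', ', @by_codepoint), "\n";  # cmp: e, á, ú
print "UCA: ", join(', ', @by_uca), "\n";        # UCA: á, e, ú
```

The default table treats an accented vowel as a variant of its base letter, so á lands next to a rather than after z. Later releases of Unicode::Collate also include Unicode::Collate::Locale, which provides ready-made tailorings (for example locale => 'es'); whether that is available depends on the installed version.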

Sort::ArbBiLex is lighter and might be a better solution - again you'll
probably need to do a little spade work to get the correct sorting order.

I'm sure there are other solutions as well - if you don't get any joy
posting here, try perl.unicode.

Cheers