Problems with utf8, locale and regex

  • Thread starter Thore Harald Høye

Thore Harald Høye

I have made this testcase:
-----------------------
#!/usr/bin/perl
#use locale;
#use encoding 'iso-8859-1';
use utf8;
binmode(STDOUT, ":utf8");

print "\\x{00D8}:\n";
test("\x{00D8}");

print "\nØ:\n";
test("Ø");

sub test {
    my $chr = shift;
    print "ord: " . ord($chr) . ", '$chr', lc: " . lc($chr) . "\n";
    print "isutf8: " . utf8::is_utf8($chr) . "\n";
    $chr =~ /$chr/i && print "Caseinsensitive matches\n";
    $chr =~ /$chr/ && print "Casesensitive matches\n";
}

-----------------------

The weirdest thing is that if "use locale" is enabled, the
case-insensitive match in the last test() call fails. Without "use encoding",
it works in the first call (whose string does not get the utf8 flag), but
not in the last. Without "use locale", both work.

If I run the program with "use encoding.." enabled, both strings have the
utf8 flag, and both case-insensitive matches fail. The output is also
printed in ISO-8859-1, even though I call binmode() afterwards.

It doesn't seem to matter what the locale is. I have tried no_NO.UTF-8 and
en_US.UTF-8. lc($chr) works in both cases, but arrays are only sorted
correctly with no_NO.

I save the file in UTF-8 mode.
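
To narrow this down, here is a minimal sketch that checks whether the two strings differ only in how perl stores them internally. It assumes the file is saved as UTF-8 and that neither "use locale" nor "use encoding" is active. The variable names ($escaped, $literal, $copy) are made up for illustration. It does not explain the locale behaviour; it only shows that the utf8 flag can be switched on without changing the string's value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Two spellings of the same character, U+00D8.
my $escaped = "\x{00D8}";   # written as an escape below 0x100
my $literal = "Ø";          # decoded from the UTF-8 source because of "use utf8"

# The strings compare equal regardless of their internal storage.
my $equal = $escaped eq $literal ? 1 : 0;

# utf8::upgrade() changes only the internal representation, not the value.
my $copy = $escaped;
utf8::upgrade($copy);
my $flag_after  = utf8::is_utf8($copy) ? 1 : 0;
my $still_equal = $copy eq $escaped ? 1 : 0;

binmode(STDOUT, ":utf8");
print "equal: $equal, flag after upgrade: $flag_after, still equal: $still_equal\n";
```

If the upgraded copy behaves differently from the original in the /i match under "use locale", that would suggest the difference comes from the internal representation rather than from the character itself.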
 
