Problems with utf8, locale and regex

Thore Harald Høye · Dec 5, 2007

I have made this testcase:
-----------------------
#!/usr/bin/perl
#use locale;
#use encoding 'iso-8859-1';
use utf8;
binmode(STDOUT, ":utf8");

print "\\x{00D8}:\n";
test("\x{00D8}");

print "\nØ:\n";
test("Ø");

# Print details about a one-character string and try to match it
# against itself, with and without /i.
sub test {
    my $chr = shift;
    print "ord: " . ord($chr) . ", '$chr', lc: " . lc($chr) . "\n";
    print "isutf8: " . utf8::is_utf8($chr) . "\n";
    $chr =~ /$chr/i && print "Caseinsensitive matches\n";
    $chr =~ /$chr/ && print "Casesensitive matches\n";
}

-----------------------

The weirdest thing is that with "use locale" enabled, the case-insensitive
match fails in the second test() call. With "use encoding" left disabled,
it still works in the first call (whose argument does not get the utf8
flag), but not in the second. Without "use locale", both calls work.

If I run the program with "use encoding ..." enabled, both strings have
the utf8 flag, and both fail. The output is also printed in ISO-8859-1,
even though binmode() is called afterwards.

It doesn't seem to matter which locale is set. I have tried no_NO.UTF-8
and en_US.UTF-8. lc($chr) works with both, but arrays are only sorted
correctly with no_NO.

I save the file as UTF-8.
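
A minimal sketch, separate from the testcase above, that isolates the one difference between the two strings passed to test(): the internal utf8 flag. It assumes the file is saved as UTF-8 and that neither "use locale" nor "use encoding" is active. The variable names are made up for illustration. utf8::upgrade() is used only to show that the flag can change while the string stays equal; it is not offered as a fix for the locale behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # string literals in this file are decoded as UTF-8
binmode(STDOUT, ":utf8");

# Hypothetical names, not from the original testcase.
my $escaped = "\x{00D8}";       # code point below 0x100, written as an escape
my $literal = "Ø";              # same character, written as a UTF-8 literal

# Report the internal flag for each string.
print "escaped has utf8 flag: ", (utf8::is_utf8($escaped) ? "yes" : "no"), "\n";
print "literal has utf8 flag: ", (utf8::is_utf8($literal) ? "yes" : "no"), "\n";

# The flag describes internal storage only; the strings compare equal.
print "strings are equal: ", ($escaped eq $literal ? "yes" : "no"), "\n";

# Switch the escaped string to the UTF-8 internal representation in place.
utf8::upgrade($escaped);
print "escaped has utf8 flag after upgrade: ",
    (utf8::is_utf8($escaped) ? "yes" : "no"), "\n";
print "still equal: ", ($escaped eq $literal ? "yes" : "no"), "\n";
```

Calling test($escaped) before and after the utf8::upgrade() line, with "use locale" enabled, would show whether the failing case-insensitive match follows the flag rather than the character itself.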
