Problems with utf8, locale and regex

Discussion in 'Perl' started by Thore Harald Høye, Dec 5, 2007.

  1. I have made this testcase:
    -----------------------
    #!/usr/bin/perl
    #use locale;
    #use encoding 'iso-8859-1';
    use utf8;
    binmode(STDOUT, ":utf8");

    print "\\x{00D8}:\n";
    test("\x{00D8}");

    print "\nØ:\n";
    test("Ø");

    sub test {
        my $chr = shift;
        print "ord: " . ord($chr) . ", '$chr', lc: " . lc($chr) . "\n";
        print "isutf8: " . utf8::is_utf8($chr) . "\n";
        $chr =~ /$chr/i && print "Caseinsensitive matches\n";
        $chr =~ /$chr/ && print "Casesensitive matches\n";
    }

    -----------------------

    The weirdest thing is that with "use locale" enabled, the
    case-insensitive match in the last test() call fails. Without "use
    encoding", it works for the first version (which does not get the utf8
    flag) but not for the last. Without "use locale", both work.

    If I run the program with "use encoding.." enabled, both versions
    have the utf8 flag, and both fail. The output is also printed in
    ISO-8859-1, even though binmode() is called afterwards.

    It doesn't seem to matter what the locale is. I have tried no_NO.UTF-8 and
    en_US.UTF-8. lc($chr) works in both cases, but arrays are only sorted
    correctly with no_NO.

    I save the file in UTF-8 mode.
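
    The flag difference between the two strings can be isolated with a
    minimal sketch like the one below. It assumes the file is saved as UTF-8
    and is run without "use locale" or "use encoding". show_flag() is a
    hypothetical helper that is not part of the testcase above, and
    utf8::upgrade() is used only to show that the flag can change while the
    character itself stays the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

# show_flag() is a hypothetical helper, not part of the original testcase.
# It reports the code point of the first character and the internal utf8 flag.
sub show_flag {
    my ($label, $str) = @_;
    return sprintf "%s: ord=%d utf8-flag=%s",
        $label, ord($str), (utf8::is_utf8($str) ? "yes" : "no");
}

my $escaped = "\x{00D8}";    # written as an escape, code point below 0x100
my $literal = "Ø";           # written literally, decoded because of "use utf8"

print show_flag("escaped", $escaped), "\n";
print show_flag("literal", $literal), "\n";

# Upgrading changes only the internal representation, not the character.
utf8::upgrade($escaped);
print show_flag("upgraded", $escaped), "\n";
print "strings compare equal\n" if $escaped eq $literal;
```

    Only ASCII is printed here, so no binmode() is needed. If the two strings
    report the same ord but different flags, the differing regex results are
    probably tied to the internal representation rather than to the character.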
     
    Thore Harald Høye, Dec 5, 2007