utf8 pragma - strange behavior

Discussion in 'Perl' started by ryang, Mar 17, 2005.

  1. ryang

    ryang Guest

    I am trying to understand how to work with Unicode in Perl. I have
    read the relevant man pages (perluniintro, perlunicode, etc.) and have
    written several scripts to test/verify my understanding. However, I
    created a script that has unexpected output. The script is below and
    it contains some UTF-8 encoded characters which represent all five
    Spanish accented vowels plus the enye (n with a tilde over it) in upper
    and lower case. I hope that this post comes through as UTF-8 encoded
    as the source code is. I am posting from Google groups which does use
    UTF-8 encoding.

    BEGIN CODE >>
    #!/usr/bin/perl

    use warnings;
    use strict;
    #use utf8;
    use Encode;

    # using utf8 causes the characters to be printed in latin-1 encoding

    my %table = (
        # Spanish
        # hexadecimal UTF-8 => actual UTF-8
        '0xc381' => chr(hex('c3')) . chr(hex('81')), # 'Á',
        '0xc389' => encode("utf8", "\x{00c9}"), # 'É',
        '0xc38d' => 'Í',
        '0xc393' => 'Ó',
        '0xc391' => 'Ñ',
        '0xc39a' => 'Ú',
        '0xc3a1' => 'á',
        '0xc3a9' => 'é',
        '0xc3ad' => 'í',
        '0xc3b3' => 'ó',
        '0xc3b1' => 'ñ',
        '0xc3ba' => 'ú',
    );

    foreach (sort keys %table) {
        print "$_ = $table{$_}\n";
    }
    << END CODE

    When the 'use utf8' line is commented out, the script outputs the UTF-8
    characters correctly. However, when the utf8 pragma is used, the
    characters that are actually hard coded into the hash as UTF-8 (not the
    Á or É) are printed in Latin-1. To my understanding, in Perl 5.8.x,
    the only effect of the utf8 pragma is to tell the parser that literals
    and variables may contain UTF-8 encoded characters. However in
    practice, the utf8 pragma is affecting the script's output.

    I have tested the script on Mac OSX 10.3.8 with Perl 5.8.1 and on
    Fedora Core (not sure which version) running Perl 5.8.3.

    Can anyone explain why the utf8 pragma affects the output of the script?
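
    A likely explanation, sketched under stated assumptions: with
    'use utf8' the accented literals are decoded into character strings,
    and printing a character below U+0100 to a filehandle with no encoding
    layer writes a single Latin-1 byte. The entries built with
    chr(hex(...)) and encode() are already byte strings, so they pass
    through unchanged. The variable names below ($char, $latin1_bytes,
    $utf8_bytes) are illustrative only, and the file is assumed to be
    saved as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # source literals are decoded into character strings
use Encode qw(encode);

my $char = 'Í';           # one character (U+00CD) because of "use utf8"

# With no output layer, print writes characters below U+0100 as single
# Latin-1 bytes, so this string would leave the program as the byte 0xCD.
my $latin1_bytes = encode('ISO-8859-1', $char);

# Encoding explicitly as UTF-8 yields the two bytes 0xC3 0x8D.
my $utf8_bytes = encode('UTF-8', $char);

printf "characters: %d\n",    length $char;
printf "latin-1 bytes: %s\n", unpack('H*', $latin1_bytes);
printf "utf-8 bytes: %s\n",   unpack('H*', $utf8_bytes);

# Declaring the encoding of STDOUT makes print emit UTF-8 for character strings.
binmode(STDOUT, ':encoding(UTF-8)');
print "$char\n";
```

    If STDOUT is given a UTF-8 layer like this, the first two table
    entries would also need to be character strings (for example
    chr(0xC1) and "\x{c9}"); otherwise their bytes would be encoded a
    second time.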
     
    ryang, Mar 17, 2005

  2. ryang

    Wes Groleau Guest

    ryang wrote:
    > I am trying to understand how to work with Unicode in Perl. I have
    > read the relevant man pages (perluniintro, perlunicode, etc.) and have
    > written several scripts to test/verify my understanding. However, I
    > created a script that has unexpected output. The script is below and


    Welcome to the club. :)

    > Can anyone explain why the utf8 pragma affects the output of the script?


    My problem (different post) is slightly different, but
    I'm going to try commenting out the pragma to see what happens.
     
    Wes Groleau, Apr 11, 2005
