BUG in encoding package requires spaces around « and »

Discussion in 'Perl Misc' started by Mumia W., Jun 3, 2006.

  1. Mumia W.

    Mumia W. Guest

    A bug in the 'encoding' module seems to require spaces around the left
    and right double angle brackets (« and »). The spaces are needed only
    when a variable is interpolated within a double-quoted string. Here's a
    demonstration program:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use encoding 'iso-8859-1';

    print "«Hi there»\n";
    print "«Hello there again»\n";

    our $string = 'Something fun';

    print "«$string»"; # BUG prevents compilation
    print "« $string »"; # spaces are needed around string to compile

    __END__

    This prints (i18n-file.pl is the name of my script):

    Global symbol "%_END__" requires explicit package name at ./i18n-file.pl
    line 11.
    Execution of ./i18n-file.pl aborted due to compilation errors.

    shell returned 255

    ------------------------------
    Comment out the first print «$string», and everything works.
     
    Mumia W., Jun 3, 2006
    #1

  2. Mumia W.

    Mumia W. Guest

    Mumia W. wrote:
    > A bug in the 'encoding' module seems to require spaces around the right
    > and left double-angle-brackets. This only is needed when a variable is
    > being interpolated within a double-quoted string. [...]


    A workaround is to use curly braces around the variable name:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use encoding 'iso-8859-1';

    print "«Hi there»\n";
    print "«Hello there again»\n";

    our $string = 'Something fun';

    # print "«$string»\n"; # BUG prevents compilation
    print "«${string}»\n"; # Put 'string' in braces to avoid bug.
     
    Mumia W., Jun 3, 2006
    #2

  3. Re: BUG in encoding package requires spaces around « and »

    Mumia W. wrote:

    > Mumia W. wrote:
    > > A bug in the 'encoding' module seems to require spaces around the right
    > > and left double-angle-brackets. This only is needed when a variable is
    > > being interpolated within a double-quoted string. [...]

    >
    > A workaround is to use curly braces around the variable name:
    > [...]
    > print "«${string}»\n"; # Put 'string' in braces to avoid bug.


    Another workaround: print "«$string\»";

    My guess is that Perl considers » to be part of the scalar's name
    somehow (though « and » are part of ISO-8859-1). I think you're right
    that this is a bug in the encoding module.
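
To make the guess above concrete, here is a toy model of it, written in Python purely for illustration. It is not how perl's tokenizer is implemented; the `GREEDY_NAME` rule and the `scanned_name` helper are hypothetical. It only shows that if characters above 127 were accepted as part of a name, a space, braces, or a backslash would each end the name before the ».

```python
import re

# ASCII-only identifier characters, as a reader would expect.
ASCII_NAME = r"[A-Za-z_][A-Za-z0-9_]*"
# Hypothetical faulty rule (an assumption, not perl's real grammar):
# characters 128-255 are also allowed to continue a name.
GREEDY_NAME = r"[A-Za-z_][A-Za-z0-9_\x80-\xff]*"


def scanned_name(text, greedy=False):
    """Return the variable name a toy scanner reads after the first '$'."""
    name = GREEDY_NAME if greedy else ASCII_NAME
    # A braced name such as ${string} is delimited explicitly by the braces.
    match = re.search(r"\$\{(" + name + r")\}|\$(" + name + r")", text)
    if match is None:
        return None
    return match.group(1) or match.group(2)


samples = ["«$string»", "« $string »", "«${string}»", "«$string\\»"]
for sample in samples:
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(sample), "->", ascii(scanned_name(sample, greedy=True)))
```

Under the faulty rule only the first sample yields a name other than `string`, which matches the pattern of workarounds reported so far, but that agreement does not prove the guess.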

    But the problem seems to occur only in the character at the right side
    of $string (»), not in the one at the left side («). (though print
    "$string«"; doesn't work either)

    --
    Bart
     
    Bart Van der Donck, Jun 3, 2006
    #3
  4. Mumia W.

    Guest

    Re: BUG in encoding package requires spaces around « and »

    Mumia W. wrote:
    > needed when a variable is being interpolated within a double-quoted string.


    And, FWIW, this bug also affects strings quoted in qq{} style (which is
    what I would expect, of course, but I did test it).

    --
    David Filmer (http://DavidFilmer.com)
     
    , Jun 3, 2006
    #4
  5. Mumia W.

    Mumia W. Guest

    Re: BUG in encoding package requires spaces around « and »

    Bart Van der Donck wrote:
    > Mumia W. wrote:
    >
    >> Mumia W. wrote:
    >>> A bug in the 'encoding' module seems to require spaces around the right
    >>> and left double-angle-brackets. This only is needed when a variable is
    >>> being interpolated within a double-quoted string. [...]

    >> A workaround is to use curly braces around the variable name:
    >> [...]
    >> print "«${string}»\n"; # Put 'string' in braces to avoid bug.

    >
    > Another workaround: print "«$string\»";
    >
    > My guess is that Perl considers » to be part of the scalar's name
    > somehow (though « and » are part of ISO-8859-1). I think you're right
    > that this is a bug in the encoding module.
    >
    > But the problem seems to occur only in the character at the right side
    > of $string (»), not in the one at the left side («). (though print
    > "$string«"; doesn't work either)
    >


    Thanks for the backslash idea. The 'encoding' parser seems to be partial
    towards us-ascii. I don't know what the semantic difference is supposed
    to be between the vertical bar (|) and the broken bar (¦), but the
    encoding module treats them very differently:

    1 #!/usr/bin/perl
    2 use strict;
    3 use warnings;
    4 use encoding 'iso-8859-1';
    5
    6 local $\ = "\n";
    7 our $string = 'Something fun';
    8 print "My string is $string|"; # | == \x{7C} (us-ascii, vert. bar)
    9 print "Broken: $string¦"; # ¦ == \x{A6} (8859-1, broken bar)
    10
    11 __END__
    12
    13 The encoding module doesn't seem to like characters
    14 above 127. Either put a backslash before the ¦ on line
    15 nine, or comment out line 4, and the program runs.
     
    Mumia W., Jun 3, 2006
    #5
  6. Re: BUG in encoding package requires spaces around « and »

    Mumia W. wrote:

    > Thanks for the backslash idea. The 'encoding' parser seems to be partial
    > towards us-ascii. I don't know what the semantic difference is supposed
    > to be between the vertical bar (|) and the broken bar (¦), but the
    > encoding module treats them very differently:
    >
    > 1 #!/usr/bin/perl
    > 2 use strict;
    > 3 use warnings;
    > 4 use encoding 'iso-8859-1';
    > 5
    > 6 local $\ = "\n";
    > 7 our $string = 'Something fun';
    > 8 print "My string is $string|"; # | == \x{7C} (us-ascii, vert. bar)
    > 9 print "Broken: $string¦"; # ¦ == \x{A6} (8859-1, broken bar)
    > 10
    > 11 __END__
    > 12
    > 13 The encoding module doesn't seem to like characters
    > 14 above 127. Either put a backslash before the ¦ on line
    > 15 nine, or comment out line 4, and the program runs.


    You're right, it appears that anything above 127 triggers the error
    message.

    print "Broken: $stringµ";
    print "Broken: $stringô";
    print "Broken: $string´";
    print "Broken: $string£";
    print "Broken: $string§";

    Or, as in your example:

    | (124) is okay (below 127)
    ¦ (166) is not okay (above 127)

    128 is exactly half of 256 (the number of characters available in
    ISO-8859-1). The range 0-127 can be represented in 7 bits, hence that
    set is sometimes referred to as 7-bit ASCII. ISO-8859-1 is an 8-bit
    character set though, so I'd say this shouldn't normally happen. I
    tested on different OSes and as CGI because I wasn't sure whether it
    might be a shell issue, but that does not seem to be the case here.
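
The 7-bit versus 8-bit split can be checked quickly. The sketch below uses Python only as a byte inspector (the `latin1_byte` helper is a name made up for this example) and lists the ISO-8859-1 byte value of each character tried in this thread.

```python
def latin1_byte(ch):
    """Return the single byte value of ch when encoded as ISO-8859-1."""
    return ch.encode("iso-8859-1")[0]


# Characters used in the examples in this thread.
chars = ["|", "¦", "µ", "ô", "´", "£", "§", "«", "»"]

for ch in chars:
    byte = latin1_byte(ch)
    fits_7_bits = byte < 128  # True only for the ASCII range 0-127
    print(f"U+{ord(ch):04X} byte={byte:3d} 7-bit={fits_7_bits}")
```

Only the vertical bar falls in the 0-127 range; every character reported as failing has the eighth bit set.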

    Note that the following encoding gives exactly the same results:

    use encoding 'ascii';

    (Which would be explainable, because ASCII covers 0 to 127 only)

    However, further tests showed that the following charsets seem to have
    the same issue:

    use encoding 'iso-8859-16';
    use encoding 'utf-8';
    use encoding 'utf8';
    use encoding 'windows-1251';

    So the problem is not limited to ISO-8859-1.

    I'm not sure where to go from here. I would conclude at this point that
    the 'encoding'-module only works for characters up to 127 that are put
    next to a variable's name.

    I hope this can be of some help.

    --
    Bart
     
    Bart Van der Donck, Jun 3, 2006
    #6
  7. Dr.Ruud

    Dr.Ruud Guest

    Mumia W. wrote:

    > A bug in the 'encoding' module seems to require spaces around the
    > right and left double-angle-brackets. This only is needed when a
    > variable is being interpolated within a double-quoted string. Here's
    > a demonstration program:
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;

    no utf8 ;
    > use encoding 'iso-8859-1';
    >
    > print "«Hi there»\n";
    > print "«Hello there again»\n";
    >
    > our $string = 'Something fun';
    >
    > print "«$string»"; # BUG prevents compilation
    > print "« $string »"; # spaces are needed around string to compile
    >
    > __END__
    >
    > This prints (i18n-file.pl is the name of my script):
    >
    > Global symbol "%_END__" requires explicit package name at
    > ./i18n-file.pl line 11.
    > Execution of ./i18n-file.pl aborted due to compilation errors.
    >
    > shell returned 255
    >
    > ------------------------------
    > Comment out the first print «$string», and everything works.


    Insert "no utf8;" before the "use encoding ..." line.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Jun 4, 2006
    #7
  8. Re: BUG in encoding package requires spaces around « and »

    Mumia W. wrote:
    > A bug in the 'encoding' module seems to require spaces around the right
    > and left double-angle-brackets. This only is needed when a variable is
    > being interpolated within a double-quoted string. Here's a demonstration
    > program:
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    > use encoding 'iso-8859-1';
    >
    > print "«Hi there»\n";
    > print "«Hello there again»\n";
    >
    > our $string = 'Something fun';
    >
    > print "«$string»"; # BUG prevents compilation
    > print "« $string »"; # spaces are needed around string to compile
    >
    > __END__
    >
    > This prints (i18n-file.pl is the name of my script):
    >
    > Global symbol "%_END__" requires explicit package name at ./i18n-file.pl
    > line 11.
    > Execution of ./i18n-file.pl aborted due to compilation errors.
    >
    > shell returned 255
    >
    > ------------------------------
    > Comment out the first print «$string», and everything works.
    >
    >


    If you look in encoding.pm, it appears to me that unless you're using
    the filter option, all it does is run some sanity checks on the encoding
    name and then set ${^ENCODING} to the given encoding name. This, and the
    findings in the adjacent posts (that "«${string}»" works), make it
    sound like the Perl parser is mishandling the end of the interpolated
    variable name.

    So:

    Are you using the latest Perl? I believe this is 5.8.8.

    Are you using the latest Encode? I believe this is 2.17, or at least
    that is the latest on the CPAN mirror I use, as of the time I write this.

    If the answer to both is yes, you might want to consider reporting
    this. I'm not sure how to go about it, but the Encode documentation
    suggests joining and posting to the Perl Unicode Mailing List.

    Tom Wyant
     
    harryfmudd [AT] comcast [DOT] net, Jun 4, 2006
    #8
  9. Mumia W.

    Mumia W. Guest

    Re: BUG in encoding package requires spaces around « and »

    Dr.Ruud wrote:
    > Mumia W. wrote:
    >
    >> A bug in the 'encoding' module seems to require spaces around the
    >> right and left double-angle-brackets. This only is needed when a
    >> variable is being interpolated within a double-quoted string. Here's
    >> a demonstration program:
    >>
    >> #!/usr/bin/perl
    >> use strict;
    >> use warnings;

    > no utf8 ;
    >> use encoding 'iso-8859-1';
    >>
    >> print "«Hi there»\n";
    >> print "«Hello there again»\n";
    >>
    >> our $string = 'Something fun';
    >>
    >> print "«$string»"; # BUG prevents compilation
    >> print "« $string »"; # spaces are needed around string to compile
    >>
    >> __END__
    >>
    >> This prints (i18n-file.pl is the name of my script):
    >>
    >> Global symbol "%_END__" requires explicit package name at
    >> ./i18n-file.pl line 11.
    >> Execution of ./i18n-file.pl aborted due to compilation errors.
    >>
    >> shell returned 255
    >>
    >> ------------------------------
    >> Comment out the first print «$string», and everything works.

    >
    > Insert "no utf8;" before the "use encoding ..." line.
    >


    It works!

    And I think I see why (from man utf8):

    > Note that if you have bytes with the eighth bit on in your script (for
    > example embedded Latin-1 in your string literals), "use utf8" will be
    > unhappy since the bytes are most probably not well-formed UTF-8. If
    > you want to have such bytes and use utf8, you can disable utf8 until
    > the end the block (or file, if at top level) by "no utf8;".
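
The point in that quote about bytes with the eighth bit set can be checked outside perl. The sketch below uses Python only as a strict UTF-8 validator (`is_well_formed_utf8` is a helper made up for this example) and says nothing about what perl does internally: the same source line is well-formed UTF-8 when saved as UTF-8 but not when saved as Latin-1.

```python
def is_well_formed_utf8(data):
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


line = 'print "«$string»";'
latin1_source = line.encode("iso-8859-1")  # « and » become single bytes 0xAB, 0xBB
utf8_source = line.encode("utf-8")         # « and » become two bytes each

print("latin-1 file is valid UTF-8:", is_well_formed_utf8(latin1_source))
print("utf-8 file is valid UTF-8:  ", is_well_formed_utf8(utf8_source))
```

A lone 0xAB or 0xBB byte is a UTF-8 continuation byte with no lead byte before it, which is why the Latin-1 version fails the check.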


    Thanks for the utf8 idea. So it seems that we have a lot of ways to
    solve this problem: (1) put a space between the variable and the special
    character, (2) put the variable name in curly braces, (3) put a
    backslash before the special character, (4) specify 'no utf8', and (5)
    go ahead and convert the file to utf8 and 'use utf8':

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use encoding 'utf-8';

    local $\ = "\n";
    our $string = 'Something fun';
    print "Reg: $string®";
    print "B-Bar: $string¦";
    print "Quoted: «$string»";
    print "Yen: $string¥";
    print "Euro: $string€";

    our $exoãƒtic = 'ãƒãƒ‹ Ç­ Ñš シß㬠ヌ ã« ã­';
    print "exoãƒtic = $exoãƒtic";

    __END__


    It seems that utf8 extends the core perl parser in some interesting ways.
     
    Mumia W., Jun 5, 2006
    #9
  10. Re: BUG in encoding package requires spaces around « and »

    Mumia W. wrote:
    >
    >
    > Thanks for the utf8 idea. So it seems that we have a lot of ways to
    > solve this problem: (1) put a space between the variable and the special
    > character, (2) put the variable name in curly braces, (3) put a
    > backslash before the special character, (4) specify 'no utf8', and (5)
    > go ahead and convert the file to utf8 and 'use utf8':
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    > use utf8;
    > use encoding 'utf-8';
    >
    > local $\ = "\n";
    > our $string = 'Something fun';
    > print "Reg: $string®";
    > print "B-Bar: $string¦";
    > print "Quoted: «$string»";
    > print "Yen: $string¥";
    > print "Euro: $string€";
    >
    > our $exoãƒtic = 'ãƒãƒ‹ Ç­ Ñš シß㬠ヌ ã« ã­';
    > print "exoãƒtic = $exoãƒtic";
    >
    > __END__
    >
    >
    > It seems that utf8 extends the core perl parser in some interesting ways.
    >
    >


    And non-obvious. It looks now like the behaviour is a feature (i.e. is
    documented). But it sure didn't pop out on my first pass through the
    documentation. Thanks.

    Tom Wyant
     
    harryfmudd [AT] comcast [DOT] net, Jun 5, 2006
    #10