BUG in encoding package requires spaces around « and »

Mumia W.

A bug in the 'encoding' module seems to require spaces around the right
and left double-angle-brackets. This only is needed when a variable is
being interpolated within a double-quoted string. Here's a demonstration
program:

#!/usr/bin/perl
use strict;
use warnings;
use encoding 'iso-8859-1';

print "«Hi there»\n";
print "«Hello there again»\n";

our $string = 'Something fun';

print "«$string»"; # BUG prevents compilation
print "« $string »"; # spaces are needed around string to compile

__END__

This prints (i18n-file.pl is the name of my script):

Global symbol "%_END__" requires explicit package name at ./i18n-file.pl
line 11.
Execution of ./i18n-file.pl aborted due to compilation errors.

shell returned 255
 
Mumia W.

Mumia said:
A bug in the 'encoding' module seems to require spaces around the right
and left double-angle-brackets. This only is needed when a variable is
being interpolated within a double-quoted string. [...]

A workaround is to use curly braces around the variable name:

#!/usr/bin/perl
use strict;
use warnings;
use encoding 'iso-8859-1';

print "«Hi there»\n";
print "«Hello there again»\n";

our $string = 'Something fun';

# print "«$string»\n"; # BUG prevents compilation
print "«${string}»\n"; # Put 'string' in braces to avoid bug.
 
Bart Van der Donck

Mumia said:
A bug in the 'encoding' module seems to require spaces around the right
and left double-angle-brackets. This only is needed when a variable is
being interpolated within a double-quoted string. [...]

A workaround is to use curly braces around the variable name:
[...]
print "«${string}»\n"; # Put 'string' in braces to avoid bug.

Another workaround: print "«$string\»";

My guess is that Perl somehow considers » to be part of the scalar's
name (even though « and » are part of ISO-8859-1). I think you're right
that this is a bug in the encoding module.

But the problem seems to occur only in the character at the right side
of $string (»), not in the one at the left side («). (though print
"$string«"; doesn't work either)
 
usenet

Mumia said:
needed when a variable is being interpolated within a double-quoted string.

And, FWIW, this bug also affects strings quoted in qq{} style (which is
what I would expect, of course, but I did test it).
 
Mumia W.

Bart said:
Another workaround: print "«$string\»";
[...]

Thanks for the backslash idea. The 'encoding' parser seems to be partial
towards us-ascii. I don't know what the semantic difference is supposed
to be between the vertical bar (|) and the broken bar (¦), but the
encoding module treats them very differently:

1 #!/usr/bin/perl
2 use strict;
3 use warnings;
4 use encoding 'iso-8859-1';
5
6 local $\ = "\n";
7 our $string = 'Something fun';
8 print "My string is $string|"; # | == \x{7C} (us-ascii, vert. bar)
9 print "Broken: $string¦"; # ¦ == \x{A6} (8859-1, broken bar)
10
11 __END__
12
13 The encoding module doesn't seem to like characters
14 above 127. Either put a backslash before the ¦ on line
15 nine, or comment out line 4, and the program runs.
 
Bart Van der Donck

Mumia said:
[...]
The encoding module doesn't seem to like characters
above 127. Either put a backslash before the ¦ on line
nine, or comment out line 4, and the program runs.

You're right, it appears that anything above 127 triggers the error
message.

print "Broken: $stringµ";
print "Broken: $stringô";
print "Broken: $string´";
print "Broken: $string£";
print "Broken: $string§";

Or, as in your example:

| (124) is okay (below 128)
¦ (166) is not okay (above 127)

128 is exactly half of 256 (the number of code points in ISO-8859-1). The
range 0-127 can be covered by a 7-bit binary number, hence that set is
sometimes referred to as 7-bit ASCII. ISO-8859-1 is an 8-bit character
set, though, so I'd say this shouldn't normally happen. I tested on
different OSes and as CGI because I wasn't sure whether it might be a
shell issue, but that does not appear to be the case here.
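
To make the 127 boundary concrete, here is a minimal sketch that classifies a few of the characters discussed in this thread by code point. It is not one of the thread's original scripts: the helper name is_seven_bit is made up for illustration, and the characters are written as \x escapes so the file itself stays pure ASCII and does not need the encoding pragma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when a single character fits in 7 bits (0-127).
sub is_seven_bit {
    my ($char) = @_;
    return ord($char) < 128 ? 1 : 0;
}

# ISO-8859-1 code points, written as escapes to keep this file ASCII-only.
my @samples = (
    [ 'vertical bar',    "\x7C" ],
    [ 'broken bar',      "\xA6" ],
    [ 'left guillemet',  "\xAB" ],
    [ 'right guillemet', "\xBB" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my $class = is_seven_bit($char) ? '7-bit ASCII' : 'needs the eighth bit';
    printf "%-16s %3d  %s\n", $name, ord($char), $class;
}
```

Only the vertical bar falls in the 7-bit range; the other three need the eighth bit, which matches the set of characters that trigger the error next to a variable name.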

Note that the following encoding gives exactly the same results:

use encoding 'ascii';

(Which would be explainable, because ASCII covers 0 to 127 only)

But further tests showed that the following charsets seem to have
the same issue:

use encoding 'iso-8859-16';
use encoding 'utf-8';
use encoding 'utf8';
use encoding 'windows-1251';

So the problem is not limited to ISO-8859-1.

I'm not sure where to go from here. I would conclude at this point that
the 'encoding' module only handles characters up to 127 when they
directly follow a variable's name.

I hope this can be of some help.
 
Dr.Ruud

Mumia W. wrote:
A bug in the 'encoding' module seems to require spaces around the
right and left double-angle-brackets. This only is needed when a
variable is being interpolated within a double-quoted string. Here's
a demonstration program:

#!/usr/bin/perl
use strict;
use warnings; no utf8 ;
use encoding 'iso-8859-1';

print "«Hi there»\n";
print "«Hello there again»\n";

our $string = 'Something fun';

print "«$string»"; # BUG prevents compilation
print "« $string »"; # spaces are needed around string to compile

__END__

This prints (i18n-file.pl is the name of my script):

Global symbol "%_END__" requires explicit package name at
./i18n-file.pl line 11.
Execution of ./i18n-file.pl aborted due to compilation errors.

shell returned 255

Insert "no utf8;" before the "use encoding ..." line.
 
harryfmudd [AT] comcast [DOT] net

Mumia said:
A bug in the 'encoding' module seems to require spaces around the right
and left double-angle-brackets. This only is needed when a variable is
being interpolated within a double-quoted string. [...]

If you look in encoding.pm, it appears to me that unless you're using
the filter option, all it does is run some sanity checks on the encoding
name and then set ${^ENCODING} to the given encoding name. This, and the
findings in the adjacent posts (that "«${string}»" works), make it
sound like the Perl parser is mishandling the end of the interpolated
variable name.
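
The brace workaround relies on ordinary interpolation rules, so the idea can be shown without the encoding pragma at all. The following is a minimal ASCII-only sketch of where a variable name ends inside a double-quoted string. The second variable $strings is invented for illustration, and the letter "s" stands in for the high-bit character that the parser wrongly absorbs into the name in the reported case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $string  = 'Something fun';
our $strings = 'A different variable';   # hypothetical, for contrast only

# Without braces the parser takes the longest identifier it can find,
# so this interpolates $strings, not $string followed by "s".
my $greedy = "$strings";

# Braces end the variable name explicitly.
my $braced = "${string}s";

# Concatenation avoids interpolation, and the question, entirely.
my $joined = $string . 's';

print "$greedy\n";
print "$braced\n";
print "$joined\n";
```

If the parser is treating » as an identifier character, then "«${string}»" works for the same reason "${string}s" does: the closing brace, not the following character, decides where the name stops.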

So:

Are you using the latest Perl? I believe this is 5.8.8.

Are you using the latest Encode? I believe this is 2.17, or at least
that is the latest on the CPAN mirror I use, as of the time I write this.

If the answer to both is yes, you might want to consider reporting
this. I'm not sure how I would go about this, but the Encode
documentation suggests maybe joining and posting to the Perl Unicode
Mailing List.

Tom Wyant
 
Mumia W.

Dr.Ruud said:
Insert "no utf8;" before the "use encoding ..." line.

It works!

And I think I see why (from man utf8):
Note that if you have bytes with the eighth bit on in your script (for
example embedded Latin-1 in your string literals), "use utf8" will be
unhappy since the bytes are most probably not well-formed UTF-8. If
you want to have such bytes and use utf8, you can disable utf8 until
the end the block (or file, if at top level) by "no utf8;".
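
The point about eighth-bit bytes "most probably" not being well-formed UTF-8 can be checked directly. This is a minimal sketch using the core Encode module; the helper name is_well_formed_utf8 is made up for illustration, and the bytes are written as escapes so the file stays ASCII-only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: returns 1 if the byte string decodes as strict UTF-8.
sub is_well_formed_utf8 {
    my ($bytes) = @_;
    # Work on a copy, since decode may modify its argument when CHECK is set.
    my $copy = $bytes;
    # FB_CROAK makes decode die on malformed input instead of substituting.
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

my $latin1_bytes = "\xBB";                     # » as one ISO-8859-1 byte
my $utf8_bytes   = encode('UTF-8', "\x{BB}");  # » as UTF-8: 0xC2 0xBB

printf "Latin-1 byte is well-formed UTF-8: %s\n",
    is_well_formed_utf8($latin1_bytes) ? 'yes' : 'no';
printf "UTF-8 bytes are well-formed UTF-8: %s\n",
    is_well_formed_utf8($utf8_bytes) ? 'yes' : 'no';
```

A lone 0xBB is a UTF-8 continuation byte with no lead byte before it, so a file saved as ISO-8859-1 that contains » cannot be read as valid UTF-8.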

Thanks for the utf8 idea. So it seems that we have a lot of ways to
solve this problem: (1) put a space between the variable and the special
character, (2) put the variable name in curly braces, (3) put a
backslash before the special character, (4) specify 'no utf8', and (5)
go ahead and convert the file to utf8 and 'use utf8':

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use encoding 'utf-8';

local $\ = "\n";
our $string = 'Something fun';
print "Reg: $string®";
print "B-Bar: $string¦";
print "Quoted: «$string»";
print "Yen: $string¥";
print "Euro: $string€";

our $exoãƒtic = 'ãƒãƒ‹ Ç­ Ñš シß㬠ヌ ã« ã­';
print "exoãƒtic = $exoãƒtic";

__END__


It seems that utf8 extends the core perl parser in some interesting ways.
 
harryfmudd [AT] comcast [DOT] net

Mumia said:
[...]
It seems that utf8 extends the core perl parser in some interesting ways.

And non-obvious. It looks now like the behaviour is a feature (i.e. is
documented). But it sure didn't pop out on my first pass through the
documentation. Thanks.

Tom Wyant
 
