Josef Feit
Hi,
I have run across a Perl behaviour that I do not
understand.
I am trying to analyze some text containing UTF-8 characters,
e.g. a file containing "nXlXx", where 'X' stands for
some UTF-8 encoded character, e.g. "náláx"
(not sure whether it gets through).
To reproduce, please replace the 'X' key in %ascii with that
UTF-8 character (it should be 'á').
#!/usr/bin/perl
# -----------------------------------------------------------
use warnings;
use strict;
use encoding 'utf-8';
use 5.010;
my %ascii = (
    'X' => 'a',
);
my $line = <>;
chomp $line; # to chomp or not to chomp
print length($line), ": ";
for( my $i = 0; $i < length($line); $i++ ){
    my $znak = substr($line, $i, 1);
    if( exists( $ascii{$znak} ) ){
        print "+";
    }else{
        print "-";
    }
}
print "\n";
---
The problem is with the chomp:
If I chomp $line, the output is as
expected: 5: -+-+-
If I comment out the chomp, the result is
8: --------
so Perl apparently does not treat $line as
UTF-8 encoded text (8 is the byte count of the line,
including the newline).
Is this a side effect of chomp, or am I doing something
wrong? I need to keep the newline (no chomp) and still
have the line handled as UTF-8.
perl -v
This is perl, v5.10.0 built for x86_64-linux-thread-multi
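For comparison, here is a minimal sketch of the same test without the
`encoding` pragma. It assumes that `use utf8` (for the literal in the
source) plus explicit `binmode` layers on STDIN/STDOUT are an acceptable
replacement, and that the input arrives on STDIN; `mark_chars` is a
helper name made up for this illustration. I have not confirmed that
this explains the chomp difference above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the 'á' literal below is written in UTF-8 in this source file

# Assumption: decode input and encode output explicitly,
# instead of relying on "use encoding".
binmode( STDIN,  ':encoding(UTF-8)' );
binmode( STDOUT, ':encoding(UTF-8)' );

my %ascii = ( 'á' => 'a' );

# Hypothetical helper: one '+' per character found in %ascii, '-' otherwise.
sub mark_chars {
    my ($text) = @_;
    return join '', map { exists $ascii{$_} ? '+' : '-' } split //, $text;
}

my $line = <STDIN>;    # deliberately not chomped
if ( defined $line ) {
    print length($line), ": ", mark_chars($line), "\n";
}
```

With "náláx" plus a newline on STDIN, this should count characters
rather than bytes, i.e. print 6: -+-+-- (the last '-' is the newline).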
Thanks
Josef