Petr Pajas
Hi,
I'm using Perl 5.8.3 and want it to be 100% UTF-8. However, I'm having
trouble with Latin-1 characters in strings: they seem to remain
byte-encoded unless I explicitly call utf8::upgrade, which is very
annoying.
In the example below, \x{e1} is the Latin-1 small a with acute, and
\x{168} is a non-Latin-1 character (U+0168, capital U with tilde).
The code shows that \x{e1} stays non-UTF-8 until it either shares a
string with a non-Latin-1 character or utf8::upgrade is called on it.
Can anyone explain why (and possibly how to avoid that)?
$ perl -e '
use utf8;
use Devel::Peek;
$a="\x{e1}";
$b="\x{e1}\x{168}";
Dump($a);
Dump($b);
utf8::upgrade($a);
Dump($a)'
SV = PV(0x8150000) at 0x816a488
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x8163af8 "\341"\0
CUR = 1
LEN = 2
SV = PV(0x8150090) at 0x816a4c4
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x8162530 "\303\241\305\250"\0 [UTF8 "\x{e1}\x{168}"]
CUR = 4
LEN = 5
SV = PV(0x8150000) at 0x816a488
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x81701a8 "\303\241"\0 [UTF8 "\x{e1}"]
CUR = 2
LEN = 3
Thanks,
-- Petr