HTML::Entities::encode() returning wrong(?) entities


Jim Higson

I'm calling encode_entities on some text I have read from a file, to turn it
into a webpage. According to file:

$ file text/text.en
$ text/text.en: UTF-8 Unicode English text, with very long lines

(although this might not matter)
Anyway, the letter ä appears in the text, and should be changed to &auml;

However, instead it is changed to:
&Atilde;&curren;

I can't see anything unusual about my code. Any ideas why I'm having this
problem?
 

Jim Higson

Jim said:
I'm calling encode_entities on some text I have read from a file, to turn
it into a webpage. According to file:

$ file text/text.en
$ text/text.en: UTF-8 Unicode English text, with very long lines

(although this might not matter)
Anyway, the letter ä appears in the text, and should be changed to &auml;

However, instead it is changed to:
&Atilde;&curren;

I can't see anything unusual about my code. Any ideas why I'm having this
problem?


I just found the answer myself. As I suspected, it was to do with reading
the Unicode in Perl. Adding use open ':utf8'; to the top of the source
fixed this (although I'm not quite certain exactly what this means)
 

Joe Smith

Jim said:
$ text/text.en: UTF-8 Unicode English text, with very long lines
Anyway, the letter ä appears in the text, and should be changed to &auml;

In UTF-8 encoding, the single character "ä" is stored as two bytes:
"\xC3" and "\xA4". If you allow perl to think that the file is ISO-8859-1,
it will interpret those two bytes as "Ã" and "¤". You need to tell perl
that the file is :utf8 in order for it to recognize those two bytes as
being a single Unicode character.
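
To see the difference for yourself, here is a minimal sketch, assuming the Encode and HTML::Entities modules are installed. It feeds the same two bytes to encode_entities twice, once undecoded and once decoded; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# The two raw bytes that UTF-8 uses for "ä" (U+00E4).
my $bytes = "\xC3\xA4";

# Undecoded, Perl treats each byte as its own Latin-1 character,
# so encode_entities sees two characters.
my $wrong = encode_entities($bytes);

# Decoded, the two bytes become the single character U+00E4.
my $chars = decode('UTF-8', $bytes);
my $right = encode_entities($chars);

print "$wrong\n";   # &Atilde;&curren;
print "$right\n";   # &auml;
```

The first line of output is exactly the symptom described in the original post; the second is what you get once perl knows the data is UTF-8.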

-Joe
 

Eric Amick

I just found the answer myself. As I suspected, it was to do with reading
the Unicode in Perl. Adding use open ':utf8'; to the top of the source
fixed this (although I'm not quite certain exactly what this means)

It tells Perl to apply the UTF-8 layer by default to every file it opens
within the scope of the pragma. Only you can say whether that is the right
thing. If it isn't, you can specify it for specific files by putting ':utf8'
in the mode (the second argument) of a three-argument open, or with a
binmode call on the appropriate filehandle.
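
A rough sketch of both per-file approaches, assuming File::Temp and HTML::Entities are available. The temporary file is only a stand-in for text/text.en, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Entities qw(encode_entities);

# Write a small UTF-8 file to read back (stand-in for text/text.en).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':raw');
print {$tmp} "\xC3\xA4\n";          # the UTF-8 bytes for "ä"
close $tmp or die "close failed: $!";

# Option 1: name the layer in the mode of a three-argument open.
open(my $in, '<:utf8', $path) or die "Cannot open $path: $!";
my $line1 = <$in>;
close $in;

# Option 2: open normally, then set the layer with binmode before reading.
open($in, '<', $path) or die "Cannot open $path: $!";
binmode($in, ':utf8');
my $line2 = <$in>;
close $in;

chomp($line1, $line2);
print encode_entities($line1), "\n";   # &auml;
print encode_entities($line2), "\n";   # &auml;
```

Either way the filehandle hands back characters rather than raw bytes. If you want malformed input to be checked as well, ':encoding(UTF-8)' can be used in place of ':utf8' in both spots.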
 
