Writing a UTF-8 file

pearly

Hello everybody,

Does anyone know how I can write UTF-8 files without
a BOM in Perl?

Whether or not I open files in utf8 mode (via the 2nd parameter
of open or via binmode), I always end up with:
- A BOM "FF FE" (UTF-16LE afaik) at the start of the output file;
- An encoding that uses at least 2 bytes per character.

I am reading strings from an external resource, so the following
is not 100% representative but has the same effect:

my $string_with_special_chars = "Château Müller\nGarçon";
# String contains entities acirc, uuml and ccedil.
open(my $fh, ">:utf8", "test.txt") or die "Cannot open test.txt: $!";
print $fh $string_with_special_chars;
close($fh) or die "Cannot close test.txt: $!";

Tried it both on Linux (Perl 5.8.6) and Windows (Perl 5.8.7).

Difference between utf8 and default mode:
The file created without explicit utf8 mode is readable in
Firefox (UTF-8 encoding). My hex editor shows that for all
characters the 2nd byte is 0x00.
The file opened with ">:utf8" shows hex C3 00 A2 00 for the
u umlaut, and is 6 bytes longer in total because of the 3 special chars.
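
To check what Perl itself puts on disk, independent of any editor or browser, the file can be read back in raw mode and dumped as hex. The following is only a minimal sketch under some assumptions: the string is written with \x{..} escapes so the result does not depend on how the script file is saved, the layer is ":encoding(UTF-8)" rather than ":utf8", and the file name test.txt is just a placeholder.

```perl
use strict;
use warnings;

# Same text as above, with the three special characters given as
# code point escapes (U+00E2, U+00FC, U+00E7).
my $text = "Ch\x{E2}teau M\x{FC}ller\nGar\x{E7}on";
my $path = 'test.txt';    # placeholder file name

# Write through an explicit UTF-8 encoding layer.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
print {$out} $text;
close($out) or die "Cannot close $path: $!";

# Read the file back as raw bytes, with no decoding at all.
open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
my $bytes = do { local $/; <$in> };
close($in);

# Print every byte as two hex digits.
my $hex = uc join ' ', unpack('(H2)*', $bytes);
print "$hex\n";
```

On Linux the dump should begin with 43 68 C3 A2, that is "C", "h" and then the two-byte UTF-8 sequence for the a circumflex, with no FF FE in front. If that is what comes out, the BOM and the 0x00 bytes would seem to be added at some later step rather than by Perl's output layer, but that is a guess that this check can only support, not prove.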

Where does the BOM 0xFF 0xFE come from?
Why does Perl add it?
Doesn't Perl write UTF-8 by default?
Why add the BOM, and why 2 or more bytes per character?

I have been puzzling over this for ages (ok, days).

Thank you for any hints.
MP
 
Ian Wilson

pearly said:
Hello everybody,

Does anyone know how I can write UTF-8 files without
a BOM in Perl?

You already posted this question under another name. Why?
 
