How do you identify if a file is UTF-8 in Perl?

D

Dr.Ruud

(e-mail address removed) wrote:
> how do you identify if a file is utf8 in perl

There is no perfect way to do that, whether you use perl or any other
tool: you can only test whether the content is consistent with UTF-8.

An ASCII file (all bytes 0-127) is also a UTF-8 file (and a utf8 file).
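A minimal Perl sketch of that ASCII check, assuming the file fits in memory; `slurp_raw` and `is_ascii` are hypothetical helper names, not from any module. A file that passes it is valid UTF-8 too, which is exactly why such a file cannot tell you which encoding its author intended.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes (no decoding layer).
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                      # slurp mode: read everything at once
    my $bytes = <$fh>;
    close $fh;
    return defined $bytes ? $bytes : '';
}

# Hypothetical helper: true if every byte is in the range 0-127.
sub is_ascii {
    my ($bytes) = @_;
    return $bytes !~ /[^\x00-\x7F]/;
}
```

For example: `print is_ascii(slurp_raw($file)) ? "pure ASCII\n" : "has bytes above 127\n";`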

When saving as UTF-8, the Windows Notepad text editor prefixes the file
with the special character U+FEFF (the byte order mark, BOM), which is
encoded as the initial bytes EF BB BF. See
http://www.unicode.org/book/ch13.pdf (13.6 Specials). So you could
check for that "signature", but many UTF-8 files have no BOM at all.
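A sketch of the BOM check, assuming you only need to look at the first three bytes; `has_utf8_bom` is a hypothetical name. The `:raw` layer keeps Perl from decoding anything, so the bytes are compared exactly as stored on disk. A true result is a strong hint; a false result proves nothing, because many UTF-8 files, especially ones not written by Windows tools, carry no BOM.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file starts with the UTF-8 encoding
# of the byte order mark U+FEFF, i.e. the bytes EF BB BF.
sub has_utf8_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 3);   # number of bytes read, undef on error
    close $fh;
    die "Cannot read $path: $!" unless defined $got;
    return $got == 3 && $head eq "\xEF\xBB\xBF";
}
```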

You could also read the file from start to end (or up to some other
limit) inside an eval block, with the utf8 layer active, to check
whether its content is valid utf8.
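A sketch of the eval-block idea, with one swap: instead of an I/O layer, it reads the raw bytes and decodes them explicitly with the core Encode module. Passing `Encode::FB_CROAK()` makes `decode` die at the first malformed sequence, which the `eval` catches; `Encode::LEAVE_SRC()` stops `decode` from modifying its input. The helper names are hypothetical. Note that Encode's `'UTF-8'` is the strict encoding, while `'utf8'` is Perl's more permissive variant.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true (1) if $bytes is well-formed strict UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        # Dies on the first malformed sequence; the input is left intact.
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: read the whole file as raw bytes and validate it.
sub file_is_valid_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                      # slurp mode
    my $bytes = <$fh>;
    close $fh;
    return is_valid_utf8(defined $bytes ? $bytes : '');
}
```

If you only check the first N bytes instead of the whole file, the cut can land in the middle of a multi-byte character, so a failure right at the end of the sample is not conclusive.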

See also perluniintro and perlunicode.