How do I create a new text file with utf-8 encoding

Discussion in 'Perl Misc' started by bk@docstream.no, May 10, 2007.

  1. Guest

    I use ActivePerl version 5.8.8.817 on Windows XP.

    I try to create a new text file and add some content, but when I open it
    in Notepad, it says it's an ANSI-encoded file. Why?

    Here is my code snippet:

    open my $fh, '>:encoding(UTF-8)', "testfile.txt";
    print $fh "Welcome to Muppet Show\n";
    close $fh;

    What am I doing wrong?
     
    bk@docstream.no, May 10, 2007
    #1

  2. bk@docstream.no wrote:
    > I use ActivePerl version 5.8.8.817 on Windows XP.
    >
    > I try to create a new text file and add some content, but when I open it
    > in Notepad, it says it's an ANSI-encoded file. Why?
    >
    > open my $fh, '>:encoding(UTF-8)', "testfile.txt";
    > print $fh "Welcome to Muppet Show\n";
    > close $fh;
    >
    > What am I doing wrong?


    Your sample text has an identical byte sequence in ASCII, Windows-1252 (aka
    ANSI), UTF-8, ISO-8859-1 (Latin-1), ISO-8859-15 (Latin-9), and probably a
    dozen other encodings. Therefore your sample is useless for testing for the
    correct encoding.
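
    A quick way to see this for yourself, as a minimal sketch assuming the
    core Encode module is available (the helper name hex_bytes is made up for
    illustration). It dumps the bytes of an ASCII-only string and of a euro
    sign in several encodings:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the bytes of $text in $encoding, as upper-case hex.
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $bytes = encode($encoding, $text);
    return uc(join(' ', unpack('(H2)*', $bytes)));
}

my @encodings = ('ascii', 'cp1252', 'UTF-8', 'iso-8859-1', 'iso-8859-15');

# ASCII-only text: every encoding listed produces the same bytes.
for my $enc (@encodings) {
    printf "%-12s %s\n", $enc, hex_bytes($enc, "Welcome");
}

# The euro sign (U+20AC): the bytes now depend on the encoding.
# Encodings that cannot represent it fall back to a substitution character.
for my $enc (@encodings) {
    printf "%-12s U+20AC -> %s\n", $enc, hex_bytes($enc, "\x{20AC}");
}
```

    The first loop prints the same hex string five times; the second does not,
    which is why a pure-ASCII sample cannot tell the encodings apart.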

    Notepad relies on the byte order mark (BOM) to identify Unicode files,
    including UTF-8, where the BOM is of course meaningless and not used except
    by Notepad itself. In not so many words: Notepad has no clue what it is
    talking about. But for your sample text, neither would any other tool.

    Step 1: use some sample text that contains characters that are encoded as
    different byte sequences in each encoding.
    Step 2: don't use Notepad. Write to a (trivial) HTML file and then use a web
    browser to view that file. There you can change the encoding and determine
    whether those characters are displayed correctly for the desired encoding.
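
    The two steps could look roughly like this. It is only a sketch: the file
    name and the sample characters are arbitrary choices, and the page
    deliberately declares no charset so that the browser's encoding setting
    decides how the bytes are shown.

```perl
use strict;
use warnings;

my $file = 'encoding-test.html';    # hypothetical file name

# Non-ASCII sample: a-umlaut, o-slash and the euro sign.
my $sample = "\x{E4} \x{F8} \x{20AC}";

# Same open mode as in the original question, plus error checks.
open my $fh, '>:encoding(UTF-8)', $file or die "Cannot open $file: $!";
print $fh "<html><body><p>$sample</p></body></html>\n";
close $fh or die "Cannot close $file: $!";
```

    Open the file in a browser and switch between UTF-8 and Windows-1252: the
    three characters should only look right when the browser is set to UTF-8.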

    In over 8 years as a software localization engineer and international
    program manager, this has proven to be the only practical and reliable way
    to identify the actual encoding of a file.

    jue
     
    Jürgen Exner, May 10, 2007
    #2

  3. On May 10, 3:05 pm, "Jürgen Exner" wrote:
    > bk@docstream.no wrote:
    > > I use ActivePerl version 5.8.8.817 on Windows XP.
    > >
    > > I try to create a new text file and add some content, but when I open it
    > > in Notepad, it says it's an ANSI-encoded file. Why?
    > >
    > > open my $fh, '>:encoding(UTF-8)', "testfile.txt";
    > > print $fh "Welcome to Muppet Show\n";
    > > close $fh;
    > >
    > > What am I doing wrong?
    >
    > Your sample text has an identical byte sequence in ASCII, Windows-1252 (aka
    > ANSI), UTF-8, ISO-8859-1 (Latin-1), ISO-8859-15 (Latin-9), and probably a
    > dozen other encodings. Therefore your sample is useless for testing for the
    > correct encoding.
    >
    > Notepad relies on the byte order mark (BOM) to identify Unicode files,
    > including UTF-8, where the BOM is of course meaningless and not used except
    > by Notepad itself.


    You mean Windows, not Notepad. Most Windows programs will recognise a
    file with a UTF-8 BOM at the start as UTF-8.

    In a situation where you've got a mixture of Windows-1252 and utf8
    files knocking about then it's not a bad way to distinguish them. I'm
    not saying I particularly liked Microsoft's unilateral adoption of BOM
    in utf8 but I have to admit it makes the best of a bad job.
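
    Checking for that signature is cheap. A minimal sketch (the function name
    has_utf8_bom is made up for illustration) that reads the first three bytes
    of each file named on the command line:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file starts with the bytes EF BB BF,
# which is U+FEFF encoded as UTF-8.
sub has_utf8_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $count = read($fh, my $head, 3);
    close $fh;
    return defined $count && $count == 3 && $head eq "\xEF\xBB\xBF";
}

# Report on every file given as an argument.
for my $path (@ARGV) {
    printf "%s: %s\n", $path,
        has_utf8_bom($path) ? 'UTF-8 with BOM' : 'no UTF-8 BOM';
}
```

    A missing BOM proves nothing either way: the file may be Windows-1252 or
    UTF-8 written without a signature.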

    In Perl I'd like to be able to say something like

    open my $fh, '>:encoding(UTF-8 BOM)', "testfile.txt";

    But AFAIK I can't and I just have to

    print $fh "\x{FEFF}"; # BOM
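
    Put together with the code from the original question, that could look
    like the sketch below. The read-back at the end is only there to confirm
    what landed on disk.

```perl
use strict;
use warnings;

my $file = 'testfile.txt';

open my $fh, '>:encoding(UTF-8)', $file or die "Cannot open $file: $!";
print $fh "\x{FEFF}";                    # BOM; the layer encodes it as EF BB BF
print $fh "Welcome to Muppet Show\n";
close $fh or die "Cannot close $file: $!";

# Read the first three bytes back without any decoding.
open my $in, '<:raw', $file or die "Cannot reopen $file: $!";
read($in, my $head, 3);
close $in;

printf "First bytes: %s\n", uc(join(' ', unpack('(H2)*', $head)));
```

    This should print "First bytes: EF BB BF".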
     
    Brian McCauley, May 10, 2007
    #3
  4. Brian McCauley wrote:
    > In a situation where you've got a mixture of Windows-1252 and utf8
    > files knocking about then it's not a bad way to distinguish them. I'm
    > not saying I particularly liked Microsoft's unilateral adoption of BOM
    > in utf8 but I have to admit it makes the best of a bad job.


    Fair enough, you've got a point.
    However, calling it a _Byte_Order_ Mark in the context of UTF-8 is a
    misnomer if there ever was one ;-)

    jue
     
    Jürgen Exner, May 10, 2007
    #4
