how do you identify if a file is utf8 in perl

Discussion in 'Perl Misc' started by julian@ukonline.co.uk, Nov 8, 2006.

  1. Guest

    how do you identify if a file is utf8 in perl
    julian@ukonline.co.uk, Nov 8, 2006
    #1

  2. Dr.Ruud Guest

    julian@ukonline.co.uk wrote:

    > how do you identify if a file is utf8 in perl


    There is no perfect way to do that, whether you use perl or any other
    executable.

    An ASCII file (all bytes 0-127) is also a UTF-8 file (and a utf8 file).
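    As a small illustration of that point, here is a sketch of a pure-ASCII
    test on a byte string. The name is_pure_ascii is hypothetical, not
    something from a module. A true result only says the data is compatible
    with UTF-8; it says nothing about which encoding the author intended.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the byte string contains only
# bytes in the range 0-127, 0 otherwise.
sub is_pure_ascii {
    my ($bytes) = @_;
    return $bytes !~ /[^\x00-\x7F]/ ? 1 : 0;
}
```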

    The Windows Notepad text editor prefixes a UTF-8 file with a byte
    order mark (BOM), U+FEFF, which is encoded as the initial bytes
    EF BB BF. See http://www.unicode.org/book/ch13.pdf (13.6 Specials).
    So you could check for that "signature", although many UTF-8 files
    are written without it.
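    The signature check could look roughly like this. It is a minimal
    sketch: has_utf8_bom is a hypothetical name, and the file is opened with
    the :raw layer so that the first three bytes are compared untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if the file starts with the
# UTF-8 encoded BOM (bytes EF BB BF), false otherwise.
sub has_utf8_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $n = read $fh, my $head, 3;   # number of bytes actually read
    close $fh;
    return defined $n && $n == 3 && $head eq "\xEF\xBB\xBF";
}
```

    A missing BOM proves nothing, so this is only useful as a positive hint.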

    You could also read the file, in an eval-block, with the utf8-layer
    active, once from start to end (or some other limit), to check its
    utf8-ness.
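    A sketch of that idea follows, with one substitution that should be
    stated plainly: instead of relying on the I/O layer, which as far as I
    know tends to warn or substitute on malformed input rather than die, it
    reads raw bytes and decodes them with Encode's decode() and FB_CROAK
    inside an eval. The name is_valid_utf8_file is hypothetical, and the
    whole file is slurped, so this assumes the file fits in memory.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if the whole file decodes as strict
# UTF-8, 0 otherwise.
sub is_valid_utf8_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };   # slurp the file as raw bytes
    close $fh;
    $bytes = '' unless defined $bytes;
    # With FB_CROAK, decode() dies on the first malformed sequence;
    # the eval turns that into a false result.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}
```

    A pure ASCII file passes this test as well, and a file in another
    encoding can pass by coincidence, so a positive result is still only
    strong evidence, not proof.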

    See also perluniintro and perlunicode.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Nov 8, 2006
    #2