Autodetect UTF-8 vs ISO-8859-1

Discussion in 'XML' started by Lars, Dec 8, 2003.

  1. Lars

    As part of an error correction mechanism, I'd like to
    autodetect ISO-8859-1 vs UTF-8 usage. Where is this
    described concisely?

    -Lars
     
    Lars, Dec 8, 2003

  2. Dean Tiegs

    Lars writes:

    > As part of an error correction mechanism, I'd like to autodetect
    > ISO-8859-1 vs UTF-8 usage. Where is this described concisely?


    For an arbitrary text file, it is impossible to distinguish
    automatically between the two and be 100 percent sure of choosing
    correctly. However, if the file contains no invalid UTF-8 sequences,
    it is almost certainly UTF-8 (a pure-ASCII file is valid in both
    encodings and reads identically either way). It would be a very
    unusual ISO-8859-1 file with accented characters that did not also
    contain invalid UTF-8 sequences.
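
A minimal sketch of the heuristic described above, in Python. The function names guess_encoding and decode_text are hypothetical, chosen for illustration. It relies on two facts: Python's strict UTF-8 decoder raises UnicodeDecodeError on malformed sequences, and ISO-8859-1 maps every byte value to a character, so the fallback decode can never fail.

```python
def guess_encoding(data: bytes) -> str:
    """Guess between UTF-8 and ISO-8859-1 (hypothetical helper).

    Bytes that decode cleanly as UTF-8 are assumed to be UTF-8; pure ASCII
    also lands here and is identical in both encodings. Anything else falls
    back to ISO-8859-1, which accepts every byte value.
    """
    try:
        data.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError:
        return "iso-8859-1"
    return "utf-8"


def decode_text(data: bytes) -> str:
    """Decode bytes using whichever encoding guess_encoding() picks."""
    return data.decode(guess_encoding(data))


if __name__ == "__main__":
    samples = [
        "caf\u00e9".encode("utf-8"),       # b'caf\xc3\xa9': valid UTF-8
        "caf\u00e9".encode("iso-8859-1"),  # b'caf\xe9': lone 0xE9 is invalid UTF-8
        b"plain ascii",                    # valid in both encodings
    ]
    for sample in samples:
        print(sample, guess_encoding(sample), decode_text(sample))
```

One caveat: Latin-1 text such as "Ã©" is stored as the bytes C3 A9, which also happen to be valid UTF-8 for "é", so this sketch reports UTF-8 for it. That is exactly the "very unusual" file the heuristic cannot catch.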

    For XML files, it's much simpler: if it is ISO-8859-1, it has to be
    declared in the XML declaration.
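
A rough sketch of the XML rule in Python, using a hypothetical helper named xml_declared_encoding. It checks for a byte order mark and then looks for an encoding pseudo-attribute in an ASCII-compatible XML declaration. Otherwise it falls back to UTF-8, which is what the XML 1.0 specification requires when neither a BOM nor an encoding declaration is present. As a simplification, it ignores BOM-less UTF-16, UCS-4 and EBCDIC, which the specification's autodetection appendix also covers.

```python
import re

# Matches the encoding pseudo-attribute of an ASCII-compatible XML declaration,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_declared_encoding(data: bytes) -> str:
    """Return the encoding an XML document signals, or the UTF-8 default."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"   # UTF-8 byte order mark
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"  # UTF-16 byte order mark, either byte order
    match = _DECL_ENCODING.match(data)
    if match:
        # Encoding names are case-insensitive; normalise to lower case.
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: the document must be UTF-8.
    return "utf-8"
```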

    --
    Dean Tiegs, NE¼-20-52-25-W4
    “Confortare et esto robustus”
    http://telusplanet.net/public/dctiegs/
     
    Dean Tiegs, Dec 8, 2003

  3. Johannes Koch

    Dean Tiegs wrote:
    > For XML files, it's much simpler: if it is ISO-8859-1, it has to be
    > declared in the XML declaration.


    Or by some lower-level protocol, such as the HTTP Content-Type header.
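
A sketch of the transport-level case, again in Python and with hypothetical helper names. The charset parameter of a Content-Type header can be read with the standard library's email.message.Message.get_content_charset(), which returns it in lower case. Letting that charset override the in-document declaration is an assumption here, matching the usual rule for XML served over HTTP (RFC 3023). The sketch ignores that RFC's us-ascii default for text/xml sent without a charset.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # unquotes and lower-cases the parameter


def effective_xml_encoding(content_type: Optional[str], declared: str) -> str:
    """Prefer a transport-level charset over the in-document declaration (assumed precedence)."""
    if content_type:
        charset = charset_from_content_type(content_type)
        if charset:
            return charset
    return declared
```

For example, effective_xml_encoding("text/xml; charset=ISO-8859-1", "utf-8") yields "iso-8859-1", while a header without a charset leaves the declared encoding in force.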
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
     
    Johannes Koch, Dec 9, 2003

Similar Threads
  1. UTF-8 & ISO-8859-1
     Navanith, Jan 5, 2004, in forum: ASP .Net
     Replies: 1; Views: 390; Fred Chateau, Jan 5, 2004
  2. Peter Laan
     Replies: 6; Views: 4,176; Peter Laan, Mar 7, 2005
  3. gerlar2000
     Replies: 0; Views: 644; gerlar2000, Feb 21, 2005
  4. Franck DARRAS
     Replies: 12; Views: 663; Jim Higson, Aug 23, 2004
  5. Peter Jacobi
     Replies: 13; Views: 873; Martin v. Löwis, Aug 3, 2004