How to test the encoding of a file?

Discussion in 'XML' started by YGUEL, Apr 1, 2004.

  1. YGUEL

    YGUEL Guest

    Hello,
    do you know of a good program to test which character encoding
    is used in a file?
    I use iconv, but it can only convert from one character encoding to
    another. The problem is that I have some files, and the way I get them
    gives me no assurance that the encoding they claim to use is the one
    they actually use.

    Thanks for discussing this subject with me.

    P.S. I don't think that trying every possible encoding with
    iconv is a good solution.
    YGUEL, Apr 1, 2004
    #1

  2. YGUEL

    Manuel Yguel Guest

    YGUEL wrote:
    > Hello,
    > do you know of a good program to test which character encoding
    > is used in a file?
    > I use iconv, but it can only convert from one character encoding to
    > another. The problem is that I have some files, and the way I get them
    > gives me no assurance that the encoding they claim to use is the one
    > they actually use.
    >
    > Thanks for discussing this subject with me.
    >
    > P.S. I don't think that trying every possible encoding with
    > iconv is a good solution.

    I have seen Appendix F of XML 1.0, but does code exist that
    implements it?
    Manuel Yguel, Apr 1, 2004
    #2

  3. "YGUEL" <> wrote in message
    news:...
    > Hello,
    > do you know of a good program to test which character encoding
    > is used in a file?


    Conformant XML parsers do this up to a certain point (the ones that
    implement Appendix F of the XML 1.0 spec, as you mentioned).
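
    To give an idea of what that detection involves, here is a rough
    Python sketch in the spirit of Appendix F. It is not taken from any
    particular parser: the function name sniff_xml_encoding, the 200-byte
    window, and the simplified regular expression are assumptions of mine,
    and the EBCDIC case is left out. It looks for a byte-order mark, then
    for the byte pattern of "<?" in UTF-16/UTF-32, then for the encoding
    pseudo-attribute, and otherwise falls back to UTF-8.

```python
import re

# Byte-order marks, longest first so UTF-32LE is not mistaken for UTF-16LE.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# How a leading "<" or "<?" looks in wide encodings when there is no BOM.
NO_BOM = [
    (b"\x00\x00\x00<", "UTF-32BE"),
    (b"<\x00\x00\x00", "UTF-32LE"),
    (b"\x00<\x00?", "UTF-16BE"),
    (b"<\x00?\x00", "UTF-16LE"),
]

# Simplified match for the encoding pseudo-attribute of an ASCII-compatible
# XML declaration (a real parser is stricter about the grammar).
DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding an XML document claims, from its first bytes."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for prefix, name in NO_BOM:
        if data.startswith(prefix):
            return name
    match = DECL.match(data[:200])  # 200 bytes is an arbitrary window
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"  # default when nothing else is specified


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<doc/>'
    print(sniff_xml_encoding(sample))
```

    Note that this only tells you what the document claims (or implies by
    its first bytes), not whether the rest of the file really uses it.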

    > I use iconv, but it can only convert from one character encoding to
    > another. The problem is that I have some files, and the way I get them
    > gives me no assurance that the encoding they claim to use is the one
    > they actually use.
    >


    The problem here is that there is no idiot-proof way to do this.
    Suppose we have this kind of document, for example:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <doc>*</doc>

    where * is the copyright sign, for example (byte xA9 in ISO-8859-1),
    BUT despite ISO-8859-1 being specified, the document was actually
    saved in UTF-8, so * was written as the two bytes xC2 xA9.
    Now if you load that file with an XML parser and convert the result
    to UTF-8, you get xC3 x82 xC2 xA9 (the first two bytes are xC2
    converted to UTF-8 and the last two bytes are xA9 converted to UTF-8).
    Since bytes xC2 and xA9 are both perfectly legal Latin-1 characters,
    how would you detect that the file was saved in the wrong encoding?
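
    The copyright-sign example above can be reproduced in a few lines of
    Python. This is only an illustration; the helper is_valid_utf8 is a
    name I made up. The strict UTF-8 check at the end is a common
    heuristic, not a proof: Latin-1 text with accented characters rarely
    happens to form valid UTF-8 sequences, so a file with non-ASCII bytes
    that passes the check is probably UTF-8, but the check cannot tell one
    single-byte encoding from another.

```python
COPYRIGHT = "\u00a9"  # the copyright sign

# The file was really saved as UTF-8: two bytes, C2 A9.
saved = COPYRIGHT.encode("utf-8")

# A parser trusting encoding="ISO-8859-1" reads each byte as one character.
misread = saved.decode("iso-8859-1")

# Writing that misread text back out as UTF-8 doubles the damage.
reencoded = misread.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (heuristic only)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(saved.hex(), reencoded.hex())
    # The UTF-8 bytes pass the check; a lone Latin-1 A9 byte does not.
    print(is_valid_utf8(saved), is_valid_utf8(b"\xa9"))
```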

    > Thanks for discussing this subject with me.
    >
    > P.S. I don't think that trying every possible encoding with
    > iconv is a good solution.


    If you're dealing with XML, an XML declaration with encoding="whatever"
    specified would only be recognized by an XML parser, not by iconv.
    There might be some solutions available that I'm not aware of, though;
    try Google.

    with respect,
    Toni Uusitalo
    Toni Uusitalo, Apr 1, 2004
    #3