Wrong default endianness in utf-16 and utf-32!?

Discussion in 'Python' started by jmfauth, Oct 12, 2010.

  1. jmfauth

    jmfauth Guest

    I hope my understanding is correct and I'm not dreaming.

    When no endianness is specified (the unmarked forms, as opposed to
    BE or LE), the Unicode Consortium specifies that the default byte
    serialization should be big-endian.

    See http://www.unicode.org/faq//utf_bom.html
    Q: Which of the UTFs do I need to support?
    Q: Why do some of the UTFs have a BE or LE in their label,
    such as UTF-16LE?

    (+ technical papers)

    It appears that Python works the opposite way.

    The same goes for utf-32, and for both utf-16 and utf-32 in Python 3.1.2.
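
    A minimal sketch of the kind of check that shows this (not the original
    session; the helper name `default_order` is made up for illustration, and
    the result depends on the byte order of the machine it runs on):

```python
import codecs
import sys


def default_order(codec="utf-16"):
    """Report which byte order the unmarked codec picks, judging by its BOM."""
    data = "a".encode(codec)
    if codec == "utf-16":
        bom_le, bom_be = codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE
    else:  # assumed to be "utf-32"
        bom_le, bom_be = codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE
    if data.startswith(bom_le):
        return "little"
    if data.startswith(bom_be):
        return "big"
    return "no BOM"


# Compare the codec's choice with the host's byte order.
print(sys.byteorder, default_order("utf-16"), default_order("utf-32"))
```

    On a typical x86 machine this reports "little" three times, which is
    what looks at odds with the big-endian default described in the FAQ.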

    I attempted to find some precise discussions on that subject
    and I failed.

    Any thoughts?
    jmfauth, Oct 12, 2010

  2. Python uses the host's endianness by default. So, on a little-endian
    machine, utf-16 and utf-32 will use little-endian encoding.
    When decoding, though, both of these codecs read the BOM, so
    there should be no interoperability problems.

    (Do note, though, that the explicit utf*-be and utf*-le variants do not
    add a BOM.)
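
    A short sketch of the behaviour described above, using only the standard
    `codecs` module (the variable names are illustrative):

```python
import codecs

text = "abc"

# The generic codec writes a BOM in the host's byte order ...
encoded = text.encode("utf-16")
has_bom = encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# ... and the decoder reads that BOM back, so the round trip works anywhere.
round_trip = encoded.decode("utf-16")

# The explicit variants fix the byte order and write no BOM.
be = text.encode("utf-16-be")  # b'\x00a\x00b\x00c'
le = text.encode("utf-16-le")  # b'a\x00b\x00c\x00'

# Data marked with either BOM decodes the same way with the generic codec.
from_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")

print(has_bom, round_trip == text, from_be == from_le == text)
```

    When writing data for another system that expects a particular byte
    order, naming utf-16-be or utf-16-le explicitly (and adding the BOM by
    hand if one is wanted) avoids depending on the host.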


    Antoine Pitrou, Oct 12, 2010

  3. jmfauth

    jmfauth Guest

    Thanks. I had never been aware of this.
    jmfauth, Oct 12, 2010
  4. John Machin

    John Machin Guest

    Sometimes it is necessary to read right to the end of an answer:

    Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE?

    A: [snip] the unmarked form uses big-endian byte serialization by default, but
    may include a byte order mark at the beginning to indicate the actual byte
    serialization used.
    John Machin, Oct 12, 2010
  5. jmfauth

    jmfauth Guest

    Well, English is not my native language; however, I think I read it.

    My question had nothing to do with the BOM, the encoding/decoding
    or the BOM inclusion. My question was:

    "What should I understand by 'utf-16'? 'utf-16-le' or 'utf-16-be'?"

    And Antoine gave an answer.
    jmfauth, Oct 13, 2010
