I hope my understanding is correct and I'm not dreaming.

When the endianness is not specified (that is, for the unmarked form, as opposed to the BE and LE forms), the Unicode Consortium specifies that the default byte serialization should be big-endian.

See http://www.unicode.org/faq//utf_bom.html
Q: Which of the UTFs do I need to support?
and
Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE?
(+ technical papers)

It appears Python works the opposite way:

True

Ditto with utf-32, and with utf-16/utf-32 in Python 3.1.2.

I tried to find a precise discussion of this subject and failed. Any thoughts?
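For anyone who wants to reproduce this, here is a minimal sketch of the kind of check I mean. It assumes Python 3 and CPython; the names `text`, `cases` and `results` are just illustrative. It only shows which bytes the generic `utf-16` and `utf-32` codecs produce compared with the explicit `-le`/`-be` codecs, not how the FAQ should be interpreted.

```python
import codecs
import sys

text = "abc"
little = sys.byteorder == "little"

# (generic codec, explicit codec matching this machine's byte order, matching BOM)
cases = [
    ("utf-16",
     "utf-16-le" if little else "utf-16-be",
     codecs.BOM_UTF16_LE if little else codecs.BOM_UTF16_BE),
    ("utf-32",
     "utf-32-le" if little else "utf-32-be",
     codecs.BOM_UTF32_LE if little else codecs.BOM_UTF32_BE),
]

results = {}
for generic, explicit, bom in cases:
    encoded = text.encode(generic)
    results[generic] = (
        encoded.startswith(bom),                       # output begins with a BOM
        encoded[len(bom):] == text.encode(explicit),   # payload is in native order
    )
    print(generic, encoded, results[generic])

# The explicit big-endian codec writes no BOM and puts the high byte first.
big_endian_16 = text.encode("utf-16-be")
print(big_endian_16)
```

On a little-endian machine, both generic codecs should therefore give a little-endian BOM followed by little-endian data, which is the behaviour I am asking about.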