PEP 8: Byte Order Mark (BOM) vs coding cookie

twyk · Aug 24, 2008

PEP 8 says ...

Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
cookie.

What about a BOM (Byte Order Mark)? Per Wikipedia ...

http://en.wikipedia.org/wiki/Byte-order_mark#endnote_UTF-8

'In UTF-8, this is not really a "byte order" mark. It identifies the
text as UTF-8 but doesn't say anything about the byte order, because
UTF-8 does not have byte order issues.'

So is it good style to omit the BOM in UTF-8 for Python 3.0?

Marc 'BlackJack' Rintsch · Aug 24, 2008

twyk said:
So is it good style to omit the BOM in UTF-8 for Python 3.0?

I'd say yes, because it is unnecessary with UTF-8 and it messes up the
shebang line of scripts.
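
To make the shebang point concrete, here is a minimal sketch. It assumes a Unix-like system, where the interpreter line is typically recognized only if the file's first two bytes are `#!`. The helper names `starts_with_shebang` and `strip_utf8_bom` are made up for illustration.

```python
import codecs

SHEBANG = b"#!/usr/bin/env python3\n"


def starts_with_shebang(data: bytes) -> bool:
    """Return True if the very first two bytes of the file are '#!'."""
    return data[:2] == b"#!"


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present, else return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


plain = SHEBANG + b"print('hi')\n"
with_bom = codecs.BOM_UTF8 + plain  # what an editor that writes a BOM would save

print(starts_with_shebang(plain))      # True
print(starts_with_shebang(with_bom))   # False: the file now starts with EF BB BF
print(strip_utf8_bom(with_bom) == plain)  # True
```

With the BOM in place the `#!` is pushed to bytes 3 and 4, so the script no longer starts with a shebang even though it looks identical in most editors.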

Ciao,
Marc 'BlackJack' Rintsch

Terry Reedy · Aug 25, 2008

twyk said:
PEP 8 says ...

Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
cookie.

What about a BOM (Byte Order Mark)? Per Wikipedia ...

http://en.wikipedia.org/wiki/Byte-order_mark#endnote_UTF-8

'In UTF-8, this is not really a "byte order" mark. It identifies the
text as UTF-8 but doesn't say anything about the byte order, because
UTF-8 does not have byte order issues.'

So is it good style to omit the BOM in UTF-8 for Python 3.0?

According to the Unicode Standard, yes.

http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf

The endian order entry for UTF-8 in Table 2-4 is marked N/A because
UTF-8 code units are 8 bits in size, and the usual machine issues of
endian order for larger code units do not apply. The serialized order of
the bytes must not depart from the order defined by the UTF-8
encoding form. Use of a BOM is neither required nor recommended for
UTF-8, but may be encountered in contexts where UTF-8 data is converted
from other encoding forms that use a BOM or where the BOM is used as a
UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8,
Specials, for more information.
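
The byte-order point in the passage above can be checked with a short sketch. It only uses constants from the standard `codecs` module; the variable names are illustrative.

```python
import codecs

bom = "\ufeff"  # U+FEFF, the code point used as the byte order mark

# UTF-16 has two possible serializations, so the mark really does signal order.
utf16_le = bom.encode("utf-16-le")
utf16_be = bom.encode("utf-16-be")

# UTF-8 has exactly one serialization, so the same code point can only act
# as a signature that says "this is UTF-8".
utf8 = bom.encode("utf-8")

print(utf16_le == codecs.BOM_UTF16_LE)  # True, bytes FF FE
print(utf16_be == codecs.BOM_UTF16_BE)  # True, bytes FE FF
print(utf8 == codecs.BOM_UTF8)          # True, bytes EF BB BF
```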

Since ASCII files *are*, by intentional design, UTF-8 files, and since
Python assumes ASCII (2.x) or UTF-8 (3.0) as the default source encoding
in the absence of a coding cookie, it does not need the signature.
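
As a rough illustration of that default, the sketch below asks the standard `tokenize` module which encoding it would pick for a few byte strings. It assumes a current Python 3; `source_encoding` is a hypothetical helper, not something from PEP 8.

```python
import codecs
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding the tokenizer detects for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


no_cookie = b"print('hello')\n"
with_bom = codecs.BOM_UTF8 + no_cookie
with_cookie = b"# -*- coding: latin-1 -*-\nprint('hello')\n"

print(source_encoding(no_cookie))    # utf-8: the default, no BOM or cookie needed
print(source_encoding(with_bom))     # utf-8-sig: the BOM is noticed and skipped
print(source_encoding(with_cookie))  # iso-8859-1: the cookie overrides the default

# When reading other text files, the plain codec keeps the mark as a character,
# while the "utf-8-sig" codec drops it.
print(with_bom.decode("utf-8").startswith("\ufeff"))              # True
print(with_bom.decode("utf-8-sig") == no_cookie.decode("utf-8"))  # True
```

So a BOM is tolerated in source files, but a file with neither BOM nor cookie is already read as UTF-8.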
