PEP 8: Byte Order Mark (BOM) vs coding cookie

Discussion in 'Python' started by twyk, Aug 24, 2008.

  1. twyk

    twyk Guest

    PEP 8 says ...

    Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
    cookie.
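For reference, a coding cookie is a comment such as "# -*- coding: utf-8 -*-" on the first or second line of a source file. The sketch below shows how one might detect it. It uses the cookie regular expression from PEP 263 and follows the tokenizer's rule that line 2 is only consulted when line 1 is blank or a comment. The helper names find_coding_cookie and has_redundant_cookie are hypothetical, and the "redundant cookie" check is only an assumed way of expressing the PEP 8 advice, not an official tool.

```python
import re

# Cookie pattern as given in PEP 263 (applied to raw bytes here).
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# A line that is blank or a pure comment; only then is line 2 examined.
BLANK_OR_COMMENT_RE = re.compile(rb"^[ \t\f]*(?:[#\r\n]|$)")


def find_coding_cookie(source: bytes):
    """Return the declared encoding name, or None if there is no cookie."""
    for line in source.splitlines()[:2]:
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
        # Line 2 only counts if line 1 was blank or a comment.
        if not BLANK_OR_COMMENT_RE.match(line):
            break
    return None


def has_redundant_cookie(source: bytes) -> bool:
    """Hypothetical style check: flag an ASCII/UTF-8 cookie that PEP 8 says to omit."""
    name = find_coding_cookie(source)
    if name is None:
        return False
    return name.lower().replace("_", "-") in {"utf-8", "utf8", "ascii", "us-ascii"}


print(has_redundant_cookie(b"# -*- coding: utf-8 -*-\nprint('hi')\n"))  # True
print(has_redundant_cookie(b"# coding: latin-1\nprint('hi')\n"))        # False
```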

    What about a BOM (Byte Order Mark)? Per Wikipedia ...

    http://en.wikipedia.org/wiki/Byte-order_mark#endnote_UTF-8)

    'In UTF-8, this is not really a "byte order" mark. It identifies the
    text as UTF-8 but doesn't say anything about the byte order, because
    UTF-8 does not have byte order issues.'
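The Wikipedia point can be checked directly in Python. The BOM is the character U+FEFF. In UTF-8 it always serializes to the same three bytes, whereas in UTF-16 its two possible byte sequences reveal the endianness. This is a small illustration using only the standard codecs module:

```python
import codecs

# U+FEFF is the character that serves as a byte order mark.
BOM_CHAR = "\ufeff"

utf8_bom = BOM_CHAR.encode("utf-8")
utf16_le = BOM_CHAR.encode("utf-16-le")
utf16_be = BOM_CHAR.encode("utf-16-be")

# UTF-8 has one serialization of U+FEFF on every machine, so it can act
# only as a signature ("this is UTF-8"), never as a byte-order indicator.
assert utf8_bom == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# UTF-16 code units are two bytes wide, so the order of the BOM's bytes
# really does tell a reader which endianness the writer used.
assert utf16_le == codecs.BOM_UTF16_LE == b"\xff\xfe"
assert utf16_be == codecs.BOM_UTF16_BE == b"\xfe\xff"

for name, data in [("utf-8", utf8_bom), ("utf-16-le", utf16_le), ("utf-16-be", utf16_be)]:
    print(f"{name:9} {data.hex(' ')}")
```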

    So is it good style to omit the BOM in UTF-8 for Python 3.0?
     
    twyk, Aug 24, 2008
    #1

  2. On Sun, 24 Aug 2008 07:28:53 -0700, twyk wrote:

    > So is it good style to omit the BOM in UTF-8 for Python 3.0?


    I'd say yes because it is unnecessary with UTF-8 and it messes up the
    shebang line of scripts.
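The shebang problem is visible at the byte level. On typical Unix-like systems, a file is run as an interpreter script only if its first two bytes are "#!", and a UTF-8 BOM puts EF BB BF in front of them. The helper names shebang_ok and strip_utf8_bom below are hypothetical. This is a minimal sketch, not a standard tool:

```python
import codecs


def shebang_ok(data: bytes) -> bool:
    """True if the file's first two bytes are '#!' (what a Unix exec loader checks)."""
    return data.startswith(b"#!")


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 BOM (EF BB BF) if present; leave other bytes untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


script = codecs.BOM_UTF8 + b"#!/usr/bin/env python3\nprint('hello')\n"
print(shebang_ok(script))                  # False: the BOM precedes '#!'
print(shebang_ok(strip_utf8_bom(script)))  # True
```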

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Aug 24, 2008
    #2

  3. Terry Reedy

    Terry Reedy Guest

    twyk wrote:
    > PEP 8 says ...
    >
    > Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
    > cookie.


    > What about a BOM (Byte Order Mark)? Per Wikipedia ...
    >
    > http://en.wikipedia.org/wiki/Byte-order_mark#endnote_UTF-8)
    >
    > 'In UTF-8, this is not really a "byte order" mark. It identifies the
    > text as UTF-8 but doesn't say anything about the byte order, because
    > UTF-8 does not have byte order issues.'
    >
    > So is it good style to omit the BOM in UTF-8 for Python 3.0?


    According to the Unicode manual, yes.

    http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf

    The endian order entry for UTF-8 in Table 2-4 is marked N/A because
    UTF-8 code units are 8 bits in size, and the usual machine issues of
    endian order for larger code units do not apply. The serialized order of
    the bytes must not depart from the order defined by the UTF-8
    encoding form. Use of a BOM is neither required nor recommended for
    UTF-8, but may be encountered in contexts where UTF-8 data is converted
    from other encoding forms that use a BOM or where the BOM is used as a
    UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8,
    Specials, for more information.
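Because a UTF-8 signature "may be encountered" even though it is not recommended, a reader can accept input with or without it. Python's standard "utf-8-sig" codec does this: its decoder drops one leading BOM if present and otherwise behaves like plain UTF-8, while its encoder always writes the BOM. This is a short illustration:

```python
text = "print('hi')\n"

plain = text.encode("utf-8")
signed = text.encode("utf-8-sig")  # the utf-8-sig encoder prepends EF BB BF

assert signed == b"\xef\xbb\xbf" + plain

# Plain UTF-8 decoding keeps the signature as a U+FEFF character...
assert signed.decode("utf-8").startswith("\ufeff")

# ...while "utf-8-sig" strips it and gives the same result for both inputs.
assert signed.decode("utf-8-sig") == plain.decode("utf-8-sig") == text
```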

    Since ASCII files *are*, by intentional design, UTF-8 files, and since
    Python assumes ASCII/UTF-8 by default when there is no coding cookie,
    such a file does not need the signature.
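In Python 3, the standard library exposes the tokenizer's source-encoding logic as tokenize.detect_encoding, which makes this default easy to observe. It returns "utf-8" when there is neither a cookie nor a BOM, "utf-8-sig" when a BOM is present, and the normalized cookie name otherwise. The wrapper name source_encoding is a hypothetical convenience:

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python 3's tokenizer would use for these source bytes."""
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


print(source_encoding(b"print('hi')\n"))                      # utf-8 (default)
print(source_encoding(b"\xef\xbb\xbfprint('hi')\n"))           # utf-8-sig (BOM)
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1 (cookie)
```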
     
    Terry Reedy, Aug 25, 2008
    #3
