PEP 8: Byte Order Mark (BOM) vs coding cookie

Discussion in 'Python' started by twyk, Aug 24, 2008.

  1. twyk

    twyk Guest

    PEP 8 says ...

    Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
    cookie.

    What about a BOM (Byte Order Mark)? Per Wikipedia ...

    http://en.wikipedia.org/wiki/Byte-order_mark#endnote_UTF-8)

    'In UTF-8, this is not really a "byte order" mark. It identifies the
    text as UTF-8 but doesn't say anything about the byte order, because
    UTF-8 does not have byte order issues.'

    So is it good style to omit the BOM in UTF-8 for Python 3.0?
     
    twyk, Aug 24, 2008
    #1
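
One way to see the Wikipedia point in Python itself is to encode U+FEFF, the code point used as a BOM, with a few standard codecs. UTF-8 always produces the same three bytes, while UTF-16 has two variants because its 16-bit code units can be written in either byte order. This is a minimal sketch assuming Python 3; the variable names are illustrative only.

```python
import codecs

# U+FEFF is the code point used as a byte order mark / signature.
BOM_CHAR = "\ufeff"

# In UTF-8 it always serializes to the same three bytes, so it carries
# no byte-order information; at most it acts as a "this is UTF-8" signature.
utf8_bom = BOM_CHAR.encode("utf-8")

# In UTF-16 the 16-bit code unit can be stored little- or big-endian,
# which is why two distinct BOMs exist.
utf16_le_bom = BOM_CHAR.encode("utf-16-le")
utf16_be_bom = BOM_CHAR.encode("utf-16-be")

print(utf8_bom == codecs.BOM_UTF8, utf8_bom)          # True b'\xef\xbb\xbf'
print(utf16_le_bom == codecs.BOM_UTF16_LE, utf16_le_bom)  # True b'\xff\xfe'
print(utf16_be_bom == codecs.BOM_UTF16_BE, utf16_be_bom)  # True b'\xfe\xff'
```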

  2. On Sun, 24 Aug 2008 07:28:53 -0700, twyk wrote:

    > So is it good style to omit the BOM in UTF-8 for Python 3.0?


    I'd say yes, because it is unnecessary with UTF-8 and it messes up the
    shebang line of scripts.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Aug 24, 2008
    #2
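
To illustrate the shebang problem: on Unix-like systems a script is only run through its interpreter if the file starts with the two bytes `#!`, and a UTF-8 BOM in front of them hides that. The sketch below assumes Python 3; `has_usable_shebang` and `strip_utf8_bom` are hypothetical helper names, not part of any library.

```python
import codecs

SHEBANG = b"#!"

def has_usable_shebang(data: bytes) -> bool:
    """Hypothetical helper: does the file content begin with '#!'?"""
    # A leading BOM (EF BB BF) means the first two bytes are not '#!'.
    return data.startswith(SHEBANG)

def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

script = b"#!/usr/bin/env python3\nprint('hello')\n"
script_with_bom = codecs.BOM_UTF8 + script

print(has_usable_shebang(script))                           # True
print(has_usable_shebang(script_with_bom))                  # False
print(has_usable_shebang(strip_utf8_bom(script_with_bom)))  # True
```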

  3. Terry Reedy

    Terry Reedy Guest

    twyk wrote:
    > PEP 8 says ...
    >
    > Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
    > cookie.


    > What about a BOM (Byte Order Mark)? Per Wikipedia ...
    >
    > http://en.wikipedia.org/wiki/Byte-order_mark#endnote_UTF-8)
    >
    > 'In UTF-8, this is not really a "byte order" mark. It identifies the
    > text as UTF-8 but doesn't say anything about the byte order, because
    > UTF-8 does not have byte order issues.'
    >
    > So is it good style to omit the BOM in UTF-8 for Python 3.0?


    According to the Unicode manual, yes.

    http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf

    The endian order entry for UTF-8 in Table 2-4 is marked N/A because
    UTF-8 code units are 8 bits in size, and the usual machine issues of
    endian order for larger code units do not apply. The serialized order of
    the bytes must not depart from the order defined by the UTF-8
    encoding form. Use of a BOM is neither required nor recommended for
    UTF-8, but may be encountered in contexts where UTF-8 data is converted
    from other encoding forms that use a BOM or where the BOM is used as a
    UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8,
    Specials, for more information.
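
In Python terms, "may be encountered ... as a UTF-8 signature" maps onto the standard `utf-8-sig` codec: decoding with it skips a leading BOM if one is there, while plain `utf-8` encoding never writes one. A minimal sketch assuming Python 3; `read_utf8_text` is a hypothetical helper name.

```python
import codecs

def read_utf8_text(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8, tolerating an optional signature."""
    # "utf-8-sig" drops a leading BOM if present, else behaves like "utf-8".
    return data.decode("utf-8-sig")

plain = "caf\u00e9".encode("utf-8")       # no BOM is written
signed = "caf\u00e9".encode("utf-8-sig")  # BOM is prepended

print(signed.startswith(codecs.BOM_UTF8), plain.startswith(codecs.BOM_UTF8))  # True False
print(read_utf8_text(plain) == read_utf8_text(signed) == "caf\u00e9")        # True
# Plain "utf-8" decoding keeps the BOM as a U+FEFF character.
print(signed.decode("utf-8")[0] == "\ufeff")                                  # True
```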

    Since ASCII files *are*, by intentional design, UTF-8 files, and since
    Python assumes ASCII/UTF-8 as the default in the absence of a coding
    cookie, it does not need the signature.
     
    Terry Reedy, Aug 25, 2008
    #3
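
The standard library's `tokenize.detect_encoding` applies the same rules the interpreter uses for source files, so it gives a quick way to check what Python assumes with and without a BOM or coding cookie. This is a sketch assuming a current Python 3; `source_encoding` is a hypothetical wrapper name.

```python
import io
import tokenize

def source_encoding(source: bytes) -> str:
    """Hypothetical wrapper: which encoding would Python use for this source?"""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

print(source_encoding(b"x = 1\n"))                             # utf-8
print(source_encoding(b"\xef\xbb\xbfx = 1\n"))                 # utf-8-sig
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1

try:
    # A UTF-8 signature that contradicts the coding cookie is rejected.
    source_encoding(b"\xef\xbb\xbf# -*- coding: latin-1 -*-\nx = 1\n")
except SyntaxError as exc:
    print("SyntaxError:", exc)
```

With no BOM and no cookie the answer is plain `utf-8`, which is the default Terry describes; the signature only changes the reported codec to `utf-8-sig`.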
