Why BOM in logging message?

Discussion in 'Python' started by Roy Smith, Jan 9, 2013.

  1. Roy Smith

    Roy Smith Guest

    We've got 10 (supposedly) identical servers, all running Ubuntu 12.04,
    Python 2.7, Django 1.3. We log to syslog using the logging module and
    a custom fomatter.

    'formatters': {
    'verbose': {
    'format': '%(asctime)s [%(process)d]: %(program)s %(session_id)s %(request_id)s %(request_id_digest)s %(remote_addr)s %(name)s %(level\
    name)s %(funcName)s() %(message)s',
    '()': 'songza.logging.ContextFormatter',
    },
    },

    There's nothing particularly exciting in the formatter code:

    class ContextFormatter(logging.Formatter):
    def format(self, record):
    record.program = context.get_program()
    record.request_id = context.get_request_id()
    record.request_id_digest = context.get_request_id_digest()
    record.session_id = context.get_session_id() or '-'
    record.remote_addr = context.get_remote_addr() or '-'
    return logging.Formatter.format(self, record)

    What's weird is that two of the servers, and only those two, stick a
    BOM (Byte Order Mark) in front of the message they log. It shows up
    in syslog as:

    2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]

    The other machines, never put the BOM in. Given that all 10 machines
    are ostensibly clones of each other, we're scratching our heads what
    might be different about those two which cause the BOMs to appear.
    Any ideas?

    I suppose it's possible it's a syslog config problem and not a Python
    problem, but I figured I'd start at the beginning of the chain.
    Roy Smith, Jan 9, 2013
    #1
    1. Advertising

  2. On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <> wrote:
    > What's weird is that two of the servers, and only those two, stick a
    > BOM (Byte Order Mark) in front of the message they log.


    Could it be this issue you're looking at?

    http://bugs.python.org/issue14452

    What are the exact Python versions in use? Are the two different
    servers running an older revision of Py 2.7?

    ChrisA
    Chris Angelico, Jan 9, 2013
    #2
    1. Advertising

  3. Roy Smith

    John Gordon Guest

    In <kcksg2$bkp$> (Roy Smith) writes:

    > What's weird is that two of the servers, and only those two, stick a
    > BOM (Byte Order Mark) in front of the message they log. It shows up
    > in syslog as:


    > 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]


    I worked on an application that would insert a BOM in syslog messages if
    the logged message contained unicode, but not if it was plain ascii.

    Not sure if this relates to your issue, but it's similar enough that it
    seemed worth mentioning.

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
    John Gordon, Jan 9, 2013
    #3
  4. Roy Smith

    Roy Smith Guest

    In article <kcktti$sbg$>,
    John Gordon <> wrote:

    > In <kcksg2$bkp$> (Roy Smith) writes:
    >
    > > What's weird is that two of the servers, and only those two, stick a
    > > BOM (Byte Order Mark) in front of the message they log. It shows up
    > > in syslog as:

    >
    > > 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754
    > > [18979]: [etc...]

    >
    > I worked on an application that would insert a BOM in syslog messages if
    > the logged message contained unicode, but not if it was plain ascii.
    >
    > Not sure if this relates to your issue, but it's similar enough that it
    > seemed worth mentioning.


    That doesn't seem to be it. All messages from web{2,5} have BOMs, no
    message from web{1,3,4,6,7,8,9,10} ever does.

    I even tried looking at the output of socket.gethostname() on the
    various machines to see if maybe the hostname had some unicode character
    in it. No joy.
    Roy Smith, Jan 10, 2013
    #4
  5. Roy Smith

    Roy Smith Guest

    In article <>,
    Chris Angelico <> wrote:

    > On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <> wrote:
    > > What's weird is that two of the servers, and only those two, stick a
    > > BOM (Byte Order Mark) in front of the message they log.

    >
    > Could it be this issue you're looking at?
    >
    > http://bugs.python.org/issue14452
    >
    > What are the exact Python versions in use? Are the two different
    > servers running an older revision of Py 2.7?
    >
    > ChrisA


    It sounds like it might be it, but we're running 2.7.3 on all machines.
    Roy Smith, Jan 10, 2013
    #5
  6. Roy Smith

    Roy Smith Guest

    In article <>,
    Roy Smith <> wrote:
    >In article <>,
    > Chris Angelico <> wrote:
    >
    >> On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <> wrote:
    >> > What's weird is that two of the servers, and only those two, stick a
    >> > BOM (Byte Order Mark) in front of the message they log.

    >>
    >> Could it be this issue you're looking at?
    >>
    >> http://bugs.python.org/issue14452
    >>
    >> What are the exact Python versions in use? Are the two different
    >> servers running an older revision of Py 2.7?
    >>
    >> ChrisA

    >
    >It sounds like it might be it, but we're running 2.7.3 on all machines.


    Well, this is fascinating. It turns out that while all of our
    machines report that they're running 2.7.3, they have two different
    versions of /usr/lib/python2.7/logging/handlers.py!

    -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
    -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py

    The April 24th version has the BOM code in SysLogHander.emit():

    | # Message is a string. Convert to bytes as required by RFC 5424
    | if type(msg) is unicode:
    | msg = msg.encode('utf-8')
    | if codecs:
    | msg = codecs.BOM_UTF8 + msg
    | msg = prio + msg

    and the August 1st version doesn't:

    | # Message is a string. Convert to bytes as required by RFC 5424
    | if type(msg) is unicode:
    | msg = msg.encode('utf-8')
    | msg = prio + msg

    Is it possible there's two different 2.7.3 builds that somehow got
    packaged by Ubuntu at different times? The pattern of which machines
    have the August code and which have the April code correlates with
    when we rolled out each server instance.
    Roy Smith, Jan 10, 2013
    #6
  7. On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <> wrote:
    > Well, this is fascinating. It turns out that while all of our
    > machines report that they're running 2.7.3, they have two different
    > versions of /usr/lib/python2.7/logging/handlers.py!
    >
    > -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
    > -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py


    Ha, that would do it!

    I don't have a corresponding system to compare against, but what
    package is that file managed by?

    $ dpkg -S /usr/lib/python2.7/logging/handlers.py

    See if both systems show it as part of the same package, and if so, if
    the package is at the same version on each. On my Maverick desktop, I
    have 2.6, and the package is python2.6-minimal.

    ChrisA
    Chris Angelico, Jan 10, 2013
    #7
  8. Roy Smith

    Roy Smith Guest

    >On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <> wrote:
    >> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
    >> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py


    Chris Angelico <> wrote:
    >$ dpkg -S /usr/lib/python2.7/logging/handlers.py


    Yup, on some machines we've got 2.7.3-0ubuntu3, and on others,
    2.7.3-0ubuntu3.1 of python2.7-minimal. Details at:

    https://launchpad.net/ubuntu/ source/python2.7/2.7.3-0ubuntu3.1

    Well, I guess today is dist-upgrade day :)
    Roy Smith, Jan 10, 2013
    #8
  9. On Fri, Jan 11, 2013 at 3:40 AM, Roy Smith <> wrote:
    >>On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <> wrote:
    >>> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
    >>> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py

    >
    > Chris Angelico <> wrote:
    >>$ dpkg -S /usr/lib/python2.7/logging/handlers.py

    >
    > Yup, on some machines we've got 2.7.3-0ubuntu3, and on others,
    > 2.7.3-0ubuntu3.1 of python2.7-minimal. Details at:
    >
    > https://launchpad.net/ubuntu/ source/python2.7/2.7.3-0ubuntu3.1


    I love it when everything adds up! The message even cites the specific
    change. It's like a cryptic crossword - once you get the answer, you
    KNOW it's the answer because suddenly it all makes sense :)

    Thanks for bringing a fun problem to solve! It's (unfortunately) a
    refreshing change to read a post from someone who knows how to ask
    smart questions.

    ChrisA
    Chris Angelico, Jan 10, 2013
    #9
  10. Roy Smith

    Terry Reedy Guest

    On 1/10/2013 11:06 AM, Roy Smith wrote:

    > Well, this is fascinating. It turns out that while all of our
    > machines report that they're running 2.7.3, they have two different
    > versions of /usr/lib/python2.7/logging/handlers.py!
    >
    > -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
    > -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py
    >
    > The April 24th version has the BOM code in SysLogHander.emit():


    The 'BOM' you are discussing here is not a true 2 or 4 byte
    byte-order-mark, but the pseudo-BOM (pseudo because a stream of single
    bytes has no byte order within the single bytes) that Micro$tupid adds
    (sometimes) to utf-8 encoded bytes to mark their encoding as utf-8
    rather than anything else. In otherwords, it is a non-(anti-)standard
    U(nicode)E(ncoding)M(ark). It is actually the utf-8 encoding of the
    2-byte BOM, and hence not a single mark! It is really confusing when
    people use 'BOM' to refer to a UEM sequence instead of a BOM.

    > | # Message is a string. Convert to bytes as required by RFC 5424
    > | if type(msg) is unicode:
    > | msg = msg.encode('utf-8')
    > | if codecs:
    > | msg = codecs.BOM_UTF8 + msg
    > | msg = prio + msg


    2.7.3 was released in April.

    > and the August 1st version doesn't:
    >
    > | # Message is a string. Convert to bytes as required by RFC 5424
    > | if type(msg) is unicode:
    > | msg = msg.encode('utf-8')
    > | msg = prio + msg


    The issue referenced in an earlier messaged was to remove the UEM where
    it did not belong.

    > Is it possible there's two different 2.7.3 builds that somehow got
    > packaged by Ubuntu at different times?


    As you discovered, Ubuntu sometimes updates a release with bug patches
    before we release a new official bugfix release. 2.7.4, with many
    bugfixes, is still to see the light of day.

    > The pattern of which machines
    > have the August code and which have the April code correlates with
    > when we rolled out each server instance.


    Great detective work ;-).

    --
    Terry Jan Reedy
    Terry Reedy, Jan 10, 2013
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. X_AWemner_X
    Replies:
    3
    Views:
    3,771
    NoName NoName
    Jul 3, 2003
  2. X_AWieminer_X
    Replies:
    2
    Views:
    12,281
    JomoFrodo
    Aug 18, 2011
  3. Mr. SweatyFinger

    why why why why why

    Mr. SweatyFinger, Nov 28, 2006, in forum: ASP .Net
    Replies:
    4
    Views:
    862
    Mark Rae
    Dec 21, 2006
  4. Mr. SweatyFinger
    Replies:
    2
    Views:
    1,762
    Smokey Grindel
    Dec 2, 2006
  5. Replies:
    3
    Views:
    981
    Gabriel Genellina
    Jul 23, 2009
Loading...

Share This Page