codec for UTF-8 with BOM

  • Thread starter Ulrich Eckhardt

Ulrich Eckhardt

Hi!

I want to write a file starting with the BOM and using UTF-8, and stumbled
across some problems:

1. I would have expected one of the codecs to be 'UTF-8 with BOM' or
something like that, but I can't find the correct name. Also, I can't find a
way to get a list of the supported codecs at all, which strikes me as odd.


2. I couldn't find a way to write the BOM either. Writing codecs.BOM doesn't
work, as it is an already encoded byte string. Of course, I can write
u'\ufeff', but I'd rather avoid such magic numbers in my code.


3. The docs mention encodings.utf_8_sig, available since 2.5, but I can't
locate that thing there either. What's going on here?
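For reference on item 2 (an editor's sketch in Python 3 syntax, not part of the original thread): the BOM constants in the codecs module are bytes, but decoding codecs.BOM_UTF8 gives the text-level BOM, so the literal u'\ufeff' never has to appear in the code. Looking the character up by its Unicode name is another option. The variable names are illustrative only.

```python
import codecs
import unicodedata

# codecs.BOM_UTF8 is already encoded: the 3-byte sequence EF BB BF
bom_bytes = codecs.BOM_UTF8

# Decoding it yields the single code point U+FEFF, the text form of the BOM
bom_text = bom_bytes.decode("utf-8")

# Alternatively, look the character up by its official Unicode name
bom_by_name = unicodedata.lookup("ZERO WIDTH NO-BREAK SPACE")

print(repr(bom_bytes), repr(bom_text), bom_text == bom_by_name)
# prints: b'\xef\xbb\xbf' '\ufeff' True
```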


What would you do?

Uli
 

Chris Rebert

Ulrich said:
Hi!

I want to write a file starting with the BOM and using UTF-8, and stumbled
across some problems:

1. I would have expected one of the codecs to be 'UTF-8 with BOM' or
something like that, but I can't find the correct name. Also, I can't find a
way to get a list of the supported codecs at all, which strikes me as odd.

If nothing else, there's
http://docs.python.org/library/codecs.html#standard-encodings
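Since there is no single "list all codecs" call, one workaround is to scan the encodings package and keep only the modules the codec registry accepts. This is a minimal sketch in Python 3 syntax, not from the thread; it assumes every codec lives in its own module under encodings/, which holds for the standard codecs but not for third-party ones registered via codecs.register(). The helper name available_codecs() is hypothetical.

```python
import codecs
import encodings
import pkgutil

def available_codecs():
    """Hypothetical helper: names of codec modules in the encodings package."""
    names = []
    for info in pkgutil.iter_modules(encodings.__path__):
        try:
            # codecs.lookup() imports the module and checks that it is a codec;
            # helper modules (e.g. "aliases") and platform-specific ones
            # (e.g. "mbcs" outside Windows) raise LookupError instead
            codecs.lookup(info.name)
        except LookupError:
            continue
        names.append(info.name)
    return sorted(names)

codec_names = available_codecs()
print(len(codec_names), "utf_8_sig" in codec_names)
```

Note that this lists module names only; the many aliases (such as "utf8" or "latin-1") are kept separately in encodings.aliases.aliases.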

The correct name, as you found below and as is corroborated by the
webpage, seems to be "utf_8_sig". Encoding u'FO\xf8bar' with it gives
'\xef\xbb\xbfFO\xc3\xb8bar', i.e. the three BOM bytes followed by the UTF-8 data.

This could definitely be documented more straightforwardly.
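To make that concrete for writing a file, here is a minimal sketch in Python 3 syntax (not from the thread; the function name write_with_bom and the file name demo.txt are hypothetical). Opening the file with the "utf_8_sig" codec writes the BOM automatically before the first character, and reading it back with the same codec strips the BOM again.

```python
import codecs
import os
import tempfile

def write_with_bom(path, text):
    # The "utf_8_sig" codec emits the UTF-8 BOM once, before the first
    # character, so no explicit BOM write is needed
    with open(path, "w", encoding="utf_8_sig") as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    write_with_bom(path, "FO\u00f8bar")

    with open(path, "rb") as f:
        raw = f.read()
    print(raw)  # b'\xef\xbb\xbfFO\xc3\xb8bar'

    # Reading back with the same codec removes the BOM from the text
    with open(path, encoding="utf_8_sig") as f:
        text = f.read()
    print(raw.startswith(codecs.BOM_UTF8), text == "FO\u00f8bar")  # True True
```

Decoding the same bytes with plain "utf-8" instead would keep a leading '\ufeff' in the text.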

3. The docs mention encodings.utf_8_sig, available since 2.5, but I can't
locate that thing there either. What's going on here?

Works for me™:
Python 2.6.6 (r266:84292, Jan 12 2011, 13:35:00)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from encodings import utf_8_sig
Cheers,
Chris
 

Ulrich Eckhardt

Chris said:
3. The docs mention encodings.utf_8_sig, available since 2.5, but I can't
locate that thing there either. What's going on here?

Works for me™:
Python 2.6.6 (r266:84292, Jan 12 2011, 13:35:00)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from encodings import utf_8_sig

This works for me, too. What I tried and what failed was

import encodings
encodings.utf_8_sig

which raises an AttributeError, or dir(encodings), which doesn't list the
corresponding entry. If I do it your way, the encoding then shows up in the
module's contents.

Apart from the encoding issue, I don't understand this behaviour. Is the
module behaving badly or are my expectations simply flawed?


Thanks!

Uli
 

Peter Otten

Ulrich said:
Chris said:
3. The docs mention encodings.utf_8_sig, available since 2.5, but I
can't locate that thing there either. What's going on here?

Works for me™:
Python 2.6.6 (r266:84292, Jan 12 2011, 13:35:00)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from encodings import utf_8_sig

This works for me, too. What I tried and what failed was

import encodings
encodings.utf_8_sig

which raises an AttributeError, or dir(encodings), which doesn't list the
corresponding entry. If I do it your way, the encoding then shows up in the
module's contents.

Apart from the encoding issue, I don't understand this behaviour. Is the
module behaving badly or are my expectations simply flawed?

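This is standard Python package behaviour:
>>> import logging
>>> logging.handlers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'handlers'
>>> import logging.handlers
>>> logging.handlers
<module 'logging.handlers' from '/usr/lib/python2.6/logging/handlers.pyc'>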
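The same behaviour can be reproduced without relying on any particular stdlib package. This is a minimal sketch in Python 3 syntax, not from the thread: it builds a throwaway package with the hypothetical name demo_pkg (an empty __init__.py plus one submodule, mirroring encodings/ and encodings/utf_8_sig.py) and checks whether the submodule is an attribute of the package before and after importing it explicitly.

```python
import importlib
import os
import sys
import tempfile

# Throwaway package "demo_pkg" (hypothetical) with an empty __init__.py
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("VALUE = 42\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

import demo_pkg
# False: importing a package does not import its submodules
before = hasattr(demo_pkg, "sub")

import demo_pkg.sub
# True: importing the submodule also binds it as an attribute of the package
after = hasattr(demo_pkg, "sub")

print(before, after)  # False True
```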

You would avoid the AttributeError only if encodings/__init__.py contained
a line

from . import utf_8_sig

or similar. The most notable module that acts this way is probably os, which
eagerly imports a suitable path module (e.g. posixpath or ntpath) as os.path,
depending on the platform.
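For contrast, here is a sketch of the eager variant (Python 3 syntax, not from the thread). The package name eager_pkg is hypothetical; its __init__.py contains the kind of "from . import ..." line described above, so the submodule is available right after importing the package. The last line checks which path module os picked.

```python
import importlib
import os
import sys
import tempfile

# Throwaway package "eager_pkg" (hypothetical) whose __init__.py
# imports its submodule eagerly
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "eager_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from . import sub\n")
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("VALUE = 42\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

eager_pkg = importlib.import_module("eager_pkg")
# Works without ever importing eager_pkg.sub explicitly
print(eager_pkg.sub.VALUE)  # 42

# os does the same for its path module: posixpath or ntpath
print(os.path.__name__)
```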

As you cannot foresee which encodings a script will actually need, it
makes sense to omit such a just-in-case import.
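Instead, the codec machinery imports an encoding module on demand, the first time that encoding is requested. A minimal sketch in Python 3 syntax (not from the thread) showing that after codecs.lookup() the module is loaded and bound on the encodings package, which is why it only then shows up in dir(encodings):

```python
import codecs
import sys

import encodings

# Requesting the codec makes the registry import encodings.utf_8_sig
info = codecs.lookup("utf-8-sig")
print(info.name)                             # utf-8-sig
print("encodings.utf_8_sig" in sys.modules)  # True
# Importing the submodule also bound it as an attribute of the package
print(hasattr(encodings, "utf_8_sig"))       # True
```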
 
