I'm using Python 3.3 (CPython) and am having trouble getting the standard
gettext module to handle Unicode messages.
My problem can be isolated as follows:

I have 3 files in a folder:, greeting.po and

-- --
import gettext

t = gettext.translation("greeting", "locale", ["pt"])
_ = t.lgettext

print("_charset = {0}\n".format(t._charset))
print(_("hello"))
-- EOF --

-- greeting.po --
msgid ""
msgstr ""
"Project-Id-Version: 1.0\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "hello"
msgstr "olá"
-- EOF --

was downloaded from, since
this tool apparently isn't included in the python3 package available in the
official Arch Linux repositories.

It's probably also worth noting that greeting.po is itself encoded as
UTF-8.
From that folder, I run the following commands:

$ mkdir -p locale/pt/LC_MESSAGES
$ python -o !$/ greeting.po
$ python

The output is:
_charset = UTF-8

Traceback (most recent call last):
  File "", line 7, in <module>
  File "/usr/lib/python3.3/", line 314, in lgettext
    return tmsg.encode(locale.getpreferredencoding())
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 2: ordinal not in range(128)

My interpretation of this output is that even though gettext correctly
detects the MO file charset as UTF-8, it tries to encode the translated
message with the system's "preferred encoding", which happens to be ASCII.
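To check that interpretation outside of gettext, here is a minimal sketch
that mimics the encode call shown in the traceback. The helper name
encode_like_lgettext is my own (it is not part of gettext), and the "ascii"
argument is an assumption standing in for whatever
locale.getpreferredencoding() reports on my system:

```python
import locale

msg = "ol\u00e1"  # "olá", the translated string from greeting.po


def encode_like_lgettext(text, encoding):
    """Hypothetical helper: same str.encode call as in the traceback."""
    return text.encode(encoding)


# What the interpreter considers the preferred encoding on this machine.
print("preferred encoding:", locale.getpreferredencoding())

error = None
try:
    # Assumption: the preferred encoding is ASCII, as on my system.
    encode_like_lgettext(msg, "ascii")
except UnicodeEncodeError as exc:
    error = exc
    print("ascii failed:", exc)

# The same call succeeds once the target encoding can represent the character.
utf8_bytes = encode_like_lgettext(msg, "utf-8")
print("utf-8 worked:", utf8_bytes)
```

If this reproduces the same UnicodeEncodeError, the failure would seem to
come from the target encoding rather than from the MO file itself.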

Does anyone know why this happens? Is this a bug in my code? Maybe I have
misunderstood gettext...




