Problems with gettext and msgfmt

JKPeck · Dec 15, 2009

I'm using Python 2.6 on Windows and having trouble with the charset in
gettext. It seems to be so broken that I must be missing something.

When I run msgfmt.py, as far as I can see it writes no charset
information into the mo file. The actual po files are in utf-8 in
this case and have a charset declaration.

Then when _parse in gettext loads the messages, it does no conversion
to Unicode, because it has no charset information. So the message
dictionary is actually in utf-8, despite this comment in the code:
# Note: we unconditionally convert both msgids and msgstrs to
# Unicode using the character encoding specified in the charset
# parameter of the Content-Type header.

Then ugettext tries to just return the translated message, which is
not in Unicode, or to convert to Unicode, which fails because the
unicode call is not specifying any encoding.
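
The failure described above can be reproduced in miniature. This is only a sketch, written for Python 3 rather than the Python 2.6 used in this thread, so bytes.decode("ascii") stands in for a Python 2 unicode() call with no encoding argument (which falls back to the ASCII default codec). The sample string is made up for illustration.

```python
# UTF-8 bytes as they would sit in an unconverted message dictionary.
raw = "été".encode("utf-8")

# Decoding with the ASCII codec fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the charset declared in the po file works.
text = raw.decode("utf-8")
print(ascii_ok, text)
```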

The _parse code seems to expect to produce a Unicode translation
dictionary, and gettext expects to encode Unicode into the current
code page, but the message dictionary never gets mapped to Unicode in
the first place.

What I want is simply to use utf-8 po files and get translations in
Unicode.

TIA for any suggestions.

-Jon Peck

JKPeck · Dec 16, 2009

Never mind. I figured this out. The problem is that a line such as
_("")
in the scanned source causes all the meta information, including the
charset declaration, to be lost in the mo file. Once I changed that
code, I got the expected result.
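
For anyone hitting the same symptom: the empty msgid is the entry gettext uses for the catalog header, which is where the Content-Type charset lives. The sketch below (Python 3, not the 2.6 setup above) builds two tiny mo catalogs in memory and shows what gettext reports with and without that header entry. The build_mo helper and the sample messages are hypothetical, written only for this illustration; it does not reproduce how msgfmt.py itself drops the header.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: pack a {msgid: msgstr} dict into mo bytes."""
    items = sorted(
        (k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items()
    )
    n = len(items)
    ids = b""
    strs = b""
    spans = []
    for msgid, msgstr in items:
        spans.append((len(msgid), len(ids), len(msgstr), len(strs)))
        ids += msgid + b"\0"
        strs += msgstr + b"\0"
    id_table = 7 * 4                # the header is seven 32-bit fields
    str_table = id_table + n * 8    # each table row is (length, offset)
    id_start = str_table + n * 8
    str_start = id_start + len(ids)
    out = struct.pack("<Iiiiiii", 0x950412DE, 0, n, id_table, str_table, 0, 0)
    for id_len, id_off, _, _ in spans:
        out += struct.pack("<ii", id_len, id_start + id_off)
    for _, _, str_len, str_off in spans:
        out += struct.pack("<ii", str_len, str_start + str_off)
    return out + ids + strs


header = "Content-Type: text/plain; charset=UTF-8\n"

# Catalog with the header stored under the empty msgid.
good = gettext.GNUTranslations(
    io.BytesIO(build_mo({"": header, "Hello": "Bonjour, été"}))
)

# Catalog with no header entry at all (ASCII-only so it still loads).
bad = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Bonjour"})))

print(good.charset(), good.gettext("Hello"))
print(bad.charset(), bad.info())
```

Calling info() on a freshly compiled catalog is a quick way to check whether the header survived: an empty dict means the charset declaration never made it into the mo file.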
