Problems with gettext and msgfmt

JKPeck

I'm using Python 2.6 on Windows and having trouble with the charset in
gettext. It seems to be so broken that I must be missing something.

When I run msgfmt.py, as far as I can see it writes no charset
information into the mo file. The actual po files are in utf-8 in
this case and have a charset declaration.

Then when _parse in gettext loads the messages, it does no conversion
to Unicode, because it has no charset information. So the message
dictionary is actually in utf-8 despite the comment in the code:
# Note: we unconditionally convert both msgids and msgstrs to
# Unicode using the character encoding specified in the charset
# parameter of the Content-Type header.

Then ugettext tries to just return the translated message, which is
not in Unicode, or to convert to Unicode, which fails because the
unicode call is not specifying any encoding.
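The failure described here can be reproduced in isolation. This is only a sketch, written in Python 3 syntax so it runs on a current interpreter; the sample string is made up, and on Python 2.6 the corresponding call would be unicode(raw) with no encoding argument, which falls back to the default (normally ASCII) codec.

```python
# Hypothetical translated message, stored as UTF-8 bytes the way it
# would sit in the catalog when no charset was applied on load.
raw = "Gr\u00fc\u00dfe".encode("utf-8")

# Decoding without knowing the real encoding: ASCII cannot represent
# the non-ASCII bytes, so this raises UnicodeDecodeError.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding the po file actually declared works.
text = raw.decode("utf-8")

print(ascii_ok, text)
```

So the conversion step itself is fine as long as the charset is known; the open question is why the charset never reaches gettext.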

The _parse code seems to expect to produce a Unicode translation
dictionary, and gettext expects to encode Unicode into the current
code page, but the message dictionary never gets mapped to Unicode in
the first place.

What I want is simply to use utf-8 po files and get translations in
Unicode.

TIA for any suggestions.

-Jon Peck
 
JKPeck

Never mind. I figured this out. The problem is that a line such as
_("")
in the scanned source causes all the meta information to be lost in
the mo file. Once I changed that code, I got the expected result.
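For anyone hitting the same thing: in the gettext catalog format the empty msgid is reserved for the header entry, and its msgstr is where the Content-Type line (and so the charset) lives. The sketch below is an illustration rather than the actual msgfmt.py code. It packs a tiny catalog in memory and loads it with gettext.GNUTranslations to show that the charset is picked up only when that empty-msgid entry carries the header. build_mo and the sample messages are hypothetical names invented for this example, and the code is Python 3.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Pack {msgid: msgstr} (both bytes) into GNU .mo file contents.

    Hypothetical helper for this example only; no hash table is written.
    """
    keys = sorted(messages)          # .mo entries are sorted by msgid
    ids = b""
    strs = b""
    spans = []
    for key in keys:
        value = messages[key]
        spans.append((len(ids), len(key), len(strs), len(value)))
        ids += key + b"\0"           # every string is NUL-terminated
        strs += value + b"\0"
    n = len(keys)
    key_start = 7 * 4 + 16 * n       # 7-word header plus two index tables
    value_start = key_start + len(ids)
    key_table = []
    value_table = []
    for id_off, id_len, str_off, str_len in spans:
        key_table += [id_len, id_off + key_start]
        value_table += [str_len, str_off + value_start]
    header = struct.pack("<Iiiiiii",
                         0x950412DE,         # little-endian magic number
                         0,                  # format version
                         n,                  # number of entries
                         7 * 4,              # offset of msgid table
                         7 * 4 + 8 * n,      # offset of msgstr table
                         0, 0)               # hash table size and offset
    tables = struct.pack("<%di" % (4 * n), *(key_table + value_table))
    return header + tables + ids + strs


# Catalog whose empty msgid carries the header, as a po file normally does.
with_header = gettext.GNUTranslations(io.BytesIO(build_mo({
    b"": b"Content-Type: text/plain; charset=utf-8\n",
    b"Hello": "Gr\u00fc\u00dfe".encode("utf-8"),
})))

# Catalog with no header entry at all (ASCII-only so it still loads).
without_header = gettext.GNUTranslations(io.BytesIO(build_mo({
    b"Hello": b"Hi",
})))

print(with_header.charset(), with_header.gettext("Hello"))
print(without_header.charset())
```

The first catalog should report utf-8 and return the decoded translation, while the second has no charset at all. That fits the fix above: anything that displaces the header in the empty-msgid slot leaves gettext with nothing to decode by.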
 
