Hello gettext


James T. Dennis

You'd think that using things like gettext would be easy. Superficially
it seems well documented in the Library Reference(*). However, it can
be surprisingly difficult to get the external details right.

* http://docs.python.org/lib/node738.html

Here's what I finally came up with as the simplest instructions, suitable
for an "overview of Python programming" class:

Start with the venerable "Hello, World!" program, modified to make
it ever-so-slightly more "functional":



#!/usr/bin/env python
import sys

def hello(s="World"):
    print "Hello,", s

if __name__ == "__main__":
    args = sys.argv[1:]
    if len(args):
        for each in args:
            hello(each)
    else:
        hello()

... and add gettext support (and a little os.path handling on the
assumption that our message object files will not be readily
installable into the system /usr/share/locale tree):

#!/usr/bin/env python
import sys, os, gettext

_ = gettext.lgettext
mydir = os.path.realpath(os.path.dirname(sys.argv[0]))
localedir = os.path.join(mydir, "locale")
gettext.bindtextdomain('HelloPython', localedir)
gettext.textdomain('HelloPython')

def hello(s=_("World")):
    print _("Hello,"), s

if __name__ == "__main__":
    args = sys.argv[1:]
    if len(args):
        for each in args:
            hello(each)
    else:
        hello()

Note that I've added only five lines, plus the two modules on my
import line, and wrapped two strings in the conventional _() function.

This part is easy, and well-documented.
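For anyone following along on Python 3, where print is a function and
gettext.lgettext() was deprecated in 3.8 and later removed, here is a
minimal sketch of the same setup. It assumes the same locale/ directory
next to the script, swaps in gettext.gettext for lgettext, and already
uses the None default discussed in the follow-up; it is an adaptation,
not the original script.

```python
#!/usr/bin/env python3
# Python 3 sketch of the gettext setup above (assumes ./locale beside the script).
import gettext
import os
import sys

DOMAIN = "HelloPython"
# Resolve symlinks first, then take the script's directory.
mydir = os.path.dirname(os.path.realpath(sys.argv[0]))
localedir = os.path.join(mydir, "locale")
gettext.bindtextdomain(DOMAIN, localedir)  # where DOMAIN's .mo files live
gettext.textdomain(DOMAIN)                 # make DOMAIN the default domain
_ = gettext.gettext                        # replaces Python 2's lgettext

def hello(s=None):
    if s is None:
        s = _("World")   # looked up at call time, not when def runs
    print(_("Hello,"), s)

if __name__ == "__main__":
    for each in sys.argv[1:] or [None]:
        hello(each)
```

With no catalog for the current language, both strings simply pass
through untranslated.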

Running pygettext or GNU xgettext (-L or --language=Python) is
also easy and gives us a file like:


# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-05-14 12:19+PDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.5\n"


#: HelloWorld.py:10
msgid "World"
msgstr ""

#: HelloWorld.py:11
msgid "Hello,"
msgstr ""

... I suppose I should add the appropriate magic package name,
version, author and other values to my source. Anyone remember
where those are documented? Does pygettext extract them from the
sources and insert them into the .pot?
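To make the extraction step a little less magical, here is a
deliberately simplified Python 3 sketch of what pygettext and xgettext
do for a file like this one: parse the source, find calls to _() whose
single argument is a string literal, and emit a msgid entry with a
source reference for each. The function names are hypothetical, and it
is no replacement for either tool (no plural forms, translator
comments, keyword options, or metadata beyond the charset placeholder).

```python
import ast

def extract_messages(source, filename="<string>"):
    """Return sorted (lineno, msgid) pairs for each _("literal") call."""
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name) and node.func.id == "_"
                and len(node.args) == 1
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)  # ast.walk is breadth-first; sort into source order

def po_quote(text):
    """Quote a string for a .po file (backslashes, quotes, newlines only)."""
    text = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"%s"' % text

def make_pot(source, filename):
    """Build a bare-bones .pot template for a single source file."""
    lines = ['msgid ""', 'msgstr ""',
             po_quote("Content-Type: text/plain; charset=CHARSET\n"), ""]
    seen = set()
    for lineno, msgid in extract_messages(source, filename):
        if msgid in seen:   # the real tools merge duplicate references
            continue
        seen.add(msgid)
        lines += ["#: %s:%d" % (filename, lineno),
                  "msgid " + po_quote(msgid), 'msgstr ""', ""]
    return "\n".join(lines)

if __name__ == "__main__":
    sample = 'def hello(s=_("World")):\n    print(_("Hello,"), s)\n'
    print(make_pot(sample, "HelloPython.py"))
```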

Anyway, at a minimum I have to change one line, thus:

"Content-Type: text/plain; charset=utf-8\n"

... and I suppose there are other ways to do this more properly.
(Documented where?)

I found that I could make that change either in the .pot file or in
the individual .po files. However, if I failed to change it, my
translations would NOT work and would throw an exception.
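Rather than hand-editing every copy, the placeholders can be patched
with a few lines of Python 3. This is only a sketch: fix_po_header()
and fix_po_file() are hypothetical helpers that assume the header still
contains pygettext's literal CHARSET and ENCODING placeholders, and
"8bit" is simply the value GNU gettext conventionally uses for
Content-Transfer-Encoding.

```python
import re

def fix_po_header(text, charset="utf-8"):
    """Replace the CHARSET/ENCODING placeholders in a .pot/.po header."""
    text = re.sub(r"charset=CHARSET\b", "charset=" + charset, text, count=1)
    text = re.sub(r"Content-Transfer-Encoding: ENCODING\b",
                  "Content-Transfer-Encoding: 8bit", text, count=1)
    return text

def fix_po_file(path, charset="utf-8"):
    """Rewrite a .pot/.po file in place with the placeholders filled in."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(fix_po_header(text, charset))
```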

(Where is the setting to force the _() function to fail gracefully
--- falling back to no-translation and NEVER raise exceptions?
I seem to recall there is one somewhere --- but I just spent all
evening reading the docs and various Google hits to get this far; so
please excuse me if it's a blur right now).
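For what it's worth, the class-based API has a fallback option:
gettext.translation(domain, localedir, fallback=True) returns a
NullTranslations object, which hands every message back untranslated,
instead of raising when no .mo file is found for the requested
languages. As far as I can tell it only covers the missing-catalog
case, not a catalog that is found but fails to load (for example one
whose header still says charset=CHARSET), so this Python 3 sketch also
catches loading errors. The helper name and the set of caught
exceptions are my own choices, not a library feature.

```python
import gettext

def load_translations(domain, localedir=None, languages=None):
    """Return a translations object; fall back to NullTranslations on failure."""
    try:
        # fallback=True: no matching .mo file -> NullTranslations, no exception
        return gettext.translation(domain, localedir, languages=languages,
                                   fallback=True)
    except (OSError, LookupError, ValueError):
        # A .mo file was found but could not be parsed or decoded,
        # e.g. an unknown charset such as the literal "CHARSET".
        return gettext.NullTranslations()

if __name__ == "__main__":
    t = load_translations("HelloPython", "/nonexistent/locale", ["es_ES"])
    _ = t.gettext
    print(_("Hello,"), _("World"))   # untranslated: there is no catalog there
```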

Now we just copy these templates to individual .po files and
make our LC_MESSAGES directories:

mkdir locale && mv HelloPython.pot locale
cd locale

for i in es_ES fr_FR # ...
do
cp HelloPython.pot HelloPython_$i.po
mkdir -p $i/LC_MESSAGES
done

... and finally we can work on the translations.
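The reason for the $i/LC_MESSAGES nesting is that gettext looks for
<localedir>/<language>/LC_MESSAGES/<domain>.mo. gettext.find() returns
the first such path that exists (or None), so a quick Python 3 check
like this one shows where a catalog has to go. It uses a throwaway
directory and empty placeholder files, since find() only checks that
the file exists and does not parse it; mo_path() is a hypothetical
helper.

```python
import gettext
import os
import tempfile

def mo_path(localedir, language, domain):
    """Where gettext expects the compiled catalog for one language."""
    return os.path.join(localedir, language, "LC_MESSAGES", domain + ".mo")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as localedir:
        good = mo_path(localedir, "es_ES", "HelloPython")
        os.makedirs(os.path.dirname(good))
        open(good, "wb").close()   # empty placeholder: find() only checks existence
        # Misplaced: directly under the language directory, no LC_MESSAGES/
        os.makedirs(os.path.join(localedir, "fr_FR"))
        open(os.path.join(localedir, "fr_FR", "HelloPython.mo"), "wb").close()
        print(gettext.find("HelloPython", localedir, languages=["es_ES"]))  # the es_ES path
        print(gettext.find("HelloPython", localedir, languages=["fr_FR"]))  # None
```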

We edit each of the *_*.po files, inserting "Hola" and "Bonjour" and
"Mundo" and "Monde" in the appropriate places, and then compile these
into .mo files and move them into place as follows:

for po in *_*.po; do
    i=${po#*_}      # HelloPython_es_ES.po -> es_ES.po
    msgfmt -o "./${i%.po}/LC_MESSAGES/HelloPython.mo" "$po"
done

... in other words HelloPython_es_ES.po is written to
./es_ES/LC_MESSAGES/HelloPython.mo, etc.

This last part was the hardest to get right.
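For the curious, the .mo format msgfmt writes is simple enough to
sketch in Python 3: a header of seven 32-bit words (magic number
0x950412de, format revision, message count, the offsets of two tables,
and a hash-table size and offset), then (length, offset) tables for the
msgids and their translations, then the NUL-terminated strings. The
header metadata, including the charset, is stored as the translation
of the empty msgid. write_mo() is a hypothetical minimal writer that
omits the optional hash table and plural forms; use msgfmt for real
work. Loading its output with gettext.translation() exercises the
directory layout and the charset header together.

```python
import gettext
import os
import struct
import tempfile

def write_mo(path, messages, charset="utf-8"):
    """Write a minimal little-endian GNU .mo file (no hash table, no plurals)."""
    entries = dict(messages)
    # Metadata lives in the translation of the empty msgid; gettext uses
    # the charset named here to decode every other string.
    entries[""] = ("MIME-Version: 1.0\n"
                   "Content-Type: text/plain; charset=%s\n"
                   "Content-Transfer-Encoding: 8bit\n" % charset)
    keys = sorted(entries)                   # GNU tools expect sorted msgids
    ids = [k.encode(charset) for k in keys]
    strs = [entries[k].encode(charset) for k in keys]
    n = len(keys)
    orig_table = 7 * 4                       # header is seven 32-bit words
    trans_table = orig_table + 8 * n         # each entry: length, offset
    data = trans_table + 8 * n               # strings follow both tables
    blob = b""
    table = []
    for s in ids + strs:                     # msgid entries, then msgstr entries
        table.append(struct.pack("<2I", len(s), data + len(blob)))
        blob += s + b"\0"                    # strings are NUL-terminated
    header = struct.pack("<7I", 0x950412de, 0, n, orig_table, trans_table, 0, 0)
    with open(path, "wb") as f:
        f.write(header + b"".join(table) + blob)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as localedir:
        mo_dir = os.path.join(localedir, "es_ES", "LC_MESSAGES")
        os.makedirs(mo_dir)
        write_mo(os.path.join(mo_dir, "HelloPython.mo"),
                 {"Hello,": "Hola,", "World": "Mundo"})
        t = gettext.translation("HelloPython", localedir, languages=["es_ES"])
        print(t.gettext("Hello,"), t.gettext("World"))   # Hola, Mundo
```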

To test this we simply run:

$HELLO_PATH/HelloPython.py
Hello, World

export LANG=es_ES
$HELLO_PATH/HelloPython.py
Hola, Mundo

export LANG=fr_FR
$HELLO_PATH/HelloPython.py
Bonjour, Monde

export LANG=zh_ZH
$HELLO_PATH/HelloPython.py
Hello, World

... and we find that our Spanish and French translations work. (With
apologies if my translations are technically wrong).
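The LANG switching works because, when no languages list is passed,
Python's gettext module takes the language from the environment
variables LANGUAGE, LC_ALL, LC_MESSAGES and LANG, in that order, using
the first one that is set and non-empty; a language with no catalog
(like zh_ZH here) quietly falls back to the original strings. A small
Python 3 check, using empty placeholder files and a hypothetical
catalog_for() helper that temporarily swaps those variables:

```python
import gettext
import os
import tempfile

LANG_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")

def catalog_for(env, domain, localedir):
    """Which .mo file would gettext pick with these locale variables set?"""
    saved = {k: os.environ.get(k) for k in LANG_VARS}
    try:
        for k in LANG_VARS:
            os.environ.pop(k, None)
        os.environ.update(env)
        return gettext.find(domain, localedir)   # languages=None: read the env
    finally:
        for k, v in saved.items():               # restore the caller's environment
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as localedir:
        for lang in ("es_ES", "fr_FR"):
            d = os.path.join(localedir, lang, "LC_MESSAGES")
            os.makedirs(d)
            open(os.path.join(d, "HelloPython.mo"), "wb").close()  # placeholder
        for env in ({"LANG": "es_ES"}, {"LANG": "fr_FR"}, {"LANG": "zh_ZH"},
                    {"LANG": "es_ES", "LANGUAGE": "fr_FR"}):
            print(env, "->", catalog_for(env, "HelloPython", localedir))
```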

Of course I realize this only barely scratches the surface of I18n and
L10n issues. Also I don't know, offhand, how much effort would be
required to make even this trivial example work on an MS Windows box.
It would be nice to find a document that covers the topic in more
detail while still giving a clear and concise set of examples, so that
one could follow them without getting hung up on something stupid
like: "Gee! You have to create $LANG/LC_MESSAGES/ directories and put
the .mo files there; Python won't find them directly under $LANG, nor
under LC_MESSAGES/$LANG" ... and "Gee! For reasons I don't yet
understand, you need to call both the .bindtextdomain() AND the
.textdomain() functions." ... and even "Hmmm ... it seems that we
don't need to import locale and call locale.setlocale(), despite what
some examples found via Google seem to suggest"(*)

* http://www.pixelbeat.org/programming/i18n.html

(So, when do you need that, and when is gettext.install() really
useful?)
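As for gettext.install(): it binds _ in Python's builtins namespace,
so every module in a program can call _() without importing or
defining it. That suits an application whose modules all share one
domain; a reusable library is probably better off keeping its own
translations object. In Python 3 the call is
gettext.install(domain, localedir) (Python 2's version also took a
unicode flag), and when no catalog is found it installs a pass-through
NullTranslations, which is what this sketch relies on; the
"no-such-locale" directory is a deliberately missing placeholder.

```python
import builtins
import gettext

# Puts _ into builtins. With no catalog under ./no-such-locale this is a
# NullTranslations, so messages pass through unchanged.
gettext.install("HelloPython", localedir="no-such-locale")

def hello(s=None):
    # No local import or definition of _: name lookup finds it in builtins.
    if s is None:
        s = _("World")
    return "%s %s" % (_("Hello,"), s)

if __name__ == "__main__":
    print(hello())
    print(builtins._("Hello,"))   # the installed function lives in builtins
```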

(I gather that the setlocale() stuff is not for simple string
translations but for things like numeric string formatting
with "%d" % ... for example).
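That guess seems about right for Python's own gettext module: it picks
the language from environment variables itself and does not consult
the C library's locale, so _() works without any setlocale() call.
(C programs using GNU gettext do need setlocale(LC_ALL, ""), which may
be where those examples come from.) What setlocale() does affect is
the locale-aware formatting in the locale module; plain "%d" % n is
never localized. A Python 3 sketch that assumes nothing about which
locales are installed beyond "C":

```python
import locale

def show(n):
    """Compare plain %-formatting with locale-aware formatting of n."""
    plain = "%d" % n                                       # never localized
    aware = locale.format_string("%d", n, grouping=True)  # uses LC_NUMERIC
    return plain, aware

if __name__ == "__main__":
    locale.setlocale(locale.LC_ALL, "C")
    print("C locale:   ", show(1234567))   # the "C" locale has no digit grouping
    try:
        locale.setlocale(locale.LC_ALL, "")   # the user's locale, from the environment
    except locale.Error:
        print("user locale not available here")
    else:
        # May show grouped digits, depending on the user's locale.
        print("user locale:", show(1234567))
```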
 

James T. Dennis

... just to follow up my own posting (as gauche as that is):

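Turns out the gettext version from my original posting, with its
def hello(s=_("World")), is a Bad Idea(TM) if you ever import it into
another script and use it after changing your os.environ['LANG']
value: the default argument is evaluated only once, when the def
statement runs, so "World" stays stuck in whatever language was in
effect at import time.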

I mentioned in another message a while back that I have an aversion
to default arguments other than None, and I hesitated this time
before thinking: "Oh, it's fine in this case!"
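The underlying pitfall is plain Python rather than anything
gettext-specific: a default value is computed once, when the def
statement executes. A self-contained Python 3 illustration, using a
hypothetical two-language lookup table and a "current language"
setting in place of a real catalog and $LANG:

```python
# Hypothetical stand-in for a message catalog plus a "current language".
CATALOG = {"es_ES": {"World": "Mundo"}, "fr_FR": {"World": "Monde"}}
current = {"lang": "C"}

def _(msg):
    """Translate msg using whatever language is current *at call time*."""
    return CATALOG.get(current["lang"], {}).get(msg, msg)

def hello_bad(s=_("World")):   # _() runs once, when def executes
    return "Hello, " + s

def hello_good(s=None):        # _() runs on every call
    if s is None:
        s = _("World")
    return "Hello, " + s

if __name__ == "__main__":
    current["lang"] = "es_ES"
    print(hello_bad())    # Hello, World  (default frozen under "C")
    print(hello_good())   # Hello, Mundo
```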

Here's my updated version of this script:

-----------------------------------------------------------------------

#!/usr/bin/env python
import gettext, os, sys

_ = gettext.lgettext
i18ndomain = 'HelloPython'
mydir = os.path.realpath(os.path.dirname(sys.argv[0]))
localedir = os.path.join(mydir, "locale")
# install() binds a builtin _ for *other* modules (searching Python's default
# locale dir, since localedir=None); the module-level _ above shadows it here.
gettext.install(i18ndomain, localedir=None, unicode=1)
gettext.bindtextdomain(i18ndomain, localedir)
gettext.textdomain(i18ndomain)

def hello(s=None):
    """Print "Hello, World" (or its equivalent in any supported language):

    Examples:
    >>> os.environ['LANG']=''
    >>> hello()
    Hello, World
    >>> os.environ['LANG']='es_ES'
    >>> hello()
    Hola, Mundo
    >>> os.environ['LANG']='fr_FR'
    >>> hello()
    Bonjour, Monde

    """
    if s is None:
        s = _("World")   # looked up at call time, so LANG changes take effect
    print _("Hello,"), s

def test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    args = sys.argv[1:]
    if os.environ.get('PYDOCTEST'):
        test()
    elif len(args):
        for each in args:
            hello(each)
    else:
        hello()

-----------------------------------------------------------------------

... now with doctest support. :)
