LANG, locale, unicode, setup.py and Debian packaging

Donn · Jan 14, 2008

Given that getlocale() is not to be used, what's the best way to get the
locale later in the app? I need that two-letter code that's hidden in a
typical locale like en_ZA.utf8 -- I want that 'en' part.

BTW - things are hanging-together much better now, thanks to your info. I have
it running in locale 'C' as well as my other test locales. What a relief!

\d

Martin v. Löwis · Jan 14, 2008

Given that getlocale() is not to be used, what's the best way to get the
locale later in the app?

You get the full locale name with locale.setlocale(category) (i.e.
without the second argument)

I need that two-letter code that's hidden in a
typical locale like en_ZA.utf8 -- I want that 'en' part.

Not sure why you want that. Notice that the locale name is fairly system
specific, in particular on non-POSIX systems. It may be
"English_SouthAfrica" on some systems.

If you are certain that *your* locale names will only ever be of the
form <languagecode>[_<countrycode>][.<encoding>][@<modifier>] (or whatever
the syntax is), take anything before the underscore as the language code.
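A minimal sketch of that approach, assuming the name really does follow the POSIX-style form above. The helper name language_code is made up for illustration, and it also stops at '.' or '@' so that names without a country code still work:

```python
import re

def language_code(locale_name):
    """Return the part of a POSIX-style locale name before '_', '.' or '@'.

    Hypothetical helper: it does not validate anything, so 'C' comes back
    as 'C' and 'English_SouthAfrica' as 'English'.
    """
    return re.split(r"[_.@]", locale_name, maxsplit=1)[0]
```

For en_ZA.utf8 this yields 'en'. For system-specific names such as English_SouthAfrica it yields the text before the underscore, which is not a language code, so the caller still has to decide what to do with unexpected values.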

However, you should reevaluate why you need that.

BTW - things are hanging-together much better now, thanks to your info. I have
it running in locale 'C' as well as my other test locales. What a relief!

Great!

Martin

Donn · Jan 14, 2008

You get the full locale name with locale.setlocale(category) (i.e.
without the second argument)

Ah. Can one call it after the full call has been done:
locale.setlocale(locale.LC_ALL,'')
locale.setlocale(locale.LC_ALL)
Without any issues?

Okay, I need it because I have a tree of dirs: en, it, fr and so on for the
help files -- it's to help build a path to the right html file for the
language being supported.

Not sure why you want that. Notice that the locale name is fairly system
specific, in particular on non-POSIX systems. It may be
"English_SouthAfrica" on some systems.

Wow, another thing I had no idea about. So far all I've seen are the
xx_yy.utf8 shaped ones.

I will have some trouble then, with the help system.

Thanks,
\d

--
"There may be fairies at the bottom of the garden. There is no evidence for
it, but you can't prove that there aren't any, so shouldn't we be agnostic
with respect to fairies?"
-- Richard Dawkins

Fonty Python and other dev news at:
http://otherwiseingle.blogspot.com/

Martin v. Löwis · Jan 14, 2008

Ah. Can one call it after the full call has been done:
locale.setlocale(locale.LC_ALL,'')
locale.setlocale(locale.LC_ALL)
Without any issues?

If you pass LC_ALL, then some systems will give you funny results
(semicolon-separated enumerations of all the categories). Instead,
pick a specific category, e.g. LC_CTYPE.
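As a rough sketch of that sequence: set the locale from the environment once, then query a single category. The try/except is an added precaution for environments that name a locale the system does not provide; it is not something discussed above.

```python
import locale

try:
    # Adopt the user's environment settings (LANG, LC_* variables).
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The environment names a locale this system does not provide;
    # the process then stays in whatever locale it already had.
    pass

# Query only: with no second argument nothing is changed.
ctype_locale = locale.setlocale(locale.LC_CTYPE)
print(ctype_locale)
```

The printed value depends entirely on the system and environment, so it should be treated as an opaque, system-specific string.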

Okay, I need it because I have a tree of dirs: en, it, fr and so on for the
help files -- it's to help build a path to the right html file for the
language being supported.

Ok - taking the first two letters should then be fine, assuming all your
directories have two-letter codes.
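One way to wire that into the help system, sketched under the assumption that the help tree looks like <help_root>/<two-letter code>/<page>. The function name help_file, the page name index.html and the fallback to an 'en' directory are illustrative choices, not part of the original application:

```python
import os

def help_file(help_root, locale_name, page="index.html", default_lang="en"):
    """Build the path to a help page for the given locale name.

    Hypothetical helper: uses the first two letters of the locale name
    as the directory, and falls back to default_lang when no such
    directory exists (for example in the 'C' locale).
    """
    lang = locale_name[:2].lower()
    if not os.path.isdir(os.path.join(help_root, lang)):
        lang = default_lang
    return os.path.join(help_root, lang, page)
```

Checking that the directory exists keeps the application usable when the locale name has an unexpected shape or the language has no translated help yet.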

Wow, another thing I had no idea about. So far all I've seen are the
xx_yy.utf8 shaped ones.

I will have some trouble then, with the help system.

If you have "unknown" systems, you can try to use locale.normalize.
This has a hard-coded database which tries to deal with some different
spellings. For "English", it will give you en_EN.ISO8859-1.
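A quick way to see what the alias table does with various spellings is to run a few names through it. The sample names below are only examples; the results depend on the table shipped with the Python version in use, so no output is shown here.

```python
import locale

names = ("English", "en_ZA.utf8", "de_DE@euro", "fr")

# normalize() consults a table bundled with Python; it does not ask the
# operating system whether the resulting locale is actually installed.
normalized = {name: locale.normalize(name) for name in names}

for name, result in normalized.items():
    print(name, "->", result)
```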

OTOH, if your software only works on POSIX systems, anyway, I think
it is a fair assumption that they use two-letter codes for the
languages (the full language name is only used on Windows, AFAIK).

Notice that xx_yy.utf8 definitely is *not* the only syntactical form.
utf8 is spelled in various ways (lower and upper case, with and without
dash), and there may be other encodings (see the en_EN example above),
or no encoding at all in the locale name, and there may be "modifiers":

aa_ER@saaho (saaho dialect in Eritrea)
be_BY@latin (as opposed to the Cyrillic be_BY locale);
    likewise for sr_RS
de_DE@euro (as opposed to the D-Mark locale); likewise for other
    members of the Euro zone
ca_ES.UTF-8@valencia (Valencian - Southern Catalan)
    (no real difference to ca_ES@euro, but differences in
    message translations)
gez_ER@abegede (Ge'ez language in Eritrea with Abegede collation)
(e-mail address removed)-8 (Tatar language written in IQTElif alphabet)
uz_UZ@cyrillic (as opposed to latin uz_UZ)

There used to be a @bokmal modifier for Norwegian (as opposed to
the Nynorsk grammar), but they have separate language codes now
(nb vs. nn).
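To illustrate how many optional pieces such a name can carry, here is a small parser sketch. It assumes a two- or three-letter language code followed by an optional territory, with the encoding and modifier accepted in either order. The name parse_locale_name and the dictionary keys are invented for this example, and anything that does not fit the pattern (such as "C" or "English_SouthAfrica") is reported as None rather than guessed at.

```python
import re

_LOCALE_RE = re.compile(
    r"([A-Za-z]{2,3})"        # language code, e.g. en, gez
    r"(?:_([A-Za-z0-9]+))?"   # optional territory, e.g. ZA
    r"((?:[.@][^.@]+)*)"      # optional .encoding and @modifier parts
)

def parse_locale_name(name):
    """Split a POSIX-style locale name into its parts, or return None."""
    match = _LOCALE_RE.fullmatch(name)
    if match is None:
        return None
    language, territory, tail = match.groups()
    parts = {"language": language, "territory": territory,
             "encoding": None, "modifier": None}
    # '.' introduces the encoding, '@' introduces the modifier.
    for marker, value in re.findall(r"([.@])([^.@]+)", tail):
        parts["encoding" if marker == "." else "modifier"] = value
    return parts
```

With this, ca_ES.UTF-8@valencia splits into language 'ca', territory 'ES', encoding 'UTF-8' and modifier 'valencia', while gez_ER@abegede has a modifier but no encoding.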

Regards,
Martin


Neil Hodgson · Jan 14, 2008

Martin v. Löwis:

That's not true. Try open("\xff","w"), then try interpreting the file
name as UTF-8. Some byte strings are not meaningful UTF-8, hence that
approach cannot work.
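The point can be checked without touching the file system. The sketch below uses the same single byte as the file name and tries two decodings; the variable names are only for illustration.

```python
raw_name = b"\xff"  # the byte string used as a file name above

try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xFF can never appear in well-formed UTF-8.
    utf8_ok = False

# ISO-8859-1 assigns a character to every byte value, so this cannot fail.
latin1_name = raw_name.decode("iso-8859-1")
```

Decoding as UTF-8 fails here, while ISO-8859-1 succeeds for any byte string, which is why a single fixed decoding cannot cover every file name.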

Has it been decided how Python 3.0 will implement os.listdir on
Unix? Will there be only a single attempt to encode using the current
locale or will there be a backup technique? I'd probably define an
optional encoding parameter so you can ask for
os.listdir(encoding="iso-8859-1") although that then propagates into
open, ...

Neil

Martin v. Löwis · Jan 15, 2008

Has it been decided how Python 3.0 will implement os.listdir on Unix?
Will there be only a single attempt to encode using the current locale
or will there be a backup technique?

That's what it currently does.

I'd probably define an optional
encoding parameter so you can ask for os.listdir(encoding="iso-8859-1")
although that then propagates into open, ...

I had the same idea, and I think that parameter should be added.

For open(), I think we should continue to accept byte strings as file
names.
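The encoding parameter discussed here is a proposal, not an existing argument of os.listdir. A rough way to get the same effect is a wrapper that asks for byte-string names and decodes them itself. The name listdir_decoded is hypothetical, and the sketch relies on os.listdir returning byte strings when it is given a byte-string path:

```python
import os

def listdir_decoded(path, encoding):
    """List a directory, decoding each file name with the given encoding.

    Hypothetical stand-in for the proposed os.listdir(encoding=...).
    """
    # A bytes path makes os.listdir return the names as raw bytes.
    raw_names = os.listdir(os.fsencode(path))
    return [name.decode(encoding) for name in raw_names]
```

With encoding="iso-8859-1" every name decodes, since that encoding maps all byte values. With a stricter encoding such as UTF-8 the caller has to be ready for UnicodeDecodeError.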

Regards,
Martin
