LANG, locale, unicode, setup.py and Debian packaging

Donn

Given that getlocale() is not to be used, what's the best way to get the
locale later in the app? I need that two-letter code that's hidden in a
typical locale like en_ZA.utf8 -- I want that 'en' part.

BTW - things are hanging-together much better now, thanks to your info. I have
it running in locale 'C' as well as my other test locales. What a relief!

\d
 
Martin v. Löwis

> Given that getlocale() is not to be used, what's the best way to get the
> locale later in the app?

You get the full locale name with locale.setlocale(category) (i.e.
without the second argument)

> I need that two-letter code that's hidden in a
> typical locale like en_ZA.utf8 -- I want that 'en' part.

Not sure why you want that. Notice that the locale name is fairly system
specific, in particular on non-POSIX systems. It may be
"English_SouthAfrica" on some systems.

If you are certain that *your* locale names will only ever be of the
form <languagecode>[_<countrycode>][.<encoding>][@modifier] (or whatever
the syntax is), take anything before the underscore as the language code.
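A minimal sketch of that rule, assuming the name really is POSIX-shaped. The function name language_code and the "en" default are made up for illustration, and a name like "English_SouthAfrica" would not yield a usable code here.

```python
def language_code(locale_name, default="en"):
    """Return the part of a POSIX-style locale name before the underscore."""
    # Drop any "@modifier" and ".encoding" suffix first, so names without a
    # country code (e.g. "fr.UTF-8") are handled as well.
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    lang = base.split("_", 1)[0]
    # "C" and "POSIX" name the default locale and carry no language code.
    if not lang or lang in ("C", "POSIX"):
        return default
    return lang.lower()


print(language_code("en_ZA.utf8"))  # en
print(language_code("C"))           # en (the hypothetical default)
```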

However, you should reevaluate why you need that.

> BTW - things are hanging-together much better now, thanks to your info. I have
> it running in locale 'C' as well as my other test locales. What a relief!

Great!

Martin
 
Donn

> You get the full locale name with locale.setlocale(category) (i.e.
> without the second argument)

Ah. Can one call it after the full call has been done:

    locale.setlocale(locale.LC_ALL, '')
    locale.setlocale(locale.LC_ALL)

without any issues?

Okay, I need it because I have a tree of dirs: en, it, fr and so on for the
help files -- it's to help build a path to the right html file for the
language being supported.

> Not sure why you want that. Notice that the locale name is fairly system
> specific, in particular on non-POSIX systems. It may be
> "English_SouthAfrica" on some systems.

Wow, another thing I had no idea about. So far all I've seen are the
xx_yy.utf8 shaped ones.

I will have some trouble then, with the help system.

Thanks,
\d


--
"There may be fairies at the bottom of the garden. There is no evidence for
it, but you can't prove that there aren't any, so shouldn't we be agnostic
with respect to fairies?"
-- Richard Dawkins

Fonty Python and other dev news at:
http://otherwiseingle.blogspot.com/
 
Martin v. Löwis

> Ah. Can one call it after the full call has been done:
> locale.setlocale(locale.LC_ALL,'')
> locale.setlocale(locale.LC_ALL)
> Without any issues?

If you pass LC_ALL, then some systems will give you funny results
(semicolon-separated enumerations of all the categories). Instead,
pick a specific category, e.g. LC_CTYPE.
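In code, the sequence could look roughly like this: set everything from the environment once, then query a single category. The try/except is an addition for illustration, covering the case where the environment names a locale the system does not provide.

```python
import locale

try:
    # Adopt the user's locale from the environment (LANG, LC_* variables).
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The environment names a locale this system does not provide;
    # the process then simply stays in the default "C" locale.
    pass

# Query only: with no second argument, nothing is changed.
ctype_locale = locale.setlocale(locale.LC_CTYPE)
print(ctype_locale)
```

The printed name depends on the machine, so treat it as system-specific text rather than something with a guaranteed shape.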

> Okay, I need it because I have a tree of dirs: en, it, fr and so on for the
> help files -- it's to help build a path to the right html file for the
> language being supported.

Ok - taking the first two letters should then be fine, assuming all your
directories have two-letter codes.
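A sketch of how the help lookup might use such a code, with a fallback when no directory exists for the language. The names help_file, help_root, and the "index.html" page are hypothetical, as is the choice of "en" as the fallback.

```python
import os


def help_file(help_root, lang, page="index.html", fallback="en"):
    """Return the path of `page` for `lang`, or for `fallback` if missing."""
    candidate = os.path.join(help_root, lang, page)
    if os.path.isfile(candidate):
        return candidate
    # No translated page for this language: use the fallback directory.
    return os.path.join(help_root, fallback, page)
```

With a fallback in place, an unexpected locale name degrades to the default help pages instead of a missing file.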

> Wow, another thing I had no idea about. So far all I've seen are the
> xx_yy.utf8 shaped ones.
>
> I will have some trouble then, with the help system.

If you have "unknown" systems, you can try to use locale.normalize.
This has a hard-coded database which tries to deal with some different
spellings. For "English", it will give you en_EN.ISO8859-1.
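A quick way to see what the alias table does with a few spellings. The exact results depend on the table shipped with your Python version, so no output is shown here.

```python
import locale

# normalize() only consults Python's built-in alias table; it does not
# check whether the resulting locale is installed on this system.
results = {name: locale.normalize(name)
           for name in ("English", "en_ZA.utf8", "de_DE@euro")}

for name, normalized in results.items():
    print(name, "->", normalized)
```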

OTOH, if your software only works on POSIX systems, anyway, I think
it is a fair assumption that they use two-letter codes for the
languages (the full language name is only used on Windows, AFAIK).

Notice that xx_yy.utf8 definitely is *not* the only syntactical form.
utf8 is spelled in various ways (lower and upper case, with and without
dash), and there may be other encodings (see the en_EN example above),
or no encoding at all in the locale name, and there may be "modifiers":

aa_ER@saaho (saaho dialect in Eritrea)
be_BY@latin (as opposed to the Cyrillic be_BY locale)
    likewise for sr_RS
de_DE@euro (as opposed to the D-Mark locale); likewise for other
    members of the Euro zone
ca_ES.UTF-8@valencia (Valencian - Southern Catalan)
    (no real difference to ca_ES@euro, but differences in
    message translations)
gez_ER@abegede (Ge'ez language in Eritrea with Abegede collation)
(e-mail address removed)-8 (Tatar language written in IQTElif alphabet)
uz_UZ@cyrillic (as opposed to latin uz_UZ)

There used to be a @bokmal modifier for Norwegian (as opposed to
the Nynorsk grammar), but they have separate language codes now
(nb vs. nn).
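To cope with those variations, a parser can peel the optional parts off one at a time. This is only a sketch for names of the form lang[_COUNTRY][.encoding][@modifier]; split_locale is a made-up name, and names that put the modifier before the encoding are not handled.

```python
def split_locale(name):
    """Split 'lang[_COUNTRY][.encoding][@modifier]' into a 4-tuple."""
    # Work from the right: the modifier comes last, then the encoding.
    name, _, modifier = name.partition("@")
    name, _, encoding = name.partition(".")
    lang, _, country = name.partition("_")
    # Missing parts are reported as None rather than empty strings.
    return (lang, country or None, encoding or None, modifier or None)


print(split_locale("ca_ES.UTF-8@valencia"))  # ('ca', 'ES', 'UTF-8', 'valencia')
print(split_locale("gez_ER@abegede"))        # ('gez', 'ER', None, 'abegede')
```

The gez example also shows that language codes are not always two letters, so splitting on the underscore is safer than slicing the first two characters.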

Regards,
Martin



 
Neil Hodgson

Martin v. Löwis:
> That's not true. Try open("\xff","w"), then try interpreting the file
> name as UTF-8. Some byte strings are not meaningful UTF-8, hence that
> approach cannot work.
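The decoding half of that point can be checked without touching the file system. The sketch below uses a bytes literal, which is what "\xff" was in Python 2; the variable names are illustrative only.

```python
raw_name = b"\xff"  # a single byte that is not valid UTF-8 on its own

try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    # The byte string cannot be interpreted as UTF-8 text at all.
    decode_error = exc
    print("not valid UTF-8:", exc.reason)
else:
    decode_error = None
```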

Has it been decided how Python 3.0 will implement os.listdir on
Unix? Will there be only a single attempt to encode using the current
locale or will there be a backup technique? I'd probably define an
optional encoding parameter so you can ask for
os.listdir(encoding="iso-8859-1") although that then propagates into
open, ...

Neil
 
Martin v. Löwis

> Has it been decided how Python 3.0 will implement os.listdir on Unix?
> Will there be only a single attempt to encode using the current locale
> or will there be a backup technique?

That's what it currently does.

> I'd probably define an optional
> encoding parameter so you can ask for os.listdir(encoding="iso-8859-1")
> although that then propagates into open, ...

I had the same idea, and I think that parameter should be added.

For open(), I think we should continue to accept byte strings as file
names.
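For reference, a small sketch of byte-string file names as they behave in current Python 3 releases: open() takes a bytes path, and os.listdir() returns bytes names when given a bytes argument. The file name example.txt is arbitrary, and this shows present-day behaviour rather than anything decided in this thread.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    # open() accepts a bytes path as well as a str path.
    with open(os.path.join(tmp_bytes, b"example.txt"), "w") as handle:
        handle.write("hello\n")
    # A bytes argument makes os.listdir() return bytes names, undecoded.
    names = os.listdir(tmp_bytes)

print(names)  # [b'example.txt']
```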

Regards,
Martin
 
