Locale confusion

Jorgen Grahn

[Long posting due to the examples, but pretty simple question.]

I'm sitting here with a Debian Linux 'Woody' system with the default Python
2.2 installation, and I want the re module to understand that
re.compile(r'\W+', re.LOCALE) should not match my national, accented
characters.
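For comparison, here is a minimal sketch in modern Python 3, not the Python 2.2 discussed in this thread. In Python 3, str patterns use Unicode matching by default, so \W does not match letters such as å, ä and ö whatever the locale is. re.LOCALE is only accepted with bytes patterns (Python 3.6 and later). The helper names split_words and locale_flag_allowed_for_str are illustrative, not part of any library.

```python
import re

# Sketch for Python 3 (not the Python 2.2 in this thread).
# str patterns use Unicode matching by default, so \w covers å, ä, ö
# and \W does not match them, with no locale setup at all.
NON_WORD = re.compile(r"\W+")


def split_words(text):
    """Hypothetical helper: split on runs of non-word characters."""
    return [word for word in NON_WORD.split(text) if word]


def locale_flag_allowed_for_str():
    """Report whether re.LOCALE is accepted together with a str pattern.

    Python 3.6 and later raise ValueError for this combination;
    re.LOCALE is meant for bytes patterns only.
    """
    try:
        re.compile(r"\W+", re.LOCALE)
    except ValueError:
        return False
    return True


print(split_words("räksmörgås, å ä ö!"))  # ['räksmörgås', 'å', 'ä', 'ö']
```

For bytes in a legacy 8-bit encoding such as ISO 8859-1, a bytes pattern compiled with re.LOCALE still consults LC_CTYPE, so LC_CTYPE has to be set from the environment first for b'\xe5' to count as a word character.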

I don't quite understand how the locale module reasons about these things,
and Python doesn't seem to behave like other programs on my system. Is this a
bug, or my mistake? Here's my environment:

frailea> env |grep -e LC -e LANG
LC_MESSAGES=C
LC_TIME=C
LANG=sv_SE
LC_NUMERIC=C
LC_MONETARY=C
frailea> locale
LANG=sv_SE
LC_CTYPE="sv_SE"
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE="sv_SE"
LC_MONETARY=C
LC_MESSAGES=C
LC_PAPER="sv_SE"
LC_NAME="sv_SE"
LC_ADDRESS="sv_SE"
LC_TELEPHONE="sv_SE"
LC_MEASUREMENT="sv_SE"
LC_IDENTIFICATION="sv_SE"
LC_ALL=

This seems to indicate that $LANG acts as a fallback when other variables
(e.g. LC_CTYPE) aren't defined, and that's also what the glibc setlocale(3)
man page says. It works well for me in general, too. However, consider this
tiny Python program:
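The fallback rule described above can be written down as a small pure function. This is a sketch of the POSIX precedence (a non-empty LC_ALL wins, then the category's own variable, then LANG), assuming empty variables count as unset and that the final default is "C", as on glibc. effective_locale and jorgen_env are hypothetical names, and the function does not check whether the named locale is actually installed.

```python
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Name that setlocale(<category>, "") would pick from env.

    Hypothetical helper. Precedence: LC_ALL, then the category's own
    variable (e.g. "LC_CTYPE"), then LANG; empty values count as unset.
    Falls back to "C" (glibc's default); installed locales are not checked.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


# The environment shown by `env | grep` above (LC_ALL is not set at all).
jorgen_env = {
    "LANG": "sv_SE",
    "LC_MESSAGES": "C",
    "LC_MONETARY": "C",
    "LC_NUMERIC": "C",
    "LC_TIME": "C",
}

print(effective_locale("LC_CTYPE", jorgen_env))  # sv_SE
print(effective_locale("LC_TIME", jorgen_env))   # C
```

This reproduces the `locale` listing above: LC_CTYPE and LC_COLLATE resolve to sv_SE through LANG, while the explicitly set categories stay at C.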

frailea> cat foo
import locale
print locale.getlocale()
locale.setlocale(locale.LC_CTYPE)
print locale.getlocale()

When I paste it into an interactive Python session, the locale is already
set up correctly (which is what I suppose interactive mode /should/ do):
>>> import locale
>>> print locale.getlocale()
['sv_SE', 'ISO8859-1']
>>> locale.setlocale(locale.LC_CTYPE)
'sv_SE'
>>> print locale.getlocale()
['sv_SE', 'ISO8859-1']

When I run it as a script, though, the locale isn't set up, and the
setlocale() call does not appear to fall back to $LANG as it's supposed
to(?), so my LC_CTYPE remains in the POSIX locale:

frailea> python foo
(None, None)
(None, None)

The corresponding program written in C works as expected:

frailea> cat foot.c
#include <stdio.h>
#include <locale.h>
int main(void) {
printf("%s\n", setlocale(LC_CTYPE, 0));
printf("%s\n", setlocale(LC_CTYPE, ""));
printf("%s\n", setlocale(LC_CTYPE, 0));
return 0;
}
frailea> ./foot
C
sv_SE
sv_SE

So, is this my fault or Python's? I realize I could just adapt and set
$LC_CTYPE explicitly in my environment, but I don't want to capitulate to a
Python bug, if that's what this is.

BR,
Jorgen
 
Serge.Orlov

Jorgen Grahn wrote:
[snip]

Try locale.setlocale(locale.LC_CTYPE, "") as in your C program. It would
be great if the one-argument form of locale.setlocale were deprecated,
because it unexpectedly acts like getlocale instead of setting anything.
It's unpythonic.
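A minimal Python 3 counterpart of foot.c, sketched to show the difference described here: the one-argument call only reports the current LC_CTYPE, while passing "" loads it from the environment. The function name is made up for illustration. Note that Python 3, unlike the Python 2.2 in this thread, already adjusts LC_CTYPE at interpreter startup, so the first value may already reflect the environment rather than "C".

```python
import locale


def query_then_set_ctype():
    """Python mirror of foot.c; returns (before, applied, after)."""
    # One argument (locale=None): nothing changes, the setting is returned,
    # like setlocale(LC_CTYPE, 0) in C.
    before = locale.setlocale(locale.LC_CTYPE)
    try:
        # "" means: take the value from LC_ALL, LC_CTYPE or LANG,
        # like setlocale(LC_CTYPE, "") in C.
        applied = locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        applied = None  # the environment names a locale that is not installed
    after = locale.setlocale(locale.LC_CTYPE)
    return before, applied, after


if __name__ == "__main__":
    for value in query_then_set_ctype():
        print(value)
```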

By the way, since you took the time to set up various LC_* variables, there
is no need to single out the LC_CTYPE category. Just use the standard idiom:
import locale
locale.setlocale(locale.LC_ALL, "")
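This idiom is usually called once, near the start of the program. A sketch in Python 3, assuming you prefer a warning and the portable "C" locale over a crash when the environment names a locale that is not installed (locale.setlocale raises locale.Error in that case). init_locale is a hypothetical name.

```python
import locale
import sys


def init_locale():
    """Adopt the user's locale for every category; fall back to "C"."""
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error as exc:
        print(f"warning: {exc}; using the C locale", file=sys.stderr)
        return locale.setlocale(locale.LC_ALL, "C")


if __name__ == "__main__":
    init_locale()
    # Locale-aware calls such as locale.strxfrm and locale.format_string
    # now follow the user's settings; show the collation in effect.
    print(locale.setlocale(locale.LC_COLLATE))
```

On glibc the value returned for LC_ALL may be a composite string listing each category when the categories differ, as they do in the environment above.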

Serge.
 
Jorgen Grahn

Serge.Orlov wrote:
> Jorgen Grahn wrote:
> [snip]
>> frailea> cat foo
>> import locale
>> print locale.getlocale()
>> locale.setlocale(locale.LC_CTYPE)
>> print locale.getlocale()
>> ....
>> When I run it as a script it isn't though, and the setlocale() call does not
>> appear to fall back to looking at $LANG as it's supposed to(?), so my
>> LC_CTYPE remains in the POSIX locale: ....
>> So, is this my fault or Python's? I realize I could just adapt and set
>> $LC_CTYPE explicitly in my environment, but I don't want to capitulate for a
>> Python bug, if that's what this is.

> Try locale.setlocale(locale.LC_CTYPE,"") as in your C program.

Oops, you are right. locale.setlocale(locale.LC_CTYPE,"") sets the locale
from my environment (and gets it right!) while
locale.setlocale(locale.LC_CTYPE) /returns/ the current locale. I don't know
how I could have missed that, since it's clearly documented and also maps
directly to C usage.

> It would be great if locale.setlocale with one parameter would be deprecated,
> because it suddenly acts like getlocale. It's unpythonic.

I dislike the term "unpythonic", but in practice I tend to agree with you
here. Even better, but maybe not feasible, would be an approach to locales
that doesn't involve changing global state like this.
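Python's standard locale module offers no per-object or per-call locale, so the global state cannot be avoided entirely. A common mitigation is to limit how long a change lasts. The sketch below is a hypothetical context manager (not part of the locale module) that switches one category and restores the previous value afterwards; the setting is still process-global while the block runs, so it is not thread-safe.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Hypothetical helper: set one category inside a with-block only."""
    saved = locale.setlocale(category)  # one argument: just query
    try:
        yield locale.setlocale(category, name)
    finally:
        # Restore the previous setting even if the block raised.
        locale.setlocale(category, saved)


with temporary_locale(locale.LC_COLLATE, "C") as current:
    print(current)  # C
```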

> By the way, since you took time to setup various LC_* variables there
> is no need to play with LC_CTYPE category. Just use the standard idiom.
> import locale
> locale.setlocale(locale.LC_ALL,"")

Thanks for pointing that out. I picked LC_CTYPE for my small program
because I was in a hurry and didn't want to risk non-standard sorting
elsewhere in the program. I hate what LC_COLLATE=C does to Swedish
national characters, but I hate what LC_COLLATE=sv_SE does to non-alphabetic
characters even more.
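A sketch of the LC_CTYPE-only approach in Python 3, assuming the goal is locale-aware character classification while sorting keeps the collation already in effect (normally "C"). adopt_ctype_only is a hypothetical name. The last line shows what code-point ("C") ordering does to three Swedish letters: "ä" (U+00E4) sorts before "å" (U+00E5), whereas the Swedish alphabet ends with å, ä, ö.

```python
import locale


def adopt_ctype_only():
    """Take LC_CTYPE from the environment; leave LC_COLLATE untouched.

    Hypothetical helper. Returns the resulting (LC_CTYPE, LC_COLLATE).
    """
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # named locale not installed: keep the current LC_CTYPE
    return (locale.setlocale(locale.LC_CTYPE),
            locale.setlocale(locale.LC_COLLATE))


# Plain code-point order, which is what LC_COLLATE=C gives you.
print(sorted(["ö", "å", "ä"]))  # ['ä', 'å', 'ö']
```

If an sv_SE locale is installed, also setting LC_COLLATE and sorting with key=locale.strxfrm should produce the Swedish order instead, at the price of the punctuation handling complained about above; the exact result depends on the installed locale data.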

To paraphrase Barbie: "i18n is hard". ;-)

/Jorgen
 
