locale.CODESET / different in Python shell and scripts

Nuff Said

When I type the following code into the interactive Python shell,
I get 'UTF-8'; but if I put the code into a Python script and
run that script - in the same terminal on my Linux box in which
I opened the Python shell before - I get 'ANSI_X3.4-1968'.

How does that happen?

Thanks in advance for your answers! Nuff.


The Code:

import locale
print locale.nl_langinfo(locale.CODESET)
 
Martin v. Löwis

Nuff said:
When I type the following code into the interactive Python shell,
I get 'UTF-8'; but if I put the code into a Python script and
run that script - in the same terminal on my Linux box in which
I opened the Python shell before - I get 'ANSI_X3.4-1968'.

How does that happen?

Because, for some reason, locale.setlocale() is called in your
interactive startup, but not in the normal startup.

It is uncertain why this happens - setlocale is not normally
called automatically; not even in interactive mode. Perhaps
you have created your own startup file?

Regards,
Martin
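
A quick way to see the effect Martin describes is to query CODESET before and
after calling setlocale yourself; a minimal sketch (Python 2 syntax, matching
the thread, and assuming a UTF-8 locale such as en_US.UTF-8 in the environment):

import locale

# Before any setlocale() call the process is still in the default "C" locale,
# so the C library typically reports plain ASCII here.
print locale.nl_langinfo(locale.CODESET)   # e.g. ANSI_X3.4-1968

# Adopt the locale from the environment (LANG / LC_* variables) - this is
# what has apparently already happened in the interactive session.
locale.setlocale(locale.LC_ALL, "")

# Now CODESET reflects the terminal's encoding.
print locale.nl_langinfo(locale.CODESET)   # e.g. UTF-8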
 
Michael Hudson

Martin v. Löwis said:
Because, for some reason, locale.setlocale() is called in your
interactive startup, but not in the normal startup.

It is uncertain why this happens - setlocale is not normally
called automatically; not even in interactive mode. Perhaps
you have created your own startup file?

readline calls setlocale() iirc.

Cheers,
mwh
 
G

Guest

Michael said:
readline calls setlocale() iirc.

Sure. However, we restore the locale to what it was before
readline's initialization messed with it.

Regards,
Martin
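
For the curious, the save-and-restore idea looks roughly like this; a minimal
sketch of the pattern in Python (not the actual CPython readline setup code):

import locale

# Calling setlocale() without a second argument only queries the current setting.
saved = locale.setlocale(locale.LC_CTYPE)

# Something (e.g. a library's initialization) switches to the environment's locale ...
locale.setlocale(locale.LC_CTYPE, "")

# ... and afterwards the previous setting is put back.
locale.setlocale(locale.LC_CTYPE, saved)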
 
Nuff Said

Because, for some reason, locale.setlocale() is called in your
interactive startup, but not in the normal startup.

It is uncertain why this happens - setlocale is not normally
called automatically; not even in interactive mode. Perhaps
you have created your own startup file?

I use two Python versions on my Linux box (Fedora Core 1):
the Python 2.2 which came with Fedora and a Python 2.3 which
I compiled myself. (I didn't tinker with the latter;
Fedora's Python is a (well-known) mess.)

Both Python versions give me 'ANSI_X3.4-1968' when I run a script
with 'print locale.nl_langinfo(locale.CODESET)'.
When I execute the same command in an interactive Python shell,
I get the (correct) 'UTF-8'.

(By 'correct', I mean that the bash command 'locale' gives me
'LANG=en_US.UTF-8, LC_CTYPE="en_US.UTF-8", ...'. This seems to
be right, because e.g. 'less' displays UTF-8 encoded files
correctly, whereas files encoded in e.g. ISO-8859-1 are not
displayed correctly.)


It gets even worse:

I write a Python script which uses Unicode strings; now I want
to 'print ...' one of those strings (containing non-ASCII characters,
e.g. German umlauts).
With Fedora's Python 2.2 I have to use 'print s.encode('ISO-8859-1')'
or something similar.
With my self-compiled Python 2.3, I have to use (the expected)
'print s.encode('UTF-8')' (even though it shows me 'ANSI_X3.4-1968' when
using 'print locale.nl_langinfo(locale.CODESET)' in the same file).

???

Any ideas what's going wrong here?

(I tried 'python -S ...'; it doesn't make a difference.)
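
One way to compare the two interpreters is a small diagnostic script run under
both; a sketch (Python 2 syntax), with nothing in it specific to either build:

import sys
import locale

# Run this file with both interpreters and compare the output line by line.
print sys.version
print locale.getlocale(locale.LC_CTYPE)    # (None, None) until setlocale() is called
print locale.getdefaultlocale()            # what the LANG / LC_* environment asks for
print locale.nl_langinfo(locale.CODESET)   # what the C library currently reports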
 
Martin v. Löwis

Nuff said:
Both Python versions give me 'ANSI_X3.4-1968' when I run a script
with 'print locale.nl_langinfo(locale.CODESET)'.
When I execute the same command in an interactive Python shell,
I get the (correct) 'UTF-8'.

PLEASE invoke

locale.setlocale(locale.LC_ALL, "")

before invoking nl_langinfo. Different C libraries behave differently
in their nl_langinfo responses if setlocale hasn't been called.

Regards,
Martin
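
Related to this advice: Python 2.3 also ships locale.getpreferredencoding(),
which wraps the setlocale/nl_langinfo dance; a minimal sketch (it will not
exist on the 2.2 interpreter):

import locale

# By default getpreferredencoding() temporarily calls setlocale(LC_CTYPE, "")
# itself and returns the user's preferred encoding, e.g. 'UTF-8'.
print locale.getpreferredencoding()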
 
Nuff Said

PLEASE invoke

locale.setlocale(locale.LC_ALL, "")

before invoking nl_langinfo. Different C libraries behave differently
in their nl_langinfo responses if setlocale hasn't been called.

Thanks a lot for your help!

That solved (part of) the problem; now I get 'UTF-8' (which is correct)
when running the following script (with either my self-compiled Python
2.3 or Fedora's Python 2.2):

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import locale

locale.setlocale(locale.LC_ALL, "")
encoding = locale.nl_langinfo(locale.CODESET)
print encoding


Still, one problem remains:

When I add the following line to the above script

print u"schönes Mädchen".encode(encoding)

the result is:

schönes Mädchen (with my self-compiled Python 2.3)
schÃ¶nes MÃ¤dchen (with Fedora's Python 2.2)

I observed that my Python gives me (the correct value) 15 for
len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
for each German umlaut, i.e. the length of the UTF-8 representation of
the string; note that the file uses the UTF-8 coding cookie).
Maybe Fedora's Python was compiled without Unicode support?

(Is that even possible? I recall something about a UCS2 vs. UCS4
switch when compiling Python; but without Unicode support at all?
And if it were possible, shouldn't a Python without Unicode
support disallow strings of the form u"..." or at least show a warning?)
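
The two length values line up with simple byte counting; a small sketch
(Python 2 syntax, run under an interpreter that honours the coding cookie):

# -*- coding: UTF-8 -*-

s = u"schönes Mädchen"
print len(s)                    # 15 characters
print len(s.encode("UTF-8"))    # 17 bytes - each umlaut takes two bytes in UTF-8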


This really drives me nuts, because I thought the above approach
was the correct way to ensure that Python scripts can print
non-ASCII characters on any terminal (which is able to display
those characters in some encoding such as UTF-8, ISO-8859-x, ...).

Is there something I'm doing utterly wrong here?
Python can't be that complicated?

Nuff.
 
Martin v. Löwis

Nuff said:
When I add the following line to the above script

print u"schönes Mädchen".encode(encoding)

the result is:

schönes Mädchen (with my self-compiled Python 2.3)
schÃ¶nes MÃ¤dchen (with Fedora's Python 2.2)

I observed that my Python gives me (the correct value) 15 for
len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
for each German umlaut, i.e. the length of the UTF-8 representation of
the string; note that the file uses the UTF-8 coding cookie).
Maybe Fedora's Python was compiled without Unicode support?

Certainly not: It would not support u"" literals without Unicode.

Please understand that you cannot use non-ASCII characters in source
code unless you also use the facilities described in

http://www.python.org/peps/pep-0263.html

So instead of "ö", you should write "\xf6".

Is there something I'm doing utterly wrong here?

Yes, you are.

Python can't be that complicated?

Python is not. Encodings are.

Regards,
Martin
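
For completeness, the escape-based spelling Martin suggests keeps the source
pure ASCII, so no coding declaration is needed; a minimal sketch (\xf6 is ö,
\xe4 is ä):

import locale

locale.setlocale(locale.LC_ALL, "")
encoding = locale.nl_langinfo(locale.CODESET)

# Escapes instead of literal umlauts: the file stays pure ASCII.
s = u"sch\xf6nes M\xe4dchen"
print s.encode(encoding)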
 
Nuff Said

Certainly not: It would not support u"" literals without Unicode.

That's what I thought.

Please understand that you cannot use non-ASCII characters in source
code unless you also use the facilities described in

http://www.python.org/peps/pep-0263.html

So instead of "ö", you should write "\xf6".

But *I do use* the line

# -*- coding: UTF-8 -*-

from your PEP (directly after the shebang line; see the full source
code in my earlier posting). I thought that allows me to write u"ö"
(which - as described above - works in one of my two Pythons).

??? Nuff.
 
Nuff Said

But *I do use* the line

# -*- coding: UTF-8 -*-

from your PEP (directly after the shebang line; see the full source
code in my earlier posting). I thought that allows me to write u"ö"
(which - as described above - works in one of my two Pythons).

Follow-up to myself:

Arrgh!!! I think I got it now. Your PEP 263, 'Source Code Encodings', was
incorporated into Python 2.3 (i.e. my self-compiled Python) but not
into Python 2.2 (Fedora's Python).

Thanks for your help!
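
Putting the pieces together, the 15-versus-17 puzzle from earlier can be
reproduced explicitly; a sketch of what the two interpreters effectively do
with the raw UTF-8 bytes of the literal (Python 2 syntax, with the Latin-1
decode standing in for the pre-PEP-263 behaviour):

raw = 'sch\xc3\xb6nes M\xc3\xa4dchen'   # the 17 raw UTF-8 bytes in the source file

# Python 2.3 honours the coding cookie and decodes the bytes as UTF-8:
print len(raw.decode('UTF-8'))           # 15 characters

# Python 2.2 effectively takes each byte as one character (Latin-1), so the
# unicode string has 17 characters and re-encoding it to UTF-8 double-encodes
# the umlauts - which is the garbled output seen above.
broken = raw.decode('ISO-8859-1')
print len(broken)                        # 17
print broken.encode('UTF-8')             # schÃ¶nes MÃ¤dchen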
 
