Martin v. Löwis said:
I haven't gotten into the Python source, and my name is not Luke.
Also, don't respond to my e-mail address. Unfortunately, I had a problem
where I had to reload my system, and an old address has gotten out to
Usenet. It used to go to an ISP I no longer have an account with.
Python uses environ, which is a C library
variable pointing to byte strings, so no Unicode here.
The OP's question revolved around ***which*** code page was
being used internally. Windows uses Unicode. That's not the same
question as what code set Python uses to attempt to translate Unicode
into a single byte character set.
You are certainly able to recover non-ascii values, as long as they
only use CP_ACP.
I said "may not," not "cannot in any and all circumstances."
That, in general, is wrong. It is only true for the Western European and
American editions of Windows. In all other installations, CP_ACP differs
significantly from Latin-1.
The OP's problem was a character that's in the Western European range.
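To make the code page point concrete, here is a small sketch of how the same byte reads under different single-byte code pages. The codec names are the ones in Python's standard library; picking cp1251 (Cyrillic) as the example of a non-Western CP_ACP is my own choice for illustration, not something from the thread, and decode_under() is a made-up helper name.

```python
def decode_under(raw, codecs):
    """Decode the same bytes under each named codec and return a dict."""
    return {name: raw.decode(name) for name in codecs}


# 0xE9 sits in the Western European range: cp1252 and Latin-1 agree on it,
# but a Cyrillic ANSI code page maps the same byte to a different letter.
views = decode_under(b"\xe9", ["cp1252", "latin-1", "cp1251"])

# Even cp1252 and Latin-1 part ways in the 0x80-0x9F block.
low_block = decode_under(b"\x80", ["cp1252", "latin-1"])
```

So a value that round-trips on a Western European system may come back as a different character elsewhere, which is the case the "may not" was about.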
It doesn't. "Fixing" something here is less urgent and more difficult,
as environment variables rarely exceed CP_ACP.
Less urgent I can see, unless you're concerned about whether Python
survives against systems that do it right. Now that the Windows 9x
series is dying off, the vast majority of systems on the desktop are
going to have Unicode support internally. Granted, Python is not
targeted at "the vast majority of systems," but if you can't easily get
Unicode from the environment and the registry, then it's not very
useful for system administration tasks or automation tasks on
Windows.
Many, if not most, environment variables are file names. If file
names need Unicode support, then so do environment variables.
As to more difficult, as I said above, I haven't perused the source,
so I can't comment on that. If I had to do it myself, I'd probably
start out by always using the Unicode variant of the Windows API
call, and then check the type of the argument to environ() to determine
which to pass back. I'm not sure whether or not I'd throw an exception
if the actual value couldn't be translated to the current SBCS code page.
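A rough sketch of that idea, not a description of what Python actually does: the store is always Unicode, and the type of the key decides what comes back. The lookup() function, the plain dict standing in for the Unicode Windows API, and the cp1252 default are all assumptions made for illustration, and the code is written against present-day Python (str and bytes) rather than the interpreter under discussion.

```python
def lookup(env, name, encoding="cp1252", errors="strict"):
    """Look up a variable in a Unicode-only store.

    env is a dict of str -> str standing in for the Unicode API (hypothetical).
    A str key returns the value untouched; a bytes key returns the value
    encoded to the given single-byte code page.
    """
    if isinstance(name, bytes):
        value = env[name.decode(encoding)]
        # errors="strict" raises if the value does not fit the code page;
        # errors="replace" substitutes "?" instead.
        return value.encode(encoding, errors)
    return env[name]


env = {"HOME_DIR": "C:\\Users\\Zo\u00eb", "NAME": "\u03a9mega"}

as_text = lookup(env, "NAME")                          # Unicode in, Unicode out
as_bytes = lookup(env, b"HOME_DIR")                    # fits cp1252
lossy = lookup(env, b"NAME", errors="replace")         # omega does not fit
```

The errors parameter is where the open question lands: strict mode throws the exception, replace mode quietly hands back a damaged value.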
If people get support for Unicode environment variables, they want
Unicode command line arguments next.
Why not? I can enter a command with Unicode at the Windows
command prompt, and that command is likely to contain file names.
It's the same problem raising its head in a different spot.
John Roth
On reading this over, it does sound a bit more strident than my
responses usually do, but I will admit to being irritated at the
assumption that you need to read the source to find out the
answer to various questions.