Windows XP - Environment variable - Unicode

sebastien.hugues

Hi

I would like to retrieve the application data directory path of the
logged-in user on Windows XP. To do this I use the environment variable
APPDATA.

The logged-in user's name is sébastien. The second character is not an
ASCII one, and when I try to encode the path containing this name in
UTF-8, I get this error:

Ascii error: index not in range (128)

I would like to first decode this string and then re-encode it in UTF-8,
but I cannot find out which encoding is used when I do:

appdata = os.environ['APPDATA']

Any ideas?

Thanks in advance
Sebastien
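A minimal sketch of what is going on, written for Python 3 rather than the Python 2 of this thread. The byte string is a hypothetical APPDATA value, and the ANSI code page (cp1252, Western European) is an assumption. On Windows, Python's "mbcs" codec decodes with whatever the machine's ANSI code page actually is.

```python
# Hypothetical raw APPDATA value on a Western European Windows XP machine,
# where the ANSI code page is assumed to be 1252 and "é" is the byte 0xE9.
raw = b"C:\\Documents and Settings\\s\xe9bastien\\Application Data"

# Python 2's str.encode("utf-8") first decoded the byte string implicitly
# with the ASCII codec; that hidden step is what fails on byte 0xE9.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    print(exc)  # ... ordinal not in range(128)

# Decode with the ANSI code page first (cp1252 assumed here; on Windows,
# the "mbcs" codec would use the machine's actual ANSI code page),
# then re-encode the resulting text as UTF-8.
appdata = raw.decode("cp1252")
appdata_utf8 = appdata.encode("utf-8")
print(appdata_utf8)  # "é" is now the two UTF-8 bytes \xc3\xa9
```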
 
Rob Williscroft

I don't know if it will help, but:

>>> import win32com.client
>>> shell = win32com.client.Dispatch("WScript.Shell")
>>> env = shell.GetEnvironment("VOLATILE")
>>> j = []
>>> for i in env:
...     j.append(i)
...
>>> j
[u'LOGONSERVER=\\\\COMPUTERNAME', u'APPDATA=C:\\Documents and Settings\\username\\Application Data']

Note the leading u, which I don't get with:

>>> import os
>>> os.environ["APPDATA"]
'C:\\Documents and Settings\\username\\Application Data'

Also note that APPDATA should also be in
HTH

Rob.
 
John Roth

I don't think encoding is an issue. Windows XP stores all character data as
Unicode internally, so whatever you get back from os.environ() is either
going to be Unicode, or it's going to be translated back to some single-byte
code by Python. In the latter case, you may not be able to recover non-ASCII
values, so Rob Williscroft's workaround to get the Unicode version may be
your only hope.

If you're getting a standard string, though, I'd try Latin-1 first, or the
Windows equivalent (it has an additional 32 characters that aren't in
Latin-1). Sorry, I don't remember the actual names.

Note that Release 2.3 fixes the Unicode problems for files under XP.
It's currently in late beta, though. I don't know if it also fixes the
os.environ() interface, and it's rather late to get anything into 2.3.

John Roth
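John's warning about translation to a single-byte code can be illustrated with a short Python 3 sketch. It uses Python's errors="replace" as a stand-in for the lossy Unicode-to-ANSI conversion; Windows may substitute characters differently (for example with "best fit" mappings), and the code pages used here are assumptions.

```python
def round_trip(name, codepage="cp1252"):
    """Narrow a Unicode string to a single-byte code page and widen it again.

    Characters the code page lacks are replaced with "?" (Python's
    errors="replace"), so they cannot be recovered afterwards.
    """
    narrowed = name.encode(codepage, errors="replace")
    return narrowed.decode(codepage)


# "é" exists in cp1252, so it survives the round trip...
print(ascii(round_trip("s\u00e9bastien")))  # 's\xe9bastien'
# ...but "Ł" (U+0141) does not, and comes back as "?".
print(ascii(round_trip("\u0141ukasz")))  # '?ukasz'
# On a Central European system (cp1250 assumed), "Ł" would survive.
print(ascii(round_trip("\u0141ukasz", "cp1250")))  # '\u0141ukasz'
```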
 
Martin v. Löwis

John said:
I don't think encoding is an issue. Windows XP stores all character data as
unicode internally, so whatever you get back from os.environ() is either
going to be unicode, or it's going to be translated back to some single byte
code by Python.

Read the source, Luke. Python uses environ, which is a C library
variable pointing to byte strings, so no Unicode here.
In the latter case, you may not be able to recover non-ascii
values, so Rob Williscroft's workaround to get the unicode version may be
your only hope.

You are certainly able to recover non-ASCII values, as long as they
only use characters from CP_ACP (the system's ANSI code page).

If you're getting a standard string though, I'd try using Latin-1, or the
Windows equivalent first (it's got an additional 32 characters that aren't in
Latin-1.)

That, in general, is wrong. It is only true for the Western European and
American editions of Windows. In all other installations, CP_ACP differs
significantly from Latin-1.

Note that Release 2.3 fixes the unicode problems for files under XP.
It's currently in late beta, though. I don't know if it fixes the
os.environ()

It doesn't. "Fixing" something here is less urgent and more difficult,
as environment variables rarely contain characters outside CP_ACP.

If people get support for Unicode environment variables, they will want
Unicode command line arguments next.

Regards,
Martin
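Martin's point about CP_ACP versus Latin-1 shows up as soon as the same bytes are decoded under different code pages. A small Python 3 sketch; cp1251 (the Cyrillic Windows code page) is just one example of a non-Western edition.

```python
# The same two bytes read under different single-byte code pages.
sample = b"\xe9\x80"

decoded = {codec: sample.decode(codec) for codec in ("latin-1", "cp1252", "cp1251")}

for codec, text in decoded.items():
    # ascii() keeps the output printable regardless of the console encoding.
    print(codec, ascii(text))
# latin-1 '\xe9\x80'      (0x80 is a C1 control character)
# cp1252 '\xe9\u20ac'     (0x80 is the euro sign)
# cp1251 '\u0439\u0402'   (Cyrillic short i, Cyrillic capital dje)
```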
 
John Roth

Martin v. Löwis said:
Read the source, Luke.

I haven't gotten into the Python source, and my name is not Luke.
Also, don't respond to my e-mail address. Unfortunately, I had a problem
where I had to reload my system, and it's gotten out to usenet. It used
to go to an ISP I no longer have an account with.

Python uses environ, which is a C library
variable pointing to byte strings, so no Unicode here.

The OP's question revolved around ***which*** code page was
being used internally. Windows uses Unicode. That's not the same
question as what code set Python uses to attempt to translate Unicode
into a single-byte character set.

You are certainly able to recover non-ascii values, as long as they
only use CP_ACP.

I said "may not," not "cannot in any and all circumstances."

That, in general, is wrong. It is only true for the Western European and
American editions of Windows. In all other installations, CP_ACP differs
significantly from Latin-1.

The OP's problem was a character that's in the Western European range.

It doesn't. "Fixing" something here is less urgent and more difficult,
as environment variables rarely exceed CP_ACP.

Less urgent I can see, unless you're concerned about whether Python
survives against systems that do it right. Now that the Windows 9x
series is dying off, the vast majority of systems on the desktop are
going to have Unicode support internally. Granted, Python is not
targeted at "the vast majority of systems," but if you can't easily get
Unicode from the environment and the registry, then it's not very
useful for system administration or automation tasks on Windows.

Many, if not most, environment variables are file names. If file
names need Unicode support, then so do environment variables.

As to "more difficult": as I said above, I haven't perused the source,
so I can't comment on that. If I had to do it myself, I'd probably
start out by always using the Unicode variant of the Windows API
call, and then check the type of the argument to environ() to determine
which to pass back. I'm not sure whether or not I'd throw an exception
if the actual value couldn't be translated to the current SBCS code.

If people get support for Unicode environment variables, they want
Unicode command line arguments next.

Why not? I can enter a command with Unicode at the Windows
command prompt, and that command is likely to contain file names.
Same problem rearing its head in a different spot.

John Roth

On reading this over, it does sound a bit more strident than my
responses usually do, but I will admit to being irritated at the
assumption that you need to read the source to find out the
answer to various questions.
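For comparison, Python 3 later went the way John argues for here: sys.argv holds str objects. A small sketch (Python 3.7+, for capture_output) that hands a non-ASCII argument to a child interpreter and asks it what it received; the file name is only an example.

```python
import subprocess
import sys

# Ask a child interpreter to report the type and value of its first argument.
# ascii() keeps the child's output ASCII-only, whatever the console encoding.
child_code = "import sys; print(type(sys.argv[1]).__name__, ascii(sys.argv[1]))"

result = subprocess.run(
    [sys.executable, "-c", child_code, "s\u00e9bastien.txt"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # str 's\xe9bastien.txt'
```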
 
Martin v. Löwis

John Roth said:
The OP's question revolved around ***which*** code page was
being used internally. Windows uses Unicode. That's not the same
question as what code set Python uses to attempt to translate Unicode
into a single byte character set.

Yes and no. What Windows uses is largely irrelevant, as Python does
not use Windows here. Instead, it uses the Microsoft C library, in
which environment variables are *not* stored in some Unicode encoding,
when accessed through the _environ pointer.

As to more difficult, as I said above, I haven't perused the source,
so I can't comment on that. If I had to do it myself, I'd probably
start out by always using the Unicode variant of the Windows API
call, and then check the type of the argument to environ() to determine
which to pass back. I'm not sure whether or not I'd throw an exception
if the actual value couldn't be translated to the current SBCS code.

Notice that os.environ is not a function, but a dictionary. So there
is no system call involved when retrieving an environment
variable. Instead, they are all precomputed.

On reading this over, it does sound a bit more strident than my
responses usually do, but I will admit to being irritated at the
assumption that you need to read the source to find out the
answer to various questions.

If the question is "how does software Foo do something", the *only*
reliable way is to read the source. You may have a mental model that
may allow you to give an educated guess how Foo *might* do
something. In this case, your educated guess was wrong, that's why I
referred you to the source.

Regards,
Martin
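Martin's point that os.environ is precomputed can be observed directly in CPython: changes made through the C library behind Python's back do not show up in the dictionary. A POSIX-only sketch using ctypes; the variable name is hypothetical, and calling setenv this way is for demonstration only.

```python
import ctypes
import os

NAME = b"DEMO_PRECOMPUTED_ENV_VAR"  # hypothetical variable name

in_c_environ = in_os_environ = None
if os.name == "posix":
    # CDLL(None) gives access to symbols already loaded into the process,
    # which on typical POSIX systems includes the C library's setenv/getenv.
    libc = ctypes.CDLL(None)
    libc.setenv.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
    libc.getenv.argtypes = [ctypes.c_char_p]
    libc.getenv.restype = ctypes.c_char_p

    # Change the C-level environment directly, bypassing os.environ.
    libc.setenv(NAME, b"set behind Python's back", 1)

    in_c_environ = libc.getenv(NAME) is not None
    # os.environ was filled in when the os module was imported,
    # so it does not see the change.
    in_os_environ = NAME.decode() in os.environ
    print(in_c_environ, in_os_environ)  # True False
```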
 
John Roth

Martin v. Löwis said:
Yes and no. What Windows uses is largely irrelevant, as Python does
not use Windows here. Instead, it uses the Microsoft C library, in
which environment variables are *not* stored in some Unicode encoding,
when accessed through the _environ pointer.

I've found at various times that using the C library causes lots of
problems with Microsoft.

Notice that os.environ is not a function, but a dictionary. So there
is no system call involved when retrieving an environment
variable. Instead, they are all precomputed.

Good point. That does make it somewhat harder; the routine
would have to precompute both versions, and store them with
both standard strings and Unicode strings as keys. Whether the
overhead would be worth it is debatable. It would not, however,
be all that difficult for the user of the facility to understand.
It would work exactly the same way the file functions work: if
you use a Unicode key, you get a Unicode result.

John Roth
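The file-function behaviour John mentions (a Unicode argument gives Unicode results) carries over to Python 3 as "str in, str out" versus "bytes in, bytes out". A sketch with os.listdir in a throwaway directory; the file name is only an example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file whose name contains a non-ASCII character.
    with open(os.path.join(tmp, "s\u00e9bastien.txt"), "w"):
        pass

    # A str path yields str names; a bytes path yields bytes names.
    str_names = os.listdir(tmp)
    bytes_names = os.listdir(os.fsencode(tmp))

print(type(str_names[0]).__name__, type(bytes_names[0]).__name__)  # str bytes
```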
 
John Roth

Fredrik Lundh said:
And life's too short to waste on movies...

Depends on what your goals in life are.

Well, you obviously didn't bother to read the documentation for
os.environ, so pointing you to the source sounds like a reasonable
idea.

Not particularly. I might be one of the not inconsiderable number
of people who don't know C. I'm not, but the number of people
who use Python and don't know C is not zero.

I like Python because, for the most part, it's much more
understandable than many languages I know, and that
makes it much more productive. What I've learned in this
conversation is that os.environ fails to handle one of the
major corner cases in a Windows NT/2000/XP environment.
So if I need that corner case, I'm going to have to use
the Windows API call. Not a big deal, but also not something
that I regard as one of the language's strengths.

John Roth
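If you do fall back to the Windows API call, one possible shape is GetEnvironmentVariableW via ctypes. This is a hedged sketch rather than a hardened implementation: get_env_unicode is a hypothetical helper name, a zero return is treated as "not set" without checking GetLastError, and on non-Windows platforms it just returns the os.environ value.

```python
import os
import sys


def get_env_unicode(name):
    """Return environment variable `name` as str, or None if it is not set.

    Hypothetical helper. On Windows it asks the Win32 API for the wide
    (UTF-16) value; elsewhere it simply falls back to os.environ.
    """
    if sys.platform != "win32":
        return os.environ.get(name)

    import ctypes

    kernel32 = ctypes.windll.kernel32
    size = 256
    while True:
        buf = ctypes.create_unicode_buffer(size)
        n = kernel32.GetEnvironmentVariableW(name, buf, size)
        if n == 0:
            # Not set (or set to an empty string; GetLastError is not
            # consulted here to tell the two apart).
            return None
        if n < size:
            # Success: n characters were copied, not counting the NUL.
            return buf.value
        # Buffer too small: n is the required size, including the NUL.
        size = n
```

Called as get_env_unicode("APPDATA"), it would return the path as a str with any non-ASCII characters intact, assuming the variable is set.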
 
John Roth

Martin v. Löwis said:
That doesn't work. You cannot have separate dictionary entries
for unicode and byte string keys if the keys compare and hash
equal, which is the case for all-ASCII keys (which environment
variable names typically are).

Ah, so.

John Roth
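Martin's objection rests on Python 2 treating 'APPDATA' and u'APPDATA' as the same key. Python 3 dropped that equivalence between text and bytes, and the standard library went with two separate mappings: os.environ (str keys and values) everywhere, plus os.environb (bytes) where os.supports_bytes_environ is true, which excludes Windows. A small sketch; the variable name is hypothetical.

```python
import os

# In Python 3, text and bytes keys no longer compare equal,
# so a single dict can hold both without them colliding.
mixed = {"APPDATA": "text", b"APPDATA": b"bytes"}
print(len(mixed))  # 2

# os.environ and os.environb are kept in sync; each has one key type.
if os.supports_bytes_environ:
    os.environ["DEMO_ENV_NAME"] = "s\u00e9bastien"
    print(os.environb[b"DEMO_ENV_NAME"] == os.fsencode("s\u00e9bastien"))  # True
```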
 
