Windows XP - Environment variable - Unicode

Discussion in 'Python' started by sebastien.hugues, Jul 11, 2003.

  1. Hi

    I would like to retrieve the application data directory path of the
    logged-in user on Windows XP. To do this I use the environment variable
    APPDATA.

    The logged-in user is named sébastien. The second character is not an
    ASCII one, and when I try to encode the path that contains this name in
    UTF-8, I get this error:

    Ascii error: index not in range (128)

    I would like to first decode this string and then re-encode it in UTF-8, but
    I cannot find out which encoding is used when I do:

    appdata = os.environ ['APPDATA']

    Any ideas ?

    Thanks in advance
    Sebastien
     
    sebastien.hugues, Jul 11, 2003
    #1

  2. sebastien.hugues wrote:

    > Hi
    >
    > I would like to retrieve the application data directory path of the
    > logged user on
    > windows XP. To achieve this goal i use the environment variable
    > APPDATA.
    >
    > The logged user has this name: sébastien. The second character is not
    > an ascii one and when i try to encode the path that contains this name
    > in utf-8,
    > i got this error:
    >
    > Ascii error: index not in range (128)
    >
    > I would like to first decode this string and then re-encode it in
    > utf-8, but i am not able to find out what encoding is used when i
    > make:
    >
    > appdata = os.environ ['APPDATA']
    >
    > Any ideas ?
    >


    I don't know if it will help but:

    >>> import win32com.client
    >>> shell = win32com.client.Dispatch("WScript.Shell")
    >>> env = shell.GetEnvironment("VOLATILE")
    >>> j = []
    >>> for i in env:
    ...     j.append(i)
    ...
    >>> j
    [u'LOGONSERVER=\\\\COMPUTERNAME', u'APPDATA=C:\\Documents and Settings\\username\\Application Data']
    >>>


    Note the leading u, which I don't get with:

    >>> import os
    >>> os.environ["APPDATA"]
    'C:\\Documents and Settings\\username\\Application Data'

    Also note that APPDATA should also be in
    >>> env = shell.GetEnvironment("PROCESS")


    HTH

    Rob.
    --
    http://www.victim-prime.dsl.pipex.com/
     
    Rob Williscroft, Jul 11, 2003
    #2

  3. "sebastien.hugues" wrote:
    > Hi
    >
    > I would like to retrieve the application data directory path of the
    > logged user on
    > windows XP. To achieve this goal i use the environment variable
    > APPDATA.
    >
    > The logged user has this name: sébastien. The second character is not an
    > ascii one and when i try to encode the path that contains this name in
    > utf-8,
    > i got this error:
    >
    > Ascii error: index not in range (128)
    >
    > I would like to first decode this string and then re-encode it in utf-8,
    > but i am not able to find out what encoding is used when i make:
    >
    > appdata = os.environ ['APPDATA']
    >
    > Any ideas ?


    I don't think encoding is an issue. Windows XP stores all character data as
    unicode internally, so whatever you get back from os.environ() is either
    going to be unicode, or it's going to be translated back to some single byte
    code by Python. In the latter case, you may not be able to recover non-ascii
    values, so Rob Williscroft's workaround to get the unicode version may be
    your only hope.

    If you're getting a standard string though, I'd try using Latin-1, or the
    Windows equivalent first (it's got an additional 32 characters that aren't
    in Latin-1.) Sorry I don't remember the actual names.

    Note that Release 2.3 fixes the unicode problems for files under XP.
    It's currently in late beta, though. I don't know if it fixes the
    os.environ() interface though, and it's rather late to get anything into 2.3.

    John Roth


    >
    > Thanks in advance
    > Sebastien
    >
     
    John Roth, Jul 12, 2003
    #3
  4. John Roth wrote:

    > I don't think encoding is an issue. Windows XP stores all character data as
    > unicode internally, so whatever you get back from os.environ() is either
    > going to be unicode, or it's going to be translated back to some single byte
    > code by Python.


    Read the source, Luke. Python uses environ, which is a C library
    variable pointing to byte strings, so no Unicode here.

    > In the latter case, you may not be able to recover non-ascii
    > values, so Rob Williscroft's workaround to get the unicode version may be
    > your only hope.


    You are certainly able to recover non-ascii values, as long as they
    only use CP_ACP.
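
    A minimal sketch of that recovery, written for current Python 3, where
    the bytes have to be passed in explicitly because os.environ already
    returns text there. The function name to_utf8, the sample path and the
    default codec cp1252 are assumptions for illustration; on Windows the
    'mbcs' codec follows whatever CP_ACP actually is. The sketch also shows
    why the original attempt failed: treating the bytes as ASCII is what
    produces the "ordinal not in range(128)" error.

```python
def to_utf8(raw, codepage="cp1252"):
    """Decode ANSI-code-page bytes, then re-encode them as UTF-8."""
    text = raw.decode(codepage)   # bytes -> text, using the code page
    return text.encode("utf-8")   # text -> UTF-8 bytes


# Hypothetical APPDATA value as byte-string environ data (cp1252 assumed).
raw_path = b"C:\\Documents and Settings\\s\xe9bastien\\Application Data"

# Treating the bytes as ASCII fails on the 0xE9 byte.
try:
    raw_path.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

utf8_path = to_utf8(raw_path)
```

    The codec is a parameter on purpose: which code page is correct depends
    on the Windows installation, not on Python.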

    > If you're getting a standard string though, I'd try using Latin-1, or the
    > Windows equivalent first (it's got an additional 32 characters that aren't in
    > Latin-1.)


    That, in general, is wrong. It is only true for the Western European and
    American editions of Windows. In all other installations, CP_ACP differs
    significantly from Latin-1.
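
    To make the difference concrete, here is a small sketch using codecs
    that ship with Python. The choice of cp1251 (Cyrillic Windows) as the
    "other installation" is only an example; any non-Western ANSI code page
    would make the same point.

```python
raw = b"s\xe9bastien"

as_latin1 = raw.decode("latin-1")   # ISO 8859-1
as_cp1252 = raw.decode("cp1252")    # Western European Windows
as_cp1251 = raw.decode("cp1251")    # Cyrillic Windows

# 0x80 is one of the bytes where cp1252 and Latin-1 part ways:
# the euro sign versus a C1 control code.
euro_cp1252 = b"\x80".decode("cp1252")
ctrl_latin1 = b"\x80".decode("latin-1")
```

    The same 0xE9 byte is "é" under Latin-1 and cp1252 but a Cyrillic letter
    under cp1251, so guessing Latin-1 only works where CP_ACP happens to be
    the Western European page.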

    > Note that Release 2.3 fixes the unicode problems for files under XP.
    > It's currently in late beta, though. I don't know if it fixes the
    > os.environ()


    It doesn't. "Fixing" something here is less urgent and more difficult,
    as environment variables rarely exceed CP_ACP.

    If people get support for Unicode environment variables, they want
    Unicode command line arguments next.

    Regards,
    Martin
     
    Martin v. Löwis, Jul 12, 2003
    #4
  5. "Martin v. Löwis" wrote:
    > John Roth wrote:
    >
    > > I don't think encoding is an issue. Windows XP stores all character data as
    > > unicode internally, so whatever you get back from os.environ() is either
    > > going to be unicode, or it's going to be translated back to some single byte
    > > code by Python.

    >
    > Read the source, Luke.


    I haven't gotten into the Python source, and my name is not Luke.
    Also, don't respond to my e-mail address. Unfortunately, I had a problem
    where I had to reload my system, and it's gotten out to usenet. It used
    to go to an ISP I no longer have an account with.

    > Python uses environ, which is a C library
    > variable pointing to byte strings, so no Unicode here.


    The OP's question revolved around ***which*** code page was
    being used internally. Windows uses Unicode. That's not the same
    question as what code set Python uses to attempt to translate Unicode
    into a single byte character set.

    > > In the latter case, you may not be able to recover non-ascii
    > > values, so Rob Williscroft's workaround to get the unicode version may be
    > > your only hope.

    >
    > You are certainly able to recover non-ascii values, as long as they
    > only use CP_ACP.


    I said "may not," not "cannot in any and all circumstances."

    > > If you're getting a standard string though, I'd try using Latin-1, or the
    > > Windows equivalent first (it's got an additional 32 characters that aren't in
    > > Latin-1.)

    >
    > That, in general, is wrong. It is only true for the Western European and
    > American editions of Windows. In all other installations, CP_ACP differs
    > significantly from Latin-1.


    The OP's problem was a character that's in the Western European range.

    > > Note that Release 2.3 fixes the unicode problems for files under XP.
    > > It's currently in late beta, though. I don't know if it fixes the
    > > os.environ()

    >
    > It doesn't. "Fixing" something here is less urgent and more difficult,
    > as environment variables rarely exceed CP_ACP.


    Less urgent I can see, unless you're concerned about whether Python
    survives against systems that do it right. Now that the Windows 9x
    series is dying off, the vast majority of systems on the desktop are
    going to have Unicode support internally. Granted, Python is not
    targeted at "the vast majority of systems," but if you can't easily get
    Unicode from the environment and the registry, then it's not very
    useful for system administration tasks or automation tasks on
    Windows.

    Many, if not most, environment variables are file names. If file
    names need Unicode support, then so do environment variables.

    As to more difficult, as I said above, I haven't perused the source,
    so I can't comment on that. If I had to do it myself, I'd probably
    start out by always using the Unicode variant of the Windows API
    call, and then check the type of the argument to environ() to determine
    which to pass back. I'm not sure whether or not I'd throw an exception
    if the actual value couldn't be translated to the current SBCS code.

    > If people get support for Unicode environment variables, they want
    > Unicode command line arguments next.


    Why not? I can enter a command with Unicode at the Windows
    command prompt, and that command is likely to contain file names.
    Same problem raising its head in a different spot.

    John Roth

    On reading this over, it does sound a bit more strident than my
    responses usually do, but I will admit to being irritated at the
    assumption that you need to read the source to find out the
    answer to various questions.

    > Regards,
    > Martin
    >
     
    John Roth, Jul 13, 2003
    #5
  6. "John Roth" writes:

    > The OP's question revolved around ***which*** code page was
    > being used internally. Windows uses Unicode. That's not the same
    > question as what code set Python uses to attempt to translate Unicode
    > into a single byte character set.


    Yes and no. What Windows uses is largely irrelevant, as Python does
    not use Windows here. Instead, it uses the Microsoft C library, in
    which environment variables are *not* stored in some Unicode encoding,
    when accessed through the _environ pointer.

    > As to more difficult, as I said above, I haven't perused the source,
    > so I can't comment on that. If I had to do it myself, I'd probably
    > start out by always using the Unicode variant of the Windows API
    > call, and then check the type of the argument to environ() to determine
    > which to pass back. I'm not sure whether or not I'd throw an exception
    > if the actual value couldn't be translated to the current SBCS code.


    Notice that os.environ is not a function, but a dictionary. So there
    is no system call involved when retrieving an environment
    variable. Instead, they are all precomputed.
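
    A short sketch of what that means in practice, run against a current
    Python 3. The variable names GR_DEMO_VAR and GR_NO_SUCH_VAR are made up
    for the example.

```python
import os
from collections.abc import MutableMapping

# os.environ is a mapping object, not a function.
is_mapping = isinstance(os.environ, MutableMapping)
is_callable = callable(os.environ)

# Lookups are ordinary dictionary-style operations on that mapping.
os.environ["GR_DEMO_VAR"] = "demo"           # hypothetical variable name
value = os.environ.get("GR_DEMO_VAR")
missing = os.environ.get("GR_NO_SUCH_VAR")   # None rather than an exception
```

    Because the mapping is filled in when the os module is first imported,
    there is no per-lookup hook where the type of the key could select a
    byte-string or Unicode variant of an API call.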

    > On reading this over, it does sound a bit more strident than my
    > responses usually do, but I will admit to being irritated at the
    > assumption that you need to read the source to find out the
    > answer to various questions.


    If the question is "how does software Foo do something", the *only*
    reliable way is to read the source. You may have a mental model that
    may allow you to give an educated guess how Foo *might* do
    something. In this case, your educated guess was wrong, that's why I
    referred you to the source.

    Regards,
    Martin
     
    Martin v. Löwis, Jul 13, 2003
    #6
  7. "Martin v. Löwis" wrote:
    > "John Roth" <> writes:
    >
    > > The OP's question revolved around ***which*** code page was
    > > being used internally. Windows uses Unicode. That's not the same
    > > question as what code set Python uses to attempt to translate Unicode
    > > into a single byte character set.

    >
    > Yes and no. What Windows uses is largely irrelevant, as Python does
    > not use Windows here. Instead, it uses the Microsoft C library, in
    > which environment variables are *not* stored in some Unicode encoding,
    > when accessed through the _environ pointer.


    I've found at various times that using the C library causes lots of
    problems with Microsoft.

    > > As to more difficult, as I said above, I haven't perused the source,
    > > so I can't comment on that. If I had to do it myself, I'd probably
    > > start out by always using the Unicode variant of the Windows API
    > > call, and then check the type of the argument to environ() to determine
    > > which to pass back. I'm not sure whether or not I'd throw an exception
    > > if the actual value couldn't be translated to the current SBCS code.

    >
    > Notice that os.environ is not a function, but a dictionary. So there
    > is no system call involved when retrieving an environment
    > variable. Instead, they are all precomputed.


    Good point. That does make it somewhat harder; the routine
    would have to precompute both versions, and store them with
    both standard strings and unicode strings as keys. Whether the
    overhead would be worth it is debatable. It's not, however,
    all that difficult to understand for the user of the facility, though.
    It would work exactly the same way the file functions work: if
    you use a unicode key, you get a unicode result.

    John Roth

    >
    > Regards,
    > Martin
    >
     
    John Roth, Jul 13, 2003
    #7
  8. "Fredrik Lundh" wrote:
    > John Roth wrote:
    >
    > > > Read the source, Luke.

    > >
    > > I haven't gotten into the Python source, and my name
    > > is not Luke.

    >
    > And life's too short to waste on movies...


    Depends on what your goals in life are.

    > > On reading this over, it does sound a bit more strident than my
    > > responses usually do, but I will admit to being irritated at the
    > > assumption that you need to read the source to find out the
    > > answer to various questions.

    >
    > Well, you obviously didn't bother to read the documentation for
    > os.environ, so pointing you to the source sounds like a reasonable
    > idea.


    Not particularly. I might be one of that not inconsiderable number
    of people that doesn't know C. I'm not, but the number of people
    who use Python and who don't know C is not zero.

    I like Python because, for the most part, it's much more
    understandable than many languages I know, and that
    makes it much more productive. What I've learned in this
    conversation is that os.environ fails to handle one of the
    major corner cases in a Windows NT/2000/XP environment.
    So if I need that corner case, I'm going to have to use
    the Windows API call. Not a big deal, but also not something
    that I regard as one of the language's strengths.
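
    For reference, a hedged sketch of what "use the Windows API call" could
    look like with the ctypes module, which only joined the standard library
    in later Python releases and was not an option when this was written.
    The helper name get_env_unicode is hypothetical, and the non-Windows
    fallback is there only so the sketch runs elsewhere.

```python
import ctypes
import os
import sys


def get_env_unicode(name):
    """Return an environment variable as text, or None if it is unset."""
    if sys.platform != "win32":
        # Not Windows: fall back to the ordinary mapping.
        return os.environ.get(name)

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_var = kernel32.GetEnvironmentVariableW
    get_var.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_var.restype = ctypes.c_uint32

    # A first call with no buffer asks for the required size in characters,
    # including the terminating NUL; 0 means the variable was not found.
    size = get_var(name, None, 0)
    if size == 0:
        return None
    buf = ctypes.create_unicode_buffer(size)
    get_var(name, buf, size)
    return buf.value
```

    Calling the wide-character ("W") entry point sidesteps the ANSI code
    page entirely, which is the corner case discussed above.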

    John Roth

    > </F>
     
    John Roth, Jul 13, 2003
    #8
  9. "Martin v. Löwis" wrote:
    > John Roth wrote:
    > > Good point. That does make it somewhat harder; the routine
    > > would have to precompute both versions, and store them with
    > > both standard strings and unicode strings as keys.

    >
    > That doesn't work. You cannot have separate dictionary entries
    > for unicode and byte string keys if the keys compare and hash
    > equal, which is the case for all-ASCII keys (which environment
    > variable names typically are).


    Ah, so.

    John Roth
    >
    > Regards,
    > Martin
    >
     
    John Roth, Jul 13, 2003
    #9
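
    The dictionary point in this last exchange can be illustrated by
    analogy. In current Python 3, text and byte strings never compare equal,
    so the original case cannot be reproduced directly; the sketch below
    uses 1, 1.0 and True instead, which do compare and hash equal. That
    substitution is an assumption made purely for illustration.

```python
# 1, 1.0 and True compare equal and hash equal, so a dict sees one key.
d = {}
d[1] = "int"
d[1.0] = "float"
d[True] = "bool"

entries = len(d)   # a single entry survives
final = d[1]       # holding the value stored last

# In Python 3, text and bytes are never equal, so these stay separate keys.
separate = {"APPDATA": "text", b"APPDATA": "bytes"}
```

    With keys that compare and hash equal, the second assignment simply
    overwrites the first, which is why storing both a byte-string and a
    Unicode variant under "the same" name was not possible.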