WTF? Printing unicode strings

Discussion in 'Python' started by Ron Garret, May 18, 2006.

  1. Ron Garret

    Ron Garret Guest

    >>> u'\xbd'
    u'\xbd'
    >>> print _
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)
    >>>
     
    Ron Garret, May 18, 2006
    #1

  2. Ron Garret

    John Salerno Guest

    Ron Garret wrote:
    > >>> u'\xbd'
    > u'\xbd'
    > >>> print _
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)


    Not sure if this really helps you, but:

    >>> u'\xbd'
    u'\xbd'
    >>> print _
    ½
    >>>
     
    John Salerno, May 18, 2006
    #2

  3. Ron Garret wrote:

    > >>> u'\xbd'
    > u'\xbd'
    > >>> print _
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)


    so stdout on your machine is ascii, and you don't understand why you
    cannot print a non-ascii unicode character to it? wtf?

    </F>
     
    Fredrik Lundh, May 18, 2006
    #3
  4. Ron Garret

    Ron Garret Guest

    In article <>,
    Fredrik Lundh <> wrote:

    > Ron Garret wrote:
    >
    > >>>> u'\xbd'

    > > u'\xbd'
    > >>>> print _

    > > Traceback (most recent call last):
    > > File "<stdin>", line 1, in ?
    > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
    > > position 0: ordinal not in range(128)

    >
    > so stdout on your machine is ascii, and you don't understand why you
    > cannot print a non-ascii unicode character to it? wtf?
    >
    > </F>


    I forgot to mention:

    >>> sys.getdefaultencoding()
    'utf-8'
    >>> print u'\xbd'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)
    >>>
     
    Ron Garret, May 18, 2006
    #4
  5. Ron Garret

    Robert Kern Guest

    Ron Garret wrote:

    > I forgot to mention:
    >
    >>>>sys.getdefaultencoding()

    >
    > 'utf-8'


    A) You shouldn't be able to do that.
    B) Don't do that.
    C) It's not relevant to the encoding of stdout which determines how unicode
    strings get converted to bytes when printing them:

    >>> import sys
    >>> sys.stdout.encoding
    'UTF-8'
    >>> sys.getdefaultencoding()
    'ascii'
    >>> print u'\xbd'
    ½
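
As a rough illustration of point C, the sketch below reads the two settings side by side and encodes a string explicitly, with a fallback for characters the target encoding lacks. It is written in Python 3 syntax rather than the Python 2 used in this thread, and `describe_encodings` and `printable` are hypothetical helper names, not part of any library.

```python
import sys


def describe_encodings():
    """Return the two settings that are easy to confuse."""
    return {
        # Encoding applied when text is written to stdout. It can be None
        # when stdout is not a terminal (for example, when output is piped).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding Python 2 uses for implicit str/unicode conversions.
        "default": sys.getdefaultencoding(),
    }


def printable(text, encoding):
    """Encode text for a stream, substituting '?' for unencodable characters."""
    return text.encode(encoding or "ascii", "replace")


print(describe_encodings())
print(repr(printable(u"\xbd", "ascii")))  # the half sign becomes '?'
print(repr(printable(u"\xbd", "utf-8")))  # the half sign becomes two bytes
```

The `"replace"` error handler trades fidelity for robustness; whether that is acceptable depends on what the output is for.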

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, May 18, 2006
    #5
  6. Ron Garret

    Ron Garret Guest

    In article <>,
    Robert Kern <> wrote:

    > Ron Garret wrote:
    >
    > > I forgot to mention:
    > >
    > >>>>sys.getdefaultencoding()

    > >
    > > 'utf-8'

    >
    > A) You shouldn't be able to do that.


    What can I say? I can.

    > B) Don't do that.


    OK. What should I do instead?

    > C) It's not relevant to the encoding of stdout which determines how unicode
    > strings get converted to bytes when printing them:
    >
    > >>> import sys
    > >>> sys.stdout.encoding

    > 'UTF-8'
    > >>> sys.getdefaultencoding()

    > 'ascii'
    > >>> print u'\xbd'

    > ½


    OK, so how am I supposed to change the encoding of sys.stdout? It comes
    up as US-ASCII on my system. Simply setting it doesn't work:

    >>> import sys
    >>> sys.stdout.encoding='utf-8'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: readonly attribute
    >>>


    rg
     
    Ron Garret, May 18, 2006
    #6
  7. Ron Garret

    Robert Kern Guest

    Ron Garret wrote:
    > In article <>,
    > Robert Kern <> wrote:
    >
    >>Ron Garret wrote:
    >>
    >>>I forgot to mention:
    >>>
    >>>
    >>>>>>sys.getdefaultencoding()
    >>>
    >>>'utf-8'

    >>
    >>A) You shouldn't be able to do that.

    >
    > What can I say? I can.


    See B).

    >>B) Don't do that.

    >
    > OK. What should I do instead?


    See below.

    >>C) It's not relevant to the encoding of stdout which determines how unicode
    >>strings get converted to bytes when printing them:
    >>
    >>>>>import sys
    >>>>>sys.stdout.encoding

    >>
    >>'UTF-8'
    >>
    >>>>>sys.getdefaultencoding()

    >>
    >>'ascii'
    >>
    >>>>>print u'\xbd'

    >>
    >>½

    >
    > OK, so how am I supposed to change the encoding of sys.stdout? It comes
    > up as US-ASCII on my system. Simply setting it doesn't work:


    You will have to use a terminal that accepts UTF-8.
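
Where switching terminals is not an option, one workaround that is often suggested is to wrap the byte stream in an encoding writer from the `codecs` module, since the `encoding` attribute itself is read-only. The sketch below is an assumption about how that might look, not something proposed in this thread. It uses an in-memory `io.BytesIO` as a stand-in for stdout so that it runs anywhere (Python 2.6 or later, or Python 3); in Python 2 the same wrapper is usually applied to `sys.stdout` directly.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream such as Python 2's sys.stdout.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class for the named encoding;
# instantiating it with a stream yields a wrapper that encodes on write.
writer = codecs.getwriter("utf-8")(raw)
writer.write(u"\xbd\n")

# The underlying stream now holds the UTF-8 bytes for the half sign.
print(repr(raw.getvalue()))
```

This only controls which bytes are emitted. The terminal still has to interpret them as UTF-8, so the advice above still applies.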

    --
    Robert Kern
     
    Robert Kern, May 18, 2006
    #7
  8. Ron Garret

    Serge Orlov Guest

    Re: WTF? Printing unicode strings

    Ron Garret wrote:
    > In article <>,
    > Robert Kern <> wrote:
    >
    > > Ron Garret wrote:
    > >
    > > > I forgot to mention:
    > > >
    > > >>>>sys.getdefaultencoding()
    > > >
    > > > 'utf-8'

    > >
    > > A) You shouldn't be able to do that.

    >
    > What can I say? I can.
    >
    > > B) Don't do that.

    >
    > OK. What should I do instead?


    The exact answer depends on which OS and terminal you are using and on
    what your program is supposed to do: are you going to distribute the
    program, or is it just for internal use?
     
    Serge Orlov, May 18, 2006
    #8
  9. Ron Garret

    Ron Garret Guest

    Re: WTF? Printing unicode strings

    In article <>,
    "Serge Orlov" <> wrote:

    > Ron Garret wrote:
    > > In article <>,
    > > Robert Kern <> wrote:
    > >
    > > > Ron Garret wrote:
    > > >
    > > > > I forgot to mention:
    > > > >
    > > > >>>>sys.getdefaultencoding()
    > > > >
    > > > > 'utf-8'
    > > >
    > > > A) You shouldn't be able to do that.

    > >
    > > What can I say? I can.
    > >
    > > > B) Don't do that.

    > >
    > > OK. What should I do instead?

    >
    > Exact answer depends on what OS and terminal you are using and what
    > your program is supposed to do, are you going to distribute the program
    > or it's just for internal use.


    I'm using an OS X terminal to ssh to a Linux machine.

    But what about this:

    >>> f2=open('foo','w')
    >>> f2.write(u'\xFF')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
    >>>


    That should have nothing to do with my terminal, right?

    I just found http://www.amk.ca/python/howto/unicode, which seems to be
    enlightening. The answer seems to be something like:

    import codecs
    f = codecs.open('foo','w','utf-8')

    but that seems pretty awkward.

    rg
     
    Ron Garret, May 19, 2006
    #9
  10. Ron Garret

    Robert Kern Guest

    Re: WTF? Printing unicode strings

    Ron Garret wrote:

    > I'm using an OS X terminal to ssh to a Linux machine.


    Click on the "Terminal" menu, then "Window Settings...". Choose "Display" from
    the combobox. At the bottom you will see a combobox titled "Character Set
    Encoding". Choose "Unicode (UTF-8)".

    > But what about this:
    >
    >>>>f2=open('foo','w')
    >>>>f2.write(u'\xFF')

    >
    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
    > position 0: ordinal not in range(128)
    >
    > That should have nothing to do with my terminal, right?


    Correct, that is a different problem. f.write() expects a string of bytes, not a
    unicode string. In order to convert unicode strings to byte strings without an
    explicit .encode() method call, Python uses the default encoding which is
    'ascii'. It's not easily changeable for a good reason. Your modules won't work
    on anyone else's machine if you hack that setting.
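
To make the difference concrete, here is a small sketch (Python 3 syntax, not the Python 2 of the thread) of what the implicit conversion amounts to, next to an explicit one. The variable names are illustrative only.

```python
text = u"\xff"
error = None

try:
    # Roughly what Python 2 does behind the scenes when a unicode string
    # is handed to something that expects bytes: encode with 'ascii'.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print("ascii failed:", exc.reason)

# Choosing the encoding explicitly avoids the implicit step altogether.
data = text.encode("utf-8")
print(repr(data))
```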

    > I just found http://www.amk.ca/python/howto/unicode, which seems to be
    > enlightening. The answer seems to be something like:
    >
    > import codecs
    > f = codecs.open('foo','w','utf-8')
    >
    > but that seems pretty awkward.


    <shrug> About as clean as it gets when dealing with text encodings.

    --
    Robert Kern
     
    Robert Kern, May 19, 2006
    #10
  11. Ron Garret

    Paul Boddie Guest

    Re: WTF? Printing unicode strings

    Ron Garret wrote:
    >
    > But what about this:
    >
    > >>> f2=open('foo','w')
    > >>> f2.write(u'\xFF')

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
    > position 0: ordinal not in range(128)
    > >>>

    >
    > That should have nothing to do with my terminal, right?


    Correct. But first try to answer this: given that you want to write the
    Unicode character value 255 to a file, how is that character to be
    represented in the file?

    For example, one might think that one could just get a byte whose value
    is 255 and write that to a file, but what happens if one chooses a
    Unicode character whose value is greater than 255? One could use two
    bytes or three bytes or as many as one needs, but what if the lowest 8
    bits of that value are all set? How would one know, if one reads a file
    back and gets a byte whose value is 255 whether it represents a
    character all by itself or is part of another character's
    representation? It gets complicated!
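
The ambiguity described above can be seen by encoding two characters under a few encodings. This is only an illustrative sketch in Python 3 syntax; the choice of U+01FF as the second character and of these three encodings is mine, not the poster's.

```python
# Encode two characters under three encodings and record the bytes produced.
samples = [u"\xff", u"\u01ff"]
encodings = ["latin-1", "utf-16-le", "utf-8"]

table = {}
for ch in samples:
    for enc in encodings:
        try:
            table[(ch, enc)] = ch.encode(enc)
        except UnicodeEncodeError:
            # This encoding has no representation for the character.
            table[(ch, enc)] = None

for (ch, enc), data in sorted(table.items()):
    print("U+%04X %-9s %r" % (ord(ch), enc, data))
```

Latin-1 stores U+00FF as the single byte 0xFF but cannot store U+01FF at all, while UTF-16-LE stores U+01FF as two bytes, one of which is also 0xFF. UTF-8 never produces a 0xFF byte, which is one way it sidesteps the question.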

    The solution is that you choose an encoding which allows you to store
    the characters in the file, thus answering indirectly the question
    above: encodings determine how the characters are represented in the
    file and allow you to read the file and get back the characters you put
    into it. One of the most common encodings suitable for the storage of
    Unicode character values is UTF-8, which has been designed with the
    above complications in mind, but as long as you remember to choose an
    encoding, you don't have to think about it: Python takes care of the
    difficult stuff on your behalf. In the above code you haven't made that
    choice.

    So, to answer the above question, you can either...

    * Use the encode method on Unicode objects to turn them into plain
    strings, then write them to a file - at that point, you are
    writing specific byte values.
    * Use the codecs.open function and other codecs module features to
    write Unicode objects directly to files and streams - here, the
    module's infrastructure deals with byte-level issues.
    * If you're using something like an XML library, you can often pass a
    normal file or stream object to some function or method whilst
    stating the output encoding.
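
The first two options in the list above can be sketched as follows. The code is Python 3 syntax (it should also run on Python 2.6 and later), writes to a temporary directory rather than to `foo` in the current directory, and uses UTF-8 purely as an example choice of encoding.

```python
import codecs
import os
import tempfile

workdir = tempfile.mkdtemp()
path_explicit = os.path.join(workdir, "explicit")
path_codecs = os.path.join(workdir, "codecs")

# Option 1: encode the unicode object yourself, then write the bytes.
with open(path_explicit, "wb") as f:
    f.write(u"\xff".encode("utf-8"))

# Option 2: let codecs.open do the encoding on every write.
with codecs.open(path_codecs, "w", "utf-8") as f:
    f.write(u"\xff")

# Both files should contain identical bytes.
with open(path_explicit, "rb") as f:
    raw_explicit = f.read()
with open(path_codecs, "rb") as f:
    raw_codecs = f.read()

print(repr(raw_explicit), repr(raw_codecs))
```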

    There is no universally correct answer to which encoding should be used
    when writing Unicode character values to files, contrary to some
    beliefs and opinions which, for example, lead to people pretending that
    everything is in UTF-8 in order to appease legacy applications with the
    minimum of tweaks necessary to stop them from breaking completely.
    Thus, Python doesn't make a decision for you here.

    Paul
     
    Paul Boddie, May 19, 2006
    #11
  12. Ron Garret

    Ron Garret Guest

    Re: WTF? Printing unicode strings

    In article <>,
    Robert Kern <> wrote:

    > Ron Garret wrote:
    >
    > > I'm using an OS X terminal to ssh to a Linux machine.

    >
    > Click on the "Terminal" menu, then "Window Settings...". Choose "Display"
    > from
    > the combobox. At the bottom you will see a combobox titled "Character Set
    > Encoding". Choose "Unicode (UTF-8)".


    It was already set to UTF-8.

    > > But what about this:
    > >
    > >>>>f2=open('foo','w')
    > >>>>f2.write(u'\xFF')

    > >
    > > Traceback (most recent call last):
    > > File "<stdin>", line 1, in ?
    > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
    > > position 0: ordinal not in range(128)
    > >
    > > That should have nothing to do with my terminal, right?

    >
    > Correct, that is a different problem. f.write() expects a string of bytes,
    > not a
    > unicode string. In order to convert unicode strings to byte strings without
    > an
    > explicit .encode() method call, Python uses the default encoding which is
    > 'ascii'. It's not easily changeable for a good reason. Your modules won't
    > work
    > on anyone else's machine if you hack that setting.


    OK.

    > > I just found http://www.amk.ca/python/howto/unicode, which seems to be
    > > enlightening. The answer seems to be something like:
    > >
    > > import codecs
    > > f = codecs.open('foo','w','utf-8')
    > >
    > > but that seems pretty awkward.

    >
    > <shrug> About as clean as it gets when dealing with text encodings.


    OK. Thanks.

    rg
     
    Ron Garret, May 19, 2006
    #12
  13. Ron Garret

    Serge Orlov Guest

    Re: WTF? Printing unicode strings

    Ron Garret wrote:
    > In article <>,
    > "Serge Orlov" <> wrote:
    >
    > > Ron Garret wrote:
    > > > In article <>,
    > > > Robert Kern <> wrote:
    > > >
    > > > > Ron Garret wrote:
    > > > >
    > > > > > I forgot to mention:
    > > > > >
    > > > > >>>>sys.getdefaultencoding()
    > > > > >
    > > > > > 'utf-8'
    > > > >
    > > > > A) You shouldn't be able to do that.
    > > >
    > > > What can I say? I can.
    > > >
    > > > > B) Don't do that.
    > > >
    > > > OK. What should I do instead?

    > >
    > > Exact answer depends on what OS and terminal you are using and what
    > > your program is supposed to do, are you going to distribute the program
    > > or it's just for internal use.

    >
    > I'm using an OS X terminal to ssh to a Linux machine.


    In theory it should work out of the box. The OS X terminal should set the
    environment variable LANG=en_US.utf-8, then ssh should transfer this
    variable to Linux, and Python will know that your terminal is UTF-8.
    Unfortunately, AFAIK the OS X terminal doesn't set that variable, and most
    (all?) ssh clients don't transfer it between machines. As a workaround,
    you can set that variable on Linux yourself. This should work on the
    command line right away:

    LANG=en_US.utf-8 python -c "print unichr(0xbd)"

    Or put the following line in ~/.bashrc, then log out and log back in:

    export LANG=en_US.utf-8
     
    Serge Orlov, May 19, 2006
    #13
  14. Ron Garret

    Robert Kern Guest

    Re: WTF? Printing unicode strings

    Ron Garret wrote:
    > In article <>,
    > Robert Kern <> wrote:
    >
    >>Ron Garret wrote:
    >>
    >>>I'm using an OS X terminal to ssh to a Linux machine.

    >>
    >>Click on the "Terminal" menu, then "Window Settings...". Choose "Display"
    >>from
    >>the combobox. At the bottom you will see a combobox titled "Character Set
    >>Encoding". Choose "Unicode (UTF-8)".

    >
    > It was already set to UTF-8.


    Then take a look at your LANG environment variable on your Linux machine. For
    example, I have LANG=en_US.UTF-8 on my Linux machine, and I can ssh into it from
    a UTF-8-configured Terminal.app and print unicode strings just fine.

    --
    Robert Kern
     
    Robert Kern, May 19, 2006
    #14
  15. Ron Garret

    Ron Garret Guest

    Re: WTF? Printing unicode strings

    In article <>,
    "Serge Orlov" <> wrote:

    > Ron Garret wrote:
    > > In article <>,
    > > "Serge Orlov" <> wrote:
    > >
    > > > Ron Garret wrote:
    > > > > In article <>,
    > > > > Robert Kern <> wrote:
    > > > >
    > > > > > Ron Garret wrote:
    > > > > >
    > > > > > > I forgot to mention:
    > > > > > >
    > > > > > >>>>sys.getdefaultencoding()
    > > > > > >
    > > > > > > 'utf-8'
    > > > > >
    > > > > > A) You shouldn't be able to do that.
    > > > >
    > > > > What can I say? I can.
    > > > >
    > > > > > B) Don't do that.
    > > > >
    > > > > OK. What should I do instead?
    > > >
    > > > Exact answer depends on what OS and terminal you are using and what
    > > > your program is supposed to do, are you going to distribute the program
    > > > or it's just for internal use.

    > >
    > > I'm using an OS X terminal to ssh to a Linux machine.

    >
    > In theory it should work out of the box. OS X terminal should set
    > environment variable LANG=en_US.utf-8, then ssh should transfer this
    > variable to Linux and python will know that your terminal is utf-8.
    > Unfortunately AFAIK OS X terminal doesn't set that variable and most
    > (all?) ssh clients don't transfer it between machines. As a workaround
    > you can set that variable on linux yourself. This should work in the
    > command line right away:
    >
    > LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    >
    > Or put the following line in ~/.bashrc and logout/login
    >
    > export LANG=en_US.utf-8


    No joy.

    ron@www01:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    Traceback (most recent call last):
      File "<string>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)
    ron@www01:~$

    rg
     
    Ron Garret, May 19, 2006
    #15
  16. Ron Garret

    Serge Orlov Guest

    Re: WTF? Printing unicode strings

    Ron Garret wrote:
    > > > I'm using an OS X terminal to ssh to a Linux machine.

    > >
    > > In theory it should work out of the box. OS X terminal should set
    > > environment variable LANG=en_US.utf-8, then ssh should transfer this
    > > variable to Linux and python will know that your terminal is utf-8.
    > > Unfortunately AFAIK OS X terminal doesn't set that variable and most
    > > (all?) ssh clients don't transfer it between machines. As a workaround
    > > you can set that variable on linux yourself. This should work in the
    > > command line right away:
    > >
    > > LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > >
    > > Or put the following line in ~/.bashrc and logout/login
    > >
    > > export LANG=en_US.utf-8

    >
    > No joy.
    >
    > ron@www01:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > Traceback (most recent call last):
    > File "<string>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
    > position 0: ordinal not in range(128)
    > ron@www01:~$


    What version of Python and what shell do you run? What do the following
    commands print?

    python -V
    echo $SHELL
    $SHELL --version
     
    Serge Orlov, May 19, 2006
    #16
  17. Ron Garret

    Ron Garret Guest

    Re: WTF? Printing unicode strings

    In article <>,
    "Serge Orlov" <> wrote:

    > Ron Garret wrote:
    > > > > I'm using an OS X terminal to ssh to a Linux machine.
    > > >
    > > > In theory it should work out of the box. OS X terminal should set
    > > > environment variable LANG=en_US.utf-8, then ssh should transfer this
    > > > variable to Linux and python will know that your terminal is utf-8.
    > > > Unfortunately AFAIK OS X terminal doesn't set that variable and most
    > > > (all?) ssh clients don't transfer it between machines. As a workaround
    > > > you can set that variable on linux yourself. This should work in the
    > > > command line right away:
    > > >
    > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > > >
    > > > Or put the following line in ~/.bashrc and logout/login
    > > >
    > > > export LANG=en_US.utf-8

    > >
    > > No joy.
    > >
    > > ron@www01:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > > Traceback (most recent call last):
    > > File "<string>", line 1, in ?
    > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
    > > position 0: ordinal not in range(128)
    > > ron@www01:~$

    >
    > What version of python and what shell do you run? What the following
    > commands print:
    >
    > python -V
    > echo $SHELL
    > $SHELL --version


    ron@www01:~$ python -V
    Python 2.3.4
    ron@www01:~$ echo $SHELL
    /bin/bash
    ron@www01:~$ $SHELL --version
    GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
    Copyright (C) 2002 Free Software Foundation, Inc.
    ron@www01:~$
     
    Ron Garret, May 19, 2006
    #17
  18. Ron Garret

    Serge Orlov Guest

    Re: WTF? Printing unicode strings

    Ron Garret wrote:
    > In article <>,
    > "Serge Orlov" <> wrote:
    >
    > > Ron Garret wrote:
    > > > > > I'm using an OS X terminal to ssh to a Linux machine.
    > > > >
    > > > > In theory it should work out of the box. OS X terminal should set
    > > > > environment variable LANG=en_US.utf-8, then ssh should transfer this
    > > > > variable to Linux and python will know that your terminal is utf-8.
    > > > > Unfortunately AFAIK OS X terminal doesn't set that variable and most
    > > > > (all?) ssh clients don't transfer it between machines. As a workaround
    > > > > you can set that variable on linux yourself. This should work in the
    > > > > command line right away:
    > > > >
    > > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > > > >
    > > > > Or put the following line in ~/.bashrc and logout/login
    > > > >
    > > > > export LANG=en_US.utf-8
    > > >
    > > > No joy.
    > > >
    > > > ron@www01:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > > > Traceback (most recent call last):
    > > > File "<string>", line 1, in ?
    > > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
    > > > position 0: ordinal not in range(128)
    > > > ron@www01:~$

    > >
    > > What version of python and what shell do you run? What the following
    > > commands print:
    > >
    > > python -V
    > > echo $SHELL
    > > $SHELL --version

    >
    > ron@www01:~$ python -V
    > Python 2.3.4
    > ron@www01:~$ echo $SHELL
    > /bin/bash
    > ron@www01:~$ $SHELL --version
    > GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
    > Copyright (C) 2002 Free Software Foundation, Inc.
    > ron@www01:~$


    That's recent enough. I guess the distribution you're using sets LC_*
    variables for no good reason. Either unset all environment variables
    starting with LC_ and set the LANG variable, or override the LC_CTYPE variable:

    LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"

    Should be working now :)
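
The reason LANG alone may not be enough is the POSIX precedence among the locale variables: for the character-type category, a non-empty LC_ALL wins over LC_CTYPE, which in turn wins over LANG. The sketch below models that lookup; `effective_ctype` is a hypothetical helper written for this note, and it only inspects a dictionary of variables rather than asking the C library.

```python
import os


def effective_ctype(env):
    """Return (variable, value) for the setting that decides LC_CTYPE.

    Follows the POSIX order: LC_ALL, then LC_CTYPE, then LANG. Empty
    values are treated as unset. Falls back to the "C" locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    return None, "C"


# LANG is overridden when the distribution has already set LC_CTYPE.
print(effective_ctype({"LANG": "en_US.utf-8", "LC_CTYPE": "C"}))
# With no LC_* variables set, LANG decides.
print(effective_ctype({"LANG": "en_US.utf-8"}))
# What the current process would use.
print(effective_ctype(os.environ))
```

Note that under this ordering, overriding LC_CTYPE would also have no effect if LC_ALL were set.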
     
    Serge Orlov, May 19, 2006
    #18
  19. Ron Garret

    Serge Orlov Guest

    Re: WTF? Printing unicode strings

    Serge Orlov wrote:
    > Ron Garret wrote:
    > > In article <>,
    > > "Serge Orlov" <> wrote:
    > >
    > > > Ron Garret wrote:
    > > > > > > I'm using an OS X terminal to ssh to a Linux machine.
    > > > > >
    > > > > > In theory it should work out of the box. OS X terminal should set
    > > > > > environment variable LANG=en_US.utf-8, then ssh should transfer this
    > > > > > variable to Linux and python will know that your terminal is utf-8.
    > > > > > Unfortunately AFAIK OS X terminal doesn't set that variable and most
    > > > > > (all?) ssh clients don't transfer it between machines. As a workaround
    > > > > > you can set that variable on linux yourself. This should work in the
    > > > > > command line right away:
    > > > > >
    > > > > > LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > > > > >
    > > > > > Or put the following line in ~/.bashrc and logout/login
    > > > > >
    > > > > > export LANG=en_US.utf-8
    > > > >
    > > > > No joy.
    > > > >
    > > > > ron@www01:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
    > > > > Traceback (most recent call last):
    > > > > File "<string>", line 1, in ?
    > > > > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
    > > > > position 0: ordinal not in range(128)
    > > > > ron@www01:~$
    > > >
    > > > What version of python and what shell do you run? What the following
    > > > commands print:
    > > >
    > > > python -V
    > > > echo $SHELL
    > > > $SHELL --version

    > >
    > > ron@www01:~$ python -V
    > > Python 2.3.4
    > > ron@www01:~$ echo $SHELL
    > > /bin/bash
    > > ron@www01:~$ $SHELL --version
    > > GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
    > > Copyright (C) 2002 Free Software Foundation, Inc.
    > > ron@www01:~$

    >
    > That's recent enough. I guess the distribution you're using set LC_*
    > variables for no good reason. Either unset all environment variables
    > starting with LC_ and set LANG variable or override LC_CTYPE variable:
    >
    > LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
    >
    > Should be working now :)


    I've pulled myself together and installed Linux in VMware Player.
    Apparently there is another way Linux distributors can screw up. I
    chose the Debian 3.1 minimal network install, and after answering all
    the installation questions I found that only ASCII and Latin-1 English
    locales were installed:
    $ locale -a
    C
    en_US
    en_US.iso88591
    POSIX

    In 2006, I would expect a UTF-8 English locale to be present even in a
    minimal install. I had to edit /etc/locale.gen and run locale-gen as
    root. After that, Python started to print unicode characters.
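
A missing locale can also be detected from Python itself, because `locale.setlocale` raises `locale.Error` for a locale the system does not provide. The following is a minimal sketch in Python 3 syntax; `locale_available` is a hypothetical helper, and the locale names tried at the end are just examples.

```python
import locale


def locale_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    # Calling setlocale with no second argument queries the current value.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Put the original setting back so the probe has no side effects.
        locale.setlocale(locale.LC_CTYPE, saved)


for candidate in ("C", "en_US.UTF-8"):
    print(candidate, locale_available(candidate))
```

On a system like the minimal install described above, the second line would presumably report `False` until the locale has been generated.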
     
    Serge Orlov, May 19, 2006
    #19
  20. Ron Garret wrote:
    > In article <>,
    > Fredrik Lundh <> wrote:
    >
    >> Ron Garret wrote:
    >>
    >>>>>> u'\xbd'
    >>> u'\xbd'
    >>>>>> print _
    >>> Traceback (most recent call last):
    >>> File "<stdin>", line 1, in ?
    >>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
    >>> position 0: ordinal not in range(128)

    >> so stdout on your machine is ascii, and you don't understand why you
    >> cannot print a non-ascii unicode character to it? wtf?
    >>
    >> </F>

    >
    > I forgot to mention:
    >
    >>>> sys.getdefaultencoding()

    > 'utf-8'
    >>>> print u'\xbd'

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
    > position 0: ordinal not in range(128)


    This is the default encoding used for implicit conversions between byte
    strings and unicode strings; it has nothing to do with printing.

    For the output encoding, see sys.stdout.encoding.

    >>> import sys
    >>> sys.stdout.encoding
    'cp850'
    >>>


    See you,

    Laurent.
     
    Laurent Pointal, May 19, 2006
    #20