unable to print Unicode characters in Python 3

jefm

Hi,
while checking out Python 3, I read that all text strings are now
natively Unicode.
In the Python language reference
(http://docs.python.org/3.0/reference/lexical_analysis.html) I read that
I can show Unicode characters in several ways.
"\uxxxx" supposedly allows me to specify the Unicode character by hex
number and the format "\N{name}" allows me to specify it by Unicode
name.
Neither seems to work for me.
What am I doing wrong?

Please see error output below where I am trying to show the EURO sign
(http://www.fileformat.info/info/unicode/char/20ac/index.htm):

Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u20ac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "c:\python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
>>> print ("\N{EURO SIGN}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "c:\python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
 
Martin

Hmm this works for me,

it's a self compiled version:

~ $ python3
Python 3.0 (r30:67503, Dec 29 2008, 21:35:15)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

2009/1/26 jefm said:
What am I doing wrong ?

"\N{EURO SIGN}".encode("ISO-8859-15") ## could be something but I'm
pretty sure I'm totally wrong on this
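For reference, here is a minimal sketch of what that encode call produces. Encoding turns the string into bytes in the named character set; it does not by itself make a console display them. ISO-8859-15 is used only because it is the character set named in the suggestion above.

```python
# Encode the euro sign into ISO-8859-15, which has a slot for it
# (the older ISO-8859-1 does not).
euro = "\N{EURO SIGN}"
encoded = euro.encode("ISO-8859-15")
print(encoded)  # a bytes object, not text

# Decoding with the same character set recovers the original string.
round_trip = encoded.decode("ISO-8859-15")
print(round_trip == euro)

# ISO-8859-1 has no euro sign, so the same call fails there.
try:
    euro.encode("ISO-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)
```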


--
http://soup.alt.delete.co.at
http://www.xing.com/profile/Martin_Marcher
http://www.linkedin.com/in/martinmarcher

You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
 
jefm

Hmm this works for me,
it's a self compiled version:
~ $ python3
Python 3.0 (r30:67503, Dec 29 2008, 21:35:15)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2

You are running on Linux. Mine is on Windows.
Anyone else have this issue on Windows ?
 
Michael Torrie

jefm said:
Hmm this works for me,
it's a self compiled version:
~ $ python3
Python 3.0 (r30:67503, Dec 29 2008, 21:35:15)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2

You are running on Linux. Mine is on Windows.
Anyone else have this issue on Windows ?


As Benjamin Kaplin said, Windows terminals use the old cp1252 character
set, which cannot display the euro sign. You'll either have to run it in
something more modern like the cygwin rxvt terminal, or output some
other way, such as through a GUI.
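Which code pages actually contain the euro sign can be checked from Python itself. The sketch below is only an illustration, and the helper name can_encode is made up for it. With the codecs shipped with CPython, cp437 (the codec named in the traceback) and cp850 lack the character, while cp1252 does include it, so the limitation seems to lie with the console's OEM code page rather than with cp1252 as such.

```python
EURO = "\u20ac"


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Compare a few code pages that come up in this thread.
for name in ("cp437", "cp850", "cp1252", "utf-8"):
    print(name, can_encode(EURO, name))
```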
 
Terry Reedy

jefm said:
Hi,
while checking out Python 3, I read that all text strings are now
natively Unicode.

True

In the Python language reference (http://docs.python.org/3.0/reference/
lexical_analysis.html) I read that I can show Unicode character in
several ways.
"\uxxxx" supposedly allows me to specify the Unicode character by hex
number and the format "\N{name}" allows me to specify by Unicode
name.

These are ways to *specify* unicode chars on input.

Neither seem to work for me.

If you separate text creation from text printing, you would see that
they do. Try

s = '\u20ac'
print(s)
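A small sketch of that separation: the string can be created and inspected without ever sending the euro sign itself to the console. ascii() and unicodedata.name() return plain ASCII text, so printing their results should be safe even on a cp437 console.

```python
import unicodedata

s = "\u20ac"          # specified by hex code point
t = "\N{EURO SIGN}"   # specified by Unicode name

same = (s == t)               # both escapes create the same one-character string
length = len(s)
name = unicodedata.name(s)

# Only ASCII reaches the console here, so no encoding error can occur.
print(ascii(s), length, name, same)
```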
What am I doing wrong ?

Using the interactive interpreter running in a Windows console.

Please see error output below where I am trying to show the EURO sign
(http://www.fileformat.info/info/unicode/char/20ac/index.htm):

Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u20ac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "c:\python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>

With the standard console, I get the same. But with IDLE, using the
same Python build but through a different interface
'€' # euro sign

I have fiddled with the shortcut to supposedly make it work better, as
claimed by posts found on the web, but to no avail. Very frustrating,
since I have fonts on the system for at least all of the first 64K
chars. Scream at Microsoft or try to find or encourage a console
replacement that Python could use. In the meantime, use IDLE. Not
perfect for Unicode, but better.

Terry Jan Reedy
 
jefm

As Benjamin Kaplin said, Windows terminals use the old cp1252 character
set, which cannot display the euro sign. You'll either have to run it in
something more modern like the cygwin rxvt terminal, or output some
other way, such as through a GUI.
With the standard console, I get the same. But with IDLE, using the
same Python build but through a different interface
Scream at Microsoft or try to find or encourage a console
replacement that Python could use. In the meanwhile, use IDLE. Not
perfect for Unicode, but better.


So, if I understand it correctly, it should work as long as you run
your Python code on something that can actually print the Unicode
character.
Apparently, the Windows command line cannot.

I mainly program command line tools to be used by Windows users. So I
guess I am screwed.

Other than converting my tools to have a graphical interface, is there
any other solution, short of giving Bill Gates a call to bring his
command line up to the 21st century?
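One possible stopgap for command-line tools, sketched here under assumptions rather than offered as a fix: encode the text with the stream's own encoding and the standard "backslashreplace" error handler, so that characters the console lacks come out as escapes instead of raising UnicodeEncodeError. The names safe_text and safe_print are hypothetical helpers, not part of the standard library.

```python
import io
import sys


def safe_text(text, encoding, errors="backslashreplace"):
    """Replace characters the target encoding lacks with backslash escapes."""
    return text.encode(encoding, errors).decode(encoding)


def safe_print(text, stream=None):
    """Write text to stream without failing on unencodable characters."""
    if stream is None:
        stream = sys.stdout
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    stream.write(safe_text(text, encoding) + "\n")


# Demonstration on an in-memory stream; StringIO reports no encoding,
# so ASCII is assumed and the euro sign is written as an escape.
demo = io.StringIO()
safe_print("price: \u20ac5", demo)
print(demo.getvalue(), end="")
```

The user sees \u20ac instead of the euro sign on a console that cannot show it, which is lossy but keeps the tool running.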
 
jefm

Now that I know the problem, I found the following on Google.

Windows uses code pages to display different character sets.
(http://en.wikipedia.org/wiki/Code_page)

The Windows chcp command allows you to change the character set from
the original 437 set.

When you type on the command line: chcp 65001
it sets your console in UTF-8 mode.
(http://en.wikipedia.org/wiki/Code_page_65001)

Unfortunately, it still doesn't do what I want. Instead of printing
the error message above, it prints nothing.
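When experimenting with code pages like this, it can help to see which encoding Python believes the console uses, and what bytes UTF-8 output would consist of. The sketch below writes to an in-memory buffer; the helper write_utf8 is a made-up name, and whether a real console renders these bytes correctly after chcp 65001 is not something this snippet can show.

```python
import io
import sys


def write_utf8(text, binary_stream):
    """Encode text as UTF-8 and write the raw bytes to a binary stream."""
    data = text.encode("utf-8")
    binary_stream.write(data)
    return data


# The encoding Python picked for standard output (depends on the console).
print("stdout encoding reported by Python:", sys.stdout.encoding)

# The euro sign occupies three bytes in UTF-8.
demo = io.BytesIO()
write_utf8("\u20ac", demo)
print(demo.getvalue())
```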
 
Giampaolo Rodola'

Hi,
while checking out Python 3, I read that all text strings are now
natively Unicode.
In the Python language reference (http://docs.python.org/3.0/reference/
lexical_analysis.html) I read that I can show Unicode character in
several ways.
"\uxxxx" supposedly allows me to specify the Unicode character by hex
number and the format  "\N{name}" allows me to specify by Unicode
name.
Neither seem to work for me.
What am I doing wrong ?

Please see error output below where I am trying to show the EURO sign
(http://www.fileformat.info/info/unicode/char/20ac/index.htm):

Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u20ac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "c:\python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
>>> print ("\N{EURO SIGN}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "c:\python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>

I have this same issue on Windows.
Note that on Python 2.6 it works:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unicode('\u20ac')
\u20ac

This is pretty serious, IMHO, since it breaks any Windows software
printing Unicode to stdout.
I've filed an issue on the Python bug tracker:
http://bugs.python.org/issue5081


--- Giampaolo
http://code.google.com/p/pyftpdlib/
 
Denis Kasak

I have this same issue on Windows.
Note that on Python 2.6 it works:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unicode('\u20ac')
\u20ac

Shouldn't this be

print unicode(u'\u20ac')

on 2.6? Without the 'u' prefix, 2.6 will just encode it as a normal
(byte) string and escape the backslash. In Python 3.0 you don't need
to do this because all strings are "unicode" to start with. I suspect
you will see the same error with 2.6 on Windows once you correct this.

(note to Giampaolo: sorry, resending this because I accidentally
selected "reply" instead of "reply to all")
 
John Machin

Hi,
while checking out Python 3, I read that all text strings are now
natively Unicode.
In the Python language reference (http://docs.python.org/3.0/reference/
lexical_analysis.html) I read that I can show Unicode character in
several ways.
"\uxxxx" supposedly allows me to specify the Unicode character by hex
number and the format  "\N{name}" allows me to specify by Unicode
name.
Neither seem to work for me.
What am I doing wrong ?
Please see error output below where I am trying to show the EURO sign
(http://www.fileformat.info/info/unicode/char/20ac/index.htm):
Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u20ac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "c:\python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
>>> print ("\N{EURO SIGN}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "c:\python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>

I have this same issue on Windows.
Note that on Python 2.6 it works:

Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unicode('\u20ac')
\u20ac

This is pretty serious, IMHO, since it breaks any Windows software
printing Unicode to stdout.
I've filed an issue on the Python bug tracker:
http://bugs.python.org/issue5081

Hello hello -- (1) that's *not* attempting to print Unicode. Look at
your own output ... "\u20ac" was printed, not a euro character!!!
With 2.X for *any* X:6

(2) Printing Unicode to a Windows console has never *worked*; that's
why this thread was pursuing the faint ray of hope offered by cp65001.
 
Thorsten Kampe

* Giampaolo Rodola' (Tue, 27 Jan 2009 04:52:16 -0800 (PST))
I have this same issue on Windows.
Note that on Python 2.6 it works:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unicode('\u20ac')
\u20ac

This is pretty serious, IMHO, since it breaks any Windows software
printing Unicode to stdout.
I've filed an issue on the Python bug tracker:
http://bugs.python.org/issue5081

For printing to stdout you have to give an encoding that the terminal
understands and that contains the character. In your case the terminal
says "I speak cp 850" but of course there is no Euro sign in there. Why
should that be a bug?

Thorsten
 
Thorsten Kampe

* Denis Kasak (Tue, 27 Jan 2009 14:22:32 +0100)
I have this same issue on Windows.
Note that on Python 2.6 it works:

Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unicode('\u20ac')
\u20ac

Shouldn't this be

print unicode(u'\u20ac')

You are trying to create a Unicode object from a Unicode object. Doesn't
make any sense.

on 2.6? Without the 'u' prefix, 2.6 will just encode it as a normal
(byte) string and escape the backslash.

You are confusing encoding and decoding. unicode(str) = str.decode. To
print it you have to encode it again to a character set that the
terminal understands and that contains the desired character.
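The decode-then-encode round trip can be sketched in Python 3 terms, where the 2.x pair of str and unicode corresponds to bytes and str. The byte values and the choice of cp1252 and cp437 as target character sets are only examples.

```python
# Bytes as they might arrive from a UTF-8 file or network stream.
raw = b"\xe2\x82\xac"

# Decoding turns bytes into text (what unicode(str) did in Python 2).
text = raw.decode("utf-8")

# To print, the text is encoded again into the terminal's character set.
# This only works if that character set contains the character.
out = text.encode("cp1252")
print(out)

try:
    text.encode("cp437")
    cp437_ok = True
except UnicodeEncodeError:
    cp437_ok = False
print(cp437_ok)
```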

Thorsten
 
Denis Kasak

* Denis Kasak (Tue, 27 Jan 2009 14:22:32 +0100)


You are trying to create a Unicode object from a Unicode object. Doesn't
make any sense.

Of course it doesn't. :)

Giampaolo's example was wrong because he was creating a str object
with a non-escaped backslash inside it (which automatically got
escaped) and then converting it to a unicode object. In other words,
he was doing:

print unicode('\\u20ac')

so the Unicode escape sequence didn't get interpreted the way he
intended it to. I then modified that by adding the extra 'u' but
forgot to delete the extraneous unicode().
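The difference between an interpreted escape and a literal backslash sequence can be reproduced in Python 3, using a raw string as a stand-in for the Python 2 byte string in which \u is not an escape. This is an illustration of the idea, not a transcript of the 2.6 behaviour.

```python
# A raw string keeps the backslash: six separate characters.
escaped = r"\u20ac"

# A normal Python 3 string interprets the escape: one character.
real = "\u20ac"

print(len(escaped), len(real))

# The literal text can still be turned into the character explicitly,
# using the standard "unicode_escape" codec.
decoded = escaped.encode("ascii").decode("unicode_escape")
print(decoded == real)
```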
You are confusing encoding and decoding. unicode(str) = str.decode. To
print it you have to encode it again to a character set that the
terminal understands and that contains the desired character.

I agree (except for the first sentence :) ). As I said, I simply
forgot to delete the call to the unicode builtin.
 
