printing unicode strings

7stud · Jul 24, 2007

Can anyone tell me why I can print out the individual variables in the
following code, but when I print them out combined into a single
string, I get an error?

symbol = u'ibm'
price = u'4 \xbd' # 4 1/2

print "%s" % symbol
print "%s" % price.encode("utf-8")
print "%s %s" % (symbol, price.encode("utf-8") )

--output:--
ibm
4 1/2
File "pythontest.py", line 6, in ?
print "%s %s" % (symbol, price.encode("utf-8") )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
2: ordinal not in range(128)

Peter Otten · Jul 24, 2007

For format % args, if the format or any arg is a unicode string, the result
will be unicode, too. This implies that byte strings have to be decoded,
and for that process the default ascii codec is used. In your example

print "%s %s" % (symbol, price.encode("utf-8") )

symbol is a unicode string, so Python tries to decode "%s %s" and
"4 \xc2\xbd" (the result of price.encode("utf-8")). The latter contains
non-ascii bytes, so the decoding fails.
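To make the implicit step visible, here is a minimal sketch that performs the same decoding explicitly. The names fmt, encoded_price and error are only for illustration, and the b"" literals and "except ... as" syntax assume Python 2.6 or later (it also runs on Python 3).

```python
# Explicit version of the decoding that "%" does behind the scenes
# when a unicode string and byte strings are mixed.
fmt = b"%s %s"                              # byte-string format
encoded_price = u"4 \xbd".encode("utf-8")   # the bytes "4 \xc2\xbd"

# Pure-ascii bytes decode without trouble.
decoded_fmt = fmt.decode("ascii")

# The utf-8 bytes of the one-half sign are outside ascii, so this raises.
error = None
try:
    encoded_price.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

print(repr(error))
```

The exception reports position 2, which is the first byte after "4 ", matching the traceback in the question.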

Solution: use unicode throughout and let the print statement do the
encoding.

>>> print u"%s %s" % (symbol, price)
ibm 4 ½
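If the bytes are needed explicitly (for a file or a socket, say), one possible variation is to format everything as unicode first and encode only once at the end. This is a sketch, not the only way to do it; line and encoded_line are illustrative names.

```python
symbol = u"ibm"
price = u"4 \xbd"

# Format while everything is still unicode; nothing has to be decoded.
line = u"%s %s" % (symbol, price)

# Encode once, at the point where bytes are actually required.
encoded_line = line.encode("utf-8")
```

Because no byte string takes part in the formatting, the ascii codec never gets involved.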

Sometimes, e.g. if you redirect stdout, the above can fail, because
sys.stdout.encoding is then None and print falls back to ascii. Here's a
workaround that uses utf8 in such cases.

import sys
if sys.stdout.encoding is None:
    # stdout has no known encoding (e.g. it is redirected): wrap it in a utf8 writer
    import codecs
    sys.stdout = codecs.lookup("utf8").streamwriter(sys.stdout)
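To see what such a stream writer does without touching sys.stdout, here is a small sketch that wraps an in-memory byte buffer instead. The io.BytesIO buffer is only a stand-in for a redirected stdout, and codecs.getwriter("utf8") is used as the shorter spelling of the same writer class.

```python
import codecs
import io

# In-memory byte stream standing in for a redirected stdout.
buf = io.BytesIO()

# getwriter returns the StreamWriter class for the codec;
# calling it wraps the underlying byte stream.
writer = codecs.getwriter("utf8")(buf)

# Unicode goes in, utf8-encoded bytes arrive in the buffer.
writer.write(u"ibm 4 \xbd\n")

print(repr(buf.getvalue()))
```

Everything written through the wrapper is encoded as utf8 before it reaches the underlying stream, so the ascii default is never consulted.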

Peter

John Machin · Jul 24, 2007

Because the first part is Unicode and the second part (after encoding
in utf8) is str.

Python is trying to convert the second part to Unicode, using the default
codec (ascii), which of course must fail on the non-ascii byte:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
2: ordinal not in range(128)

7stud · Jul 25, 2007

Thanks.
