printing unicode strings

7stud

Can anyone tell me why I can print out the individual variables in the
following code, but when I print them out combined into a single
string, I get an error?

symbol = u'ibm'
price = u'4 \xbd' # 4 1/2

print "%s" % symbol
print "%s" % price.encode("utf-8")
print "%s %s" % (symbol, price.encode("utf-8") )

--output:--
ibm
4 1/2
  File "pythontest.py", line 6, in ?
    print "%s %s" % (symbol, price.encode("utf-8") )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal not in range(128)
 
Peter Otten

In format % args, if the format string or any argument is a unicode string,
the result is unicode as well. Every byte string involved therefore has to be
decoded first, and Python uses the default ascii codec for that. In your example

print "%s %s" % (symbol, price.encode("utf-8") )

symbol is a unicode string, so Python tries to decode both "%s %s" and
"4 \xc2\xbd" (the result of price.encode("utf-8")). The latter contains
non-ascii bytes, so the decoding fails.
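
The implicit decoding step can be made explicit. The following is a minimal sketch, not what the interpreter literally runs: it performs by hand the ascii decode that Python 2 applies to the byte string, and it is written so that it also runs under Python 3. The names encoded_price and caught are chosen for illustration only.

```python
# The decode below is the step Python 2 performs implicitly when a byte
# string is mixed into a unicode format operation.
price = u'4 \xbd'                       # 4 1/2
encoded_price = price.encode("utf-8")   # the bytes '4 \xc2\xbd'

caught = None
try:
    encoded_price.decode("ascii")       # 0xc2 is outside the ascii range
except UnicodeDecodeError as exc:
    caught = exc                        # keep the exception for inspection

print(caught is not None)  # True
print(caught.start)        # 2, the same position the traceback reports
```

Position 2 is the first byte of the two-byte utf8 sequence for \xbd, which follows the "4" and the space.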

Solution: use unicode throughout and let the print statement do the
encoding:

print "%s %s" % (symbol, price)

--output:--
ibm 4 ½
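
A related sketch, assuming you need bytes for a file or socket rather than for the terminal: build the whole line as unicode first, then encode once at the end. The names line and data are illustrative, and the snippet is written to run under both Python 2 and Python 3.

```python
symbol = u'ibm'
price = u'4 \xbd'  # 4 1/2

# All operands are unicode, so no implicit ascii decoding takes place.
line = u"%s %s" % (symbol, price)

# Encode exactly once, at the point where bytes are needed.
data = line.encode("utf-8")
```

Keeping text as unicode internally and encoding only at the output boundary avoids mixing the two string types in one expression.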

Sometimes, e.g. if you redirect stdout, the above can fail, because
sys.stdout.encoding is then None and print falls back to ascii. Here's a
workaround that uses utf8 in such cases.

import sys
if sys.stdout.encoding is None:
    import codecs
    sys.stdout = codecs.lookup("utf8").streamwriter(sys.stdout)
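
To see what such a stream writer does without touching sys.stdout, the sketch below wraps an in-memory byte buffer instead. It uses codecs.getwriter, which returns the same StreamWriter class that codecs.lookup("utf8").streamwriter refers to. The buffer merely stands in for a redirected stdout and is an assumption of this example, as are the names writer and written.

```python
import codecs
import io

buffer = io.BytesIO()                       # stand-in for a byte-oriented stdout
writer = codecs.getwriter("utf8")(buffer)   # encodes unicode on every write

writer.write(u"ibm 4 \xbd\n")               # unicode goes in
written = buffer.getvalue()                 # utf8 bytes come out
```

The writer does the encoding itself, so the ascii fallback is never consulted.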

Peter
 
John Machin

The combined line fails because the first part is unicode and the second
part (after encoding in utf8) is a str.

Python tries to convert the second part to unicode using the default
codec (ascii), which of course must fail:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal not in range(128)
 
