Unicode problem with exec


Thomas Heller

I'm using code.InteractiveConsole, but it doesn't work correctly
with non-ASCII characters. I think it boils down to this problem:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 1, in ?
File "c:\python24\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)

Why does the exec call fail, and is there a workaround?

Thanks,
Thomas
 
Diez B. Roggisch

Thomas said:
I'm using code.InteractiveConsole, but it doesn't work correctly
with non-ASCII characters. I think it boils down to this problem:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 1, in ?
File "c:\python24\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in

Why does the exec call fail, and is there a workaround?

Most probably because you failed to encode the snippet as a whole, so the
embedded unicode literal isn't encoded properly.

As your exec-encoding seems to be cp850, maybe

exec u"print u'ä'".encode("cp850")

works.

Diez
 
John Machin

I'm using code.InteractiveConsole, but it doesn't work correctly
with non-ASCII characters. I think it boils down to this problem:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.

This is utterly useless for diagnostic purposes. What you see is NOT
what you've got. Use repr().

What you've got, as the error message says, is u'\x84' which is not
u"\N{LATIN SMALL LETTER A WITH DIAERESIS}", it is a control character.

See below.
ä
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 1, in ?
File "c:\python24\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in


Why does the exec call fail, and is there a workaround?

Executive summary:

The exec statement didn't fail; it was the print statement trying to
print, to your CP850 console, a Unicode character that doesn't exist in CP850.

This happened because you copied a character whose repr() is '\x84' from
your MS-DOS console and pasted it into 'u"<insert any old rubbish
here>"' :)

Details:

Windows XP, in a console screen:

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
|>> uc = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"
|>> uc
u'\xe4' <<== agrees with Unicode book
|>> encoded = uc.encode('cp850')
|>> encoded
'\x84' <<== agrees with
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP850.TXT
|>> print uc
ä <<== looks like LATIN SMALL LETTER A WITH DIAERESIS, as expected
|>> print encoded
ä <<== looks like LATIN SMALL LETTER A WITH DIAERESIS, as expected
|>> print u"\x84"
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "c:\python24\lib\encodings\cp850.py", line 18, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x84' in
position 0: character maps to <undefined>
<<== as expected

Looks like Python is working fine to me ...
So, what's happening? Look at this:

|>> char1 = u"ä" <<= corresponds to your "print"
|>> char2 = "ä" <<= corresponds to your exec -- which was given a STRING
constant, like this, not a Unicode constant.

Character in char1 was copied from DOS console.
Second line was obtained by DOS console editing of copy of first line.

|>> char1
u'\xe4'
|>> char2
'\x84' <<= Aha!

What you have done is effectively: exec 'print u"\x84"'
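
To make the mechanism concrete, here is a small sketch of the round trip. It is written so that it also runs on a current Python 3, which is an assumption on my part, since the thread itself is about Python 2.4. It further assumes, as the traceback suggests, that the non-ASCII byte inside the plain string was read as Latin-1 when the inner u"..." literal was compiled. The variable names are illustrative only.

```python
# Sketch: how an a-umlaut typed at a CP850 console can turn into U+0084.
a_umlaut = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"  # U+00E4

# The CP850 console hands Python the single byte 0x84 for this character.
console_bytes = a_umlaut.encode("cp850")

# Assumption: with no coding declaration, that byte is read as Latin-1,
# which maps byte 0x84 to the control character U+0084.
misread = console_bytes.decode("latin-1")

# U+0084 has no slot in CP850, so encoding it for the console fails.
try:
    misread.encode("cp850")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Decoding with the console's real encoding recovers the intended character.
recovered = console_bytes.decode("cp850")

print(repr(console_bytes), repr(misread), encode_failed, recovered == a_umlaut)
```

The point of the sketch is that the same byte means two different characters depending on which encoding is used to decode it, which is why repr() is the thing to look at.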

Workaround/kludge/bypass:

exec u'print u"ä"'
......^

Much better: embed non-ASCII characters in source code *ONLY* when you
have a proper coding header: http://www.python.org/dev/peps/pep-0263/
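
As a rough illustration of what a coding header buys you, the sketch below compiles a byte-string snippet that carries a PEP 263 declaration. It is written for a current Python 3, where compile() accepts bytes and honours the declaration; the snippet text and the names source and namespace are made up for the example.

```python
# Source bytes as a CP850 console would produce them: 0x84 is a-umlaut there.
source = b"# -*- coding: cp850 -*-\ns = u'\x84'\n"

# The coding line tells the compiler how to decode the bytes of the snippet.
code = compile(source, "<snippet>", "exec")
namespace = {}
exec(code, namespace)

# The literal now holds U+00E4 rather than the control character U+0084.
print(namespace["s"] == u"\xe4")
```

Without the first line of the snippet, the compiler has to guess how to decode byte 0x84, and that guess is where the trouble in this thread comes from.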

HTH,
John
 
