Py3.3 unicode literal and input()


Steven D'Aprano

Python 3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21) [MSC v.1600 32 bit (Intel)] on win32
running smidzero.py...
...smidzero has been executed

What is "smidzero.py", and what is it doing?
input(':')
:éléphant
'éléphant'

Why are your input lines preceded by three dashes?


input(':')
:u'\u00e9l\xe9phant'
'éléphant'

I cannot reproduce that behaviour. When I try it, I get the expected
result:
: u'\u00e9l\xe9phant'
"u'\\u00e9l\\xe9phant'"


I expect that the mysterious smidzero.py is monkey-patching the input
builtin to do something silly. If that is the case, you are making a rod
for your own back.
 

jmfauth

Mea culpa. My head was not on my shoulders.
Inputting is working fine; it returns "text" correctly.

However, and this is something different, I'm a little
bit surprised that input() does not handle escaped characters
(\u, \U).
Workaround: encode() and decode() as "raw-unicode-escape".

jmf
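
For readers following along, the workaround mentioned above can be sketched roughly as follows. This is only an illustration: the variable `typed` stands in for whatever input() returned, and its value is an assumed example, not something taken from the original session. Note that "raw-unicode-escape" only interprets \u and \U sequences, not \x.

```python
# Stand-in for the string input() would return if the user typed
# the fourteen characters: \u00e9l\u00e9phant  (assumed example value)
typed = "\\u00e9l\\u00e9phant"

# Encoding with raw-unicode-escape leaves the backslashes untouched;
# decoding with the same codec then interprets \uXXXX and \UXXXXXXXX.
decoded = typed.encode("raw-unicode-escape").decode("raw-unicode-escape")

print(decoded)
```

The round trip is needed because the codec decodes bytes, while input() returns a str.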
 

Chris Angelico

Mea culpa. My head was not on my shoulders.
Inputting is working fine; it returns "text" correctly.

However, and this is something different, I'm a little
bit surprised that input() does not handle escaped characters
(\u, \U).
Workaround: encode() and decode() as "raw-unicode-escape".

It's the exact same thing. They're a backslash followed by a letter U.
However, if your stdin is set to (say) UTF-8, then bytes that
represent non-ASCII characters will be correctly translated,
eliminating any need for code-style escapes.

Allowing your users to put escaped Unicode characters into their input
opens up a huge morass - do they have to double every backslash? what
if they actually wanted "\", "u", etc? etc? etc?

ChrisA
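
The ambiguity described above can be seen in a small sketch. The helper name `interpret_escapes` and both sample strings are hypothetical, chosen only to illustrate the point; the helper reuses the "raw-unicode-escape" round trip mentioned earlier in the thread.

```python
def interpret_escapes(text):
    """Hypothetical helper: interpret \\u and \\U escapes in user text."""
    return text.encode("raw-unicode-escape").decode("raw-unicode-escape")

# A user who really wanted a backslash followed by "u0041" loses it:
literal = "type \\u0041 for A"
converted = interpret_escapes(literal)
print(converted)

# A Windows-style path contains a backslash followed by "u", but "sers"
# is not four hex digits, so decoding fails outright.
path = "C:\\users\\jmf"
try:
    interpret_escapes(path)
    path_failed = False
except UnicodeDecodeError:
    path_failed = True
print(path_failed)
```

Either outcome would surprise a user who simply typed some text containing a backslash, which is presumably why input() leaves the characters alone.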
 

Steven D'Aprano

Mea culpa. My head was not on my shoulders. Inputting is working fine;
it returns "text" correctly.

However, and this is something different, I'm a little bit surprised
that input() does not handle escaped characters (\u, \U).

No, it is not different, it is exactly the same. input always returns the
exact characters you type.

If you type a b c space d, input returns "abc d".

If you type a b c backslash u, input returns "abc\u".

input does not treat the text as anything special. If you want to treat
the text "1234" as an int, you have to convert it into an int using the
int() function. If you want to treat the text as a list, you have to
parse it as a list. If you want to treat it as Python source code, you
have to parse it as Python source code. And if you want to treat it as a
Python string literal format using escape characters, you have to parse
it as Python string literal format.
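
As a rough sketch of the conversions listed above: the three `raw_*` strings below stand in for text returned by input() and are assumed example values. Using ast.literal_eval is one possible way to parse a list or a string literal, not the only one.

```python
import ast

# Stand-ins for text returned by input() (assumed example values).
raw_number = "1234"
raw_list = "[1, 2, 3]"
raw_literal = "u'\\u00e9l\\xe9phant'"  # the characters u'\u00e9l\xe9phant'

# Treat the text as an int.
number = int(raw_number)

# Treat the text as a list; literal_eval only accepts Python literals,
# so it does not run arbitrary code the way eval() would.
items = ast.literal_eval(raw_list)

# Treat the text as a Python string literal, escapes included.
word = ast.literal_eval(raw_literal)

print(number, items, word)
```

In each case the conversion is an explicit step applied to the string; input() itself never performs it.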
 
