Py3.3 unicode literal and input()

Discussion in 'Python' started by jmfauth, Jun 18, 2012.

  1. jmfauth

    jmfauth Guest

    What is input() supposed to return?

    >>> u'a' == 'a'
    True
    >>> r1 = input(':')
    :a
    >>> r2 = input(':')
    :u'a'
    >>> r1 == r2
    False
    >>> type(r1), len(r1)
    (<class 'str'>, 1)
    >>> type(r2), len(r2)
    (<class 'str'>, 4)


    ---

    sys.argv?

    jmf
     
    jmfauth, Jun 18, 2012
    #1

  2. jmfauth

    jmfauth Guest

    On 18 June, 10:28, Benjamin Kaplan wrote:
    > On Mon, Jun 18, 2012 at 1:19 AM, jmfauth wrote:
    > > What is input() supposed to return?
    > >
    > > >>> u'a' == 'a'
    > > True
    > > >>> r1 = input(':')
    > > :a
    > > >>> r2 = input(':')
    > > :u'a'
    > > >>> r1 == r2
    > > False
    > > >>> type(r1), len(r1)
    > > (<class 'str'>, 1)
    > > >>> type(r2), len(r2)
    > > (<class 'str'>, 4)
    > >
    > > ---
    > >
    > > sys.argv?
    > >
    > > jmf
    >
    > Python 3 made several backwards-incompatible changes over Python 2.
    > First of all, input() in Python 3 is equivalent to raw_input() in
    > Python 2. It always returns a string. If you want the equivalent of
    > Python 2's input(), eval the result. Second, Python 3 is now unicode
    > by default. The "str" class is a unicode string. There is a separate
    > bytes class, denoted by b"", for byte strings. The u prefix is only
    > there to make it easier to port a codebase from Python 2 to Python 3.
    > It doesn't actually do anything.



    It does. I shew it!

    Related:

    http://groups.google.com/group/comp.lang.python/browse_thread/thread/3aefd602507d2fbe#

    http://mail.python.org/pipermail/python-dev/2012-June/120341.html

    jmf
     
    jmfauth, Jun 18, 2012
    #2

  3. On Mon, 18 Jun 2012 01:19:32 -0700, jmfauth wrote:

    > What is input() supposed to return?


    Whatever you type.

    > >>> u'a' == 'a'
    > True

    This demonstrates that in Python 3.3, u'a' gives a string equal to 'a'.

    > >>> r1 = input(':')
    > :a

    Since you typed the letter a, r1 is the string "a" (a single character).

    > >>> r2 = input(':')
    > :u'a'

    Since you typed four characters, namely lowercase u, single quote,
    lowercase a, single quote, r2 is the string "u'a'" (four characters).

    > >>> r1 == r2
    > False
    > >>> type(r1), len(r1)
    > (<class 'str'>, 1)
    > >>> type(r2), len(r2)
    > (<class 'str'>, 4)


    If you call print(r1) and print(r2), that will show you what they hold.
    If in doubt, calling print(repr(r1)) will show extra information about
    the object.
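
A minimal sketch of the print()/repr() suggestion above, assuming Python 3.3 or later. The two values are written as source literals standing in for what input() returned in the original session; nothing here reads from the keyboard.

```python
# Stand-ins for the values input() returned in the session above.
r1 = "a"       # the user typed:  a
r2 = "u'a'"    # the user typed:  u'a'

print(r1)                # a
print(r2)                # u'a'
print(repr(r1))          # 'a'
print(repr(r2))          # "u'a'"
print(len(r1), len(r2))  # 1 4
```

repr() adds its own delimiters around the contents, which is why the quote characters the user typed show up inside the second result.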


    > sys.argv?


    What about it?

    >
    > jmf
     
    Steven D'Aprano, Jun 18, 2012
    #3
  4. On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:

    > On 18 June, 10:28, Benjamin Kaplan wrote:


    >> The u prefix is only there to
    >> make it easier to port a codebase from Python 2 to Python 3. It doesn't
    >> actually do anything.

    >
    >
    > It does. I shew it!


    Incorrect. You are assuming that Python 3 input eval's the input like
    Python 2 does. That is wrong. All you show is that the one-character
    string "a" is not equal to the four-character string "u'a'", which is
    hardly a surprise. You wouldn't expect the string "3" to equal the string
    "int('3')" would you?

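The comparison above can be checked directly. This is only an illustrative sketch: eval() is applied to a hard-coded string to imitate what Python 2's input() did, and should not be used on real user input.

```python
typed_plain = "3"         # the user typed:  3
typed_code = "int('3')"   # the user typed:  int('3')

# As strings they differ, just as "a" differs from "u'a'".
print(typed_plain == typed_code)          # False
print(len(typed_plain), len(typed_code))  # 1 8

# They only agree once something evaluates them as Python source.
print(int(typed_plain) == eval(typed_code))  # True
```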


    --
    Steven
     
    Steven D'Aprano, Jun 18, 2012
    #4
  5. jmfauth

    jmfauth Guest

    On 18 June, 12:11, Steven D'Aprano wrote:
    > On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
    > > On 18 June, 10:28, Benjamin Kaplan wrote:
    > >> The u prefix is only there to
    > >> make it easier to port a codebase from Python 2 to Python 3. It doesn't
    > >> actually do anything.

    >
    > > It does. I shew it!

    >
    > Incorrect. You are assuming that Python 3 input eval's the input like
    > Python 2 does. That is wrong. All you show is that the one-character
    > string "a" is not equal to the four-character string "u'a'", which is
    > hardly a surprise. You wouldn't expect the string "3" to equal the string
    > "int('3')" would you?
    >
    > --
    > Steven



    A string is a string, a "piece of text", period.

    I do not see why a unicode literal and a (well, I do not
    know what to call it) "normal class <str>" should behave
    differently in source code or as an answer to an input().

    Should a user write two derived functions?

    input_for_entering_text()
    and
    input_if_you_are_entering_a_text_as_litteral()

    ---

    Side effect from the unicode literal reintroduction.
    I do not mind about this, but I expect it to
    work logically and correctly. And it does not.

    PS English is not my native language. I never know
    how to reply to an (interro)-negative sentence.

    jmf
     
    jmfauth, Jun 18, 2012
    #5
  6. Dave Angel

    Dave Angel Guest

    On 06/18/2012 10:00 AM, jmfauth wrote:
    > <SNIP>


    > A string is a string, a "piece of text", period. I do not see why a
    > unicode literal and an (well, I do not know how the call it) a "normal
    > class <str>" should behave differently in code source or as an answer
    > to an input().


    Wrong. The rules for parsing source code are NOT applied in general to
    Python 3's input data, nor to file I/O done with methods like
    myfile.readline(). We do not expect the runtime code to look for def
    statements, nor for class statements, and not for literals. A literal
    is a portion of source code where there are specific rules applied,
    starting with the presence of some quote characters.

    This is true of nearly all languages, and in most languages, the
    difference is so obvious that the question seldom gets raised. For
    example, in C code a literal is evaluated at compile time, and by the
    time an end user sees an input prompt, he probably doesn't even have a
    compiler on the same machine.

    When an end user types in his data (into an input statement, typically),
    he does NOT use quote literals, he does not use hex escape codes, he
    does not escape things with backslash. If he wants an o with an umlaut
    on it, he'd better have such a character available on his keyboard.

    I'd suggest playing around a little with literal assignments and input
    statements and print functions. In those literals, try entering escape
    sequences (e.g. "ab\x41cd"). Run such programs from the command line,
    and observe the output from the prints. Do this without using the
    interactive interpreter, as by default it "helpfully" displays
    expressions with the repr() function, which confuses the issue.
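
A small script along the lines suggested above, meant to be run from the command line. It is a sketch only: the raw literal in the second half stands in for keyboard input so that the script needs no interaction.

```python
# In source code the compiler processes the escape sequence \x41.
literal = "ab\x41cd"
print(literal)       # abAcd
print(len(literal))  # 5

# Stand-in for what input() would return if the user typed the eight
# characters  ab\x41cd  at the prompt: no escape processing happens.
typed = r"ab\x41cd"
print(typed)         # ab\x41cd
print(len(typed))    # 8
```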


    > Should a user write two derived functions? input_for_entering_text()
    > and input_if_you_are_entering_a_text_as_litteral() --- Side effect
    > from the unicode litteral reintroduction. I do not mind about this,
    > but I expect it does work logically and correctly. And it does not. PS
    > English is not my native language. I never know to reply to an
    > (interro)-negative sentence. jmf


    The user doesn't write functions, the programmer does. Until you learn
    to distinguish between those two phases, you'll continue having this
    confusion.

    If you (the programmer) want a function that asks the user to enter a
    literal at the input prompt, you'll have to write a post-processing step
    for it, which looks for prefixes, for quotes, for backslashes, etc., and
    decodes the result. There very well may be such a decoder in the Python
    library, but input does nothing of the kind.
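
One candidate for such a decoder is the standard library's ast.literal_eval(), which parses a Python literal without running arbitrary code. The helper below is a hypothetical sketch, not something proposed in the thread: the name parse_typed_literal and the fall-back-to-raw-text policy are assumptions made for illustration.

```python
import ast

def parse_typed_literal(text):
    """Hypothetical helper: if `text` is a Python string literal, return
    the string it denotes; otherwise return `text` unchanged."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text          # not a literal at all: keep what was typed
    # Other literals (numbers, bytes, lists) are treated as plain text too.
    return value if isinstance(value, str) else text

print(parse_typed_literal("u'a'"))  # a
print(parse_typed_literal("a"))     # a
print(parse_typed_literal("u'a"))   # u'a   (unterminated, kept as typed)
```

A program would call it as parse_typed_literal(input(':')). Whether accepting literal syntax from end users is a good idea is a separate design question.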


    The literal modifiers (u"" or r"") are irrelevant here. The "problem"
    you're having is universal, and not new. The characters in source code
    have different semantic meanings than those entered in input, or read
    from file I/O.


    --

    DaveA
     
    Dave Angel, Jun 18, 2012
    #6
  7. On 18.06.2012 16:00, jmfauth wrote:
    > A string is a string, a "piece of text", period.


    No. There are different representations for the same piece of text even
    in the context of just Python. b'fou', u'fou', 'fou' are three different
    source code representations, resulting in two different runtime
    representations, and they all represent the same text: fou.
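
A short sketch of the three spellings, assuming Python 3.3 or later (the u prefix is a syntax error in 3.0 to 3.2). The variable names are only for illustration.

```python
as_bytes = b'fou'   # bytes literal
as_u = u'fou'       # str literal with the legacy prefix
as_plain = 'fou'    # str literal

# Two runtime types: bytes for the first, str for the other two.
print(type(as_bytes).__name__, type(as_u).__name__, type(as_plain).__name__)

# The two str spellings produce equal objects.
print(as_u == as_plain)                      # True

# The bytes object carries the same text once it is decoded.
print(as_bytes.decode('ascii') == as_plain)  # True
```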


    > I do not see why a unicode literal and an (well, I do not
    > know how the call it) a "normal class <str>" should behave
    > differently in code source or as an answer to an input().


    input() retrieves a string from a user, not from a programmer that can
    be expected to know the difference between b'\x81' and u'\u20ac'.


    > Should a user write two derived functions?
    >
    > input_for_entering_text()
    > and
    > input_if_you_are_entering_a_text_as_litteral()


    With "user" above, I guess you mean "Python programmer". In that case,
    the answer is yes. Although asking the user of your program to learn
    about Python's string literal formatting options is a bit much.


    > Side effect from the unicode litteral reintroduction.
    > I do not mind about this, but I expect it does
    > work logically and correctly. And it does not.


    Yes it does. The user enters something. Python receives this and
    provides it as string. You as a programmer are now supposed to
    interpret, parse etc this string according to your program logic.


    BTW: Just in case there is a language (native language, not programming
    language) problem, don't hesitate to write in your native language, too.
    Chances are good that someone here understands you.

    Good luck!

    Uli
     
    Ulrich Eckhardt, Jun 18, 2012
    #7
  8. jmfauth

    jmfauth Guest

    Things are very clear to me. I have written enough interactive
    interpreters with all available toolkits for Windows
    since I have known Python (v. 1.5.6).

    I do not see why the semantics should differ
    between source code and an interactive interpreter,
    esp. if Python allows it!

    If you have to know in advance what an end user
    is supposed to type and/or check it ('str' or unicode
    literal) in order to know if the answer has to be
    evaluated or not, then it is better to reintroduce
    input() and raw_input().

    jmf
     
    jmfauth, Jun 18, 2012
    #8
  9. On Tue, Jun 19, 2012 at 1:44 AM, jmfauth <> wrote:
    > I do not see why the semantic may vary differently
    > in code source or in an interactive interpreter,
    > esp. if Python allow it!


    When you're asking for input, you usually aren't looking for code. It
    doesn't matter about string literal formats, because you don't need to
    delimit it. In code, you need to make it clear to the interpreter
    where your string finishes, and that's traditionally done with quote
    characters:

    name = "Chris Angelico" # this isn't part of the string, because the
                            # two quotes mark off the ends of it

    And you can include characters in your literals that you don't want in
    your source code:

    bad_chars = "\x00\x1A\x0A" # three characters NUL, SUB, LF

    Everything about raw strings, Unicode literals, triple-quoted strings,
    etc, etc, etc, is just variants on these two basic concepts. The
    interpreter needs to know what you mean.

    With input, though, the end of the string is defined in some other way
    (such as by the user pushing Enter). The interpreter knows without any
    extra hints where it's to stop parsing. Also, there's no need to
    protect certain characters from getting into your code. It's a much
    easier job for the interpreter, which translates to being much simpler
    for the user: just type what you want and hit Enter. Quote characters
    have no meaning.

    Chris Angelico
     
    Chris Angelico, Jun 18, 2012
    #9
  10. jmfauth writes:

    > Thinks are very clear to me. I wrote enough interactive
    > interpreters with all available toolkits for Windows

    >>> r = input()
    u'a
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    SyntaxError: u'a
    Er, no, not really :)
     
    Jussi Piitulainen, Jun 18, 2012
    #10
  11. jmfauth

    jmfauth Guest

    We are turning in circles. You are somehow
    legitimizing the reintroduction of unicode
    literals and I showed, not to say proved, that it may
    be a source of problems.

    Typical Python disease. Introduce a problem,
    then discuss how to solve it, but surely and
    definitely do not remove that problem.

    As far as I know, Python 3.2 is working very
    well.

    jmf
     
    jmfauth, Jun 18, 2012
    #11
  12. Andrew Berg

    Andrew Berg Guest

    On 6/18/2012 11:32 AM, Jussi Piitulainen wrote:
    > jmfauth writes:
    >
    >> Thinks are very clear to me. I wrote enough interactive
    >> interpreters with all available toolkits for Windows

    >
    >>>> r = input()

    > u'a
    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > SyntaxError: u'a
    >
    > Er, no, not really :)
    >

    You're using 2.x; this thread concerns 3.3, which, as has been repeated
    several times, does not evaluate strings passed via input() like 2.x.
    That code does not raise a SyntaxError in 3.x.

    --
    CPython 3.3.0a4 | Windows NT 6.1.7601.17803
     
    Andrew Berg, Jun 18, 2012
    #12
  13. John Roth

    John Roth Guest

    On Monday, June 18, 2012 9:44:17 AM UTC-6, jmfauth wrote:
    > Thinks are very clear to me. I wrote enough interactive
    > interpreters with all available toolkits for Windows
    > since I know Python (v. 1.5.6).
    >
    > I do not see why the semantic may vary differently
    > in code source or in an interactive interpreter,
    > esp. if Python allow it!
    >
    > If you have to know by advance what an end user
    > is supposed to type and/or check it ('str' or unicode
    > literal) in order to know if the answer has to be
    > evaluated or not, then it is better to reintroduce
    > input() and raw_input().
    >


    The change between Python 2.x and 3.x was made for security reasons. The developers felt, correctly in my opinion, that the simpler operation should not pose a security risk of a malicious user entering an expression that would corrupt the program.

    In Python 3.x the equivalent of Python 2.x's input() function is eval(input()). It poses the same security risk: acting on unchecked user data.
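
The risk can be illustrated without reading real input. This is a sketch under stated assumptions: the string below is a harmless stand-in for something a hostile user might type, and is_literal is a hypothetical helper built on the standard library's ast.literal_eval().

```python
import ast

# Harmless stand-in for a hostile expression typed at the prompt.
typed = "__import__('os').getcwd()"

# eval() executes the call: this is what Python 2's input() amounted to.
result = eval(typed)
print(type(result).__name__)   # str: the function call really ran

def is_literal(text):
    """Hypothetical check: True if `text` is a plain Python literal."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return False
    return True

print(is_literal(typed))        # False: function calls are rejected
print(is_literal("[1, 2, 3]"))  # True
```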

    John Roth


    > jmf
     
    John Roth, Jun 18, 2012
    #13
  14. Dave Angel

    Dave Angel Guest

    On 06/18/2012 12:55 PM, Andrew Berg wrote:
    > On 6/18/2012 11:32 AM, Jussi Piitulainen wrote:
    >> jmfauth writes:
    >>
    >>> Thinks are very clear to me. I wrote enough interactive
    >>> interpreters with all available toolkits for Windows
    >>>>> r = input()

    >> u'a
    >> Traceback (most recent call last):
    >> File "<stdin>", line 1, in <module>
    >> SyntaxError: u'a
    >>
    >> Er, no, not really :)
    >>

    > You're using 2.x; this thread concerns 3.3, which, as has been repeated
    > several times, does not evaluate strings passed via input() like 2.x.
    > That code does not raise a SyntaxError in 3.x.
    >


    And you're missing the context. jmfauth thinks we should re-introduce
    the input/raw-input distinction so he could parse literal strings. So
    Jussi demonstrated that the 2.x input did NOT satisfy jmfauth's dreams.



    --

    DaveA
     
    Dave Angel, Jun 18, 2012
    #14
  15. Andrew Berg

    Andrew Berg Guest

    On 6/18/2012 12:03 PM, Dave Angel wrote:
    > And you're missing the context. jmfauth thinks we should re-introduce
    > the input/raw-input distinction so he could parse literal strings. So
    > Jussi demonstrated that the 2.x input did NOT satisfy jmfauth's dreams.


    You're right. I missed that part of jmfauth's post.
    --
    CPython 3.3.0a4 | Windows NT 6.1.7601.17803
     
    Andrew Berg, Jun 18, 2012
    #15
  16. Andrew Berg writes:
    > On 6/18/2012 11:32 AM, Jussi Piitulainen wrote:
    > > jmfauth writes:
    > >
    > >> Thinks are very clear to me. I wrote enough interactive
    > >> interpreters with all available toolkits for Windows

    > >
    > >>>> r = input()

    > > u'a
    > > Traceback (most recent call last):
    > > File "<stdin>", line 1, in <module>
    > > SyntaxError: u'a
    > >
    > > Er, no, not really :)
    > >

    > You're using 2.x; this thread concerns 3.3, which, as has been
    > repeated several times, does not evaluate strings passed via input()
    > like 2.x. That code does not raise a SyntaxError in 3.x.


    I used 3.1.2, and I really meant the "not really". And the ":)". I
    edited out the command that raised the exception.

    This thread is weird. If I didn't know that things are very clear to
    jmfauth, I would think that the behaviour of input() that I observe
    has absolutely nothing to do with the u'' syntax in source code.
     
    Jussi Piitulainen, Jun 18, 2012
    #16
  17. Terry Reedy

    Terry Reedy Guest

    On 6/18/2012 12:39 PM, jmfauth wrote:
    > We are turning in circles.


    You are, not we. Please stop.

    > You are somehow legitimating the reintroduction of unicode
    > literals


    We are not 'reintroducing' unicode literals. In Python 3, string
    literals *are* unicode literals.

    Other developers reintroduced a now meaningless 'u' prefix for the
    purpose of helping people write 2&3 code that runs on both Python 2 and
    Python 3. Read about it here http://python.org/dev/peps/pep-0414/

    In Python 3.3, 'u' should *only* be used for that purpose and should be
    ignored by anyone not writing or editing 2&3 code. If you are not
    writing such code, ignore it.
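
A quick check of the claim that the prefix changes nothing, assuming Python 3.3 or later. The sample text is an arbitrary choice for illustration.

```python
with_prefix = u'caf\xe9'
without_prefix = 'caf\xe9'

print(with_prefix == without_prefix)              # True
print(type(with_prefix) is type(without_prefix))  # True
print(type(with_prefix) is str)                   # True
```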

    > and I shew, not to say proofed, it may
    > be a source of problems.


    You are the one making it be a problem.

    > Typical Python desease. Introduce a problem,
    > then discuss how to solve it, but surely and
    > definitivly do not remove that problem.


    The simultaneous reintroduction of 'ur', but with a different meaning
    than in 2.7, *was* a problem and it should be removed in the next release.

    > As far as I know, Python 3.2 is working very
    > well.


    Except that many public libraries that we would like to see ported to
    Python 3 have not been. The purpose of reintroducing 'u' is to encourage
    more porting of Python 2 code. Period.

    --
    Terry Jan Reedy
     
    Terry Reedy, Jun 18, 2012
    #17
  18. jmfauth

    jmfauth Guest

    On Jun 18, 8:45 pm, Terry Reedy wrote:
    > On 6/18/2012 12:39 PM, jmfauth wrote:
    >
    > > We are turning in circles.
    >
    > You are, not we. Please stop.
    >
    > > You are somehow legitimating the reintroduction of unicode
    > > literals
    >
    > We are not 'reintroducing' unicode literals. In Python 3, string
    > literals *are* unicode literals.
    >
    > Other developers reintroduced a now meaningless 'u' prefix for the
    > purpose of helping people write 2&3 code that runs on both Python 2 and
    > Python 3. Read about it here http://python.org/dev/peps/pep-0414/
    >
    > In Python 3.3, 'u' should *only* be used for that purpose and should be
    > ignored by anyone not writing or editing 2&3 code. If you are not
    > writing such code, ignore it.
    >
    > > and I shew, not to say proofed, it may
    > > be a source of problems.
    >
    > You are the one making it be a problem.
    >
    > > Typical Python desease. Introduce a problem,
    > > then discuss how to solve it, but surely and
    > > definitivly do not remove that problem.
    >
    > The simultaneous reintroduction of 'ur', but with a different meaning
    > than in 2.7, *was* a problem and it should be removed in the next release.
    >
    > > As far as I know, Python 3.2 is working very
    > > well.
    >
    > Except that many public libraries that we would like to see ported to
    > Python 3 have not been. The purpose of reintroducing 'u' is to encourage
    > more porting of Python 2 code. Period.
    >
    > --
    > Terry Jan Reedy


    It's a matter of perspective. I expected to have
    finally a clean Python, the goal is missed.

    I have nothing to object. It is "your" (core devs)
    project, not mine. At least, you understood my point
    of view.

    I have been a TeX user for more than two decades. At the release
    of XeTeX (a pure unicode TeX engine), the devs had,
    like Python 2/3, to make things incompatible. A success.
    Not a week went by without seeing an updated
    package or refreshed documentation.

    Luckily for me, Xe(La)TeX is more important than
    Python.

    As a scientist, Python is perfect.
    From an educational point of view, I'm becoming
    more and more skeptical about this language, a
    moving target.

    Note that I'm not complaining, only "disappointed".

    jmf
     
    jmfauth, Jun 18, 2012
    #18
  19. On Mon, 18 Jun 2012 07:00:01 -0700, jmfauth wrote:

    > On 18 June, 12:11, Steven D'Aprano wrote:
    >> On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
    >> > On 18 June, 10:28, Benjamin Kaplan wrote:
    >> >> The u prefix is only there to
    >> >> make it easier to port a codebase from Python 2 to Python 3. It
    >> >> doesn't actually do anything.

    >>
    >> > It does. I shew it!

    >>
    >> Incorrect. You are assuming that Python 3 input eval's the input like
    >> Python 2 does. That is wrong. All you show is that the one-character
    >> string "a" is not equal to the four-character string "u'a'", which is
    >> hardly a surprise. You wouldn't expect the string "3" to equal the
    >> string "int('3')" would you?
    >>
    >> --
    >> Steven

    >
    >
    > A string is a string, a "piece of text", period.
    >
    > I do not see why a unicode literal and an (well, I do not know how the
    > call it) a "normal class <str>" should behave differently in code source
    > or as an answer to an input().


    They do not. As you showed earlier, in Python 3.3 the literal strings
    u'a' and 'a' have the same meaning: both create a one-character string
    containing the Unicode letter LOWERCASE-A.

    Note carefully that the quotation marks are not part of the string. They
    are delimiters. Python 3.3 allows you to create a string by using
    delimiters:

    ' '
    " "
    u' '
    u" "

    plus triple-quoted versions of the same. The delimiter is not part of the
    string. They are only there to mark the start and end of the string in
    source code so that Python can tell the difference between the string "a"
    and the variable named "a".

    Note carefully that quotation marks can exist inside strings:

    my_string = "This string has 'quotation marks'."

    The " at the start and end of the string literal are delimiters, not part
    of the string, but the internal ' characters *are* part of the string.

    When you read data from a file, or from the keyboard using input(),
    Python takes the data and returns a string. You don't need to enter
    delimiters, because there is no confusion between a string (all data you
    read) and other programming tokens.

    For example:

    py> s = input("Enter a string: ")
    Enter a string: 42
    py> print(s, type(s))
    42 <class 'str'>

    Because what I type is automatically a string, I don't need to enclose it
    in quotation marks to distinguish it from the integer 42.

    py> s = input("Enter a string: ")
    Enter a string: This string has 'quotation marks'.
    py> print(s, type(s))
    This string has 'quotation marks'. <class 'str'>


    What you type is exactly what you get, no more, no less.

    If you type 42, you get the two character string "42" and not the int 42.

    If you type [1, 2, 3], then you get the nine character string "[1, 2, 3]"
    and not a list containing integers 1, 2 and 3.

    If you type 3**0.5 then you get the six character string "3**0.5" and not
    the float 1.7320508075688772.

    If you type u'a' then you get the four character string "u'a'" and not
    the single character 'a'.

    There is nothing new going on here. The behaviour of input() in Python 3,
    and raw_input() in Python 2, has not changed.
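
The four cases above can be reproduced without a keyboard. This is a sketch, assuming CPython's behaviour of falling back to sys.stdin.readline() when sys.stdin is not a console; simulate_input is a hypothetical helper that swaps in an in-memory stream for the duration of one input() call.

```python
import io
import sys

def simulate_input(typed):
    """Return what input() yields when the user types `typed` and Enter."""
    saved = sys.stdin
    sys.stdin = io.StringIO(typed + "\n")   # stand-in for the keyboard
    try:
        return input()
    finally:
        sys.stdin = saved                   # always restore the real stdin

for typed in ["42", "[1, 2, 3]", "3**0.5", "u'a'"]:
    result = simulate_input(typed)
    # Every result is a str holding exactly the characters typed.
    print(repr(result), type(result).__name__, len(result))
```

The lengths printed are 2, 9, 6 and 4, matching the counts given above.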


    > Should a user write two derived functions?
    >
    > input_for_entering_text()
    > and
    > input_if_you_are_entering_a_text_as_litteral()


    If you, the programmer, want to force the user to write input in Python
    syntax, then yes, you have to write a function to do so. input() is very
    simple: it just reads strings exactly as typed. It is up to you to
    process those strings however you wish.
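
As one illustration of processing the string yourself, the sketch below tries progressively looser interpretations. The function name interpret and the int-then-float-then-text policy are assumptions chosen for the example, not something prescribed in the thread.

```python
def interpret(text):
    """Hypothetical policy: return an int if possible, else a float,
    else the text exactly as typed."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text

print(repr(interpret("42")))      # 42
print(repr(interpret("3.5")))     # 3.5
print(repr(interpret("3**0.5")))  # '3**0.5'  (expressions are not evaluated)
print(repr(interpret("u'a'")))    # "u'a'"    (quotes are just characters)
```

In a program this would wrap the call, for example interpret(input("Enter a value: ")).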



    --
    Steven
     
    Steven D'Aprano, Jun 20, 2012
    #19
  20. jmfauth

    jmfauth Guest

    On Jun 20, 1:21 am, Steven D'Aprano wrote:
    > On Mon, 18 Jun 2012 07:00:01 -0700, jmfauth wrote:
    > > On 18 June, 12:11, Steven D'Aprano wrote:
    > >> On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
    > >> > On 18 June, 10:28, Benjamin Kaplan wrote:
    > >> >> The u prefix is only there to
    > >> >> make it easier to port a codebase from Python 2 to Python 3. It
    > >> >> doesn't actually do anything.

    >
    > >> > It does. I shew it!

    >
    > >> Incorrect. You are assuming that Python 3 input eval's the input like
    > >> Python 2 does. That is wrong. All you show is that the one-character
    > >> string "a" is not equal to the four-character string "u'a'", which is
    > >> hardly a surprise. You wouldn't expect the string "3" to equal the
    > >> string "int('3')" would you?

    >
    > >> --
    > >> Steven

    >
    > > A string is a string, a "piece of text", period.

    >
    > > I do not see why a unicode literal and an (well, I do not know how the
    > > call it) a "normal class <str>" should behave differently in code source
    > > or as an answer to an input().

    >
    > They do not. As you showed earlier, in Python 3.3 the literal strings
    > u'a' and 'a' have the same meaning: both create a one-character string
    > containing the Unicode letter LOWERCASE-A.
    >
    > Note carefully that the quotation marks are not part of the string. They
    > are delimiters. Python 3.3 allows you to create a string by using
    > delimiters:
    >
    > ' '
    > " "
    > u' '
    > u" "
    >
    > plus triple-quoted versions of the same. The delimiter is not part of the
    > string. They are only there to mark the start and end of the string in
    > source code so that Python can tell the difference between the string "a"
    > and the variable named "a".
    >
    > Note carefully that quotation marks can exist inside strings:
    >
    > my_string = "This string has 'quotation marks'."
    >
    > The " at the start and end of the string literal are delimiters, not part
    > of the string, but the internal ' characters *are* part of the string.
    >
    > When you read data from a file, or from the keyboard using input(),
    > Python takes the data and returns a string. You don't need to enter
    > delimiters, because there is no confusion between a string (all data you
    > read) and other programming tokens.
    >
    > For example:
    >
    > py> s = input("Enter a string: ")
    > Enter a string: 42
    > py> print(s, type(s))
    > 42 <class 'str'>
    >
    > Because what I type is automatically a string, I don't need to enclose it
    > in quotation marks to distinguish it from the integer 42.
    >
    > py> s = input("Enter a string: ")
    > Enter a string: This string has 'quotation marks'.
    > py> print(s, type(s))
    > This string has 'quotation marks'. <class 'str'>
    >
    > What you type is exactly what you get, no more, no less.
    >
    > If you type 42, you get the two character string "42" and not the int 42.
    >
    > If you type [1, 2, 3], then you get the nine character string "[1, 2, 3]"
    > and not a list containing integers 1, 2 and 3.
    >
    > If you type 3**0.5 then you get the six character string "3**0.5" and not
    > the float 1.7320508075688772.
    >
    > If you type u'a' then you get the four character string "u'a'" and not
    > the single character 'a'.
    >
    > There is nothing new going on here. The behaviour of input() in Python 3,
    > and raw_input() in Python 2, has not changed.
    >
    > > Should a user write two derived functions?

    >
    > > input_for_entering_text()
    > > and
    > > input_if_you_are_entering_a_text_as_litteral()

    >
    > If you, the programmer, want to force the user to write input in Python
    > syntax, then yes, you have to write a function to do so. input() is very
    > simple: it just reads strings exactly as typed. It is up to you to
    > process those strings however you wish.
    >
    > --
    > Steven



    Python 3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21) [MSC v.1600 32 bit (Intel)] on win32
    >>> ---

    running smidzero.py...
    ....smidzero has been executed
    >>> ---

    input(':')
    :éléphant
    'éléphant'
    >>> ---

    input(':')
    :u'éléphant'
    'éléphant'
    >>> ---

    input(':')
    :u'\u00e9l\xe9phant'
    'éléphant'
    >>> ---

    input(':')
    :u'\U000000e9léphant'
    'éléphant'
    >>> ---

    input(':')
    :\U000000e9léphant
    'éléphant'
    >>> ---
    >>> ---

    # this is expected
    >>> ---

    input(':')
    :b'éléphant'
    "b'éléphant'"
    >>> ---

    len(input(':'))
    :b'éléphant'
    11

    ---

    Good news on the ru''/ur'' front:
    http://bugs.python.org/issue15096

    ---

    Finally I'm just wondering if this unicode_literal
    reintroduction is not a bad idea.

    b'these_are_bytes'
    u'this_is_a_unicode_string'

    I wrote all my Py2 code in a "unicode mode" since ... Py2.3 (?).

    jmf
     
    jmfauth, Jun 20, 2012
    #20