Python encoding question

Discussion in 'Python' started by Marc Muehlfeld, Feb 25, 2011.

  1. Hi,

    I'm taking my first steps with Python and I'm having trouble understanding
    an encoding problem. My script:

    import os
    os.environ["NLS_LANG"] = "German_Germany.UTF8"
    import cx_Oracle
    connection = cx_Oracle.Connection("username/password@SID")
    cursor = connection.cursor()
    cursor.execute("SELECT NAME1 FROM COR WHERE CORNB='ABCDEF'")
    TEST = cursor.fetchone()
    print TEST[0]
    print TEST


    When I run this script, it prints:
    München
    ('M\xc3\xbcnchen',)

    Why is the umlaut printed for TEST[0] but not for TEST?


    And why do both prints show the wrong encoding when I switch "fetchone()" to
    "fetchall()":
    ('M\xc3\xbcnchen',)
    [('M\xc3\xbcnchen',)]


    I'm running Python 2.4.3 on CentOS 5.


    Regards,
    Marc
    Marc Muehlfeld, Feb 25, 2011
    #1

  2. Marc Muehlfeld wrote:
    > <snip>

    Nothing related to encoding here. TEST[0] is a string, TEST is a tuple.

    >>> s1 = 'aline \n anotherline'
    >>> print str(s1)
    aline
     anotherline
    >>> print repr(s1)
    'aline \n anotherline'
    >>> atuple = (s1,)
    >>> print str(atuple)
    ('aline \n anotherline',)
    >>> print repr(atuple)
    ('aline \n anotherline',)

    Read http://docs.python.org/reference/datamodel.html regarding __repr__
    and __str__.

    Basically, __str__ and __repr__ are the same method for tuples, while they
    differ for strings.
    If you want a nice representation of tuple elements you have to do it
    yourself:

    print ', '.join([str(elem) for elem in atuple])

    More generally, only strings print nicely with newlines and UTF-8
    characters. Everything else (tuples, lists, objects without their own
    __str__) will use the __repr__ method, which displays the formal
    representation.
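
For the fetchall() case, the same join idea extends to a list of row tuples. A minimal sketch, assuming rows shaped like the ones in the question; format_rows is a hypothetical helper name, and the sample list only stands in for a real cursor result:

```python
def format_rows(rows, sep=', '):
    """Return one line of text per row, joining each column's str() form."""
    # str(elem) is applied to every element, so strings are shown as-is
    # rather than through the repr() that tuple and list would apply.
    return '\n'.join([sep.join([str(elem) for elem in row]) for row in rows])

# Stand-in for cursor.fetchall(); real rows would come from the database.
rows = [('Berlin', 1), ('Hamburg', 2)]
print(format_rows(rows))
```

Printing rows directly would show the brackets, parentheses and quotes of the list and tuple repr; the helper prints only the column values.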

    JM

    PS:

    >>> class Foo:
    ...     def __str__(self):
    ...         return 'I am a nice representation of a Foo instance'
    ...
    >>> print Foo()
    I am a nice representation of a Foo instance
    >>> print str(Foo())
    I am a nice representation of a Foo instance
    >>> print repr(Foo())
    <__main__.Foo instance at 0xb73a07ac>
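
Since containers call repr() on their elements, a class that should also look sensible inside a list or tuple needs its own __repr__. A small sketch extending the Foo example; the string returned by __repr__ is an arbitrary choice for illustration:

```python
class Foo(object):
    def __str__(self):
        return 'I am a nice representation of a Foo instance'

    def __repr__(self):
        # Used by repr(), and by list/tuple when they display their elements.
        return 'Foo()'

foo = Foo()
print(str(foo))    # uses __str__
print([foo])       # the list uses __repr__ for its element: [Foo()]
```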
    Jean-Michel Pichavant, Feb 25, 2011
    #2

  3. Marc Muehlfeld wrote:
    > Hi,
    >
    > <snip>
    > TEST = cursor.fetchone()
    > print TEST[0]
    > print TEST
    >
    >
    > When I run this script It prints me:
    > München
    > ('M\xc3\xbcnchen',)
    >
    > Why is the Umlaut of TEST[0] printed and not from TEST?
    >


    When you print a string, it simply prints it, control characters,
    international characters, and all.

    When you print a more complex object, it's up to that object to decide
    how to print. In the case of a tuple above, the tuple logic displays
    the parentheses and the comma, but calls the repr() of any objects it
    contains. Tuple doesn't make a special case for strings, or for
    numbers, it just always calls repr() (actually it's __repr__(), I think)

    A list does the same thing, though it'll use square brackets at the ends.

    So the question boils down to what repr() does. It attempts to create a
    representation that could be used to recreate the specific object. So if
    there's a newline, it uses \n. And if there are non-ASCII bytes, it
    uses hex escape sequences. And of course it adds the quote marks.
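
These rules can be checked directly. A small sketch, using a bytes literal as a stand-in for the UTF-8 string that cx_Oracle returned (in Python 2, b'...' is the same type as '...', and the b prefix needs 2.6 or later); the variable names are illustrative only:

```python
raw = b'M\xc3\xbcnchen'      # UTF-8 bytes for "München", as in the thread
row = (raw,)

# A tuple's str() and repr() are identical: both call repr() on each element.
assert str(row) == repr(row)

# repr() escapes the two non-ASCII bytes instead of emitting them raw.
assert '\\xc3\\xbc' in repr(row)

# Decoding gives the actual text: 7 characters, although raw holds 8 bytes.
text = raw.decode('utf-8')
assert len(raw) == 8 and len(text) == 7

print(repr(row))
```

On Python 2 the last line prints the tuple exactly as in the question; on Python 3 the element is shown with a b prefix, but the escaping is the same.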

    DaveA
    Dave Angel, Feb 25, 2011
    #3