etree, minidom unicode

Discussion in 'Python' started by n00b, Dec 5, 2008.

    hi,

    I have a few questions concerning unicode and UTF-8 handling and
    would appreciate any insights.

    1) I have a UTF-8 encoded XML document and have been trying to use etree
    to parse it and then commit the data to a MySQL db. Using etree, everything
    I extract is returned as a str, except text containing non-ASCII characters
    (code points > 127), which comes back as unicode. Using minidom on the same
    document, however, I get all unicode. Is there a way to 'force' etree to
    use unicode?
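
A minimal sketch of one way to even out etree's mixed return types, assuming the behaviour described above (a plain byte str for ASCII-only text, unicode otherwise). `to_text` is a hypothetical helper, not part of ElementTree, and the sample XML is made up for illustration. It is written so that it also runs on interpreters where etree already returns text everywhere, in which case the helper simply passes values through:

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET


def to_text(value, encoding="ascii"):
    """Return value as a text (unicode) string.

    Hypothetical helper: byte strings are decoded, text strings and
    None are returned unchanged.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        # ASCII-only byte strings decode safely to text.
        return value.decode(encoding)
    return value


# Made-up sample document, encoded as UTF-8 bytes.
xml_bytes = u"<author><first>Charlotte</first><last>Bront\xeb</last></author>".encode("utf-8")
root = ET.fromstring(xml_bytes)

first = to_text(root.findtext("first"))
last = to_text(root.findtext("last"))
```

With this, both `first` and `last` have the same text type regardless of whether the element content was pure ASCII.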

    2) I'm using MySQL 5.x on *nix (Mac, Linux) and, after much messing
    around, have things working, i.e. I have unicode from the (minidom)
    parser and have set all the MySQL and MySQLdb attributes, but I get
    <str> back from MySQL. Is that expected behavior?

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    from xml.dom import minidom
    import MySQLdb
    import codecs
    from onix_model_01 import *

    db = MySQLdb.connect(host='localhost', user='root', passwd='',
    db='lsi', charset='utf8')
    cur = db.cursor()
    #cur.execute('SET NAMES utf8')
    #cur.execute('SET CHARACTER SET utf8')
    cur.execute('SET character_set_connection=utf8')
    cur.execute('SET character_set_server=utf8')
    cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
    ....
    >>> print 'firstname, lastname types from xml: ', type(a.firstname), type(a.lastname)
    firstname, lastname types from xml: <type 'unicode'> <type 'unicode'>

    ....
    >>> cur.execute('''INSERT INTO encoding_test VALUES(null, %s, %s)''', (a.firstname, a.lastname))

    .... Now I'm getting the results back from MySQL:

    >>> cur.execute('SELECT * FROM encoding_test')
    >>> query = cur.fetchall()
    >>> for q in query:
    ...     print q, type(q[0]), type(q[1]), type(q[2])
    ...     print q[1], q[2]
    ...     print repr(q[1]), repr(q[2])
    ...
    (24L, 'Bront\xc3\xab', 'Charlotte ') <type 'long'> <type 'str'> <type 'str'>
    Brontë Charlotte
    'Bront\xc3\xab' 'Charlotte '


    So everything is coming back as it should, but I thought I would get
    the SQL results back as unicode, not str ... what gives?
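
For reference, the str values in the row printed above are UTF-8 encoded bytes, so they can be turned into text by decoding them by hand. The following is only a sketch: `decode_row` is a hypothetical helper, and the sample row is copied from the output above rather than fetched from a live database. MySQLdb's `connect()` also accepts a `use_unicode` keyword that is meant to do this conversion in the driver; whether it takes effect with a given version and these connection settings would need checking.

```python
# -*- coding: utf-8 -*-


def decode_row(row, encoding="utf-8"):
    """Decode every byte-string column of a result row to text.

    Hypothetical helper: non-string columns (e.g. the integer id)
    are passed through unchanged.
    """
    return tuple(
        col.decode(encoding) if isinstance(col, bytes) else col
        for col in row
    )


# Row shaped like the one printed above: UTF-8 bytes, not text.
raw_row = (24, b"Bront\xc3\xab", b"Charlotte ")
row = decode_row(raw_row)
```

After decoding, the surname is six characters long (the two bytes `\xc3\xab` become the single character `ë`), which is the difference that matters for slicing, length checks and comparisons.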

    Finally, from a UTF-8 perspective, is there any advantage to using
    InnoDB over MyISAM?

    thx