dealing with special characters in Python and MySQL

R

ronrsr

I have an MySQL database called zingers. The structure is:

zid - integer, key, autoincrement
keyword - varchar
citation - text
quotation - text

the encoding and collation is utf-8


I am having trouble storing text, as typed in last two fields. Special
characters and punctuation all seem not to be stored and retrieved
correctly.

Special apostrophes and single quotes from Microsoft Word are causing a
special problem, even though I have ''ed all 's

error messages, when trying to save to database:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position
71: ordinal not in range(128)
args = ('ascii', "update zingers set keywords = ' aaaaaa;Action',
....ation = '''''''''''''\n\n ' where zid = 422", 71, 72, 'ordinal not
in range(128)')
encoding = 'ascii'
end = 72
object = "update zingers set keywords = ' aaaaaa;Action',
....ation = '''''''''''''\n\n ' where zid = 422"
reason = 'ordinal not in range(128)'
start = 71


I think my problem may be that I need to encode the string before
saving it in the databse. Can anyone point me in the right direction
here?




Thanks so much,
 
D

Dennis Lee Bieber

I have an MySQL database called zingers. The structure is:

zid - integer, key, autoincrement
keyword - varchar
citation - text
quotation - text

the encoding and collation is utf-8
Why not dump the actual MySQL definition and include it...
error messages, when trying to save to database:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position
71: ordinal not in range(128)
args = ('ascii', "update zingers set keywords = ' aaaaaa;Action',
...ation = '''''''''''''\n\n ' where zid = 422", 71, 72, 'ordinal not
in range(128)')
encoding = 'ascii'
end = 72
object = "update zingers set keywords = ' aaaaaa;Action',
...ation = '''''''''''''\n\n ' where zid = 422"
reason = 'ordinal not in range(128)'
start = 71

And where is the actual code... What you've supplied doesn't show
how you are generating the SQL...
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
F

Fredrik Lundh

ronrsr said:
I have an MySQL database called zingers. The structure is:

zid - integer, key, autoincrement
keyword - varchar
citation - text
quotation - text

the encoding and collation is utf-8

I am having trouble storing text, as typed in last two fields. Special
characters and punctuation all seem not to be stored and retrieved
correctly.

are you passing in the strings as Unicode strings, or as something else?
if you're using something else, what have you done to tell the
database what it is?

</F>
 
R

ronrsr

structure for the DB:

CREATE TABLE `zingers` (
`zid` int(9) unsigned NOT NULL auto_increment,
`keywords` varchar(255) default NULL,
`citation` text,
`quotation` text,
PRIMARY KEY (`zid`)
) ENGINE=MyISAM DEFAULT CHARSET=UTF8 AUTO_INCREMENT=422 ;




code for storing to database:

querystring = "update zingers set keywords = '%s', citation =
'%s', quotation = %s' where zid = %d" %
(keywords,citation,quotation,zid)


;

cursor.execute(querystring)



at this point, querystring = update zingers set keywords = '
aaaaaa;Action', citation = '''''''''
', quotation = '''''''''''''

' where zid = 422

where '''' are an assortment of apostrophes and single quotes copied
and pasted from a Word Document.
 
R

ronrsr

are you passing in the strings as Unicode strings, or as something else?
if you're using something else, what have you done to tell the
database what it is?


not at all sure what I'm passing it as.

The database default encoding is utf-8
the database collation is utf-8
the page encoding of the page is utf-8

what else do I have to tell the database?
 
J

John Nagle

ronrsr said:
not at all sure what I'm passing it as.

The database default encoding is utf-8
the database collation is utf-8
the page encoding of the page is utf-8

what else do I have to tell the database?
Try putting "use_unicode=True" in the MySQLdb "connect" call.

See

http://sourceforge.net/tracker/index.php?func=detail&aid=1559350&group_id=22307&atid=374932

and look at other charset related bugs at

http://sourceforge.net/tracker/?atid=374932&group_id=22307&func=browse

Also note that MySQLdb didn't support this until recently, so check
your version of MySQLdb. There still seem to be problems in that area.

John Nagle
Animats
 
F

fumanchu

ronrsr said:
code for storing to database:

querystring = "update zingers set keywords = '%s', citation =
'%s', quotation = %s' where zid = %d" %
(keywords,citation,quotation,zid)

You're missing a single quote in there around the quotation %s.

Are you also replacing "\\" with r"\\" ? You should be.


Robert Brewer
System Architect
Amor Ministries
(e-mail address removed)
 
D

Dennis Lee Bieber

querystring = "update zingers set keywords = '%s', citation =
'%s', quotation = %s' where zid = %d" %
(keywords,citation,quotation,zid)


;

cursor.execute(querystring)
DON'T DO THAT!

querystring = """update zingers
set
keywords = %s,
citation = %s,
quotation = %s
where zid = %s"""

(note: I like triple quoting to make a readable statement)

cursor.execute(querystring,
(keywords, citation,
quotation, zid) )

Let the DB-API adapter do as much of the conversion as it can (it
will also properly delimit fields). With luck you won't even have to do
an encode/decode operation on any of the arguments first.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
R

ronrsr

Try putting "use_unicode=True" in the MySQLdb "connect" call.

tried that, and also added charset="utf8" -

now, I can't do any string operations, I get the error msg:

descriptor 'lower' requires a 'str' object but received a 'unicode'
args = ("descriptor 'lower' requires a 'str' object but received
a 'unicode'",)


or similar, on every string operation.

many thanks,

-0rsr-
 
F

Fredrik Lundh

ronrsr said:
now, I can't do any string operations, I get the error msg:

descriptor 'lower' requires a 'str' object but received a 'unicode'
args = ("descriptor 'lower' requires a 'str' object but received
a 'unicode'",)

what's a "string operation" in this context? are you trying to call
string methods by explicit calls to class methods in the str class:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'upper' requires a 'str' object but received a
'unicode'

instead of calling methods on the string object:
u'FOO'

? if so, what did you get that idea from?

</F>
 
L

Leo Kislov

ronrsr said:
tried that, and also added charset="utf8" -

now, I can't do any string operations, I get the error msg:

descriptor 'lower' requires a 'str' object but received a 'unicode'
args = ("descriptor 'lower' requires a 'str' object but received
a 'unicode'",)


or similar, on every string operation.

What is string operation? Every time you say "I get error" please
provide source code where this error occurs. And by the way, do you
know that for non-ascii characters you should use unicode type, not str
type?

-- Leo
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top