MySQL: 'latin-1' codec can't encode character


francescomoi

Hi.

I'm trying to store some text in a MySQL field (MySQL 3.23.58) using
MySQLdb 1.2.1c3.

The text is: "telephone…" (note the last character)

And I get this error message:
-----------
File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
raise errorclass, errorvalue
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 288: ordinal not in range(256)
-----------------------

Position 288 is the character I mentioned. I suppose I must encode
this character into one that MySQL can store, but I have no idea how
to do that. Any suggestions?

Thank you very much.
 

Fredrik Lundh

The character \u2026 is not part of the ISO-8859-1 character set. If you
insist on storing it in an 8-bit string, you have to pick an 8-bit encoding
that includes that character (UTF-8 is one such alternative).
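A minimal sketch of both points, written for Python 3 rather than the Python 2.3 used in this thread (so plain string literals are already Unicode). It reproduces the error without MySQL, shows how the reported position maps to the offending character, and checks that UTF-8 can represent it. The sample string is the one from the original post; the variable names are illustrative.

```python
text = "telephone\u2026"  # ends with U+2026 HORIZONTAL ELLIPSIS

# Encoding to ISO-8859-1 ("latin-1" is an alias for the same codec) fails.
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    error_position = exc.start                # index of the first bad character
    bad_char = exc.object[exc.start:exc.end]  # the slice the codec rejected
    print("position", error_position, "->", "U+%04X" % ord(bad_char))

# UTF-8 covers all of Unicode; U+2026 becomes three bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'telephone\xe2\x80\xa6'
```

In the original post the character sat at index 288 of the query string, which is the same number the exception reports.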

If MySQL is set to store ISO-8859-1 only, you can replace the character
with three periods, drop it (use the "ignore" encoding option), or
replace it with a suitable marker (use the "replace" encoding option).
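The three ISO-8859-1-only options, sketched in Python 3 on the same sample string. "ignore" and "replace" are standard error-handler names accepted by encode(); the str.replace call is just one simple way to do the three-periods substitution, assumed here for illustration.

```python
text = "telephone\u2026"

# 1. Substitute three periods before encoding.
periods = text.replace("\u2026", "...").encode("iso-8859-1")
# 2. Drop characters the codec cannot represent.
dropped = text.encode("iso-8859-1", "ignore")
# 3. Put a marker in their place ("?" when encoding).
marked = text.encode("iso-8859-1", "replace")

print(periods, dropped, marked)
# b'telephone...' b'telephone' b'telephone?'
```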

</F>
 

francescomoi

Hi Fredrik.

Thank you very much for your quick answer.

Would you suggest replacing it with a regexp, or must I encode the whole
text into a suitable encoding?

Regards.
 

Fredrik Lundh

A simple solution would be to manually create a table of problematic
Unicode characters, use the translate method on the unicode string,
and then encode using the "replace" option (which still marks any
character the table doesn't cover).

charmap = {
    0x2026: u"...",
    # ...
}

text = u'telephone\u2026'

text = text.translate(charmap)
text = text.encode("iso-8859-1", "replace")

print text

http://docs.python.org/lib/string-methods.html
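For readers on Python 3, the same translate-then-encode approach still works: str.translate accepts a dict mapping code points to replacement strings. The extra entries for curly quotes and the helper name to_latin1 are illustrative assumptions; fill the table with whatever characters actually show up in your data.

```python
# Hypothetical table: code point -> ISO-8859-1-safe replacement.
charmap = {
    0x2026: "...",  # HORIZONTAL ELLIPSIS
    0x2018: "'",    # LEFT SINGLE QUOTATION MARK
    0x2019: "'",    # RIGHT SINGLE QUOTATION MARK
    0x201C: '"',    # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',    # RIGHT DOUBLE QUOTATION MARK
}

def to_latin1(text):
    """Map known characters, then let "replace" mark anything left over."""
    return text.translate(charmap).encode("iso-8859-1", "replace")

print(to_latin1("telephone\u2026"))  # b'telephone...'
print(to_latin1("\u201cit\u2019s\u201d \u20ac5"))  # euro sign is not in the table
```

The euro sign is not in ISO-8859-1 and not in the table, so it comes out as "?" in the second call.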

If you want more control over the replacement, you can skip the translate
step and use your own error handler, e.g.

# charmap: the same table as above

def fixunicode(info):
    s = info.object[info.start:info.end]
    try:
        return charmap[ord(s)], info.end
    except KeyError:
        # fallback
        return u"<U+%04x>" % ord(s), info.end

import codecs
codecs.register_error("fixunicode", fixunicode)

text = u'telephone\u2026'

text = text.encode("iso-8859-1", "fixunicode")

hope this helps!

</F>
 

Walter Dörwald

Fredrik said:
[...]
if you want more control of the replacement, you can skip the translate
step and use your own error handler, e.g.

charmap = ... see above ...

def fixunicode(info):
    s = info.object[info.start:info.end]
    try:
        return charmap[ord(s)], info.end

This will fail if there's more than one consecutive unencodable
character; it's better to use

    return charmap[ord(s[0])], info.start+1

or

    return "".join(charmap.get(ord(c), u"<U+%04x>" % ord(c)) for c in s), info.end

(without the try:) instead.
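Walter's second variant folded into Fredrik's handler, as a self-contained Python 3 sketch. The handler receives the UnicodeEncodeError itself; exc.start:exc.end may span several characters, so every character in the slice is mapped and encoding resumes at exc.end. The table contents are illustrative assumptions.

```python
import codecs

# Illustrative table; the "<U+xxxx>" fallback covers anything not listed.
charmap = {
    0x2026: "...",  # HORIZONTAL ELLIPSIS
    0x2019: "'",    # RIGHT SINGLE QUOTATION MARK
}

def fixunicode(exc):
    """Encode error handler: replace each unencodable character in the slice."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(charmap.get(ord(c), "<U+%04x>" % ord(c)) for c in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end

codecs.register_error("fixunicode", fixunicode)

# Three unencodable characters in a row, one of them missing from charmap.
result = "telephone\u2026\u2019\u20ac".encode("iso-8859-1", "fixunicode")
print(result)  # b"telephone...'<U+20ac>"
```

Mapping the whole slice (rather than returning info.start+1 and getting called once per character) keeps the handler to a single call per run of bad characters; both forms produce the same output.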

Bye,
Walter Dörwald
 
