MySQL: 'latin-1' codec can't encode character

Discussion in 'Python' started by francescomoi@europe.com, May 13, 2005.

  1. Guest

    Hi.

    I'm trying to store some text in a MySQL field (v 3.23.58) using
    MySQLdb (v 1.2.1c3).

    The text is: "telephone…" (note the last character)

    And I get this error message:
    -----------
    File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
        raise errorclass, errorvalue
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 288: ordinal not in range(256)
    -----------------------

    Position 288 is the character I've mentioned. I suppose I must encode
    this character into one that MySQL can store, but I have no idea how
    to do that. Any suggestions?

    Thank you very much.
     
    , May 13, 2005
    #1

  2. francescomoi@europe.com wrote:

    > I'm trying to store some text in a MySQL field (v 3.23.58) using
    > MySQLdb (v 1.2.1c3).
    >
    > The text is: "telephone…" (note the last character)
    >
    > And I get this error message:
    > -----------
    > File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    >     raise errorclass, errorvalue
    > UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 288: ordinal not in range(256)
    > -----------------------
    >
    > Position 288 is the character I've mentioned. I suppose I must encode
    > this character into one that MySQL can store, but I have no idea how
    > to do that. Any suggestions?


    the character \u2026 (horizontal ellipsis) is not part of the ISO-8859-1
    character set. if you insist on storing it in an 8-bit string, you have to
    find an 8-bit encoding that includes that character (UTF-8 is one such
    alternative).

    if MySQL is set to store ISO-8859-1 only, you can replace the character
    with three periods, drop it (use the "ignore" encoding option), or
    replace it with a suitable marker (use the "replace" encoding option).
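
    The options listed above can be compared side by side. The following is
    only a minimal sketch, and it assumes Python 3 string semantics (the
    thread itself uses Python 2.3, where the literals would need a u prefix).
    The variable names are made up for illustration.

```python
text = "telephone\u2026"

# UTF-8 can represent U+2026; it takes three bytes.
utf8_bytes = text.encode("utf-8")

# Spell the ellipsis out as three periods before encoding to Latin-1.
dotted = text.replace("\u2026", "...").encode("iso-8859-1")

# "ignore" silently drops anything Latin-1 cannot represent.
dropped = text.encode("iso-8859-1", "ignore")

# "replace" substitutes a question mark for each such character.
marked = text.encode("iso-8859-1", "replace")

for label, value in [("utf-8", utf8_bytes), ("dotted", dotted),
                     ("ignore", dropped), ("replace", marked)]:
    print(label, value)
```

    Which variant is appropriate depends on whether losing the character
    is acceptable for the stored text.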

    </F>
     
    Fredrik Lundh, May 13, 2005
    #2

  3. Guest

    Hi Fredrik.

    Thank you very much for your quick answer.

    Do you suggest replacing it using a regexp, or must I encode the whole
    text into a suitable encoding?

    Regards.

    Fredrik Lundh wrote:
    > ""
    >
    > > I'm trying to store some text in a MySQL field (v 3.23.58) using
    > > MySQLdb (v 1.2.1c3).
    > >
    > > The text is: "telephone…" (note the last character)
    > >
    > > And I get this error message:
    > > -----------
    > > File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    > >     raise errorclass, errorvalue
    > > UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 288: ordinal not in range(256)
    > > -----------------------
    > >
    > > Position 288 is the character I've mentioned. I suppose I must encode
    > > this character into one that MySQL can store, but I have no idea how
    > > to do that. Any suggestions?
    >
    > the character \u2026 is not part of the ISO-8859-1 character set. if you
    > insist on storing it in an 8-bit string, you have to find an 8-bit encoding
    > that includes that character (UTF-8 is one such alternative).
    >
    > if MySQL is set to store ISO-8859-1 only, you can replace the character
    > with three periods, drop it (use the "ignore" encoding option), or
    > replace it with a suitable marker (use the "replace" encoding option).
    >
    > </F>
     
    , May 13, 2005
    #3
  4. francescomoi@europe.com wrote:

    > > > File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    > > >     raise errorclass, errorvalue
    > > > UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 288: ordinal not in range(256)

    > Thank you very much for your quick answer.
    >
    > Do you suggest replacing it using a regexp, or must I encode the whole
    > text into a suitable encoding?


    a simple solution would be to manually create a table of problematic
    unicode characters, use the translate method on the unicode string,
    and then encode using the "replace" option.

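    charmap = {
        0x2026: u"...",
        # ...
    }

    text = u'telephone\u2026'

    # map the known characters first; "replace" then catches anything
    # that is still outside ISO-8859-1
    text = text.translate(charmap)
    text = text.encode("iso-8859-1", "replace")

    print text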

    http://docs.python.org/lib/string-methods.html
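
    For readers on Python 3, the translate-then-encode approach described
    above might look roughly like this. It is a sketch under assumptions:
    the helper name to_latin1 and the en dash entry in the table are
    hypothetical additions, not part of the original post.

```python
# Hypothetical table mapping code points to Latin-1-safe replacements.
charmap = {
    0x2026: "...",   # horizontal ellipsis
    0x2013: "-",     # en dash (illustrative extra entry)
}

def to_latin1(text):
    """Translate known characters, then encode with 'replace' as a safety net."""
    return text.translate(charmap).encode("iso-8859-1", "replace")

# The euro sign is not in the table, so "replace" turns it into "?".
result = to_latin1("caf\u00e9 \u2013 telephone\u2026 \u20ac")
print(result)
```

    Characters that Latin-1 already covers, such as the accented e here,
    pass through unchanged.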

    if you want more control of the replacement, you can skip the translate
    step and use your own error handler, e.g.

    charmap = ... see above ...

    def fixunicode(info):
        s = info.object[info.start:info.end]
        try:
            return charmap[ord(s)], info.end
        except KeyError:
            # fallback
            return u"<U+%04x>" % ord(s), info.end

    import codecs
    codecs.register_error("fixunicode", fixunicode)

    text = u'telephone\u2026'

    text = text.encode("iso-8859-1", "fixunicode")

    hope this helps!

    </F>
     
    Fredrik Lundh, May 13, 2005
    #4
  5. Fredrik Lundh wrote:

    > [...]
    > if you want more control of the replacement, you can skip the translate
    > step and use your own error handler, e.g.
    >
    > charmap = ... see above ...
    >
    > def fixunicode(info):
    >     s = info.object[info.start:info.end]
    >     try:
    >         return charmap[ord(s)], info.end


    This will fail if there's more than one consecutive unencodable
    character, because ord() only accepts a single character. Better use

        return charmap[ord(s[0])], info.start+1

    or

        return "".join(charmap.get(ord(c), u"<U+%04x>" % ord(c)) for c in s), info.end

    (without the try:) instead.
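
    Putting the second variant together, a complete handler could look
    something like the sketch below. It assumes Python 3 and reuses the
    names from the thread; the CHARMAP contents and the sample string are
    illustrative only.

```python
import codecs

# Hypothetical table of characters with a preferred Latin-1 spelling.
CHARMAP = {0x2026: "..."}

def fixunicode(error):
    """Replace every character in the unencodable run, one at a time."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    run = error.object[error.start:error.end]
    replacement = "".join(
        CHARMAP.get(ord(ch), "<U+%04x>" % ord(ch)) for ch in run
    )
    # Resume encoding right after the run that was just handled.
    return replacement, error.end

codecs.register_error("fixunicode", fixunicode)

# Two consecutive ellipses, then a euro sign that is not in the table.
encoded = "telephone\u2026\u2026 \u20ac5".encode("iso-8859-1", "fixunicode")
print(encoded)
```

    Because the replacement is built per character, it does not matter
    whether the codec reports the unencodable characters one by one or as
    a single run.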

    Bye,
    Walter Dörwald
     
    Walter Dörwald, May 13, 2005
    #5
