sqlite utf8 encoding error

Discussion in 'Python' started by Greg Miller, Nov 17, 2005.

  1. Greg Miller

    I have an application that uses sqlite3 to store job/error data. When
    I log in as a German user the error codes generated are translated into
    German. The error code text is then stored in the db. When I use the
    fetchall() to retrieve the data to generate a report I get the
    following error:

    Traceback (most recent call last):
      File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in OnGenerateButtonNow
        self.OnGenerateButton(event)
      File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in OnGenerateButton
        warningresult = messagecursor1.fetchall()
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    unsupported Unicode code range

    Does anyone have any idea what could be going wrong? The string
    that I store in the database table is:

    'Keinen Text für Übereinstimmungsfehler gefunden'

    I thought that all strings were stored in unicode in sqlite.

    Greg Miller
     
    Greg Miller, Nov 17, 2005
    #1

  2. Greg Miller wrote:

    > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    > unsupported Unicode code range
    >
    > Does anyone have any idea what could be going wrong? The string
    > that I store in the database table is:
    >
    > 'Keinen Text für Übereinstimmungsfehler gefunden'


    $ more test.py
    # -*- coding: iso-8859-1 -*-
    u = u'Keinen Text für Übereinstimmungsfehler gefunden'
    s = u.encode("iso-8859-1")
    u = s.decode("utf-8") # <-- this gives an error

    $ python test.py
    Traceback (most recent call last):
      File "test.py", line 4, in ?
        u = s.decode("utf-8") # <-- this gives an error
      File "lib/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    unsupported Unicode code range
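For readers on Python 3, where `str` is always Unicode and encoded text is `bytes`, the same experiment can be sketched as below. The helper `first_bad_utf8_offset` is a hypothetical name used only for this illustration. Note that Python 3 reports the 0xFC byte as an invalid start byte at offset 13, rather than the 2005-era "unsupported Unicode code range" message.

```python
# Python 3 sketch of the Latin-1 vs UTF-8 mix-up shown above.
text = "Keinen Text für Übereinstimmungsfehler gefunden"

latin1_bytes = text.encode("iso-8859-1")  # 'ü' becomes the single byte 0xFC
utf8_bytes = text.encode("utf-8")         # 'ü' becomes the two bytes 0xC3 0xBC


def first_bad_utf8_offset(data: bytes):
    """Hypothetical helper: offset of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_bad_utf8_offset(latin1_bytes))  # 13, the 'ü' in "für"
print(first_bad_utf8_offset(utf8_bytes))    # None

# Decoding with the codec that was actually used gets the text back.
assert latin1_bytes.decode("iso-8859-1") == text
assert utf8_bytes.decode("utf-8") == text
```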

    > I thought that all strings were stored in unicode in sqlite.


    Did you pass in a Unicode string or an 8-bit string when you stored the text?

    </F>
     
    Fredrik Lundh, Nov 17, 2005
    #2

  3. Greg Miller enlightened us with:
    > 'Keinen Text für Übereinstimmungsfehler gefunden'


    You posted it as "Keinen Text f<FC>r ...", which is Latin-1, not
    UTF-8.

    > I thought that all strings were stored in unicode in sqlite.


    Only if you put them into the DB as such. Make sure you're inserting
    UTF-8 text, since the DB won't do character conversion for you.
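A small Python 3 sketch of that point, using the standard `sqlite3` module and an in-memory database (the table name `messages` is made up for the example). A `str` parameter is stored as UTF-8, while pre-encoded Latin-1 bytes forced into a TEXT value with `CAST` are stored byte for byte. SQLite labels both values as text without checking them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (text TEXT)")

# A Python 3 str is Unicode; sqlite3 hands it to SQLite encoded as UTF-8.
con.execute("INSERT INTO messages VALUES (?)", ("für",))

# Pre-encoded Latin-1 bytes relabelled as TEXT: SQLite does no conversion.
con.execute("INSERT INTO messages VALUES (CAST(? AS TEXT))",
            ("für".encode("iso-8859-1"),))

rows = con.execute(
    "SELECT typeof(text), hex(text) FROM messages ORDER BY rowid").fetchall()
for row in rows:
    print(row)
# ('text', '66C3BC72')   'ü' stored as the UTF-8 pair C3 BC
# ('text', '66FC72')     'ü' stored as the single Latin-1 byte FC
```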

    Sybren
    --
    The problem with the world is stupidity. Not saying there should be a
    capital punishment for stupidity, but why don't we just take the
    safety labels off of everything and let the problem solve itself?
    Frank Zappa
     
    Sybren Stuvel, Nov 17, 2005
    #3
  4. Jarek Zgoda

    Fredrik Lundh wrote:

    >>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    >>unsupported Unicode code range
    >>
    >>Does anyone have any idea what could be going wrong? The string
    >>that I store in the database table is:
    >>
    >>'Keinen Text für Übereinstimmungsfehler gefunden'

    >
    > $ more test.py
    > # -*- coding: iso-8859-1 -*-
    > u = u'Keinen Text für Übereinstimmungsfehler gefunden'
    > s = u.encode("iso-8859-1")
    > u = s.decode("utf-8") # <-- this gives an error
    >
    > $ python test.py
    > Traceback (most recent call last):
    > File "test.py", line 4, in ?
    > u = s.decode("utf-8") # <-- this gives an error
    > File "lib/encodings/utf_8.py", line 16, in decode
    > return codecs.utf_8_decode(input, errors, True)
    > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    > unsupported Unicode code range


    I can't wait for the moment when encoded strings go away from Python.
    The more I program in this language, the more confusion this distinction
    causes. Most functions and methods of various objects now accept both
    str and unicode, making it hard to find the sources of Unicode*Errors.

    --
    Jarek Zgoda
    http://jpa.berlios.de/
     
    Jarek Zgoda, Nov 17, 2005
    #4
  5. Serge Orlov

    Jarek Zgoda wrote:
    > Fredrik Lundh wrote:
    >
    > >>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    > >>unsupported Unicode code range
    > >>
    > >>Does anyone have any idea what could be going wrong? The string
    > >>that I store in the database table is:
    > >>
    > >>'Keinen Text für Übereinstimmungsfehler gefunden'

    > >
    > > $ more test.py
    > > # -*- coding: iso-8859-1 -*-
    > > u = u'Keinen Text für Übereinstimmungsfehler gefunden'
    > > s = u.encode("iso-8859-1")
    > > u = s.decode("utf-8") # <-- this gives an error
    > >
    > > $ python test.py
    > > Traceback (most recent call last):
    > > File "test.py", line 4, in ?
    > > u = s.decode("utf-8") # <-- this gives an error
    > > File "lib/encodings/utf_8.py", line 16, in decode
    > > return codecs.utf_8_decode(input, errors, True)
    > > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    > > unsupported Unicode code range

    >
    > I can't wait for the moment when encoded strings go away from Python.
    > The more I program in this language, the more confusion this distinction
    > causes. Most functions and methods of various objects now accept both
    > str and unicode, making it hard to find the sources of Unicode*Errors.


    Library writers can speed up the transition by hiding the 8-bit
    interface, for example:

    import sqlite
    sqlite.I_promise_to_pass_8bit_string_only_in_utf8_encoding(my_signature="sig.gif")

    If you don't call this function, 8-bit strings will not be accepted :)
    IMHO, if libraries keep on accepting both str and unicode until Python
    3.0, it will just prolong the confusion of Unicode newbies instead of
    guiding them in the right direction _right now_.
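Serge's proposal can be sketched at the application level as a thin wrapper in front of Python 3's `sqlite3`. This is a hypothetical helper (`execute_text_only` and its `allow_bytes` flag are invented for this illustration), not an API of pysqlite or the standard library: it rejects bytes parameters unless the caller explicitly opts in.

```python
import sqlite3


def execute_text_only(con, sql, params=(), allow_bytes=False):
    """Hypothetical wrapper: refuse 8-bit (bytes) parameters unless the
    caller explicitly opts in, in the spirit of the proposal above."""
    # Named parameters arrive as a dict; positional ones as a sequence.
    values = params.values() if isinstance(params, dict) else params
    if not allow_bytes:
        for value in values:
            if isinstance(value, (bytes, bytearray)):
                raise TypeError(
                    f"bytes parameter {value!r}: decode it to str first "
                    "or pass allow_bytes=True"
                )
    return con.execute(sql, params)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (text TEXT)")
execute_text_only(con, "INSERT INTO messages VALUES (?)", ("für",))
try:
    execute_text_only(con, "INSERT INTO messages VALUES (?)",
                      ("für".encode("iso-8859-1"),))
except TypeError as exc:
    print("rejected:", exc)
```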
     
    Serge Orlov, Nov 17, 2005
    #5
  6. On 17 Nov 2005 03:47:00 -0800, "Greg Miller" <>
    wrote:

    >I have an application that uses sqlite3 to store job/error data. When
    >I log in as a German user the error codes generated are translated into
    >German. The error code text is then stored in the db. When I use the
    >fetchall() to retrieve the data to generate a report I get the
    >following error:
    >
    >Traceback (most recent call last):
    > File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in
    >OnGenerateButtonNow
    > self.OnGenerateButton(event)
    > File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in
    >OnGenerateButton
    > warningresult = messagecursor1.fetchall()
    >UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    >unsupported Unicode code range
    >
    >Does anyone have any idea what could be going wrong? The string
    >that I store in the database table is:
    >
    >'Keinen Text für Übereinstimmungsfehler gefunden'
    >
    >I thought that all strings were stored in unicode in sqlite.
    >



    No, they are stored as UTF-8 in SQLite, and pysqlite has no way to make
    sure that the string you insert into the database is really encoded in
    UTF-8 (the only safe way is to use Unicode strings).

    How did you insert that string?

    As a partial solution, try disabling the automatic conversion of text
    fields to Unicode strings:


    from pysqlite2 import dbapi2 as sqlite

    def convert_text(s):
        # XXX do not use Unicode: hand back the stored bytes unchanged
        return s


    # Register the converter with SQLite
    sqlite.register_converter("TEXT", convert_text)


    ....connect("...",
        detect_types=sqlite.PARSE_DECLTYPES | sqlite.PARSE_COLNAMES
    )
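In Python 3's `sqlite3` module, a per-connection alternative is `Connection.text_factory`. Setting it to `bytes` hands TEXT columns back undecoded, so you can decode them with whatever codec was really used. This is a different mechanism from the `register_converter` call above, shown here as a sketch. The Latin-1 row is created deliberately with `CAST` to reproduce the original failure.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (text TEXT)")
# Reproduce the original bug: Latin-1 bytes stored in a TEXT value.
con.execute("INSERT INTO messages VALUES (CAST(? AS TEXT))",
            ("für".encode("iso-8859-1"),))

# The default text_factory is str, so sqlite3 tries to decode the value as UTF-8.
try:
    con.execute("SELECT text FROM messages").fetchall()
    decode_failed = False
except sqlite3.Error:  # CPython raises sqlite3.OperationalError here
    decode_failed = True

# Workaround: return TEXT values as raw bytes and decode them yourself.
con.text_factory = bytes
(raw,) = con.execute("SELECT text FROM messages").fetchone()
print(raw, raw.decode("iso-8859-1"))
```

This only helps you read back data that is already in the database; the lasting fix is still to insert Unicode text in the first place.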




    Regards Manlio Perillo
     
    Manlio Perillo, Nov 18, 2005
    #6
  7. Greg Miller

    Thank you for all your suggestions. I ended up converting the string to
    unicode before inserting it into the database.

    Greg Miller
     
    Greg Miller, Nov 18, 2005
    #7
  8. On 18 Nov 2005 09:09:24 -0800, "Greg Miller" <>
    wrote:

    >Thank you for all your suggestions. I ended up casting the string to
    >unicode prior to inserting into the database.
    >


    Don't do it by hand if it can be done by an automated system.

    Try with:

    from pysqlite2 import dbapi2 as sqlite

    def adapt_str(s):
        # if you have declared this encoding at the beginning of the module
        return s.decode("iso-8859-1")

    sqlite.register_adapter(str, adapt_str)
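The Python 3 counterpart of this adapter targets `bytes`, which is what the 8-bit `str` became. This is a sketch that assumes, like the comment above, that every bytes parameter in the program is Latin-1 text. `adapt_latin1_bytes` is a hypothetical name.

```python
import sqlite3


def adapt_latin1_bytes(value: bytes) -> str:
    # Assumption: every bytes parameter in this program is Latin-1 text.
    return value.decode("iso-8859-1")


# Python 3 analogue of register_adapter(str, ...) in pysqlite2.
sqlite3.register_adapter(bytes, adapt_latin1_bytes)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (text TEXT)")
con.execute("INSERT INTO messages VALUES (?)", ("für".encode("iso-8859-1"),))
stored_type, stored_hex, value = con.execute(
    "SELECT typeof(text), hex(text), text FROM messages").fetchone()
print(stored_type, stored_hex, value)  # text 66C3BC72 für
```

The adapter registry is global to the `sqlite3` module, so this would also turn genuine binary BLOBs into text. Decoding explicitly before the insert, as Greg ended up doing, is the narrower fix.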


    Read the pysqlite documentation for more information:
    http://initd.org/pub/software/pysqlite/doc/usage-guide.html



    Regards Manlio Perillo
     
    Manlio Perillo, Nov 19, 2005
    #8
  9. Greg Miller

    Thanks again, I'll look into this method.

    Greg Miller
     
    Greg Miller, Nov 21, 2005
    #9