Re: is there a library/program that converts sqlite database from windows-1252 to utf-8 ?

Discussion in 'Python' started by Robert Kern, Sep 11, 2010.

    On 9/11/10 4:45 PM, Stef Mientki wrote:
    > On 11-09-2010 21:11, Robert Kern wrote:
    >> SQLite internally stores its strings as UTF-8 or UTF-16 encoded Unicode. So it's not clear what
    >> you mean when you say the database is "windows-1252". Can you be more specific?

    > I doubt that, but I'm not sure ...


    From the documentation, it looks like SQLite does not check that the text it
    is given is actually valid UTF-8, so it is possible that someone pushed in raw
    bytes. See "Support for UTF-8 and UTF-16" in the following page:

    http://www.sqlite.org/version3.html

    > For some databases written by other programs and
    > written with Python, with
    > cursor = self.conn.cursor ()
    > self.conn.text_factory = str
    >
    > Can only be read back with text_factory = str
    > then the resulting string columns contain normal strings with windows-1252 encoding, like character 0xC3


    You can probably use

    self.conn.text_factory = lambda x: x.decode('windows-1252')

    to read the data, though I've never tried to use that API myself.
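
    As a rough illustration of that suggestion, here is a minimal sketch. It
    assumes Python 3, where text_factory is called with a bytes object, and it
    builds a throwaway in-memory table (the "people" table and the helper name
    make_legacy_db are made up for the example) that holds one windows-1252
    byte string.

```python
import sqlite3


def make_legacy_db():
    """Build an in-memory database whose TEXT column holds windows-1252 bytes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    # CAST(blob AS TEXT) keeps the bytes unchanged, so this stores the single
    # byte 0xE9 (e-acute in windows-1252), which is not valid UTF-8.
    conn.execute("INSERT INTO people VALUES (CAST(? AS TEXT))", (b"Ren\xe9",))
    conn.commit()
    return conn


conn = make_legacy_db()

# Decode every TEXT value as windows-1252 instead of the default UTF-8.
conn.text_factory = lambda raw: raw.decode("windows-1252")

names = [row[0] for row in conn.execute("SELECT name FROM people")]
print(names)
```

    With the default text_factory the same SELECT would fail to decode the
    value, which matches the symptom described in the quoted message.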

    You will need to write a program yourself that opens one connection to your
    existing database for reading and another connection to another database (using
    the defaults) for writing. Then iterate over your tables and copy data from one
    database to the other.
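
    A minimal sketch of such a copy program follows, again assuming Python 3.
    The function name copy_database is hypothetical. It only copies ordinary
    tables (no indexes, views or triggers) and assumes every TEXT value in the
    source really is windows-1252, so treat it as a starting point rather than
    a finished tool.

```python
import sqlite3


def copy_database(src, dst, encoding="windows-1252"):
    """Copy all ordinary tables from src to dst, decoding text on the way.

    src and dst are open sqlite3 connections; dst uses the default
    text handling, so everything it stores is written as UTF-8.
    """
    # Read TEXT values from the source using the legacy encoding.
    src.text_factory = lambda raw: raw.decode(encoding)

    tables = src.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()

    for name, create_sql in tables:
        # Recreate the table with its original CREATE TABLE statement.
        dst.execute(create_sql)
        quoted = '"%s"' % name.replace('"', '""')
        rows = src.execute("SELECT * FROM %s" % quoted)
        placeholders = ", ".join("?" * len(rows.description))
        dst.executemany(
            "INSERT INTO %s VALUES (%s)" % (quoted, placeholders), rows
        )
    dst.commit()
```

    Usage would look something like
    copy_database(sqlite3.connect("bad-database.db"),
    sqlite3.connect("good-database.db")), with the second file not existing
    beforehand.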

    You may also be able to simply dump the database to a text file using "sqlite3
    bad-database.db .dump > bad-sql.sql", read the text file into Python as a
    byte string, decode it from windows-1252 to unicode, then encode it as utf-8
    and write it back out as good-sql.sql. Then use "sqlite3 good-database.db <
    good-sql.sql" to create the new database. I've never tried such a thing, so it
    may not work.
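
    The re-encoding step in the middle could be sketched like this in Python 3.
    The function names are hypothetical, the file names are the ones used
    above, and the sketch assumes the dump contains the original bytes
    unchanged, which I have not verified.

```python
def reencode_bytes(raw, src_encoding="windows-1252"):
    """Decode raw bytes from src_encoding and return them encoded as UTF-8."""
    return raw.decode(src_encoding).encode("utf-8")


def reencode_file(src_path, dst_path, src_encoding="windows-1252"):
    """Read src_path as bytes, re-encode to UTF-8, and write to dst_path."""
    # Binary mode on both sides keeps line endings exactly as they were.
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(reencode_bytes(raw, src_encoding))


if __name__ == "__main__":
    import os

    # Only run the conversion if the dump from the previous step exists.
    if os.path.exists("bad-sql.sql"):
        reencode_file("bad-sql.sql", "good-sql.sql")
```

    Note that Python's windows-1252 codec rejects the few byte values that the
    code page leaves undefined (such as 0x81), so a decode error here would be
    a hint that the data is not purely windows-1252.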

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
