sqlite fetchall breaking because of decoding.

tyoc

Here is a snippet that reproduces my problem on the system I'm
working on: http://tyoc.nonlogic.org/darcsrepos/snip1.zip It contains 2 files: a
sample db with 1 row and a .py with 1 function that reads that row.

The output I get is the following:

--------------------------------------------------

Traceback (most recent call last):
  File "C:\Documents and Settings\dortiz\Escritorio\x\snip1.py", line 14, in <module>
    l = CargaRegs(basededatos, "identificadores")
  File "C:\Documents and Settings\dortiz\Escritorio\x\snip1.py", line 6, in CargaRegs
    db_curs.execute("SELECT * FROM " + tabla)
OperationalError: Could not decode to UTF-8 column 't' with text 'some texto--------------- años'


I don't understand exactly what is happening (though I know it is a decode
error), but I created the database with Python by reading a CSV file that
apparently is encoded as ANSI or CP1252, because the "ñ" is stored as
0xF1.


As I understand it, the problem is that Python tries to decode the text as
if it were UTF-8 (if the database contains only ASCII chars this works
just great, but with other chars like "ñ" it doesn't).
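
To make the mismatch concrete, here is a minimal sketch in Python 3 syntax (the thread itself dates from Python 2). The sample bytes are only an illustration, not taken from the actual database:

```python
# The byte 0xF1 is "ñ" in cp1252, but on its own it is not valid UTF-8.
raw = b"a\xf1os"

# Decoding with the encoding the data was written in works.
as_cp1252 = raw.decode("cp1252")

# Decoding the same bytes as UTF-8 fails, which is what the binding runs into.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same text encoded as UTF-8 uses two bytes for "ñ".
as_utf8_bytes = as_cp1252.encode("utf-8")
```

Pure ASCII text is byte-for-byte identical in both encodings, which is why rows without accented characters read back fine.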

As you can see, the problem comes when I do db_curs.fetchall(), so I don't
know how to fetch the rows without hitting this problem.

The database is not corrupt. I mean, I'm sure that if I write a C program to
read and print the row, it will get it and just print it and not fail
like this. You can also open the DB directly with sqlite3 and
just .dump it, and it works: at least it can fetch the row without failing
(even though, depending on the console, it may or may not print the correct
character).
 
Fredrik Lundh

tyoc said:
The database is not corrupt. I mean, I'm sure that if I write a C program to
read and print the row, it will get it and just print it and not fail
like this.

well, the database *is* corrupt, since sqlite3 (both the engine and the
Python binding) expects you to use a supported encoding for the data
stored in the database:

http://www.sqlite.org/datatype3.html
http://docs.python.org/lib/node346.html

the fact that you're able to use an API that doesn't care about
encodings at all to violate the database requirements doesn't
necessarily mean that an API that does the right thing is broken...

if you're not able to fix your database, you have to plug in a custom
text_factory handler. this should work:

conn = sqlite3.connect(...)
conn.text_factory = str

also see:

http://docs.python.org/lib/sqlite3-Connection-Objects.html

</F>
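
For readers on Python 3, where `str` is already the default `text_factory` and still decodes as UTF-8, the same idea could be sketched as follows. This is only an illustration under assumptions: the in-memory database, the sample bytes and the `fetch_all` helper are hypothetical, and `CAST(? AS TEXT)` is used just to get non-UTF-8 bytes into a TEXT column for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE identificadores (CID INTEGER, t TEXT)")
# CAST(blob AS TEXT) stores the raw bytes as TEXT without validating them,
# which mimics a database that holds cp1252 bytes in a text column.
conn.execute(
    "INSERT INTO identificadores VALUES (1, CAST(? AS TEXT))",
    (b"a\xf1os",),
)

def fetch_all(connection):
    """Hypothetical helper: read back the text column of every row."""
    return connection.execute("SELECT t FROM identificadores").fetchall()

# Default text_factory (str) tries UTF-8 and fails on the 0xF1 byte.
try:
    fetch_all(conn)
    default_failed = False
except sqlite3.OperationalError:
    default_failed = True

# Option 1: hand back the raw bytes untouched.
conn.text_factory = bytes
raw_rows = fetch_all(conn)

# Option 2: decode with the encoding the data was really written in.
conn.text_factory = lambda b: b.decode("cp1252")
decoded_rows = fetch_all(conn)
```

Either option only works around the stored bytes; re-writing the rows as properly decoded text is what actually repairs the database.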
 
tyoc

Fredrik Lundh said:
well, the database *is* corrupt, since sqlite3 (both the engine and the
Python binding) expects you to use a supported encoding for the data
stored in the database:

http://www.sqlite.org/datatype3.html
http://docs.python.org/lib/node346.html

Still, like I said before, I imported that data with Python. The source
was a CSV with that encoding; I read the lines with the csv module, then
inserted them into the database like this:


-----
stmt = "INSERT INTO %s (CID, t) VALUES (?,?)" % (tabla)
for l in ls:
    db_curs.execute(stmt, (l[0], l[2]))
db_connection.commit()
-----
Where l[0] and l[2] were the values read from the CSV. Each row was
appended to ls; l[1] was the key, so I don't need it here.



So this "corruption" can only come from those statements (the ones
adding data), which means it is possible to corrupt your database from
within Python without noticing until you try to read the data back (because
even though the ASCII codec can't decode 0xF1, an ASCII string 'container'
can hold it...).


Anyway, I think the data is stored as cp1252, and this shows it: '\xf1a\xf1a'.
That is the 0xF1 that I can see, but it needs to be "encoded" like "\xf1", not
like the byte 0xF1.

I read the original CSV file with a normal open... so I guess I need
something like

f = codecs.open("csv.txt", encoding="cp1252")
ls = f.readlines()

And then store the lines and see if sqlite can handle them? I
will try that.
 
Fredrik Lundh

tyoc said:
Still, like I said before, I imported that data with Python. The source
was a CSV with that encoding; I read the lines with the csv module, then
inserted them into the database like this:

the CSV module doesn't decode stuff for you; that's up to your code. see

http://docs.python.org/lib/csv-examples.html#csv-examples

for sample code (scroll down to the paragraph that starts with "The csv
module doesn't directly support reading and writing Unicode").

</F>
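
As a rough Python 3 sketch of decoding in your own code before inserting: here the file object does the decoding, so the csv module receives real text. The sample file, its three-column layout and the in-memory database are assumptions made up for the illustration; only the table and column names come from the thread.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sample file: three columns, written as cp1252 bytes.
# "\u00f1" is the letter "ñ".
path = os.path.join(tempfile.mkdtemp(), "csv.txt")
with open(path, "wb") as f:
    f.write("1,key1,some texto a\u00f1os\r\n".encode("cp1252"))

# Decode while reading: the file yields str, so csv.reader sees text,
# not raw cp1252 bytes.
with open(path, encoding="cp1252", newline="") as f:
    ls = list(csv.reader(f))

tabla = "identificadores"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE %s (CID INTEGER, t TEXT)" % tabla)

# Same insert as in the thread: column 0 is the id, column 2 is the text.
stmt = "INSERT INTO %s (CID, t) VALUES (?,?)" % tabla
conn.executemany(stmt, [(row[0], row[2]) for row in ls])
conn.commit()

# Reading back now works with the default text_factory.
rows = conn.execute("SELECT CID, t FROM %s" % tabla).fetchall()
```

Because the values handed to sqlite3 are already decoded strings, the binding stores them as UTF-8 and can decode them again on the way out.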
 
