SQlite none english char

G

Gandalf

I works with python 2.5 on windows, And I use sqlite3

Now, I have problem searching string in Hebrew in my database

I have table called "test" with field num and test
firs row i insert "1" and "עברית" (that is "Hebrew" in Hebrew)
second row i insert "2" and "English"

now this code will print meas it should:

i="Englisht"
cur.execute("select * from `test` where text like '%"+i+"%' ")
for row in cur:
print row[1]

but this one print me nothinginstead of עברית

i="עברית"
cur.execute("select * from `test` where text like '%"+i+"%' ")
for row in cur:
print row[1]

does any one have an idea how can i solve it?
 
G

Gerhard Häring

Gandalf said:
I works with python 2.5 on windows, And I use sqlite3

Now, I have problem searching string in Hebrew in my database

I have table called "test" with field num and test
firs row i insert "1" and "עברית" (that is "Hebrew" in Hebrew)
second row i insert "2" and "English" [...]

I recommend you use Unicode strings for all your Python strings in the
application. You can then be that things will just work with the sqlite3
module.

The problem is that the sqlite3 module doesn't currently check if what
you insert into the database is in UTF-8 encoding. But if it isn't,
you're likely to run into problems later on.

So, let's assume that you've written your Python using a UTF-8 editor,
you can then use something like:

# coding: utf-8

as the first line in the script. Among others, this will allow you to
write Unicode literals in UTF-8, i. e. do something like:

data = u"עברית".

then you can insert this into the database:

cur.execute("insert into test(test) values (?)", (data,))

Note that I use the parametrized form of execute() with ? as
placeholders. *Always* use this when the SQL statement is not constant,
but has parameters of some sort.
[...]
cur.execute("select * from `test` where text like '%"+i+"%' ")
for row in cur:
print row[1]

but this one print me nothing
instead of עברית

This could be two problems. Either (likely) it simply isn't found
because you inserted garbage (non-UTF-8 data) or you got a decoding
error to UTF-8 executing the select, which the sqlite3 module will
currently unfortunately ignore and return a None value instead.

Again, use the parametrized form of the execute() statement with Unicode
Python strings to avoid any problems.

cur.execute("select * from test where text like '%' || ? || '%'",
(searchstr,))

!!! Ok, I just found out that you used the + operator in SQL to
concatenate strings. Don't, as it will not work (except on maybe MySQL).
Use the || operator instead!

Note that when you use cur.execute(sql, params), then params is actually
a tuple of parameters. In the above examples there was only one
parameter, so it was a one-tuple, which is written as (element,).
Dont'write it as (element), because Python will not recognize it as a
tuple then.

Don't hesitate to ask if you need further help.

-- Gerhard
 
G

Gandalf

Gandalf said:
I works with python 2.5 on windows,  And I use sqlite3
Now, I have problem searching string in Hebrew in my database
I have table called "test" with field num and test
firs row i insert  "1" and "עברית"  (that is "Hebrew" in Hebrew)
second row i insert "2" and "English" [...]

I recommend you use Unicode strings for all your Python strings in the
application. You can then be that things will just work with the sqlite3
module.

The problem is that the sqlite3 module doesn't currently check if what
you insert into the database is in UTF-8 encoding. But if it isn't,
you're likely to run into problems later on.

So, let's assume that you've written your Python using a UTF-8 editor,
you can then use something like:

# coding: utf-8

as the first line in the script. Among others, this will allow you to
write Unicode literals in UTF-8, i. e. do something like:

data = u"עברית".

then you can insert this into the database:

cur.execute("insert into test(test) values (?)", (data,))

Note that I use the parametrized form of execute() with ? as
placeholders. *Always* use this when the SQL statement is not constant,
but has parameters of some sort.
[...]
cur.execute("select *  from `test` where text like '%"+i+"%'  ")
for row in cur:
    print row[1]
but this one print me nothing
instead of עברית

This could be two problems. Either (likely) it simply isn't found
because you inserted garbage (non-UTF-8 data) or you got a decoding
error to UTF-8 executing the select, which the sqlite3 module will
currently unfortunately ignore and return a None value instead.

Again, use the parametrized form of the execute() statement with Unicode
Python strings to avoid any problems.

cur.execute("select * from test where text like '%' || ? || '%'",
(searchstr,))

!!! Ok, I just found out that you used the + operator in SQL to
concatenate strings. Don't, as it will not work (except on maybe MySQL).
Use the || operator instead!

Note that when you use cur.execute(sql, params), then params is actually
a tuple of parameters. In the above examples there was only one
parameter, so it was a one-tuple, which is written as (element,).
Dont'write it as (element), because Python will not recognize it as a
tuple then.

Don't hesitate to ask if you need further help.

-- Gerhard

I solved the problem by entering data manually but now the problem is
that i had to delete this function:
con.text_factory = str
and now I can't see the data that I entered thought my sqlite manager
 
G

Gerhard Häring

Gandalf said:
[...]
I solved the problem by entering data manually but now the problem is
that i had to delete this function:
con.text_factory = str
and now I can't see the data that I entered thought my sqlite manager

Then apparently there is still non-UTF-8 data in the database. Perhaps
SQLite Manager allows you to insert invalid data as well?

Try this:

con.text_factory = lambda s: unicode(s, "utf-8", "replace")

This will decode to UTF-8 as good as possible, and for non-decodable
characters you will get a REPLACEMENT CHARACTER instead.

To demonstrate:
ok, too')).fetchone()[0]

The chr(230) is just random garbage that isn't valid UTF-8:
text ok �k, too

As you can see, there is now a strange character, but the second 'ok,
too' got messed up :-/ But at least you can then find out which rows in
the database have messed up data. You can then iterate over all rows
with something like:

Then, assuming the text to check for validity is in result, you can do
something like:
True

Does this help?

-- Gerhard

PS: This thread reinforces my believe that I have to make it harder for
users of pysqlite to make themselves shoot in the foot with non-UTF-8 data.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,144
Latest member
KetoBaseReviews
Top