Annoying unicode / str conversion problem

Hans Müller

Hi python experts,

At the moment I'm struggling with an annoying problem involving MySQL.

I'm fetching rows from a database, which the MySQL driver returns as a list of tuples.

The default encoding of the database is UTF-8.

Unfortunately, the database contains rows with different encodings, and there is a blob
column.

In the app I search for duplicate entries in the database with this code:

hash = {}
cursor.execute("select * from table")
rows = cursor.fetchall()
for row in rows:
    key = "|".join([str(x) for x in row])  # <- here the problem arises
    if key in hash:
        print "found double entry"
    else:
        hash[key] = True  # remember the key so a later duplicate is detected

This code works as expected with Python 2.5.2.
With 2.5.1 it shows this error:

key = "|".join(str(x) for x in row)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017e' in position 3: ordinal not in range(128)

When I replace the str() call with unicode(), I get this error when a blob column is being
processed:

key = "|".join(unicode(x) for x in row)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 119: ordinal not in range(128)


Please help: how can I convert ANY column data to a string that is usable as a dictionary
key? The purpose of the dictionary is to find equal rows in some database
tables. Perhaps using an MD5 hash of the column data is also an idea?

Thanks a lot in advance,

Hans.
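
On the MD5 idea raised above, here is a minimal sketch, not a definitive implementation. It assumes Python 3 (where `str` is text and `bytes` is binary data, unlike the 2.5 interpreters in the post), and `row_digest` is a hypothetical helper name. The point it illustrates is that each value is turned into bytes explicitly, so the implicit ASCII codec that caused both errors is never involved.

```python
import hashlib


def row_digest(row):
    """Return an MD5 hex digest for a row of text, bytes, numbers or None."""
    md5 = hashlib.md5()
    for value in row:
        if isinstance(value, bytes):
            # Blob column: hash the raw bytes, never decode them.
            tag, data = b"b", value
        elif isinstance(value, str):
            # Text column: encode explicitly instead of relying on ASCII.
            tag, data = b"s", value.encode("utf-8")
        else:
            # Numbers, None, dates and so on: fall back to repr().
            tag, data = b"r", repr(value).encode("utf-8")
        # The type tag and length prefix keep (b"ab", b"c") distinct
        # from (b"a", b"bc").
        md5.update(tag + str(len(data)).encode("ascii") + b":" + data)
    return md5.hexdigest()


row = ("\u017eaba", b"\xfc\x00", 3, None)
print(row_digest(row))
```

A digest like this is mainly useful when the rows are too large to keep around or have to be compared between machines; for an in-memory duplicate check it is probably more work than needed.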
 
Peter Otten

No direct answer, but can't you put the rows into the dict (or a set)
without converting them to a string?

seen = set()
for row in rows:
    if row in seen:
        print "dupe"
    else:
        seen.add(row)
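
A self-contained variant of the same idea, as a sketch assuming Python 3. The sample rows and the name `find_duplicates` are made up for illustration. It shows that rows mixing non-ASCII text and binary blob data can be compared as tuples without any string conversion.

```python
def find_duplicates(rows):
    """Return every row that has already been seen earlier in rows."""
    seen = set()
    dupes = []
    for row in rows:
        row = tuple(row)  # in case a driver hands back lists, which are unhashable
        if row in seen:
            dupes.append(row)
        else:
            seen.add(row)
    return dupes


# Hypothetical sample data: an id, a text column and a blob column.
rows = [
    (1, "Hans", b"\xfc\x00"),
    (2, "\u017eaba", b"\x01"),
    (1, "Hans", b"\xfc\x00"),
]
print(find_duplicates(rows))
```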


Or, even better, solve the problem within the db:

select <fields> from <table> group by <fields> having count(*) > 1
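
To make the query concrete, here is a small sketch that uses the standard-library `sqlite3` module as a stand-in for MySQL, assuming Python 3. The table `people` and its columns `name` and `city` are invented for the example; the GROUP BY ... HAVING pattern is the one from the query above.

```python
import sqlite3

# In-memory database standing in for the real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [("Hans", "Berlin"), ("Peter", "Hamburg"), ("Hans", "Berlin")],
)

# Group on every column and keep only the groups that occur more than once.
dupes = conn.execute(
    "SELECT name, city, COUNT(*) FROM people "
    "GROUP BY name, city HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # [('Hans', 'Berlin', 2)]
conn.close()
```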

Peter
 
Hans Müller

Thanks Peter,

your answer did the trick.
I have programmed a lot with awk (also a very cool scripting language),
so I was fixed on the idea that a dictionary key has to be a string
(as in awk). Since it's impossible to use a list as a dictionary
key, I thought it was also impossible to use a tuple as a key.
I was wrong!
So my code will become more pythonic, much simpler and even faster!
Your suggestion to use the database is also an idea, but the actual
task is to see whether some rows in (identical) tables across many
servers are missing and, if so, to add the missing rows.
In my post I showed simplified code.

Again, thanks for the hint!

Greetings
Hans
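
The task described in this post (finding rows that are missing on one server) can be sketched with tuples in a set, assuming Python 3. The names `missing_rows`, `server_a` and `server_b` and the sample data are hypothetical; in practice the rows would come from `cursor.fetchall()` on each server.

```python
def missing_rows(reference_rows, other_rows):
    """Return the rows of reference_rows that do not occur in other_rows."""
    other = set(tuple(row) for row in other_rows)
    return [tuple(row) for row in reference_rows if tuple(row) not in other]


# Hypothetical results fetched from the same table on two servers.
server_a = [(1, "Hans"), (2, "Peter"), (3, "\u017eaba")]
server_b = [(1, "Hans"), (3, "\u017eaba")]

to_add = missing_rows(server_a, server_b)
print(to_add)  # [(2, 'Peter')]

# A list cannot be used as a key because it is mutable and thus unhashable;
# a tuple of hashable values can.
try:
    {[1, "Hans"]: True}
except TypeError as exc:
    print("list key rejected:", exc)
```

The rows in `to_add` could then be passed to an INSERT on the second server.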
 
