Bug in python (Weird UnicodeDecodeError)

dbri.tcc

Hello

I am getting somewhat random UnicodeDecodeError messages in my program.

It is random in that I will be going through a pysqlite database of
records, manipulating the results, and it will throw UnicodeDecodeError
apparently without regard to what data is being used.

For example, I am reading in some book barcodes. These are 7-digit
strings. It processes a few thousand of these with no problem, but then
partway through the database results I get something like this:

    for item in list:
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 26-28: invalid data

I am not quite sure how u'2349350' is 26 bytes long. Maybe it is. But
another time it said something like 'position 129'.

Furthermore, when I change the order in which the data is retrieved
(with ORDER BY), it barfs on a different piece of data than it did
before. I cannot figure it out.

I thought at first it was the eval() function being loopy, so I wrote
my own 'evalfix' function that handled the limited set of data I was
using. Then the thing barfed in a completely different spot.

And then I added more data to the database, and it barfed in yet
another spot.

Any help is appreciated, thanks.
 
Scott David Daniels

> ... partway through the database results I get something like this:
>
>     for item in list:
>     UnicodeDecodeError: 'utf8' codec can't decode bytes in position 26-28: invalid data
It is quite likely that the position is not what you think it is.
For one of the bad strings, print:

    repr(thestring), [ord(ch) for ch in thestring]

This may give you a clue (and will definitely help us help you).

So far you have explained to us why you are confused, but have
not explained (with enough precision) what is going wrong in a
way that anyone can help you. I suspect that "position" is more
like a Unicode data point than the position within the string you
are feeding.
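To follow that suggestion without guessing which string failed, you can ask the exception itself. The sketch below targets Python 3 (the thread predates it). There, a UnicodeDecodeError keeps the bytes that were being decoded in `.object` and the failing byte offsets in `.start` and `.end`. The helper name `diagnose` is hypothetical. In Python 3, `list()` of a bytes object gives the same numbers that `[ord(ch) for ch in thestring]` gave for a Python 2 byte string.

```python
def diagnose(data, encoding="utf-8"):
    """Hypothetical helper: return None if `data` (a bytes object) decodes
    cleanly, otherwise a dict describing the bytes the codec rejected."""
    try:
        data.decode(encoding)
        return None
    except UnicodeDecodeError as exc:
        # exc.start/exc.end are byte offsets into exc.object, the whole byte
        # string handed to the codec, which may be longer than one field.
        return {
            "repr": repr(exc.object),
            "byte_values": list(exc.object),  # like [ord(ch) ...] in Python 2
            "start": exc.start,
            "end": exc.end,  # exclusive; the message shows end - 1 ("26-28")
            "bad_bytes": exc.object[exc.start:exc.end],
            "reason": exc.reason,
        }


if __name__ == "__main__":
    print(diagnose(b"2349350"))      # a clean 7-byte barcode -> None
    print(diagnose(b"2349350\xff"))  # one invalid trailing byte
```

A 7-digit ASCII barcode is only 7 bytes, so offsets such as 26-28 must index into some longer byte string. The `repr` in the report shows which one it actually was.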

Show us the code doing the translation and the data it is being fed,
and we can help.

--Scott David Daniels
 
Fredrik Lundh

it's probably a bug in pysqlite (or some other C extension), which does
some conversion somewhere, but forgets to check the return status.

(if you raise an exception at the C level, but forget to flag it back to
the interpreter when you return to Python, the error may occur in a
seemingly random location.)

you can usually

    hasattr(None, "none")

to reset the error state (at least this worked in older versions; I think
it should work in 2.3 and 2.4 as well). try adding such calls after the
database calls, and see if the problem goes away... (if it does, complain
to the pysqlite developers).
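Here is a minimal sketch of where those calls would go. It assumes the standard-library `sqlite3` module (which grew out of pysqlite) stands in for pysqlite itself. The `books` table and its `barcode` column are hypothetical. `hasattr(None, "none")` simply returns False. The only point is to place it right after each call into the extension, so that, per the workaround above, any stray error flag left behind would be cleared before your own code runs. In a current interpreter there may be nothing to clear, so this only illustrates placement.

```python
import sqlite3


def fetch_barcodes(conn):
    """Read every barcode, calling hasattr(None, "none") right after each
    call into the database extension (the reset trick from the reply)."""
    barcodes = []
    cur = conn.execute("SELECT barcode FROM books ORDER BY barcode")
    hasattr(None, "none")  # reset after execute()
    for (barcode,) in cur:  # each iteration fetches a row via the extension
        hasattr(None, "none")  # reset after each fetch
        barcodes.append(barcode)
    return barcodes


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (barcode TEXT)")  # hypothetical schema
    conn.executemany("INSERT INTO books VALUES (?)",
                     [("2349350",), ("1000001",)])
    print(fetch_barcodes(conn))
```

If the UnicodeDecodeError stops appearing once these calls are in place, that supports the theory that the extension is leaving an error set behind.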

</F>
 
