Conversion of Unicode to ASCII NIGHTMARE

Serge Orlov

ChaosKCW said:
If what you say is true, I have to ask why I get a conversion error
which states it can't convert to ASCII, not that it can't convert to UNICODE?

You do get an error about conversion to Unicode. Quote from your message:
SQLiteCur.execute(sql, row)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not in range(128)

Notice the name of the error: UnicodeDecodeError, or in other words,
an error while decoding to Unicode.

So I am missing how a
function which supposedly converts everything to Unicode ends up doing
an ASCII conversion?

When Python tries to concatenate a byte string and a unicode string, it
assumes that the byte string is ASCII-encoded and tries to decode it
from ASCII to Unicode. It calls the ASCII decoder to do the decoding. If
decoding fails, you see the ASCII decoder's message about the error.
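
To make the hidden step visible, here is a minimal sketch. It is written for Python 3, where mixing bytes and text raises TypeError instead of decoding silently, so the hypothetical helper `py2_style_concat` performs explicitly the ASCII decode that Python 2 did behind the scenes:

```python
def py2_style_concat(text, raw, encoding="ascii"):
    """Emulate Python 2's u"..." + "..." by decoding the byte string first.

    Python 2 used its default encoding (ASCII unless changed) for this step.
    """
    return text + raw.decode(encoding)


try:
    py2_style_concat("Name: ", b"\xdcber")
except UnicodeDecodeError as e:
    print(e)  # 'ascii' codec can't decode byte 0xdc in position 0: ordinal not in range(128)

# Decoding with the encoding the bytes were really written in works fine.
print(py2_style_concat("Name: ", b"\xdcber", "latin-1"))  # Name: Über
```

The error names the ASCII codec only because ASCII is the encoding the implicit decode happened to use.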

Serge
 
Paul Boddie

ChaosKCW said:
That is an EVIL setting which should not be used. The NLS_CHARSET
environment variable causes so many headaches it's not worth playing
with it at all.

Well, at this point in time I don't remember the preferred way of
getting Oracle, the client libraries and the database adapter to agree
on the character encoding used for communicating data between
applications and the database system. Nevertheless, what you need to do
is make sure that you know which encoding is used. Then, if you either
get plain strings (i.e. not Unicode objects) out of the database, or
need to write plain strings to the database, you can provide that
encoding to the unicode built-in function or to the decode/encode
methods; this is much better than just stripping out characters that
can't be represented in ASCII.
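
A small Python 3 sketch of that advice, assuming (purely for illustration) that the database client hands back bytes encoded in cp1252. In Python 2 the decode step would be `unicode(raw, DB_ENCODING)` or `raw.decode(DB_ENCODING)`:

```python
# Assumption: the database client uses cp1252; substitute the real encoding.
DB_ENCODING = "cp1252"

raw = b"Caf\xe9"                          # plain (byte) string from the driver
text = raw.decode(DB_ENCODING)            # 'Café': decoded with the known encoding
stripped = raw.decode("ascii", "ignore")  # 'Caf': the lossy ASCII-stripping approach
round_trip = text.encode(DB_ENCODING)     # bytes ready to write back

print(text, stripped, round_trip == raw)  # Café Caf True
```

Knowing the encoding keeps the accented character and lets the data round-trip; stripping to ASCII silently loses it.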

Anyway, despite my objections to digging through Oracle documentation,
I found the following useful documents: the "Globalization Support"
index [1], an FAQ about NLS_LANG [2], and a white paper about Unicode
support in Oracle [3]. NLS_LANG may well help you do what you want,
but since the database systems I have installed (PostgreSQL, sqlite3)
seem to "do Unicode" without such horsing around, I'm not really able
to offer much more advice on this subject.
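
If NLS_LANG is used, its value has the form LANGUAGE_TERRITORY.CHARSET (the language and territory may be omitted, as in ".AL32UTF8"), and the CHARSET part tells you which codec to hand to decode/encode. The helper below is hypothetical and its table covers only a few common Oracle character sets; the claim that NLS_LANG must be set before the Oracle client library is loaded is an assumption to check against the FAQ [2]:

```python
import os

# Hypothetical mapping from Oracle character-set names to Python codec names.
ORACLE_TO_PYTHON_CODEC = {
    "AL32UTF8": "utf-8",
    "WE8ISO8859P1": "latin-1",
    "WE8MSWIN1252": "cp1252",
    "US7ASCII": "ascii",
}


def python_codec_from_nls_lang(nls_lang):
    """Return the Python codec for the CHARSET part of an NLS_LANG value."""
    _, _, charset = nls_lang.partition(".")
    if not charset:
        raise ValueError("NLS_LANG has no character set: %r" % nls_lang)
    try:
        return ORACLE_TO_PYTHON_CODEC[charset.upper()]
    except KeyError:
        raise ValueError("no codec mapping for Oracle charset %r" % charset) from None


nls_lang = "AMERICAN_AMERICA.AL32UTF8"
# Assumption: set this before the database adapter loads the Oracle client.
os.environ["NLS_LANG"] = nls_lang
codec = python_codec_from_nls_lang(nls_lang)  # 'utf-8'
```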

Paul

[1] http://www.oracle.com/technology/tech/globalization/index.html
[2] http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm
[3] http://www.oracle.com/technology/tech/globalization/pdf/TWP_AppDev_Unicode_10gR2.pdf
 
ChaosKCW

Serge Orlov said:
When Python tries to concatenate a byte string and a unicode string, it
assumes that the byte string is ASCII-encoded and tries to decode it
from ASCII to Unicode. It calls the ASCII decoder to do the decoding. If
decoding fails, you see the ASCII decoder's message about the error.

OK, I get it now. Sorry for being slow. I have to say, as a lover of
Python for its simplicity and clarity, the character set issue has been
harder to figure out than I would have liked.

Thanks for all the help.
 
Serge Orlov

ChaosKCW said:
OK, I get it now. Sorry for being slow. I have to say, as a lover of
Python for its simplicity and clarity, the character set issue has been
harder to figure out than I would have liked.

I think there is room for improvement here. In my opinion the message
is too confusing for newbies. It would be easier for them if there were
a mini tutorial explaining what's going on, with links to other, broader
tutorials (like a Unicode tutorial). Instead of the generic error
----------------------
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
----------------------

it could look like this. Notice the special URL; it points to a
non-existent :) tutorial about why concatenating byte strings with
unicode strings can produce UnicodeDecodeError:
----------------------
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
For additional information about this exception see:
http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str
----------------------

Here is sample code showing how it could be done (Python 2):

---------------------------
# Map (operation, exception class, operand types) to a help URL.
extended_help = {
    ("concat", UnicodeDecodeError, str, unicode):
        "http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str",
    ("concat", UnicodeDecodeError, unicode, str):
        "http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str",
}

def get_more_help(error, key):
    # Leave exceptions without extra documentation untouched.
    if key not in extended_help:
        return
    # Appending to .reason changes what str(error) shows in the traceback.
    error.reason += "\nFor additional information about this exception see:\n"
    error.reason += extended_help[key]


def concat(s1, s2):
    try:
        return s1 + s2
    except Exception, e:
        key = "concat", e.__class__, type(s1), type(s2)
        get_more_help(e, key)
        raise  # re-raise the same exception, now with the extended message

concat(chr(0xA5), unichr(0x5432))
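
In Python 3 the implicit decode is gone (str + bytes raises TypeError), so the same idea would hook an explicit decode instead. A minimal sketch with hypothetical helper names and a placeholder URL; it relies on `reason` being a writable attribute of UnicodeDecodeError whose value is included in `str(e)`:

```python
# Placeholder URL: like the one above, it does not point to a real tutorial.
EXTENDED_HELP = {
    ("decode", UnicodeDecodeError, "ascii"):
        "https://example.org/help/decode-UnicodeDecodeError",
}


def add_more_help(error, key):
    """Append a help link to a UnicodeError's reason, if one is registered."""
    url = EXTENDED_HELP.get(key)
    if url is not None:
        error.reason += "\nFor additional information about this exception see:\n" + url


def decode_with_help(raw, encoding="ascii"):
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as e:
        add_more_help(e, ("decode", type(e), e.encoding))
        raise  # same exception object, now carrying the longer message
```

Calling `decode_with_help(b"\xa5")` raises the usual ASCII error with the help link appended to its message.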
 
Roger Binns

Serge Orlov said:
It could look like this. Notice the special URL; it points to a
non-existent :) tutorial about why concatenating byte strings with
unicode strings can produce UnicodeDecodeError.

An alternative is to give an error code that people can look up
(and users can report). It also has the advantage of being
language-neutral. You can see this kind of approach on IBM systems
(mainframe, AIX, etc.) and in Oracle.
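
A rough Python 3 sketch of the error-code idea. The "PYE-" prefix, the code number and the help text are invented for illustration; the point is that the code stays the same whichever language the explanation is translated into, so it can be searched for and reported:

```python
# Hypothetical code table in the spirit of Oracle's ORA-nnnnn codes.
ERROR_CODES = {
    "PYE-00001": "A byte string could not be decoded as ASCII. Decode it "
                 "explicitly with the encoding it was actually written in.",
}


class CodedError(Exception):
    """An exception whose message starts with a stable, searchable code."""

    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail


def explain(code):
    """Look up the long explanation for an error code."""
    return ERROR_CODES.get(code, "Unknown error code.")


def decode_ascii(raw):
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as e:
        # Chain the original exception so the low-level details are kept.
        raise CodedError("PYE-00001", str(e)) from e


try:
    decode_ascii(b"\xa5")
except CodedError as e:
    print(e)  # PYE-00001: 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
    print(explain(e.code))
```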

Roger
 
