Conversion of Unicode to ASCII NIGHTMARE

Serge Orlov · Apr 10, 2006

ChaosKCW said:
If what you say is true, I have to ask why I get a conversion error
which states that it can't convert to ASCII, not that it can't convert to Unicode?

You do get an error about conversion to Unicode. Quote from your message:

SQLiteCur.execute(sql, row)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not in range(128)

Notice the name of the error: UnicodeDecodeError, or in other words, an
error while converting to Unicode.
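The naming convention can be checked directly. This sketch uses Python 3 (an assumption: the thread itself concerns Python 2, where byte strings were str and text was unicode). Decoding converts bytes to text and encoding converts text to bytes, so the exception name tells you which direction failed. The helper codec_error_name is hypothetical.

```python
def codec_error_name(action):
    """Run action() and return the name of the codec error it raises, or None."""
    try:
        action()
    except UnicodeError as exc:  # common base class of decode and encode errors
        return type(exc).__name__
    return None


# Decoding goes bytes -> str, i.e. *to* Unicode.
text = b"abc".decode("ascii")
# Encoding goes str -> bytes, i.e. *from* Unicode.
data = "abc".encode("ascii")

# Byte 0xdc and character U+00DC are both outside ASCII's range(128), so both
# calls fail, but the exception name tells you which direction was attempted.
decode_err = codec_error_name(lambda: b"\xdc".decode("ascii"))
encode_err = codec_error_name(lambda: "\u00dc".encode("ascii"))
print(decode_err, encode_err)  # UnicodeDecodeError UnicodeEncodeError
```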

So I am missing how a
function which supposedly converts everything to Unicode ends up doing
an ASCII conversion?

When Python tries to concatenate a byte string and a unicode string, it
assumes that the byte string is ASCII-encoded and tries to decode it
from ASCII to unicode. It calls the ascii decoder to do the decoding. If
decoding fails, you see the message from the ascii decoder about the error.
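The hidden step can be made explicit. A minimal sketch in Python 3, where mixing bytes and str raises TypeError instead of coercing, so the hypothetical helper py2_style_concat performs the ascii decode that Python 2 did silently before concatenating:

```python
def py2_style_concat(byte_str: bytes, text: str) -> str:
    """Hypothetical helper mimicking Python 2's implicit str + unicode coercion.

    Python 2 decoded the byte string with the default 'ascii' codec before
    concatenating; this function spells that step out.
    """
    return byte_str.decode("ascii") + text


def decode_error_message(func, *args):
    """Return str() of the UnicodeDecodeError raised by func(*args), or None."""
    try:
        func(*args)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


ok = py2_style_concat(b"abc", "def")  # every byte < 128: decoding succeeds
msg = decode_error_message(py2_style_concat, b"\xa5", "\u5432")  # 0xa5 >= 128
print(ok)   # abcdef
print(msg)  # 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
```

Writing b"\xa5" + "x" directly in Python 3 raises TypeError, which is why the decode has to be spelled out in this sketch.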

Serge

Paul Boddie · Apr 10, 2006

ChaosKCW said:
That is an EVIL setting which should not be used. The NLS_CHARSET
environment variable causes so many headaches that it's not worth playing
with it at all.

Well, at this very point in time I don't remember the preferred way of
getting Oracle, the client libraries and the database adapter to agree
on the character encoding used for communicating data between
applications and the database system. Nevertheless, what you need to do
is make sure that you know which encoding is used, so that if you
either get plain strings (i.e. not Unicode objects) out of the database,
or need to write plain strings to the database, you can provide
the encoding to the unicode built-in function or to the decode/encode
methods; this is much better than just stripping out characters that
can't be represented in ASCII.
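A minimal sketch of the recommended approach, in Python 3, assuming purely for illustration that the client encoding is Latin-1 (ISO-8859-1); in a real setup the encoding has to come from the database or driver configuration. Python 2's unicode(raw, "latin-1") corresponds to raw.decode("latin-1") here.

```python
# Hypothetical raw value read from a database column as bytes.
raw = b"M\xfcller \xdcber"
# ASSUMPTION: the client encoding is Latin-1; this must match the real setup.
DB_ENCODING = "latin-1"

# Recommended: decode with the known encoding; no characters are lost.
text = raw.decode(DB_ENCODING)

# Discouraged: silently drop every byte ASCII cannot represent.
stripped = raw.decode("ascii", errors="ignore")

# Writing back: encode with the same known encoding.
round_trip = text.encode(DB_ENCODING)

print(ascii(text))  # 'M\xfcller \xdcber'
print(stripped)     # Mller ber
```

The stripped version loses data irreversibly, while the decoded text round-trips to the original bytes.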

Anyway, despite my objections to digging through Oracle documentation,
I found the following useful documents: the "Globalization Support"
index [1], an FAQ about NLS_LANG [2], and a white paper about Unicode
support in Oracle [3]. It may well be that NLS_LANG can help
you do what you want, but since the database systems I have installed
(PostgreSQL, sqlite3) seem to "do Unicode" without such horsing around,
I'm not really able to offer much more advice on this subject.
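For comparison, a sketch of what "doing Unicode" looks like with Python 3's standard sqlite3 module: text values go in as str and come back as str, with no character-set setting involved. The table and column names are made up for the example.

```python
import sqlite3

# In-memory database so the sketch needs no files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

# Pass a str containing a non-ASCII character as a query parameter.
name = "M\u00fcller"
conn.execute("INSERT INTO people (name) VALUES (?)", (name,))

# TEXT columns come back as str by default.
(fetched,) = conn.execute("SELECT name FROM people").fetchone()
conn.close()

print(type(fetched).__name__, fetched == name)  # str True
```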

Paul

[1] http://www.oracle.com/technology/tech/globalization/index.html
[2] http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm
[3] http://www.oracle.com/technology/tech/globalization/pdf/TWP_AppDev_Unicode_10gR2.pdf

ChaosKCW · Apr 10, 2006

Serge Orlov said:
When Python tries to concatenate a byte string and a unicode string, it
assumes that the byte string is ASCII-encoded and tries to decode it
from ASCII to unicode. It calls the ascii decoder to do the decoding. If
decoding fails, you see the message from the ascii decoder about the error.

Ok, I get it now. Sorry for the slowness. I have to say, as a lover of
Python for its simplicity and clarity, the character set thing has been
harder to figure out than I would have liked.

Thanks for all the help.

Serge Orlov · Apr 10, 2006

ChaosKCW said:
Ok, I get it now. Sorry for the slowness. I have to say, as a lover of
Python for its simplicity and clarity, the character set thing has been
harder to figure out than I would have liked.

I think there is room for improvement here. In my opinion the message
is too confusing for newbies. It would be easier for them if there were a
mini tutorial available about what's going on, with links to other,
broader tutorials (like a unicode tutorial). Instead of the generic error
----------------------
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
----------------------

it could look like this. Notice the special URL: it points to a
non-existent tutorial about why concatenating byte strings with
unicode strings can produce UnicodeDecodeError:
----------------------
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0: ordinal not in range(128)
For additional information about this exception see:
http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str
----------------------

Here is sample code showing how it can be done (Python 2):

---------------------------
# Map (operation, exception class, operand types) to a help URL.
extended_help = {
    ("concat", UnicodeDecodeError, str, unicode):
        "http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str",
    ("concat", UnicodeDecodeError, unicode, str):
        "http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str",
}

def get_more_help(error, key):
    # Append a pointer to extra documentation to the error's reason, if known.
    if key not in extended_help:
        return
    error.reason += "\nFor additional information about this exception see:\n  "
    error.reason += extended_help[key]

def concat(s1, s2):
    try:
        return s1 + s2
    except Exception, e:
        key = "concat", e.__class__, type(s1), type(s2)
        get_more_help(e, key)
        raise  # re-raise the same exception, now carrying the help text

# chr(0xA5) is not ASCII, so the implicit decode fails with UnicodeDecodeError.
concat(chr(0xA5), unichr(0x5432))
---------------------------
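Serge's sample only runs on Python 2. A rough Python 3 adaptation follows, sketched under the assumption that the operation worth annotating is an explicit decode, because Python 3 raises TypeError rather than UnicodeDecodeError when bytes and str are concatenated. The EXTENDED_HELP table, the helper names, and the example.invalid URL are hypothetical placeholders, in the same spirit as the non-existent URL in the proposal.

```python
# Hypothetical table: (operation, exception class, encoding) -> help URL.
EXTENDED_HELP = {
    ("decode", UnicodeDecodeError, "ascii"):
        "https://example.invalid/help/decode+UnicodeDecodeError+ascii",
}


def add_more_help(error, key):
    """Append a pointer to extra documentation to error.reason, if one is known."""
    url = EXTENDED_HELP.get(key)
    if url is not None:
        error.reason += (
            "\nFor additional information about this exception see:\n  " + url
        )


def decode_with_help(data, encoding):
    """Decode data; on failure, enrich the codec error and re-raise it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        add_more_help(exc, ("decode", type(exc), encoding))
        raise


try:
    decode_with_help(b"\xa5", "ascii")
except UnicodeDecodeError as exc:
    message = str(exc)  # str() of a codec error includes its reason attribute

print(message)
```

Appending to reason changes str(e), as in the original. On Python 3.11 and later, BaseException.add_note() is the standard way to attach extra lines; notes appear in the traceback but not in str(e).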

Roger Binns · Apr 10, 2006

Serge Orlov said:
It could look like this. Notice the special URL: it points to a
non-existent tutorial about why concatenating byte strings with
unicode strings can produce UnicodeDecodeError

An alternative is to give an error code that people can look up
(and users can report). It also has the advantage of being
language neutral. You can see this kind of approach on IBM systems
(mainframe, AIX, etc.) and Oracle.
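A rough sketch of the error-code alternative, in Python 3: map each kind of error to a stable, language-neutral code that users can quote in a report and look up in documentation. The codes PYU0001 and PYU0002 and the ERROR_CODES table are invented for illustration, loosely in the spirit of Oracle's ORA-nnnnn numbers.

```python
# Hypothetical, language-neutral codes; the message text could be translated
# while the code stays the same.
ERROR_CODES = {
    (UnicodeDecodeError, "ascii"): "PYU0001",
    (UnicodeEncodeError, "ascii"): "PYU0002",
}


def error_code(exc):
    """Return the hypothetical code for a codec error, or None if unknown."""
    return ERROR_CODES.get((type(exc), exc.encoding))


def format_with_code(exc):
    """Prefix the message with its code so users can quote it and look it up."""
    code = error_code(exc)
    return f"[{code}] {exc}" if code else str(exc)


try:
    b"\xdc".decode("ascii")
except UnicodeDecodeError as exc:
    report = format_with_code(exc)

print(report)  # [PYU0001] 'ascii' codec can't decode byte 0xdc in position 0: ordinal not in range(128)
```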

Roger

Similar Threads

Ascii to Unicode.	4	Jul 28, 2010
Unicode/ascii encoding nightmare	19	Nov 6, 2006
Ascii to Unicode.	16	Jul 28, 2010
Looking for UNICODE to ASCII Conversioni Example Code	15	Oct 18, 2013
trying to strip out non ascii.. or rather convert non ascii	38	Oct 26, 2013
convert Unicode filenames to good-looking ASCII	3	May 6, 2010
Right solution to unicode error?	21	Nov 7, 2012
Python 3.3, gettext and Unicode problems	0	Dec 30, 2012