what is this UnicodeDecodeError:....?

kath · Oct 10, 2006

I have a number of Excel files. In each file the DATE column has a
different name, and it sits in a different column in each file. I want
to read the date from each of those files.

To identify the date field in the different files, I have created a file
called _globals where I keep all aliases for DATE in an array called
'alias_DATE'.

The array alias_DATE looks like this:

alias_DATE=['TRADEDATE', 'Accounting Date', 'Date de VL','Datum',
'Kurs-datum', 'Date', 'Fecha Datos', 'Calculation Date', 'ClosingDate',
'Pricing Date', 'NAV Date', 'NAVDate', 'NAVDATE', 'ValuationDate',
'Datestamp', 'Fecha de Valoración', 'Kurs-','datum',
"""Kurs-\ndatum""", "Kurs-\ndatum"]

Now I want the index of the column that holds the date. I tried the
following code.

Traceback (most recent call last):
File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 17: ordinal not in range(128)

Though I have a matching value in the array, why am I getting this error?
Can anyone please tell me why this error occurs and how to get rid of
it? I have some files that contain more special characters.

Thank you in advance.
Sudhir.

Marc 'BlackJack' Rintsch · Oct 10, 2006

kath said:
Though I have a matching value in the array, why am I getting this error?

Because you are trying to compare a unicode string `val` with a byte
string in the list. The unicode string will be converted to a byte string
for this comparison with the default encoding: ASCII. But 'ó' is not
contained in ASCII.

kath said:
Can anyone please tell me why this error occurs and how to get rid of
it? I have some files that contain more special characters.

Either use a unicode string in the list too, or explicitly encode
the unicode string `val` with the appropriate encoding before using it to
search the list.
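
The two options can be sketched roughly as follows. This is only an illustration, written for Python 3 as an assumption (there the thread's unicode string corresponds to `str` and its byte string to `bytes`). The cp1252 encoding and the names `byte_aliases`, `text_aliases` and `val` are hypothetical, not taken from the original code.

```python
# Assumption: the aliases were saved as cp1252-encoded bytes.
ENCODING = "cp1252"

byte_aliases = [b"Datestamp", b"Fecha de Valoraci\xf3n", b"Datum"]
val = "Fecha de Valoraci\xf3n"  # text, as a spreadsheet reader would return it

# Option 1: turn the list entries into text, then search with the text value.
text_aliases = [alias.decode(ENCODING) for alias in byte_aliases]
index_via_text = text_aliases.index(val)

# Option 2: encode the text value explicitly, then search the byte list.
index_via_bytes = byte_aliases.index(val.encode(ENCODING))

print(index_via_text, index_via_bytes)
```

Either way, both sides of the comparison end up with the same type, so no implicit ASCII conversion is needed.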

Ciao,
Marc 'BlackJack' Rintsch

John Machin · Oct 10, 2006

kath said:
To identify the date field in the different files, I have created a file
called _globals where I keep all aliases for DATE in an array called
'alias_DATE'.

It's actually a list. In Python an array is something else; look at the
docs for the array module if you're interested.

The array alias_DATE looks like this:

alias_DATE=['TRADEDATE', 'Accounting Date', 'Date de VL','Datum',
'Kurs-datum', 'Date', 'Fecha Datos', 'Calculation Date', 'ClosingDate',
'Pricing Date', 'NAV Date', 'NAVDate', 'NAVDATE', 'ValuationDate',
'Datestamp', 'Fecha de Valoración', 'Kurs-','datum',
"""Kurs-\ndatum""", "Kurs-\ndatum"]

Nothing to do with the question you asked, but the last two entries
have the same value; is that intentional?
| >>> """Kurs-\ndatum""" == "Kurs-\ndatum"
| True

Now I want the index of the column that holds the date. I tried the
following code.

Traceback (most recent call last):
File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 17: ordinal not in range(128)

Though I have a matching value in the array, why am I getting this error?
Can anyone please tell me why this error occurs and how to get rid of
it? I have some files that contain more special characters.

Hello again, Sudhir.

The text string returned by xlrd is a unicode object (u'Fecha de
Valoraci\xf3n'). The text strings in your list are str objects, encoded
in some unspecified encoding. Python is trying to convert the str
object 'Fecha de Valoración' to Unicode, using the (default) ascii
codec to do the conversion, and failing.

One way to handle this is to specify any non-ASCII strings in your
lookup list as unicode, like this:

contents of sudhir.py:
| # -*- coding: cp1252 -*-
| alist = ['Datestamp', u'Fecha de Valoraci\xf3n', 'Kurs-','datum']
| blist = ['Datestamp', u'Fecha de Valoración', 'Kurs-','datum']
| assert alist == blist
| val = u'Fecha de Valoraci\xf3n'
| print 'a', alist.index(val)
| print 'b', blist.index(val)

| OS prompt>sudhir.py
| a 1
| b 1

Note: the encoding "cp1252" is appropriate to my environment, not
necessarily to yours.
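
The same idea can be extended to the column lookup itself. The following is a minimal sketch, assuming Python 3 and a header row that has already been read into a list. The helper names `to_text` and `find_date_column`, the sample `header_row`, and the cp1252 default are all hypothetical and not part of xlrd or of the original code.

```python
def to_text(value, encoding="cp1252"):
    """Return value as text, decoding bytes with an explicit encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def find_date_column(header_row, aliases, encoding="cp1252"):
    """Return the index of the first header cell matching an alias, or None."""
    # Normalise every alias to text once, so comparisons never mix types.
    known = {to_text(alias, encoding) for alias in aliases}
    for index, cell in enumerate(header_row):
        if to_text(cell, encoding) in known:
            return index
    return None


# Hypothetical data: one alias is still stored as cp1252 bytes.
alias_DATE = ["TRADEDATE", "Datum", b"Fecha de Valoraci\xf3n", "NAV Date"]
header_row = ["Fund", "ISIN", "Fecha de Valoraci\xf3n", "NAV"]

print(find_date_column(header_row, alias_DATE))
```

Decoding with a named encoding at the boundary keeps the rest of the code working on text only, which is the approach the list-of-unicode example above relies on.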

You may like to have a look through this:
http://www.amk.ca/python/howto/unicode

HTH,
John

John Machin · Oct 10, 2006

Marc said:
Because you are trying to compare a unicode string `val` with a byte
string in the list. The unicode string will be converted to a byte string
for this comparison with the default encoding: ASCII.

I presume you must live north of the equator. Down under, it seems to
happen the other way up -- the byte strings are decoded to unicode:

| >>> ['a', 'exotic1\xff', 'exotic2\xf3'].index(u'\xf3')
| Traceback (most recent call last):
| File "<stdin>", line 1, in ?
| UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
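
The decoding step can also be made explicit. Below is a small sketch, assuming Python 3 (where the implicit conversion no longer happens and the decode has to be requested). The helper name `first_undecodable` is hypothetical, and the second byte string assumes the alias was stored as cp1252 bytes.

```python
def first_undecodable(raw, encoding="ascii"):
    """Return (position, byte) of the first byte that fails to decode, else None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset that the error message reports as "position".
        return exc.start, raw[exc.start]
    return None


print(first_undecodable(b"exotic1\xff"))             # the list entry from the session above
print(first_undecodable(b"Fecha de Valoraci\xf3n"))  # the alias from the original question
print(first_undecodable(b"Datum"))                   # pure ASCII decodes without error
```

The reported position and byte come from the byte string in the list, not from the unicode value being searched for, which is consistent with the byte strings being the ones decoded.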

(-:

Steve Holden · Oct 11, 2006

John said:
(-:

I see you also use little-endian smileys in the antipodes.

regards
Steve

Marc 'BlackJack' Rintsch · Oct 11, 2006

John Machin said:
I presume you must live north of the equator. Down under, it seems to
happen the other way up -- the byte strings are decoded to unicode:

(-: Ooops, I stand corrected.

Ciao,
Marc 'BlackJack' Rintsch

John Machin · Oct 11, 2006

Steve said:
John said:

[stuff]
(-:

I see you also use little-endian smileys in the antipodes.

I was using it in a bracketing manner similar to the Spanish ¿ and ¡,
except at the other end of the bracketed text. This admittedly
confusing usage of course overloads the normal

While that sort of
caper should be a doddle for a document-level parser, it could present
a problem to parsers with limited buffers (like humans), so it looks
like I should reverse the order.

I wonder what Unicode.org would think of a proposal for 4 new
characters: open/close smiley/grumpy bracket. No weirder than some of
the characters on the roster.

Cheers,
John

kath · Oct 11, 2006

Hi, thanks for your brave reply. The link you gave was a good one.
It had comprehensive information and I enjoyed reading it. It cleared up
my doubts about encoding data, what Unicode data is, and how to deal
with it.

Thank you very much..

Regards,
sudhir.
