What is this UnicodeDecodeError?

Discussion in 'Python' started by kath, Oct 10, 2006.

  1. kath

    I have a number of Excel files. In each file, DATE is represented by a
    different name, and I want to read the date from each of those files.
    The date is also in a different column in each file.

    To identify the date field in the different files, I have created a
    file called _globals where I keep all aliases for DATE in an array
    called 'alias_DATE'.

    The array alias_DATE looks like this:

    alias_DATE=['TRADEDATE', 'Accounting Date', 'Date de VL','Datum',
    'Kurs-datum', 'Date', 'Fecha Datos', 'Calculation Date', 'ClosingDate',
    'Pricing Date', 'NAV Date', 'NAVDate', 'NAVDATE', 'ValuationDate',
    'Datestamp', 'Fecha de Valoración', 'Kurs-','datum',
    """Kurs-\ndatum""", "Kurs-\ndatum"]

    Now I want the index of the column that contains the date. I used the
    following code:


    >>> b=xlrd.open_workbook('Santander_051206.xls')
    >>> sh=b.sheet_by_index(0)
    >>> sh.cell_value(rowx=0, colx=11)
    u'Fecha de Valoraci\xf3n'
    >>> val=sh.cell_value(rowx=0, colx=11)
    >>> val
    u'Fecha de Valoraci\xf3n'
    >>> print val
    Fecha de Valoración
    >>> import _globals # the file where I have stored my 'alias_DATE' array
    >>> _globals.alias_DATE.index(val)
    Traceback (most recent call last):
      File "<interactive input>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 17: ordinal not in range(128)
    >>>


    Even though the array contains a matching value, I am getting this error.
    Can anyone please tell me why this error occurs and how to get rid of
    it? I ask because some of my files contain even more special
    characters.


    Thank you in advance.
    Sudhir.
     
    kath, Oct 10, 2006
    #1

  2. In <>, kath wrote:

    > To identify the date field in the different files, I have created a
    > file called _globals where I keep all aliases for DATE in an array
    > called 'alias_DATE'.
    >
    > Array alias_DATE looks like,
    >
    > alias_DATE=['TRADEDATE', 'Accounting Date', 'Date de VL','Datum',
    > 'Kurs-datum', 'Date', 'Fecha Datos', 'Calculation Date', 'ClosingDate',
    > 'Pricing Date', 'NAV Date', 'NAVDate', 'NAVDATE', 'ValuationDate',
    > 'Datestamp', 'Fecha de Valoración', 'Kurs-','datum',
    > """Kurs-\ndatum""", "Kurs-\ndatum"]
    >
    > Now I want the index of the column that contains the date. I used the
    > following code:
    >
    >
    >>>> b=xlrd.open_workbook('Santander_051206.xls')
    >>>> sh=b.sheet_by_index(0)
    >>>> sh.cell_value(rowx=0, colx=11)
    > u'Fecha de Valoraci\xf3n'
    >>>> val=sh.cell_value(rowx=0, colx=11)
    >>>> val
    > u'Fecha de Valoraci\xf3n'
    >>>> print val
    > Fecha de Valoración
    >>>> import _globals # the file where I have stored my 'alias_DATE' array
    >>>> _globals.alias_DATE.index(val)
    > Traceback (most recent call last):
    >   File "<interactive input>", line 1, in ?
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 17: ordinal not in range(128)
    >>>>
    >
    > Even though the array contains a matching value, I am getting this error.


    Because you are trying to compare a unicode string `val` with a byte
    string in the list. The unicode string will be converted to a byte string
    for this comparison with the default encoding: ASCII. But 'ó' is not
    contained in ASCII.

    > Can anyone please tell me why this error occurs and how to get rid of
    > it? I ask because some of my files contain even more special
    > characters.


    Either use a unicode string in the list too, or explicitly encode the
    unicode string `val` with the appropriate encoding before using it to
    search the list.
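    A minimal Python 3 sketch of both options follows. Python 3 has no implicit bytes/text conversion, so `bytes` stands in for Python 2's byte `str`, and cp1252 is only an assumed encoding for the saved list (substitute the real one):

```python
# `bytes` plays the role of Python 2's byte strings; `str` plays the role
# of the unicode object xlrd returns. cp1252 is an ASSUMED file encoding.
ENCODING = "cp1252"

alias_bytes = [b"Datestamp", "Fecha de Valoración".encode(ENCODING), b"Kurs-", b"datum"]
val = "Fecha de Valoraci\xf3n"  # the text value read from the sheet

# Option 1: decode the list entries to text, then search with the text value.
alias_text = [a.decode(ENCODING) for a in alias_bytes]
index_1 = alias_text.index(val)

# Option 2: encode the text value with the same codec, then search the bytes.
index_2 = alias_bytes.index(val.encode(ENCODING))

print(index_1, index_2)  # 1 1
```

    Decoding the list (option 1) keeps every comparison in text; encoding `val` (option 2) raises `UnicodeEncodeError` if the cell contains a character the chosen codec cannot represent.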

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Oct 10, 2006
    #2

  3. John Machin

    kath wrote:
    > I have a number of Excel files. In each file, DATE is represented by a
    > different name, and I want to read the date from each of those files.
    > The date is also in a different column in each file.
    >
    > To identify the date field in the different files, I have created a
    > file called _globals where I keep all aliases for DATE in an array
    > called 'alias_DATE'.


    It's actually a list. In Python an array is something else; look at the
    docs for the array module if you're interested.

    >
    > The array alias_DATE looks like this:
    >
    > alias_DATE=['TRADEDATE', 'Accounting Date', 'Date de VL','Datum',
    > 'Kurs-datum', 'Date', 'Fecha Datos', 'Calculation Date', 'ClosingDate',
    > 'Pricing Date', 'NAV Date', 'NAVDate', 'NAVDATE', 'ValuationDate',
    > 'Datestamp', 'Fecha de Valoración', 'Kurs-','datum',
    > """Kurs-\ndatum""", "Kurs-\ndatum"]


    Nothing to do with the question you asked, but the last two entries
    have the same value; is that intentional?
    | >>> """Kurs-\ndatum""" == "Kurs-\ndatum"
    | True
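    If the duplicate is not intended, here is a short Python 3 sketch (using an excerpt of the list) that drops repeats while keeping the original order; it relies on `dict` preserving insertion order, which is guaranteed from Python 3.7:

```python
# Excerpt of the alias list, including the two equal entries.
alias_DATE = ['Kurs-datum', 'Kurs-', 'datum', """Kurs-\ndatum""", "Kurs-\ndatum"]

# Triple quotes change only how the literal is written, not the string value.
assert """Kurs-\ndatum""" == "Kurs-\ndatum"

# dict keys are unique and keep first-insertion order (Python 3.7+).
unique_aliases = list(dict.fromkeys(alias_DATE))
print(unique_aliases)
```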


    >
    > Now I want the index of the column that contains the date. I used the
    > following code:
    >
    > >>> b=xlrd.open_workbook('Santander_051206.xls')
    > >>> sh=b.sheet_by_index(0)
    > >>> sh.cell_value(rowx=0, colx=11)
    > u'Fecha de Valoraci\xf3n'
    > >>> val=sh.cell_value(rowx=0, colx=11)
    > >>> val
    > u'Fecha de Valoraci\xf3n'
    > >>> print val
    > Fecha de Valoración
    > >>> import _globals # the file where I have stored my 'alias_DATE' array
    > >>> _globals.alias_DATE.index(val)
    > Traceback (most recent call last):
    >   File "<interactive input>", line 1, in ?
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 17: ordinal not in range(128)
    > >>>
    >
    > Even though the array contains a matching value, I am getting this error.
    > Can anyone please tell me why this error occurs and how to get rid of
    > it? I ask because some of my files contain even more special
    > characters.
    >


    Hello again, Sudhir.

    The text string returned by xlrd is a unicode object (u'Fecha de
    Valoraci\xf3n'). The text strings in your list are str objects, encoded
    in some unspecified encoding. Python is trying to convert the str
    object 'Fecha de Valoración' to Unicode, using the (default) ascii
    codec to do the conversion, and failing.
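    The failing step can be reproduced on its own. In this Python 3 sketch the byte string is decoded explicitly with the ASCII codec, standing in for the implicit conversion Python 2 performs; cp1252 is again only an assumed source encoding:

```python
raw = b"Fecha de Valoraci\xf3n"  # the str from the list, viewed as raw bytes

error_message = None
try:
    raw.decode("ascii")  # what Python 2 does implicitly during the comparison
except UnicodeDecodeError as exc:
    # 0xf3 is the byte for 'ó' at offset 17; ASCII only covers 0-127.
    error_message = str(exc)
print(error_message)

text = raw.decode("cp1252")  # decode with the encoding actually used
print(text == "Fecha de Valoraci\xf3n")  # True
```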

    One way to handle this is to specify any non-ASCII strings in your
    lookup list as unicode, like this:

    contents of sudhir.py:
    | # -*- coding: cp1252 -*-
    | alist = ['Datestamp', u'Fecha de Valoraci\xf3n', 'Kurs-','datum']
    | blist = ['Datestamp', u'Fecha de Valoración', 'Kurs-','datum']
    | assert alist == blist
    | val = u'Fecha de Valoraci\xf3n'
    | print 'a', alist.index(val)
    | print 'b', blist.index(val)

    | OS prompt>sudhir.py
    | a 1
    | b 1

    Note: the encoding "cp1252" is appropriate to my environment, not
    necessarily to yours.
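    For comparison, a Python 3 sketch of the original goal, finding the column index. In Python 3 every `str` literal is Unicode and source files default to UTF-8, so the mixed-type comparison cannot arise. The helper `find_date_column` is hypothetical, the header rows are made-up sample data, and the `unicodedata.normalize('NFC', ...)` step is an added assumption that lets a precomposed 'ó' match 'o' followed by a combining accent. With xlrd, the header row would typically come from `sh.row_values(0)`.

```python
import unicodedata

# Excerpt of the alias list; in Python 3 these literals are already Unicode.
alias_DATE = ['TRADEDATE', 'Accounting Date', 'Date de VL', 'Datum',
              'Fecha de Valoración', 'NAV Date', 'Kurs-\ndatum']


def _nfc(text):
    # Compose characters so 'ó' (U+00F3) equals 'o' + U+0301.
    return unicodedata.normalize('NFC', text)


def find_date_column(headers, aliases):
    """Return the index of the first header that matches an alias, else None."""
    wanted = {_nfc(alias) for alias in aliases}
    for colx, header in enumerate(headers):
        # Header cells may also hold numbers, so only compare text cells.
        if isinstance(header, str) and _nfc(header) in wanted:
            return colx
    return None


# Hypothetical header rows, e.g. what sh.row_values(0) might return.
headers = ['Fund', 'ISIN', 42.0, 'Fecha de Valoraci\xf3n', 'Price']
decomposed = ['Fund', 'Fecha de Valoracio\u0301n']

print(find_date_column(headers, alias_DATE))     # 3
print(find_date_column(decomposed, alias_DATE))  # 1
```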

    You may like to have a look through this:
    http://www.amk.ca/python/howto/unicode

    HTH,
    John
     
    John Machin, Oct 10, 2006
    #3
  4. John Machin

    Marc 'BlackJack' Rintsch wrote:

    > Because you are trying to compare a unicode string `val` with a byte
    > string in the list. The unicode string will be converted to a byte string
    > for this comparison with the default encoding: ASCII.


    :)

    I presume you must live north of the equator. Down under, it seems to
    happen the other way up -- the byte strings are decoded to unicode:

    | >>> ['a', 'exotic1\xff', 'exotic2\xf3'].index(u'\xf3')
    | Traceback (most recent call last):
    |   File "<stdin>", line 1, in ?
    | UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
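    The exception class shows which way the conversion went: `UnicodeDecodeError` means bytes were being decoded to text, whereas text failing to encode would raise `UnicodeEncodeError`. A small Python 3 sketch of both directions, using explicit calls because Python 3 does not convert implicitly (there, the list lookup simply fails with `ValueError`, since bytes never compare equal to str):

```python
def error_name(action):
    """Run `action` and return the name of any Unicode error it raises."""
    try:
        action()
    except UnicodeError as exc:
        return type(exc).__name__
    return None


# bytes -> text: what Python 2 did to the byte strings in the list
decode_err = error_name(lambda: b"exotic1\xff".decode("ascii"))
# text -> bytes: the direction described in the earlier reply
encode_err = error_name(lambda: "\xf3".encode("ascii"))

# Python 3: bytes and str never compare equal, so the lookup just misses.
lookup_err = None
try:
    [b"a", b"exotic1\xff", b"exotic2\xf3"].index("\xf3")
except ValueError:
    lookup_err = "ValueError"

print(decode_err, encode_err, lookup_err)
```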

    (-:
     
    John Machin, Oct 10, 2006
    #4
  5. Steve Holden

    John Machin wrote:
    > Marc 'BlackJack' Rintsch wrote:
    >
    >
    >>Because you are trying to compare a unicode string `val` with a byte
    >>string in the list. The unicode string will be converted to a byte string
    >>for this comparison with the default encoding: ASCII.
    >
    > :)
    >
    > I presume you must live north of the equator. Down under, it seems to
    > happen the other way up -- the byte strings are decoded to unicode:
    >
    > | >>> ['a', 'exotic1\xff', 'exotic2\xf3'].index(u'\xf3')
    > | Traceback (most recent call last):
    > |   File "<stdin>", line 1, in ?
    > | UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
    >
    > (-:
    >

    I see you also use little-endian smileys in the antipodes.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
     
    Steve Holden, Oct 11, 2006
    #5
  6. In <>, John Machin
    wrote:

    > Marc 'BlackJack' Rintsch wrote:
    >
    >> Because you are trying to compare a unicode string `val` with a byte
    >> string in the list. The unicode string will be converted to a byte string
    >> for this comparison with the default encoding: ASCII.

    >
    > :)
    >
    > I presume you must live north of the equator. Down under, it seems to
    > happen the other way up -- the byte strings are decoded to unicode:


    (-: Oops, I stand corrected. :)

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Oct 11, 2006
    #6
  7. John Machin

    Steve Holden wrote:
    > John Machin wrote:
    > > :)

    [stuff]
    > > (-:
    > >

    > I see you also use little-endian smileys in the antipodes.
    >


    I was using it in a bracketing manner similar to the Spanish ¿ and ¡,
    except at the other end of the bracketed text. This admittedly
    confusing usage of course overloads the normal :). While that sort of
    caper should be a doddle for a document-level parser, it could present
    a problem to parsers with limited buffers (like humans), so it looks
    like I should reverse the order.

    I wonder what Unicode.org would think of a proposal for 4 new
    characters: open/close smiley/grumpy bracket. No weirder than some of
    the characters on the roster.

    Cheers,
    John
     
    John Machin, Oct 11, 2006
    #7
  8. kath

    John Machin wrote:
    > You may like to have a look through this:
    > http://www.amk.ca/python/howto/unicode
    >
    > HTH,
    > John



    Hi, thanks for your great reply. The link you gave was a good one; it
    had comprehensive information, and I enjoyed reading it. It cleared up
    my doubts about encoding data, what Unicode data is, and how to deal
    with it.

    Thank you very much.

    Regards,
    sudhir.
     
    kath, Oct 11, 2006
    #8

Similar Threads
  1. Ruslan
    Replies: 1, Views: 504
    "Martin v. Löwis", Sep 7, 2004
  2. Robin Siebler
    Replies: 4, Views: 26,345
    Tim Peters, Oct 8, 2004
  3. Thomas Thomas
    UnicodeDecodeError
    Thomas Thomas, May 5, 2005, in forum: Python
    Replies: 2, Views: 313
    Michael Spencer, May 5, 2005
  4. F. GEIGER
    Replies: 0, Views: 1,581
    F. GEIGER, May 27, 2005
  5. ash
    UnicodeDecodeError
    ash, Nov 30, 2005, in forum: Python
    Replies: 5, Views: 477