unicode compare errors

Discussion in 'Python' started by Ross, Dec 10, 2010.

  1. Ross

    Ross Guest

    I've a character encoding issue that has stumped me (not that hard to
    do). I am parsing a small text file with some possibility of various
    currencies being involved, and want to handle them without messing up.

    Initially I was simply doing:

    currs = [u'$', u'£', u'€', u'¥']
    aFile = open(thisFile, 'r')
    for mline in aFile:  # mline might be "£5.50"
        item = mline.strip()
        if item and item[0] in currs:
            item = item[1:]

    But the problem was:
    SyntaxError: Non-ASCII character '\xa3' in file

    The remedy was of course to declare the file encoding for my Python
    module, at the start of the file I used:

    # -*- coding: UTF-8 -*-
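
    A possible alternative, sketched here as an assumption rather than
    what the poster did: spell the symbols with Unicode escapes so the
    module stays pure ASCII and needs no coding declaration. The name
    currs is from the post; strip_currency is a hypothetical helper.

```python
# Same four symbols as in the post, written as escapes so the source is ASCII-only.
currs = [u'$', u'\u00a3', u'\u20ac', u'\u00a5']  # dollar, pound, euro, yen

def strip_currency(text):
    """Hypothetical helper: drop one leading currency symbol, if present."""
    if text and text[0] in currs:
        return text[1:]
    return text

print(strip_currency(u'\u00a35.50'))
```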

    That allowed me to progress. But now, when I reach a line item in a
    non-$ currency, I get this error:

    views.py:3364: UnicodeWarning: Unicode equal comparison failed to
    convert both arguments to Unicode - interpreting them as being
    unequal.

    …which I think means Python is unable to convert the characters in
    the file I'm reading into unicode in order to compare them with the
    items in the list currs.

    I think this is saying that u'£' == '£' is false.
    (I hope those chars show up okay in my post here)
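
    A minimal sketch of what the warning is about, assuming the input
    line happens to be UTF-8 bytes (the thread never establishes this).
    In Python 2, open(..., 'r') yields byte strings, so the pound sign
    arrives as two bytes and cannot match the one-character unicode
    string until it is decoded. The variable names are made up for
    illustration.

```python
pound = u'\u00a3'                  # the unicode character, as in currs
raw = pound.encode('utf-8')        # what a UTF-8 file actually stores

print(repr(raw))
print('%d byte(s) vs %d character(s)' % (len(raw), len(pound)))

# Bytes and text do not compare equal (Python 2 also emits the UnicodeWarning).
bytes_equal_text = (raw == pound)

# After decoding with the right encoding the comparison succeeds.
decoded_equal_text = (raw.decode('utf-8') == pound)

print(bytes_equal_text)
print(decoded_equal_text)
```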

    Since I can't control the encoding of the input file that users
    submit, how do I get past this? How do I make such comparisons
    evaluate to True?

    Thanks in advance for any suggestions
    Ross.
    Ross, Dec 10, 2010
    #1

  2. Ross

    Ross Guest

    On Dec 10, 2:51 pm, Ross <> wrote:

    > Initially I was simply doing:
    >
    >   currs = [u'$', u'£', u'€', u'¥']
    >   aFile = open(thisFile, 'r')
    >   for mline in aFile:              # mline might be "£5.50"
    >      if item[0] in currs:
    >           item = item[1:]
    >


    Don't you love it when someone solves their own problem? Posting a
    reply here so that other poor chumps like me can get around this...

    I found I could import codecs that allow me to read the file with my
    desired encoding. Huzzah!

    Instead of opening the file with a standard
    aFile = open(thisFile, 'r')

    I instead ensure I've imported the codecs:

    import codecs

    .... and then I used a specific encoding on the file read:

    aFile = codecs.open(thisFile, encoding='utf-8')

    Then all my compares seem to work fine.
    If I'm off-base and kludgey here and should be doing something
    differently please give me a poke.
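
    Roughly what the fixed loop could look like, as a sketch rather than
    the poster's actual code. It assumes the file really is UTF-8, and
    it uses io.open, which takes the same encoding argument as
    codecs.open and is available in Python 2.6+ and 3. The sample file
    and the helper name strip_currencies are invented for the
    demonstration.

```python
import io
import os
import tempfile

currs = [u'$', u'\u00a3', u'\u20ac', u'\u00a5']

def strip_currencies(path, encoding='utf-8'):
    """Return the file's non-empty lines with one leading currency symbol removed."""
    amounts = []
    with io.open(path, encoding=encoding) as a_file:  # lines come back as unicode
        for line in a_file:
            item = line.strip()
            if item and item[0] in currs:
                item = item[1:]
            if item:
                amounts.append(item)
    return amounts

# Build a small UTF-8 sample file to read back.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with io.open(fd, 'w', encoding='utf-8') as sample:
    sample.write(u'\u00a35.50\n$3.00\n\u20ac7.25\n')

result = strip_currencies(sample_path)
os.remove(sample_path)
print(result)
```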

    Regards,
    Ross.
    Ross, Dec 10, 2010
    #2

  3. Nobody

    Nobody Guest

    On Fri, 10 Dec 2010 11:51:44 -0800, Ross wrote:

    > Since I can't control the encoding of the input file that users
    > submit, how to I get past this? How do I make such comparisons be
    > True?


    On Fri, 10 Dec 2010 12:07:19 -0800, Ross wrote:

    > I found I could import codecs that allow me to read the file with my
    > desired encoding. Huzzah!


    > If I'm off-base and kludgey here and should be doing something


    Er, do you know the file's encoding or don't you? Using:

    aFile = codecs.open(thisFile, encoding='utf-8')

    is telling Python that the file /is/ in utf-8. If it isn't in utf-8,
    you'll get decoding errors.
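
    A small illustration of that failure mode, assuming a user saved the
    file as Latin-1 (a made-up scenario): the single byte used for the
    pound sign there is not valid UTF-8, so decoding raises.

```python
latin1_line = u'\u00a35.50'.encode('latin-1')   # the pound sign becomes one byte, 0xA3

try:
    latin1_line.decode('utf-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print('cannot decode as UTF-8: %s' % exc.reason)

print(failed)
```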

    If you are given a file with no known encoding, then you can't reliably
    determine what /characters/ it contains, and thus can't reliably compare
    the contents of the file against strings of characters, only against
    strings of bytes.
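
    To make the point concrete, here is a sketch showing that one byte
    maps to different characters depending on the encoding you assume.
    The two encodings are picked only as examples.

```python
raw = b'\xa3'   # a single byte read from a file of unknown encoding

as_latin1 = raw.decode('latin-1')   # pound sign
as_cp1251 = raw.decode('cp1251')    # a Cyrillic letter, not a currency symbol

print(repr(as_latin1))
print(repr(as_cp1251))
print(as_latin1 == as_cp1251)

# A byte-level comparison is always well defined, whatever the bytes "mean".
same_bytes = (raw == b'\xa3')
print(same_bytes)
```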

    About the best you can do is to use an autodetection library such as:

    http://chardet.feedparser.org/
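
    A hedged sketch of how autodetection might be wired in. It assumes
    the third-party chardet package, whose detect() function returns a
    dict with 'encoding' and 'confidence' keys; the guess is statistical
    and can be wrong, especially on short inputs. The function name
    decode_guess, the order of the fallbacks and the 0.5 threshold are
    choices made for this example, not something from the thread.

```python
try:
    import chardet            # third-party; optional in this sketch
except ImportError:
    chardet = None

def decode_guess(raw):
    """Decode bytes of unknown encoding: strict UTF-8, then chardet, then Latin-1."""
    try:
        return raw.decode('utf-8')      # invalid UTF-8 raises rather than mis-decoding
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        guess = chardet.detect(raw)     # dict with 'encoding' and 'confidence'
        if guess['encoding'] and guess['confidence'] > 0.5:
            try:
                return raw.decode(guess['encoding'])
            except (UnicodeDecodeError, LookupError):
                pass
    return raw.decode('latin-1')        # never fails, but may pick wrong characters

print(repr(decode_guess(u'\u00a35.50'.encode('utf-8'))))
```

    Trying strict UTF-8 first is a deliberate choice here: text in other
    8-bit encodings rarely happens to be valid UTF-8, so a successful
    decode is a fairly strong signal.
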
    Nobody, Dec 10, 2010
    #3
  4. Ross

    Ross Guest

    On Dec 10, 4:09 pm, Nobody <> wrote:
    > On Fri, 10 Dec 2010 11:51:44 -0800, Ross wrote:
    > > Since I can't control the encoding of the input file that users
    > > submit, how to I get past this?  How do I make such comparisons be
    > > True?

    > On Fri, 10 Dec 2010 12:07:19 -0800, Ross wrote:
    > > I found I could import codecs that allow me to read the file with my
    > > desired encoding. Huzzah!
    > > If I'm off-base and kludgey here and should be doing something

    >
    > Er, do you know the file's encoding or don't you? Using:
    >
    >     aFile = codecs.open(thisFile, encoding='utf-8')
    >
    > is telling Python that the file /is/ in utf-8. If it isn't in utf-8,
    > you'll get decoding errors.
    >
    > If you are given a file with no known encoding, then you can't reliably
    > determine what /characters/ it contains, and thus can't reliably compare
    > the contents of the file against strings of characters, only against
    > strings of bytes.
    >
    > About the best you can do is to use an autodetection library such as:
    >
    >        http://chardet.feedparser.org/


    That's right I don't know what encoding the user will have used. The
    use of autodetection sounds good - I'll look into that. Thx.

    R.
    Ross, Dec 13, 2010
    #4