unicode compare errors

Ross · Dec 10, 2010

I've a character encoding issue that has stumped me (not that hard to
do). I am parsing a small text file with some possibility of various
currencies being involved, and want to handle them without messing up.

Initially I was simply doing:

currs = [u'$', u'£', u'€', u'¥']
aFile = open(thisFile, 'r')
for mline in aFile:  # mline might be "£5.50"
    item = mline.strip()
    if item and item[0] in currs:
        item = item[1:]

But the problem was:
SyntaxError: Non-ASCII character '\xa3' in file

The remedy was of course to declare the file encoding for my Python
module, at the start of the file I used:

# -*- coding: UTF-8 -*-

That allowed me to progress. But now when I come to a line item in a
non-$ currency, I get this warning:

views.py:3364: UnicodeWarning: Unicode equal comparison failed to
convert both arguments to Unicode - interpreting them as being
unequal.

…which I think means Python is unable to convert the characters in the
file I'm reading into unicode so it can compare them with the items in
the list currs.

I think this is saying that u'£' == '£' is false.
(I hope those chars show up okay in my post here)
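
A minimal sketch of why that comparison fails, assuming the line read from the file is still a string of undecoded bytes (which is what a plain open(thisFile, 'r') gives in Python 2). The byte values are the standard UTF-8 and Latin-1 encodings of the pound sign; the variable names are illustrative only.

```python
# The same character is stored as different bytes in different encodings.
pound = u'\u00a3'                       # the pound sign as a unicode string

utf8_bytes = pound.encode('utf-8')      # two bytes: 0xC2 0xA3
latin1_bytes = pound.encode('latin-1')  # one byte: 0xA3

# Bytes only become comparable text once decoded with the right encoding.
from_utf8 = utf8_bytes.decode('utf-8')
from_latin1 = latin1_bytes.decode('latin-1')

print(from_utf8 == pound)    # True
print(from_latin1 == pound)  # True
```

Without the decode step, Python 2 has to guess (it tries ASCII), fails on the non-ASCII byte, and reports the two values as unequal.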

Since I can't control the encoding of the input file that users
submit, how do I get past this? How do I make such comparisons come
out True?

Thanks in advance for any suggestions
Ross.

Ross · Dec 10, 2010

Don't you love it when someone solves their own problem? Posting a
reply here so that other poor chumps like me can get around this...

I found I could import the codecs module, which lets me read the file
with my desired encoding. Huzzah!

Instead of opening the file with a standard
aFile = open(thisFile, 'r')

I make sure I've imported the codecs module:

import codecs

... and then I pass a specific encoding when opening the file:

aFile = codecs.open(thisFile, encoding='utf-8')

Then all my compares seem to work fine.
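
For anyone wanting the whole thing in one piece, here is a rough sketch of the approach, assuming the input really is UTF-8. It uses io.open, which takes an encoding argument in the same way as codecs.open. The helper names strip_currency and read_amounts and the demo data are made up for illustration.

```python
import io
import os
import tempfile

# Dollar, pound, euro, yen (written as escapes to keep this file ASCII).
CURRENCY_SYMBOLS = (u'$', u'\u00a3', u'\u20ac', u'\u00a5')


def strip_currency(text):
    """Return text without a leading currency symbol, if there is one."""
    if text and text[0] in CURRENCY_SYMBOLS:
        return text[1:]
    return text


def read_amounts(path, encoding='utf-8'):
    """Decode each line with the given encoding, then strip the symbol."""
    with io.open(path, 'r', encoding=encoding) as handle:
        return [strip_currency(line.strip()) for line in handle]


# Demo with a throwaway UTF-8 file containing hypothetical data.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(demo_path, 'w', encoding='utf-8') as out:
    out.write(u'\u00a35.50\n$3.00\n\u20ac12.00\n7.25\n')

amounts = read_amounts(demo_path)
os.remove(demo_path)
```

Because every line is decoded on the way in, both sides of the "in" test are unicode strings and no warning is raised.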
If I'm off-base and kludgey here and should be doing something
differently please give me a poke.

Regards,
Ross.

Nobody · Dec 10, 2010

Er, do you know the file's encoding or don't you? Using:

aFile = codecs.open(thisFile, encoding='utf-8')

is telling Python that the file /is/ in utf-8. If it isn't in utf-8,
you'll get decoding errors.
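
A small sketch of that failure, assuming a line that was actually saved as Latin-1 but is decoded as UTF-8. The sample values are hypothetical.

```python
# A pound-sign line as it would be stored in a Latin-1 file.
latin1_line = u'\u00a35.50'.encode('latin-1')

try:
    latin1_line.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    # The byte 0xA3 cannot start a UTF-8 sequence, so decoding is rejected.
    decoded_ok = False

# Pure-ASCII content decodes identically under both encodings, so a file
# containing only '$' amounts would not reveal the mismatch.
ascii_ok = b'$5.50'.decode('utf-8') == b'$5.50'.decode('latin-1')
```

Here decoded_ok ends up False and ascii_ok ends up True, which matches the observation that only the non-$ lines caused trouble.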

If you are given a file with no known encoding, then you can't reliably
determine what /characters/ it contains, and thus can't reliably compare
the contents of the file against strings of characters, only against
strings of bytes.
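
As a sketch of what comparing against strings of bytes could look like: the idea is to encode each currency symbol under a few candidate encodings and match the raw line against those byte prefixes. The candidate list and the helper names are assumptions for illustration, and byte patterns can be ambiguous across encodings, so this is only a heuristic.

```python
# Dollar, pound, euro, yen.
SYMBOLS = (u'$', u'\u00a3', u'\u20ac', u'\u00a5')
CANDIDATE_ENCODINGS = ('utf-8', 'cp1252')  # assumption: likely encodings


def byte_prefixes(symbols, encodings):
    """Collect every byte form of every symbol under the given encodings."""
    prefixes = set()
    for symbol in symbols:
        for encoding in encodings:
            try:
                prefixes.add(symbol.encode(encoding))
            except UnicodeEncodeError:
                pass  # symbol does not exist in this encoding
    return prefixes


PREFIXES = byte_prefixes(SYMBOLS, CANDIDATE_ENCODINGS)


def strip_currency_bytes(raw_line):
    """Remove a leading currency symbol from an undecoded line of bytes."""
    # Try longer prefixes first so a multi-byte symbol is removed whole.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if raw_line.startswith(prefix):
            return raw_line[len(prefix):]
    return raw_line
```

This avoids decoding altogether, at the cost of only recognising the symbols and encodings listed up front.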

About the best you can do is to use an autodetection library such as:

http://chardet.feedparser.org/
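
If a third-party library is not an option, a cruder stand-in is trial decoding: try a short list of encodings in order and keep the first one that decodes cleanly. This is not what an autodetection library does (it has no statistics behind it), and the candidate list, its order, and the function name below are assumptions for illustration.

```python
# Assumed candidates, strictest first. latin-1 accepts any byte sequence,
# so it must come last and acts as a catch-all.
FALLBACK_ENCODINGS = ('utf-8', 'cp1252', 'latin-1')


def decode_with_fallback(raw, encodings=FALLBACK_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if the caller passes a list without a catch-all.
    raise ValueError('none of the candidate encodings fit')


# Hypothetical input: a euro amount saved by a Windows editor as cp1252.
text, used = decode_with_fallback(u'\u20ac12.00'.encode('cp1252'))
```

A clean decode does not prove the guess is right, only that the bytes are legal in that encoding, so the result is still worth sanity-checking.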

Ross · Dec 13, 2010

That's right, I don't know what encoding the user will have used.
Autodetection sounds good - I'll look into that. Thx.

R.
