UnicodeEncodeError in Windows

geoff_ness · Sep 17, 2007

Hello - and apologies in advance for the length of this post.

I am having a hard time understanding the errors being generated by a
program I've written. The code is intended to parse text files which
are copied and pasted from web pages from an online game. The encoding
of the pages is ISO-8859-1, but the text that gets copied contains
characters from character sets other than latin-1.
For instance, one of the lines I need to be able to read is:
196679 Daimyo çŸ³ Druid 145 27 12/09/07 21:40:04 [ Expel ]

I start with the file 'citizen_list' and use this function to read it
and return a list of names (for instance, Daimyo çŸ³ Druid) and ID
numbers:

# builds the list of names from the citizens list
def getNames(f):
    """Builds a list from the town list of names

    Returns a list"""
    newlist = []
    for line in f:
        # strip the trailing "[ Expel ]", timestamp and numeric columns,
        # leaving the ID followed by the words of the name
        namewords = line.rstrip('[Expel]\n\t ')\
            .rstrip(':/0123456789 ').rstrip('\t ').rstrip('0123456789 ')\
            .rstrip('\t ').rstrip('0123456789 ').rstrip('\t ').split()
        entry = ";".join([namewords[0], " ".join(namewords[1:])])
        newlist.append(entry)
    return newlist

import codecs

citizens = codecs.open('citizen_list', 'r', 'utf-8', 'strict')
listNames = getNames(citizens)
citizens.close()

I've specified 'utf-8' as the encoding as this seemed to be the best
candidate for picking up all the names in the list. I use the names in
other functions - for example:

def getdamage(warrior, rpt):
    """reads each line of war report

    returns damage and number of kills for citizen name"""
    for line in rpt:
        if (line.startswith(warrior.name) or
                line.startswith('A blue aura surrounds ' + warrior.name)) \
                and line.find('weapon') > 0:
            warrior.addDamage(
                int(line[line.find('caused ') + 7:line.find(' damage')]))
            if rpt.next().find('is dead') > 0:
                warrior.addKill()
        elif line.startswith(warrior.name + ' is dead'):
            warrior.dies()
            break
        elif line.startswith('Starting round'):
            warrior.addRound()

for cit in listNames:
    c = Warrior(cit.split(';')[0], cit.split(';')[1])
    totalnum += 1
    report = codecs.open('war_report', 'r', 'utf-8', 'strict')
    getdamage(c, report)
    report.close()
--[snip]--

def buildString(warrior):
    """Build a string from a warrior's stats

    Returns string for output to warStat."""
    return "!tr!!td!!id!"+str(warrior.ID)+"!/id!!/td!"+\
        "!td!"+str(warrior.damage)+"!/td!!td!"+str(warrior.kills)+\
        "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"

This code runs fine on my Linux machine, but when I sent the code to a
friend running Python on Windows, he got the following error:

Traceback (most recent call last):
  File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\parser_1.0.py", line 63, in <module>
    "".join(["%s" % buildString(c) for c in citlistS[:100]])+"!/table!")
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\iotp_alt2.py", line 169, in buildString
    "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

As I understand it, the error means the ascii codec is unable to
encode the unicode character u'\ufeff'.
What puzzles me is that this error doesn't show up for me, even though
ascii is my default encoding too. Any thoughts or assistance would be
welcomed.

Cheers

Gabriel Genellina · Sep 17, 2007

One of those `warrior` attributes is a Unicode object that contains
characters outside ASCII. str(x) tries to convert it to a byte string
using the default encoding (ascii), and fails. This can happen on Linux
as well as on Windows, depending on the data.
I see that you use codecs.open: you should write Unicode objects to
the file, not byte strings, and that will work fine.
Look for some recent posts about this same problem.
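
A minimal sketch of that advice, not the code from the thread. In Python 2, calling str() on a Unicode object amounts to encoding it with the default ascii codec, which the helper below imitates with an explicit .encode('ascii'). The names build_string and implicit_str_would_fail and the sample values are hypothetical, and codecs.getwriter on an in-memory buffer stands in for the file object that codecs.open would return.

```python
import codecs
import io


def implicit_str_would_fail(value):
    """Imitate Python 2's str(unicode_obj): encode with the ascii codec."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


def build_string(warrior_id, damage, kills, survived):
    """Build the output row as Unicode text, without calling str()."""
    # %s formatting into a u"" template converts the ints and keeps
    # any non-ASCII characters in warrior_id intact.
    return (u"!tr!!td!!id!%s!/id!!/td!!td!%s!/td!!td!%s!/td!!td!%s!/td!!/tr!"
            % (warrior_id, damage, kills, survived))


# Hypothetical ID carrying the character named in the traceback.
warrior_id = u'\ufeff196679'
row = build_string(warrior_id, 145, 2, 3)

# Write the Unicode row through an encoding writer; the writer, not
# str(), turns the text into UTF-8 bytes.
sink = io.BytesIO()
writer = codecs.getwriter('utf-8')(sink)
writer.write(row + u'\n')
encoded = sink.getvalue()
```

Here the integers are converted by the format operator, so str() is never applied to a Unicode value, and the encoding step happens once, at the point where the text is written.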

geoff_ness · Sep 18, 2007

Thanks Gabriel, I hadn't thought about the str() function that way. I
had initially used it to coerce the attributes which are type int to
type str so that I could write them to the output file. I've now
rewritten the buildString() function so that the unicode objects don't
get fed to str(), and apparently Windows copes OK with that. I'm still
puzzled as to why Python at my end had no problem with it...
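
One possible explanation, not confirmed anywhere in this thread: u'\ufeff' is the Unicode byte order mark (BOM), and the traceback reports it at position 0. If the copy of the input file on the Windows machine was saved with a UTF-8 BOM and the Linux copy was not, only the Windows run would see that character at the front of the first ID. The sketch below assumes exactly that; the sample line is simplified and hypothetical. It shows that the 'utf-8' codec keeps a leading BOM in the decoded text, while 'utf-8-sig' (available from Python 2.5) strips it.

```python
import codecs


def can_encode_ascii(text):
    """Return True if text survives an ascii encode, as str() needs in Python 2."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical, simplified first line of the citizen list.
line_bytes = u'196679 Daimyo Druid 145 27\n'.encode('utf-8')

# Assumption: the Windows copy starts with a UTF-8 BOM, the Linux copy does not.
windows_bytes = codecs.BOM_UTF8 + line_bytes
linux_bytes = line_bytes

# 'utf-8' leaves the BOM in the text as u'\ufeff'; 'utf-8-sig' removes it.
id_linux = linux_bytes.decode('utf-8').split()[0]
id_windows = windows_bytes.decode('utf-8').split()[0]
id_windows_sig = windows_bytes.decode('utf-8-sig').split()[0]
```

If that assumption holds, passing 'utf-8-sig' instead of 'utf-8' to codecs.open would read both copies of the file the same way.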
