Unicode characters, XML/RSS


Adam W.

So I wrote a little video podcast downloading script that checks a
list of RSS feeds and downloads any new videos. Every once in a while
it finds a character outside the 128-character ASCII range in the feed
and my script blows up:

Traceback (most recent call last):
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 88, in <module>
    mainloop()
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 75, in mainloop
    update()
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 69, in update
    couldhave = getshowlst(x[1],episodecnt)
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 30, in getshowlst
    masterlist = XMLWorkspace.parsexml(url)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 54, in parsexml
    parse(url, FeedHandlerInst)
  File "C:\Python25\lib\xml\sax\__init__.py", line 33, in parse
    parser.parse(source)
  File "C:\Python25\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "C:\Python25\lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in characters
    self.data.append(string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 236: ordinal not in range(128)


Now it's my understanding that XML can contain characters above the
ASCII range as long as the encoding is specified, which it is (UTF-8).
The feed validates in every validator I've run it through, and every
program I open it with seems to be OK with it, except my Python script.
Why? Here is the URL of the feed in question:
http://revision3.com/winelibraryreserve/
My script is complaining about the fancy e in Mourvèdre.

At first glance I thought it was the data.append(string) call that was
not accepting the Unicode, but even if I put a return in the characters
handler, it still breaks. What am I doing wrong?
 

Stefan Behnel

Adam said:
  File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in characters
    self.data.append(string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 236: ordinal not in range(128)

You seem to be doing an implicit conversion from a unicode string to a byte
string, maybe by concatenating ('+' operator) strings of different types or by
writing it out into a file (or printing it, or ...) - I don't know what
self.data is or does, since you didn't provide any code.
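
A minimal sketch of the conversion step described above, assuming the
offending text is the word Mourvèdre from the feed. In Python 2, mixing a
unicode string with a byte string, or writing it to a byte-oriented
destination, makes the interpreter encode it with the 'ascii' codec behind
the scenes. The explicit calls below perform that same encode by hand so the
error can be seen in isolation. The names word and ascii_failure are
illustrative and do not come from the script in question.

```python
import sys

# The word from the feed, as a unicode string (the form a SAX parser delivers).
word = u"Mourv\xe8dre"

def ascii_failure(text):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # sys.exc_info() is used so the same code also runs on Python 2.5.
        return sys.exc_info()[1]
    return None

# Encoding to ASCII fails on the e with grave accent (U+00E8).
error = ascii_failure(word)

# Encoding explicitly with a codec that covers the character succeeds.
utf8_bytes = word.encode("utf-8")
```

If this is what is happening, the usual remedy is to keep the text as
unicode inside the program and encode it explicitly, for example to UTF-8,
only at the point where bytes are actually needed.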

Stefan
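
For comparison, here is a small self-contained sketch of a SAX handler that
keeps character data as unicode and encodes only on the way out. It is not
the XMLWorkspace module from the traceback, whose code was not posted. The
TextCollector class, the SAMPLE document, and the variable names are
assumptions made up for illustration, and the sample stands in for the real
feed.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects character data without converting it."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # The parser hands over unicode text; store it untouched.
        self.chunks.append(content)

    def text(self):
        # Join with a unicode separator so no implicit byte conversion occurs.
        return u"".join(self.chunks)

# A made-up UTF-8 document containing the word from the feed.
SAMPLE = (u'<?xml version="1.0" encoding="UTF-8"?>'
          u"<title>Mourv\xe8dre</title>").encode("utf-8")

handler = TextCollector()
xml.sax.parseString(SAMPLE, handler)

title = handler.text()                  # unicode text, accent intact
encoded_title = title.encode("utf-8")   # explicit encode when bytes are needed
```

Parsing and collecting the text this way does not raise, which suggests the
failure comes from whatever is later done with the collected strings rather
than from the parser itself.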
 
