ignoring chinese characters parsing xml file

  • Thread starter Fabian López

Fabian López

Hi,
I am parsing an XML file that includes Chinese characters, like
ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position....
I would like to ignore these characters and parse everything else.
Could anyone help me? I suppose I can catch the exception and skip
them, or maybe use a function that detects these Chinese characters
so that I can ignore them.

Thanks!!
Fabian
 

Marc 'BlackJack' Rintsch

Fabian said:
I am parsing an XML file that includes Chinese characters, like
uuå•–å•–æ‰æ˜¯w.扉Lé”æ‰æ˜¯ or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position..

You say you are *parsing* the file but this is an *encode* error. Parsing
means *decoding*.

You have to show some code and the actual traceback to get help. Crystal
balls are not that reliable. ;-)

Ciao,
Marc 'BlackJack' Rintsch
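
To illustrate the decode/encode distinction made above, here is a minimal sketch. It assumes Python 3 (the code in this thread is Python 2), a UTF-8 input, and cp1252 as a stand-in for the "charmap" codec in the error message; the sample string and variable names are illustrative only.

```python
# Bytes as they might sit in a UTF-8 encoded XML file (illustrative sample).
raw = "ヘアアイロン".encode("utf-8")

# Parsing involves decoding: bytes -> text. This step works fine here.
text = raw.decode("utf-8")

# Printing or writing involves encoding: text -> bytes.
# cp1252 is a charmap-based codec with no Japanese characters, so this fails.
try:
    text.encode("cp1252")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

print(error_name)
```

If this reflects what happens in the original script, the XML was decoded without trouble and the failure only appears when the text is encoded again for output.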
 

Stefan Behnel

Fabian said:
Thanks Marc, the code is like this. The attrib name is the problem:

from lxml import etree

context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],

The problem is the print statement. Looks like your terminal encoding (that
Python needs to encode the unicode string to) can't handle these unicode
characters.

Stefan
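
One way to get the "ignore these characters" behaviour asked for in the first post is to encode the text with an error handler before printing it. The following is only a sketch under assumptions: it is written for Python 3, `printable` is a hypothetical helper name, and cp1252 is assumed as the terminal encoding because the thread does not state which one is in use.

```python
import sys


def printable(text, encoding=None, errors="replace"):
    """Return text reduced to what the target encoding can represent."""
    # Fall back to ASCII if stdout reports no encoding (e.g. when piped).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encode with an error handler instead of raising, then decode back to text.
    return text.encode(encoding, errors=errors).decode(encoding)


name = "weblog ヘアアイロン"
print(printable(name, "cp1252", errors="ignore"))   # unsupported characters are dropped
print(printable(name, "cp1252", errors="replace"))  # unsupported characters become "?"
```

The `"ignore"` handler silently loses data, so `"replace"` may be the safer choice when the output is meant to be read by a person.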
 

limodou

Fabian said:
Thanks Marc, the code is like this. The attrib name is the problem:

from lxml import etree

context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"],

The problem is the print statement. Looks like your terminal encoding (that
Python needs to encode the unicode string to) can't handle these unicode
characters.

I agree. For Japanese, you need to know the exact encoding name and
convert the text to it, like this:

print text.encode('encoding')
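
A small sketch of encoding to a named encoding, assuming Python 3 and assuming EUC-JP as the target (the thread never says which encoding the terminal or output file actually expects). The in-memory buffer is a stand-in for a real file or output stream.

```python
import io

text = "ヘアアイロン"

# Encode explicitly to a named encoding; EUC-JP is one that covers katakana.
encoded = text.encode("euc_jp")

# Write the bytes to an in-memory buffer standing in for a file or terminal.
buffer = io.BytesIO()
buffer.write(encoded)

# Decoding with the same encoding name recovers the original text.
round_trip = buffer.getvalue().decode("euc_jp")
print(round_trip == text)
```

This only helps when whatever reads the output really uses that encoding; otherwise the bytes will show up as garbage again.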
 
