Ignoring XML Namespaces with ElementTree

Pete

Is there any way to configure ElementTree to ignore XML namespaces?
For the past couple of months, I've been using minidom to parse an XML
file that is generated by a unit within my organization that can't
stick with a standard. This hasn't been a problem until recently, when
the script was given a 30 MB file that, once parsed, increased the
Python memory footprint by 1.0 GB, and now I'm running into
MemoryErrors. Based on Google searches and testing, it looks like
ElementTree is much more memory-efficient and I'd like to switch;
however, I'd like to be able to ignore the namespaces. These XML files
tend to switch namespaces for no apparent reason, and ignoring the
namespaces would help the script adapt to the changes. Any help on
this would be greatly appreciated. I'm having a hard time finding the
answer.

Additionally, does anyone know how ElementTree handles XML elements
that include Unicode?
 
Stefan Behnel

Pete, 03.12.2009 19:21:
> Is there any way to configure ElementTree to ignore XML namespaces?
> For the past couple of months, I've been using minidom to parse an XML
> file that is generated by a unit within my organization that can't
> stick with a standard. This hasn't been a problem until recently, when
> the script was given a 30 MB file that, once parsed, increased the
> Python memory footprint by 1.0 GB, and now I'm running into
> MemoryErrors. Based on Google searches and testing, it looks like
> ElementTree is much more memory-efficient and I'd like to switch;

Make sure you use cElementTree (the C implementation); then that's
certainly the right choice to make.
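
A minimal sketch of the import, assuming the script may have to run on both older and newer Python versions. cElementTree was the separate C implementation at the time of this thread; it was deprecated in Python 3.3, where plain xml.etree.ElementTree started using the C accelerator automatically, and removed in Python 3.9, so a fallback import is a common pattern. The sample document is hypothetical.

```python
# Prefer the C implementation where it still exists as a separate module
# (Python 2.5 through 3.8); fall back to the standard module otherwise.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Hypothetical sample document, used only to show that parsing works.
root = ET.fromstring("<report><item>1</item></report>")
print(root.tag)
```

Either way the rest of the script only refers to ET, so nothing else has to change between versions.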

> however, I'd like to be able to ignore the namespaces. These XML files
> tend to switch namespaces for no apparent reason, and ignoring the
> namespaces would help the script adapt to the changes. Any help on
> this would be greatly appreciated. I'm having a hard time finding the
> answer.

ET uses namespace URIs as part of the tag name, so if you want to ignore
namespaces, just strip the leading "{...}" (if any) from the tag and work
with the rest (so-called "local name").
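
One way this could look in practice is sketched below. The helper names local_name and strip_namespaces, and the example.com namespace URIs, are made up for illustration; they are not part of ElementTree. The sketch only rewrites element tags; namespaced attribute names would need the same treatment.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Return the tag without its leading "{namespace-uri}" part, if any."""
    # Non-string tags (e.g. comments) are passed through unchanged.
    if isinstance(tag, str) and tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag

def strip_namespaces(root):
    """Rewrite every element tag in the tree to its local name, in place."""
    for elem in root.iter():
        elem.tag = local_name(elem.tag)
    return root

# Two hypothetical documents that differ only in their default namespace.
XML_V1 = '<doc xmlns="http://example.com/v1"><name>a</name></doc>'
XML_V2 = '<doc xmlns="http://example.com/v2"><name>b</name></doc>'

for text in (XML_V1, XML_V2):
    root = strip_namespaces(ET.fromstring(text))
    # The same un-namespaced lookup now works for both inputs.
    print(root.tag, root.find("name").text)
```

Stripping once after parsing keeps the rest of the script free of namespace handling; comparing local_name(elem.tag) on the fly would be an alternative if the tree should stay untouched.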

> Additionally, does anyone know how ElementTree handles XML elements
> that include Unicode?

It's an XML parser, so the answer is: without any difficulties.
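
A small sketch of what that means, assuming Python 3 and a UTF-8 encoded input; the element names and values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical UTF-8 encoded input with non-ASCII text and attribute value.
data = '<person city="Zürich"><name>José Ångström</name></person>'.encode("utf-8")

root = ET.fromstring(data)

# Text and attribute values come back as ordinary (Unicode) strings.
name = root.find("name").text
city = root.get("city")
print(name, city)

# encoding="unicode" makes tostring() return a str instead of bytes.
print(ET.tostring(root, encoding="unicode"))
```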

Stefan
 
Pete

Perfect... I can work with that. Thanks.
 
