Problem parsing namespaces with xml.dom.minidom

Mike McGavin

Hi everyone.

I've been trying for several hours now to get minidom to parse
namespaces properly from my stream of XML, so that I can use DOM methods
such as getElementsByTagNameNS(). For some reason, though, it just
doesn't seem to want to split the prefixes from the rest of the tags
when parsing.

The minidom documentation at
http://docs.python.org/lib/module-xml.dom.minidom.html implies that
namespaces are supposed to be supported as long as I'm using a parser
that supports them, but I just can't seem to get it to work. I was
wondering if anyone can see what I'm doing wrong.

Here's a simple test case that represents the problem I'm having. If it
makes a difference, I have PyXML installed, or at the very least, I have
the Debian Linux python-xml package installed, which I'm pretty sure is
PyXML.


========

from xml.dom import minidom
from xml import sax
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>
'''
# Set up a parser for namespace-ready parsing.
parser = sax.make_parser()
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.setFeature(sax.handler.feature_namespace_prefixes, 1)

# Parse the string into a minidom
mydom = minidom.parseString(text)

# Look for some elements

# This one shouldn't return any (I think).
object_el1 = mydom.getElementsByTagName("xte:object")

# This one definitely should, at least for what I want.
object_el2 = mydom.getElementsByTagNameNS("object",
'http://www.mcs.vuw.ac.nz/renata/xte')
print '1: ' + str(object_el1)
print '2: ' + str(object_el2)

=========

Output is:

1: [<DOM Element: xte:object at 0x404a922c>]
2: []

=========

What *seems* to be happening is that the namespace prefix isn't being
separated, and is simply being parsed as if it's part of the rest of the
tag. Therefore when I search for a tag in a particular namespace, it's
not being found.

I've looked through the code in the python libraries, and the
minidom.parseString function appears to be calling the PullDOM parse
method, which creates a PullDOM object to be the ContentHandler. Just
browsing over that code, it *appears* to be trying to split the prefix
from the local name in order to build a namespace-ready DOM as I would
expect it to. I can't quite figure out why this isn't working for me,
though.
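One way to test that hunch is to ask each parsed element what it thinks its prefix, local name and namespace URI are. The following is only a sketch, written against the minidom that ships in a current Python 3 standard library (an assumption; a PyXML-era install may behave differently), and `describe_elements` is a helper name made up for this example.

```python
from xml.dom import minidom

TEXT = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:object objectid="object1">Nothing</xte:object>
</xte:xte>
'''


def describe_elements(xml_text):
    """Return (tagName, prefix, localName, namespaceURI) for every element.

    Hypothetical helper for illustration; not part of minidom.
    """
    dom = minidom.parseString(xml_text)
    rows = []
    # "*" matches every element, in document order.
    for node in dom.getElementsByTagName("*"):
        rows.append((node.tagName, node.prefix,
                     node.localName, node.namespaceURI))
    return rows


for row in describe_elements(TEXT):
    print(row)
```

If the prefix and local name come back as separate values, the parser is splitting them and the problem lies somewhere in the lookup call instead.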


I'm not terribly experienced with XML in general, so it's possible that
I'm just incorrectly interpreting how things are supposed to work to
begin with. If this is the case, please accept my apologies, but I'd
like any suggestions for how I should be doing it. I'd really just like
to be able to parse an XML document into a DOM, and then be able to pull
out elements relative to their namespaces.

Can anyone see what I'm doing wrong?

Thanks.
Mike.
 
Fredrik Lundh

Mike said:
I'm not terribly experienced with XML in general, so it's possible that I'm just incorrectly
interpreting how things are supposed to work to begin with. If this is the case, please accept my
apologies, but I'd like any suggestions for how I should be doing it. I'd really just like to be
able to parse an XML document into a DOM, and then be able to pull out elements relative to their
namespaces.

is the DOM API an absolute requirement?

</F>
 
Mike McGavin

Hi Fredrik.

Fredrik said:
is the DOM API an absolute requirement?

It wouldn't need to conform to the official specifications of the DOM
API, but I guess I'm after some comparable functionality.

In particular, I need to be able to parse a namespace-using XML document
into some kind of node tree, and then be able to query the tree to
select elements based on their namespace and local tag names, and so on.
I don't mind if the methods provided don't conform exactly to DOM
specifications.


I guess I could write my own code to build a namespace-recognising DOM
from an XML file, but it seems as if that's already been done and I'd be
very surprised if it hadn't. I just can't figure out why minidom
doesn't seem to be working properly for me when namespaces are involved.

Thanks.
Mike.
 
Fredrik Lundh

Mike said:
It wouldn't need to conform to the official specifications of the DOM API, but I guess I'm after
some comparable functionality.

In particular, I need to be able to parse a namespace-using XML document into some kind of node
tree, and then be able to query the tree to select elements based on their namespace and local
tag names, and so on. I don't mind if the methods provided don't conform exactly to DOM
specifications.

sounds like this might be exactly what you need:

http://effbot.org/zone/element-index.htm

(it's also the fastest and most memory-efficient Python-only parser you
can get, but I suppose that's not a problem ;-)

</F>
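
For readers following along later: the library linked above was subsequently added to the Python standard library as `xml.etree.ElementTree`. A minimal sketch of a namespace-aware lookup with it, assuming that stdlib module and a trimmed copy of Mike's test document, might look like the following. `find_in_namespace` is a hypothetical helper name, not part of the library.

```python
import xml.etree.ElementTree as ET

XTE_NS = "http://www.mcs.vuw.ac.nz/renata/xte"

TEXT = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:object objectid="object1">Nothing</xte:object>
</xte:xte>
'''


def find_in_namespace(xml_text, namespace_uri, local_name):
    """Return all elements with the given namespace URI and local name.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified name as "{namespace-uri}localname".
    return list(root.iter("{%s}%s" % (namespace_uri, local_name)))


objects = find_in_namespace(TEXT, XTE_NS, "object")
for element in objects:
    print(element.tag, element.attrib)
```

The prefix (`xte`) never appears in the query; only the namespace URI and the local name matter, which is the kind of lookup Mike describes.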
 
Paul Prescod

You've reversed the arguments to getElementsByTagNameNS(): the namespace
URI comes first, then the local name. Here's a program that works fine
(note that you don't need to set up a SAX parser):

from xml.dom import minidom
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>
'''

# Parse the string into a minidom
mydom = minidom.parseString(text)

# Look for some elements

# This one shouldn't return any (I think).
object_el1 = mydom.getElementsByTagName("xte:object")

# This one definitely should, at least for what I want.
object_el2 = mydom.getElementsByTagNameNS(
    'http://www.mcs.vuw.ac.nz/renata/xte', "object")
print '1: ' + str(object_el1)
print '2: ' + str(object_el2)
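
The effect of the argument order can be seen side by side. This is a sketch only, assuming Python 3 and the standard library's minidom rather than the Python 2 / PyXML setup used in the thread; the variable names are invented for the example.

```python
from xml.dom import minidom

XTE_NS = "http://www.mcs.vuw.ac.nz/renata/xte"

TEXT = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:object objectid="object1">Nothing</xte:object>
</xte:xte>
'''

dom = minidom.parseString(TEXT)

# DOM signature: getElementsByTagNameNS(namespaceURI, localName)
correct = dom.getElementsByTagNameNS(XTE_NS, "object")

# Swapped arguments: searches for a local name equal to the URI
# inside a namespace called "object", so nothing matches.
reversed_args = dom.getElementsByTagNameNS("object", XTE_NS)

print("correct order: ", len(correct))
print("reversed order:", len(reversed_args))
```

Because the swapped call is still a valid query, it raises no error and simply returns an empty list, which makes the mistake easy to overlook.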
 
Mike McGavin

Hi Fredrik.

Fredrik said:
It wouldn't need to conform to the official specifications of the DOM API, but I guess I'm after
some comparable functionality. [--snip--]
sounds like this might be exactly what you need:
http://effbot.org/zone/element-index.htm
(it's also the fastest and most memory-efficient Python-only parser you
can get, but I suppose that's not a problem ;-)

Thanks. The original problem I was having turned out to be that I had
reversed a couple of parameters in a method call, as Paul pointed out,
and I now feel pretty silly as a result. But I'll take a look at this, too.

Much appreciated.
Mike.
 
