[libxml]: Can't find nodes using XPath, namespaces mess

Stanislaw Wozniak

Hi,

I am having problems accessing elements in an XML document using
XPath. My XML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>


My XPath only works when I remove all the namespaces from the root node
but I do need to access it without modifying the xml.

I am using:
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-mswin32]
libxml-ruby (1.1.3)
 
Matt Neuburg

Stanislaw Wozniak said:
Hi,

I am having problems accessing elements in an XML document using
XPath. My XML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>


My XPath only works when I remove all the namespaces from the root node
but I do need to access it without modifying the xml.

Have you run your XML through a validator? That semicolon looks invalid to
me. m.
 
Mark Thomas

As Matt said, the document is not well-formed XML. Try adding the
RECOVER option to the parser, which tells libxml to ignore syntax
errors like that.
 
Stanislaw Wozniak

Hi, this was a typo, no semicolon in there:

<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>
 
Matt Neuburg

Stanislaw Wozniak said:
Hi, this was a typo, no semicolon in there:

<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>

Then what's the problem? XPath works:

s = <<END
<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>
END
require 'rexml/document'
include REXML
doc = Document.new(s)
p XPath.match(doc, "//treenode['Root']/treenode")
#=> [<treenode name='1'/>]

Oh, wait, you said you were using libxml:

s = <<END
<?xml version="1.0" encoding="UTF-8"?>
<configuration-data>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>
END
require 'rubygems'
require 'xml'
doc = XML::Document.string(s)
doc.find("//treenode['Root']/treenode").each do |el|
p el #=> <treenode name="1"/>
end

Sorry, I'm failing to guess what problem you're having. Perhaps if you
showed your actual code? m.
 
Aaron Patterson

Stanislaw Wozniak said:
Hi,

I am having problems accessing elements in an XML document using
XPath. My XML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">

^^^^^ That default namespace declaration means every unprefixed element in
this document belongs to the namespace it names.
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>


My XPath only works when I remove all the namespaces from the root node
but I do need to access it without modifying the xml.

You need to register that namespace with the libxml xpath engine. I'm
not sure how you register namespaces with libxml-ruby, but with
nokogiri, I would do this:

doc = Nokogiri::XML(xml)
doc.xpath('//ns:attribute', 'ns' => 'urn:company:platform:foundation:configuration:defn:v1')

Nokogiri will automatically register root-level namespaces, so you could
also do this:

doc = Nokogiri::XML(xml)
doc.xpath('//xmlns:attribute')

I know there is a way to do this with libxml-ruby, I just don't know the
syntax off the top of my head. Look through the libxml-ruby
documentation for "find", and I'm sure you'll find how to register
namespaces.
 
Aaron Patterson

Stanislaw Wozniak said:
Hi, this was a typo, no semicolon in there:

<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>

Then what's the problem? XPath works:

s = <<END
<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>
END
require 'rexml/document'
include REXML
doc = Document.new(s)
p XPath.match(doc, "//treenode['Root']/treenode")
#=> [<treenode name='1'/>]

Wow. These results are just wrong. This is a bug in REXML. In XPath,
when you do not specify a namespace for your node, that means that you
want a node *with no namespace*.

For example:

require 'rexml/document'

include REXML

s = <<END
<?xml version="1.0" encoding="UTF-8"?>
<shop>
<!-- car inventory -->
<inventory xmlns="http://gm.com/">
<tire name="all season" />
</inventory>

<!-- bike inventory -->
<inventory xmlns="http://schwinn.com/">
<tire name="street" />
</inventory>

<!-- no namespace inventory -->
<inventory>
<tire name="wtf" />
</inventory>
</shop>
END

doc = Document.new(s)

p XPath.match(doc, "//tire")

REXML matches *all three* tires. Surely a car tire is not the same as a bike
tire? Using XPath, how would I query for a tire that has *no namespace*
(the third one) without matching the two that *do* belong in a
namespace (it's possible to do this with REXML, just strange)? The XPath used
above *should* only match the third entry.

This is a broken implementation of XPath.
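
For comparison, a prefixed query against the same kind of document does keep the inventories apart in REXML. This is only a minimal sketch: the prefixes `gm` and `bike` are made up for the query and exist only in the hash passed as the third argument to `XPath.match`; what has to match the document is the URI, not the prefix.

```ruby
require 'rexml/document'

# Same shape as the shop document above, trimmed to the three inventories.
shop_xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <shop>
    <inventory xmlns="http://gm.com/">
      <tire name="all season" />
    </inventory>
    <inventory xmlns="http://schwinn.com/">
      <tire name="street" />
    </inventory>
    <inventory>
      <tire name="wtf" />
    </inventory>
  </shop>
XML

doc = REXML::Document.new(shop_xml)

# Hypothetical prefixes chosen for this query only; the URIs come from the document.
namespaces = { 'gm' => 'http://gm.com/', 'bike' => 'http://schwinn.com/' }

car_tires  = REXML::XPath.match(doc, '//gm:tire', namespaces)
bike_tires = REXML::XPath.match(doc, '//bike:tire', namespaces)

car_names  = car_tires.map { |tire| tire.attributes['name'] }
bike_names = bike_tires.map { |tire| tire.attributes['name'] }

p car_names   #=> ["all season"]
p bike_names  #=> ["street"]
```

Each prefixed query returns only the tire whose inherited default namespace matches the URI bound to that prefix.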

Oh, wait, you said you were using libxml:

You have an error in your XML below:
s = <<END
<?xml version="1.0" encoding="UTF-8"?>
<configuration-data>

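^ That ">" should not be there. It closes the start tag early, so the
xmlns declarations that follow are parsed as plain text content rather
than as attributes. The result is still well-formed, which is why
libxml-ruby raises no error, but you've effectively removed all
namespaces from this document.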
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>
END
require 'rubygems'
require 'xml'
doc = XML::Document.string(s)
doc.find("//treenode['Root']/treenode").each do |el|
p el #=> <treenode name="1"/>
end

Sorry, I'm failing to guess what problem you're having. Perhaps if you
showed your actual code? m.

Since the namespaces were removed, this example succeeds.
 
Matt Neuburg

Aaron Patterson said:
You have an error in your XML below

Thanks for spotting that. I must have removed the namespace and then put
it back, to see if I could duplicate the OP's problems, and I must have
put it back wrong. I wish libxml had just complained that my XML was
bad...

You're right; fixing the error, I can now duplicate the OP's problem in
libxml (but not in REXML, as you also observed). And then I can solve
it:

s = <<END
<?xml version="1.0" encoding="UTF-8"?>
<configuration-data
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
xmlns="urn:company:platform:foundation:configuration:defn:v1">
<attributeList>
<attribute name="siteid" validationRuleName="String" description="Site
id">
<tree name="siteid_hierarchy">
<treenode name="Root">
<treenode name="1" />
</treenode>
</tree>
</attribute>
</attributeList>
</configuration-data>
END
require 'rubygems'
require 'xml'
doc = XML::Document.string(s)
ns = {"xsi" => "urn:company:platform:foundation:configuration:defn:v1"}
doc.find("//xsi:treenode['Root']/xsi:treenode", ns).each do |el|
p el #=> <treenode name="1"/>
end

That is the desired sort of result, I take it. Notice that we register
the namespace with the XPath engine and that we actually use the
namespace prefix in our XPath expression. m.
 
