[libxml]: Can't find nodes using XPath, namespaces mess

Discussion in 'Ruby' started by Stanislaw Wozniak, Jul 31, 2009.

  1. Hi,

    I am having problems accessing elements in an XML document using
    XPath. My XML document looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration-data
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
    xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    xmlns="urn:company:platform:foundation:configuration:defn:v1">
    <attributeList>
    <attribute name="siteid" validationRuleName="String" description="Site
    id">
    <tree name="siteid_hierarchy">
    <treenode name="Root">
    <treenode name="1" />
    </treenode>
    </tree>
    </attribute>
    </attributeList>
    </configuration-data>


    My XPath only works when I remove all the namespaces from the root node
    but I do need to access it without modifying the xml.

    I am using:
    ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-mswin32]
    libxml-ruby (1.1.3)
    --
    Posted via http://www.ruby-forum.com/.
     
    Stanislaw Wozniak, Jul 31, 2009
    #1

  2. Matt Neuburg

    Stanislaw Wozniak wrote:

    > Hi,
    >
    > I am having problems accessing elements in the XML documents using
    > XPath. My xml document looks like that:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <configuration-data
    > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
    > xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    > xmlns="urn:company:platform:foundation:configuration:defn:v1">
    > <attributeList>
    > <attribute name="siteid" validationRuleName="String" description="Site
    > id">
    > <tree name="siteid_hierarchy">
    > <treenode name="Root">
    > <treenode name="1" />
    > </treenode>
    > </tree>
    > </attribute>
    > </attributeList>
    > </configuration-data>
    >
    >
    > My XPath only works when I remove all the namespaces from the root node
    > but I do need to access it without modifying the xml.


    Have you run your XML through a validator? That semicolon looks invalid
    to me. m.
     
    Matt Neuburg, Aug 1, 2009
    #2

  3. Mark Thomas

    Re: [libxml]: Can't find nodes using XPath, namespaces mess

    As Matt said, the document is not well-formed XML. Try adding the
    RECOVER option to the parser, which tells libxml to ignore syntax
    errors like that.
     
    Mark Thomas, Aug 1, 2009
    #3
  4. Re: [libxml]: Can't find nodes using XPath, namespaces mess

    Hi, this was a typo, no semicolon in there:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration-data
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    xmlns="urn:company:platform:foundation:configuration:defn:v1">
    <attributeList>
    <attribute name="siteid" validationRuleName="String" description="Site
    id">
    <tree name="siteid_hierarchy">
    <treenode name="Root">
    <treenode name="1" />
    </treenode>
    </tree>
    </attribute>
    </attributeList>
    </configuration-data>
    --
    Posted via http://www.ruby-forum.com/.
     
    Stanislaw Wozniak, Aug 1, 2009
    #4
  5. Matt Neuburg

    Re: [libxml]: Can't find nodes using XPath, namespaces mess

    Stanislaw Wozniak wrote:

    > Hi, this was a typo, no semicolon in there:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <configuration-data
    > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    > xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    > xmlns="urn:company:platform:foundation:configuration:defn:v1">
    > <attributeList>
    > <attribute name="siteid" validationRuleName="String" description="Site
    > id">
    > <tree name="siteid_hierarchy">
    > <treenode name="Root">
    > <treenode name="1" />
    > </treenode>
    > </tree>
    > </attribute>
    > </attributeList>
    > </configuration-data>


    Then what's the problem? XPath works:

    s = <<END
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration-data
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    xmlns="urn:company:platform:foundation:configuration:defn:v1">
    <attributeList>
    <attribute name="siteid" validationRuleName="String" description="Site
    id">
    <tree name="siteid_hierarchy">
    <treenode name="Root">
    <treenode name="1" />
    </treenode>
    </tree>
    </attribute>
    </attributeList>
    </configuration-data>
    END
    require 'rexml/document'
    include REXML
    doc = Document.new(s)
    p XPath.match(doc, "//treenode['Root']/treenode")
    #=> [<treenode name='1'/>]

    Oh, wait, you said you were using libxml:

    s = <<END
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration-data>
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    xmlns="urn:company:platform:foundation:configuration:defn:v1">
    <attributeList>
    <attribute name="siteid" validationRuleName="String" description="Site
    id">
    <tree name="siteid_hierarchy">
    <treenode name="Root">
    <treenode name="1" />
    </treenode>
    </tree>
    </attribute>
    </attributeList>
    </configuration-data>
    END
    require 'rubygems'
    require 'xml'
    doc = XML::Document.string(s)
    doc.find("//treenode['Root']/treenode").each do |el|
    p el #=> <treenode name="1"/>
    end

    Sorry, I'm failing to guess what problem you're having. Perhaps if you
    showed your actual code? m.
     
    Matt Neuburg, Aug 1, 2009
    #5
  6. On Sat, Aug 01, 2009 at 05:33:31AM +0900, Stanislaw Wozniak wrote:
    > Hi,
    >
    > I am having problems accessing elements in the XML documents using
    > XPath. My xml document looks like that:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <configuration-data
    > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
    > xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    > xmlns="urn:company:platform:foundation:configuration:defn:v1">


    ^^^^^ That declaration puts every element in this document that has no
    explicit prefix into the default namespace
    urn:company:platform:foundation:configuration:defn:v1.
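
    A minimal sketch of that point, using REXML (already used elsewhere in
    this thread) rather than libxml-ruby so that it runs without extra gems.
    The XML is a trimmed copy of the document above, and the variable names
    are illustrative only.

```ruby
require 'rexml/document'

ns_uri = 'urn:company:platform:foundation:configuration:defn:v1'

# Trimmed copy of the thread's document: same default namespace, fewer nodes.
xml = <<XML
<configuration-data
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="#{ns_uri}">
  <attributeList>
    <attribute name="siteid"/>
  </attributeList>
</configuration-data>
XML

doc  = REXML::Document.new(xml)
root = doc.root
list = root.elements[1] # first child element: <attributeList>

# Neither element has a prefix, yet both resolve to the default namespace
# declared on the root element.
puts root.namespace
puts list.namespace
```

    Because the child elements inherit the namespace, an XPath engine that
    follows the specification treats a bare name such as attributeList as
    "attributeList in no namespace", which is not what this document contains.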

    > <attributeList>
    > <attribute name="siteid" validationRuleName="String" description="Site
    > id">
    > <tree name="siteid_hierarchy">
    > <treenode name="Root">
    > <treenode name="1" />
    > </treenode>
    > </tree>
    > </attribute>
    > </attributeList>
    > </configuration-data>
    >
    >
    > My XPath only works when I remove all the namespaces from the root node
    > but I do need to access it without modifying the xml.


    You need to register that namespace with the libxml xpath engine. I'm
    not sure how you register namespaces with libxml-ruby, but with
    nokogiri, I would do this:

    doc = Nokogiri::XML(xml)
    doc.xpath('//ns:attribute', 'ns' => 'urn:company:platform:foundation:configuration:defn:v1')

    Nokogiri will automatically register root level namespaces, so you could
    also do this:

    doc = Nokogiri::XML(xml)
    doc.xpath('//xmlns:attribute')

    I know there is a way to do this with libxml-ruby, I just don't know the
    syntax off the top of my head. Look through the libxml-ruby
    documentation for "find", and I'm sure you'll find how to register
    namespaces.
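
    As a rough illustration of the "register the namespace, then use a prefix
    in the query" idea, here is the same pattern with REXML, chosen only
    because it needs no extra gems. This is not the libxml-ruby syntax; the
    prefix "ns" and the variable names are arbitrary assumptions.

```ruby
require 'rexml/document'

ns_uri = 'urn:company:platform:foundation:configuration:defn:v1'

xml = <<XML
<configuration-data xmlns="#{ns_uri}">
  <attributeList>
    <attribute name="siteid" validationRuleName="String"/>
  </attributeList>
</configuration-data>
XML

doc = REXML::Document.new(xml)

# Bind the prefix 'ns' to the document's default namespace URI for this
# query, then use that prefix on the element name in the XPath expression.
matches = REXML::XPath.match(doc, '//ns:attribute', { 'ns' => ns_uri })

matches.each { |el| puts el.attributes['name'] }
```

    The prefix only has to agree between the hash and the expression; it is
    the URI that has to match the document.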

    --
    Aaron Patterson
    http://tenderlovemaking.com/
     
    Aaron Patterson, Aug 2, 2009
    #6
  7. Re: [libxml]: Can't find nodes using XPath, namespaces mess

    On Sun, Aug 02, 2009 at 12:50:05AM +0900, Matt Neuburg wrote:
    > Stanislaw Wozniak <> wrote:
    >
    > > Hi, this was a typo, no semicolon in there:
    > >
    > > <?xml version="1.0" encoding="UTF-8"?>
    > > <configuration-data
    > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    > > xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    > > xmlns="urn:company:platform:foundation:configuration:defn:v1">
    > > <attributeList>
    > > <attribute name="siteid" validationRuleName="String" description="Site
    > > id">
    > > <tree name="siteid_hierarchy">
    > > <treenode name="Root">
    > > <treenode name="1" />
    > > </treenode>
    > > </tree>
    > > </attribute>
    > > </attributeList>
    > > </configuration-data>

    >
    > Then what's the problem? XPath works:
    >
    > s = <<END
    > <?xml version="1.0" encoding="UTF-8"?>
    > <configuration-data
    > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    > xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    > xmlns="urn:company:platform:foundation:configuration:defn:v1">
    > <attributeList>
    > <attribute name="siteid" validationRuleName="String" description="Site
    > id">
    > <tree name="siteid_hierarchy">
    > <treenode name="Root">
    > <treenode name="1" />
    > </treenode>
    > </tree>
    > </attribute>
    > </attributeList>
    > </configuration-data>
    > END
    > require 'rexml/document'
    > include REXML
    > doc = Document.new(s)
    > p XPath.match(doc, "//treenode['Root']/treenode")
    > #=> [<treenode name='1'/>]


    Wow. These results are just wrong. This is a bug in REXML. In XPath,
    when you do not specify a namespace for your node, that means that you
    want a node *with no namespace*.

    For example:

    require 'rexml/document'

    include REXML

    s = <<END
    <?xml version="1.0" encoding="UTF-8"?>
    <shop>
    <!-- car inventory -->
    <inventory xmlns="http://gm.com/">
    <tire name="all season" />
    </inventory>

    <!-- bike inventory -->
    <inventory xmlns="http://schwinn.com/">
    <tire name="street" />
    </inventory>

    <!-- no namespace inventory -->
    <inventory>
    <tire name="wtf" />
    </inventory>
    </shop>
    END

    doc = Document.new(s)

    p XPath.match(doc, "//tire")

    REXML matches *all three* tires. Surely a car tire is not the same as a bike
    tire? Using XPath, how would I query for a tire that has *no namespace*
    (the third one) without matching the two that *do* belong in a
    namespace (it's possible to do this with REXML, just strange)? The XPath used
    above *should* only match the third entry.

    This is a broken implementation of XPath.
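
    One possible way to tell the three tires apart in REXML, offered as a
    sketch and not necessarily the approach alluded to above: bind a prefix
    to pick a specific namespace, and filter on Element#namespace in Ruby to
    pick the elements that are in no namespace. The prefix "gm" and the
    variable names are illustrative.

```ruby
require 'rexml/document'

xml = <<XML
<shop>
  <inventory xmlns="http://gm.com/"><tire name="all season"/></inventory>
  <inventory xmlns="http://schwinn.com/"><tire name="street"/></inventory>
  <inventory><tire name="wtf"/></inventory>
</shop>
XML

doc = REXML::Document.new(xml)

# Select only the tires in the car namespace by binding a prefix to its URI.
car_tires = REXML::XPath.match(doc, '//gm:tire', { 'gm' => 'http://gm.com/' })

# Keep only the tires that are in no namespace. The filter runs in Ruby, so
# it does not depend on how the bare name "tire" is matched by the engine.
plain_tires = REXML::XPath.match(doc, '//tire').select { |el| el.namespace == '' }

puts car_tires.map { |el| el.attributes['name'] }.inspect
puts plain_tires.map { |el| el.attributes['name'] }.inspect
```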

    >
    > Oh, wait, you said you were using libxml:
    >


    You have an error in your XML below

    > s = <<END
    > <?xml version="1.0" encoding="UTF-8"?>
    > <configuration-data>


    ^ That ">" should not be there.
    libxml-ruby has corrections turned on by default, so you've effectively
    removed all namespaces from this document.

    > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    > xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    > xmlns="urn:company:platform:foundation:configuration:defn:v1">
    > <attributeList>
    > <attribute name="siteid" validationRuleName="String" description="Site
    > id">
    > <tree name="siteid_hierarchy">
    > <treenode name="Root">
    > <treenode name="1" />
    > </treenode>
    > </tree>
    > </attribute>
    > </attributeList>
    > </configuration-data>
    > END
    > require 'rubygems'
    > require 'xml'
    > doc = XML::Document.string(s)
    > doc.find("//treenode['Root']/treenode").each do |el|
    > p el #=> <treenode name="1"/>
    > end
    >
    > Sorry, I'm failing to guess what problem you're having. Perhaps if you
    > showed your actual code? m.


    Since the namespaces were removed, this example succeeds.

    --
    Aaron Patterson
    http://tenderlovemaking.com/
     
    Aaron Patterson, Aug 2, 2009
    #7
  8. Matt Neuburg

    Re: [libxml]: Can't find nodes using XPath, namespaces mess

    Aaron Patterson wrote:

    > You have an error in your XML below


    Thanks for spotting that. I must have removed the namespace and then put
    it back, to see if I could duplicate the OP's problems, and I must have
    put it back wrong. I wish libxml had just complained that my XML was
    bad...

    You're right; fixing the error, I can now duplicate the OP's problem in
    libxml (but not in REXML, as you also observed). And then I can solve
    it:

    s = <<END
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration-data
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:company:platform:foundation:configuration:defn:v1"
    xmlns="urn:company:platform:foundation:configuration:defn:v1">
    <attributeList>
    <attribute name="siteid" validationRuleName="String" description="Site
    id">
    <tree name="siteid_hierarchy">
    <treenode name="Root">
    <treenode name="1" />
    </treenode>
    </tree>
    </attribute>
    </attributeList>
    </configuration-data>
    END
    require 'rubygems'
    require 'xml'
    doc = XML::Document.string(s)
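    # The prefix is only a label for this query; here it is bound to the
    # document's default namespace URI.
    ns = {"xsi" => "urn:company:platform:foundation:configuration:defn:v1"}
    # [@name='Root'] tests the attribute; a bare ['Root'] is always true.
    doc.find("//xsi:treenode[@name='Root']/xsi:treenode", ns).each do |el|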
    p el #=> <treenode name="1"/>
    end

    That is the desired sort of result, I take it. Notice that we register
    the namespace with the XPath engine and that we actually use the
    namespace prefix in our XPath expression. m.
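
    A small sketch of why the choice of prefix in the query does not matter,
    again with REXML so that it runs without extra gems (an assumption of
    convenience, not the libxml-ruby call shown above). The prefix "cfg" is
    made up for the example.

```ruby
require 'rexml/document'

ns_uri = 'urn:company:platform:foundation:configuration:defn:v1'

xml = <<XML
<configuration-data
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="#{ns_uri}">
  <tree name="siteid_hierarchy">
    <treenode name="Root">
      <treenode name="1"/>
    </treenode>
  </tree>
</configuration-data>
XML

doc = REXML::Document.new(xml)

# Same query twice, with two different prefixes bound to the same URI.
with_xsi = REXML::XPath.match(doc, '//xsi:treenode/xsi:treenode', { 'xsi' => ns_uri })
with_cfg = REXML::XPath.match(doc, '//cfg:treenode/cfg:treenode', { 'cfg' => ns_uri })

puts with_xsi.map { |el| el.attributes['name'] }.inspect
puts with_cfg.map { |el| el.attributes['name'] }.inspect
```

    Picking a prefix that the document does not already use for a different
    URI, such as "cfg" here, is probably less confusing to later readers.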
     
    Matt Neuburg, Aug 3, 2009
    #8