Using XPath to retrieve an XML element which contains a given text

anne001

This code returns the first dataformat element.
And yet the second dataformat is the one containing SPPT.
What am I doing wrong?

require "rexml/document"

include REXML

string = <<EOF
<dataformats>
<dataformat>
<fileidentifiers>
<fileidentifier>CFMT</fileidentifier>
</fileidentifiers>
</dataformat>
<dataformat>
<fileidentifiers>
<fileidentifier>SPPT</fileidentifier>
</fileidentifiers>
</dataformat>
</dataformats>
EOF

doc = Document.new string
xpathquery="//dataformat[contains(fileidentifier, SPPT)]"
p XPath.first(doc,xpathquery).to_s
 
Dejan Dimic

I think your XPath query should be:
xpathquery="//dataformat[contains(., 'SPPT')]"

or a more specific one:
xpathquery="//dataformat[contains(fileidentifiers/fileidentifier,'SPPT')]"
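
For what it is worth, a likely reason the original query matched the first element: without quotes, SPPT is read as a child element name rather than as a string, and fileidentifier is not a direct child of dataformat. Both arguments then end up as empty strings, and contains('', '') is true for every dataformat. Here is a minimal sketch with the quoted literal, reusing the sample data from the question (the names xml, hit and found_id are only for illustration):

```ruby
require "rexml/document"

xml = <<EOF
<dataformats>
  <dataformat>
    <fileidentifiers>
      <fileidentifier>CFMT</fileidentifier>
    </fileidentifiers>
  </dataformat>
  <dataformat>
    <fileidentifiers>
      <fileidentifier>SPPT</fileidentifier>
    </fileidentifiers>
  </dataformat>
</dataformats>
EOF

doc = REXML::Document.new(xml)

# 'SPPT' in quotes is a string literal; a bare SPPT would be a child-element lookup.
hit = REXML::XPath.first(doc, "//dataformat[contains(., 'SPPT')]")

# Read back the identifier of the element that was actually selected.
found_id = hit.elements["fileidentifiers/fileidentifier"].text
puts found_id
```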
 
anne001

Thank you, the first formulation works.

I had tried the second one on the complete XML file and it does not work.
Do you have any idea why? Is there a typo I am not seeing?

Here is a test file a little closer to the XML file I am working with:

require "rexml/document"
include REXML

string = <<EOF
<dataformats>
<dataformat>
<name>NARSAD recognition</name>
<fileidentifiers>
<fileidentifier>NARSAD</fileidentifier>
</fileidentifiers>
</dataformat>
<dataformat>
<name>SPFT</name>
<fileidentifiers>
<fileidentifier>SPFT</fileidentifier>
<fileidentifier>SPPT</fileidentifier>
</fileidentifiers>
</dataformat>
</dataformats>
EOF

doc = Document.new string

xpathquery="//dataformat[contains(., 'SPPT')]"
p 'yours1'
p XPath.first(doc,xpathquery).to_s

xpathquery="//dataformat[contains(fileidentifiers/fileidentifier,'SPPT')]"
p 'yours2'
p XPath.first(doc,xpathquery).to_s

Result:
"yours1"
"<dataformat>\n\t\t<name>SPFT</name>\n\t\t<fileidentifiers>\n\t\t\t<fileidentifier>SPFT</fileidentifier>\n\t\t\t<fileidentifier>SPPT</fileidentifier>\n\t\t</fileidentifiers>\n\t</dataformat>"
"yours2"
""
 
Robert Klemme

Hi Anne,

welcome back!

2008/8/11 anne001 said:
Thank you, the first formulation works.

I had tried the second one on the complete xml file and it does not
work.
Do you have an idea why? Is there a typo I am not seeing?

I believe "contains" is the wrong function as it does a textual
comparison and I have no idea whether a node is actually allowed as
input. I believe the correct XPath expression is this:

"//dataformat[descendant::fileidentifier[text()='SPPT']]"

Here are some expressions that you may want to try:

# find the correct fileidentifier
XPath.each doc, "//fileidentifier[text()='SPPT']" do |elm|
  puts elm
end

puts '-------------'

# go upwards from there to find the dataformat node
XPath.each doc, "//fileidentifier[text()='SPPT']/ancestor::dataformat" do |elm|
  puts elm
end

puts '-------------'

# select all dataformats that contain a fileidentifier with text "SPPT"
# this seems to best reflect what you want
XPath.each doc, "//dataformat[descendant::fileidentifier[text()='SPPT']]" do |elm|
  puts elm
end
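
On whether a node may be passed to contains(): as far as I know XPath 1.0 allows it, but a node-set argument is converted to a string using only its first node in document order. In the second test file the first fileidentifier of the matching dataformat is SPFT, which would explain the empty "yours2" result for the fileidentifiers/fileidentifier form. Here is a minimal sketch that moves the test into a predicate on each fileidentifier instead (the names xml, per_node and matched_names are only for illustration):

```ruby
require "rexml/document"

xml = <<EOF
<dataformats>
  <dataformat>
    <name>NARSAD recognition</name>
    <fileidentifiers>
      <fileidentifier>NARSAD</fileidentifier>
    </fileidentifiers>
  </dataformat>
  <dataformat>
    <name>SPFT</name>
    <fileidentifiers>
      <fileidentifier>SPFT</fileidentifier>
      <fileidentifier>SPPT</fileidentifier>
    </fileidentifiers>
  </dataformat>
</dataformats>
EOF

doc = REXML::Document.new(xml)

# The inner predicate is checked against every fileidentifier,
# so a match on the second identifier is not lost.
per_node = "//dataformat[fileidentifiers/fileidentifier[contains(text(),'SPPT')]]"

# Collect the <name> of each dataformat that matched.
matched_names = REXML::XPath.match(doc, per_node).map { |e| e.elements["name"].text }
p matched_names
```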

Btw, I have these bookmarked and they serve me well with regard to
XPath issues (I always have to look them up):
http://www.w3schools.com/xpath/default.asp
http://www.zvon.org/xxl/XPathTutorial/General/examples.html

(I use the first one most of the time.)

Kind regards

robert
 
Robert Klemme

I believe "contains" is the wrong function as it does a textual
comparison and I have no idea whether a node is actually allowed as
input. I believe the correct XPath expression is this:

Wait, change "correct" to "more appropriate".
"//dataformat[descendant::fileidentifier[text()='SPPT']]"

Here are some expressions that you may want to try:

Here are even more that yield the result you want (or so I believe):

[
  "//dataformat[descendant::fileidentifier[text()='SPPT']]",
  "//dataformat[fileidentifiers/fileidentifier[text()='SPPT']]",
  "//dataformat[descendant::fileidentifier[contains(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[contains(text(),'SPPT')]]",
  "//dataformat[descendant::fileidentifier[starts-with(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[starts-with(text(),'SPPT')]]",
  "//dataformat[descendant::fileidentifier[ends-with(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[ends-with(text(),'SPPT')]]",
].each do |xpath|
  printf "\nXPath: %p\n\n", xpath

  XPath.each doc, xpath do |elm|
    puts elm
  end
end

Interestingly ends-with() does not seem to work. Maybe we hit a REXML bug.
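
A possible explanation other than a bug: ends-with() was only introduced in XPath 2.0, and REXML, as far as I know, implements XPath 1.0, which has starts-with() and contains() but no ends-with(). Here is a small sketch that does the suffix test on the Ruby side instead (the trimmed sample XML and the names suffix_hits and formats are only for illustration):

```ruby
require "rexml/document"

xml = <<EOF
<dataformats>
  <dataformat>
    <name>SPFT</name>
    <fileidentifiers>
      <fileidentifier>SPFT</fileidentifier>
      <fileidentifier>SPPT</fileidentifier>
    </fileidentifiers>
  </dataformat>
</dataformats>
EOF

doc = REXML::Document.new(xml)

# Select the candidates with XPath, then apply the suffix test in plain Ruby.
suffix_hits = REXML::XPath.match(doc, "//fileidentifier").select do |elm|
  elm.text.to_s.end_with?("PT")
end

# Walk up from each matching fileidentifier to its enclosing dataformat.
formats = suffix_hits.map { |elm| elm.parent.parent }
formats.each { |fmt| puts fmt.elements["name"].text }
```

This keeps the XPath part within what XPath 1.0 offers and leaves the string handling to String#end_with?.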

XPath nicely fits Ruby because of TIMTOWTDI. :)

Kind regards

robert
 
