REXML XPath: bug or misunderstanding?

Eric Armstrong · Aug 2, 2006

This code looks for a table that matches
specific criteria:

XPath.each(@doc, '//table') do |tbl|
  XPath.each(tbl, '//td') do |td|
    # look for matching data
  end
end

According to the XPath doc, the first argument
is the "context" element.

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean "anywhere
within the context".

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

The workaround is to dispense with the
outer loop: once matching data is found,
keep visiting parents until the
table ancestor is found. (That patch
simplifies the code, actually.)
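
A minimal sketch of that workaround, for readers who want to try it. The sample document, the helper name enclosing_table, and the test for 'b2' are all made up for illustration; the original post does not show its data or its matching criteria.

```ruby
require 'rexml/document'

# Hypothetical sample data, not taken from the original post.
WORKAROUND_XML = <<~XML
  <html><body>
    <table id="first"><tr><td>a1</td><td>a2</td></tr></table>
    <table id="second"><tr><td>b1</td><td>b2</td></tr></table>
  </body></html>
XML

# Walk up the parent chain until a <table> element is reached.
# Returns nil when the node has no <table> ancestor.
def enclosing_table(node)
  node = node.parent while node && node.name != 'table'
  node
end

doc = REXML::Document.new(WORKAROUND_XML)

# Single loop over every cell; no outer loop over tables is needed.
REXML::XPath.each(doc, '//td') do |td|
  next unless td.text == 'b2' # stand-in for "look for matching data"

  table = enclosing_table(td)
  puts table.attributes['id'] if table
end
```

Because the search starts at the document, the leading // is harmless here, and the table is recovered from the matching cell rather than the other way round.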

But if this implementation isn't a bug, it
means that the definition of "context" is
"the entire tree in which the specified
node is found".

In that case, the XPath expression that
asks for "all <td> elements under the
current <table> node" must be something
other than what I coded...

What might that path be, I wonder?

Robert Klemme · Aug 2, 2006

Eric said:
This code looks for a table that matches
specific criteria:

XPath.each(@doc, '//table') do |tbl|
  XPath.each(tbl, '//td') do |td|
    # look for matching data

According to the XPath doc, the first argument
is the "context" element.

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean "anywhere
within the context".

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

That sounds like a bug.

The workaround is to dispense with the
outer loop: once matching data is found,
keep visiting parents until the
table ancestor is found. (That patch
simplifies the code, actually.)

IMHO the proper solution is to craft a single XPath expression that will
cover your requirement, i.e. any "td" somewhere below a "table".

But if this implementation isn't a bug, it
means that the definition of "context" is
"the entire tree in which the specified
node is found".

In that case, the XPath expression that
asks for "all <td> elements under the
current <table> node" must be something
other than what I coded...

What might that path be, I wonder?

I'm not too involved with XPath but did you try something like this?

//table/*/td
//table//td

IMHO having a single XPath expression is the preferred way to go.

Also, I usually use the method doc.elements.each 'xpath here' do ... instead
of the XPath.each call you used.
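
A small sketch of that style, assuming a made-up two-table document (the data and the helper name cell_texts are illustrative only). REXML's Elements#each accepts an XPath string and yields each matching element:

```ruby
require 'rexml/document'

# Hypothetical sample data for illustration.
ELEMENTS_XML = <<~XML
  <html><body>
    <table><tr><td>a1</td><td>a2</td></tr></table>
    <table><tr><td>b1</td><td>b2</td></tr></table>
  </body></html>
XML

# Collect the text of every <td> found somewhere below any <table>.
def cell_texts(doc)
  texts = []
  doc.elements.each('//table//td') { |td| texts << td.text }
  texts
end

puts cell_texts(REXML::Document.new(ELEMENTS_XML)).inspect
```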

Kind regards

robert

Leslie Viljoen · Aug 2, 2006

I have a collection of XPath examples on my Wiki, stolen from an old
Microsoft document I think:
http://mobeus.homelinux.org/eclectica/show/XmlPath

I haven't used it yet because I started playing with REXML yesterday,
but it's pretty detailed.

Les

Marcus Andersson · Aug 2, 2006

Eric Armstrong skrev:

This code looks for a table that matches
specific criteria:

XPath.each(@doc, '//table') do |tbl|
  XPath.each(tbl, '//td') do |td|
    # look for matching data

According to the XPath doc, the first argument
is the "context" element.

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean "anywhere
within the context".

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

// always uses the document root as the context regardless of the
context node provided. This is according to the XPath spec.

But if this implementation isn't a bug, it
means that the definition of "context" is
"the entire tree in which the specified
node is found".

Well, yes, when you are using // in the beginning of your path expression.

In that case, the XPath expression that
asks for "all <td> elements under the
current <table> node" must be something
other than what I coded...

What might that path be, I wonder?

To get all descendants of the current context node using a shorthand you do

.//td

The . in the beginning makes sure the expression will use the current
node as the context. There are also the following two XPath axes you can use

descendant-or-self::td (includes the context node itself)
and
descendant::td (equivalent to .//td)
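
To see the difference side by side, here is a rough sketch against a made-up document with three cells in one table and two in another. The ids, the data, and the helper td_count are assumptions for the demo, not anything from the original code.

```ruby
require 'rexml/document'

# Hypothetical sample data: 3 cells in the first table, 2 in the second.
CONTEXT_XML = <<~XML
  <html><body>
    <table id="first"><tr><td>a1</td><td>a2</td><td>a3</td></tr></table>
    <table id="second"><tr><td>b1</td><td>b2</td></tr></table>
  </body></html>
XML

# Count the <td> elements that `path` selects when it is evaluated with
# the table whose id attribute equals `table_id` as the context node.
def td_count(doc, table_id, path)
  table = doc.elements["//table[@id='#{table_id}']"]
  REXML::XPath.match(table, path).size
end

doc = REXML::Document.new(CONTEXT_XML)
['//td', './/td', 'descendant::td'].each do |path|
  puts "#{path}: #{td_count(doc, 'second', path)}"
end
```

With the second table as context, the absolute path should count every cell in the document, while the two relative forms should count only that table's own cells.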

/Marcus

Thomas, Mark - BLS CTR · Aug 2, 2006

Eric said:
This code looks for a table that matches specific criteria:

XPath.each(@doc, '//table') do |tbl|
  XPath.each(tbl, '//td') do |td|
    # look for matching data

Are you doing anything with tbl other than making it a stopping point
for iteration? You could remove one loop if you simply delve to
precisely what you're looking for:

XPath.each(@doc, '//table//td') do |td|
  # look for matching data
end

And, depending on how you define "matching data" you may be able to add
some XPath conditions that get rid of the loop entirely.
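
As a hedged illustration of that last point: if "matching data" happened to mean "a cell whose text is exactly b2" (an assumption, since the real criteria are not given), a predicate plus the ancestor axis could return the table directly. The sample document and the helper table_ids are invented for the sketch.

```ruby
require 'rexml/document'

# Hypothetical sample data for illustration.
PREDICATE_XML = <<~XML
  <html><body>
    <table id="first"><tr><td>a1</td><td>a2</td></tr></table>
    <table id="second"><tr><td>b1</td><td>b2</td></tr></table>
  </body></html>
XML

# Return the id attribute of every element selected by `path`.
def table_ids(doc, path)
  REXML::XPath.match(doc, path).map { |table| table.attributes['id'] }
end

doc = REXML::Document.new(PREDICATE_XML)

# Select the cell by its text, then step up to the enclosing table.
puts table_ids(doc, "//td[text()='b2']/ancestor::table").inspect
```

Whether this fits depends on how complicated the real matching rule is; anything beyond simple text or attribute tests may be easier to keep in Ruby.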

- Mark.

Eric Armstrong · Aug 4, 2006

Paul said:
That's what I thought when I first encountered it, but after looking
into it in more detail I concluded that it follows the specification.
The receiving element does indeed provide context, but using //
overrides it: // means the root node or any of its descendants.

Matches my experience.

...could have used: XPath.each(tbl, 'descendant::td')

Interesting construct. I guess that's a good way to go,
and I guess that matches the Xpath spec...but I'm
highly dubious about a spec that defines "context" as
something it can simply ignore. Makes little sense,
from my current perspective.

Eric Armstrong · Aug 4, 2006

Leslie said:
I have a collection of XPath examples on my Wiki, stolen from an old
Microsoft document I think:
http://mobeus.homelinux.org/eclectica/show/XmlPath

I haven't used it yet because I started playing with REXML yesterday,
but it's pretty detailed.

Good collection of examples. Just what the doctor
ordered for a fast fix...

Eric Armstrong · Aug 4, 2006

Marcus said:
// always uses the document root as the context regardless of the
context node provided. This is according to the XPath spec.

To get all descendants of the current context node using a shorthand you do

.//td

The . in the beginning makes sure the expression will use the current
node as the context.

Very useful syntax. Thanks.

I continue to be disgruntled by a syntax that ignores
the context you specified, unless you add the additional
"." to say, "No, I really mean it". But I thank you for
a fine solution, and the additional explanation.

Eric Armstrong · Aug 4, 2006

Thomas said:
You could remove one loop if you simply delve to
precisely what you're looking for:

XPath.each(@doc, '//table//td') do |td|
  # look for matching data

And, depending on how you define "matching data" you may be able to add
some XPath conditions that get rid of the loop entirely.

Most excellent. Some damn good Xpath expertise
on this list. Thanks, all.

eric
(Who hasn't used Xpath expressions in more than 3 years,
and who is entirely capable of forgetting everything he
ever knew in less than 6 months.)
:_)

Robert Klemme · Aug 4, 2006

Eric said:
Most excellent. Some damn good Xpath expertise
on this list. Thanks, all.

eric
(Who hasn't used Xpath expressions in more than 3 years,
and who is entirely capable of forgetting everything he
ever knew in less than 6 months.)
:_)

Same here: I rarely use XPath and I always have to look up the details
again. IMHO it's not very intuitive. My 0.02EUR...

Btw, I find http://www.xmlcooktop.com/ a handy tool for experimenting
with XPath expressions. It's an XML editor with an XPath evaluation
window where you can immediately see results. Nothing too fancy but I
liked the XPath direct evaluation.

Kind regards

robert

Robert Klemme · Aug 4, 2006

Robert said:
Same here: I rarely use XPath and I always have to look up the details
again. IMHO it's not very intuitive. My 0.02EUR...

PS: This is the page I use occasionally for refreshing:
http://www.w3schools.com/xpath/xpath_syntax.asp

Can anyone recommend http://www.oreilly.com/catalog/xpathpointer ?

robert

Keith Fahlgren · Aug 4, 2006

Can anyone recommend http://www.oreilly.com/catalog/xpathpointer ?

No, not really. I think it's much easier to learn XPath (an incredibly
powerful language once you get your head around it) in a more practical
environment (probably XSLT or XQuery). Quite a few of the XSLT books
have good introductions to XPath. I quite like the new edition of the
XSLT Cookbook, which has a section comparing XPath1 and XPath2, but
this may not cover enough of the basics for some folks.

http://www.oreilly.com/catalog/xsltckbk2/

(Note: Our first edition general XSLT book is being revised presently,
so I'd wait for the new edition)

HTH,
Keith

Robert Klemme · Aug 4, 2006

Keith said:
No, not really. I think it's much easier to learn XPath (an incredibly
powerful language once you get your head around it) in a more practical
environment (probably XSLT or XQuery).

So you're basically saying that the book is more similar to the standard
page at http://www.w3.org/TR/xpath - did I get you right?

Quite a few of the XSLT books
have good introductions to XPath. I quite like the new edition of the
XSLT Cookbook, which has a section comparing XPath1 and XPath2, but
this may not cover enough of the basics for some folks.

http://www.oreilly.com/catalog/xsltckbk2/

(Note: Our first edition general XSLT book is being revised presently,
so I'd wait for the new edition)

I'll probably go with the XPath / XPointer anyway as I prefer the more
thorough coverage over the easy learning path.

Thanks for your hints anyway!

Kind regards

robert

Eric Armstrong · Aug 13, 2006

Robert said:
I rarely use XPath and I always have to look up the details
again. IMHO it's not very intuitive. My 0.02EUR...

I find http://www.xmlcooktop.com/ a handy tool for experimenting
with XPath expressions. It's an XML editor with an XPath evaluation
window where you can immediately see results. Nothing too fancy but I
liked the XPath direct evaluation.

PS: This is the page I use occasionally for refreshing:
http://www.w3schools.com/xpath/xpath_syntax.asp

Again, greatly appreciated.

Eric Armstrong · Aug 13, 2006

Marcus said:
I suppose this is because XPath originated from XSLT. When using XPath in
XSLT you always have, implicitly, the current node as the context node.
Sometimes you need to break out of the context and then you do it by
putting a "/" or a "//" at the beginning of your path expression to search
from the root. XPath works extremely smoothly in most ways in XSLT.

But from a DOM and XPath perspective it's kind of silly...

That XSLT perspective may explain it, somewhat.
Thanks for the attempt to make it seem rational, at least.
:_)
