REXML XPath: bug or misunderstanding?

Eric Armstrong

This code looks for a table that matches
specific criteria:

XPath.each(@doc, '//table') do |tbl|
  XPath.each(tbl, '//td') do |td|
    # look for matching data
  end
end

According to the XPath doc, the first argument
is the "context" element.

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean "anywhere
within the context".

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

The workaround is to dispense with the
outer loop. Once matching data is found,
keep visiting parents until the
table ancestor is found. (That patch
simplifies the code, actually.)
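
A minimal sketch of that workaround, assuming REXML is available via require 'rexml/document'. The sample document, the match value 'beta', and the helper name enclosing_table are invented for illustration and are not from the original code:

```ruby
require 'rexml/document'

# Hypothetical sample document; the real data is not shown in this thread.
doc = REXML::Document.new(<<~XML)
  <html>
    <table id="first"><tr><td>alpha</td></tr></table>
    <table id="second"><tr><td>beta</td></tr></table>
  </html>
XML

# Hypothetical helper: climb from a node to its nearest <table> ancestor.
# Returns nil if the node is not inside a table.
def enclosing_table(node)
  node = node.parent while node && node.name != 'table'
  node
end

# Single loop over every <td> in the document; no outer table loop.
match = nil
REXML::XPath.each(doc, '//td') do |td|
  match = enclosing_table(td) if td.text == 'beta'
end

puts match.attributes['id']
```

The parent walk is done in plain Ruby here, so it does not depend on how the XPath context is interpreted.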

But if this implementation isn't a bug, it
means that the definition of "context" is
"the entire tree in which the specified
node is found".

In that case, the XPath expression that
asks for "all <td> elements under the
current <table> node" must be something
other than what I coded...

What might that path be, I wonder?
 
Robert Klemme

Eric said:
This code looks for a table that matches
specific criteria:

XPath.each(@doc, '//table') do |tbl|
XPath.each(tbl, '//td') do |td|
# look for matching data

According to the XPath doc, the first argument
is the "context" element.

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean "anywhere
within the context".

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

That sounds like a bug.

The workaround is to dispense with the
outer loop. Once matching data is found,
keep visiting parents until the
table ancestor is found. (That patch
simplifies the code, actually.)

IMHO the proper solution is to craft a single XPath expression that will
cover your requirement, i.e. any "td" somewhere below a "table".

But if this implementation isn't a bug, it
means that the definition of "context" is
"the entire tree in which the specified
node is found".

In that case, the XPath expression that
asks for "all <td> elements under the
current <table> node" must be something
other than what I coded...

What might that path be, I wonder?

I'm not too involved with XPath but did you try something like this?

//table/*/td
//table//td

IMHO having a single XPath expression is the preferred way to go.

Also, I usually use the method doc.elements.each 'xpath here' do ... instead
of calling XPath.each directly as you did.
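
For illustration, a small sketch of that style, assuming REXML is loaded with require 'rexml/document'. The sample markup is made up; it includes one td outside any table to show that the combined path skips it:

```ruby
require 'rexml/document'

# Hypothetical sample document, invented for this sketch.
doc = REXML::Document.new(
  '<html><table><tr><td>a</td><td>b</td></tr></table><p><td>stray</td></p></html>'
)

# elements.each accepts an XPath string and yields each matching element.
cells = []
doc.elements.each('//table//td') { |td| cells << td.text }

puts cells.inspect
```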

Kind regards

robert
 
Marcus Andersson

Eric Armstrong wrote:
This code looks for a table that matches
specific criteria:

XPath.each(@doc, '//table') do |tbl|
XPath.each(tbl, '//td') do |td|
# look for matching data

According to the XPath doc, the first argument
is the "context" element.

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean "anywhere
within the context".

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

A leading // always starts from the document root, regardless of the
context node provided. This is according to the XPath spec.

But if this implementation isn't a bug, it
means that the definition of "context" is
"the entire tree in which the specified
node is found".

Well, yes, when you are using // at the beginning of your path expression.

In that case, the XPath expression that
asks for "all <td> elements under the
current <table> node" must be something
other than what I coded...

What might that path be, I wonder?

To get all td descendants of the current context node using a shorthand you do

.//td

The . at the beginning makes sure the expression will use the current
node as the context. There are also the following two XPath axes you can use:

descendant-or-self::td (includes the context node itself)
and
descendant::td (equivalent to .//td)
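
The difference can be checked with a short script. This is only a sketch, assuming REXML is loaded with require 'rexml/document'; the two-table sample document is invented for the comparison:

```ruby
require 'rexml/document'

# Hypothetical sample document with two tables.
doc = REXML::Document.new(<<~XML)
  <html>
    <table id="first"><tr><td>alpha</td></tr></table>
    <table id="second"><tr><td>beta</td><td>gamma</td></tr></table>
  </html>
XML

# Use the second table as the context node for all three queries.
second = REXML::XPath.first(doc, '//table[@id="second"]')

# Leading // restarts at the document root, so the context is ignored.
absolute = REXML::XPath.match(second, '//td').map(&:text)

# Leading . anchors the search at the context node.
relative = REXML::XPath.match(second, './/td').map(&:text)

# The explicit axis form of the same search.
axis = REXML::XPath.match(second, 'descendant::td').map(&:text)

puts "//td           -> #{absolute.inspect}"
puts ".//td          -> #{relative.inspect}"
puts "descendant::td -> #{axis.inspect}"
```

If the behaviour described in this thread holds, the first query also reports the cell from the first table, while the other two report only the cells of the second table.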

/Marcus
 
Thomas, Mark - BLS CTR

Eric said:
This code looks for a table that matches specific criteria:

XPath.each(@doc, '//table') do |tbl|
XPath.each(tbl, '//td') do |td|
# look for matching data

Are you doing anything with tbl other than making it a stopping point
for iteration? You could remove one loop if you simply delve to
precisely what you're looking for:

XPath.each(@doc, '//table//td') do |td|
  # look for matching data
end

And, depending on how you define "matching data" you may be able to add
some XPath conditions that get rid of the loop entirely.
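
As a rough sketch of what such a condition might look like, assuming REXML is loaded with require 'rexml/document' and that "matching data" means an exact cell text. The sample document and the value 'gamma' are made up for the example:

```ruby
require 'rexml/document'

# Hypothetical sample document and match value, invented for this sketch.
doc = REXML::Document.new(<<~XML)
  <html>
    <table id="first"><tr><td>alpha</td></tr></table>
    <table id="second"><tr><td>beta</td><td>gamma</td></tr></table>
  </html>
XML

# One expression: every <td> anywhere below any <table>.
all_cells = REXML::XPath.match(doc, '//table//td').map(&:text)

# The same path with a predicate, so XPath does the filtering
# and no Ruby-side comparison loop is needed.
hits = REXML::XPath.match(doc, '//table//td[text()="gamma"]')

puts all_cells.inspect
hits.each { |td| puts td.text }
```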

- Mark.
 
Eric Armstrong

Paul said:
That's what I thought when I first encountered it, but after looking
into it in more detail I concluded that it follows the specification.
The receiving element does indeed provide context, but using //
overrides it: // means the root node or any of its descendants.

Matches my experience.

...could have used: XPath.each(tbl, 'descendant::td')

Interesting construct. I guess that's a good way to go,
and I guess that matches the XPath spec... but I'm
highly dubious about a spec that defines "context" as
something it can simply ignore. Makes little sense,
from my current perspective.
 
Eric Armstrong

Marcus said:
// always uses the document root as the context regardless of the
context node provided. This is according to the XPath spec.

To get all descendants of the current context node using a shorthand you do

.//td

The . in the beginning makes sure the expression will use the current
node as the context.

Very useful syntax. Thanks.

I continue to be disgruntled by a syntax that ignores
the context you specified, unless you add the additional
"." to say, "No, I really mean it". But I thank you for
a fine solution, and the additional explanation.
 
Eric Armstrong

Thomas said:
You could remove one loop if you simply delve to
precisely what you're looking for:

XPath.each(@doc, '//table//td') do |td|
# look for matching data

And, depending on how you define "matching data" you may be able to add
some XPath conditions that get rid of the loop entirely.

Most excellent. Some damn good XPath expertise
on this list. Thanks, all.

eric
(Who hasn't used XPath expressions in more than 3 years,
and who is entirely capable of forgetting everything he
ever knew in less than 6 months.)
:_)
 
Robert Klemme

Eric said:
Most excellent. Some damn good XPath expertise
on this list. Thanks, all.

eric
(Who hasn't used XPath expressions in more than 3 years,
and who is entirely capable of forgetting everything he
ever knew in less than 6 months.)
:_)

Same here: I rarely use XPath and I always have to look up the details
again. IMHO it's not very intuitive. My 0.02EUR...

Btw, I find http://www.xmlcooktop.com/ a handy tool for experimenting
with XPath expressions. It's an XML editor with an XPath evaluation
window where you can immediately see results. Nothing too fancy but I
liked the XPath direct evaluation.

Kind regards

robert
 
Keith Fahlgren


No, not really. I think it's much easier to learn XPath (an incredibly
powerful language once you get your head around it) in a more practical
environment (probably XSLT or XQuery). Quite a few of the XSLT books
have good introductions to XPath. I quite like the new edition of the
XSLT Cookbook, which has a section comparing XPath1 and XPath2, but
this may not cover enough of the basics for some folks.

http://www.oreilly.com/catalog/xsltckbk2/

(Note: Our first edition general XSLT book is being revised presently,
so I'd wait for the new edition)

HTH,
Keith
 
Robert Klemme

Keith said:
No, not really. I think it's much easier to learn XPath (an incredibly
powerful language once you get your head around it) in a more practical
environment (probably XSLT or XQuery).

So you're basically saying that the book is more similar to the standard
page at http://www.w3.org/TR/xpath - did I get you right?

Quite a few of the XSLT books
have good introductions to XPath. I quite like the new edition of the
XSLT Cookbook, which has a section comparing XPath1 and XPath2, but
this may not cover enough of the basics for some folks.

http://www.oreilly.com/catalog/xsltckbk2/

(Note: Our first edition general XSLT book is being revised presently,
so I'd wait for the new edition)

I'll probably go with the XPath / XPointer anyway as I prefer the more
thorough coverage over the easy learning path. :)

Thanks for your hints anyway!

Kind regards

robert
 
Eric Armstrong

Robert said:
I rarely use XPath and I always have to look up the details
again. IMHO it's not very intuitive. My 0.02EUR...

I find http://www.xmlcooktop.com/ a handy tool for experimenting
with XPath expressions. It's an XML editor with an XPath evaluation
window where you can immediately see results. Nothing too fancy but I
liked the XPath direct evaluation.

PS: This is the page I use occasionally for refreshing:
http://www.w3schools.com/xpath/xpath_syntax.asp

Again, greatly appreciated.
 
Eric Armstrong

Marcus said:
I suppose this is because XPath originated from XSLT. When using XPath in
XSLT you always have, implicitly, the current node as the context node.
Sometimes you need to break out of the context and then you do it by
prepending a "/" or a "//" to the beginning of your path expression to
search from the root. XPath works extremely smoothly in most ways in XSLT.

But from a DOM and XPath perspective it's kind of silly...

That XSLT perspective may explain it, somewhat.
Thanks for the attempt to make it seem rational, at least.
:_)
 
