Ways to tell if two XPath expressions have intersecting results

W

Weston

Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

The obviously way would be to iterate over the set of nodes from one
expression, checking each one to see if it's also contained in the
other set, but I'm wondering if there might be ways of doing this by
looking at or rewriting the actual XPath expressions themselves,
without evaluating them.
 
J

Joseph Kesselman

Weston said:
Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

In general, no. Appropriate logic on the paths can recognize cases that
must intersect, and cases that definitely can't intersect, but there's
going to be a huge set in the middle which depends on the actual data
the paths are being evaluated against.

Folks who have worked on XML processing have explored, published, and/or
patented some related techniques. So a literature search would seem to
be a good place to start... Sorry, I don't have any pointers already on
hand to pass you.
 
A

Anthony Jones

Weston said:
Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

The obviously way would be to iterate over the set of nodes from one
expression, checking each one to see if it's also contained in the
other set, but I'm wondering if there might be ways of doing this by
looking at or rewriting the actual XPath expressions themselves,
without evaluating them.

(count(XPath1) + count(XPath2)) != Count(XPath1 | XPath2)

When creating union of node sets with the | operator any node common to both
sides will only be included in the resulting set once. Hence the above
boolean expression is true when XPath1 and XPath2 contain at least one Node
in common.
 
J

Joseph Kesselman

Anthony said:
(count(XPath1) + count(XPath2)) != Count(XPath1 | XPath2)

I was presuming the question was how to determine this for all possible
input documents (ie, only via analysis of the XPaths), rather than for a
specific set of inputs. But I may have misunderstood.
 
J

Joe Fawcett

Anthony Jones said:
(count(XPath1) + count(XPath2)) != Count(XPath1 | XPath2)

When creating union of node sets with the | operator any node common to
both
sides will only be included in the resulting set once. Hence the above
boolean expression is true when XPath1 and XPath2 contain at least one
Node
in common.
As there are an infinite number of XPaths for any sequence of nodes in a
document it seems unlikely that you could do this.
<a>
<b>
<c/>
</b>
</a>
How could you tell that c/.. = a/* , especially without a schema of some
sort, just by examination?

As well as Anthony's technique there is an official 'intersect' operator if
you actually need the nodes.
 
D

Dimitre Novatchev

Weston said:
Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

The obviously way would be to iterate over the set of nodes from one
expression, checking each one to see if it's also contained in the
other set, but I'm wondering if there might be ways of doing this by
looking at or rewriting the actual XPath expressions themselves,
without evaluating them.

In XPath 1.0 the following XPath expression selects all nodes that are the
intersection (belong to both) of two node-sets:

$ns1[count(. | $ns2) = count($ns2)]

Also knowns as the Kaysian (after Michael Kay) method of finding the
intersection of two node-sets.

In the above XPath expression just replace $ns1 and $ns2 with your two XPath
expressions.


Cheers,
Dimitre Novatchev
 
W

Weston

I was presuming the question was how to determine this for all possible
input documents (ie, only via analysis of the XPaths), rather than for a
specific set of inputs. But I may have misunderstood.

You're right, that's what I was originally hoping for, but I've
realized it's impossible for the reasons Joe F pointed out. So I'm
definitely open to helpful tips that involve evaluating the
expressions against a document, and Anthony's test looks like one
thing that would work for me.

I've also realized that the problem I'm trying to solve might be
reducible to a comparison between two expressions which are known to
resolve to single nodes. If this turns out to be the case, then
perhaps one could solve the problem by comparing canonical xpath
expressions for the given nodes, right? Are there convenient ways of
getting a canonical expression?

As well as Anthony's technique there is an official 'intersect' operator if
you actually need the nodes.

I'd had no idea -- that would *definitely* solve the problem. :) Looks
like it's part of XPath 2, though... is it commonly implemented in
processors? Guess I'll have to check mine to see if it works.

In XPath 1.0 the following XPath expression selects all nodes that are the
intersection (belong to both) of two node-sets:

$ns1[count(. | $ns2) = count($ns2)]

Thank you! This will especially be helpful if the intersection
operator isn't supported. I'd like to understand how it works, though,
and I don't think I do.
 
J

Joseph Kesselman

$ns1[count(. | $ns2) = count($ns2)]
I'd like to understand how it works, though, and I don't think I do.

The predicate is true if the union of . ("this node") and the node set
$ns2 is the same length as $ns2 by itself... in other words, if . is
already present in $ns2.

So applying the predicate to nodeset $ns1 yields "those nodes in $ns1
which are present in $ns2"... in other words, the intersection of the
two sets.
 
P

Philippe Poulard

Weston a écrit :
You're right, that's what I was originally hoping for, but I've
realized it's impossible for the reasons Joe F pointed out. So I'm
definitely open to helpful tips that involve evaluating the
expressions against a document, and Anthony's test looks like one
thing that would work for me.

I've also realized that the problem I'm trying to solve might be
reducible to a comparison between two expressions which are known to
resolve to single nodes. If this turns out to be the case, then
perhaps one could solve the problem by comparing canonical xpath
expressions for the given nodes, right?

Right.

However, it depends on what you call "canonical path" ; my own
definition is here :
http://ns.inria.org/active-tags/glossary/glossary.html#canonical-path
> Are there convenient ways of getting a canonical expression?

I've got a class that can be built from a DOM node :
http://reflex.gforge.inria.fr/javadoc/org/inria/reflex/xml/CanonicalPath.html
and 2 canonical paths can be compared, or tested to check if a node
addressed by one is reachable by the other (browse the doc of the class
for details)

Additionally, if you have to store that canonical path inside an element
(say a DOM attribute), the host element will take care of the prefix
mappings of the canonical path, and eventually will redefine prefixes to
avoid clashes.

--
Cordialement,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top