Ways to tell if two XPath expressions have intersecting results

Weston · Oct 18, 2007

Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

The obviously way would be to iterate over the set of nodes from one
expression, checking each one to see if it's also contained in the
other set, but I'm wondering if there might be ways of doing this by
looking at or rewriting the actual XPath expressions themselves,
without evaluating them.

Joseph Kesselman · Oct 18, 2007

Weston said:
Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

In general, no. Appropriate logic on the paths can recognize cases that
must intersect, and cases that definitely can't intersect, but there's
going to be a huge set in the middle which depends on the actual data
the paths are being evaluated against.

Folks who have worked on XML processing have explored, published, and/or
patented some related techniques. So a literature search would seem to
be a good place to start... Sorry, I don't have any pointers already on
hand to pass you.

Anthony Jones · Oct 18, 2007

Weston said:
Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

The obviously way would be to iterate over the set of nodes from one
expression, checking each one to see if it's also contained in the
other set, but I'm wondering if there might be ways of doing this by
looking at or rewriting the actual XPath expressions themselves,
without evaluating them.

(count(XPath1) + count(XPath2)) != Count(XPath1 | XPath2)

When creating union of node sets with the | operator any node common to both
sides will only be included in the resulting set once. Hence the above
boolean expression is true when XPath1 and XPath2 contain at least one Node
in common.

Joseph Kesselman · Oct 18, 2007

Anthony said:
(count(XPath1) + count(XPath2)) != Count(XPath1 | XPath2)

I was presuming the question was how to determine this for all possible
input documents (ie, only via analysis of the XPaths), rather than for a
specific set of inputs. But I may have misunderstood.

Joe Fawcett · Oct 19, 2007

Anthony Jones said:
(count(XPath1) + count(XPath2)) != Count(XPath1 | XPath2)

When creating union of node sets with the | operator any node common to
both
sides will only be included in the resulting set once. Hence the above
boolean expression is true when XPath1 and XPath2 contain at least one
Node
in common.

As there are an infinite number of XPaths for any sequence of nodes in a
document it seems unlikely that you could do this.
<a>
<b>
<c/>
</b>
</a>
How could you tell that c/.. = a/* , especially without a schema of some
sort, just by examination?

As well as Anthony's technique there is an official 'intersect' operator if
you actually need the nodes.

Dimitre Novatchev · Oct 19, 2007

Weston said:
Are there any quick, reliable shortcuts to determine if two arbitrary
XPath expressions yield result sets that intersect?

The obviously way would be to iterate over the set of nodes from one
expression, checking each one to see if it's also contained in the
other set, but I'm wondering if there might be ways of doing this by
looking at or rewriting the actual XPath expressions themselves,
without evaluating them.

In XPath 1.0 the following XPath expression selects all nodes that are the
intersection (belong to both) of two node-sets:

$ns1[count(. | $ns2) = count($ns2)]

Also knowns as the Kaysian (after Michael Kay) method of finding the
intersection of two node-sets.

In the above XPath expression just replace $ns1 and $ns2 with your two XPath
expressions.

Cheers,
Dimitre Novatchev

Weston · Oct 19, 2007

I was presuming the question was how to determine this for all possible
input documents (ie, only via analysis of the XPaths), rather than for a
specific set of inputs. But I may have misunderstood.

You're right, that's what I was originally hoping for, but I've
realized it's impossible for the reasons Joe F pointed out. So I'm
definitely open to helpful tips that involve evaluating the
expressions against a document, and Anthony's test looks like one
thing that would work for me.

I've also realized that the problem I'm trying to solve might be
reducible to a comparison between two expressions which are known to
resolve to single nodes. If this turns out to be the case, then
perhaps one could solve the problem by comparing canonical xpath
expressions for the given nodes, right? Are there convenient ways of
getting a canonical expression?

As well as Anthony's technique there is an official 'intersect' operator if
you actually need the nodes.

I'd had no idea -- that would *definitely* solve the problem.

Looks
like it's part of XPath 2, though... is it commonly implemented in
processors? Guess I'll have to check mine to see if it works.

In XPath 1.0 the following XPath expression selects all nodes that are the
intersection (belong to both) of two node-sets:

$ns1[count(. | $ns2) = count($ns2)]

Thank you! This will especially be helpful if the intersection
operator isn't supported. I'd like to understand how it works, though,
and I don't think I do.

Joseph Kesselman · Oct 19, 2007

$ns1[count(. | $ns2) = count($ns2)]

I'd like to understand how it works, though, and I don't think I do.

The predicate is true if the union of . ("this node") and the node set
$ns2 is the same length as $ns2 by itself... in other words, if . is
already present in $ns2.

So applying the predicate to nodeset $ns1 yields "those nodes in $ns1
which are present in $ns2"... in other words, the intersection of the
two sets.

Philippe Poulard · Oct 22, 2007

Weston a écrit :

You're right, that's what I was originally hoping for, but I've
realized it's impossible for the reasons Joe F pointed out. So I'm
definitely open to helpful tips that involve evaluating the
expressions against a document, and Anthony's test looks like one
thing that would work for me.

I've also realized that the problem I'm trying to solve might be
reducible to a comparison between two expressions which are known to
resolve to single nodes. If this turns out to be the case, then
perhaps one could solve the problem by comparing canonical xpath
expressions for the given nodes, right?

Right.

However, it depends on what you call "canonical path" ; my own
definition is here :
http://ns.inria.org/active-tags/glossary/glossary.html#canonical-path

> Are there convenient ways of getting a canonical expression?

I've got a class that can be built from a DOM node :
http://reflex.gforge.inria.fr/javadoc/org/inria/reflex/xml/CanonicalPath.html
and 2 canonical paths can be compared, or tested to check if a node
addressed by one is reachable by the other (browse the doc of the class
for details)

Additionally, if you have to store that canonical path inside an element
(say a DOM attribute), the host element will take care of the prefix
mappings of the canonical path, and eventually will redefine prefixes to
avoid clashes.

--
Cordialement,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !

Using XSLT and XPath for graph data structure processing?	6	Jan 11, 2004
Request for Feedback; a module making it easier to use regular expressions.	1	Jan 31, 2005
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Jan 12, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Mar 1, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Dec 15, 2007
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Nov 1, 2007
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Apr 1, 2008

Ways to tell if two XPath expressions have intersecting results

Weston

Joseph Kesselman

Anthony Jones

Joseph Kesselman

Joe Fawcett

Dimitre Novatchev

Weston

Joseph Kesselman

Philippe Poulard

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads