XPath subtree pattern matching


ahogue at theory dot lcs dot mit dot edu

Hello -

Is there any way to match complex subtree patterns with XPath? The
functions I see all seem to match along a single path from root to leaf.
I would like to match full subtrees.

For example, given the XHTML:

<html>
<body>
<p>
<a>#text</a>
<br/>
#text
<b>#text</b>
#text
<br/>
<font>
<a>#text</a>
</font>
</p>
<p>
<a>#text</a>
<br/>
#text
<br/>
<font>
<a>#text</a>
</font>
</p>
</body>
</html>

I would like to construct a "pattern" using XPath to match all subtrees
like:

<p>
<a>*</a>
<br/>
*
(<b>*</b>)?
(*)?
<br/>
<font>
<a>*</a>
</font>
</p>

where the "*" means that any text can be matched, and the "?" means that
0 or 1 instances of the item may be matched, similar to a regular
expression.

Is there an easy way to do this kind of "subtree pattern matching" in
XPath? Would I be better off writing a wrapper over XPath and using
several XPath queries to represent and retrieve my pattern?
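One way to realize the wrapper idea is sketched below, under several assumptions: it uses Python's standard xml.etree.ElementTree rather than an XPath engine, it treats the "#text" placeholders as real text, and the token scheme and the names child_tokens, matches_pattern and CHILD_PATTERN are hypothetical. Each <p>'s children are flattened into a token string ("T" for a non-blank text run), and an ordinary regular expression supplies the "?" semantics.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<html><body>
<p>
<a>#text</a>
<br/>
#text
<b>#text</b>
#text
<br/>
<font>
<a>#text</a>
</font>
</p>
<p>
<a>#text</a>
<br/>
#text
<br/>
<font>
<a>#text</a>
</font>
</p>
</body></html>"""

# Token-level form of:  <a>*</a> <br/> * (<b>*</b>)? (*)? <br/> <font>...</font>
# Two text runs cannot be adjacent in a parsed tree, so without the <b>
# the "* (*)?" part collapses into a single T.
CHILD_PATTERN = re.compile(r"a br T (?:b (?:T )?)?br font")

def child_tokens(elem):
    """Flatten an element's children into tag names, with 'T' for text."""
    tokens = []
    if elem.text and elem.text.strip():
        tokens.append("T")
    for child in elem:
        tokens.append(child.tag)
        if child.tail and child.tail.strip():  # text following the child
            tokens.append("T")
    return " ".join(tokens)

def matches_pattern(p):
    """True if <p> matches the subtree pattern from the question."""
    if not CHILD_PATTERN.fullmatch(child_tokens(p)):
        return False
    children = list(p)  # element children only
    first, font = children[0], children[-1]
    return (
        len(first) == 0                    # <a>*</a>: text only
        and child_tokens(font) == "a"      # <font> holds exactly one <a>
        and len(font[0]) == 0              # ...which is text only
        and all(len(c) == 0 for c in children if c.tag == "b")  # <b>*</b>
    )

root = ET.fromstring(DOC)
matches = [p for p in root.iter("p") if matches_pattern(p)]
print(len(matches))  # prints 2
```

Flattening the children lets a plain regex handle the optional items, while the nested parts (<a>, <font>) are checked structurally; a deeper pattern would need the same check applied recursively.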

Thanks in advance,

Andrew Hogue
 

Sebastian Schaffert

"ahogue at theory dot lcs dot mit dot edu" wrote:
> Hello -
>
> Is there any way to match complex subtree patterns with XPath? The
> functions I see all seem to match along a single path from root to leaf.
> I would like to match full subtrees.

XPath is basically a tree language, not a path language, so you *can*
specify tree patterns. This is usually done with qualifiers (predicates
in square brackets). For example, to match

<f>
<a/>
<b>Text</b>
<c>Other Text</c>
</f>

and select the b element (whose text is "Text"), you could use the
following XPath expression:
f/b[preceding-sibling::a][following-sibling::c]
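Python's standard xml.etree.ElementTree supports only a small XPath subset with no sibling axes, so the sketch below (an illustration, not Sebastian's code; the helper name b_between_a_and_c is hypothetical) emulates the two qualifiers by checking positions among f's element children. It is applied to an <f> element directly, whereas f/b is evaluated from f's parent.

```python
import xml.etree.ElementTree as ET

def b_between_a_and_c(f):
    """Emulate f/b[preceding-sibling::a][following-sibling::c] for one <f>."""
    children = list(f)
    result = []
    for i, child in enumerate(children):
        if child.tag != "b":
            continue
        # [preceding-sibling::a]: some earlier sibling is an <a>
        has_a_before = any(s.tag == "a" for s in children[:i])
        # [following-sibling::c]: some later sibling is a <c>
        has_c_after = any(s.tag == "c" for s in children[i + 1:])
        if has_a_before and has_c_after:
            result.append(child)
    return result

doc = ET.fromstring("<f><a/><b>Text</b><c>Other Text</c></f>")
selected = b_between_a_and_c(doc)
print([b.text for b in selected])  # prints ['Text']
```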

However, tree matching in XPath has two restrictions:
1. It is not "nice": you essentially encode the tree in a linear
representation, which is not straightforward to write or read because
it does not resemble the XML document.
2. It is not possible to select content at several positions at once
(e.g. "Text" and "Other Text" together).

I don't want to advertise too much again, but you might want to have a
look at http://www.xcerpt.org for a language with "real" tree patterns.
 

Dimitre Novatchev

As easy as:

node()[count(ancestor-or-self::someNode | theRoot-someNode)
=
count(ancestor-or-self::someNode )
]

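This matches all nodes of the tree whose root is theRoot-someNode, a
specific "someNode" element. Because a union contains no duplicate nodes,
adding theRoot-someNode leaves the count unchanged exactly when it is
already one of the node's "someNode" ancestors-or-self.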
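The count trick can be sketched outside XPath with Python sets, assuming the standard xml.etree.ElementTree. The parent map, the helpers ancestor_or_self and in_subtree, and the sample document (with id attributes only labelling nodes) are hypothetical. Only element nodes are tested here, whereas node() in XPath also covers text nodes.

```python
import xml.etree.ElementTree as ET

def ancestor_or_self(node, parent_of, tag):
    """Elements named `tag` on the ancestor-or-self axis of `node`."""
    found = []
    while node is not None:
        if node.tag == tag:
            found.append(node)
        node = parent_of.get(node)  # None once past the document element
    return found

def in_subtree(node, the_root, parent_of, tag="someNode"):
    """Emulate count(ancestor-or-self::tag | the_root) = count(ancestor-or-self::tag).

    A set, like an XPath node-set, holds no duplicates, so adding the_root
    leaves the size unchanged exactly when the_root is already on the axis.
    """
    axis = set(ancestor_or_self(node, parent_of, tag))
    return len(axis | {the_root}) == len(axis)

# Hypothetical sample document.
DOC = ("<someNode id='top'><someNode id='r'><x/>"
       "<someNode id='inner'><y/></someNode></someNode><z/></someNode>")
root_doc = ET.fromstring(DOC)
parent_of = {child: parent for parent in root_doc.iter() for child in parent}
the_root = root_doc.find("someNode")  # plays the role of theRoot-someNode

inside = [n.get("id") or n.tag for n in root_doc.iter()
          if in_subtree(n, the_root, parent_of)]
print(inside)  # prints ['r', 'x', 'inner', 'y']
```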

If we simply want to select all nodes of a given tree, we can use the
following simpler XPath expression. It is not a match pattern, because the
location steps (as opposed to the predicates) of a match pattern may only
use the child and attribute axes:

theRoot-someNode//descendant-or-self::node()

This selects all nodes of the tree whose root is the "theRoot-someNode"
element.
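For comparison, Python's standard xml.etree.ElementTree (assumed here) has no descendant-or-self axis as such, but Element.iter() visits an element and all its descendant elements in document order. Text is stored in .text/.tail attributes rather than in separate nodes, so Element.itertext() collects it. The document and the name the_root are hypothetical stand-ins for theRoot-someNode.

```python
import xml.etree.ElementTree as ET

DOC = "<doc><theRoot><a>one</a><b><c/></b></theRoot><other/></doc>"
doc = ET.fromstring(DOC)
the_root = doc.find("theRoot")  # hypothetical stand-in for theRoot-someNode

# Element.iter(): the element itself plus all descendants, in document
# order, i.e. descendant-or-self restricted to element nodes.
elements = [e.tag for e in the_root.iter()]

# XPath's node() also matches text nodes; itertext() yields the text inside
# the subtree (the root's own tail belongs to its parent and is excluded).
texts = list(the_root.itertext())

print(elements)  # prints ['theRoot', 'a', 'b', 'c']
print(texts)     # prints ['one']
```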

=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL



