XPath subtree pattern matching

Discussion in 'XML' started by ahogue at theory dot lcs dot mit dot edu, Jul 28, 2003.

  1. Hello -

    Is there any way to match complex subtree patterns with XPath? The
    functions I see all seem to match along a single path from root to leaf.
    I would like to match full subtrees.

    For example, given the XHTML:

    <html>
    <body>
    <p>
    <a>#text</a>
    <br/>
    #text
    <b>#text</b>
    #text
    <br/>
    <font>
    <a>#text</a>
    </font>
    </p>
    <p>
    <a>#text</a>
    <br/>
    #text
    <br/>
    <font>
    <a>#text</a>
    </font>
    </p>
    </body>
    </html>

    I would like to construct a "pattern" using XPath to match all subtrees
    like:

    <p>
    <a>*</a>
    <br/>
    *
    (<b>*</b>)?
    (*)?
    <br/>
    <font>
    <a>*</a>
    </font>
    </p>

    where the "*" means that any text can be matched, and the "?" means that
    0 or 1 instances of the item may be matched, similar to a regular
    expression.

    Is there an easy way to do this kind of "subtree pattern matching" in
    XPath? Would I be better off writing a wrapper over XPath and using
    several XPath queries to represent and retrieve my pattern?

    Thanks in advance,

    Andrew Hogue
    ahogue at theory dot lcs dot mit dot edu, Jul 28, 2003
    #1

  2. ahogue at theory dot lcs dot mit dot edu wrote:

    > Hello -
    >
    > Is there any way to match complex subtree patterns with XPath? The
    > functions I see all seem to match along a single path from root to leaf.
    > I would like to match full subtrees.
    >


    XPath is basically a tree language, not a path language, so you *can*
    specify tree patterns. This is usually done by using qualifiers. To match
    e.g.

    <f>
    <a/>
    <b>Text</b>
    <c>Other Text</c>
    </f>

    and select "Text", an XPath expression could be used as follows:
    f/b[preceding-sibling::a][following-sibling::c]

    However, tree matching in XPath has two restrictions:
    1. It is not "nice", since you basically encode the tree in a linear
    representation which is not straightforward, as it does not
    resemble the XML document
    2. It is not possible to select content at several positions (e.g.
    "Text" and "Other Text" together)
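A minimal sketch of the qualifier idea, assuming Python's standard xml.etree.ElementTree module. That module implements only a subset of XPath and has no sibling axes, so the sketch uses the child-existence qualifiers f[a][c]/b instead of the preceding-sibling/following-sibling form; unlike that form, it does not check the order of the children. The wrapping <doc> element and the second <f> are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <f> example plus one <f> that should not match.
DOC = """
<doc>
  <f><a/><b>Text</b><c>Other Text</c></f>
  <f><b>No siblings</b></f>
</doc>
"""

root = ET.fromstring(DOC)

# Qualifiers: only <f> elements that have both an <a> and a <c> child.
# This checks existence only, not that <a> comes before <b> and <c> after it.
matched_b = [b.text for b in root.findall("f[a][c]/b")]

# Restriction 2: one expression selects one position, so picking up
# "Text" and "Other Text" together takes one query per position.
pairs = [(f.findtext("b"), f.findtext("c")) for f in root.findall("f[a][c]")]

print(matched_b)
print(pairs)
```

The second list shows the kind of thin wrapper the second restriction forces: match the subtree root once, then query each position of interest relative to it.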

    I don't want to advertise too much again, but if you want a language
    with "real" tree patterns, you might want to have a look at
    http://www.xcerpt.org.

    --
    Sebastian

    PGP Key fingerprint =
    13 1D 2E 4F 20 3E C9 1F 4C 57 52 87 8A 80 48 4D F5 E9 97 EC
    Sebastian Schaffert, Jul 28, 2003
    #2

  3. As easy as:

    node()[count(ancestor-or-self::someNode | theRoot-someNode)
    =
    count(ancestor-or-self::someNode )
    ]

    This matches all nodes of the tree with root theRoot-someNode, which is a
    specific "someNode" element.
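The test count(A | R) = count(A) is a set-membership check: adding R to the set A leaves its size unchanged only if R is already in A. A rough Python illustration of that idea, assuming the standard xml.etree.ElementTree module; the sample document, the tag name "s" and the helper ancestors_or_self are made up for the sketch, and only element nodes are considered (XPath's node() also covers text and comment nodes).

```python
import xml.etree.ElementTree as ET

# Hypothetical document with nested and sibling <s> elements.
DOC = "<r><s id='1'><x/><s id='2'><y/></s></s><s id='3'><z/></s></r>"
root = ET.fromstring(DOC)

# ElementTree has no parent pointers, so build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}

def ancestors_or_self(node, tag):
    """Set of elements named `tag` on the path from `node` up to the root."""
    found = set()
    while node is not None:
        if node.tag == tag:
            found.add(node)
        node = parent.get(node)
    return found

the_root = root.find("s[@id='1']")  # plays the role of theRoot-someNode

in_subtree = []
for node in root.iter():
    anc = ancestors_or_self(node, "s")
    # count(A | R) = count(A) holds exactly when R is already in A.
    if len(anc | {the_root}) == len(anc):
        in_subtree.append(node.tag + node.get("id", ""))

print(in_subtree)
```

Only the elements at or below the chosen <s id='1'> pass the test; the sibling <s id='3'> subtree and the document element do not.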

    If we simply want to select all nodes of a given tree, we can use the
    following simpler XPath expression. It is not a match pattern, because
    the location steps (not the predicates) of a match pattern may only use
    the child and attribute axes:

    theRoot-someNode/descendant-or-self::node()

    This selects all nodes of the tree with root a "theRoot-someNode" element.
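For the original pattern with optional items, one possible wrapper is sketched below. It is not XPath: it swaps in a regular expression over an encoding of each element's child sequence, using Python's standard re and xml.etree.ElementTree modules. The encoding (one <tag> token per child element, T per non-blank text run), the names signature, P_PATTERN and matches, and the third non-matching <p> are all assumptions made for illustration. It also assumes "*" means at least some non-blank text.

```python
import re
import xml.etree.ElementTree as ET

# The XHTML from the question, plus a third <p> that should not match.
XHTML = """<html><body>
<p><a>#text</a><br/>#text<b>#text</b>#text<br/><font><a>#text</a></font></p>
<p><a>#text</a><br/>#text<br/><font><a>#text</a></font></p>
<p><a>#text</a><b>#text</b></p>
</body></html>"""

def signature(elem):
    """Encode the child sequence: <tag> per element, T per non-blank text run."""
    parts = ["T"] if (elem.text or "").strip() else []
    for child in elem:
        parts.append(f"<{child.tag}>")
        if (child.tail or "").strip():
            parts.append("T")
    return "".join(parts)

# <a>*</a> <br/> * (<b>*</b>)? (*)? <br/> <font>...</font> at the <p> level.
P_PATTERN = re.compile(r"<a><br>T(<b>)?T?<br><font>")

def matches(p):
    if not P_PATTERN.fullmatch(signature(p)):
        return False
    # Nested levels are checked the same way:
    # <a> holds text only, <font> holds exactly one text-only <a>.
    return (signature(p.find("a")) == "T"
            and signature(p.find("font")) == "<a>"
            and signature(p.find("font/a")) == "T")

root = ET.fromstring(XHTML)
hits = [matches(p) for p in root.iter("p")]
print(hits)
```

The "?" items of the pattern map directly onto regular-expression optionals, which is the part that is awkward to express in a single XPath 1.0 expression.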

    =====
    Cheers,

    Dimitre Novatchev.
    http://fxsl.sourceforge.net/ -- the home of FXSL



    Dimitre Novatchev, Jul 29, 2003
    #3