Hierarchical query

Vadim Tropashko · Jun 13, 2007

Reposting with more clarification (as Jan asked).

Suppose I have a BNFgrammar and a source text parsed into a tree. How
would I query an
identifier declaration?

All the XQuery tutorials (where I went to gather some ideas) start
with simpleminded examples like browsing all the descendants of /
bookstore/book (and the bookstore XML design is such wrong idea to
boot).

Perhaps some example is needed. A simplified grammar:

statement_block:
'declare'
declaration_item_list
'begin'
statements
'end'
;

statements:
(statement_block | assignment) statements |
;

assignment:
identifier ':=' (identifier | number) ';'
;

declaration_item_list:
identifier 'integer' ';'
;

Suppose we parse the following text

declare -- token #1
i integer; -- tokens 2,3,4
begin
i := 1; -- tokens 6,7,8,9
end;

So that we get the derivation tree like this:

1 statement_block
1.1 'declare' (matches token #1)
1.2 declaration_item_list
1.2.1 identifier (matches token #2)
1.2.2 'integer' (matches token #3)
1.2.3 ';' (matches token #4)
1.3 'begin' (matches token #5)
1.4 statements
1.4.1 assignment
1.4.1.1 identifier
1.4.1.2 ':='
1.4.1.3 number
1.4.1.4 ';'
1.4.2 statements (matches empty token)
1.5 'end';

Now, given a derivation tree and the leaf node identifier 1.4.1.1
corresponding to the token #6 which is variable i, how do we find the
node 1.2.1 that declares it?

Jan Hidders · Jun 13, 2007

Reposting with more clarification (as Jan asked).

Suppose I have a BNFgrammar and a source text parsed into a tree. How
would I query an
identifier declaration?

All the XQuery tutorials (where I went to gather some ideas) start
with simpleminded examples like browsing all the descendants of /
bookstore/book (and the bookstore XML design is such wrong idea to
boot).

Perhaps some example is needed. A simplified grammar:

statement_block:
'declare'
declaration_item_list
'begin'
statements
'end'
;

statements:
(statement_block | assignment) statements |
;

assignment:
identifier ':=' (identifier | number) ';'
;

declaration_item_list:
identifier 'integer' ';'
;

Suppose we parse the following text

declare -- token #1
i integer; -- tokens 2,3,4
begin
i := 1; -- tokens 6,7,8,9
end;

So that we get the derivation tree like this:

1 statement_block
1.1 'declare' (matches token #1)
1.2 declaration_item_list
1.2.1 identifier (matches token #2)
1.2.2 'integer' (matches token #3)
1.2.3 ';' (matches token #4)
1.3 'begin' (matches token #5)
1.4 statements
1.4.1 assignment
1.4.1.1 identifier
1.4.1.2 ':='
1.4.1.3 number
1.4.1.4 ';'
1.4.2 statements (matches empty token)
1.5 'end';

Now, given a derivation tree and the leaf node identifier 1.4.1.1
corresponding to the token #6 which is variable i, how do we find the
node 1.2.1 that declares it?

Ok. I'm going to assume the following DTD (in a notation of my own
making to make it a bit more readable) for the syntax tree. (It uses
no attributes to keep things simple):

<stat_bl> --> <decl_kw> <decl_item_list> <begin_kw> <statements>
<semicol_kw> <end-kw>
<decl_kw> --> EMPTY
<begin_kw> --> EMPTY
<end_kw> --> EMPTY
<semicol_kw> --> EMPTY
<statements> --> ( <stat_bl> | <assignment> )+
<assignment> --> <var> <assign_kw> ( <var> | <number> ) <semicol_kw>
<var> --> PCDATA
<assign_kw> --> EMPTY
<number> --> PCDATA
<decl_item_list> --> ( <var> <type> <semicol_kw> )+
<type> --> PCDATA

For starters I'll first do the reverse query, so I will assume there
is a variable $dvar that contains a <var> element that describes a
variable in a declaration. The XPath expression that walks to all the
<var> nodes in an assignment that are in the scope of $dvar is as
follows:

(1) $dvar/(
(2) ../../statements//assignment/var[string() = $dvar/string()]
(3) minus
(4) ../../statements//stat_bl[decl_item_list/var/string() = $dvar/
string()]/statements//var
(5) )

The idea is quite simple: the path expression in line (2) walks to all
variables that are nested within the statement block of the
declaration $dvar, and the path expression in line (4) subtracts those
for which there is another intermediate declaration that supersedes
the one of $dvar.

Now the query that you asked for, which in some sense simply the
reverse. We assume now that $uvar contains the <var> element that is
used in an assignment and the path expression walks to the <var>
elements in a declaration that binds the <var> element in $uvar:

(1) $uvar/(
(2) ancestor::stat_bl/decl_item_list/var[string() = $uvar/
string()]
(3) minus
(4) ancestor::stat_bl[decl_item_list/var/string() = $uvar/
string()]/ancestor::stat_bl/decl_item/var
(5) )

Of course there are many many different ways of expressing these
queries, but these are the most compact ones I could think of. Note
that I actually didn't have to take into account the order of the
declarations, just how they are nested. But this is because of the way
you defined the syntax and scoping rules. However, if you would extend
the syntax and the scoping rules such that order also maters then
extending the queries to take that into account should not be a big
problem.

-- Jan Hidders

Aloha Kakuikanu · Jun 13, 2007

Ok. I'm going to assume the following DTD (in a notation of my own
making to make it a bit more readable) for the syntax tree. (It uses
no attributes to keep things simple):

<stat_bl> --> <decl_kw> <decl_item_list> <begin_kw> <statements>
<semicol_kw> <end-kw>
<decl_kw> --> EMPTY
<begin_kw> --> EMPTY
<end_kw> --> EMPTY
<semicol_kw> --> EMPTY
<statements> --> ( <stat_bl> | <assignment> )+
<assignment> --> <var> <assign_kw> ( <var> | <number> ) <semicol_kw>
<var> --> PCDATA
<assign_kw> --> EMPTY
<number> --> PCDATA
<decl_item_list> --> ( <var> <type> <semicol_kw> )+
<type> --> PCDATA

All right, the DTD is a bastardized [context free?] grammar describing
a language that XML document is an element of. Although the adjectives
"cumbersome" and "ugly" still apply, I grudgingly admit that the idea
that a grammar fits into DTD effortlessly is quite powerful (so I'm
removing the "XML sucks" image from my homepage

For starters I'll first do the reverse query,
so I will assume there
is a variable $dvar that contains a <var> element that describes a
variable in a declaration. The XPath expression that walks to all the
<var> nodes in an assignment that are in the scope of $dvar is as
follows:

(1) $dvar/(
(2) ../../statements//assignment/var[string() = $dvar/string()]
(3) minus
(4) ../../statements//stat_bl[decl_item_list/var/string() = $dvar/
string()]/statements//var
(5) )

The idea is quite simple: the path expression in line (2) walks to all
variables that are nested within the statement block of the
declaration $dvar, ...

I have trouble comprehending this line

(2) ../../statements//assignment/var[string() = $dvar/string()]

should I read it left to right? Then the ".." selects the parent of
the current node, and what is the current node? Or do we start from
some node that mathes "statements", then go two levels up? The
statement block <stat_bl> is only one level up!

Vadim Tropashko · Jun 13, 2007

Ok. I'm going to assume the following DTD (in a notation of my own
making to make it a bit more readable) for the syntax tree. (It uses
no attributes to keep things simple):

<stat_bl> --> <decl_kw> <decl_item_list> <begin_kw> <statements>
<semicol_kw> <end-kw>
<decl_kw> --> EMPTY
<begin_kw> --> EMPTY
<end_kw> --> EMPTY
<semicol_kw> --> EMPTY
<statements> --> ( <stat_bl> | <assignment> )+
<assignment> --> <var> <assign_kw> ( <var> | <number> ) <semicol_kw>
<var> --> PCDATA
<assign_kw> --> EMPTY
<number> --> PCDATA
<decl_item_list> --> ( <var> <type> <semicol_kw> )+
<type> --> PCDATA

All right, the DTD is a bastardized [context free?] grammar describing
a language that XML document is an element of. Although the adjectives
"cumbersome" and "ugly" still apply, I grudgingly admit that the idea
that a grammar fits into DTD effortlessly is quite powerful (so I'm
removing the "XML sucks" image from my homepage

For starters I'll first do the reverse query,
so I will assume there
is a variable $dvar that contains a <var> element that describes a
variable in a declaration. The XPath expression that walks to all the
<var> nodes in an assignment that are in the scope of $dvar is as
follows:

(1) $dvar/(
(2) ../../statements//assignment/var[string() = $dvar/string()]
(3) minus
(4) ../../statements//stat_bl[decl_item_list/var/string() = $dvar/
string()]/statements//var
(5) )

The idea is quite simple: the path expression in line (2) walks to all
variables that are nested within the statement block of the
declaration $dvar, ...

I have trouble comprehending this line

(2) ../../statements//assignment/var[string() = $dvar/string()]

should I read it left to right? Then the ".." selects the parent of
the current node, and what is the current node? Or do we start from
some node that mathes "statements", then go two levels up? The
statement block <stat_bl> is only one level up!

Joe Kesselman · Jun 13, 2007

Vadim said:
I have trouble comprehending this line

(2) ../../statements//assignment/var[string() = $dvar/string()]

should I read it left to right?

Find a good tutorial on XPath. Yes, it evaluates left to right. From the
context node, find its parent; find the parent of that; find its
<statements> children; find their <assignment> descendants (at any
depth), find their <var> children whose string value is the same as ...
hmmm. actually I'm not sure what $dvar/string() was intended to mean,
and I don't have time right now to check that.

Vadim Tropashko · Jun 13, 2007

1 statement_block
1.1 'declare' (matches token #1)
1.2 declaration_item_list
1.2.1 identifier (matches token #2)
1.2.2 'integer' (matches token #3)
1.2.3 ';' (matches token #4)
1.3 'begin' (matches token #5)
1.4 statements
1.4.1 assignment
1.4.1.1 identifier
1.4.1.2 ':='
1.4.1.3 number
1.4.1.4 ';'
1.4.2 statements (matches empty token)
1.5 'end';

Actually, here is quite simple method. Write down the parse tree as a
set of paths, using grammar varibles and constants (terminals and
nonterminals) as path elements. In the example above we get:

statement_block
statement_block.'declare'
statement_block.declaration_item_list
statement_block.declaration_item_list.identifier"i"
statement_block.declaration_item_list.'integer'
.... and so on ...

Query this set with regular expressions.

XML, good bye!

Jan Hidders · Jun 13, 2007

Ok. I'm going to assume the following DTD (in a notation of my own
making to make it a bit more readable) for the syntax tree. (It uses
no attributes to keep things simple):

Click to expand...

<stat_bl> --> <decl_kw> <decl_item_list> <begin_kw> <statements>
<semicol_kw> <end-kw>
<decl_kw> --> EMPTY
<begin_kw> --> EMPTY
<end_kw> --> EMPTY
<semicol_kw> --> EMPTY
<statements> --> ( <stat_bl> | <assignment> )+
<assignment> --> <var> <assign_kw> ( <var> | <number> ) <semicol_kw>
<var> --> PCDATA
<assign_kw> --> EMPTY
<number> --> PCDATA
<decl_item_list> --> ( <var> <type> <semicol_kw> )+
<type> --> PCDATA

Click to expand...

All right, the DTD is a bastardized [context free?] grammar describing
a language that XML document is an element of. Although the adjectives
"cumbersome" and "ugly" still apply, I grudgingly admit that the idea
that a grammar fits into DTD effortlessly is quite powerful (so I'm
removing the "XML sucks" image from my homepage

Of course, it combines ugliness with great power. Welcome to the Dark
Side.

For starters I'll first do the reverse query,
so I will assume there
is a variable $dvar that contains a <var> element that describes a
variable in a declaration. The XPath expression that walks to all the
<var> nodes in an assignment that are in the scope of $dvar is as
follows:

Click to expand...

(1) $dvar/(
(2) ../../statements//assignment/var[string() = $dvar/string()]
(3) minus
(4) ../../statements//stat_bl[decl_item_list/var/string() = $dvar/
string()]/statements//var
(5) )

Click to expand...

The idea is quite simple: the path expression in line (2) walks to all
variables that are nested within the statement block of the
declaration $dvar, ...

Click to expand...

I have trouble comprehending this line

(2) ../../statements//assignment/var[string() = $dvar/string()]

should I read it left to right? Then the ".." selects the parent of
the current node, and what is the current node?

First a small correction, the 'minus' I used should actualy be
'except', which has the semantics of the set minus.

What is the current node? Note that the global structure is p_1/(p_2
except p_3) where (p_2 except p_3) means that from the context node
you evaluate p_2 which gives you a set (or rather sequence) of nodes
and from that you subtract the set of nodes that you get when you
evaluate p_3 starting from the context node. So in this case the
"current node" that you ask about is for both subexpressions the same,
namely the node in $dvar.

Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations

Did I mention Tarski already?

Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.

-- Jan Hidders

Jan Hidders · Jun 13, 2007

Vadim said:
Vadim said:

I have trouble comprehending this line

Click to expand...

(2) ../../statements//assignment/var[string() = $dvar/string()]

Click to expand...

should I read it left to right?

Click to expand...

Find a good tutorial on XPath. Yes, it evaluates left to right. From the
context node, find its parent; find the parent of that; find its
<statements> children; find their <assignment> descendants (at any
depth), find their <var> children whose string value is the same as ...
hmmm. actually I'm not sure what $dvar/string() was intended to mean,
and I don't have time right now to check that.

It retrieves the text node children of the nodes in $dvar. Actually it
is unnecessary, in this case I might just have well have written the
predicate as [. = $dvar].

-- Jan Hidders

Jan Hidders · Jun 13, 2007

Actually, here is quite simple method. Write down the parse tree as a
set of paths, using grammar varibles and constants (terminals and
nonterminals) as path elements. In the example above we get:

statement_block
statement_block.'declare'
statement_block.declaration_item_list
statement_block.declaration_item_list.identifier"i"
statement_block.declaration_item_list.'integer'
... and so on ...

Query this set with regular expressions.

XML, good bye!

Not so fast. An enumeration of the paths in a tree does not always
give you enough information to reconstruct the tree, so you would be
losing expressive power. Never mind that you ignore the order, which
may be a factor for the scope of declarations.

-- Jan Hidders

Vadim Tropashko · Jun 13, 2007

Not so fast. An enumeration of the paths in a tree does not always
give you enough information to reconstruct the tree, so you would be
losing expressive power. Never mind that you ignore the order, which
may be a factor for the scope of declarations.

Derivation tree was the confusing part. When we query nodes on a tree
what exactly are we doing? Then, your reply was a critical for me
understanding that the tree structure is unnecessary, the derivation
is essentially a language -- a set of words (which includes both
terminals and nonterminals) and this set of words can be quieried
solely with the language theory means. Formally, a query is a language
intersection.

The order on the tree comes from the order in the Kleene algebra, and
of course the information about the order is still there in the
grammar rules (or should I say inequalities?).

The "reqular expression" exclamation in my previous is
oversimplification, of course.

Vadim Tropashko · Jun 13, 2007

Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations

Did I mention Tarski already? Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.

Yes, and you have been credited for that (by me removing that XML
banner

.

Well, the question originated from the world of Kleene algebras (which
Tarski relation algebra is an instance of), idempotent semirings, all
the other language theory goodies. I was temorarily confused by the
concept of the "query". Now that it it no longer the case, and I can
operate back into the language theory space, what do I need XML for?

Jan Hidders · Jun 13, 2007

[...] Then, your reply was a critical for me
understanding that the tree structure is unnecessary, the derivation
is essentially a language -- a set of words (which includes both
terminals and nonterminals) and this set of words can be quieried
solely with the language theory means. Formally, a query is a language
intersection.

Of course, all computation, including RDBMS and XML querying and
transformation, is ultimately just string manipulation. Doesn't mean
that Turing Machines are always the most appropriate formalism for
describing them, does it?

-- Jan Hidders

Vadim Tropashko · Jun 14, 2007

[...] Then, your reply was a critical for me
understanding that the tree structure is unnecessary, the derivation
is essentially a language -- a set of words (which includes both
terminals and nonterminals) and this set of words can be quieried
solely with the language theory means. Formally, a query is a language
intersection.

Click to expand...

Of course, all computation, including RDBMS and XML querying and
transformation, is ultimately just string manipulation. Doesn't mean
that Turing Machines are always the most appropriate formalism for
describing them, does it?

You seem to imply that language theory reduces to trivial string
manipulation. I expected you to request clarification why a query is a
language intersection; is there any nonvacuous idea behind this
statement? And insist on demonstating that the reg exp method does
work

Consider a simple expression grammar

expr :
expr '+' expr
| '(' expr ')'
| '1'

and the string

(1 + 1) + 1

When we ask if this string belongs to the language defined by the
grammar, we are interested to know if the intersection of the infinite
language defined by the above expression grammar and finite language
coonsisting of a single string is empty.

The derivation that proves that a string belongs to a language is a
chain of inequalities (where inequality is understood to be the usual
language subset relation):

(1 + 1) + 1 < (expr + 1) + 1 < ... < (expr + expr) + expr < expr +
expr < expr

Now we have derivation tree:

1 expr
1.1 expr
1.1.1 (
1.1.2 expr
1.1.2.1 expr
1.1.2.1.1 1
1.1.2.2 +
1.1.2.3 expr
1.1.2.3.1 1
1.1.3 )
1.2 +
1.2 expr
1.2.1 1

Next if we encode tree nodes as paths of the grammar symbols, then not
only the order is lost (as you correctly observed), but the duplicate
paths are collapsed:

1 --> expr
1.1 --> expr expr
1.1.1 --> expr expr (
1.1.2 --> expr expr expr
1.1.2.1 --> expr expr expr expr
1.1.2.1.1 --> expr expr expr expr expr
1.1.2.2
1.1.2.3
1.1.2.3.1
1.1.3
1.2
1.2 --> expr expr (same as 1.1!)
1.2.1

Marshall · Jun 14, 2007

All right, the DTD is a bastardized [context free?] grammar describing
a language that XML document is an element of. Although the adjectives
"cumbersome" and "ugly" still apply, I grudgingly admit that the idea
that a grammar fits into DTD effortlessly is quite powerful (so I'm
removing the "XML sucks" image from my homepage

Click to expand...

Of course, it combines ugliness with great power. Welcome to the
Dark Side.

This raises some questions for me. Does the above mean we
should consider comparing XML and its associated ... whadayacallits?
XPath, XSL, etc.?, against more traditional parsing techniques?
*** What, if anything, would that leave out? ***

It seems clear enough, the idea of regarding an XML document
as a syntax tree, and the DTD as a grammar. In which case,
*XML*parsing is then the analog of ... what? Converting an
ascii version of a syntax tree into a binary syntax tree?
Sort of like parsing s-expressions?

Then XML querying and updating is comparable to operations
we would do once we had a syntax tree, which seems to
be multipass traversal and transformation. Think of an
interpreter, which traverses the tree, carrying out instructions
as it does so. The more complex case is a compiler
with multiple analysis, optimizer, and code generator passes.
These involve generating additional data structures which
may or may not be trees as well. A symbol table in a
lexically scoped language likely would be; an object file
likely would not be.

If that analysis is valid, then I'd be interested to continue to
a comparison with other techniques for accomplishing the
same thing, particularly from the standpoint of expressive
power, but also from the standpoint of convenience and
generality.

Other techniques I can think of are:

1) ad hoc traversal of an object graph in an OO language
2) pattern matching over abstract data types, as introduced
by SML. (The gold standard.)
3) "Tree walking" a la Terrance Parr / Antlr (the upstart.)

1) is what I am most familiar with. It isn't particularly pleasant,
but I suppose it's no more objectionable than anything else
in an OOPL. One ends up using the Visitor pattern a lot,
and one necessarily finds oneself having to defeat the
type system, since OOPLs are organized around extensible
structures w/ fixed operations, whereas trees work best
with fixed structures w/ extensible operations.

2) I would be very interested to hear a comparison with
pattern matching from someone who is familiar with both
techniques. One thing I can see immediately is that XPath
expressions allow one to address only some specific
part of a tree that one is interested in, whereas pattern
matching requires whole-tree analysis. How much of an
issue is this in practice?

3) is interesting. I admit that when I first heard about the
idea I was extremely skeptical. The idea that one would
write a grammar to abstract operating on a tree one had
generated by using a grammar to abstract over a language
seemed perverse and unnecessarily complex at first.
However in practice it's turned out to be the opposite:
it is extremely simple. Tree grammars are a fraction
of the size of the original grammar, possible similarly
to the size of an XPath expression to a DTD.

I suppose there are also term rewriting languages but I'm
not as familiar with them. I am also vaguely aware of
"strategies" which are abstractions of data structure
traversal techniques. I suppose further there are the
various tree encoding techniques of SQL, but I haven't found
these to be very exciting. Perhaps a more complete
relational language ...

Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations

Did I mention Tarski already? Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.

I guess we're talking about Relation Algebra here? I don't know
enough about it to know if this is just an interesting comparison
of if there's some specific result being alluded to.

My hope is that this post will stimulate an educated and informative
discussion of the relative merits of different approaches to
tree traversal, not neglecting to differentiate among the needs
of diverse use cases.

Hey, it could happen.

Marshall

Vadim Tropashko · Jun 14, 2007

3) is interesting. I admit that when I first heard about the
idea I was extremely skeptical. The idea that one would
write a grammar to abstract operating on a tree one had
generated by using a grammar to abstract over a language
seemed perverse and unnecessarily complex at first.
However in practice it's turned out to be the opposite:
it is extremely simple. Tree grammars are a fraction
of the size of the original grammar, possible similarly
to the size of an XPath expression to a DTD.

Do you have a reference?
http://antlr.org/article/1170602723163/treewalkers.html
insists that the "tree grammar" concept is not appeared to be coherent
to begin with...

Marshall · Jun 14, 2007

Do you have a reference?http://antlr.org/article/1170602723163/treewalkers.html
insists that the "tree grammar" concept is not appeared to be coherent
to begin with...

Well, that paper's on Terrence's site, so probably he doesn't
see it as *too* devastating.

But really, I had all those arguments in my head, until I tried it.
It turned out really easy to do.

In something I did recently, a simple expression grammar,
all the precedence, etc, went in the "regular" grammar.
So the AST captured all of that. Then I needed to translate
the AST into custom Java objects. It was ridiculously easy;
the tree grammar was just:

prog: expr*;

expr: <node1 a, b> {new Node1(a, b);}
| <node2> {new Node2();}
| ...
;

Perhaps my next step is getting rid of the Java AST
objects and just using the Antlr AST object.

Anyway, I've been very impressed with ANTLR.

As an aside, another idea of Terrence's was an IDE for
Antrlr, which I also thought was ridiculous when I first heard
about it. But now v3.0 is out and I've used the IDE and
it's as much a win for parsers as it is for source code.
Being able to interactively tweak the grammar and
parse programs with each revision, having a CST and AST
view, being able to single step through the parse, etc.

Marshall

Joe Kesselman · Jun 14, 2007

Jan said:
It retrieves the text node children of the nodes in $dvar.

No, that would be $dvar/text().

Jan Hidders · Jun 14, 2007

No, that would be $dvar/text().

Oops, you are right of course. The function string() concatenates all
the string values of the descendant text nodes. So in this case it
simply retrieves the string value of the text node under the element
node that is in $dvar.

-- Jan Hidders

Jan Hidders · Jun 14, 2007

It seems clear enough, the idea of regarding an XML document
as a syntax tree, and the DTD as a grammar. In which case,
*XML*parsing is then the analog of ... what? Converting an
ascii version of a syntax tree into a binary syntax tree?
Sort of like parsing s-expressions?

Sure, ordered node-labeled trees, S-expressions, XML documents,
recursively nested terms, et cetera, are all more or less the same
thing. Of course the typical use cases are different, which justifies
having diferent transformation / update / query / selection languages
and formalisms.

If that analysis is valid, then I'd be interested to continue to
a comparison with other techniques for accomplishing the
same thing, particularly from the standpoint of expressive
power, but also from the standpoint of convenience and
generality.

Other techniques I can think of are:

1) ad hoc traversal of an object graph in an OO language
2) pattern matching over abstract data types, as introduced
by SML. (The gold standard.)
3) "Tree walking" a la Terrance Parr / Antlr (the upstart.)

Yep, Tree automata, Tree transducers, Attribute grammars, XQuery,
XSLT... all have their typical use cases. Is that what you are
interested in?

2) I would be very interested to hear a comparison with
pattern matching from someone who is familiar with both
techniques. One thing I can see immediately is that XPath
expressions allow one to address only some specific
part of a tree that one is interested in, whereas pattern
matching requires whole-tree analysis. How much of an
issue is this in practice?

I think that question needs more context in order to be meaningful.
The pattern matching mechanism is usually embedded in another
languages (some functional programming language like Caml, XSLT or
XQuery). Depending upon that language it may or may not be a problem
if your pattern matching mechanism lacks certain expressive power.

Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations

Click to expand...

Did I mention Tarski already? Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.

Click to expand...

I guess we're talking about Relation Algebra here? I don't know
enough about it to know if this is just an interesting comparison
of if there's some specific result being alluded to.

Yep.

<http://staff.science.uva.nl/~marx/talks/2005/screen_icdt.pdf>

Page 12.

-- Jan Hidders

Jan Hidders · Jun 14, 2007

You seem to imply that language theory reduces to trivial string
manipulation.

Trivial? You think that the string manipulations expressed by Turing
Machines are trivial?

I expected you to request clarification why a query is a
language intersection; is there any nonvacuous idea behind this
statement? And insist on demonstating that the reg exp method does
work

I though insisting that it cannot work would accomplish the same. But
your explanation still does not seem complete. Also you are still a
bit vague on what you are claiming. Are you saying that anything for
which one would typically use XPath can be done just as well with
regular expressions assuming that the tree is somehow encoded in a
string? This is clearly not the case for the encoding you gave. Or are
you going to extend that? Perhaps also extend the regular expressions
a little? Or did you just have a particular query in mind? That would
probably not be a very interesting observation. So which is it?

-- Jan Hidders

Boolean search query parsing (hierarchical)	10	Nov 5, 2003
Problem repeating a query within an ADO Transaction	0	Apr 3, 2007
Moore State Change	3	Nov 10, 2008
multiple SQL line query via Perl	1	Oct 3, 2003
[QUIZ] Genetic Programming (#212)	4	Jul 3, 2009
[Chocolate Ketchup Dressing] A working Ruby source code filter	3	Oct 16, 2004
Sencha Touch--Support 2 browsers in just 228K!	64	Jul 16, 2010
For Peer Review	1	Apr 2, 2010

Hierarchical query

Vadim Tropashko

Jan Hidders

Aloha Kakuikanu

Vadim Tropashko

Joe Kesselman

Vadim Tropashko

Jan Hidders

Jan Hidders

Jan Hidders

Vadim Tropashko

Vadim Tropashko

Jan Hidders

Vadim Tropashko

Marshall

Vadim Tropashko

Marshall

Joe Kesselman

Jan Hidders

Jan Hidders

Jan Hidders

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads