Question about union operator (|)

Art Spasky

Hello

Can somebody help with my question about XPath?

When I take the union of node-sets with the "|" operator, in which order will the nodes be:

- the order in which they occur in the XML document?
- the order in which the node-sets are evaluated in the XPath expression?
- or is the behaviour undefined?


Thank you in advance
 
Bjoern Hoehrmann

* Art Spasky wrote in comp.text.xml:
Can somebody help with my question about XPath?

When I take the union of node-sets with the "|" operator, in which order will the nodes be:

- the order in which they occur in the XML document?
- the order in which the node-sets are evaluated in the XPath expression?
- or is the behaviour undefined?

In the XPath data model, node sets are unordered (otherwise you would
call them sequences of distinct nodes or similar). So none of the above.
Searching for "order" in http://www.w3.org/TR/xpath might answer
whatever question you are really trying to resolve.
 
Richard Tobin

When I take the union of node-sets with the "|" operator, in which order will the nodes be:

- the order in which they occur in the XML document?
- the order in which the node-sets are evaluated in the XPath expression?
- or is the behaviour undefined?

Bjoern Hoehrmann said:
In the XPath data model, node sets are unordered (otherwise you would
call them sequences of distinct nodes or similar). So none of the above.

However, the operations which appear to work on ordered node sets
(such as position()) will work as if the set was in document order. So

($a|$b)[3]

will find the third node of the union in document order.
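
As a rough illustration of that rule, here is a minimal Python sketch (not XPath itself). It assumes each node is modelled by an integer giving its position in document order; the names union and filter_expr are made up for this example and do not come from any XPath library.

```python
def union(a, b):
    """Model of $a|$b: an unordered set, duplicates removed."""
    return frozenset(a) | frozenset(b)


def filter_expr(node_set, position):
    """Model of (expr)[position]: positions are counted along the
    forward (child) axis, i.e. in document order, no matter how the
    set was built."""
    ordered = sorted(node_set)  # integers stand for document order
    if 1 <= position <= len(ordered):
        return ordered[position - 1]
    return None  # stands in for the empty node-set


a = [7, 2]     # nodes selected by $a, listed in arbitrary order
b = [5, 2, 1]  # nodes selected by $b; node 2 is in both

third_ab = filter_expr(union(a, b), 3)
third_ba = filter_expr(union(b, a), 3)
print(third_ab, third_ba)  # both are node 5
```

Swapping the operands of the union does not change the result, because the position is taken from document order and not from the order in which the operands were written.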

-- Richard
 
Pavel Lepin

Bjoern Hoehrmann said:
* Art Spasky wrote in comp.text.xml:

In the XPath data model, node sets are unordered
(otherwise you would call them sequences of distinct nodes
or similar). So none of the above. Searching for "order"
in http://www.w3.org/TR/xpath might answer whatever
question you are really trying to resolve.

Well, I must admit I'm lost. Doesn't the notion of context
position define an ordering on any node-set? And if there
aren't any guarantees about the meaning of context position
for resulting node-sets, doesn't that mean half the
transformations/XQueries/XPath expressions in the world are
going to break the moment a conforming implementation that
doesn't use the document order as ordering for node-sets it
produces appears?

Can someone perhaps clarify what all of that means:

a). That's the way it is, for a good reason I missed.
b). That's the way it is, for no good reason.
c). There is some sort of guarantee in XPath spec or
related documents that I failed to find.
 
Richard Tobin

Well, I must admit I'm lost. Doesn't the notion of context
position define an ordering on any node-set? And if there
aren't any guarantees about the meaning of context position
for resulting node-sets

There are. Predicates can be applied in steps, in which case the
context position comes from the direction of the step's axis, and in
filter expressions, in which case the direction of the child axis is
always used (i.e. forwards).

An expression like ($a|$b)[3] is a filter expression, so the predicate
"filters the node-set with respect to the child axis" (XPath 1.0
section 3.3) - that is, it takes the set in document order.

-- Richard
 
Art Spasky

Well, I must admit I'm lost. Doesn't the notion of context
position define an ordering on any node-set? And if there
aren't any guarantees about the meaning of context position
for resulting node-sets

There are. Predicates can be applied in steps, in which case the
context position comes from the direction of the step's axis, and in
filter expressions, in which case the direction of the child axis is
always used (i.e. forwards).

An expression like ($a|$b)[3] is a filter expression, so the predicate
"filters the node-set with respect to the child axis" (XPath 1.0
section 3.3) - that is, it takes the set in document order.

-- Richard

I understood about predicates. But what about that

source xslt

<?xml version='1.0' ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/root/a[1]" priority="10">
<doc>
<xsl:for-each select="following::*|descendant::*|ancestor::*|
preceding::*|self::*">
<res><xsl:value-of select="name(.)"/></res>
</xsl:for-each>
</doc>
</xsl:template>

</xsl:stylesheet>

source xml:

<?xml version="1.0"?>
<root>
<a>
<b></b>
<c></c>
</a>
<d>
</d>
</root>

There are no predicates in the "select" attribute of <xsl:for-each>,
yet the nodes in the node-set formed by the XPath expression are
processed in the order they occur in the XML document.
 
Bjoern Hoehrmann

* Art Spasky wrote in comp.text.xml:
<xsl:for-each select="following::*|descendant::*|ancestor::*...">

The expression evaluates to a node set. A node set is an unordered
collection of distinct nodes. The order in which the nodes are
processed is imposed by the semantics of xsl:for-each, not by some
intrinsic ordering of the node set. http://www.w3.org/TR/xslt notes:

The nodes are processed in document order, unless a
sorting specification is present (see [10 Sorting]).

The perceived ordering is always external to the node set.
 
Pavel Lepin

Richard Tobin said:
There are. Predicates can be applied in steps, in which
case the context position comes from the direction of the
step's axis, and in filter expressions, in which case the
direction of the child axis is always used (i.e.
forwards).

Indeed, the workings of context position in XPath itself
seem to be well-defined, precisely because either a forward
or a reverse axis always applies to its evaluation. But
XPath is often used as a DSL, and evaluating an XPath
expression yields a node-set, which is, by definition, just
an unordered collection of nodes (and no axis applies to
evaluation of context position in this case). That was what
I was referring to. So, for example, the result of:

<xsl:for-each select="preceding::*">
<xsl:copy>
<xsl:attribute name="pos">
<xsl:value-of select="position()"/>
</xsl:attribute>
</xsl:copy>
</xsl:for-each>

....(or its equivalent in any other language using XPath API)
is not just a little surprising, but actually seems
unspecified to me. And that's the question I was asking--is
this behaviour really unspecified? If it is, is there a
good reason for that? If not, *where* is it specified?

So far it looks to me that the spec defines the node-sets as
unordered, but on the other hand implies (which alone is
bad enough) that the information about the document order
of the nodes in a node-set should be retained somehow for
purposes of determining the context position (and if that
isn't an ordering, I don't know what is). IANALL, so
perhaps I'm still missing something.
 
Richard Tobin

Indeed, the workings of context position in XPath itself
seem to be well-defined, precisely because either a forward
or a reverse axis always applies to its evaluation. But
XPath is often used as a DSL, and evaluating an XPath
expression yields a node-set, which is, by definition, just
an unordered collection of nodes (and no axis applies to
evaluation of context position in this case). That was what
I was referring to. So, for example, the result of:

<xsl:for-each select="preceding::*">
<xsl:copy>
<xsl:attribute name="pos">
<xsl:value-of select="position()"/>
</xsl:attribute>
</xsl:copy>
</xsl:for-each>

...(or its equivalent in any other language using XPath API)
is not just a little surprising, but actually seems
unspecified to me.

When you embed XPath in other languages, you (the language designer)
have to specify the context for evaluating XPath expressions. So in
this case, it's XSLT that has to specify it. XSLT 1.0 (section 4)
specifies that the context for top-level expressions comes from the
current node list (which is an XSLT, not XPath, concept). For
xsl:for-each, the current node list is specified to be in document
order unless <xsl:sort> is used. So in your example, the preceding
elements are processed in document order and their position matches
document order.
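
To make that concrete, here is a small Python sketch that imitates what xsl:for-each does with the result of preceding::*. It is only a model: for_each, preceding and ancestors are hypothetical helpers written for this example, the input is the sample XML from earlier in the thread, and the context node is assumed to be <d>.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a><b/><c/></a><d/></root>")

doc_order = list(root.iter())  # all elements, in document order
parent = {child: p for p in doc_order for child in p}


def ancestors(node):
    """All ancestors of node, nearest first."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def preceding(node):
    """Unordered set modelling preceding::* (ancestors are excluded)."""
    idx = doc_order.index(node)
    anc = ancestors(node)
    return {e for e in doc_order[:idx] if e not in anc}


def for_each(node_set):
    """Model of xsl:for-each without xsl:sort: the current node list
    is the node set put into document order, and position() counts
    along that list."""
    ordered = sorted(node_set, key=doc_order.index)
    return [(pos, e.tag) for pos, e in enumerate(ordered, start=1)]


d = root.find("d")
result = for_each(preceding(d))
print(result)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The reverse direction of the preceding axis plays no part here: the ordering comes entirely from the step that turns the set into a current node list.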
So far it looks to me that the spec defines the node-sets as
unordered, but on the other hand implies (which alone is
bad enough) that the information about the document order
of the nodes in a node-set should be retained somehow for
purposes of determining the context position (and if that
isn't an ordering, I don't know what is).

It's certainly a bit odd. XPath has unordered node sets, but also a
context that implies an ordering. In some circumstances that context
follows the node set around (e.g. between successive predicates of a
step) and in some it doesn't (e.g. when a node set is used in a filter
expression). When the node set is passed up to XSLT the context is
lost, and XSLT imposes a new context when it uses the node set in a
new expression. Another language could choose to preserve the context
between expressions, but that would require a co-operative XPath
implementation.

-- Richard
 
Art Spasky

Indeed, the workings of context position in XPath itself
seem to be well-defined, precisely because either a forward
or a reverse axis always applies to its evaluation. But
XPath is often used as a DSL, and evaluating an XPath
expression yields a node-set, which is, by definition, just
an unordered collection of nodes (and no axis applies to
evaluation of context position in this case). That was what
I was referring to. So, for example, the result of:

<xsl:for-each select="preceding::*">
<xsl:copy>
<xsl:attribute name="pos">
<xsl:value-of select="position()"/>
</xsl:attribute>
</xsl:copy>
</xsl:for-each>

...(or its equivalent in any other language using XPath API)
is not just a little surprising, but actually seems
unspecified to me. And that's the question I was asking--is
this behaviour really unspecified? If it is, is there a
good reason for that? If not, *where* is it specified?

So far it looks to me that the spec defines the node-sets as
unordered, but on the other hand implies (which alone is
bad enough) that the information about the document order
of the nodes in a node-set should be retained somehow for
purposes of determining the context position (and if that
isn't an ordering, I don't know what is). IANALL, so
perhaps I'm still missing something.

As I understood it:
XPath evaluates expressions to unordered node-sets.
The direction of an axis in XPath is significant only when a location
step contains predicates.
In other cases the order of nodes in a node-set is defined by the
application external to XPath (for example XSLT).

Am I right?
 
David Carlisle

Art said:
As I understood it:
XPath evaluates expressions to unordered node-sets.
The direction of an axis in XPath is significant only when a location
step contains predicates.
In other cases the order of nodes in a node-set is defined by the
application external to XPath (for example XSLT).

Am I right?

Not really, the ordering is intrinsic to the XPath 1 data model, not
something added by XSLT. There is nothing strange about having sets
(that is, unordered sequences) over a data type that is ordered.
Consider sets of integers for example: {1,2,3} and {1,3,2,1,2}
are the same set of three items, as sets are unordered. However, integers
are an ordered type and one can process those items in (say) ascending
order. Sets of nodes are no different. Node sets are unordered but there
is an ordering on nodes (irrespective of which set they are in) which
can be used in some (most) processing.
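
The integer analogy can be checked directly in Python, whose built-in sets are also unordered collections over an ordered type. This is just an illustration of the analogy, not of any XPath API:

```python
s1 = {1, 2, 3}
s2 = {1, 3, 2, 1, 2}  # duplicates and listing order are irrelevant

same_set = (s1 == s2)

# The ordering belongs to the integers, not to the set, so it can be
# imposed in either direction at the moment the items are processed.
ascending = sorted(s2)
descending = sorted(s2, reverse=True)

print(same_set, ascending, descending)
```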

XPath/XSLT 1 is always careful to distinguish a "node set" from the
"current node list": the current node list is an ordered list and is what
is used to evaluate position() etc. In XPath 1 though, the current node
list is a transient object that cannot be returned as the result of an
expression.

All of this changes in XPath2 of course which has no node sets.
(Ordered) sequences replace both "node sets" and "current node lists".
This is useful for some things (for example you can save the result of
an xsl:sort as a (sorted) sequence) but the model is a lot less elegant
than XPath1, rather than the semantics of /, | and other set based
operators falling out naturally as a result of the set based semantics,
each of the "set" operators in XPath2 has to (on an operator-by-operator
basis) specify the sorting and removal of duplicates required to emulate
set semantics using an ordered sequence.

David
 
Richard Tobin

As I understood it:
XPath evaluates expressions to unordered node-sets.
The direction of an axis in XPath is significant only when a location
step contains predicates.
In other cases the order of nodes in a node-set is defined by the
application external to XPath (for example XSLT).

David Carlisle said:
Not really, the ordering is intrinsic to the XPath 1 data model, not
something added by XSLT.

I think you're mistaken here. In XPath 1 itself, the only place that
the ordering is an issue is when a predicate is used, and XPath
expresses that by saying that predicates filter a node set with
respect to an axis.
XPath/XSLT 1 is always careful to distinguish a "node set" from the
"current node list": the current node list is an ordered list and is what
is used to evaluate position() etc. In XPath 1 though, the current node
list is a transient object that cannot be returned as the result of an
expression.

XPath 1 doesn't mention current node lists. They are introduced in
XSLT 1. XPath 1 has only unordered node sets and filtering with
respect to an axis.

Of course this is just a matter of exposition: the same XPath 1
language could be defined in terms of node lists rather than sets; the
lists generated by a reverse axis could be in reverse document order;
and filter expressions would have to be defined to put their left
operand into document order.

-- Richard
 
David Carlisle

Not really, the ordering is intrinsic to the Xpath 1 data model, not
something added by XSLT.

I think you're mistaken here. In XPath 1 itself, the only place that
the ordering is an issue is when a predicate is used,

Ah, but I didn't say it makes a difference, only that it (document order)
is intrinsic to the XPath data model. It's not added by XSLT. I was
wrong to say XPath uses the term "current node list" though; that
terminology isn't used in XPath, as you say. (It could have been, but the
effect of position() is described directly without giving a name to the
"node set in document or reverse document order".)
and XPath
expresses that by saying that predicates filter a node set with
respect to an axis.

yes thanks for the correction.
XPath 1 doesn't mention current node lists. They are introduced in
XSLT 1. XPath 1 has only unordered node sets and filtering with
respect to an axis.

True, but the point I was trying to make (not that well:) was that
these are sets over an ordered domain.
Of course this is just a matter of exposition: the same XPath 1
language could be defined in terms of node lists rather than sets; the
lists generated by a reverse axis could be in reverse document order;
and filter expressions would have to be defined to put their left
operand into document order.

-- Richard

which is closer to the xpath2 way. Not just filter expressions, but also
/ for example need to invoke re-ordering (or removal of duplicates) as
would |.

David
 
Richard Tobin

David Carlisle said:
which is closer to the xpath2 way. Not just filter expressions, but also
/ for example need to invoke re-ordering (or removal of duplicates) as
would |.

In pure XPath 1, the only way to detect the ordering would be with a
predicate, and predicates are only used in two places: steps and
filter expressions. In steps, node lists would retain their order, so
the only place you would have to re-order them is when evaluating a
filter expression.

So in preceding-sibling::*[1] the list would not get re-ordered, but
in (preceding-sibling::*)[1] it would, and in (a|b)[1] it would not
have to be re-ordered before the union, but it would probably be much
more convenient to do so.

I think the difference between path[1] and (path)[1] is one of the
most counter-intuitive bits of XPath.
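
A small Python model may help to show the difference. It assumes five sibling elements numbered 1 to 5 in document order, with node 5 as the context node; step_predicate and filter_predicate are hypothetical names for this sketch, not functions of any XPath implementation.

```python
def step_predicate(node_set, position, reverse_axis):
    """Model of axis::*[position]: positions are counted in the
    direction of the step's axis."""
    ordered = sorted(node_set, reverse=reverse_axis)
    return ordered[position - 1] if 1 <= position <= len(ordered) else None


def filter_predicate(node_set, position):
    """Model of (expr)[position]: a filter expression always counts
    in document order, whatever axis produced the set."""
    return step_predicate(node_set, position, reverse_axis=False)


context = 5
preceding_siblings = {n for n in range(1, 6) if n < context}

# preceding-sibling::*[1]: nearest preceding sibling (reverse axis)
nearest = step_predicate(preceding_siblings, 1, reverse_axis=True)

# (preceding-sibling::*)[1]: first preceding sibling in document order
first = filter_predicate(preceding_siblings, 1)

print(nearest, first)  # 4 1
```

The same unordered set goes into both calls; only the rule used to number its members differs.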

-- Richard
 
Art Spasky

David Carlisle said:
which is closer to the xpath2 way. Not just filter expressions, but also
/ for example need to invoke re-ordering (or removal of duplicates) as
would |.

In pure XPath 1, the only way to detect the ordering would be with a
predicate, and predicates are only used in two places: steps and
filter expressions. In steps, node lists would retain their order, so
the only place you would have to re-order them is when evaluating a
filter expression.

So in preceding-sibling::*[1] the list would not get re-ordered, but
in (preceding-sibling::*)[1] it would, and in (a|b)[1] it would not
have to be re-ordered before the union, but it would probably be much
more convenient to do so.

I think the difference between path[1] and (path)[1] is one of the
most counter-intuitive bits of XPath.

-- Richard

I am sorry, but during this long discussion I have not received a clear,
well-grounded answer to my question.

As I understood it:
1 XPath evaluates expressions to unordered node-sets.
2 The direction of an axis in XPath is significant only when a location
step contains predicates.
3 In other cases the order of nodes in a node-set is defined by the
application external to XPath (for example XSLT).

Am I right?
 
Joe Kesselman

Art said:
1 XPath evaluates expressions to unordered node-sets.
2 The direction of an axis in XPath is significant only when a location
step contains predicates.
3 In other cases the order of nodes in a node-set is defined by the
application external to XPath (for example XSLT).

Confirmation: Search the XPath spec for "order" and note where it does
and doesn't appear.

I believe XPath's result order was deliberately left somewhat fuzzy in
order to permit implementations to produce results in whatever order is
most efficient, which may vary depending on their underlying data model.
The caller's needs may affect that choice, when known -- if you're
doing an existence test order doesn't matter, whereas if you're
processing the nodes in doc order (as is usually the case in XSLT) it
may be desirable to discover them that way rather than sorting them later.

By the way, if you're implementing XPath, watch out for the definition
of //. (I'm still trying to find someone who can explain to me why this
operation was explicitly defined as a top-down scan rather than as a
traditional postorder tree walk. Both are perfectly reasonable
operations, but I would expect the latter to be more commonly used and
hence a better meaning for the shorthand // operator. And having to
constantly remind folks that it *isn't* just document-ordered
descendant:: has been an ongoing nuisance.)
 
Richard Tobin

Joe Kesselman said:
By the way, if you're implementing XPath, watch out for the definition
of //. (I'm still trying to find someone who can explain to me why this
operation was explicitly defined as a top-down scan rather than as a
traditional postorder tree walk. Both are perfectly reasonable
operations, but I would expect the latter to be more commonly used and
hence a better meaning for the shorthand // operator. And having to
constantly remind folks that it *isn't* just document-ordered
descendant:: has been an ongoing nuisance.)

I'm not sure I understand your description, but if you mean "why is it
defined as descendant-or-self::node()/ rather than descendant::", it's
so that expressions like //p[1] mean "all paragraphs that are first
children" rather than "the first paragraph". I believe this was
considered more common in formatting, and practical to implement as a
match pattern (predicates in match patterns can only occur after
child:: or attribute:: steps, so position() only requires examination
of siblings, rather than the whole tree).
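
The effect can be tried with Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath but does apply a positional predicate per parent in the same way. The sample document below is invented for the illustration, and since ElementTree has no parenthesised filter expressions, the "first paragraph in the document" case is emulated with iter().

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<sec><p>a</p><p>b</p></sec>"
    "<sec><p>c</p></sec>"
    "</doc>"
)

# Analogue of //p[1]: the predicate is evaluated on the child step,
# so every p that is the first p child of its parent is selected.
first_in_each_parent = [p.text for p in doc.findall(".//p[1]")]

# Analogue of (//p)[1] or /descendant::p[1]: the single first p in
# document order (iter() walks the tree in document order).
first_in_document = next(doc.iter("p")).text

print(first_in_each_parent, first_in_document)
```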

-- Richard
 
Joe Kesselman

Richard said:
I'm not sure I understand your description, but if you mean "why is it
defined as descendant-or-self::node()/ rather than descendant::", it's
so that expressions like //p[1] mean "all paragraphs that are first
children" rather than "the first paragraph".

I presume you mean "all paragraphs that are the first paragraph".

I can see where that argument might have been advanced. But all our
experience with the rest of the XPath expression language leads us to
expect it to mean the latter. Convenience needs to be balanced against
principle of least surprise, and I believe the d-or-s::node() expansion
of // violates that principle.

If they wanted to provide a shorthand for this purpose, fine, but they
should also have provided a shorthand for descendant, to help keep the
two from getting confused.

Too late to fix now; it is what it is. But I still put it in the
category of "warts to be fixed someday."
 
Richard Tobin

I'm not sure I understand your description, but if you mean "why is it
defined as descendant-or-self::node()/ rather than descendant::", it's
so that expressions like //p[1] mean "all paragraphs that are first
children" rather than "the first paragraph".

Joe Kesselman said:
I presume you mean "all paragraphs that are the first paragraph".

Yes.

-- Richard
 
