Tree Merging Problem


spacerobots

I have an XML problem that needs solving, and I'd love some help on
tools and/or approach.

Here's the scenario: I want to take an XML document and basically
merge in a number of sub-trees, which have the same structure but may
not contain all the same nodes. I want to replace the data in the
complete tree with any data I find in the subtrees. Now, the
"complete" tree isn't exactly complete, because it can have elements
that are allowed to contain any number of repeated nodes, as you'll
see in this example. Here's an example of what I'll be looking at:

I want to take a structure like this (note each layer represents the
same data):

<data>
  <layer3>
    <layer2>

      <layer1>
        <A>data A</A>
        <B>
          <BB>
            <row id="1">Jim</row>
            <row id="2">Alex</row>
            <row id="3">Phil</row>
            <row id="4">Rutiger</row>
          </BB>
        </B>
        <D>data D</D>
      </layer1>

      <B>
        <BA>Names</BA>
        <BB>
          <row id="2">Alexander</row>
        </BB>
      </B>
      <C>data C</C>
      <D>data D more info</D>
    </layer2>

    <C></C>
    <D>data D more more info</D>
  </layer3>

  <master>
    <A></A>
    <B>
      <BA></BA>
      <BB>
        <!-- some number of 'row' elements -->
      </BB>
    </B>
    <C></C>
    <D></D>
  </master>
</data>

and transform it into this:

<master>
  <A>data A</A>
  <B>
    <BA>Names</BA>
    <BB>
      <row id="1">Jim</row>
      <row id="2">Alexander</row>
      <row id="3">Phil</row>
      <row id="4">Rutiger</row>
    </BB>
  </B>
  <C></C>
  <D>data D more more info</D>
</master>

So basically any node that is present in layer3 overwrites the data
from the equivalent node that was present in any of the previous
layers. Including empty nodes.

I need a general algorithm for this, because I'll be dealing with
several different XML documents. They will all follow this same
layered structure, though. My idea is to traverse the "master" tree,
stop at each node, look for a match in layer1, layer2, ... layerN, and
replace the data as I go. I need help with some specifics (can I
generate an XPath query for each node as I traverse the master tree?).
It's perhaps not the most efficient way to do it, but I don't think
performance will be much of a concern.

Some context: it's going to be written in Java, and I know there are
APIs out there like dom4j and Xalan-J that I could use to get this
done. I have advanced XPath skills but have barely ever used XSLT
(I'm not afraid to learn it, though).
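
A minimal Java sketch of that traverse-and-look-up idea, using only the JDK's javax.xml.xpath API rather than dom4j or Xalan-J. For each leaf of master it builds a relative XPath such as B/BB/row[@id='2'] and evaluates it against every layer from lowest to highest, so the last layer that contains the node wins. Assumptions the post leaves open: repeated siblings are told apart only by their id attribute, layers are named "layer" plus a number (higher number wins), and the master lists its row elements explicitly instead of the comment placeholder. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: patch each leaf of <master> from layer1..layerN via XPath.
public class XPathPatcher {
    static final XPath XP = XPathFactory.newInstance().newXPath();

    // The question's sample; the master's rows are spelled out (assumption).
    static final String SAMPLE = "<data><layer3><layer2><layer1>"
        + "<A>data A</A><B><BB><row id='1'>Jim</row><row id='2'>Alex</row>"
        + "<row id='3'>Phil</row><row id='4'>Rutiger</row></BB></B><D>data D</D>"
        + "</layer1><B><BA>Names</BA><BB><row id='2'>Alexander</row></BB></B>"
        + "<C>data C</C><D>data D more info</D></layer2>"
        + "<C></C><D>data D more more info</D></layer3>"
        + "<master><A></A><B><BA></BA><BB><row id='1'/><row id='2'/>"
        + "<row id='3'/><row id='4'/></BB></B><C></C><D></D></master></data>";

    // Direct element children only (text and comments are skipped).
    static List<Element> childElements(Element el) {
        List<Element> out = new ArrayList<>();
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling())
            if (n instanceof Element) out.add((Element) n);
        return out;
    }

    // Relative XPath from master down to el, e.g. "B/BB/row[@id='2']".
    static String pathFrom(Element master, Element el) {
        StringBuilder sb = new StringBuilder();
        for (Node n = el; n != master; n = n.getParentNode()) {
            Element e = (Element) n;
            String step = e.getTagName()
                + (e.hasAttribute("id") ? "[@id='" + e.getAttribute("id") + "']" : "");
            sb.insert(0, sb.length() == 0 ? step : step + "/");
        }
        return sb.toString();
    }

    // Recurse to the leaves; each leaf takes the text of the last layer
    // (lowest to highest) that has an element at the same relative path.
    static void patch(Element master, Element el, List<Element> layers) throws Exception {
        List<Element> kids = childElements(el);
        if (!kids.isEmpty()) {
            for (Element k : kids) patch(master, k, layers);
            return;
        }
        String path = pathFrom(master, el);
        for (Element layer : layers) {
            Node hit = (Node) XP.evaluate(path, layer, XPathConstants.NODE);
            if (hit != null) el.setTextContent(hit.getTextContent()); // empty text wins too
        }
    }

    static Document merge(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList found = (NodeList) XP.evaluate(
            "//*[starts-with(local-name(),'layer')]", doc, XPathConstants.NODESET);
        List<Element> layers = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) layers.add((Element) found.item(i));
        // Assumption: names are "layer" + integer, and a higher number wins.
        layers.sort(Comparator.comparingInt(e -> Integer.parseInt(e.getTagName().substring(5))));
        Element master = (Element) XP.evaluate("/data/master", doc, XPathConstants.NODE);
        patch(master, master, layers);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = merge(SAMPLE);
        for (String p : new String[] {"A", "B/BA", "B/BB/row[@id='2']", "C", "D"})
            System.out.println(p + " = '" + XP.evaluate("/data/master/" + p, doc) + "'");
    }
}
```

Because layers are applied in ascending order, the empty C from layer3 overwrites "data C" from layer2, which is the "including empty nodes" behavior the question asks for.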

Thanks in advance,
-Ryan
 

Pavel Lepin

> Here's the scenario: [...]
>
> and transform it into this:
> [...]
>
> So basically any node that is present in layer3 overwrites
> the data from the equivalent node that was present in any
> of the previous layers. Including empty nodes.

Can't say you're very specific in your requirements, but the
following (kludgy) solution works on your sample document.
Further tinkering may be in order to tailor it to your
actual needs.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="l"
    match=
    "*[starts-with(local-name(),'layer')]/descendant::*"
    use="concat(local-name(),'[id=',@id,']')"/>
  <xsl:template match="data">
    <xsl:apply-templates select="master"/>
  </xsl:template>
  <xsl:template match="master">
    <xsl:copy>
      <xsl:apply-templates select="*" mode="patch"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[*]" mode="patch">
    <xsl:copy>
      <xsl:apply-templates select="*" mode="patch"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*" mode="patch">
    <xsl:copy>
      <xsl:call-template name="patch"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template name="patch">
    <xsl:variable name="l"
      select=
      "
        key('l',concat(local-name(),'[id=',@id,']'))
          [last()]
      "/>
    <xsl:choose>
      <xsl:when test="$l">
        <xsl:value-of select="$l"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="text()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
 

spacerobots

> Can't say you're very specific in your requirements, but the
> following (kludgy) solution works on your sample document.
> [...]

Pavel, thanks for your example. I'm having trouble deciphering its
inner workings, and I can't seem to get it to work (I get: invalid
pattern: unexpected token - "descendant::*" on line 5 of the XSL).
Perhaps you can enlighten me on how to fix this! I don't see anything
wrong with the xpath, so I'm a bit confused why it throws that error.

To take another approach on this, perhaps somebody could explain to me
how to approach this smaller problem: assume I have two XML trees.
They are both the same structure, but one of the trees is the master
copy, containing all the nodes, while the other contains only some
subset of the master tree. The structure is the same, so if any node
is present in the subset tree, all that node's parents are present as
well.

How can I walk through the subset tree and copy all the data out of
each node in the subset tree into the master tree?

Thanks for your help!
 

Joe Kesselman

> How can I walk through the subset tree and copy all the data out of
> each node in the subset tree into the master tree?

I'd do that via Java (or other) programming rather than via a
stylesheet, treating it as a classic merge problem. (Walk the two
documents in parallel, outputting their union.)
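
A rough Java/DOM sketch of that merge for the two-tree case. It is a lookup-based variant rather than a strict lockstep walk: each child of the subset is matched to a master child with the same tag name and id attribute (an assumed matching rule), matches are merged recursively, unmatched subtrees are copied in to form the union, and a subset leaf's text, even when empty, replaces the master's. The class, the methods and the demo data (including row 5) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: merge a subset tree into a master tree of the same shape.
public class TreeMerge {
    // Direct element children only (text, comments and whitespace are skipped).
    static List<Element> children(Element e) {
        List<Element> out = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling())
            if (n instanceof Element) out.add((Element) n);
        return out;
    }

    // Assumed matching rule: same tag name and same id ("" when there is no id).
    static boolean sameNode(Element a, Element b) {
        return a.getTagName().equals(b.getTagName())
            && a.getAttribute("id").equals(b.getAttribute("id"));
    }

    static void mergeInto(Element master, Element subset) {
        List<Element> subKids = children(subset);
        if (subKids.isEmpty()) {
            // Leaf in the subset: its text (possibly empty) overwrites the master's.
            master.setTextContent(subset.getTextContent());
            return;
        }
        List<Element> masterKids = children(master);
        for (Element s : subKids) {
            Element match = null;
            for (Element m : masterKids) {
                if (sameNode(m, s)) { match = m; break; }
            }
            if (match != null) mergeInto(match, s);
            // No counterpart in the master: copy the whole subtree in (the union).
            else master.appendChild(master.getOwnerDocument().importNode(s, true));
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static Document demo() throws Exception {
        Document master = parse("<master><A>data A</A><B><BA/><BB>"
            + "<row id='1'>Jim</row><row id='2'>Alex</row></BB></B></master>");
        Document subset = parse("<master><B><BA>Names</BA><BB>"
            + "<row id='2'>Alexander</row><row id='5'>Zed</row></BB></B></master>");
        mergeInto(master.getDocumentElement(), subset.getDocumentElement());
        return master;
    }

    public static void main(String[] args) throws Exception {
        NodeList rows = demo().getElementsByTagName("row");
        for (int i = 0; i < rows.getLength(); i++) {
            Element r = (Element) rows.item(i);
            System.out.println(r.getAttribute("id") + ": " + r.getTextContent());
        }
    }
}
```

For the layered document in the question, the same method could be called once per layer, from layer1 up, provided the nested layer elements are skipped when matching children.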
 

Pavel Lepin

[solution, of sorts]
> I'm having trouble deciphering its inner workings, and I
> can't seem to get it to work (I get: invalid pattern:
> unexpected token - "descendant::*" on line 5 of the XSL).

Interesting. My primary XSLT processor seems to be a bit lax
about enforcing some restrictions:

pavel@debian:~/dev/xslt$ xsltproc layers.xsl layers.xml
<?xml version="1.0"?>
<master><A>data A</A><B><BA>Names</BA><BB><row>Jim</row><row>Alexander</row>
<row>Phil</row><row>Rutiger</row></BB></B><C/>
<D>data D more more info</D></master>
pavel@debian:~/dev/xslt$ xalan -in layers.xml -xsl layers.xsl

XSLException Type is: XPathParserException
Message is: Only 'child' and 'attribute' axes are allowed in match patterns.
pattern = '*[starts-with(local-name(),'layer')]/descendant::*'
Remaining tokens are: ( 'descendant' '::' '*')
(layers.xsl, line 6, column 48)
pavel@debian:~/dev/xslt$ saxon layers.xml layers.xsl
Warning: at xsl:stylesheet on line 2 of file:/var/www/dev/xslt/layers.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Error at xsl:key on line 6 of file:/var/www/dev/xslt/layers.xsl:
XTSE0340: XSLT Pattern syntax error at char 37 on line 6
in {...name(),'layer')]/descendant...}:
Axis in pattern must be child or attribute
Failed to compile stylesheet. 1 error detected.
pavel@debian:~/dev/xslt$

(Edited to fit into 78 chars.)

Which XSLT processor are you using? Anyway...

> Perhaps you can enlighten me on how to fix this! I don't
> see anything wrong with the xpath, so I'm a bit confused
> why it throws that error.

...that XPath expression is easy enough to rewrite:

<xsl:key name="l"
  match=
  "*[ancestor::*[starts-with(local-name(),'layer')]]"
  use="concat(local-name(),'[id=',@id,']')"/>

Works for me with libxslt, xalan-c++ and saxon-8B.
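
To try the corrected stylesheet from Java, the language the original post targets, the JDK's javax.xml.transform API should be enough; the class below is a hypothetical harness, not part of the thread. Two assumptions: the master lists its rows explicitly, since with only the comment placeholder BB has no row children for the templates to patch; and each xsl:copy in the patch templates is followed by an added <xsl:copy-of select="@*"/> (not in the posted stylesheet) so the id attributes survive, which the xsltproc output above shows them not doing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness: runs Pavel's stylesheet (with the corrected key) via JAXP.
public class LayerPatch {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        // The corrected key: every element that has a layer* ancestor.
        + "<xsl:key name='l' match=\"*[ancestor::*[starts-with(local-name(),'layer')]]\""
        + " use=\"concat(local-name(),'[id=',@id,']')\"/>"
        + "<xsl:template match='data'><xsl:apply-templates select='master'/></xsl:template>"
        + "<xsl:template match='master'><xsl:copy>"
        + "<xsl:apply-templates select='*' mode='patch'/></xsl:copy></xsl:template>"
        // Added here (not in the thread): copy-of @* keeps the id attributes.
        + "<xsl:template match='*[*]' mode='patch'><xsl:copy><xsl:copy-of select='@*'/>"
        + "<xsl:apply-templates select='*' mode='patch'/></xsl:copy></xsl:template>"
        + "<xsl:template match='*' mode='patch'><xsl:copy><xsl:copy-of select='@*'/>"
        + "<xsl:call-template name='patch'/></xsl:copy></xsl:template>"
        + "<xsl:template name='patch'><xsl:variable name='l'"
        + " select=\"key('l',concat(local-name(),'[id=',@id,']'))[last()]\"/>"
        + "<xsl:choose><xsl:when test='$l'><xsl:value-of select='$l'/></xsl:when>"
        + "<xsl:otherwise><xsl:apply-templates select='text()'/></xsl:otherwise>"
        + "</xsl:choose></xsl:template></xsl:stylesheet>";

    // The question's sample, with the master's rows written out (assumption).
    static final String XML = "<data><layer3><layer2><layer1>"
        + "<A>data A</A><B><BB><row id='1'>Jim</row><row id='2'>Alex</row>"
        + "<row id='3'>Phil</row><row id='4'>Rutiger</row></BB></B><D>data D</D>"
        + "</layer1><B><BA>Names</BA><BB><row id='2'>Alexander</row></BB></B>"
        + "<C>data C</C><D>data D more info</D></layer2>"
        + "<C></C><D>data D more more info</D></layer3>"
        + "<master><A></A><B><BA></BA><BB><row id='1'/><row id='2'/>"
        + "<row id='3'/><row id='4'/></BB></B><C></C><D></D></master></data>";

    // Compiles the stylesheet with the default JAXP processor and applies it.
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XSL, XML));
    }
}
```

The output comes out on a single line, like the xsltproc run above, because only element children are ever passed to apply-templates, so the input's whitespace text nodes are not copied.
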

> To take another approach on this, perhaps somebody could
> explain to me how to approach this smaller problem: assume
> I have two XML trees. They are both the same structure,
> but one of the trees is the master copy, containing all
> the nodes, while the other contains only some subset of
> the master tree. The structure is the same, so if any node
> is present in the subset tree, all that node's parents are
> present as well.

The transformation given does precisely that. If it's
too hard to understand, I would strongly recommend trying
to grok the inner workings of XSLT on simpler examples
(small document, identity transformation and a simple
exclusion template or two; small document and a simple key:
try getting the nodesets using the key() function; some
simple grouping problems). You're not going to get far with
XSLT unless you understand nodesets and template-based
processing on a gut level.

The XSLT FAQ has a lot of clever code for you to chew on, and
IBM maintains a nice collection of articles on XSLT here:

http://www-128.ibm.com/developerworks/xml
 

Ryan Nordman

> [solution, of sorts]
>> I'm having trouble deciphering its inner workings, and I
>> can't seem to get it to work (I get: invalid pattern:
>> unexpected token - "descendant::*" on line 5 of the XSL).
>
> Interesting. My primary XSLT processor seems to be a bit lax
> about enforcing some restrictions:
> [snip]
> Which XSLT processor are you using? Anyway...

I'm using the built-in processor that came with Altova XML Spy. Glad
to get that cleared up; I figured it might be something like that but
wasn't able to find the right resource to adequately explain the
restrictions on the match attribute there.

>> Perhaps you can enlighten me on how to fix this! I don't
>> see anything wrong with the xpath, so I'm a bit confused
>> why it throws that error.
>
> ...that XPath expression is easy enough to rewrite:
>
> <xsl:key name="l"
>   match=
>   "*[ancestor::*[starts-with(local-name(),'layer')]]"
>   use="concat(local-name(),'[id=',@id,']')"/>
>
> Works for me with libxslt, xalan-c++ and saxon-8B.

This works in XMLSpy as well now. I don't quite understand what this
key is trying to accomplish; maybe you can help me here. I'm pretty
confident I understand how xsl:key works. So here we're matching all
the nodes that are an ancestor of one of the <layerN> nodes, where N is
some string. Then the key we want to use here is the element's name
concatenated with [id=N], where N is that element's id attribute. But
in my XML here the only elements that have an id attribute are the row
nodes. At which point I'm confused why you'd want to index them this
way. I don't know whether you can answer this out of context from the
rest of the transform, which I'm working on understanding (thank you
again for posting this; it's great food for thought for me).

> The transformation given does precisely that. If it's
> too hard to understand, I would strongly recommend trying
> to grok the inner workings of XSLT on simpler examples
> (small document, identity transformation and a simple
> exclusion template or two; small document and a simple key:
> try getting the nodesets using the key() function; some
> simple grouping problems). You're not going to get far with
> XSLT unless you understand nodesets and template-based
> processing on a gut level.

Good advice; I'll try to tackle some smaller pieces. I don't have the
gut-level understanding going for me yet, but I'll get there.

> The XSLT FAQ has a lot of clever code for you to chew on, and
> IBM maintains a nice collection of articles on XSLT here:
>
> http://www-128.ibm.com/developerworks/xml

Cool, I hadn't found either of these resources.

Thanks again Pavel,
-Ryan
 

Pavel Lepin

[solution to a tree-merging problem]
>>> I'm having trouble deciphering its inner workings, and
>>> I can't seem to get it to work (I get: invalid pattern:
>>> unexpected token - "descendant::*" on line 5 of the
>>> XSL).
>>
>> Interesting. My primary XSLT processor seems to be a bit
>> lax about enforcing some restrictions:
>> [snip]
>> Which XSLT processor are you using? Anyway...
>
> I'm using the built-in processor that came with Altova
> XML Spy. Glad to get that cleared up; I figured it might
> be something like that but wasn't able to find the right
> resource to adequately explain the restrictions on the
> match attribute there.

The Spec is the ultimate reference. Of course, it's full of
maddening legalese, and yesterday I almost thought all of
my XSLT processors were essentially non-conformant, until
I noticed that while you may only use the child:: and
attribute:: axes in a pattern's location path, you are
still allowed to use // for some reason, even though it's
semantically equivalent to /descendant-or-self::node()/ (as
defined by the XPath 1.0 spec). Those damned lawyers. Still,
it's an invaluable skill to be able to wade through all
that fine-pointery, yes-buttery and hey-gotchary that is
The Spec.
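
The equivalence mentioned above can be checked with the JDK's javax.xml.xpath API: in an ordinary XPath 1.0 expression, // and its unabbreviated expansion select the same nodes, while a plain / step only reaches children. Since XSLT 1.0 patterns do accept //, the key's match should presumably also work as *[starts-with(local-name(),'layer')]//*, although that variant is not tried anywhere in this thread. The class name and the cut-down sample are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: "//" versus its unabbreviated XPath 1.0 expansion.
public class SlashSlash {
    // A cut-down version of the thread's layered sample.
    static final String SAMPLE = "<data><layer2><layer1><row id='1'>Jim</row></layer1>"
        + "<B><BB><row id='2'>Alexander</row></BB></B></layer2><master/></data>";

    // Number of nodes the given expression selects in SAMPLE.
    static int count(String expr) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(SAMPLE)));
        XPath xp = XPathFactory.newInstance().newXPath();
        Double n = (Double) xp.evaluate("count(" + expr + ")", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        String layers = "//*[starts-with(local-name(),'layer')]";
        System.out.println(count(layers + "//row"));                           // 2
        System.out.println(count(layers + "/descendant-or-self::node()/row")); // 2
        System.out.println(count(layers + "/row"));                            // 1 (children only)
    }
}
```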

>> <xsl:key name="l"
>>   match=
>>   "*[ancestor::*[starts-with(local-name(),'layer')]]"
>>   use="concat(local-name(),'[id=',@id,']')"/>
>>
>> Works for me with libxslt, xalan-c++ and saxon-8B.
>
> I'm pretty confident I understand how xsl:key works. So
> here we're matching all the nodes that are an ancestor of
> one of the <layerN> nodes, where N is some string.

No, we are matching all the nodes that HAVE a <layer[N]>
element node as an ancestor. That is, all the element
descendants of your layers.

> Then the key we want to use here is the element's name
> concatenated with [id=N], where N is that element's id
> attribute. But in my XML here the only elements that have
> an id attribute are the row nodes.

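Sure. For all of your other elements the id part of the key
value is simply empty (a D element without an id is indexed
under "D[id=]"), so it doesn't matter.

> At which point I'm confused why you'd want to index them
> this way.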

It seemed like a good idea at the time. Heck, it still looks
workable enough to me.

> I don't know whether you can answer this out of context
> from the rest of the transform, which I'm working on
> understanding (thank you again for posting this; it's
> great food for thought for me).

Look at the named template called 'patch' in my original
transformation. What it really does is retrieve the
content of the version of the currently processed node that
should go into the result tree.

  <xsl:variable name="l"
    select=
    "
      key('l',concat(local-name(),'[id=',@id,']'))
        [last()]
    "/>

This little bit of mojo is what achieves that, using the key
that got you wondering. Think about it a little, and if
there's something you still don't understand, come
back with a more specific question.
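
For readers who want the answer spelled out, here is a rough Java analogue of what the key and the [last()] lookup compute. It is a hypothetical plain-DOM illustration, not how an XSLT processor implements keys. key() returns nodes in document order, and in the sample layer1 is nested innermost and comes first, so its elements precede layer2's own children, which precede layer3's. The last node stored under a given key value is therefore the one from the highest layer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of what key('l', ...)[last()] computes.
public class LayerKey {
    static final String SAMPLE = "<data><layer3><layer2><layer1>"
        + "<A>data A</A><B><BB><row id='1'>Jim</row><row id='2'>Alex</row>"
        + "<row id='3'>Phil</row><row id='4'>Rutiger</row></BB></B><D>data D</D>"
        + "</layer1><B><BA>Names</BA><BB><row id='2'>Alexander</row></BB></B>"
        + "<C>data C</C><D>data D more info</D></layer2>"
        + "<C></C><D>data D more more info</D></layer3><master/></data>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Like the key's match pattern: some ancestor's name starts with "layer".
    // (layer2 and layer1 themselves qualify too, exactly as with the XSLT key.)
    static boolean inLayer(Element e) {
        for (Node p = e.getParentNode(); p instanceof Element; p = p.getParentNode())
            if (((Element) p).getTagName().startsWith("layer")) return true;
        return false;
    }

    // Like the key's use expression; a missing id gives "", e.g. "D[id=]".
    static String keyOf(Element e) {
        return e.getTagName() + "[id=" + e.getAttribute("id") + "]";
    }

    // getElementsByTagName("*") visits elements in document order, which is
    // also the order key() returns nodes in.
    static Map<String, List<Element>> index(Document doc) {
        Map<String, List<Element>> idx = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (inLayer(e)) idx.computeIfAbsent(keyOf(e), k -> new ArrayList<>()).add(e);
        }
        return idx;
    }

    // Like key('l', k)[last()] followed by xsl:value-of; null when nothing matches.
    static String lookup(Map<String, List<Element>> idx, String k) {
        List<Element> hits = idx.get(k);
        return hits == null ? null : hits.get(hits.size() - 1).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<Element>> idx = index(parse(SAMPLE));
        for (String k : new String[] {"row[id=2]", "D[id=]", "C[id=]", "BA[id=]"})
            System.out.println(k + " -> '" + lookup(idx, k) + "'");
    }
}
```

The null case corresponds to the xsl:otherwise branch in the 'patch' template, where the master's own text is kept.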
 
