spacerobots
I have an XML problem that needs solving, and I'd love some help on
tools and/or approach.
Here's the scenario: I want to take an XML document and merge a
number of sub-trees into it. The sub-trees have the same structure as
the complete tree, but may not contain all of the same nodes. I want
to replace the data in the complete tree with any data I find in the
sub-trees. The "complete" tree isn't exactly complete, though, because
some of its elements may hold any number of repeated child nodes, as
the example shows.
I want to take a structure like this (note each layer represents the
same data):
<data>
  <layer3>
    <layer2>
      <layer1>
        <A>data A</A>
        <B>
          <BB>
            <row id="1">Jim</row>
            <row id="2">Alex</row>
            <row id="3">Phil</row>
            <row id="4">Rutiger</row>
          </BB>
        </B>
        <D>data D</D>
      </layer1>
      <B>
        <BA>Names</BA>
        <BB>
          <row id="2">Alexander</row>
        </BB>
      </B>
      <C>data C</C>
      <D>data D more info</D>
    </layer2>
    <C></C>
    <D>data D more more info</D>
  </layer3>
  <master>
    <A></A>
    <B>
      <BA></BA>
      <BB>
        <!-- some number of 'row' elements -->
      </BB>
    </B>
    <C></C>
    <D></D>
  </master>
</data>
and transform it into this:
<master>
  <A>data A</A>
  <B>
    <BA>Names</BA>
    <BB>
      <row id="1">Jim</row>
      <row id="2">Alexander</row>
      <row id="3">Phil</row>
      <row id="4">Rutiger</row>
    </BB>
  </B>
  <C></C>
  <D>data D more more info</D>
</master>
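So basically any node that is present in a higher layer (layer3 being
the highest here) overwrites the data from the equivalent node in any
of the lower layers. That includes empty nodes: the empty <C> in
layer3 wipes out "data C" from layer2. Repeated nodes such as 'row'
are matched up by their id attribute.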
I need a general algorithm for this, because I'll deal with several
different XML documents. They will all follow this same layered
structure though. My idea is to traverse the "master" tree and, at
each node, look for a match in layer1, layer2, ... layerN, replacing
the data as I go. I need help on some specifics (can I generate an
XPath query for each node as I traverse the master tree?). This is
perhaps not the most efficient way to do it, but I don't think
performance will be much of a concern.
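To make the XPath question concrete, here is a minimal sketch of generating a path for a node while walking the master tree and then evaluating it against a layer. It uses the JDK's built-in DOM and XPath classes rather than dom4j. The class name MasterPaths and its methods are hypothetical, and it assumes that repeated elements carry an id attribute whose value contains no apostrophe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
public class MasterPaths {

    /** Builds a path such as B/BB/row[@id='2'] for node, relative to root. */
    public static String relativePath(Element root, Element node) {
        StringBuilder path = new StringBuilder();
        // Walk upwards from the node until the root (e.g. <master>) is reached.
        for (Node n = node; n != null && n != root; n = n.getParentNode()) {
            if (!(n instanceof Element)) {
                break; // reached the Document: node was not below root
            }
            Element e = (Element) n;
            String step = e.getTagName();
            // Assumption: an id attribute, when present, identifies a repeated element.
            if (e.hasAttribute("id")) {
                step += "[@id='" + e.getAttribute("id") + "']";
            }
            path.insert(0, path.length() == 0 ? step : step + "/");
        }
        // The root itself is addressed as "." so the result is always a valid expression.
        return path.length() == 0 ? "." : path.toString();
    }

    /** Evaluates the path against one layer; yields null if the layer has no such node. */
    public static Node find(Element layer, String path) throws Exception {
        return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(path, layer, XPathConstants.NODE);
    }

    /** Parses an XML string into a DOM document. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

The same relative path can be evaluated against each layer in turn, because every layer mirrors the structure of the master tree.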
Some context: it's going to be written in Java, and I know there are
APIs out there like dom4j and Xalan-J that I could use to get this
done. I have advanced XPath skills but have barely ever used XSLT
(not afraid to learn it though).
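For reference, the merge rule described above could be sketched in plain Java DOM roughly as follows. This is only one possible approach, not a definitive implementation. Instead of walking the master tree and querying each layer, it overlays each layer onto the master, lowest layer first. The class LayerMerge and its helpers are hypothetical names. It assumes the layers are nested exactly as in the example, that any element whose name starts with "layer" is a layer wrapper, and that repeated elements are matched by name plus id attribute.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; names and matching rules are assumptions, not a fixed API.
public class LayerMerge {

    /** Parses the <data> document and returns the merged <master> element. */
    public static Element merge(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element data = doc.getDocumentElement();
        Element master = findChild(data, "master", null);

        // Layers are nested (layer3 > layer2 > layer1). Pushing them onto a stack
        // means iteration starts with the innermost (lowest) layer.
        Deque<Element> layers = new ArrayDeque<>();
        for (Element l = nestedLayer(data); l != null; l = nestedLayer(l)) {
            layers.push(l);
        }
        for (Element layer : layers) {
            overlay(layer, master);
        }
        return master;
    }

    /** Copies the data of source onto target; later calls overwrite earlier ones. */
    static void overlay(Element source, Element target) {
        for (Node n = source.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Skip text, comments and (assumption) nested layer wrappers.
            if (!(n instanceof Element) || n.getNodeName().startsWith("layer")) {
                continue;
            }
            Element src = (Element) n;
            String id = src.hasAttribute("id") ? src.getAttribute("id") : null;
            Element match = findChild(target, src.getTagName(), id);
            if (match == null) {
                target.appendChild(src.cloneNode(true)); // e.g. a new row
            } else if (hasElementChildren(src)) {
                overlay(src, match);                     // descend into B, BB, ...
            } else {
                // Leaf node: overwrite the text, even when the new text is empty.
                match.setTextContent(src.getTextContent());
            }
        }
    }

    /** First child element with the given name and, if id is non-null, that id. */
    static Element findChild(Element parent, String name, String id) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)
                    && (id == null || id.equals(((Element) n).getAttribute("id")))) {
                return (Element) n;
            }
        }
        return null;
    }

    /** First child element whose name starts with "layer", or null. */
    static Element nestedLayer(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().startsWith("layer")) {
                return (Element) n;
            }
        }
        return null;
    }

    static boolean hasElementChildren(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return true;
            }
        }
        return false;
    }
}
```

One consequence of this choice: an empty element in a higher layer clears everything below its match in the master, which fits the "including empty nodes" rule but may be too aggressive for container elements such as B.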
Thanks in advance,
-Ryan