Tree splitting/merging

William Ahern

I'm looking for resources on splitting and merging XML trees. Specifically,
on methods to pare large XML documents into smaller documents which can be
merged later.

Off of the top of my head, I can envision unions of node sets, and unions of
node text. But I know there's much more to the subject than that, if not
more alternatives than greater technical detail.

TIA,

Bill
 
sylvain.loiseau

William Ahern said:
I'm looking for resources on splitting and merging XML trees. Specifically,
on methods to pare large XML documents into smaller documents which can be
merged later.

I have something for a problem (perhaps) close to yours: I need to perform
an XSLT transformation on a very large document which doesn't fit in memory. I
use a SAX parser with three XMLFilters (concretely, subclasses of
org.xml.sax.helpers.XMLFilterImpl). The first filter "splits" the stream (i.e.
it fires startDocument and endDocument events) when it encounters a
specific startElement and endElement. So the next filter receives several
(smaller) documents, one at a time. This second filter is a TransformerHandler
which performs the transformation. Then it passes the events to a last filter,
a "merger", which discards the startDocument and endDocument events except the
very first and the very last ones.
I was inspired by a Perl module by Barrie Slaymaker.
(Incidentally, I noticed that there is nothing for Java as convenient as
the XML::SAX::Pipeline Perl module.)
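
For anyone who wants to try the split/merge idea, here is a minimal Java sketch of the first and last filters only. It is not my actual code. The class names (SplitMergeSketch, Splitter, Merger, Sink), the "merged" wrapper element and the run() helper are all made up for illustration, and two details are assumptions: markup outside the record elements is simply dropped, and the merger learns that the stream is over by overriding parse(). The TransformerHandler stage would sit between Splitter and Merger; it is left out here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SplitMergeSketch {

    /** Turns each element named recordName into its own small SAX document. */
    static class Splitter extends XMLFilterImpl {
        private final String recordName;
        private int depth = 0; // nesting depth inside the current record; 0 means outside

        Splitter(String recordName) { this.recordName = recordName; }

        // The parser's own document events are suppressed; one pair is fired per record.
        @Override public void startDocument() { }
        @Override public void endDocument() { }

        @Override public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth == 0 && !qName.equals(recordName)) return; // assumption: wrapper markup is dropped
            if (depth == 0) super.startDocument();               // open a new sub-document
            depth++;
            super.startElement(uri, local, qName, atts);
        }

        @Override public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth == 0) return;
            super.endElement(uri, local, qName);
            if (--depth == 0) super.endDocument();               // record closed: close the sub-document
        }

        @Override public void characters(char[] ch, int start, int len) throws SAXException {
            if (depth > 0) super.characters(ch, start, len);
        }
    }

    /** Keeps only the first startDocument and emits one final endDocument. */
    static class Merger extends XMLFilterImpl {
        int subDocuments = 0; // number of sub-documents received so far

        @Override public void startDocument() throws SAXException {
            if (subDocuments++ == 0) {
                super.startDocument();
                super.startElement("", "merged", "merged", new AttributesImpl()); // hypothetical root
            }
        }

        @Override public void endDocument() { } // held back until parse() returns

        @Override public void parse(InputSource input) throws SAXException, IOException {
            subDocuments = 0;
            super.parse(input);
            if (subDocuments > 0) {
                super.endElement("", "merged", "merged");
                super.endDocument();
            }
        }
    }

    /** Records the events it receives as a string; '[' and ']' mark document start and end. */
    static class Sink extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        @Override public void startDocument() { out.append('['); }
        @Override public void endDocument() { out.append(']'); }
        @Override public void startElement(String u, String l, String q, Attributes a) { out.append('<').append(q).append('>'); }
        @Override public void endElement(String u, String l, String q) { out.append("</").append(q).append('>'); }
        @Override public void characters(char[] ch, int s, int n) { out.append(ch, s, n); }
    }

    static String run(String xml, String recordName) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            Splitter splitter = new Splitter(recordName);
            splitter.setParent(reader);
            Merger merger = new Merger();
            merger.setParent(splitter);
            Sink sink = new Sink();
            merger.setContentHandler(sink);
            merger.parse(new InputSource(new StringReader(xml)));
            return sink.out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints [<merged><item>a</item><item>b</item></merged>]
        System.out.println(run("<list><item>a</item><x>skip</x><item>b</item></list>", "item"));
    }
}
```

The sketch uses a non-namespace-aware parser and matches on qName; prefix mappings, processing instructions and attributes in the output are ignored to keep it short.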

In fact, I came to this list with a question close to this one; it's in
a new thread.

William Ahern said:
Off of the top of my head, I can envision unions of node sets, and unions of
node text. But I know there's much more to the subject than that, if not
more alternatives than greater technical detail.

What level of well-formedness does your merging problem involve? I.e., do you
only want to add nodes to existing nodes in DOM fashion (in which case you just
need the standard methods of the Node interface), or do you want to insert
mixed content while checking for well-formedness, tag nesting, etc.?
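
For the simple DOM case, a minimal sketch could look like the following. It assumes both documents fit in memory and that "merging" just means appending the children of one root element to the other, with no duplicate detection or validation. The class and method names (DomMergeSketch, mergeInto, describe, merge) are hypothetical; importNode and appendChild are the standard DOM calls.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomMergeSketch {

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Appends a deep copy of every child of source's root element to target's root element. */
    static void mergeInto(Document target, Document source) {
        Node targetRoot = target.getDocumentElement();
        for (Node child = source.getDocumentElement().getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            // A node belongs to the document that created it, so it is copied
            // into the target with importNode before being appended.
            targetRoot.appendChild(target.importNode(child, true));
        }
    }

    /** Lists the root's children as name=text pairs, for a quick look at the result. */
    static String describe(Document doc) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            sb.append(kid.getNodeName()).append('=').append(kid.getTextContent()).append(';');
        }
        return sb.toString();
    }

    static String merge(String targetXml, String sourceXml) {
        try {
            Document target = parse(targetXml);
            mergeInto(target, parse(sourceXml));
            return describe(target);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints item=a;item=b;item=c;
        System.out.println(merge("<list><item>a</item></list>",
                                 "<list><item>b</item><item>c</item></list>"));
    }
}
```

Since importNode copies rather than moves, the source document is left untouched, which matters if the same fragment has to be merged into several targets.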
 
William Ahern

Right after posting I tripped over the XPipe project (http://xpipe.sf.net/).
XPipe associates this with the scatter/gather pattern, and they seem to have
put a lot of thought into the issues. Specifically, they elaborate on the
notion of "fulcra", or the node depth, I suppose you could call it, at which a
document can be split. You've probably already thought this through, but
maybe you can find more info on that site. They have code and list
discussions you can wade through.

- Bill
 
sylvain.loiseau

Thanks, it looks very interesting.

Sylvain

