If you use XSLT to process an XML file, it has to keep a complete
representation of the source XML document in memory, since an XSLT
transformation can include XPath expressions, and XPath can in
principle access anything in the document. This is true even if the
input to XSLT is a SAXSource.
Weeeellll, kinda. Some XSLTs will require the whole document to be held
in memory. But it is possible to process some XSLTs in a streaming or
streaming-ish manner (where elements are held in memory, but only a
subset at a time). There's nothing stopping an XSLT processor from
compiling such XSLTs into a form which does just that. Whether any
actually do, I don't know.
A while ago, I read about a streaming XPath processor. It couldn't
handle every XPath in a streaming manner, so it had to fall back to
searching an in-memory tree in those cases, but many common XPaths can
be handled in a streaming fashion. For instance, something like:
//order[@id='99']/order-item
Could be. You run the parse and maintain the current stack of open
elements in memory - all the elements enclosing the current parse
point, IYSWIM. Then at every point you look at the top of the stack to
see if it's an order-item, and if it is, look back down the stack to
see if the enclosing order has an id of 99. You could probably do it
more efficiently than that, but that's one way you could do it.
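A minimal sketch of that stack-walking idea, in Python's xml.sax for brevity (the `order` and `order-item` names come from the query above; nothing else about the document is assumed):

```python
import xml.sax

class OrderItemFinder(xml.sax.ContentHandler):
    """Streaming match for //order[@id='99']/order-item: keep the stack
    of currently-open elements and inspect it at each start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, value of its id attribute) per open element
        self.matches = 0

    def startElement(self, name, attrs):
        self.stack.append((name, attrs.get('id')))
        # Child axis: the element directly enclosing a matching
        # order-item must be an order whose id is '99'.
        if name == 'order-item' and len(self.stack) >= 2:
            parent_tag, parent_id = self.stack[-2]
            if parent_tag == 'order' and parent_id == '99':
                self.matches += 1

    def endElement(self, name):
        self.stack.pop()

def count_matching_items(xml_text):
    handler = OrderItemFinder()
    xml.sax.parseString(xml_text.encode('utf-8'), handler)
    return handler.matches

sample = ("<orders>"
          "<order id='98'><order-item/></order>"
          "<order id='99'><order-item/><order-item/></order>"
          "</orders>")
```

Only the stack of open elements is ever held, so memory is bounded by document depth, not document size; `count_matching_items(sample)` comes out as 2.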
//order[customer[@id='99']]/order-item

is more challenging, though, and requires a more sophisticated
evaluation strategy - you might need to read in a whole order, check it
for a matching customer and collect its order-items, then throw it away
and move on to the next one.
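That read-a-whole-order-then-discard strategy could look like this (again a Python sketch, using ElementTree's iterparse; the `sku` attribute is invented just to give the items something identifiable):

```python
import io
import xml.etree.ElementTree as ET

def items_for_customer_99(xml_text):
    """Evaluate //order[customer[@id='99']]/order-item by buffering one
    <order> subtree at a time, checking it, then discarding it."""
    found = []
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=('end',)):
        if elem.tag == 'order':
            customer = elem.find('customer')   # child axis, as in the predicate
            if customer is not None and customer.get('id') == '99':
                found.extend(item.get('sku')
                             for item in elem.findall('order-item'))
            elem.clear()   # throw the buffered order away before reading on
    return found

sample = ("<orders>"
          "<order><customer id='98'/><order-item sku='a'/></order>"
          "<order><customer id='99'/><order-item sku='b'/>"
          "<order-item sku='c'/></order>"
          "</orders>")
```

Peak memory is one order's subtree rather than the whole document; `items_for_customer_99(sample)` gives `['b', 'c']`.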
Or, if you knew from the DTD that the customer element had to come
before any order-items in an order, you could build a state machine
that decides it is inside a matching order as soon as it sees the
customer, and then reports all subsequent order-items.
Anyway, all speculation, but it's interesting stuff!