What happens if you throw such a file into a normal XML parser?
I would imagine that if you're using StAX, you can stop parsing when you
hit the end of a root element, and then either carry on to the next one,
or perhaps wrap a fresh parser round the underlying input stream.
I have no idea what a SAX parser would do; I don't know how much
well-formedness checking SAX parsers do.
One thing that works with stream parsing is to fool the parser with a
fake starting document element tag, such as <log>. Given that, SAX or
StAX will parse forever, or at least until end of file/stream.
If you didn't fake out the parser it would choke with a well-formedness
error after the first "record".
You can get quite innovative (read: hackish) by doing something like this:

    // Needs java.io.*, java.net.URL, java.nio.charset.StandardCharsets,
    // java.util.Arrays, java.util.Collections and javax.xml.stream.*.
    // Assumes the file carries no XML declaration or DOCTYPE of its own.
    final InputStream bsBegin =
        new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8));
    URL fileUrl = new URL(...);
    final InputStream in = fileUrl.openStream();
    final InputStream bsEnd =
        new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8));

    // Present the fake opening tag, the real data and the fake closing
    // tag to the parser as one continuous stream.
    SequenceInputStream sis = new SequenceInputStream(
        Collections.enumeration(Arrays.asList(bsBegin, in, bsEnd)));

    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader parser = factory.createXMLStreamReader(sis);
**********
The point here is that the "real" input wasn't well-formed at all,
but by the time the parser sees it, it's fine.
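To make the wrapping trick concrete, here is a minimal self-contained sketch that feeds a rootless run of records through the same kind of wrapper and lists the element name of each top-level record. The class name WrappedLogReader, the method recordNames and the sample <record> elements are made up for illustration, and it assumes UTF-8 input with no XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: names and sample data are illustrative only.
class WrappedLogReader {

    // Wraps a rootless stream in <wrapper>...</wrapper> and returns the
    // local names of the elements found directly under the wrapper.
    static List<String> recordNames(InputStream rootless) throws XMLStreamException {
        InputStream open =
            new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8));
        InputStream close =
            new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8));
        InputStream all = new SequenceInputStream(
            Collections.enumeration(Arrays.asList(open, rootless, close)));

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(all);
        List<String> names = new ArrayList<>();
        int depth = 0; // 1 = inside the fake wrapper, 2 = inside a record
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        names.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String log = "<record>first</record>\n<record>second</record>\n";
        InputStream in = new ByteArrayInputStream(log.getBytes(StandardCharsets.UTF_8));
        System.out.println(recordNames(in));
    }
}
```

Tracking the depth is what tells a record apart from anything nested inside it; the fake wrapper is always depth 1.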
Any of your ideas would be cool, too. After all we are just trying to
get a job done.
What would a DOM parser do?
It fails when the underlying stream parsing fails, I would think.
You might need to interpolate a layer between the parser and the
FileInputStream to notionally split the file into substreams, one per
document. That would require some sort of framing format for the file, I
think.
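As a rough sketch of what such a layer could look like: suppose, purely as an assumption, that each document in the file is preceded by its length as a 4-byte big-endian integer. Then every frame can be handed to a DOM parser as its own little stream. The framing format, the class name FramedXmlReader and both method names are invented for this example; nothing in the thread says the real file is laid out this way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical framing: [int length][length bytes of XML], repeated.
class FramedXmlReader {

    // Reads frames until end of stream and parses each one as a separate
    // DOM document.
    static List<Document> readAll(InputStream in) throws Exception {
        DataInputStream data = new DataInputStream(in);
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Document> docs = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = data.readInt();
            } catch (EOFException endOfFile) {
                break; // no further frame header: we are done
            }
            byte[] frame = new byte[length];
            data.readFully(frame);
            docs.add(builder.parse(new ByteArrayInputStream(frame)));
        }
        return docs;
    }

    // Builds a framed byte array from complete XML documents, for testing.
    static byte[] frame(String... xmlDocs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String xml : xmlDocs) {
            byte[] body = xml.getBytes(StandardCharsets.UTF_8);
            out.writeInt(body.length);
            out.write(body);
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] file = frame("<record>first</record>", "<record>second</record>");
        for (Document doc : readAll(new ByteArrayInputStream(file))) {
            System.out.println(doc.getDocumentElement().getTextContent());
        }
    }
}
```

The catch is that whatever writes the file has to cooperate with the framing, which is not possible with a plain append-only log.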
Well, it is physical (quoth the spec: "Each XML document has both a
logical and a physical structure."), but it's a physical thing distinct
from a file, and doesn't have to map directly on to it. The spec says "A
data object is an XML document if ..." and "A textual object is a
well-formed XML document if ...", but never gets any more specific than
that.
tom
A completely different approach, and possibly the most "valid" one,
would be to consider each log "record" to be an XML fragment.
But I wouldn't actually contemplate doing things that way; I'd
pre-process somehow, using the ideas we've already brought out.
AHS