Seashor said:
Thanks for advising. I'm thinking of doing it via SAX.
Although it's a stream processing method, I'll make some file index,
hope it can do well
File index? If you mean a numeric offset of character positions into the
file, that could make your solution much more complex.
Just chain together polymorphic implementations of tag Handlers that are
invoked on each tag entry. Have each one hold a reference to its
enclosing-tag handler so you can pop it back into "currentHandler" on the tag
exit.
XML is a strange bedfellow with file offsets. It's far, far better to stay
within XML semantics when doing XML processing.
Just to hint at the SAX way, which nowadays is a bit old-fashioned in favor of
StAX and things like the XMLStreamReader, you could use a ContentHandler for
each tag:
<foo>
<person>
<name>John Doe</name>
</person>
</foo>
You would declare an abstract AbstractFooHandler class that implements
ContentHandler (here by extending DefaultHandler), and has child classes for
each tag, "foo", "person", "name", etc.
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public abstract class AbstractFooHandler extends DefaultHandler
{
    // State shared by every handler in the chain.
    public static final class Context
    {
        XMLReader parser;
    }

    private Context context;

    public final Context getContext()
    {
        return context;
    }

    public final void setContext( Context ctx )
    {
        this.context = ctx;
    }

    // Handler for the enclosing tag, restored when this tag closes.
    private AbstractFooHandler encloser;

    protected final AbstractFooHandler getEncloser()
    {
        return encloser;
    }

    protected final void setEncloser( AbstractFooHandler fh )
    {
        this.encloser = fh;
    }
}
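import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class FooParser
{
    // createXMLReader() and parse() throw checked exceptions,
    // so main() has to declare them.
    public static void main( String [] args ) throws Exception
    {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        InputSource is = createInputSource( args ); // however you do it

        AbstractFooHandler.Context ctx = new AbstractFooHandler.Context();
        ctx.parser = parser;

        AbstractFooHandler fh = new FooHandler();
        fh.setContext( ctx );

        parser.setContentHandler( fh );
        parser.parse( is );
    }
}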
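import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

public class FooHandler extends AbstractFooHandler
{
    @Override
    public void startElement(String uri,
                             String localName,
                             String qName,
                             Attributes attributes)
        throws SAXException
    {
        if ( localName.equals( "foo" ))
        {
            // The root element itself arrives here first; nothing to hand off.
        }
        else if ( localName.equals( "person" ))
        {
            // Hand the parser over to the handler for the nested tag.
            AbstractFooHandler afh = new PersonHandler();
            afh.setContext( getContext() );
            afh.setEncloser( this );
            getContext().parser.setContentHandler( afh );
        }
        else
        {
            throw new SAXException( "Illegal tag \""+ localName +"\"." );
        }
    }
}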
Then the endElement() callback of PersonHandler would detect the closing
"person" tag and restore its own encloser as the parser's current handler.
endElement() at every level is also where you emit whatever events you want to
happen in response to the XML.
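To make the hand-back concrete, here is a condensed, self-contained sketch of the same chain. It is only an illustration under some assumptions: Base stands in for AbstractFooHandler, the names list in Context is a made-up place to collect results, the "name" tag is handled inside PersonHandler rather than by its own handler to keep it short, and the parser comes from SAXParserFactory instead of XMLReaderFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerChainSketch
{
    /** Shared state; the names list is a hypothetical result holder. */
    static final class Context
    {
        XMLReader parser;
        final List<String> names = new ArrayList<>();
    }

    /** Condensed stand-in for AbstractFooHandler. */
    abstract static class Base extends DefaultHandler
    {
        Context context;
        Base encloser;
    }

    static final class RootHandler extends Base
    {
        @Override
        public void startElement( String uri, String localName, String qName,
                                  Attributes attributes ) throws SAXException
        {
            if ( localName.equals( "foo" ))
            {
                return; // the root element itself, nothing to hand off
            }
            if ( ! localName.equals( "person" ))
            {
                throw new SAXException( "Illegal tag \"" + localName + "\"." );
            }
            PersonHandler ph = new PersonHandler();
            ph.context = context;
            ph.encloser = this;
            context.parser.setContentHandler( ph );
        }
    }

    static final class PersonHandler extends Base
    {
        private final StringBuilder text = new StringBuilder();
        private boolean inName;

        @Override
        public void startElement( String uri, String localName, String qName,
                                  Attributes attributes )
        {
            if ( localName.equals( "name" ))
            {
                inName = true;
                text.setLength( 0 );
            }
        }

        @Override
        public void characters( char [] ch, int start, int length )
        {
            if ( inName )
            {
                text.append( ch, start, length );
            }
        }

        @Override
        public void endElement( String uri, String localName, String qName )
        {
            if ( localName.equals( "name" ))
            {
                context.names.add( text.toString().trim() ); // emit the result
                inName = false;
            }
            else if ( localName.equals( "person" ))
            {
                // Closing tag: pop the enclosing handler back into place.
                context.parser.setContentHandler( encloser );
            }
        }
    }

    static List<String> parseNames( String xml ) throws Exception
    {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware( true ); // so localName is populated
        Context ctx = new Context();
        ctx.parser = factory.newSAXParser().getXMLReader();
        RootHandler root = new RootHandler();
        root.context = ctx;
        ctx.parser.setContentHandler( root );
        ctx.parser.parse( new InputSource( new StringReader( xml )));
        return ctx.names;
    }
}
```

Calling HandlerChainSketch.parseNames() on the sample document above should collect the single name; a real version would give "name" its own handler in the same way "person" gets one here.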
I hard-coded a few things in this example, which is a Bad Thing, but doing it
properly would have been too long for a newsgroup post. I'd keep a Map of
Handlers keyed by tag instead of hard-coding each tag and its handler. This is
most definitely not an SSCCE.
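One way the Map idea might look, as a rough sketch rather than a finished design: the registry maps each tag name to a factory, so a fresh handler is created per element. All names here (HandlerRegistrySketch, TagHandler, register, handlerFor) are hypothetical, and TagHandler stands in for AbstractFooHandler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class HandlerRegistrySketch
{
    /** Stand-in for AbstractFooHandler. */
    abstract static class TagHandler extends DefaultHandler { }

    static final class PersonHandler extends TagHandler { }

    static final class NameHandler extends TagHandler { }

    // Tag name -> factory producing a fresh handler for that tag.
    private final Map<String, Supplier<TagHandler>> factories = new HashMap<>();

    void register( String tag, Supplier<TagHandler> factory )
    {
        factories.put( tag, factory );
    }

    /** Looks up the handler for a tag, rejecting unknown tags. */
    TagHandler handlerFor( String tag ) throws SAXException
    {
        Supplier<TagHandler> factory = factories.get( tag );
        if ( factory == null )
        {
            throw new SAXException( "Illegal tag \"" + tag + "\"." );
        }
        return factory.get();
    }

    /** Registry covering the nested tags of the sample document. */
    static HandlerRegistrySketch forSampleDocument()
    {
        HandlerRegistrySketch registry = new HandlerRegistrySketch();
        registry.register( "person", PersonHandler::new );
        registry.register( "name", NameHandler::new );
        return registry;
    }
}
```

A startElement() could then call handlerFor( localName ), set the context and encloser on the result, and install it, with no per-tag if/else. Each enclosing handler could also carry its own registry if different tags are legal at different depths.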
This will let you keep track of where you are and process your file in one
pass, keeping in memory only what each handler decides it needs to retain. No
file offsets, either.
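For comparison, the StAX route mentioned earlier pulls events instead of receiving callbacks, so position is tracked by ordinary control flow rather than by swapping handlers. A minimal sketch, assuming the same sample document and that only the text of each "name" element is wanted (StaxNamesSketch and parseNames are made-up names):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxNamesSketch
{
    /** Collects the text of every "name" element in one streaming pass. */
    static List<String> parseNames( String xml ) throws XMLStreamException
    {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newFactory()
            .createXMLStreamReader( new StringReader( xml ));
        try
        {
            while ( reader.hasNext() )
            {
                int event = reader.next();
                if ( event == XMLStreamConstants.START_ELEMENT
                     && reader.getLocalName().equals( "name" ))
                {
                    // getElementText() consumes up to the matching end tag.
                    names.add( reader.getElementText().trim() );
                }
            }
        }
        finally
        {
            reader.close();
        }
        return names;
    }
}
```

This is also one pass with no file offsets; whether it reads better than the handler chain is mostly a matter of how deeply the tags nest.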