Sanjeev said:
I am using a SAX parser to read an XML file.
Below are the code snippets.
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <student>
    <name>Sanjeev Atvankar</name>
    <class>Fourth Year</class>
    <subject>
      <subjectType>Science</subjectType>
      <subjectValue>Anatomy</subjectValue>
    </subject>
    <subject>
      <subjectType>Language</subjectType>
      <subjectValue>Hindi</subjectValue>
    </subject>
  </student>
  <student>
    . . . .
    . . . .
  </student>
StudentVO.java (Java Bean) with the following fields
private String name;
private String classRoom;
private String scienceSubject;
private String languageSubject;
. . . .
. . . .
public StudentParser() {
    studentCollectionVO = new StudentCollectionVO();
}
public StudentCollectionVO runExample(String xmlMessage) {
    parseDocument(xmlMessage);
    return studentCollectionVO;
}
private void parseDocument(String xmlMessage) {
    SAXParserFactory spf = SAXParserFactory.newInstance();
    try {
        SAXParser sp = spf.newSAXParser();
        sp.parse(new InputSource(new ByteArrayInputStream(xmlMessage.getBytes())), this);
    } catch (Exception e) {
    }
}
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    tempVal = "";
    if (qName.equalsIgnoreCase("student")) {
        studentVO = new StudentVO();
    }
}
public void characters(char[] ch, int start, int length)
        throws SAXException {
    tempVal = new String(ch, start, length);
}
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    if (qName.equalsIgnoreCase("student")) {
        studentCollectionVO.add(studentVO);
    } else if (qName.equalsIgnoreCase("name")) {
        studentVO.setName(tempVal);
    } else if (qName.equalsIgnoreCase("class")) {
        studentVO.setClassRoom(tempVal);
    } else if (qName.equalsIgnoreCase("subjectValue")) {
        studentVO.setScienceSubject(tempVal);
    } else if (qName.equalsIgnoreCase("subjectValue")) {
        studentVO.setLanguageSubject(tempVal);
    }
}
. . . .
. . . .
Since each subject is given in the following tag format
<subject>
  <subjectType></subjectType>
  <subjectValue></subjectValue>
</subject>
how can I identify an individual subject?
In the above example, Anatomy belongs to Science (subjectType) and
Hindi belongs to Language (subjectType).
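One minimal way to tell the two subjects apart without restructuring the parser is to remember the most recent subjectType and consult it when subjectValue closes. The sketch below assumes subjectType always comes before subjectValue inside a subject, as in the sample document. SubjectAwareHandler and the cut-down StudentVO are hypothetical stand-ins, not the poster's actual classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, cut-down stand-in for the poster's StudentVO bean.
class StudentVO {
    String name;
    String scienceSubject;
    String languageSubject;
}

// Hypothetical handler: remembers the last <subjectType> so the following
// <subjectValue> can be routed to the matching bean property.
class SubjectAwareHandler extends DefaultHandler {
    final List<StudentVO> students = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private StudentVO student;
    private String subjectType;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the new element
        if (qName.equals("student")) student = new StudentVO();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "name": student.name = value; break;
            case "subjectType": subjectType = value; break;
            case "subjectValue":
                if ("Science".equals(subjectType)) student.scienceSubject = value;
                else if ("Language".equals(subjectType)) student.languageSubject = value;
                break;
            case "subject": subjectType = null; break; // forget the type once the subject closes
            case "student": students.add(student); break;
            default: break;
        }
    }

    // Convenience entry point; wraps checked parser exceptions for brevity.
    static List<StudentVO> parse(String xml) {
        SubjectAwareHandler handler = new SubjectAwareHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse student XML", e);
        }
        return handler.students;
    }
}
```

This keeps everything in one handler, which is fine for a document this small. The replies that follow argue for splitting the work across per-element handlers once the structure grows.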
Refactor. Have a separate parser class responsible for each tag, and
have each instance hold a reference to its parent. Thus, you'll have a
"StudentHandler", a "NameHandler", a "RoomHandler",
"ScienceSubjectHandler", etc. Plug the appropriate handler in at
startElement(), and pop to the prior one at the end of endElement().
Okay, i had to think about this for a bit, and write some code which was
entirely the wrong thing, but i think i get this idea, and it's pretty
cool. Here's an attempt (which i haven't tried to compile, and ignores
various details that would be required to do so):
import java.io.IOException ;
import java.util.ArrayList ;
import java.util.List ;
import javax.xml.parsers.ParserConfigurationException ;
import javax.xml.parsers.SAXParser ;
import javax.xml.parsers.SAXParserFactory ;
import org.xml.sax.Attributes ;
import org.xml.sax.InputSource ;
import org.xml.sax.SAXException ;
import org.xml.sax.helpers.DefaultHandler ;

// application-independent bits
interface ElementHandler {
    public ElementHandler handleChild(String tag) ;
    public void handleText(String text) ;
}
// extends DefaultHandler rather than implementing ContentHandler directly:
// it supplies no-op versions of the other callbacks, and SAXParser.parse() wants one
class ElementHandlingHandler extends DefaultHandler {
    private ElementHandler handler ;
    private List<ElementHandler> handlerStack = new ArrayList<ElementHandler>() ;
    private StringBuffer sbuf = new StringBuffer() ; // do buffering here, not in handlers
    public ElementHandlingHandler(String rootTag, ElementHandler rootHandler) {
        handler = new RootHandler(rootTag, rootHandler) ;
    }
    public void characters(char[] buf, int off, int len) {
        sbuf.append(buf, off, len) ;
    }
    public void startElement(String uri, String name, String qname, Attributes attrs) {
        flush() ;
        handlerStack.add(handler) ; // aka 'push'
        // use qname: name (the local name) is empty unless the parser is namespace-aware
        handler = handler.handleChild(qname) ;
    }
    public void endElement(String uri, String name, String qname) {
        flush() ;
        if (!handlerStack.isEmpty()) {
            handler = handlerStack.remove(handlerStack.size() - 1) ; // aka 'pop'
        }
        else {
            // we're done - null things so we puke if more methods are called
            handler = null ;
            sbuf = null ;
        }
    }
    private void flush() {
        String text = sbuf.toString() ;
        sbuf.setLength(0) ;
        // skip whitespace-only runs (indentation between tags), which would
        // otherwise reach handlers that accept no text and make them throw
        if (text.trim().length() > 0) {
            handler.handleText(text) ;
        }
    }
}
// convenience class - override at least one of the methods to do anything useful!
abstract class ElementHandlerBase implements ElementHandler {
    public ElementHandler handleChild(String tag) {
        throw new IllegalStateException("element has no such child: " + tag) ;
    }
    public void handleText(String text) {
        throw new IllegalStateException("element has no text") ;
    }
}
// sort of weird adapter thing, see use above
class RootHandler extends ElementHandlerBase {
    private String rootTag ;
    private ElementHandler rootHandler ;
    public RootHandler(String rootTag, ElementHandler rootHandler) {
        this.rootTag = rootTag ;
        this.rootHandler = rootHandler ;
    }
    public ElementHandler handleChild(String tag) {
        if (!tag.equals(rootTag)) super.handleChild(tag) ;
        return rootHandler ;
    }
}
// application-specific bits
class Student {
    // etc - needs setName, setClass, setScienceSubject and setLanguageSubject
}
class StudentListHandler extends ElementHandlerBase {
    private List<Student> students = new ArrayList<Student>() ;
    public List<Student> getStudents() {
        return students ;
    }
    public ElementHandler handleChild(String tag) {
        // note that i use super.handleChild to signal an error, here and below
        if (!tag.equals("student")) super.handleChild(tag) ;
        Student student = new Student() ;
        students.add(student) ;
        return new StudentHandler(student) ;
    }
}
class StudentHandler extends ElementHandlerBase {
    private Student student ;
    public StudentHandler(Student student) {
        this.student = student ;
    }
    public ElementHandler handleChild(String tag) {
        if (tag.equals("name")) return new NameHandler(student) ;
        else if (tag.equals("class")) return new ClassHandler(student) ;
        else if (tag.equals("subject")) return new SubjectHandler(student) ;
        else return super.handleChild(tag) ; // always throws; the return satisfies the compiler
    }
}
class NameHandler extends ElementHandlerBase {
    private Student student ;
    public NameHandler(Student student) {
        this.student = student ;
    }
    public void handleText(String text) {
        student.setName(text) ;
    }
}
class ClassHandler extends ElementHandlerBase {
    private Student student ;
    public ClassHandler(Student student) {
        this.student = student ;
    }
    public void handleText(String text) {
        student.setClass(text) ;
    }
}
class SubjectHandler extends ElementHandlerBase {
    private Student student ;
    private String subjectType = null ;
    public SubjectHandler(Student student) {
        this.student = student ;
    }
    public ElementHandler handleChild(String tag) {
        if (tag.equals("subjectType")) return new SubjectTypeHandler(this) ;
        else if (tag.equals("subjectValue")) return new SubjectValueHandler(this) ;
        else return super.handleChild(tag) ; // always throws; the return satisfies the compiler
    }
    public void setSubjectType(String type) {
        if (subjectType != null) throw new IllegalStateException("subject type already set") ;
        subjectType = type ;
    }
    public void setSubjectValue(String value) {
        if (subjectType == null) throw new IllegalStateException("subject type not yet set") ;
        // capitalised to match the sample document: XML text is case-sensitive
        if (subjectType.equals("Language")) student.setLanguageSubject(value) ;
        else if (subjectType.equals("Science")) student.setScienceSubject(value) ;
        else throw new IllegalArgumentException("no such subject type: " + subjectType) ; // this should really be thrown in setSubjectType!
        subjectType = null ;
    }
}
class SubjectTypeHandler extends ElementHandlerBase {
    private SubjectHandler parent ;
    public SubjectTypeHandler(SubjectHandler parent) {
        this.parent = parent ;
    }
    public void handleText(String text) {
        parent.setSubjectType(text) ;
    }
}
class SubjectValueHandler extends ElementHandlerBase {
    private SubjectHandler parent ;
    public SubjectValueHandler(SubjectHandler parent) {
        this.parent = parent ;
    }
    public void handleText(String text) {
        parent.setSubjectValue(text) ;
    }
}
public List<Student> parse(InputSource xml)
        throws ParserConfigurationException, SAXException, IOException {
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser() ;
    StudentListHandler root = new StudentListHandler() ;
    parser.parse(xml, new ElementHandlingHandler("root", root)) ;
    // the root element should really be called studentList or something
    return root.getStudents() ;
}
Is that anything like what you meant?
I manage the handler stack externally rather than internally, but that's a
somewhat orthogonal choice. Looking at that code, it might be better to,
as you do, handle it internally: it would avoid duplication in the case of
the sub-handlers of SubjectHandler, and it would let me do some cleverness
where SubjectHandler returns itself to handle the sub-elements, but uses a
state variable (WAITING_FOR_TYPE, WAITING_FOR_VALUE) to decide what to do
when it gets some text.
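To make that state-variable idea concrete, here is a rough sketch of a subject handler that returns itself for both sub-elements. It is only one reading of the idea: StatefulSubjectHandler is a hypothetical name, the extra DONE state is my addition, the interface is repeated (with handleText returning nothing) so the sketch compiles on its own, and Student is a minimal stand-in for the elided class.

```java
// Assumed shape of the callback interface, repeated so this compiles alone.
interface ElementHandler {
    ElementHandler handleChild(String tag);
    void handleText(String text);
}

// Hypothetical minimal Student; the real class is elided in the post.
class Student {
    String scienceSubject;
    String languageSubject;
}

// Handles <subject> and both of its children, using a state variable
// instead of separate SubjectTypeHandler / SubjectValueHandler classes.
class StatefulSubjectHandler implements ElementHandler {
    private enum State { WAITING_FOR_TYPE, WAITING_FOR_VALUE, DONE }

    private final Student student;
    private State state = State.WAITING_FOR_TYPE;
    private String subjectType;

    StatefulSubjectHandler(Student student) {
        this.student = student;
    }

    @Override
    public ElementHandler handleChild(String tag) {
        boolean expected =
                (state == State.WAITING_FOR_TYPE && tag.equals("subjectType"))
                || (state == State.WAITING_FOR_VALUE && tag.equals("subjectValue"));
        if (!expected) {
            throw new IllegalStateException("unexpected child " + tag + " in state " + state);
        }
        return this; // the same object also receives the child's text
    }

    @Override
    public void handleText(String text) {
        switch (state) {
            case WAITING_FOR_TYPE:
                // reject unknown types as soon as they are seen
                if (!text.equals("Science") && !text.equals("Language")) {
                    throw new IllegalArgumentException("no such subject type: " + text);
                }
                subjectType = text;
                state = State.WAITING_FOR_VALUE;
                break;
            case WAITING_FOR_VALUE:
                if (subjectType.equals("Science")) student.scienceSubject = text;
                else student.languageSubject = text;
                state = State.DONE;
                break;
            default:
                throw new IllegalStateException("subject already complete");
        }
    }
}
```

One weakness of this sketch: because the handler cannot tell whose text it is receiving, stray non-whitespace text directly inside subject would be mistaken for the type. Tracking the current child tag as well as the state would close that gap.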
Instead of "if ( qname.equalsIgnoreCase()
DIGRESSION:
Ignore case? Why do you do that? XML is case sensitive. Don't ignore case.
END DIGRESSION.
)" use a Map:
Handler handler = handlers.get( qName );
Hmm. My version doesn't use a map - there are several hard-coded switches
which could be done with maps instead. It wouldn't save any lines of code,
but would be less crufty.
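As a rough illustration of swapping one of those hard-coded switches for a map, the sketch below looks up a setter by tag name and hands it to a single generic leaf handler. MapStudentHandler, TextHandler and the cut-down Student are hypothetical names, the interface is repeated so the block compiles alone, and lambdas are used only for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Assumed shape of the callback interface, repeated so this compiles alone.
interface ElementHandler {
    ElementHandler handleChild(String tag);
    void handleText(String text);
}

// Hypothetical minimal Student; the real class is elided in the post.
class Student {
    String name;
    String classRoom;
}

// Generic leaf handler: passes its text to whichever setter it was given,
// standing in for NameHandler and ClassHandler.
class TextHandler implements ElementHandler {
    private final Student student;
    private final BiConsumer<Student, String> setter;

    TextHandler(Student student, BiConsumer<Student, String> setter) {
        this.student = student;
        this.setter = setter;
    }

    @Override
    public ElementHandler handleChild(String tag) {
        throw new IllegalStateException("element has no such child: " + tag);
    }

    @Override
    public void handleText(String text) {
        setter.accept(student, text);
    }
}

// Student-level handler whose if-chain is replaced by a map lookup.
class MapStudentHandler implements ElementHandler {
    private static final Map<String, BiConsumer<Student, String>> SETTERS = new HashMap<>();
    static {
        SETTERS.put("name", (s, text) -> s.name = text);
        SETTERS.put("class", (s, text) -> s.classRoom = text);
    }

    private final Student student;

    MapStudentHandler(Student student) {
        this.student = student;
    }

    @Override
    public ElementHandler handleChild(String tag) {
        BiConsumer<Student, String> setter = SETTERS.get(tag);
        if (setter == null) {
            throw new IllegalStateException("element has no such child: " + tag);
        }
        return new TextHandler(student, setter);
    }

    @Override
    public void handleText(String text) {
        throw new IllegalStateException("element has no text");
    }
}
```

Supporting another simple text element then means adding one map entry rather than another handler class and another branch.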
If the handler's parent is a ScienceHandler you have one thing, if the parent
is a LanguageHandler you have another.
I've written a number of SAX parsers using this stratagem and it works
well. It also eliminates that long if-chain. Maps are easier to
configure - they don't require recompilation every time you change the
rules.
Provided you're getting the map from an external source, rather than
defining it in the code. But i don't quite understand how that would work
here. I don't think i've really grokked your design.
tom