Handling Whitespace in Java DOM

Jason Cavett

Before I get flamed: I have already read up on how to ignore whitespace
in an XML document via the Java DOM. However, according to the
following links, it currently does not work as intended. (See
http://forums.java.net/jive/thread.jspa?messageID=226957 for the
thread and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6564400
for the bug report.)

Since I can't seem to use DocumentBuilderFactory's
setIgnoringElementContentWhitespace, I am wondering if there is
another way to handle whitespace in an XML file. When I parse the XML
file and get the children, the text nodes look something like this...

name: #text
data: (all whitespace - spaces, \n, etc.)

Now, I suppose I could just check whether the name is #text and
ignore it as I loop through the nodes, but that seems kind of crummy.
Is there a better way that I'm not seeing?

Thanks
 
Arne Vajhøj

We had this question here back in September.

My conclusion was that you need a DTD to get it working.

See code below.

Arne

===========================================

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class XMLandWS {
    // Parses the XML and prints every text node, with newlines shown as \n and spaces as _.
    public static void parse(String xml) throws Exception {
        System.out.print(xml);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringElementContentWhitespace(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        // Walk only the text nodes below the root element.
        TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, false);
        Node n;
        while ((n = walk.nextNode()) != null) {
            System.out.println("=" + n.getNodeValue().replace("\n", "\\n").replace(" ", "_"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: no DTD at all.
        parse("<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 2: DTD declaring element-only content for <all>.
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 3: DTD declaring mixed content for <all>.
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (#PCDATA|one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
    }
}
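
If adding a DTD is not an option, the manual filtering Jason mentions (skipping the #text nodes that hold only whitespace) can be done once, right after parsing, instead of in every loop. What follows is only a minimal sketch of that idea, not something from the earlier thread: the class name StripWhitespace and the helpers strip and parseStripped are made up for illustration, and it assumes that whitespace-only text is never significant in your documents.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here for illustration only.
public class StripWhitespace {
    // Recursively removes text nodes that consist only of whitespace.
    // Assumption: such nodes carry no meaning in the documents being read.
    public static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip(child);
            }
            child = next;
        }
    }

    // Parses the XML without relying on a DTD, then strips whitespace-only text.
    public static Document parseStripped(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        strip(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseStripped("<all>\n" +
                                     " <one>A</one>\n" +
                                     " <one>BB</one>\n" +
                                     "</all>\n");
        NodeList kids = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            System.out.println(kid.getNodeName() + "=" + kid.getTextContent());
        }
    }
}
```

Note that this also drops whitespace-only text inside mixed content and inside leaf elements such as <one> </one>, which the DTD-based approach would keep, so it only fits documents where that is acceptable.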
 
