Handling Whitespace in Java DOM

Jason Cavett · Dec 12, 2007

Before I get flamed, I have already read up on how to ignore whitespace
in an XML document via the Java DOM. However, according to the
following links, it is currently not working as intended. (See
http://forums.java.net/jive/thread.jspa?messageID=226957 for the
thread and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6564400
for the bug report.)

So I can't seem to use DocumentBuilderFactory's
setIgnoringElementContentWhitespace, and I am wondering if there is
another way to handle whitespace in an XML file. When I parse the XML
file and get the children, the whitespace text nodes look something like this:

name: #text
data: (all whitespace - spaces, \n, etc.)

Now, I suppose I could just check whether the name is #text and
skip the node as I loop through the children, but that seems kind of
crummy. Is there a better way that I'm not seeing?

Thanks

Arne Vajhøj · Dec 15, 2007

We had this question here back in September.

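My conclusion was that you need a DTD to get it working: the parser can
only treat whitespace as ignorable when the DTD declares the enclosing
element as having element-only content.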

See code below.

Arne

===========================================

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class XMLandWS {
    // Parses the XML and prints every text node, with newlines shown as \n
    // and spaces shown as _ so that whitespace-only nodes are visible.
    public static void parse(String xml) throws Exception {
        System.out.print(xml);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringElementContentWhitespace(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, false);
        Node n;
        while ((n = walk.nextNode()) != null) {
            System.out.println("=" + n.getNodeValue()
                    .replace("\n", "\\n").replace(" ", "_"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: no DTD, so the parser has no content model to consult.
        parse("<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 2: the DTD declares element-only content for <all>.
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 3: the DTD declares mixed content for <all>.
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (#PCDATA|one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
    }
}
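
If adding a DTD is not an option, the manual filtering Jason describes can be done once, up front, by pruning whitespace-only text nodes from the tree right after parsing. The following is only a minimal sketch: the class name StripWhitespace and the helpers strip and parseAndStrip are hypothetical and do not come from the thread. Unlike a DTD-aware parser, it also drops whitespace-only text inside leaf elements such as <one> </one>, so it is only suitable when that text carries no meaning.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
public class StripWhitespace {
    // Recursively removes text nodes that contain only whitespace.
    static void strip(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards so that removals do not shift unvisited indices.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip(child);
            }
        }
    }

    // Parses without any DTD, then prunes the whitespace-only text nodes.
    static Document parseAndStrip(String xml) throws Exception {
        DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        strip(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndStrip(
                "<all>\n <one>A</one>\n <one>BB</one>\n</all>\n");
        // Print the names of the remaining children of the root element.
        NodeList kids = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            System.out.println(kids.item(i).getNodeName());
        }
    }
}
```

After the pruning pass, loops over getChildNodes() no longer need a per-node #text check, at the cost of one extra traversal of the tree.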
