Handling Whitespace in Java DOM

Discussion in 'Java' started by Jason Cavett, Dec 12, 2007.

  1. Jason Cavett

    Jason Cavett Guest

    Before I get flamed, I have already read up on how to ignore whitespace
    in an XML document via the Java DOM. However, according to the
    following links, it currently is not working as intended. (See
    http://forums.java.net/jive/thread.jspa?messageID=226957 for the
    thread and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6564400
    for the bug report.)

    So, I can't seem to use DocumentBuilderFactory's
    setIgnoringElementContentWhitespace, so I am wondering if there is
    another way to handle whitespace in an XML file. When I parse the XML
    file and get the children the text nodes look something like this...

    name: #text
    data: (all whitespace - spaces, \n, etc.)

    Now, I suppose I could just check whether the name is #text and
    skip those nodes as I loop through the children, but that seems kind
    of crummy. Is there a better way that I'm not seeing?

    Jason Cavett, Dec 12, 2007
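
One way to avoid checking for #text in every loop is to strip the whitespace-only text nodes once, right after parsing. The following is a minimal sketch of that idea, not something taken from the thread: the class name WhitespaceStripper and its two methods are hypothetical, and it assumes whitespace-only text is never significant in the document, which does not hold for mixed content. Note also that String.trim() treats all characters up to U+0020 as whitespace, which is slightly broader than the XML definition.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the thread): removes whitespace-only text nodes.
public class WhitespaceStripper {

    /** Recursively removes text nodes that contain nothing but whitespace. */
    public static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Fetch the sibling first, because removing the child detaches it.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    /** Parses an XML string with the default, non-validating parser. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<all>\n <one>A</one>\n <one> B B </one>\n</all>");
        stripWhitespace(doc.getDocumentElement());
        // Only the two <one> elements remain as children of <all>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // prints 2
    }
}
```

Text that contains any non-whitespace character, such as " B B " above, is left untouched, including its leading and trailing spaces.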

  2. Arne Vajhøj

    Arne Vajhøj Guest

    We had this question here back in September.

    My conclusion was that you need a DTD to get it working.

    See code below.



    import java.io.StringReader;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.traversal.DocumentTraversal;
    import org.w3c.dom.traversal.NodeFilter;
    import org.w3c.dom.traversal.TreeWalker;
    import org.xml.sax.InputSource;

    public class XMLandWS {
        public static void parse(String xml) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // The setting under discussion. It can only take effect where a
            // DTD declares element-only content for the parent element.
            dbf.setIgnoringElementContentWhitespace(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            // Walk every text node below the root element.
            TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                    doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, false);
            Node n;
            while ((n = walk.nextNode()) != null) {
                // Make whitespace visible: newline becomes \n, space becomes _
                System.out.println("=" + n.getNodeValue().replace("\n", "\\n").replace(" ", "_"));
            }
        }

        public static void main(String[] args) throws Exception {
            // Case 1: no DTD at all.
            parse("<all>\n" +
                  " <one>A</one>\n" +
                  " <one>BB</one>\n" +
                  " <one>CCC</one>\n" +
                  "</all>");
            // Case 2: DTD declares element-only content for <all>.
            parse("<!DOCTYPE all [\n" +
                  "<!ELEMENT all (one)*>\n" +
                  "<!ELEMENT one (#PCDATA)>\n" +
                  "]>\n" +
                  "<all>\n" +
                  " <one>A</one>\n" +
                  " <one>BB</one>\n" +
                  " <one>CCC</one>\n" +
                  "</all>");
            // Case 3: DTD declares mixed content for <all>.
            parse("<!DOCTYPE all [\n" +
                  "<!ELEMENT all (#PCDATA|one)*>\n" +
                  "<!ELEMENT one (#PCDATA)>\n" +
                  "]>\n" +
                  "<all>\n" +
                  " <one>A</one>\n" +
                  " <one>BB</one>\n" +
                  " <one>CCC</one>\n" +
                  "</all>");
        }
    }

    Arne Vajhøj, Dec 15, 2007
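
When no DTD is available, the TreeWalker used above can also do the filtering itself: instead of passing null as the filter, pass a NodeFilter that skips whitespace-only text nodes. The following is a rough sketch under a few assumptions: the class name NonBlankTextWalker and the method nonBlankText are hypothetical, the lambda requires Java 8 or later, and the cast to DocumentTraversal relies on the parser's Document implementation supporting DOM Traversal, as the code above already does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical example class (not from the thread).
public class NonBlankTextWalker {

    /** Returns the values of all text nodes that are not whitespace-only. */
    public static List<String> nonBlankText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // SHOW_TEXT means only text nodes reach the filter; skip the blank ones.
        NodeFilter skipBlank = node -> node.getNodeValue().trim().isEmpty()
                ? NodeFilter.FILTER_SKIP
                : NodeFilter.FILTER_ACCEPT;

        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_TEXT, skipBlank, false);

        List<String> values = new ArrayList<>();
        Node n;
        while ((n = walker.nextNode()) != null) {
            values.add(n.getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<all>\n <one>A</one>\n <one>BB</one>\n</all>";
        System.out.println(nonBlankText(xml)); // prints [A, BB]
    }
}
```

Unlike removing nodes from the tree, this leaves the document unchanged and only hides the blank text nodes from the traversal.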