Handling Whitespace in Java DOM

Discussion in 'Java' started by Jason Cavett, Dec 12, 2007.

  1. Jason Cavett

    Jason Cavett Guest

    Before I get flamed: I have already read up on how to ignore whitespace
    in an XML document via the Java DOM. However, according to the
    following links, it is currently not working as intended. (See
    http://forums.java.net/jive/thread.jspa?messageID=226957 for the
    thread and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6564400
    for the bug report.)

    Since I can't seem to use DocumentBuilderFactory's
    setIgnoringElementContentWhitespace, I am wondering if there is
    another way to handle whitespace in an XML file. When I parse the XML
    file and get the children, the text nodes look something like this...

    name: #text
    data: (all whitespace - spaces, \n, etc.)

    Now, I suppose I could just check whether the name is #text and
    ignore it as I loop through the nodes, but that seems kind of crummy.
    Is there a better way that I'm not seeing?
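
    For reference, the loop-and-skip approach described above might look
    roughly like the sketch below. The class and method names
    (SkipBlankText, isBlankText, childNames) are made up for illustration.
    It checks the node type and whether the data is all whitespace, rather
    than the #text name alone, so that text nodes with real content are
    not dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class SkipBlankText {
    // True only for text nodes whose data is entirely whitespace.
    public static boolean isBlankText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty();
    }

    // Returns the names of the root element's children,
    // skipping whitespace-only text nodes.
    public static List<String> childNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> names = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (isBlankText(child)) {
                continue; // indentation or newline between elements
            }
            names.add(child.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // prints [one, one]
        System.out.println(
                childNames("<all>\n <one>A</one>\n <one>BB</one>\n</all>\n"));
    }
}
```

    This needs no DTD, but it treats every whitespace-only text node as
    ignorable, which may be wrong for documents with mixed content.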

    Thanks
     
    Jason Cavett, Dec 12, 2007
    #1

  2. Arne Vajhøj

    Arne Vajhøj Guest

    We had this question here back in September.

    My conclusion was that you need a DTD to get it working.

    See code below.

    Arne

    ===========================================

    import java.io.StringReader;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.traversal.DocumentTraversal;
    import org.w3c.dom.traversal.NodeFilter;
    import org.w3c.dom.traversal.TreeWalker;
    import org.xml.sax.InputSource;

    public class XMLandWS {
        // Parses the XML with element-content whitespace ignoring enabled and
        // prints every remaining text node, with "\n" and " " made visible.
        public static void parse(String xml) throws Exception {
            System.out.print(xml);
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setIgnoringElementContentWhitespace(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            // Walk only the text nodes below the document element.
            TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                    doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, false);
            Node n;
            while ((n = walk.nextNode()) != null) {
                System.out.println("=" + n.getNodeValue()
                        .replace("\n", "\\n").replace(" ", "_"));
            }
        }

        public static void main(String[] args) throws Exception {
            // 1. No DTD: the parser has no content model to consult.
            parse("<all>\n" +
                  " <one>A</one>\n" +
                  " <one>BB</one>\n" +
                  " <one>CCC</one>\n" +
                  "</all>\n");
            // 2. DTD declaring element-only content for <all>.
            parse("<!DOCTYPE all [\n" +
                  "<!ELEMENT all (one)*>\n" +
                  "<!ELEMENT one (#PCDATA)>\n" +
                  "]>\n" +
                  "<all>\n" +
                  " <one>A</one>\n" +
                  " <one>BB</one>\n" +
                  " <one>CCC</one>\n" +
                  "</all>\n");
            // 3. DTD declaring mixed content for <all>.
            parse("<!DOCTYPE all [\n" +
                  "<!ELEMENT all (#PCDATA|one)*>\n" +
                  "<!ELEMENT one (#PCDATA)>\n" +
                  "]>\n" +
                  "<all>\n" +
                  " <one>A</one>\n" +
                  " <one>BB</one>\n" +
                  " <one>CCC</one>\n" +
                  "</all>\n");
        }
    }
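
    If adding a DTD is not an option, one possible workaround is to give
    the TreeWalker used above a NodeFilter that rejects whitespace-only
    text nodes. The following is only a sketch under that assumption; the
    class and method names (BlankTextFilter, nonBlankText) are invented
    for the example, and it relies, like the code above, on the parser's
    Document implementing DocumentTraversal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class BlankTextFilter {
    // Collects the values of all text nodes that are not pure whitespace.
    public static List<String> nonBlankText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Only called for nodes that pass SHOW_TEXT, so the value is text data.
        NodeFilter skipBlank = new NodeFilter() {
            public short acceptNode(Node n) {
                return n.getNodeValue().trim().isEmpty()
                        ? NodeFilter.FILTER_REJECT
                        : NodeFilter.FILTER_ACCEPT;
            }
        };

        TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_TEXT, skipBlank, false);

        List<String> values = new ArrayList<>();
        Node n;
        while ((n = walk.nextNode()) != null) {
            values.add(n.getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // prints [A, BB, CCC]
        System.out.println(nonBlankText("<all>\n" +
                " <one>A</one>\n" +
                " <one>BB</one>\n" +
                " <one>CCC</one>\n" +
                "</all>\n"));
    }
}
```

    Unlike the DTD route, this filter cannot tell ignorable whitespace
    from whitespace that is meaningful in mixed content, so it is only
    suitable when blank text nodes never carry information.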
     
    Arne Vajhøj, Dec 15, 2007
    #2