see whitespace in java DOM

Discussion in 'XML' started by Wired Earp, Oct 20, 2004.

  1. Wired Earp

    I've had some luck using the string values "\t", "\n" and "\r" to insert tab,
    newline and carriage-return text nodes into a document, but I can't *read*
    these nodes, at least not by analyzing the nodeValue. Am I missing
    something?


    /**
     * NodeFilter supposed to remove ignorable whitespace
     */
    private class WhiteSpaceFilter implements NodeFilter {

        public short acceptNode ( Node node ) {

            // HELLO?
            String value = node.getTextContent ();
            boolean ok = value.equals ( "\n" ) || value.equals ( "\t" );
            return ok ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
        }
    }

    /**
     * Strip whitespace
     * @param element DOMElement
     */
    private void strip ( Element element ) {

        List<Node> list = new ArrayList<Node> ();
        NodeFilter filter = new WhiteSpaceFilter ();
        Document document = element.getOwnerDocument();
        DocumentTraversal traversable = (DocumentTraversal) document;
        TreeWalker walker = traversable.createTreeWalker (
            element, NodeFilter.SHOW_TEXT, filter, true );

        while ( walker.nextNode() != null )
            list.add ( walker.getCurrentNode ());
        for ( Node node : list )
            node.getParentNode().removeChild ( node );
    }
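
    A likely reason the comparison in acceptNode never matches, assuming the
    document was parsed from indented source (or normalized after the nodes
    were inserted): adjacent whitespace ends up in a single text node, so the
    value is something like "\n\t" rather than exactly "\n" or "\t". Here is a
    minimal sketch that makes the node values visible. The class TextNodeDump
    and its methods are hypothetical helpers, not part of the code above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper: lists every text node with control
// characters written out as visible escape sequences.
class TextNodeDump {

    // Turn tab, newline and carriage return into the two-character
    // sequences \t, \n and \r so they can be seen when printed.
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("\t", "\\t")
                .replace("\r", "\\r");
    }

    // Depth-first walk that collects the escaped value of each text node.
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add(escape(node.getNodeValue()));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Parse the XML string and return the escaped text node values
    // in document order.
    static List<String> dump(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Sample input (an assumption): one indented child element.
        String xml = "<root>\n\t<item>text</item>\n</root>";
        for (String value : dump(xml)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

    For the sample in main, the first text node holds a newline followed by a
    tab, so it equals neither "\n" nor "\t" and the filter rejects it.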
     
    Wired Earp, Oct 20, 2004
    #1

  2. Wired Earp

    For some reason, even a single "\n" text node can only be identified by a
    regular expression. To make things worse, whitespace inside the text must
    be stripped out first so that it does not fool the filter.

    private class WhiteSpaceFilter implements NodeFilter {

        // filter parsed data
        public short acceptNode ( Node node ) {
            node = sanitize ( node );
            String data = node.getTextContent();
            // an empty pattern matches only the empty string, i.e. a node
            // that held nothing but \t, \n, \r or \f before sanitizing
            boolean ok = Pattern.matches ( "", data );
            return ok ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
        }

        // parse and modify data (note: this edits every text node it visits)
        private Node sanitize ( Node node ) {
            Text text = ( Text ) node;
            String data = text.getData ();
            text.setData ( data.replaceAll ( "[\t\n\r\f]+", "" ));
            return node; //TODO: delete multiple space characters
        }
    }
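
    The filter above rewrites each text node while deciding whether to accept
    it. A sketch of an alternative that leaves the data alone, assuming
    "ignorable" simply means "consists only of whitespace": match the whole
    value against \s* instead. The class name WhitespaceOnlyFilter and the
    helpers parse and strip are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical filter: accepts text nodes made up of whitespace only,
// without modifying any node.
class WhitespaceOnlyFilter implements NodeFilter {

    @Override
    public short acceptNode(Node node) {
        // \s* matches the empty string or any run of spaces, tabs,
        // newlines, carriage returns and form feeds.
        boolean blank = node.getNodeValue().matches("\\s*");
        return blank ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
    }

    // Remove every whitespace-only text node below the document element
    // and return how many were removed.
    static int strip(Document document) {
        // Assumes the DOM implementation supports the Traversal module,
        // as the cast in the posts above already does.
        DocumentTraversal traversal = (DocumentTraversal) document;
        TreeWalker walker = traversal.createTreeWalker(
                document.getDocumentElement(), NodeFilter.SHOW_TEXT,
                new WhitespaceOnlyFilter(), true);

        // Collect first, remove afterwards, so the walk is not disturbed.
        List<Node> blanks = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            blanks.add(n);
        }
        for (Node n : blanks) {
            n.getParentNode().removeChild(n);
        }
        return blanks.size();
    }

    // Convenience helper for trying the filter on an XML string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

    With this version, text such as "a  b" inside an element is kept exactly
    as it was, and only the indentation nodes between elements are dropped.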
     
    Wired Earp, Oct 21, 2004
    #2

  3. Wired Earp

    In that case, it would probably be simpler to just strip the characters
    from every text node in a single pass:

    private void strip ( Document document ) {

        DocumentTraversal traversable = ( DocumentTraversal ) document;
        NodeIterator iterator = traversable.createNodeIterator (
            (Node)document, NodeFilter.SHOW_TEXT, null, false );

        Node node;
        while (( node = iterator.nextNode ()) != null ) {
            Text text = ( Text ) node;
            String data = text.getData ();
            text.setData ( data.replaceAll ( "[\t\n\r\f]+", "" ));
            // TODO: delete multiple spaces
        }
        document.normalizeDocument ();
    }
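
    One way to cover the TODO, sketched under the assumption that a run of
    whitespace inside text should become a single space rather than vanish
    (deleting "\n" outright would join "foo\nbar" into "foobar"). Nodes that
    end up blank are removed explicitly. CollapseWhitespace, collapse and
    parse are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical helper: collapses whitespace runs in text nodes and
// removes text nodes that contain nothing else.
class CollapseWhitespace {

    static void collapse(Document document) {
        DocumentTraversal traversal = (DocumentTraversal) document;
        NodeIterator iterator = traversal.createNodeIterator(
                document, NodeFilter.SHOW_TEXT, null, false);

        List<Text> blanks = new ArrayList<>();
        for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
            Text text = (Text) n;
            // Any run of spaces, tabs, newlines etc. becomes one space.
            String data = text.getData().replaceAll("\\s+", " ");
            if (data.trim().isEmpty()) {
                blanks.add(text);      // whitespace-only: remove later
            } else {
                text.setData(data);    // keep the text, now collapsed
            }
        }
        // Remove after the iteration so the traversal is not disturbed.
        for (Text t : blanks) {
            t.getParentNode().removeChild(t);
        }
    }

    // Convenience helper for trying collapse on an XML string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

    Leading and trailing spaces inside a text node are deliberately kept as a
    single space, since in mixed content they may separate words from a
    neighbouring element.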
     
    Wired Earp, Oct 21, 2004
    #3
