Document.importNode(Node,boolean) - what supports it?

Discussion in 'XML' started by Simon Brooke, Mar 16, 2007.

  1. Simon Brooke

    Simon Brooke Guest

    The DOM API has included public Node importNode(Node,boolean) as a method
    of the Document interface for a long time. Does anything actually
    implement it? Xerces 2 is giving me:

    org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not
    support the requested type of object or operation.
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:183)

    This is so whether the node I'm trying to import is an
    org.apache.xerces.dom.DeferredElementImpl (i.e. parsed with Xerces) or an
    org.apache.crimson.tree.ElementNode (i.e. parsed with Crimson).

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/
    Ye hypocrites! are these your pranks? To murder men and give God thanks?
    Desist, for shame! Proceed no further: God won't accept your thanks for
    murther
    -- Robert Burns, 'Thanksgiving For a National Victory'
    Simon Brooke, Mar 16, 2007
    #1

  2. Simon Brooke wrote:
    > The DOM API has included public Node importNode(Node,boolean) as a method
    > of the Document interface for a long time. Does anything actually
    > implement it?


    Certainly should work; I wrote Xerces' first implementation of that
    function, and in fact was one of those who lobbied the DOM WG to include
    it in the standard. If the node being imported properly implements the
    DOM APIs, and the implementation being imported into doesn't have some
    reason for blocking this (eg, that it's specifically a read-only DOM,
    such as the DOM view of Xalan's internal data model), the function
    should work. It isn't rocket science, after all; it's just a tree-walker
    feeding a tree-builder.
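
    As a rough illustration of "a tree-walker feeding a tree-builder", the
    sketch below copies a subtree into another document by hand. It is not
    Xerces' code: TreeCopySketch and copyInto are names made up for the
    example, and it only handles elements, attributes and text, ignoring
    namespaces and every other node type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class: a simplified stand-in for what importNode does.
class TreeCopySketch {
    /** Walks 'source' and builds an equivalent subtree owned by 'target'. */
    static Node copyInto(Document target, Node source) {
        switch (source.getNodeType()) {
            case Node.ELEMENT_NODE: {
                Element src = (Element) source;
                Element copy = target.createElement(src.getTagName());
                // Copy attributes by name and value (no namespace handling).
                NamedNodeMap attrs = src.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    copy.setAttribute(attr.getNodeName(), attr.getNodeValue());
                }
                // Recurse over the children in document order.
                for (Node c = src.getFirstChild(); c != null; c = c.getNextSibling()) {
                    copy.appendChild(copyInto(target, c));
                }
                return copy;
            }
            case Node.TEXT_NODE:
                return target.createTextNode(source.getNodeValue());
            default:
                // This sketch refuses anything else, Document nodes included.
                throw new DOMException(DOMException.NOT_SUPPORTED_ERR,
                        "node type " + source.getNodeType() + " not handled in this sketch");
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document parsed = builder.parse(
                new InputSource(new StringReader("<div class=\"Intro\"><p>hi</p></div>")));
        Document target = builder.newDocument();
        Node copy = copyInto(target, parsed.getDocumentElement());
        System.out.println(copy.getNodeName() + " now belongs to the target document: "
                + (copy.getOwnerDocument() == target));
    }
}
```

    The real importNode covers more node types and namespace-aware names, but
    the shape is the same: read from one tree, create in the other.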

    I have to believe the problem resides in something you haven't told us.

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Mar 16, 2007
    #2

  3. Simon Brooke

    Simon Brooke Guest

    Joe Kesselman wrote:

    > Simon Brooke wrote:
    >> The DOM API has included public Node importNode(Node,boolean) as a
    >> method of the Document interface for a long time. Does anything actually
    >> implement it?

    >
    > Certainly should work; I wrote Xerces' first implementation of that
    > function, and in fact was one of those who lobbied the DOM WG to include
    > it in the standard. If the node being imported properly implements the
    > DOM APIs, and the implementation being imported into doesn't have some
    > reason for blocking this (eg, that it's specifically a read-only DOM,
    > such as the DOM view of Xalan's internal data model), the function
    > should work. It isn't rocket science, after all; it's just a tree-walker
    > feeding a tree-builder.
    >
    > I have to believe the problem resides in something you haven't told us.


    OK, then I have to believe that, too. Furthermore, this is another of the
    bits of my code that have been around for a long time (since 2003 in this
    case), and I'm sure it used to work (but it may only ever have worked with
    Crimson). I have had occasions in the past where I have inadvertently
    depended on bugs in a library, and when that library has been fixed all my
    code broke.

    If this class fails, it returns a text node with a 'flat' representation of
    the embedded markup. Looking at the production server logs I see that it
    has been intermittently failing in this way for some time, but that the
    failure simply has not been noticed. The failure on the production servers
    is different from the failure on the development server; I'll detail that
    difference below. The production servers use Crimson to parse, but Xerces
    to construct documents - I can't remember why, but probably just an
    oversight.

    The class in question is:

    //***********************************************************************\
    // *
    // MaybeParseGenerator.java *
    // *
    // Author: Simon Brooke *
    // Created: 17th January 2003 *
    // $Revision: 1.7.4.3 $; $Date: 2006/09/04 13:45:54 $ *
    // *
    //***********************************************************************/
    package uk.co.weft.domutil;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    import org.xml.sax.InputSource;

    import java.io.StringReader;

    import javax.xml.parsers.DocumentBuilder;

    import uk.co.weft.htform.ResourceConsumerImpl;


    /*
    * $Log: MaybeParseGenerator.java,v $
    * Revision 1.7.4.3 2006/09/04 13:45:54 simon
    * Added more debugging output. Have an intermittent bug in PRES which may
    * originate here.
    *
    * Revision 1.7.4.2 2005/12/30 16:54:00 simon
    * EkitWidget now working remarkably well. Still some tidying up to do.
    *
    * Revision 1.7.4.1 2005/12/23 10:48:33 simon
    * Brute force tidy up after CVS server crash: this time it should work.
    *
    * Revision 1.7 2005/02/05 17:40:17 simon
    * Improved diagnostics on failure
    *
    * Revision 1.6 2004/07/14 12:52:34 simon
    * Final commit for 1.10.0
    *
    * Revision 1.5 2004/06/17 15:10:38 simon
    * Extends ResourceConsumerImpl to gain access to grs, etc
    *
    * Revision 1.4 2003/10/30 12:40:21 simon
    * Added debug flag in domutil classes
    *
    * Revision 1.3 2003/08/20 09:38:35 simon
    * Code cleanup with eclipse; mostly removal of excessive includes
    *
    * Revision 1.2 2003/07/09 09:32:07 simon
    * Initial work on HTML generation of widgets.
    *
    * Revision 1.1 2003/02/06 11:22:26 simon
    * New superclass for node generators which may want to parse XML text.
    */

    /**
    * Abstract superclass for TextNodeGenerator and ElementGenerator, which may
    * want to parse their content. Parsing is potentially expensive, so if
    * you're confident the value won't contain XML markup it may be worth
    * setting allowEmbeddedMarkup( false ).
    *
    * @author Simon Brooke
    * @version $Revision: 1.7.4.3 $ This revision: $Author: simon $
    */
    public abstract class MaybeParseGenerator extends ResourceConsumerImpl
    {
    //~ Instance fields -----------------------------------------------------

    /**
    * whether or not I'm in debug mode; if I am I may print debugging
    * messages to System.err
    */
    protected boolean debug = false;

    /** By default we allow embedded markup in children */
    protected boolean embeddedMarkup = true;

    //~ Constructors --------------------------------------------------------

    /**
    * Creates a new MaybeParseGenerator object.
    */
    public MaybeParseGenerator( )
    {
    // ...nothing...
    }

    //~ Methods -------------------------------------------------------------

    /**
    * whether or not to set debugging mode. If true, the generator _may_
    * write debugging messages to System.err
    *
    * @param debug whether or not to set debugging mode
    *
    * @since Jacquard 1.10
    */
    public void setDebug( boolean debug )
    {
    this.debug = debug;
    }

    /**
    * Do we allow (and parse for) embedded markup within the value of this
    * node? default is we do.
    *
    * @param allow if true, then allow embedded markup within my value
    */
    public void allowEmbeddedMarkup( boolean allow )
    {
    embeddedMarkup = allow;
    }

    /**
    * Construct a node representing this value. It's perfectly possible (and
    * possibly legitimate) that the value of a child should contain embedded
    * markup. If so, try to parse a node out of it.
    *
    * @param doc the document in which the node is to be created
    * @param unparsed the string, possibly with embedded markup, to parse
    *
    * @exception GenerationException if parsing fails
    */
    protected Node maybeParse( Document doc, String unparsed )
    throws GenerationException
    {
    Node val = doc.createTextNode( unparsed ); // safe default

    if ( debug )
    {
    System.err.println( "MaybeParseGenerator.maybeParse: parsing [" +
    unparsed + "]" );
    }

    if ( unparsed != null ) // defensive
    {
    if ( embeddedMarkup && (
    // if we allow embedded markup
    unparsed.indexOf( "<" ) > -1 ) ) // it looks like markup
    {
    if ( !unparsed.trim( ).startsWith( "<" ) )
    {
    // nasty: if it contains markup, but
    // isn't contained in markup, the
    // parser will barf.
    unparsed = "<parsed>" + unparsed + "</parsed>";
    }

    try
    {
    DocumentBuilder parser = DOMStub.getParser( );

    if ( parser == null )
    {
    System.err.println( "Could not initialise XML parser" );
    }

    InputSource i =
    new InputSource( new StringReader( unparsed ) );

    // i.setCharacterStream( new StringReader( unparsed ) );
    Document parsed = parser.parse( i );

    if ( debug )
    {
    System.err.println( "Parsed document: " +
    parsed.toString( ) );

    if ( parsed != null )
    {
    Node root = parsed.getDocumentElement( );

    if ( root != null )
    {
    System.err.println( "Root node: (" +
    root.getClass( ).getName( ) + "): " +
    root.toString( ) );
    }
    }
    }

    val = doc.importNode( parsed, true );

    if ( debug )
    {
    System.err.println(
    "MaybeParseGenerator.maybeParse: parse successful" );
    new Printer( ).print( val, System.err );
    }
    }
    catch ( Exception e )
    {
    System.err.println(
    "MaybeParseGenerator.maybeParse(): Could not parse '" +
    unparsed + "'as XML" );
    e.printStackTrace( System.err );
    }
    }
    }

    return val;
    }
    }

    /* [end of file] */


    What I'm getting in the error stream on the development server is (with
    parser unconfigured, i.e. using Tomcat's default, which is Xerces; see
    below for Crimson):

    ElementGenerator.generate: attempting to parse <div class="Intro">
    Here be dragons!
    </div>
    MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
    Here be dragons!
    </div>]
    Parsed document: [#document: null]
    Root node: (org.apache.xerces.dom.DeferredElementImpl): [div: null]
    MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
    Here be dragons!
    </div>'as XML
    org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not
    support the requested type of object or operation.
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:183)


    (with parser configured as org.apache.crimson.tree.DOMImplementationImpl):

    ElementGenerator.generate: attempting to parse <div class="Intro">
    Here be dragons!
    </div>
    MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
    Here be dragons!
    </div>]
    Parsed document: org.apache.crimson.tree.XmlDocument@e9a0e9a
    Root node: <div class="Intro">
    Here be dragons!
    </div>
    MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
    Here be dragons!
    </div>'as XML
    org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not
    support the requested type of object or operation.
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:173)


    What's showing up in the production server logs is:
    (Firstly, evidence that it sometimes does work):
    ElementGenerator.generate: attempting to parse <div
    class="Introduction"><p>Copies of documentation issued to licensees is
    available in this section.</p></div>
    ElementGenerator.generate: attempting to parse Cockle Bags - further
    information


    (Secondly, evidence that it sometimes doesn't):
    ElementGenerator.generate: attempting to parse <div class="Introduction">
    Ayrshire and Dumfrieshire Cyclists Association is a regional
    association
    of cycling clubs within the structure of Scottish Cycling.
    </div>
    MayberParseGenerator.maybeParse(): Could not parse '<div
    class="Introduction">
    Ayrshire and Dumfrieshire Cyclists Association is a regional
    association
    of cycling clubs within the structure of Scottish Cycling.
    </div>'as XML
    java.lang.NullPointerException
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
    at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:163)

    I've checked the libraries and the two instances above use the same
    versions of the same libraries with the same configuration, so why

    <div class="Introduction"><p>Copies of documentation issued to licensees is
    available in this section.</p></div>

    parses successfully and

    <div class="Introduction">
    Ayrshire and Dumfrieshire Cyclists Association is a regional
    association
    of cycling clubs within the structure of Scottish Cycling.
    </div>

    fails to parse is frankly baffling me.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/
    ;; Let's have a moment of silence for all those Americans who are stuck
    ;; in traffic on their way to the gym to ride the stationary bicycle.
    ;; Rep. Earl Blumenauer (Dem, OR)
    Simon Brooke, Mar 16, 2007
    #3
  4. Just a quick observation: Your "sometimes works" and "sometimes doesn't"
    are significantly different:

    > (Firstly, evidence that it sometimes does work):
    > ElementGenerator.generate: attempting to parse <div
    > class="Introduction"><p>Copies of documentation issued to licensees is
    > available in this section.</p></div>


    <div> has a <p> child.


    > (Secondly, evidence that it sometimes doesn't):
    > ElementGenerator.generate: attempting to parse <div class="Introduction">
    > Ayrshire and Dumfrieshire Cyclists Association is a regional
    > association
    > of cycling clubs within the structure of Scottish Cycling.
    > </div>


    <div> contains only text. Haven't looked at the code yet, but are you
    sure you aren't doing something simple like trying to import the string
    value rather than a TextNode object?

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Mar 16, 2007
    #4
  5. Also: You didn't show us the implementation of DOMStub... but with that
    name, I wouldn't be at all surprised if you've got a subset
    implementation there.

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Mar 17, 2007
    #5
  6. Well, I've reproduced the error message under Eclipse. Lemme see if I
    can reproduce it with a current version of Xerces...



    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Mar 17, 2007
    #6
  7. Oh. That's stupid; I should have remembered this:

    http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#Core-Document-importNode

    You're attempting to import a Document node. That's forbidden. Import
    its root element instead.

    Yes, the error message could have been more helpful. I'd suggest posting
    that as a suggestion on the Xerces users mailing list, since I'm not
    sure any of the current Xerces maintainers are reading this list.
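
    In code, that fix might look roughly like the sketch below. It uses the
    JAXP parser bundled with the JDK rather than the DOMStub helper from the
    original post, and ImportRootExample and importParsed are names invented
    for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class: shows importing the root element, not the Document.
class ImportRootExample {
    /** Parses 'markup' and imports its root element into 'target'. */
    static Node importParsed(Document target, String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document parsed = builder.parse(new InputSource(new StringReader(markup)));
        // importNode rejects Document nodes, so hand it the root element.
        return target.importNode(parsed.getDocumentElement(), true);
    }

    public static void main(String[] args) throws Exception {
        Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node imported = importParsed(target, "<div class=\"Intro\">Here be dragons!</div>");
        System.out.println("Imported <" + imported.getNodeName() + ">, owned by target: "
                + (imported.getOwnerDocument() == target));
    }
}
```

    Note that the imported node has no parent yet; it still has to be
    attached to the target tree with appendChild or similar.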


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Mar 17, 2007
    #7
  8. * Joe Kesselman wrote in comp.text.xml:
    >You're attempting to import a Document node. That's forbidden. Import
    >its root element instead.


    Heh, I actually had a quick look into the Xerces source code when I
    looked at the question, but that case was the only one where the specific
    claimed exception would be raised, and Simon said he tried to import
    element nodes, so I concluded the issue is too weird to investigate
    further...
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
    Bjoern Hoehrmann, Mar 17, 2007
    #8
  9. Simon Brooke

    Simon Brooke Guest

    Joe Kesselman wrote:

    > Oh. That's stupid; I should have remembered this:
    >
    > http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#Core-Document-importNode
    >
    > You're attempting to import a Document node. That's forbidden. Import
    > its root element instead.
    >
    > Yes, the error message could have been more helpful. I'd suggest posting
    > that as a suggestion on the Xerces users mailing list, since I'm not
    > sure any of the current Xerces maintainers are reading this list.


    Thank you. I was going to say indignantly 'oh no I don't', but on reading
    through my code I see I get the root node of the document... and then
    don't use it. Having fixed that, /this/ problem is solved, and I can now
    replace vintage Crimson with current Xerces and my code still works.

    Still can't get it to work with current Xalan, but that's another set of
    problems...

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    ;; Good grief, I can remember when England won the Ashes.
    Simon Brooke, Mar 17, 2007
    #9
  10. Simon Brooke

    Simon Brooke Guest

    Joe Kesselman wrote:

    > Also: You didn't show us the implementation of DOMStub... but with that
    > name, I wouldn't be at all surprised if you've got a subset
    > implementation there.


    No, it just allows me to select and configure the DOMImplementation I use:

    /**
    * Should be called before DOMStub is used, but perfectly safe to call
    * more than once. If I've already been initialised, don't initialise me
    * again.
    *
    * @param config my configuration
    *
    * @exception InitialisationException if requested DOM implementation
    * can't be found
    */
    public static void init( Context config ) throws InitialisationException
    {
    String s = config.getValueAsString( "dom_implementation_class" );

    if ( domImp == null )
    {
    /* i.e., I have not already been initialised */
    try
    {
    if ( s != null )
    {
    domImpName = s;
    }

    domImp =
    (DOMImplementation) Class.forName( domImpName )
    .newInstance( );
    }
    catch ( Exception any )
    {
    throw new InitialisationException( "Could not find DOM " +
    "implementation " + domImpName );
    }
    }

    Boolean b = config.getValueAsBoolean( "dom_coalescing" );

    if ( b != null )
    {
    dbf.setCoalescing( b.booleanValue( ) );
    }

    b = config.getValueAsBoolean( "dom_expand_entity_references" );

    if ( b != null )
    {
    dbf.setExpandEntityReferences( b.booleanValue( ) );
    }

    b = config.getValueAsBoolean( "dom_ignore_comments" );

    if ( b != null )
    {
    dbf.setIgnoringComments( b.booleanValue( ) );
    }

    b = config.getValueAsBoolean( "dom_ignore_whitespace" );

    if ( b != null )
    {
    dbf.setIgnoringElementContentWhitespace( b.booleanValue( ) );
    }

    b = config.getValueAsBoolean( "dom_namespace_aware" );

    if ( b != null )
    {
    dbf.setNamespaceAware( b.booleanValue( ) );
    }

    b = config.getValueAsBoolean( "dom_validating" );

    if ( b != null )
    {
    dbf.setValidating( b.booleanValue( ) );
    }
    }
    }


    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    X-no-archive: No, I'm not *that* naive.
    Simon Brooke, Mar 17, 2007
    #10
  11. Bjoern Hoehrmann wrote:
    > Simon said he tried to import element nodes, so I concluded the issue is too weird to
    > investigate further...


    This is why it's often helpful to post a minimal example that provokes
    the problem. In fact, the process of extracting code and writing that
    reduced example is often enough to expose the problem.

    I must admit I cheated -- I tossed the code into a debugger, did some
    cleanup so it could actually be run, added the Xerces source (so I could
    see what was happening inside that), set the classpaths to use this copy
    of Xerces rather than the one in the Java libraries, set it to stop when
    a DOMException was about to be thrown, and just pushed the button.
    Bingo; there we are at the error, and the object in question is indeed a
    Document.
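
    A reduced example along those lines might look like the following sketch.
    It assumes the JAXP parser that ships with the JDK (itself derived from
    Xerces); ImportDocumentRepro and importWholeDocument are invented names,
    not anything from the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class: smallest program that provokes the reported error.
class ImportDocumentRepro {
    /**
     * Tries to import a whole Document node into another document.
     * Returns the DOMException code, or -1 if no exception was thrown.
     */
    static int importWholeDocument() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(
                new InputSource(new StringReader("<div>Here be dragons!</div>")));
        Document target = builder.newDocument();
        try {
            target.importNode(source, true); // passing the Document itself
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) throws Exception {
        int code = importWholeDocument();
        System.out.println("DOMException code: " + code
                + " (NOT_SUPPORTED_ERR is " + DOMException.NOT_SUPPORTED_ERR + ")");
    }
}
```

    With nothing else in the way, the only candidate for the failure is the
    argument passed to importNode.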


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Mar 17, 2007
    #11