DOM2 API (Java): how to get namespace declarations?

Discussion in 'XML' started by Simon Brooke, Feb 11, 2006.

  1. Simon Brooke


    I was debugging a new XML generator tonight and trying to determine why
    it wasn't working; and realised my dom printer does not output XML
    namespace declarations.

    My method to output an Element is as follows:

    /**
     * Print an element node and, by recursive descent, its children.
     *
     * @param node the node to print
     * @param out the stream to print it on
     * @param url the base URL to use in expanding relative URLs
     * @param level the indentation level if pretty printing
     */
    protected void print( Element node, PrintStream out, URL url,
        int level )
        throws IOException
    {
        indent( out, level );
        out.print( '<' );

        String tagname = node.getNodeName( );
        out.print( tagname );

        NamedNodeMap attrs = node.getAttributes( );
        NodeList children = node.getChildNodes( );

        /* Get the attributes of the node and print their values. */
        for ( int i = 0; i < attrs.getLength( ); i++ )
        {
            print( ( (Attr) attrs.item( i ) ), out, url, level + 1 );
        }

        if ( ( children != null ) && ( children.getLength( ) > 0 ) )
        { // it's a non-empty tag
            out.print( '>' );

            int len = children.getLength( );

            for ( int i = 0; i < len; i++ )
            {
                print( children.item( i ), out, url, level + 1 );
            }

            /* Set the end tag. */
            indent( out, level );
            out.print( '<' );
            out.print( '/' );
            out.print( tagname );
        }
        else // it's an empty tag
        {
            out.print( " /" );
        }

        out.print( '>' );
    }

    Performing the exact same XSL transform, the Xerces printer emits:

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF version="1.0"
    xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:geourl="http://geourl.org/rss/module/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rss version="0.91">
    ...

    whereas my printer emits:

    <rdf:RDF version="1.0">
    <rss version="0.91">
    ...

    The relevant part of the XSL file reads:

    <xsl:template match="category">
    <rdf:RDF version="1.0"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:geourl="http://geourl.org/rss/module/"
    xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
    <rss version="0.91">
    ...

    Clearly what Xerces is emitting is right and what I am emitting is wrong,
    but I'm having trouble seeing what I'm doing wrong. My method to output
    an attribute node is as follows:

    /**
     * Print an attribute node. If url is not null, use it as a base URL
     * for expanding URL values.
     *
     * @param node the node to print
     * @param out the stream to print it on
     * @param url the base URL to use in expanding relative URLs
     * @param level the indentation level if pretty printing
     */
    protected void print( Attr node, PrintStream out, URL url,
        int level )
        throws IOException
    {
        String delimiter = "\"";
        String value = node.getNodeValue( );

        if ( value != null )
        {
            /* As I understand it, you aren't allowed unvalued
             * attributes in XML
             */
            value = cleanString( value, true );
            /* are attribute values allowed to contain *any*
             * characters? */

            /* if an attribute has double quotes in its value, we'll use
             * single quotes as the delimiter and vice versa. If it has
             * both we're stuffed. */
            if ( value.indexOf( delimiter ) > -1 )
            {
                delimiter = "'";
            }

            indent( out, level );
            out.print( " " );
            out.print( node.getNodeName( ) );
            out.print( "=" );
            out.print( delimiter );

            /* If this is an attribute whose value
             * should be a URL. */
            if ( ( node.getNodeName( ).equalsIgnoreCase( "href" ) ||
                   node.getNodeName( ).equalsIgnoreCase( "link" ) ||
                   node.getNodeName( ).equalsIgnoreCase( "src" ) ) &&
                 ( url != null ) )
            {
                /* Change the partial URL to a full URL. */
                try
                {
                    String fullURL = new URL( url, value ).toString( );

                    out.print( fullURL );
                }
                catch ( MalformedURLException m )
                {
                    // log, then fall back to the unexpanded value so
                    // the attribute is not written out empty
                    m.printStackTrace();
                    out.print( value );
                }
            }
            else
            { /* If I've got a value, clean it and
               * print it. */
                out.print( value );
            }

            out.print( delimiter );
        }
        else
        {
            System.err.println( "Unvalued attribute: " +
                node.getNodeName( ));
        }
    }

    Neither the MalformedURLException nor the string 'Unvalued attribute'
    ever appears in the log. From this it seems that neither
    Node.getAttributes() nor Node.getChildNodes() returns the namespace
    declarations. Yet I can't see any other no-args get...() method in the
    API. Reading through the Xerces XMLSerializer code makes it seem that
    they are finding the namespace declarations among the attributes.

    Can anyone see what I'm doing wrong? I appreciate it's probably some basic
    howler, but I just can't see it myself.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    Hobbit ringleader gives Sauron One in the Eye.
    Simon Brooke, Feb 11, 2006
    #1

  2. Simon Brooke wrote:
    > I was debugging a new XML generator tonight and trying to determine why
    > it wasn't working; and realised my dom printer does not output XML
    > namespace declarations.


    XML namespace declarations are optional in the DOM, since every node
    carries its namespace and bindings can be reconstructed when you
    serialize the DOM's contents as XML. The flipside is that it is the
    serializer's responsibility to check that the necessary declarations are
    present as Attribute nodes, and/or to synthesize those declarations.
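A minimal sketch of the situation described above, assuming the JDK's bundled JAXP parser: an element created through the namespace-aware DOM API knows its namespace URI and prefix, yet carries no xmlns Attr node unless someone adds one. The class and method names are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceWithoutDeclaration {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Builds an rdf:RDF root element without adding any xmlns attribute. */
    static Element buildRdfRoot() {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElementNS(RDF_NS, "rdf:RDF");
        root.setAttribute("version", "1.0");
        doc.appendChild(root);
        return root;
    }

    public static void main(String[] args) {
        Element root = buildRdfRoot();
        // The node itself carries the namespace information...
        System.out.println("namespace URI: " + root.getNamespaceURI());
        System.out.println("prefix: " + root.getPrefix());
        // ...but getAttributes() only holds the ordinary "version" attribute.
        System.out.println("has xmlns:rdf attribute: " + root.hasAttribute("xmlns:rdf"));
        System.out.println("attribute count: " + root.getAttributes().getLength());
    }
}
```

A printer that only walks getAttributes() would therefore emit this element with no declaration at all.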

    The DOM Level 3 spec should have a fairly detailed description of one
    algorithm for doing that check and fixup. (I drafted the first version
    of that logic, though I think it's been tweaked a bit since then.) I'd
    suggest reading that before implementing your own DOM-printer.
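The following is a much-simplified sketch of that kind of check and fixup, not the algorithm from the DOM Level 3 spec itself. It assumes a serializer that keeps a prefix-to-URI map of what is in scope and synthesizes a declaration whenever an element or prefixed attribute is not already bound correctly. It does not invent new prefixes when bindings conflict, and it ignores comments, CDATA sections and processing instructions. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class NamespaceFixupSketch {

    /** Serializes el and its subtree with only the built-in xml prefix in scope. */
    static String serialize(Element el) {
        Map<String, String> scope = new HashMap<>();
        scope.put("xml", "http://www.w3.org/XML/1998/namespace");
        StringBuilder out = new StringBuilder();
        write(el, scope, out);
        return out.toString();
    }

    private static boolean isDeclaration(String name) {
        return name.equals("xmlns") || name.startsWith("xmlns:");
    }

    private static void write(Element el, Map<String, String> inherited, StringBuilder out) {
        Map<String, String> scope = new HashMap<>(inherited); // prefix -> URI visible here
        Map<String, String> decls = new LinkedHashMap<>();    // declarations to print here
        StringBuilder plain = new StringBuilder();            // ordinary attributes
        NamedNodeMap attrs = el.getAttributes();

        // 1. Declarations already present as Attr nodes take effect first.
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            String name = a.getNodeName();
            if (isDeclaration(name)) {
                scope.put(name.equals("xmlns") ? "" : name.substring(6), a.getValue());
                decls.put(name, a.getValue());
            }
        }
        // 2. The element's own prefix must resolve to its own namespace URI.
        bind(el.getPrefix(), el.getNamespaceURI(), scope, decls);
        // 3. So must the prefix of every prefixed attribute.
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (isDeclaration(a.getNodeName())) continue;
            if (a.getPrefix() != null && a.getNamespaceURI() != null) {
                bind(a.getPrefix(), a.getNamespaceURI(), scope, decls);
            }
            appendAttr(plain, a.getNodeName(), a.getValue());
        }

        out.append('<').append(el.getNodeName());
        decls.forEach((name, value) -> appendAttr(out, name, value));
        out.append(plain);
        if (!el.hasChildNodes()) {
            out.append("/>");
            return;
        }
        out.append('>');
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) write((Element) c, scope, out);
            else if (c.getNodeType() == Node.TEXT_NODE) out.append(escape(c.getNodeValue()));
        }
        out.append("</").append(el.getNodeName()).append('>');
    }

    /** Adds a declaration if prefix is not currently bound to uri. */
    private static void bind(String prefix, String uri,
                             Map<String, String> scope, Map<String, String> decls) {
        String key = prefix == null ? "" : prefix;
        String wanted = uri == null ? "" : uri;
        if (!wanted.equals(scope.getOrDefault(key, ""))) {
            scope.put(key, wanted);
            decls.put(key.isEmpty() ? "xmlns" : "xmlns:" + key, wanted);
        }
    }

    private static void appendAttr(StringBuilder sb, String name, String value) {
        sb.append(' ').append(name).append("=\"").append(escape(value)).append('"');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Builds a small tree with prefixed elements and no xmlns attributes. */
    static Element sampleTree() {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElementNS(null, "a");
        Element b = doc.createElementNS(null, "b");
        b.appendChild(doc.createElementNS("urn:example:foo", "foo:c"));
        b.appendChild(doc.createElementNS("urn:example:foo", "foo:d"));
        a.appendChild(b);
        a.appendChild(doc.createElementNS("urn:example:bar", "bar:e"));
        doc.appendChild(a);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleTree()));
    }
}
```

Because each element works on its own copy of the scope map, a declaration synthesized on one sibling is not visible to the next, so siblings that share a prefix each get their own declaration.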

    Alternatively, you can insist that whoever constructs your DOM take
    responsibility for making sure that all the necessary Attribute nodes
    exist to declare the namespaces. (Note that they have to be in the
    correct namespace themselves...). But it's probably better not to count
    on that unless you have full control of both sides of the system.
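As a sketch of what "in the correct namespace themselves" means in the Java DOM API: a declaration attribute is created with setAttributeNS in the reserved namespace http://www.w3.org/2000/xmlns/, which the JDK exposes as XMLConstants.XMLNS_ATTRIBUTE_NS_URI. The helper names below are made up for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DeclareNamespace {

    /** Adds xmlns:prefix="uri" to element, or xmlns="uri" when prefix is null or empty. */
    static void declare(Element element, String prefix, String uri) {
        String qualifiedName = (prefix == null || prefix.isEmpty())
                ? XMLConstants.XMLNS_ATTRIBUTE                 // "xmlns"
                : XMLConstants.XMLNS_ATTRIBUTE + ":" + prefix; // "xmlns:prefix"
        element.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, qualifiedName, uri);
    }

    /** Creates a new document whose root is the given namespaced element. */
    static Element newRoot(String namespaceUri, String qualifiedName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(namespaceUri, qualifiedName);
            doc.appendChild(root);
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
        Element root = newRoot(rdf, "rdf:RDF");
        declare(root, "rdf", rdf);
        // The declaration is now an ordinary member of getAttributes().
        System.out.println(root.getAttributes().item(0).getNodeName());
    }
}
```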

    Note that most DOM implementations these days ship with serializers that
    know how to do the right things, so unless you're creating your own DOM
    or have unusual formatting requirements it might be simpler to just use
    those rather than reimplementing that code. (And of course DOM Level 3
    proposes a standard API for that function.)
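For comparison, a minimal sketch of calling the DOM Level 3 Load and Save API from Java (the org.w3c.dom.ls package). It assumes the DOM implementation supports the "LS" feature, as the JDK's bundled one does; the exact layout of the output depends on the implementation, so none is shown here. The class name and sample document are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class Dom3SerializerDemo {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Builds a document whose root is namespaced but has no xmlns attribute. */
    static Document buildDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(RDF_NS, "rdf:RDF");
            root.setAttribute("version", "1.0");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the document with the implementation's own LSSerializer. */
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        if (ls == null) {
            throw new IllegalStateException("DOM implementation lacks Load and Save support");
        }
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildDocument()));
    }
}
```

With the serializer's default configuration the missing xmlns:rdf declaration should be synthesized on output, which is the fixup behaviour described above.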

    But doing a recursive-descent DOM printer _is_ a good learning exercise,
    so it's probably something you should write at least once. Among other
    things, the same tree-walking logic is useful for many other kinds of
    DOM processing.
    Joe Kesselman, Feb 11, 2006
    #2

  3. Simon Brooke


    Joe Kesselman wrote:

    > Simon Brooke wrote:
    >> I was debugging a new XML generator tonight and trying to determine
    >> why it wasn't working; and realised my dom printer does not output XML
    >> namespace declarations.

    >
    > XML namespace declarations are optional in the DOM, since every node
    > carries its namespace and bindings can be reconstructed when you
    > serialize the DOM's contents as XML. The flipside is that it is the
    > serializer's responsibility to check that the necessary declarations
    > are present as Attribute nodes, and/or to synthesize those
    > declarations.


    Thanks very much!

    > The DOM Level 3 spec should have a fairly detailed description of one
    > algorithm for doing that check and fixup. (I drafted the first version
    > of that logic, though I think it's been tweaked a bit since then.) I'd
    > suggest reading that before implementing your own DOM-printer.


    OK, got it.
    <URL:http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithms.html>

    > Note that most DOM implementations these days ship with serializers
    > that know how to do the right things, so unless you're creating your
    > own DOM or have unusual formatting requirements it might be simpler to
    > just use those rather than reimplementing that code. (And of course DOM
    > Level 3 proposes a standard API for that function.)


    Yup. The thing is I wrote my printer back in February 2000 when there
    weren't a lot of others around - which makes it surprising that its
    failure to do the right things with namespaces hasn't tripped me up
    before. It would probably be more economic now to just make a call to
    the DOM3 serialiser API, but as a matter of craftsmanship I'd like to
    get mine right.

    OK, so: we look at a node and see if it needs a namespace, and if it does
    we generate a namespace declaration. Suppose we have a structure

    <a>
      <b>
        <foo:c/>
        <foo:d/>
      </b>
      <bar:e/>
    </a>

    am I right in thinking that it would be correct to attach the 'foo'
    namespace declaration at any of nodes c /and/ d, or at node b, or at
    node a, and the 'bar' namespace declaration at either node e or node a?

    Clearly not duplicating the declaration makes the job of the parser
    easier. Is there any good reason not to pre-scan the tree and collect all
    of the namespaces used and declare them on the root element of the
    document? Looking at the 'algorithms' page it seems that unless two
    elements use the same prefix to indicate different namespaces, there
    should be no problem in 'shuffling' namespace declarations as high up the
    tree as possible.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    ;; When all else fails, read the distractions.
    Simon Brooke, Feb 11, 2006
    #3
  4. * Simon Brooke wrote in comp.text.xml:
    >OK, so: we look at a node and see if it needs a namespace, and if it does
    >we generate a namespace declaration. Suppose we have a structure
    >
    ><a>
    >  <b>
    >    <foo:c/>
    >    <foo:d/>
    >  </b>
    >  <bar:e/>
    ></a>
    >
    >am I right in thinking that it would be correct to attach the 'foo'
    >namespace declaration at any of nodes c /and/ d, or at node b, or at
    >node a, and the 'bar' namespace declaration at either node e or node a?


    xmlns:foo must be in scope for c and d; adding it there would do the
    job, as would adding it to one of the ancestors. Adding it to all of
    a, b, c and d would also be possible, for example, but would be redundant.
    Note that 'foo' might map to different namespace names on different
    elements, e.g.

    <x>
    <y:z xmlns:y='foo' />
    <y:z xmlns:y='bar' />
    </x>

    would also be possible and there might be content that depends on the
    prefixes (e.g., XPath expressions in a XSLT document), so if you have

    <x some-qname-attribute='y:z' xmlns:y='foo'>
    <y:example />
    </x>

    mapping that to

    <x some-qname-attribute='y:z'>
    <y:example xmlns:y='foo' />
    </x>

    might be a bad idea.
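A small sketch of why that matters, assuming the attribute value 'y:z' is meant to be resolved against the declarations in scope on x. It uses Node.lookupNamespaceURI from DOM Level 3 with a namespace-aware JAXP parser; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PrefixScopeDemo {

    /** Parses xml with a namespace-aware parser and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Resolves prefix as seen from the root element; null means it is not in scope. */
    static String resolveOnRoot(String xml, String prefix) {
        return parseRoot(xml).lookupNamespaceURI(prefix);
    }

    public static void main(String[] args) {
        String original =
                "<x some-qname-attribute='y:z' xmlns:y='foo'><y:example /></x>";
        String moved =
                "<x some-qname-attribute='y:z'><y:example xmlns:y='foo' /></x>";
        // In the original the prefix used inside the attribute value resolves on x.
        System.out.println("original: " + resolveOnRoot(original, "y"));
        // After moving the declaration down, the same prefix is unbound on x.
        System.out.println("moved: " + resolveOnRoot(moved, "y"));
    }
}
```

Both documents are namespace-well-formed and the elements keep the same names, which is why a serializer that looks only at element and attribute names cannot detect the breakage.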

    >Clearly not duplicating the declaration makes the job of the parser
    >easier. Is there any good reason not to pre-scan the tree and collect all
    >of the namespaces used and declare them on the root element of the
    >document? Looking at the 'algorithms' page it seems that unless two
    >elements use the same prefix to indicate different namespaces, there
    >should be no problem in 'shuffling' namespace declarations as high up the
    >tree as possible.


    This is true in general, but it would turn a probably incorrect document
    like

    <x some-qname-attribute='y:z'>
    <y:example xmlns:y='foo' />
    </x>

    into a correct document, which might not be intended. Of course, QNames
    in content might not be a concern for your application.
    --
    Björn Höhrmann · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
    Bjoern Hoehrmann, Feb 11, 2006
    #4
  5. Simon Brooke


    Bjoern Hoehrmann wrote:

    > * Simon Brooke wrote in comp.text.xml:
    >>OK, so: we look at a node and see if it needs a namespace, and if it
    >>does we generate a namespace declaration. Suppose we have a structure
    >>
    >><a>
    >>  <b>
    >>    <foo:c/>
    >>    <foo:d/>
    >>  </b>
    >>  <bar:e/>
    >></a>
    >>
    >>am I right in thinking that it would be correct to attach the 'foo'
    >>namespace declaration at any of nodes c /and/ d, or at node b, or at
    >>node a, and the 'bar' namespace declaration at either node e or node a?

    >
    > xmlns:foo must be in scope of c and d, adding them there would do the
    > job, as well as adding them to one of the ancestors. Adding them to
    > a,b,c,d would also be possible, for example, but probably be redundant.
    > Note that 'foo' might map to different namespace names on different
    > elements, e.g.
    >
    > <x>
    > <y:z xmlns:y='foo' />
    > <y:z xmlns:y='bar' />
    > </x>
    >
    > would also be possible and there might be content that depends on the
    > prefixes (e.g., XPath expressions in a XSLT document), so if you have
    >
    > <x some-qname-attribute='y:z' xmlns:y='foo'>
    > <y:example />
    > </x>
    >
    > mapping that to
    >
    > <x some-qname-attribute='y:z'>
    > <y:example xmlns:y='foo' />
    > </x>
    >
    > might be a bad idea.
    >
    >>Clearly not duplicating the declaration makes the job of the parser
    >>easier. Is there any good reason not to pre-scan the tree and collect
    >>all of the namespaces used and declare them on the root element of the
    >>document? Looking at the 'algorithms' page it seems that unless two
    >>elements use the same prefix to indicate different namespaces, there
    >>should be no problem in 'shuffling' namespace declarations as high up
    >>the tree as possible.

    >
    > This is true in general, but it would turn a probably incorrect
    > document like
    >
    > <x some-qname-attribute='y:z'>
    > <y:example xmlns:y='foo' />
    > </x>
    >
    > into a correct document, which might not be intended. Of course, QNames
    > in content might not be a concern for your application.


    OK, my algorithm at this stage is as follows:

    if ( responsibleForNamespaceDeclarations )
    {
        try
        {
            spaces = recursivelyCollectNamespaces( node );

            Enumeration keys = spaces.keys( );

            while ( keys.hasMoreElements( ) )
            {
                String key = keys.nextElement( ).toString( );
                printNS( key, spaces.get( key ).toString( ), out,
                    level + 1 );
            }

            responsibleForNamespaceDeclarations = false;
        }
        catch ( NamespaceCollisionException e )
        {
            String uri = node.getNamespaceURI( );
            String prefix = node.getPrefix( );

            if ( ( uri != null ) && ( prefix != null ) )
            {
                printNS( prefix, uri, out, level + 1);
            }

            System.err.println( "Namespace clash: " + e.getMessage( ) );
        }
    }
    ...
    for ( int i = 0; i < children.getLength( ); i++ )
    {
        print( children.item( i ), out, level + 1,
            responsibleForNamespaceDeclarations );
    }

    That is to say, when printing an element node, I do recursive descent to
    collect all the namespaces down tree from it. If there is a collision,
    then if I have a local namespace to deal with, I deal with that locally,
    and leave responsibility for printing namespaces set for the child
    nodes. If there is no collision, then I deal with all the down-tree
    namespaces and clear the responsibleForNamespaceDeclarations flag.
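The collection step is not shown in the post, so the following is only a guess at what a recursivelyCollectNamespaces-style helper could look like. The class, the unchecked NamespaceCollisionException and the use of a Map instead of a Hashtable are all assumptions. One design choice worth noting: unprefixed elements in no namespace are recorded as using the empty default namespace, so hoisting a default namespace declaration above them is reported as a collision rather than silently changing their names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamespaceCollector {

    /** Hypothetical exception: one prefix is used for two different namespaces. */
    static class NamespaceCollisionException extends RuntimeException {
        NamespaceCollisionException(String message) { super(message); }
    }

    /** Returns prefix -> URI for the subtree; the key "" stands for the default namespace. */
    static Map<String, String> collect(Element root) {
        Map<String, String> found = new LinkedHashMap<>();
        collectInto(root, found);
        found.remove("", ""); // "no default namespace" needs no declaration
        return found;
    }

    private static void collectInto(Element el, Map<String, String> found) {
        // Elements always count, even when they are in no namespace at all.
        record(el.getPrefix(), el.getNamespaceURI(), found);
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            String prefix = a.getPrefix();
            // Skip unprefixed attributes, existing declarations and the built-in xml prefix.
            if (prefix == null || prefix.equals("xml")
                    || XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                continue;
            }
            record(prefix, a.getNamespaceURI(), found);
        }
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) collectInto((Element) c, found);
        }
    }

    private static void record(String prefix, String uri, Map<String, String> found) {
        String key = prefix == null ? "" : prefix;
        String value = uri == null ? "" : uri;
        String previous = found.putIfAbsent(key, value);
        if (previous != null && !previous.equals(value)) {
            throw new NamespaceCollisionException(
                    "prefix '" + key + "' maps to both '" + previous + "' and '" + value + "'");
        }
    }

    /** Parses xml with a namespace-aware parser and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<a><b><foo:c xmlns:foo='urn:example:foo'/></b>"
                + "<bar:e xmlns:bar='urn:example:bar'/></a>");
        System.out.println(collect(root));
    }
}
```

This sketch does not look at prefixes that appear inside attribute values or text, so the QName-in-content caveat raised earlier in the thread still applies.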

    Can anyone see problems with this? And what do I do about the default
    namespace? Will the default namespace have getNamespaceURI() non-null
    and getPrefix() null?

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    The Conservative Party is now dead. The corpse may still be
    twitching, but resurrection is not an option - unless Satan
    chucks them out of Hell as too objectionable even for him.
    Simon Brooke, Feb 11, 2006
    #5
  6. Bjoern Hoehrmann, Feb 11, 2006
    #6
  7. Simon Brooke wrote:
    > Will the default namespace have getNamespaceURI() non-null
    > and getPrefix() null?


    Yes.
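A quick way to check this for yourself, assuming a namespace-aware JAXP parser (without setNamespaceAware(true) the parser does no namespace processing, so getNamespaceURI() comes back null). The class name and sample document are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DefaultNamespaceDemo {

    /** Parses xml with a namespace-aware parser and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<a xmlns='urn:example:default'><b/></a>");
        Element child = (Element) root.getFirstChild();
        // The child inherits the default namespace but has no prefix.
        System.out.println("namespace URI: " + child.getNamespaceURI());
        System.out.println("prefix: " + child.getPrefix());
        System.out.println("local name: " + child.getLocalName());
    }
}
```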

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Feb 11, 2006
    #7
  8. Simon Brooke


    Joe Kesselman wrote:

    > Simon Brooke wrote:
    >> Will the default namespace have getNamespaceURI() non-null
    >> and getPrefix() null?

    >
    > Yes.


    Thanks.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    ;; no eternal reward will forgive us now for wasting the dawn.
    ;; Jim Morrison
    Simon Brooke, Feb 11, 2006
    #8
  9. Simon Brooke


    Joe Kesselman wrote:

    > Simon Brooke wrote:
    >> Will the default namespace have getNamespaceURI() non-null
    >> and getPrefix() null?

    >
    > Yes.


    Thanks.

    [did I reply to this already?]

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/
    Iraq war: it's time for regime change...
    ... go now, Tony, while you can still go with dignity.
    [update 18 months after this .sig was written: it's still relevant]
    Simon Brooke, Feb 11, 2006
    #9
