Can someone confirm my DTD & namespace solution?

D McGilvray

Hi, I've been researching for a while to understand namespaces to find a
solution to my problem. I now have a solution which works in my test
examples, but before I roll it out through my software, I was hoping
someone might tell me if I've fallen into any traps.

Particularly, I would like to know what type of parser I am restricted
to using - will it work for any fully conforming validating parser? Or
are there certain levels of conformity to the specification?


I want to include another XML file within my own. The other DTD has no
support for namespaces. I want to avoid naming conflicts, so I need
namespaces. However, I also require validation because I'll be using
ID/IDREF references extensively.

Here is an example XML file. inc represents the included structure. It
and its children are not bound to any namespace. The rest of the
elements are part of my own structure, and belong to the namespace ns.


<ns:doc xmlns:ns="http://dougie/test/">
<ns:a/>
<inc/>
</ns:doc>

This is the rather complex DTD for the included structure contained in
'inc.dtd':

<!ELEMENT inc ANY >


Below is the DTD for my document, which contains inc. The other DTD is
included, which defines its structure. The parameter entity nsp defines
the prefix for the namespace so that it can be overridden. However, when
this entity is expanded, it is surrounded by white space unless it is
expanded within another parameter entity. So any time I want to use the
prefix with a tagname or attribute I have to combine the text within
another param entity. Hence, I have to create the doc entity and
substitute that for the element tag name (rather than writing %nsp;:doc
straight into the ELEMENT definition). And the same for 'a' and the
namespace declaration within the doc element.
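
The padding rule described above can be illustrated with a toy simulation. This is only a sketch, not a DTD parser: the two helper functions are hypothetical names invented for the example, and they model just the one rule that a parameter entity referenced directly in DTD markup is padded with a leading and trailing space, while one referenced inside an entity value is not.

```python
import re

# Matches a parameter entity reference such as %nsp;
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")

def expand_in_literal(value, entities):
    """Expand %name; inside an entity value literal: no padding is added."""
    return PE_REF.sub(lambda m: entities[m.group(1)], value)

def expand_in_dtd(text, entities):
    """Expand %name; directly in DTD markup: replacement is space-padded."""
    return PE_REF.sub(lambda m: " " + entities[m.group(1)] + " ", text)

entities = {"nsp": "ns"}
# Declaring the combined name as its own entity joins the pieces first.
entities["doc"] = expand_in_literal("%nsp;:doc", entities)

# Writing the prefix straight into the declaration splits the name in two.
direct = expand_in_dtd("<!ELEMENT %nsp;:doc ANY >", entities)
# Going through the intermediate entity keeps the name in one piece.
via_entity = expand_in_dtd("<!ELEMENT %doc; ANY >", entities)

print(direct.split())
print(via_entity.split())
```

In the first result the prefix and ':doc' end up as separate tokens, which is why the intermediate doc and a entities are needed.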


<!-- Include other DTD -->
<!ENTITY % incdtd SYSTEM 'inc.dtd' > %incdtd;

<!-- This defines the prefix for the namespace -->
<!ENTITY % nsp 'ns' >

<!-- This combines prefix with namespace attribute name -->
<!ENTITY % nspdec 'xmlns:%nsp;' >

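<!-- These combine the prefix with the tag names. Both are declared
     before use so the content model below can refer to %a; -->
<!ENTITY % doc '%nsp;:doc'>
<!ENTITY % a '%nsp;:a'>

<!-- Use %a; rather than a hard-coded ns:a, so that overriding nsp
     changes the content model as well -->
<!ELEMENT %doc; ( %a; , inc ) >
<!ATTLIST %doc;
%nspdec; CDATA #REQUIRED >

<!ELEMENT %a; ANY >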


Specifying my namespace prefix in the entity nsp means that the
namespace prefix for my document can be overridden in another DTD like so:

<!ENTITY % nsp 'ns' >

<!ENTITY % otherdtd SYSTEM 'test.dtd' > %otherdtd;


Does this all look kosher? It works in my test example, but I don't have
enough experience to say that this complexity is worthwhile.

Many thanks for looking,
Dougie
 
Joseph Kesselman

DTDs predate namespaces, and are unaware of them... so if you're trying
to use this combination, you need to write your instance documents to
use EXACTLY the prefixes (or lack thereof) called for by the DTD, and to
declare namespace bindings only where the DTD says they can be declared.

D said:
I want to include another XML file within my own. The other DTD has no
support for namespaces. I want to avoid naming conflicts, so I need
namespaces.

The DTDs can and will conflict if they attempt to declare the same
element or attribute names. The only way to avoid that is to have at
least one of them use a prefix, which requires designing the DTD to
expect that prefix.

The kluge of using parameter entities to control which prefix a DTD is
using is ugly, but does work as long as you are careful to make sure the
instance document uses that prefix and only that prefix when referring
to that namespace. And, yes, you need to do a bit of ugly magic to keep
whitespace from being introduced next to the parameter entity, as you've
discovered.

D said:
Does this all look kosher? It works in my test example, but I don't have
enough experience to say that this complexity is worthwhile.

Yes, it works. But I would suggest that forcing DTDs to deal with
namespaces is not particularly worthwhile these days. XML Schemas are
fully namespace-aware, and will handle all of this without requiring so
much magic or being so fragile.
 
D McGilvray

Joseph said:
DTDs predate namespaces, and are unaware of them... so if you're trying
to use this combination, you need to write your instance documents to
use EXACTLY the prefixes (or lack thereof) called for by the DTD, and to
declare namespace bindings only where the DTD says they can be declared.

Ahhhh, a penny just dropped - because parsers are unaware of namespaces,
they won't recognise default namespaces? So the namespace will have to
be explicitly specified for every element which requires it? I actually
hadn't thought of that, but I can live with it.

Joseph said:
Yes, it works. But I would suggest that forcing DTDs to deal with
namespaces is not particularly worthwhile these days. XML Schemas are
fully namespace-aware, and will handle all of this without requiring so
much magic or being so fragile.

Unfortunately, I have no control over the documents which will be
included and, from the sounds of it, they have no intention of switching
to Schemas any time soon. It is quite important that the included
document remains faithful to its own DTD as an independent document, and
valid as an external entity included within my document. Perhaps later
I'll look to see if there's a way to combine DTDs and Schemas, but I'm
happy to use this for now.

Thanks for your help,
Dougie
 
Joseph Kesselman

D said:
Ahhhh, a penny just dropped - because parsers are unaware of namespaces,
they won't recognise default namespaces?

That's not what I said.

Modern parsers are aware of namespaces (or can be told to be so), and
will do the right things with them as far as reading and processing the
document's contents, including inheriting namespace bindings and
applying the default namespace (when one has been asserted). However,
"the right things" doesn't suffice for DTD validation.

DTDs are *NOT* aware of namespaces. As far as the DTD is concerned, a
namespace declaration is just an attribute, and a prefix is just part of
the name of the element or attribute. DTDs will not correctly handle the
case where the name isn't exactly what the DTD says it has to be, so
they won't handle cases where a different prefix was used (or, as is not
uncommon, several prefixes were used with the intent that they refer to
the same namespace).
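
As a rough illustration of that distinction, the sketch below uses Python's standard xml.dom.minidom, which is namespace-aware but does not validate. The three sample documents and the DTD_NAME constant are made up for the example. The point is only that a namespace-aware parser reports the same expanded name for all three spellings, while a DTD written for the 'ns' prefix compares the literal qualified name.

```python
from xml.dom import minidom

NS = "http://dougie/test/"

# Three spellings of the same namespaced element (hypothetical samples).
documents = {
    "prefix ns": f'<ns:doc xmlns:ns="{NS}"/>',
    "prefix other": f'<other:doc xmlns:other="{NS}"/>',
    "default namespace": f'<doc xmlns="{NS}"/>',
}

# The literal name a DTD written for the 'ns' prefix would declare.
DTD_NAME = "ns:doc"

results = {}
for label, text in documents.items():
    root = minidom.parseString(text).documentElement
    results[label] = {
        # What the DTD sees: the name exactly as written in the document.
        "qualified_name": root.tagName,
        # What namespace-aware processing sees: (URI, local name).
        "expanded_name": (root.namespaceURI, root.localName),
        # A plain string comparison, standing in for the DTD's view.
        "matches_dtd_name": root.tagName == DTD_NAME,
    }

for label, info in results.items():
    print(label, info)
```

Only the document that spells the name exactly as the DTD does passes the string comparison, even though all three mean the same thing to a namespace-aware application.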

D said:
Unfortunately, I have no control over the documents which will be
included and, from the sounds of it, they have no intention of switching
to Schemas any time soon.

If you really insist on doing DTD validation and namespaces at the same
time, using parameter entities will at least let you explicitly avoid
the cases where they decide to use the same prefix you wanted to use,
and still get correct behavior from both the DTD and the document.

But that is going to force you to deal with that explicit avoidance,
which strikes me as excessively fragile. So I'd still be inclined to ask
them to reconsider this, or to let you process their data as
well-formed rather than trying to do combined DTD validation on the
composite document.
 
D McGilvray

Joseph said:
That's not what I said.

Modern parsers are aware of namespaces (or can be told to be so), and
will do the right things with them as far as reading and processing the
document's contents, including inheriting namespace bindings and
applying the default namespace (when one has been asserted). However,
"the right things" doesn't suffice for DTD validation.

Sorry, I explained myself terribly there. I meant to refer to
validation, not parsing. Declaring a default namespace in my document
(say in a sibling of the included document) doesn't allow me to leave
out the prefix for children of that node.

Joseph said:
DTDs are *NOT* aware of namespaces. As far as the DTD is concerned, a
namespace declaration is just an attribute, and a prefix is just part of
the name of the element or attribute. DTDs will not correctly handle the
case where the name isn't exactly what the DTD says it has to be, so
they won't handle cases where a different prefix was used (or, as is not
uncommon, several prefixes were used with the intent that they refer to
the same namespace).

If you really insist on doing DTD validation and namespaces at the same
time, using parameter entities will at least let you explicitly avoid
the cases where they decide to use the same prefix you wanted to use,
and still get correct behavior from both the DTD and the document.

But that is going to force you to deal with that explicit avoidance,
which strikes me as excessively fragile. So I'd still be inclined to ask
them to reconsider this, or to let you process their data as
well-formed rather than trying to do combined DTD validation on the
composite document.

People bring up the question of moving to Schema on the representation's
mailing list fairly frequently. Each discussion is entertained less than
the previous one.

However, I am the only one forcing combined validation. I have multiple
hierarchies referencing the same data using ID/IDREF attributes, so I
really need these validated because inconsistencies could mean hours
investigating the XML by hand :O.

I appreciate everything you say; it is fragile. Perhaps in the long term
I could write an application which checks validity rather than relying
on a DTD and a validator. My priority now though, is to get a first
draft finalised with minimal further development of tools.
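
For what it's worth, a standalone ID/IDREF check of the kind mentioned above might look something like the following. This is a minimal sketch, not a replacement for DTD validation: without the DTD the code cannot know which attributes are declared as ID or IDREF, so the attribute names 'id' and 'ref' and the sample document are assumptions made up for the example, and space-separated IDREFS values are not handled.

```python
import xml.etree.ElementTree as ET

def check_references(xml_text, id_attr="id", ref_attr="ref"):
    """Return (duplicate_ids, dangling_refs) for the given document text.

    id_attr and ref_attr are assumed attribute names; a real DTD would
    say which attributes are of type ID and IDREF.
    """
    root = ET.fromstring(xml_text)
    seen, duplicates, refs = set(), [], []
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)  # IDs must be unique
            seen.add(value)
        ref = element.get(ref_attr)
        if ref is not None:
            refs.append(ref)
    # A reference is dangling if no element anywhere declares that ID.
    dangling = [ref for ref in refs if ref not in seen]
    return duplicates, dangling

# Hypothetical sample with one duplicate ID and one dangling reference.
SAMPLE = """
<ns:doc xmlns:ns="http://dougie/test/">
  <ns:a id="a1"/>
  <ns:a id="a1"/>
  <inc ref="a1"/>
  <inc ref="missing"/>
</ns:doc>
"""

duplicates, dangling = check_references(SAMPLE)
print("duplicate IDs:", duplicates)
print("dangling references:", dangling)
```

References are collected first and resolved at the end, so an IDREF may point at an ID that appears later in the document, as DTD validation allows.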

Thanks very much for the discussion; it has proven very helpful.

Cheers,
Dougie
 
Joseph Kesselman

D said:
Sorry, I explained myself terribly there. I meant to refer to
validation, not parsing. Declaring a default namespace in my document
(say in a sibling of the included document) doesn't allow me to leave
out the prefix for children of that node.

The DTD will have been written to either require a specific prefix, or
to require that no prefix be present. Whatever it says, that's how you
have to write the instance document. The DTD will also constrain where
namespace declarations can occur, and may or may not enforce one (which
is the most common kluge for attempting to make DTDs play halfway nicely
with namespaces).
 
