If DTD is unspecified, XML should not parse

Mithil

Hello everyone,

I have a question regarding DTD and XML: is there any way to stop the
parser from parsing an XML file, and throw an error, if no DTD is
specified in the file's DOCTYPE? I am using Java, by the way. Any help
is greatly appreciated.

Regards,
Mithil
 
Joe Kesselman

Mithil said:
I have a question regarding DTD and XML: is there any way to stop the
parser from parsing an XML file, and throw an error, if no DTD is
specified in the file's DOCTYPE? I am using Java, by the way. Any help
is greatly appreciated.

If the DTD is not specified by the document type, validation is not
performed and parsing runs normally.

If you really insist on rejecting these documents... Depending on the
parser and API you're using, you may be able to detect that no DTD has
been specified and have your program do something appropriate. If you're
using a SAX parser which presents this information, your handler may be
able to crash the parser by throwing an exception. Hope that helps.
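
To make the SAX suggestion concrete, here is a minimal sketch, assuming the JDK's bundled JAXP parser and a handler registered as a SAX2 LexicalHandler. The names RequireDoctype, DoctypeGuard, and MissingDtdException are made up for illustration. The handler records whether startDTD was called and throws from the first startElement if it was not, which aborts the parse.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class RequireDoctype {

    /** Hypothetical exception type so callers can tell "no DOCTYPE" from other parse errors. */
    static class MissingDtdException extends SAXException {
        MissingDtdException(String message) {
            super(message);
        }
    }

    /** Aborts the parse at the root element unless a DOCTYPE declaration was seen first. */
    static class DoctypeGuard extends DefaultHandler2 {
        private boolean dtdSeen = false;
        private boolean rootSeen = false;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            // Called by the parser when it encounters <!DOCTYPE ...>.
            dtdSeen = true;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (!rootSeen) {
                rootSeen = true;
                if (!dtdSeen) {
                    throw new MissingDtdException("No DOCTYPE before root element <" + qName + ">");
                }
            }
        }
    }

    /** Parses the document; throws MissingDtdException if it has no DOCTYPE. */
    static void parseRequiringDoctype(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DoctypeGuard guard = new DoctypeGuard();
        reader.setContentHandler(guard);
        // Lexical events such as startDTD are only delivered to a registered LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", guard);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns false only when the document was rejected for lacking a DOCTYPE. */
    static boolean acceptsDocument(String xml) {
        try {
            parseRequiringDoctype(xml);
            return true;
        } catch (MissingDtdException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this only checks that a DOCTYPE declaration is present. It does not validate the document against it.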
 
Richard Tobin

Mithil said:
I have a question regarding DTD and XML: is there any way to stop the
parser from parsing an XML file, and throw an error, if no DTD is
specified in the file's DOCTYPE? I am using Java, by the way. Any help
is greatly appreciated.

Joe Kesselman said:
If the DTD is not specified by the document type, validation is not
performed and parsing runs normally.

But presumably the "invalid" indicator will be set (whatever that is
for the parser in question), so if you want to reject invalid documents
as well as ones without a DTD you can use that.

-- Richard
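
In Java, the "invalid" indicator is typically the ErrorHandler's error() callback. Below is a minimal sketch, assuming the JDK's bundled JAXP parser with DTD validation switched on; the class names StrictValidation and FailOnInvalid are made up for illustration. DefaultHandler ignores recoverable errors unless error() is overridden, so the override rethrows them. With validation on, that parser should report a validity error for a document that has no DOCTYPE, so such documents are rejected along with ordinary invalid ones.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class StrictValidation {

    /** Turns recoverable validity errors into exceptions so the parse stops. */
    static class FailOnInvalid extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    /** Returns true only if the document is well-formed and valid against its DTD. */
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            FailOnInvalid handler = new FailOnInvalid();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // Validity error (including a missing DTD) or well-formedness error.
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike a bare DOCTYPE check, this rejects a document whose content does not match its DTD as well as one that declares no DTD at all.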
 
RedGrittyBrick

Joe Kesselman said:
If the DTD is not specified by the document type, validation is not
performed and parsing runs normally.

Richard Tobin said:
But presumably the "invalid" indicator will be set (whatever that is
for the parser in question), so if you want to reject invalid documents
as well as ones without a DTD you can use that.

Just because an XML document lacks a DTD doesn't mean it is invalid, does
it? It might conform to an external XSD schema or external DTD?
 
Richard Tobin

RedGrittyBrick said:
Just because an XML document lacks a DTD doesn't mean it is invalid, does
it? It might conform to an external XSD schema or external DTD?

The word "valid" is used in various ways, but the XML spec uses it to
mean valid with respect to the DTD referred to in the document. If it
doesn't refer to a DTD, it's invalid.

-- Richard
 
Joe Kesselman

Richard Tobin said:
The word "valid" is used in various ways, but the XML spec uses it to
mean valid with respect to the DTD referred to in the document. If it
doesn't refer to a DTD, it's invalid.

There are arguably multiple states: Not validated (well-formed only, not
tested), invalid (DTD validation attempted and failed), valid (DTD
validation attempted and succeeded), schema-invalid and schema-valid.
(The latter two are distinguished only in the Post-Schema-Validation
infoset, not in the basic infoset.)

As far as I can tell, the basic XML Infoset doesn't actually include
any indication of these states as part of its information content. There
are pieces of information which are only available when a document is
valid, or when it was at least processed with a validating parser, but
that's the closest I can find. Apparently detecting validation success
or failure was left to whatever mechanism you use to invoke the parser
and/or validator.
 
Richard Tobin

Joe Kesselman said:
There are arguably multiple states: Not validated (well-formed only, not
tested), invalid (DTD validation attempted and failed), valid (DTD
validation attempted and succeeded),

True, but the XML spec says that validating parsers must report
violations of validity constraints, and a document without a DTD
will violate at least one.

Joe Kesselman said:
As far as I can tell, the basic XML Infoset doesn't actually include
any indication of these states as part of its information content.

Yes, the Infoset doesn't address validity except in the cases where
invalidity prevents an item from having a value (notably the
[references] property of attributes).

Joe Kesselman said:
Apparently detecting validation success
or failure was left to whatever mechanism you use to invoke the parser
and/or validator.

All that's required is there must be such a mechanism for a validating
parser.

-- Richard
 
Joe Kesselman

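Joe Kesselman said:
Apparently detecting validation success
or failure was left to whatever mechanism you use to invoke the parser
and/or validator.

Richard Tobin said:
All that's required is there must be such a mechanism for a validating
parser.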

Yep. And certainly the various parser APIs (SAX, JAXP, the DOM3 document
load operations) do report this.
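
As a rough illustration of how that reporting looks through JAXP's DOM loader, here is a minimal sketch, assuming the JDK's bundled parser. The class ValidationReport and its Outcome enum are made up for illustration; they are not part of any API. Validity errors arrive through the ErrorHandler, and Document.getDoctype() tells you whether there was a DOCTYPE at all, so the two together separate "no DTD" from "invalid" and "valid".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidationReport {

    /** Hypothetical labels for the outcomes discussed in this thread. */
    enum Outcome { NO_DTD, INVALID, VALID }

    static Outcome classify(String xml) {
        final List<SAXParseException> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings do not affect validity.
                }

                @Override
                public void error(SAXParseException e) {
                    // Recoverable errors are where validity violations are reported.
                    errors.add(e);
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    // Not well-formed: nothing sensible to classify.
                    throw e;
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            if (doc.getDoctype() == null) {
                return Outcome.NO_DTD;
            }
            return errors.isEmpty() ? Outcome.VALID : Outcome.INVALID;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Collecting the errors instead of throwing lets the load finish, so the caller can decide afterwards what to do with each outcome.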

I just would have been a bit happier, from an architectural point of
view, if this had been made one of the properties of the Infoset.

Oh well. In an ideal world we would have developed the Infoset first,
including all the afterthoughts like namespaces, then developed the
schema language and XML markup syntax from that. Maybe if/when XML ever
graduates from Recommendation to Standard (the semi-mythical XML 2.0?)
we'll have the luxury of being able to do it that way. Meanwhile, the
advantage of developing from the syntax forward was that we were able to
put XML into use immediately; the disadvantage is that it has a bunch of
minor warts.
 
Mithil

Wow, thanks guys. I think this argument gave me insight into more stuff.
I really appreciate it. Thanks again.
 