If DTD is unspecified, XML should not parse

Discussion in 'XML' started by Mithil, Aug 1, 2007.

  1. Mithil

    Mithil Guest

    Hello everyone,

    I have a question regarding DTDs and XML: is there any way to stop the
    parser from parsing an XML file, and throw an error, if no DTD is
    specified in the DOCTYPE of the file? I am using Java, by the way. Any
    help is greatly appreciated.

    Regards,
    Mithil
    Mithil, Aug 1, 2007
    #1

  2. Mithil wrote:
    > I have a question regarding DTDs and XML: is there any way to stop the
    > parser from parsing an XML file, and throw an error, if no DTD is
    > specified in the DOCTYPE of the file? I am using Java, by the way. Any
    > help is greatly appreciated.


    If the DTD is not specified by the document type, validation is not
    performed and parsing runs normally.

    If you really insist on rejecting these documents... Depending on the
    parser and API you're using, you may be able to detect that no DTD has
    been specified and have your program do something appropriate. If you're
    using a SAX parser which presents this information, your handler may be
    able to crash the parser by throwing an exception. Hope that helps.
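
    A minimal sketch of that SAX approach in Java, assuming the standard
    JAXP parser and the SAX2 LexicalHandler extension. The class and method
    names (RequireDoctype, parse, accepts) are hypothetical, and the sketch
    treats any DOCTYPE declaration as "a DTD was specified"; check the
    systemId argument of startDTD if an external DTD reference is required.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical class and method names, chosen for this sketch only.
class RequireDoctype {

    // DefaultHandler2 implements both ContentHandler and LexicalHandler.
    static class Handler extends DefaultHandler2 {
        private boolean doctypeSeen = false;
        private boolean rootSeen = false;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            // Fired when the parser reaches <!DOCTYPE ...>.
            doctypeSeen = true;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            // A DOCTYPE can only appear before the root element, so the
            // first startElement is the point at which its absence is known.
            if (!rootSeen) {
                rootSeen = true;
                if (!doctypeSeen) {
                    throw new SAXException("no DOCTYPE declaration; document rejected");
                }
            }
        }
    }

    // Parses the document, throwing SAXException if it has no DOCTYPE.
    static void parse(InputSource in)
            throws SAXException, IOException, ParserConfigurationException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Handler handler = new Handler();
        reader.setContentHandler(handler);
        // Standard SAX2 property for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(in);
    }

    // Convenience wrapper for in-memory documents. Returns false for any
    // SAXException, which also covers documents that are not well-formed.
    static boolean accepts(String xml) {
        try {
            parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    The exception is thrown at the first startElement because that is the
    earliest point at which the absence of a DOCTYPE is certain, so the
    parse stops before any content reaches the application.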



    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Aug 2, 2007
    #2

  3. Joe Kesselman wrote:

    >> I have a question regarding DTDs and XML: is there any way to stop the
    >> parser from parsing an XML file, and throw an error, if no DTD is
    >> specified in the DOCTYPE of the file? I am using Java, by the way. Any
    >> help is greatly appreciated.


    >If the DTD is not specified by the document type, validation is not
    >performed and parsing runs normally.


    But presumably the "invalid" indicator will be set (whatever that is
    for the parser in question), so if you want to reject invalid documents
    as well as ones without a DTD, you can use that.
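
    In Java, one way to use that indicator might look like the sketch
    below: turn on DTD validation and install an ErrorHandler that rethrows
    validity errors, which are otherwise ignored by default. The class and
    method names (ValidateOrReject, parse, accepts) are hypothetical. That
    a missing DOCTYPE is reported through error() is an assumption to verify
    against the parser in use; the parser bundled with the JDK is expected
    to report it that way.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class and method names, chosen for this sketch only.
class ValidateOrReject {

    // DefaultHandler ignores recoverable errors (which include validity
    // errors) unless error() is overridden; fatalError() already throws.
    static class Strict extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    // Parses with DTD validation on; any validity error aborts the parse.
    static void parse(InputSource in)
            throws SAXException, IOException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // request a validating parser
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(new Strict());
        reader.parse(in);
    }

    // Convenience wrapper for in-memory documents: true if the document
    // parsed without any reported error.
    static boolean accepts(String xml) {
        try {
            parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    Unlike a handler that only looks for the DOCTYPE, this rejects both
    documents with no DTD and documents that violate the DTD they declare.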

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Aug 2, 2007
    #3
  4. Richard Tobin wrote:
    > Joe Kesselman wrote:
    >
    >>> I have a question regarding DTDs and XML: is there any way to stop the
    >>> parser from parsing an XML file, and throw an error, if no DTD is
    >>> specified in the DOCTYPE of the file? I am using Java, by the way. Any
    >>> help is greatly appreciated.

    >
    >> If the DTD is not specified by the document type, validation is not
    >> performed and parsing runs normally.

    >
    > But presumably the "invalid" indicator will be set (whatever that is
    > for the parser in question), so if you want to reject invalid documents
    > as well as ones without a DTD, you can use that.
    >


    Just because an XML document lacks a DTD doesn't mean it is invalid,
    does it? It might conform to an external XSD schema or external DTD.
    RedGrittyBrick, Aug 4, 2007
    #4
  5. RedGrittyBrick wrote:
    >Just because an XML document lacks a DTD doesn't mean it is invalid,
    >does it? It might conform to an external XSD schema or external DTD.


    The word "valid" is used in various ways, but the XML spec uses it to
    mean valid with respect to the DTD referred to in the document. If it
    doesn't refer to a DTD, it's invalid.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Aug 4, 2007
    #5
  6. > The word "valid" is used in various ways, but the XML spec uses it to
    > mean valid with respect to the DTD referred to in the document. If it
    > doesn't refer to a DTD, it's invalid.


    There are arguably multiple states: Not validated (well-formed only, not
    tested), invalid (DTD validation attempted and failed), valid (DTD
    validation attempted and succeeded), schema-invalid and schema-valid.
    (The latter two are distinguished only in the Post-Schema-Validation
    infoset, not in the basic infoset.)

    As far as I can tell, the basic XML Infoset doesn't actually include
    any indication of these states as part of its information content. There
    are pieces of information which are only available when a document is
    valid, or when it was at least processed with a validating parser, but
    that's the closest I can find. Apparently detecting validation success
    or failure was left to whatever mechanism you use to invoke the parser
    and/or validator.


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Aug 4, 2007
    #6
  7. Joe Kesselman wrote:

    >There are arguably multiple states: Not validated (well-formed only, not
    >tested), invalid (DTD validation attempted and failed), valid (DTD
    >validation attempted and succeeded),


    True, but the XML spec says that validating parsers must report
    violations of validity constraints, and a document without a DTD
    will violate at least one.

    >As far as I can tell, the basic XML Infoset doesn't actually include
    >any indication of these states as part of its information content.


    Yes, the Infoset doesn't address validity except in the cases where
    invalidity prevents an item from having a value (notably the
    [references] property of attributes).

    >Apparently detecting validation success
    >or failure was left to whatever mechanism you use to invoke the parser
    >and/or validator.


    All that's required is there must be such a mechanism for a validating
    parser.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
    Richard Tobin, Aug 4, 2007
    #7
  8. >> Apparently detecting validation success
    >> or failure was left to whatever mechanism you use to invoke the parser
    >> and/or validator.

    >
    > All that's required is there must be such a mechanism for a validating
    > parser.


    Yep. And certainly the various parser APIs (SAX, JAXP, the DOM3 document
    load operations) do report this.

    I just would have been a bit happier, from an architectural point of
    view, if this had been made one of the properties of the Infoset.

    Oh well. In an ideal world we would have developed the Infoset first,
    including all the afterthoughts like namespaces, then developed the
    schema language and XML markup syntax from that. Maybe if/when XML ever
    graduates from Recommendation to Standard (the semi-mythical XML 2.0?)
    we'll have the luxury of being able to do it that way. Meanwhile, the
    advantage of developing from the syntax forward was that we were able to
    put XML into use immediately; the disadvantage is that it has a bunch of
    minor warts.

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Aug 4, 2007
    #8
  9. Mithil

    Mithil Guest

    Wow, thanks guys. I think this argument gave me insight into more than
    just my question. I really appreciate it; thanks again.
    Mithil, Aug 6, 2007
    #9
