Root element specified by DTD ?

Discussion in 'XML' started by Andy Dingley, Jun 2, 2006.

  1. Andy Dingley Guest

    What specifies the permitted root element(s) for a document ? HTML,
    SGML, XHTML or XML ?


    Valid HTML documents need to have a well-known DTD and a doctypedecl in
    each document like this:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

    The document's root element is "HTML", and is specified by the
    doctypedecl. For HTML and XHTML it's possible that the prose of their
    recommendation restricts it too.


    My question is, is there any way to author a non-HTML DTD (SGML or XML)
    so as to restrict valid documents to only allow a certain subset of
    their elements to be used as the root element? Can this restriction be
    expressed _entirely_ within a DTD? Is this used within the HTML DTDs ?
    (i.e. not just in the doctypedecl)

    Is this fragment a valid HTML document ? If not, why isn't it? Just
    which part of its definition is forbidding this fragmentary use?
    <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
    <div>
    <p>Foo</p>
    </div>


    Good tutorial refs on DTDs are also welcome. I don't know anything like
    enough on DTD innards.

    Thanks
     
    Andy Dingley, Jun 2, 2006
    #1

  2. Lachlan Hunt Guest

    Andy Dingley wrote:
    > What specifies the permitted root element(s) for a document ? HTML,
    > SGML, XHTML or XML ?


    Any element may be the root element. There is nothing in the DTD that
    says which elements may or may not be the root element. The element
    used as the root element is specified by the DOCTYPE, just like in the
    example you gave.
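
    As a rough illustration of that point for the XML case, here is a
    minimal Python sketch (not part of the original post) that reads the
    name given in the DOCTYPE and the name of the actual root element so
    the two can be compared. It uses the standard library's expat bindings,
    which are non-validating, so the comparison is done by hand. The helper
    name doctype_and_root and the tiny sample DTD are made up for the
    example.

```python
import xml.parsers.expat


def doctype_and_root(xml_text):
    """Return (name declared in the DOCTYPE, name of the actual root element)."""
    found = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once when the parser meets the document type declaration.
        found["doctype"] = name

    def on_start(name, attrs):
        # The first start tag seen is the root element.
        if found["root"] is None:
            found["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found["doctype"], found["root"]


# Hypothetical document: the DOCTYPE names "div", and so does the root.
SAMPLE = """<!DOCTYPE div [
  <!ELEMENT div (p)*>
  <!ELEMENT p (#PCDATA)>
]>
<div><p>Foo</p></div>"""

declared, actual = doctype_and_root(SAMPLE)
print(declared, actual, declared == actual)
```

    A validating parser would make this comparison itself; the sketch only
    shows where the root element name comes from.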

    > Is this fragment a valid HTML document ?...
    > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    > "http://www.w3.org/TR/html4/strict.dtd">
    > <div>
    > <p>Foo</p>
    > </div>


    Yes, it's valid. The validator would have told you that.

    --
    Lachlan Hunt
    http://lachy.id.au/
    http://GetFirefox.com/ Rediscover the Web
    http://GetThunderbird.com/ Reclaim your Inbox
     
    Lachlan Hunt, Jun 2, 2006
    #2

  3. Chris Morris Guest

    Lachlan Hunt writes:
    > Andy Dingley wrote:
    > > Is this fragment a valid HTML document ?...
    > > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    > > "http://www.w3.org/TR/html4/strict.dtd">
    > > <div>
    > > <p>Foo</p>
    > > </div>

    >
    > Yes, it's valid. The validator would have told you that.


    It's valid, but is it a valid *HTML* document? I think not, since
    http://www.w3.org/TR/html4/struct/global.html
    requires HTML documents to have title elements
    "Every HTML document *must* have a TITLE element in the HEAD section."

    Those requirements can't be fully enforced at the DTD level, but are
    in the specification. It's clearly a valid SGML document, but I think
    describing it as HTML is dubious.

    --
    Chris
     
    Chris Morris, Jun 2, 2006
    #3
  4. Andy Dingley Guest

    Lachlan Hunt wrote:
    > Andy Dingley wrote:


    > > Is this fragment a valid HTML document ?...
    > > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    > > "http://www.w3.org/TR/html4/strict.dtd">
    > > <div>
    > > <p>Foo</p>
    > > </div>

    >
    > Yes, it's valid. The validator would have told you that.


    I don't know _what_ the validator is telling me. As an example (from
    Tidy) it gives a warning
    "inserting missing 'title' element"

    Now to my mind, this suggests that it's seen as a valid serialisation
    of an HTML document, but that after parsing it the HTML-specific tool
    has implied the <html>, <head>, <title> and presumably <body> elements.
    Now that's quite a different behaviour to "These documents are valid
    as fragments based on any root element".

    I also don't have a generic SGML parser to hand, just HTML ones. My
    real interest here is in the XML or SGML cases, not anything
    HTML-specific that is being implied by the context or HTTP headers.
     
    Andy Dingley, Jun 2, 2006
    #4
  5. >I don't know _what_ the validator is telling me. As an example (from
    >Tidy) it gives a warning
    >"inserting missing 'title' element"


    Tidy isn't a validator. It's a tool for repairing broken documents.
     
    Joe Kesselman, Jun 2, 2006
    #5
  6. Chris Morris wrote:
    > It's valid, but is it a valid *HTML* document?


    Please note: HTML is not an XML language; it's based on SGML, and its
    DTDs follow somewhat different rules.

    If you're talking about XML-validity and HTML in the same sentence, you
    want to move to XHTML (and hope the tools you and your customers are
    using support it). Or, work in XML at the source level, and then render
    into HTML at the end for output to the user; XSLT can be used to do that.
     
    Joe Kesselman, Jun 2, 2006
    #6
  7. Chris Morris wrote:
    > Lachlan Hunt writes:
    >> Andy Dingley wrote:
    >>> Is this fragment a valid HTML document ?...
    >>> <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    >>> "http://www.w3.org/TR/html4/strict.dtd">
    >>> <div>
    >>> <p>Foo</p>
    >>> </div>

    >> Yes, it's valid. The validator would have told you that.

    >
    > It's valid, but is it a valid *HTML* document?


    It isn't an HTML document at all. By its own declaration, it's a DIV
    document.
     
    Harlan Messinger, Jun 2, 2006
    #7
  8. Harlan Messinger wrote:
    > Chris Morris wrote:
    >> Lachlan Hunt writes:
    >>> Andy Dingley wrote:
    >>>> Is this fragment a valid HTML document ?...
    >>>> <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    >>>> "http://www.w3.org/TR/html4/strict.dtd">
    >>>> <div>
    >>>> <p>Foo</p>
    >>>> </div>
    >>> Yes, it's valid. The validator would have told you that.

    >>
    >> It's valid, but is it a valid *HTML* document?

    >
    > It isn't an HTML document at all. By its own declaration, it's a DIV
    > document.


    http://www.w3.org/TR/html4/struct/global.html#h-7.3

    "After document type declaration, the remainder of an HTML document is
    contained by the HTML element."
     
    Harlan Messinger, Jun 2, 2006
    #8
  9. Peter Flynn Guest

    Andy Dingley wrote:
    > What specifies the permitted root element(s) for a document ? HTML,
    > SGML, XHTML or XML ?


    When using a DTD, any declared element type can be the root element.
    It must be specified in the Document Type Declaration in the XML file.
    The same is true for SGML, HTML and XHTML, e.g.

    <!DOCTYPE table PUBLIC "-//W3C//DTD HTML 4.01//EN">

    specifies a document starting with <table> and containing anything
    valid in HTML 4.01 tables.

    Warning: *browsers* are not SGML conforming applications, so they won't
    understand this. They *will* understand if you use XML or XHTML, but
    I don't know what their reaction to an XHTML fragment would be.

    > My question is, is there any way to author a non-HTML DTD (SGML or XML)
    > so as to restrict valid documents to only allow a certain subset of
    > their elements to be used as the root element?


    Yep, just use the element type name of your choice in the Document
    Type Declaration. This is required to be supported by all conforming
    editors using a DTD. If you use a Schema, all bets are off, as the
    specification of a root element type is done quite differently there.

    > Can this restriction be
    > expressed _entirely_ within a DTD?


    No, not at all. *Any* element type of a DTD can be used as the root
    element type.

    But conforming applications (eg editors) usually make a good guess
    if they are worth anything, when they parse the DTD -- it's not
    hard for them to spot that at least one element type is never used
    in the content model of any other element type, and is therefore a
    good choice for a default root element type. Oddly, some otherwise
    very good editors fail to do this, possibly because their programmers
    simply didn't grok XML markup.
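
    The guess described above can be sketched in a few lines of Python.
    This is only an illustration under simplifying assumptions: it scans
    <!ELEMENT ...> declarations with a regular expression, does not expand
    parameter entities, and ignores comments, ANY content models and SGML
    inclusions. The function name root_candidates and the sample DTD are
    invented for the example.

```python
import re

# Matches "<!ELEMENT name content-model>" and captures both parts.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>")
NAME = re.compile(r"[\w.:-]+")


def root_candidates(dtd_text):
    """Return declared element types that no other content model mentions."""
    models = dict(ELEMENT_DECL.findall(dtd_text))
    referenced = set()
    for owner, model in models.items():
        for name in NAME.findall(model):
            # Only count references to declared elements, and skip
            # an element that merely refers to itself.
            if name in models and name != owner:
                referenced.add(name)
    return sorted(set(models) - referenced)


# Hypothetical DTD in which only "book" is never used as a child.
SAMPLE_DTD = """
<!ELEMENT book (title, chapter+)>
<!ELEMENT chapter (title, para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""

print(root_candidates(SAMPLE_DTD))
```

    For a DTD such as HTML 4.01, which relies heavily on parameter
    entities, the entities would have to be expanded first for this
    heuristic to give a useful answer.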

    > Is this used within the HTML DTDs ?
    > (i.e. not just in the doctypedecl)


    Not explicitly.

    > Is this fragment a valid HTML document ?


    Yes, perfectly.

    > If not, why isn't it? Just
    > which part of its definition is forbidding this fragmentary use?
    > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    > "http://www.w3.org/TR/html4/strict.dtd">
    > <div>
    > <p>Foo</p>
    > </div>


    You can test this by running it through any SGML validating parser
    (eg nsgmls).

    > Good tutorial refs on DTDs are also welcome. I don't know anything like
    > enough on DTD innards.


    The best by far is still Eve Maler and Jeanne El Andaloussi, "Developing
    SGML DTDs -- from text to model to markup", Prentice Hall, 1996. You
    just have to skip the bits which refer to those parts of SGML which were
    dropped in the XML Specification (see the list in the FAQ on converting
    DTDs to XML at http://xml.silmaril.ie/developers/dtdconv/).

    But you should also bone up on Relax NG, which is a schema language with
    a short (DTD-like) syntax as well as a verbose syntax, from which you
    can generate DTDs, W3C Schemas, and more. This may be an easier way into
    document modelling.

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Jun 3, 2006
    #9
  10. Peter Flynn Guest

    Chris Morris wrote:
    > Lachlan Hunt writes:
    >> Andy Dingley wrote:
    >>> Is this fragment a valid HTML document ?...
    >>> <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    >>> "http://www.w3.org/TR/html4/strict.dtd">
    >>> <div>
    >>> <p>Foo</p>
    >>> </div>

    >> Yes, it's valid. The validator would have told you that.

    >
    > It's valid, but is it a valid *HTML* document? I think not, since
    > http://www.w3.org/TR/html4/struct/global.html
    > requires HTML documents to have title elements
    > "Every HTML document *must* have a TITLE element in the HEAD section."
    >
    > Those requirements can't be fully enforced at the DTD level, but are
    > in the specification. It's clearly a valid SGML document, but I think
    > describing it as HTML is dubious.


    It's an HTML *fragment*. Browsers may gag on it. Properly conformant
    software won't.

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Jun 3, 2006
    #10
  11. Peter Flynn scripsit:

    >> Is this fragment a valid HTML document ?

    >
    > Yes, perfectly.


    No, it is a valid SGML document, but it is not an HTML document, as defined
    in HTML specifications. (Of course, most "HTML documents" on the Web are not
    HTML documents in that sense, but the question is meaningful only if
    interpreted as relating to specifications. "HTML document" in the loose
    sense - as well as "XML document" when well-formedness is not required - is
    far too fuzzy a concept to be argued about.)

    >> If not, why isn't it? Just
    >> which part of its definition is forbidding this fragmentary use?
    >> <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    >> "http://www.w3.org/TR/html4/strict.dtd">
    >> <div>
    >> <p>Foo</p>
    >> </div>

    >
    > You can test this by running it through any SGML validating parser
    > (eg nsgmls).


    That would indicate the validity, but the HTML 4.01 specification requires
    that one of three specific DOCTYPE declarations be used - not just that one
    of three DTDs be used. And this isn't one of them. Moreover, the
    specification explicitly says:
    "After document type declaration, the remainder of an HTML document is
    contained by the HTML element."
    http://www.w3.org/TR/REC-html40/struct/global.html#h-7.3

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Jun 3, 2006
    #11
  12. In other words: As always, a DTD -- or a schema -- is only a partial
    description of what makes a document correct and meaningful. Think of
    these as "higher-level syntax checking"; the application is always going
    to impose semantic constraints as well.

    The schema or DTD describes the document's structure in a
    machine-readable form that tools can take advantage of, so they don't
    have to do *all* the checking themselves. That's valuable. But don't
    expect it to be complete.
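
    To make the idea of application-level constraints concrete, here is a
    minimal Python sketch of checks that an application might apply on top
    of DTD validation. It assumes well-formed XML input without namespaces
    (real XHTML elements carry a namespace, which would change the tag
    names), and the function name extra_checks and its allowed_roots
    parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def extra_checks(xml_text, allowed_roots=("html",)):
    """Return a list of problems found by application-level rules."""
    problems = []
    root = ET.fromstring(xml_text)
    # Rule 1: the application only accepts certain root elements.
    if root.tag not in allowed_roots:
        problems.append(f"root element is <{root.tag}>, not one of {allowed_roots}")
    # Rule 2: the application insists on a title inside the head.
    if root.find("head/title") is None:
        problems.append("no <title> inside <head>")
    return problems


print(extra_checks("<div><p>Foo</p></div>"))
print(extra_checks(
    "<html><head><title>T</title></head><body><p>Foo</p></body></html>"
))
```

    The first call reports both problems for the div fragment discussed in
    this thread; the second returns an empty list.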

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Jun 3, 2006
    #12
  13. Joe Kesselman scripsit:

    > In other words:


    In future, please quote or paraphrase the message that you are commenting
    on.

    >As always, a DTD -- or a schema -- is only a partial
    > description of what makes a document correct and meaningful.


    It depends. There's no law that requires additional rules, though pure
    syntax as such _is_ somewhat boring.

    > Think of
    > these as "higher-level syntax checking"; the application is always
    > going to impose semantic constraints as well.


    What's "higher-level" here? Anyway, in the issue discussed in this thread,
    it is the additional _syntactic_ constraints that imply that a certain kind
    of document is not an HTML document. There's nothing semantic in the
    requirement that a document contain a specific DOCTYPE declaration or that a
    document contain a <title> element. (Requiring that the <title> element
    contain text that is a descriptive name for the document, especially for use
    as a title for it in different contexts, would be a semantic requirement.
    Whether HTML specifications make such a requirement is debatable; the prose
    in the specs is a mixture of normative-looking prose, comments, hints,
    wishful thinking, etc.)

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Jun 3, 2006
    #13
  14. Andy Dingley wrote:

    > My question is, is there any way to author a non-HTML DTD (SGML or XML)
    > so as to restrict valid documents to only allow a certain subset of
    > their elements to be used as the root element? Can this restriction be
    > expressed _entirely_ within a DTD?


    No and no.

    RELAX NG can restrict the allowed roots and does not allow the document
    to override.

    > Is this fragment a valid HTML document ?


    > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    > "http://www.w3.org/TR/html4/strict.dtd">
    > <div>
    > <p>Foo</p>
    > </div>


    Valid in the SGML sense but not conforming to the HTML 4.01 spec.
    Validity is overrated. DTD-validity is especially overrated.

    > Good tutorial refs on DTDs are also welcome. I don't know anything like
    > enough on DTD innards.


    Since you haven't invested in learning DTDs, unless you have a
    non-negotiable requirement to use them, I suggest learning RELAX NG
    Compact Syntax instead:
    http://relaxng.org/compact-tutorial-20030326.html

    --
    Henri Sivonen

    http://hsivonen.iki.fi/
    Validation Service for RELAX NG: http://hsivonen.iki.fi/validator/
     
    Henri Sivonen, Jun 3, 2006
    #14
  15. On Sat, 3 Jun 2006, Joe Kesselman wrote:

    > In other words:


    Who and what are you trying to restate? Your header says it's
    <UH8gg.662$> by Jukka, but readers have
    no idea which part(s) of that posting you are trying to comment on,
    contradict, misquote, or whatever. Please observe customary usenet
    courtesies.

    > As always, a DTD -- or a schema -- is only a partial
    > description of what makes a document correct and meaningful.


    The W3C HTML specification requires the document root to be the <html>
    element. That seems to me to be a syntactic constraint on anything
    which lays claim to being an "HTML document" (as opposed to a
    fragment). Which is part of what Jukka said, and which you appear to
    be trying to obfuscate.

    > Think of these as "higher-level syntax checking"; the application is
    > always going to impose semantic constraints as well.


    Of course; but your comment, far from being a restatement "in other
    words" of the article you were following-up to, appears to be some
    quite unrelated issue, that throws little or no light on what Jukka
    said. By failing to quote the relevant parts on which you are
    commenting, you give the unfortunate impression that you are making it
    harder for readers to see just how the reasoning is being de-railed.

    > Having the schema or DTD describes the document's structure in a
    > machine-readable form that tools can take advantage of, so they
    > don't have to do *all* the checking themselves. That's valuable. But
    > don't expect it to be complete.


    It seems to me that you could do well to distinguish between an "HTML
    document", and an HTML fragment. The kind of HTML fragment under
    discussion here is not (IMO) an "HTML document" within the meaning of
    the applicable specifications, and that is on syntactic grounds.

    Jukka is going a bit far at the point where he says:

    |the HTML 4.01 specification requires that one of three specific
    |DOCTYPE declarations be used ...

    - since this would appear to rule out ISO HTML as being a bona fide
    kind of HTML, quite apart from the various custom DTDs which are
    around, and which I think most folks would accept as *kinds* of HTML
    document, albeit not approved by the W3C.

    But the main argument does not hinge on that detail, as far as I can
    tell. Their root element (express or implied) needs to be <html>
    before they can be an "HTML document".

    h t h
     
    Alan J. Flavell, Jun 3, 2006
    #15
  16. Alan J. Flavell wrote:

    > Jukka is going a bit far at the point where he says:
    >
    > |the HTML 4.01 specification requires that one of three specific
    > |DOCTYPE declarations be used ...
    >
    > - since this would appear to rule out ISO HTML as being a bona fide
    > kind of HTML,


    I think it is quite appropriate to claim that ISO HTML is not conforming
    HTML *4.01*.

    --
    Henri Sivonen

    http://hsivonen.iki.fi/
    Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
     
    Henri Sivonen, Jun 3, 2006
    #16
  17. On Sat, 3 Jun 2006, Henri Sivonen wrote:

    > Alan J. Flavell wrote:
    >
    > > Jukka is going a bit far at the point where he says:
    > >
    > > |the HTML 4.01 specification requires that one of three specific
    > > |DOCTYPE declarations be used ...
    > >
    > > - since this would appear to rule out ISO HTML as being a bona
    > > fide kind of HTML,

    >
    > I think it is quite appropriate to claim that ISO HTML is not
    > conforming HTML *4.01*.


    Oh, indeed. What Jukka said was entirely reasonable within its own
    terms, but what light did it throw on a generic definition of the term
    "HTML document"? I suppose I was griping more about what he didn't
    say, than about what he did. Sorry.

    Maybe we're losing sight of where this discussion came from:

    |> > Just
    |> > which part of its definition is forbidding this fragmentary use?
    |> > <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
    |> > "http://www.w3.org/TR/html4/strict.dtd">
    |> > <div>
    |> > <p>Foo</p>
    |> > </div>

    It seems entirely plausible to test *that* particular question against
    the HTML/4.01 specification, since it calls-out the HTML/4.01 DTD [1]

    But then we have to differentiate the question 'what defines an "HTML
    document" according to this or that specific flavour of HTML?' from
    the more general question of 'who is entitled to define the term "HTML
    document" without reference to any specific flavour of HTML, and where
    would we find such a definition?'.

    I'm saying that - no matter which specific HTML DTD were to be called
    out from the above DOCTYPE - the result could be an HTML fragment, but
    it would be unreasonable to claim it as an "HTML document". But I'm
    not sure that I would be able to give you chapter and verse to settle
    that argument authoritatively. And no review of definitions of each
    /individual version of HTML/ could suffice to define the term "HTML"
    generically.

    regards

    [1] Yes, I've reviewed the historic arguments about an SGML DTD not
    defining what we all had thought it did. But they relied on doing
    things which HTML rules out, but which SGML does not allow to be ruled
    out. Taken to its logical conclusion, that would result in HTML
    disappearing entirely in a puff of logic. I didn't want to go there.
     
    Alan J. Flavell, Jun 3, 2006
    #17
  18. Jack Guest

    Henri Sivonen wrote:
    > Alan J. Flavell wrote:
    >
    >> Jukka is going a bit far at the point where he says:
    >>
    >> |the HTML 4.01 specification requires that one of three specific
    >> |DOCTYPE declarations be used ...
    >>
    >> - since this would appear to rule out ISO HTML as being a bona fide
    >> kind of HTML,

    >
    > I think it is quite appropriate to claim that ISO HTML is not
    > conforming HTML *4.01*.
    >

    Would you care to expand on this apparently rather odd statement?

    As far as I am aware, ISO HTML is essentially a restatement of W3C HTML
    4.01, with certain recommendations transformed into requirements, and
    certain deprecations transformed into exclusions. Apart from that, the
    recommended DTD declaration is different; but the exact DTD to be
    declared is not a requirement of W3C HTML 4.01 anyway.

    Please explain whatever I may have misunderstood!

    --
    Jack.
     
    Jack, Jun 3, 2006
    #18
  19. In article <e5rtl7$atf$1$>,
    Jack wrote:

    > Henri Sivonen wrote:
    > > Alan J. Flavell wrote:
    > >
    > >> Jukka is going a bit far at the point where he says:
    > >>
    > >> |the HTML 4.01 specification requires that one of three specific
    > >> |DOCTYPE declarations be used ...
    > >>
    > >> - since this would appear to rule out ISO HTML as being a bona fide
    > >> kind of HTML,

    > >
    > > I think it is quite appropriate to claim that ISO HTML is not
    > > conforming HTML *4.01*.
    > >

    > Would you care to expand on this apparently rather odd statement?


    The specs make incompatible requirements about the doctype, which means
    conformance to the specs is mutually exclusive.

    > As far as I am aware, ISO HTML is essentially a restatement of W3C HTML
    > 4.01, with certain recommendations transformed into requirements, and
    > certain deprecations transformed into exclusions. Apart from that, the
    > recommended DTD declaration is different; but the exact DTD to be
    > declared is not a requirement of W3C HTML 4.01 anyway.


    But Jukka Korpela pointed out in the quoted part that W3C HTML 4.01 does
    have a requirement of particular doctypes.

    (Whether these requirements should be considered bogus or not is another
    matter.)

    --
    Henri Sivonen

    http://hsivonen.iki.fi/
    Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
     
    Henri Sivonen, Jun 3, 2006
    #19
  20. VK Guest

    Alan J. Flavell wrote:
    > I'm saying that - no matter which specific HTML DTD were to be called
    > out from the above DOCTYPE - the result could be an HTML fragment, but
    > it would be unreasonable to claim it as an "HTML document".


    You have no choice but to claim it as an "HTML document". It is served
    from the server with "Content-Type: text/html"; for local files it is
    treated as the same type by the association .html, .htm... --> text/html.
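
    The extension-to-type association for local files can be seen with
    Python's standard mimetypes module. This is a small sketch only: the
    file names are placeholders, and the result depends on the MIME table
    of the platform it runs on.

```python
import mimetypes

# Look up the media type that the file extension is associated with.
for filename in ("page.html", "page.htm"):
    media_type, _encoding = mimetypes.guess_type(filename)
    print(filename, "->", media_type)
```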

    So before any DTD you /have/ to explicitly declare what document you
    are serving - this is the only way to make an application react to
    it. That way, however you twist the HTML code around, it is always an
    /HTML document/ for the recipient: correctly formatted or badly broken
    is another issue. Out of curiosity you can serve a page from your
    server such as:

    Content-Type: text/html\n\n
    !@#$%&*


    P.S. I'm really glad to see that the discussion at
    <http://groups.google.com/group/comp.infosystems.www.authoring.html/browse_frm/thread/4fd4218808cd53ce>

    triggered your curiosity and the thinking process in whole.

    Just try to not put your frustration on Mr.Kesselman - he has nothing
    to do with it.
     
    VK, Jun 3, 2006
    #20