most XHTML on the web is invalid?

Discussion in 'HTML' started by John Salerno, Feb 5, 2006.

  1. John Salerno

    John Salerno Guest

    What exactly does this mean:

    "Documents sent as text/html are handled as tag soup [1] by most UAs.
    This means that authors are not checking for validity, and thus
    most XHTML documents on the web now are invalid. Therefore the main
    advantage of using XHTML, that it has to be valid, is lost if the
    document is then sent as text/html."

    To me it sounds like he is saying that *any* document written in XHTML
    and then served as text/html is invalid. But is that really the case? Or
    is he saying that the document *could* be invalid because it could still
    be prone to the methods of HTML (e.g., no closing tags, etc.)?

    I assume if you validate your XHTML, then simply serving it as text/html
    doesn't harm it, right? It doesn't suddenly make it "invalid," does it?
    (Perhaps in a strict sense it does, because it's not truly XHTML, but as
    far as the actual words in the document themselves, they are still
    valid, right? And if it was then served as application/xhtml+xml, it
    would be valid, correct?)
    John Salerno, Feb 5, 2006
    #1

  2. John Salerno

    John Salerno Guest

    John Salerno wrote:
    > What exactly does this mean:
    >
    > "Document sent as text/html are handled as tag soup [1] by most UAs.
    > This means that authors are not checking for validity, and thus
    > most XHTML documents on the web now are invalid. Therefore the main
    > advantage of using XHTML, that it has to be valid, is lost of the
    > document is then sent as text/html."
    >
    > To me it sounds like he is saying that *any* document written in XHTML
    > and then served as text/html is invalid. But is that really the case? Or
    > is he saying that the document *could* be invalid because it could still
    > be prone to the methods of HTML (e.g., no closing tags, etc.)?
    >
    > I assume if you validate your XHTML, then simply serving it as text/html
    > doesn't harm it, right? It doesn't suddenly make it "invalid," does it?
    > (Perhaps in a strict sense it does, because it's not truly XHTML, but as
    > far as the actually words in the document themselves, they are still
    > valid, right? And if it was then served as application/xhtml-xml, it
    > would be valid, correct?)


    Here's a related quote:

    "If you ever switch your documents that claim to be XHTML from
    text/html to application/xhtml+xml, then you will in all likelihood
    end up with a considerable number of XML errors, meaning your
    content won't be readable by users. (See above: most of these
    documents do not validate.)"

    To me, this argument seems valid only because it carries the
    presupposition that most authors are still writing as if they are using
    HTML instead of XHTML. If you actually know what you are doing (i.e.,
    know how an XML language needs to be structured) and you do that, then
    this point is moot.
    John Salerno, Feb 5, 2006
    #2

  3. John Salerno

    John Salerno Guest

    John Salerno wrote:
    > What exactly does this mean:
    >
    > "Document sent as text/html are handled as tag soup [1] by most UAs.
    > This means that authors are not checking for validity, and thus
    > most XHTML documents on the web now are invalid. Therefore the main
    > advantage of using XHTML, that it has to be valid, is lost of the
    > document is then sent as text/html."
    >
    > To me it sounds like he is saying that *any* document written in XHTML
    > and then served as text/html is invalid. But is that really the case? Or
    > is he saying that the document *could* be invalid because it could still
    > be prone to the methods of HTML (e.g., no closing tags, etc.)?
    >
    > I assume if you validate your XHTML, then simply serving it as text/html
    > doesn't harm it, right? It doesn't suddenly make it "invalid," does it?
    > (Perhaps in a strict sense it does, because it's not truly XHTML, but as
    > far as the actually words in the document themselves, they are still
    > valid, right? And if it was then served as application/xhtml-xml, it
    > would be valid, correct?)


    One more quote, this time from Andy from this newsgroup:

    "However (the bad news) this is an XML technique and so only works with
    XHTML documents that are XML documents, not the Appendix C XHTML non-XML
    documents we've already mentioned as being the only ones that are yet
    ready for use on the web. You can still use these techniques, but it's
    not simple, much of your audience may have problems with them, and
    compatibility issues are significant."

    So basically these last three posts of mine are asking the same
    question: why is "Appendix C" XHTML considered invalid? What more is
    there to writing real XHTML than simply starting with the basic rules
    (lower-case, proper nesting, closing tags, etc.)?

    Obviously you can eventually include other XML namespaces, but for now,
    why is it said that documents written in Appendix C XHTML will
    eventually be invalid when actually served as application/xhtml+xml?

    Thanks!
    John Salerno, Feb 5, 2006
    #3
  4. John Salerno wrote:

    > What exactly does this mean:
    >
    > "Document sent as text/html are handled as tag soup [1] by most
    > UAs. This means that authors are not checking for validity, and
    > thus most XHTML documents on the web now are invalid. Therefore the
    > main advantage of using XHTML, that it has to be valid, is lost of
    > the document is then sent as text/html."


    Looks pretty clear to me, though the phrase "it has to be valid" is
    actually just a successful meme (in a particular environment), not a
    correct statement of facts. It is apparently meant to say that browsers
    _must_ check for validity (for documents, HTML specifications have
    always required validity), but what the XHTML 1.0 document actually
    says is something very different. It only requires that well-formedness
    (i.e., being XML in the first place) is "evaluated" (with no
    requirement on reporting the result of the evaluation), and there is
    clearly no requirement on checking validity:
    http://www.w3.org/TR/xhtml1/#uaconf
    In fact, XHTML 1.0 even _requires_ certain processing of unrecognized
    elements and attributes, which means rules for processing _invalid_
    documents. In classic HTML, this was just common practice (and a
    _suggestion_ in the specs).
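
    To make the gap between well-formedness and validity concrete, here is a
    minimal Python sketch using the standard library's non-validating
    xml.etree.ElementTree parser. The sample markup and the <blinkish>
    element are made up for the illustration; the only point is that a
    parser which checks well-formedness alone will accept an element that no
    XHTML DTD declares.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Well-formed XML, but <blinkish> is a made-up element that no XHTML DTD
# declares, so this document would not be valid XHTML.
WELL_FORMED_BUT_INVALID = (
    f'<html xmlns="{XHTML_NS}">'
    "<head><title>Demo</title></head>"
    "<body><blinkish>hello</blinkish></body>"
    "</html>"
)


def is_well_formed(markup):
    """Return True if the markup parses as XML; validity is never checked."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def local_names(markup):
    """List element names in document order, with the namespace stripped."""
    return [el.tag.split("}")[-1] for el in ET.fromstring(markup).iter()]
```

    Here is_well_formed(WELL_FORMED_BUT_INVALID) returns True and the unknown
    element shows up in the parsed tree, whereas an unclosed fragment such as
    "<p>unclosed" is rejected. Checking validity would need a separate
    validating tool, which this sketch does not attempt.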

    > To me it sounds like he is saying that *any* document written in
    > XHTML and then served as text/html is invalid.


    No, that's not at all what it says.

    The general idea is that authors who think they are using XHTML do not,
    in fact, use XHTML (but violate validity requirements, prose
    requirements, and perhaps even well-formedness requirements) and do not
    observe this, since browsers don't report the errors. The idea seems to
    be that browsers _would_ report errors if application/xhtml+xml were
    used, but as I explained, there is no such requirement - and browsers
    are even _required_ to process invalid documents in a particular manner
    (though we can perhaps deduce that they _may_ also flag errors).

    > I assume if you validate your XHTML, then simply serving it as
    > text/html doesn't harm it, right?


    Right. But you gain nothing either.

    > It doesn't suddenly make it
    > "invalid," does it? (Perhaps in a strict sense it does, because
    > it's not truly XHTML, but as far as the actually words in the
    > document themselves, - -


    Validity has nothing to do with the Internet media type used to serve a
    document. Validity is an inherent property of a document.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
    Jukka K. Korpela, Feb 5, 2006
    #4
  5. Neredbojias

    Neredbojias Guest

    With neither quill nor qualm, John Salerno quothed:

    > What exactly does this mean:
    >
    > "Document sent as text/html are handled as tag soup [1] by most UAs.
    > This means that authors are not checking for validity, and thus
    > most XHTML documents on the web now are invalid. Therefore the main
    > advantage of using XHTML, that it has to be valid, is lost of the
    > document is then sent as text/html."
    >
    > To me it sounds like he is saying that *any* document written in XHTML
    > and then served as text/html is invalid. But is that really the case? Or
    > is he saying that the document *could* be invalid because it could still
    > be prone to the methods of HTML (e.g., no closing tags, etc.)?
    >
    > I assume if you validate your XHTML, then simply serving it as text/html
    > doesn't harm it, right? It doesn't suddenly make it "invalid," does it?
    > (Perhaps in a strict sense it does, because it's not truly XHTML, but as
    > far as the actually words in the document themselves, they are still
    > valid, right? And if it was then served as application/xhtml-xml, it
    > would be valid, correct?)


    Using xhtml "today" without a special and rather esoteric need for it is
    just plain silly. There's nothing profound about the format or markup,
    nothing really to be learned by doing xhtml pages except the most
    simplistic tactical techniques one could imagine. Perhaps someday when
    the so-called experts figure out what they're supposed to be doing and
    are actually able to do it, this will change, but that day is not going
    to be soon. Far better a valid and structurally-correct html-strict
    opus than anything in xhtml at all.

    --
    Neredbojias
    Contrary to popular belief, it is believable.
    Neredbojias, Feb 5, 2006
    #5
  6. Spartanicus

    Spartanicus Guest

    John Salerno wrote:

    >"Document sent as text/html are handled as tag soup [1] by most UAs.
    >This means that authors are not checking for validity, and thus
    >most XHTML documents on the web now are invalid. Therefore the main
    >advantage of using XHTML, that it has to be valid, is lost of the
    >document is then sent as text/html."


    IIRC Hixie makes this point about validating parsers. Imo he pushes the
    boat out too far with his arguments against XHTML. The paragraph above is
    an example: validating parsers are and will continue to be a rarity in
    UAs that are not validators.

    At the same time Hixie doesn't address many of the myths that cause
    authors to believe that XHTML is preferred over HTML. Have a look at the
    previously mentioned http://www.spartanicus.utvinternet.ie/no-xhtml.htm

    --
    Spartanicus
    Spartanicus, Feb 5, 2006
    #6
  7. On Sat, 4 Feb 2006, John Salerno wrote:

    > What exactly does this mean:
    >
    > "Document sent as text/html are handled as tag soup [1] by most UAs.
    > This means that authors are not checking for validity, and thus most
    > XHTML documents on the web now are invalid.


    I interpreted it as meaning that *most* authors who jumped on the
    XHTML bandwagon - because it was sexy, rather than because they knew
    what they were doing - are still writing tag soup - just that now
    they're writing XHTML-flavoured tag soup whereas previously they were
    writing HTML-flavoured tag soup.

    > Therefore the main advantage of using XHTML, that it has to be
    > valid, is lost of the document is then sent as text/html."
    >
    > To me it sounds like he is saying that *any* document written in
    > XHTML and then served as text/html is invalid.


    I don't think he meant that.

    One of the claimed benefits for XHTML was that it would put an end to
    tag soup, and would produce only documents which were valid, thus
    putting an end to the problem of browsers having to guess
    heuristically what they were supposed to do with invalid markup. We
    were told by its proponents that a new generation of XML-based
    browsers would be able to get rid of all that ballast of error fixup
    code, and just parse the valid XML-based markups that they would be
    given. Which of course could be far more elaborate than mere HTML -
    containing additional XML-based markups including SVG and MathML, and
    so on.

    What he's alerting us to, AIUI, is that in reality, many/most of those
    who *imagine* they are producing XHTML are producing no such thing -
    they are producing XHTML-flavoured tag soup, sending it out as
    text/html, and continuing to rely on old error-correcting browsers
    which were designed for parsing HTML tag soup (courtesy of the W3C's
    misguided provisions of "Appendix C" to do so).

    Here we know better, of course, since we not only know how to use a
    validator (or, even better, use an authoring process which is designed
    such that it can only generate valid output); we also know what's
    meant by semantic markup (even if we have lesser disagreements about
    exactly what it means). But "we" are in a tiny minority compared with
    the billions of pages that are out there on the WWW.

    > I assume if you validate your XHTML, then simply serving it as
    > text/html doesn't harm it, right? It doesn't suddenly make it
    > "invalid," does it?


    Well, text/html used to mean in theory "this is HTML" - in practice it
    meant "this is almost certainly HTML-like tag soup, although
    occasionally it will be HTML"; whereas under the provisions of
    Appendix C, it now means "this is almost certainly one or other
    flavour of tag soup, although occasionally it will be either HTML or
    Appendix-C XHTML/1.0".

    No, valid XHTML/1.0 Appendix C isn't actually *invalid* as HTML; it
    just (per the SHORTTAG problem) *means* something different, and
    relies on a widespread browser bug to get itself rendered as intended
    - rather than as specified by SGML.

    Remember, the "SGML Declaration" for HTML is non-negotiable. It's
    published in the HTML specification(s), e.g.
    http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html , and it forms an
    implied part of every HTML transaction. Unlike the contents of the
    DTD (which are referenced from the DOCTYPE), the SGML Declaration
    forms no part of the negotiation between the client and the server -
    there is no URL from which the client could, even in principle,
    retrieve the "SGML Declaration". And there's no doubt that the SGML
    Declaration for HTML says "SHORTTAG YES", whether the supporters of
    Appendix C care to hear it or not.

    Appendix C relies upon the fact that client agents don't take the SGML
    Declaration seriously. And in order to cope with this, the so-called
    HTML validators need a mode switch, which takes a sneaky look at the
    DOCTYPE and decides whether to switch from an SGML mode into an XML
    mode for the validation. This is all very heuristic - it's not based
    on any well-founded theoretical model at all.

    AIUI, Hixie would like application/xhtml+xml when sent from a server
    to mean "I warrant this to be XHTML", with no parachute provided for
    cases where that turns out to be false.

    No, I don't think he's demanding that every browser must be a
    validating parser, guaranteeing an error report instead of rendering
    documents which prove to be invalid[1]. (XML does, however, mandate
    reporting an error for well-formedness errors.) He's only saying that
    serving XHTML *should* represent a warranty of validity, with the
    *sender* accepting any consequences of the warranty being broken, and
    removing the implied requirement on every recipient to perform the QA
    corrections which the author failed to do.

    That's my interpretation of it, anyway. I don't know Hixie personally
    and can only base my understanding of his position on what I've read.
    YMMV and all that.

    best

    [1] As long as so many authors continue to use their favourite browser
    as the sole arbiter of correctness, however, it really would be a good
    idea if their browser would do precisely that. But I *know* it isn't
    going to happen, so I'm not losing any sleep over it.
    Alan J. Flavell, Feb 5, 2006
    #7
  8. cwdjrxyz

    cwdjrxyz Guest

    John Salerno wrote:
    > What exactly does this mean:
    >
    > "Document sent as text/html are handled as tag soup [1] by most UAs.
    > This means that authors are not checking for validity, and thus
    > most XHTML documents on the web now are invalid. Therefore the main
    > advantage of using XHTML, that it has to be valid, is lost of the
    > document is then sent as text/html."
    >
    > To me it sounds like he is saying that *any* document written in XHTML
    > and then served as text/html is invalid. But is that really the case? Or
    > is he saying that the document *could* be invalid because it could still
    > be prone to the methods of HTML (e.g., no closing tags, etc.)?
    >
    > I assume if you validate your XHTML, then simply serving it as text/html
    > doesn't harm it, right? It doesn't suddenly make it "invalid," does it?
    > (Perhaps in a strict sense it does, because it's not truly XHTML, but as
    > far as the actually words in the document themselves, they are still
    > valid, right? And if it was then served as application/xhtml-xml, it
    > would be valid, correct?)


    You have had many answers with useful information. I just want to add a
    practical consideration or two.

    No matter if you write your page in perfect html or perfect xhtml, if
    you label the page as something.html, it gets served as html by most
    servers. If you want to serve a xhtml page as true xhtml you need to
    set the mime type of application/xhtml+xml, associated with some
    extension such as .xhtml, on the server and serve the page with the
    extension such as .xhtml selected. A good xhtml page then gets served
    correctly. And it will not be viewed by browsers such as IE6 that do
    not understand the mentioned mime type. If you want IE6 and such to
    view the page, you either have to serve a separate html page for it, or
    use a trick such as writing a php include at the top of the
    something.php page that will serve the page as xhtml only if the
    server/browser header exchange indicates that the browser will at least
    accept application/xhtml+xml. If not, then the page gets served as
    html. A few browsers will not tell if they will accept this mime type
    or not even when they will accept true xhtml. In that case you would
    serve the page as html anyway to be on the safe side, or you are back
    to writing separate pages for html and xhtml. Then you need to check
    the page on a few browsers to make certain the automatic conversion to
    a html page for IE6 and such is working.
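
    The header check described above can be sketched roughly as follows. This
    is Python rather than the php include mentioned, and the function name
    choose_content_type is made up for the illustration. It assumes the
    cautious policy described here: serve application/xhtml+xml only when
    the browser lists that type explicitly with a non-zero q value, and fall
    back to text/html otherwise (including for a bare */* wildcard).

```python
def choose_content_type(accept_header):
    """Pick a media type from the request's Accept header (hypothetical helper).

    Returns "application/xhtml+xml" only when the client names that type
    explicitly with q > 0; everything else gets "text/html".
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

    A server that varies its response this way should normally also send a
    "Vary: Accept" header so that caches keep the two variants apart.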

    When you use an xhtml-aware browser (Opera, Firefox, etc.) and serve the
    page as application/xhtml+xml, the browser then parses the page as xml.
    Little errors that might not cause much of a problem on an html page
    then sometimes result in the page not showing, and you also often get
    an xml parse error.
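
    The difference between the two parsing modes can be imitated with
    Python's standard library, as a rough stand-in for what browsers do. In
    this sketch html.parser plays the forgiving text/html parser and
    xml.etree.ElementTree plays the strict XML parser; the sample markup and
    the helper names are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An unclosed <br>: harmless as HTML, a well-formedness error as XML.
TAG_SOUP = "<p>line one<br>line two</p>"


class TagCollector(HTMLParser):
    """Record every start tag the forgiving HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_as_html(markup):
    """Return the start tags seen; the HTML parser does not reject the input."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def parse_as_xml(markup):
    """Return None on success, or the XML parser's error message on failure."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None
```

    With TAG_SOUP, parse_as_html reports both the p and br tags, while
    parse_as_xml returns an error message. Writing the line break as
    <br /> makes the same fragment acceptable to the XML parser.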

    Concerning the W3C validator, if you serve a page (written in html or
    xhtml code) as something.html, the validator views the page as html.
    If you serve as application/xhtml+xml, the validator views the page as
    xhtml. If you use the extended interface at the validator, it will tell
    you the mime type of the page it views. However the validator validates
    the code of the page as indicated in the Doctype. Thus if you serve an
    xhtml page as html, the validator will still validate the code as
    xhtml, griping about unclosed img and br tags, etc. However, it does not
    determine whether the page is being served correctly (other than through
    the mime type displayed using the extended interface). I suspect this is
    one reason some think they are serving xhtml, when they actually are
    not.
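
    To check for yourself how a page is being served, independently of what
    its Doctype claims, you can look at the Content-Type response header. A
    minimal Python sketch, with made-up helper names, that reduces a header
    value to its bare media type using the standard library's
    email.message.Message:

```python
from email.message import Message


def served_media_type(content_type_header):
    """Return the lower-cased media type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def served_as_xhtml(content_type_header):
    """True only if the page is served as application/xhtml+xml."""
    return served_media_type(content_type_header) == "application/xhtml+xml"
```

    The header value itself would come from a real response, for example
    urllib.request.urlopen(url).headers.get("Content-Type"); that step is
    left out here so the sketch runs without network access.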
    cwdjrxyz, Feb 5, 2006
    #8
  9. Andy Dingley

    Andy Dingley Guest

    On Sat, 04 Feb 2006 21:16:51 -0500, John Salerno wrote:

    >What exactly does this mean:
    >
    >"Document sent as text/html are handled as tag soup [1] by most UAs.
    >This means that authors are not checking for validity, and thus
    >most XHTML documents on the web now are invalid.


    This is a fallacious conclusion to draw from those initial conditions.
    Empirically the evidence supports the same conclusion, but still not
    that logic.

    XHTML out there is bogus and badly formed. But that's nothing to do with
    them being sent as text/html.

    >Therefore the main
    >advantage of using XHTML, that it has to be valid, is lost of the
    >document is then sent as text/html."


    This has never been an advantage of XHTML, at any incarnation beyond the
    whiteboard stage. The web has _always_ taken a best-guess approach to
    error recovery of any format, and XHTML never seriously attempted to
    reverse that.

    Maybe it should. Maybe it would have been better if all badly-formed XML
    was rejected out of hand (as indeed it is in the applications world).
    But XHTML crept onto the web gradually, as an evolution of HTML by the
    designers, not as a move of real coders from the desktop onto the web.
    It inherited HTML's sloppy approaches and we have to work from that as
    our starting point - anything else is just pointless theorising.
    Andy Dingley, Feb 5, 2006
    #9
