Jay said:
IE does not support XHTML?
No it does not. Internet Explorer is an HTML web browser.
I'd like to read up on this if you have a URL as I
wasn't led to believe this.
It is easy enough to verify the veracity of the claim by pointing IE at
a resource serving XHTML, with an XHTML content type header
(application/xhtml+xml). IE will likely offer you the option of
downloading the file and saving it to disk (which is how it handles most
things that it doesn't support, either directly or indirectly).
In the original XHTML specification (1.0) there is a section - Appendix
C:-
<quote cite="XHTML 1.0: The Extensible HyperText Markup Language">
Appendix C. HTML Compatibility Guidelines
This appendix is informative.
This appendix summarizes design guidelines for authors who wish their
XHTML documents to render on existing HTML user agents.
....
</quote>
- which proposes a series of measures that can be taken to allow a
formally correct XHTML 1.0 document to be interpreted as an (erroneous)
HTML document. Key among these measures is the sending of an HTTP
content-type header of 'text/html'. That content type header is an
assertion that the document is an HTML document, and every user agent
(including those that support XHTML) has no choice but to interpret a
document sent as text/html as an HTML document.
XHTML and HTML are in some respects very similar, and in others very
different. The similarities allow the HTML user agent to interpret the
results as HTML but the differences can get in the way of that so
Appendix C goes on to propose strategies for negating the effect of the
differences. For example, in XML you can use a shorthand to describe
elements that have no contents, so that:-
<something></something>
- can be written as:-
<something/>
- and have exactly the same meaning. As an application of XML, XHTML
also allows this. For an HTML browser that penultimate slash in
<something/> would have a different meaning. Older HTML browsers tended
to regard it as a part of the element name, so Appendix C proposes that
the penultimate slash be separated from the element name by at least one
space character. This avoids confusion as to the actual element name,
but the slash is still meaningless in HTML (in SGML it means something
completely different, but that is another matter). Fortunately HTML user
agents have long become accustomed to being presented with meaningless
constructs in HTML (due to the abysmal standards of technical competence
common in web development) so they have facilities for
'error-correction'. Thus the HTML browser sees the penultimate slash as
an error (akin to a typo) and disregards it.
This works fine because:-
<br />
- is error corrected back to:-
<br>
- which is meaningful in HTML.
Problems start to occur when other elements are treated to the XML
shorthand, such as:-
<div></div>
- becoming:-
<div />
- because it would be error-corrected to:-
<div>
- which is an opening HTML DIV tag without a corresponding closing DIV
tag. While that is not strictly allowed in HTML it is a common error and
will itself be subject to error correction. The HTML user agent will
infer the closing DIV tag at the last location in which it should have
occurred; either just before the closing tag for any containing element,
or just before the opening tag of any element that it could not contain.
The result is very different from an XHTML interpretation of the same
original mark-up.
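The difference between the two interpretations can be sketched with a toy model. This is only an illustration under loud assumptions: toyOutline, VOID_IN_HTML and renderOutline are hypothetical names, the tokenizer ignores attributes, comments and doctypes, stray closing tags are simply dropped, and real HTML user agents apply far more error-correction rules than this.

```javascript
// Toy model only: a tiny tag reader showing how the same markup can
// yield different trees under XML rules and under error-corrected HTML.
// Partial, illustrative list of elements that have no contents in HTML.
const VOID_IN_HTML = { br: true, img: true, hr: true, input: true, meta: true, link: true };

function renderOutline(nodes) {
  // Render a list of nodes as e.g. "div(p(#text))".
  return nodes.map(function (node) {
    return node.children.length
      ? node.name + '(' + renderOutline(node.children) + ')'
      : node.name;
  }).join(' ');
}

function toyOutline(markup, asXml) {
  // Groups: 1 = closing-tag slash, 2 = name, 3 = shorthand slash, 4 = text.
  const tokenPattern = /<(\/?)([A-Za-z][A-Za-z0-9]*)\s*(\/?)>|([^<]+)/g;
  const root = { name: '#root', children: [] };
  const stack = [root];
  let match;
  while ((match = tokenPattern.exec(markup)) !== null) {
    const top = stack[stack.length - 1];
    if (match[4] !== undefined) {
      top.children.push({ name: '#text', children: [] });
      continue;
    }
    // XML names are case sensitive; HTML names are not.
    const name = asXml ? match[2] : match[2].toLowerCase();
    if (match[1] === '/') {
      // Closing tag: close the nearest open element of that name, which
      // implicitly closes anything left open inside it.
      for (let i = stack.length - 1; i > 0; i--) {
        if (stack[i].name === name) { stack.length = i; break; }
      }
      continue;
    }
    const node = { name: name, children: [] };
    top.children.push(node);
    // XML: the shorthand slash decides emptiness.
    // HTML: the slash is disregarded; only the element name decides.
    const isEmpty = asXml ? match[3] === '/' : VOID_IN_HTML[name] === true;
    if (!isEmpty) { stack.push(node); }
  }
  return renderOutline(root.children);
}
```

With this sketch, toyOutline('<div /><p>x</p>', true) gives "div p(#text)", while toyOutline('<div /><p>x</p>', false) gives "div(p(#text))": under the HTML reading the paragraph ends up inside the DIV.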
To avoid this issue Appendix C proposes that only elements that are
empty in HTML should use the shorthand syntax. Thus: <img />, <br />,
etc., but not <script />.
The same applies in reverse as XHTML allows empty elements to be
expressed with both opening and closing tags, e.g. <br></br> is a
single line break in XHTML, but the error-corrected HTML interpretation
is two BR elements (or the second tag is an opening tag for an element
with an unrecognised name).
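As a rough illustration of those two rules, the following sketch scans a string of mark-up for shorthand syntax on elements that are not empty in HTML, and for closing tags on elements that are. The function name appendixCWarnings is hypothetical, the regular expressions are deliberately naive (they would be fooled by a '>' inside an attribute value, comments or CDATA), and the element list is the set declared EMPTY in HTML 4.01.

```javascript
// Elements declared EMPTY in the HTML 4.01 DTDs.
const HTML4_EMPTY = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
  'img', 'input', 'isindex', 'link', 'meta', 'param'];

function appendixCWarnings(markup) {
  const warnings = [];
  let m;

  // Shorthand syntax such as <div/> or <br />.
  const shorthand = /<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\/>/g;
  while ((m = shorthand.exec(markup)) !== null) {
    const name = m[1].toLowerCase();
    if (HTML4_EMPTY.indexOf(name) === -1) {
      // Not empty in HTML: error-correction would leave it open.
      warnings.push('<' + name + ' /> is not empty in HTML; use <' +
        name + '></' + name + '>');
    } else if (!/\s$/.test(m[2])) {
      // Empty in HTML, but the slash is not separated by a space.
      warnings.push('<' + name + '/> needs a space before the slash');
    }
  }

  // Explicit closing tags for elements that are empty in HTML, e.g. </br>.
  const closing = /<\/([A-Za-z][A-Za-z0-9]*)\s*>/g;
  while ((m = closing.exec(markup)) !== null) {
    const name = m[1].toLowerCase();
    if (HTML4_EMPTY.indexOf(name) !== -1) {
      warnings.push('</' + name + '> closes an element that is empty in ' +
        'HTML; use <' + name + ' />');
    }
  }
  return warnings;
}
```

Called on '<br /><img src="a.png" />' it returns an empty array, while '<div />', '<br/>' and '<br></br>' each produce one warning.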
The above, and the other proposals in Appendix C, result in a syntax
that is a subset of XHTML that is within the ability of known HTML user
agents to error-correct back to HTML, if served as text/html. And when
served as text/html those documents will be interpreted as erroneous
HTML. Only documents served with an XHTML content type header will ever
be interpreted as XHTML.
Because IE cannot understand XHTML it is necessary to send Appendix C
XHTML mark-up to IE with the text/html content type, and most of the
time this means sending Appendix C XHTML to all user agents with a
text/html header. So Appendix C XHTML is usually in reality a flavour of
formally malformed HTML.
One alternative is for the server to do content negotiation and serve
Appendix C (or separate real) XHTML with an XHTML content type header to
user agents that assert their acceptance of it, and to send only
Appendix C XHTML (or separate HTML) with a text/html content type header
to user agents that do not claim to recognise XHTML.
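The decision the server has to make can be sketched as a function of the request's Accept header. This is a minimal sketch, not a full implementation of HTTP content negotiation: chooseContentType is a hypothetical name, the relative q values of the two types are not compared, and a wildcard such as */* is deliberately not treated as a claim to recognise XHTML.

```javascript
function chooseContentType(acceptHeader) {
  const XHTML_TYPE = 'application/xhtml+xml';
  const entries = String(acceptHeader || '').split(',');
  for (const entry of entries) {
    // Each entry looks like "type/subtype" or "type/subtype;q=0.9".
    const parts = entry.split(';').map(function (s) { return s.trim(); });
    if (parts[0].toLowerCase() !== XHTML_TYPE) { continue; }
    let q = 1; // An entry without a q parameter has quality 1.
    for (const param of parts.slice(1)) {
      const m = /^q\s*=\s*([0-9.]+)$/i.exec(param);
      if (m) { q = parseFloat(m[1]); }
    }
    // q=0 means "not acceptable", so only a positive q counts.
    if (q > 0) { return XHTML_TYPE; }
  }
  // No explicit claim to accept XHTML: fall back to HTML.
  return 'text/html';
}
```

For a header such as 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' this returns 'application/xhtml+xml'; for 'image/gif, image/jpeg, */*', or for no header at all, it returns 'text/html'.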
Obviously sending two different versions depending on the user agent's
ability to accept contents is at least slightly more effort than not
doing so, which makes Appendix C XHTML look appealing as it is capable of
being sent as both HTML and XHTML. However, we have a particular
interest in the scripting of web browsers and so an interest in whether
the browser's DOM is an XHTML DOM (case sensitive, interested in
namespaces, preferring slightly different approaches, such as using
setAttribute, lacking some convenience and shortcut properties) or an
HTML DOM (case insensitive, ignorant of namespaces, preferring different
approaches, such as direct assignment to element properties, and filled
with convenience properties and non-standard shortcuts).
If a document is served as text/html it is interpreted as HTML and it is
an HTML DOM that the browser builds for it, while if it is served as
application/xhtml+xml it is interpreted as XHTML and it is an XHTML DOM
that the browser builds for it. A very significant proportion of scripts
are not interoperable between the two DOMs, and writing interoperable
scripts adds an entirely new level of testing and branching if the
script is anything but the most trivial. So Appendix C XHTML doesn't
really remove the issue of serving alternative content to different user
agents, it just moves the problem to a different place; the choice of
accompanying script files.
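To give a flavour of the branching involved, here is a small sketch of two helpers that try to work in either DOM. The helper names are hypothetical, and the feature test on createElementNS is an assumption about how a script might choose between the namespace-aware method and the plain one; it is not a complete interoperability layer.

```javascript
const XHTML_NAMESPACE = 'http://www.w3.org/1999/xhtml';

// Create an element, preferring the namespace-aware method where the
// document object provides it and falling back to createElement.
function createElementEitherDom(doc, name) {
  if (typeof doc.createElementNS === 'function') {
    return doc.createElementNS(XHTML_NAMESPACE, name);
  }
  return doc.createElement(name);
}

// An HTML DOM reports tagName in upper case ("DIV"), while an XHTML DOM
// preserves the case used in the mark-up ("div"), so normalise the case
// before comparing.
function hasTagName(element, name) {
  return String(element.tagName).toLowerCase() === name.toLowerCase();
}
```

A script that tests element.tagName == 'DIV' directly works in the HTML DOM and silently fails in the XHTML DOM, which is typical of the differences that have to be tested for.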
However, that general lack of technical competence in web development
that led to the HTML browsers using such extreme error-correction also
manifests itself in the use of XHTML. Many authors have no appreciation
of HTTP headers, send all of their XHTML as text/html, and find that it
is completely successful to script these documents as if they were HTML
(possibly not even being aware that an XHTML DOM would need to be
scripted differently), because in reality they are HTML (if malformed).
This means that if the future ever offers an opportunity for the viable
commercial use of XHTML an awful lot of people are going to be very
disappointed to find that all their scripts suddenly stop working.
Probably the most sensible reaction to all of this is that if you want
to script a document you should probably write it in, and serve it as,
formally valid HTML only.
Richard.