Anthony Boyd said:
http://www.outshine.com/top10.xhtml
Can any of the issues I outline at the linked page be fixed? I feel
like I just don't know enough about XHTML to have sorted out all the
tricks & tips. How are other people getting around these issues?
"1. Including the SYSTEM value (the URL in the DOCTYPE at the start of
an XHTML document) causes Internet Explorer 4 and a few other lesser
browsers to actually go to www.w3.org and download the DTD every
time."
I don't believe this. What content-type are you serving your pages as?
If you're serving them as XML rather than XHTML or HTML, then I suppose
some browsers may fetch the DTD, but why are you serving it as XML?
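To make the content-type question concrete, here is a rough Python sketch of how a browser might choose a parser from the Content-Type header. The function name parser_for and the three return labels are made up for illustration, and real browsers apply more rules than this.

```python
def parser_for(content_type: str) -> str:
    """Guess which parser a browser would pick for a Content-Type value.

    Hypothetical helper for illustration only; not any browser's real logic.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"  # treated as HTML, even if the markup is XHTML
    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"   # handed to an XML parser
    return "unknown"


print(parser_for("text/html; charset=ISO-8859-1"))
print(parser_for("application/xhtml+xml"))
```

If the first call describes your server, the page is being treated as HTML and the XML-only concerns mostly don't apply.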
"If I leave the SYSTEM value out of the DOCTYPE, the XHTML doesn't
validate."
Rubbish. A page starting with, for example, <!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"> will validate.
Or are you using a custom DTD, such as
<!DOCTYPE html SYSTEM "http://www.example.com/mydtd.dtd">?
Or have you mucked up the FPI so that the validator can't find it in
its catalogue and needs to look up the DTD by URL?
"2. The ampersand must be written as a character entity now (&amp;
rather than &) even in comments and scripts. Yet comments and scripts
are not parsed for display! So my comments, meant to be viewed as raw
source, are now more difficult for a human to read."
I just validated a page that contained
<!-- > & -->
<p>foo & bar </p>
and it passed. The validator did comment on the second & but didn't
flag it as an error.
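The validator's verdict aside, a strict XML parser treats those two lines differently. A small sketch using Python's standard xml.etree.ElementTree as a stand-in for an XML-mode client (the wrapper <div> and the helper name is_well_formed are my own additions):

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# A bare & (and >) inside a comment is not interpreted by the XML parser.
comment_case = "<div><!-- > & --><p>foo</p></div>"
# A bare & in element content is a well-formedness error in XML.
content_case = "<div><p>foo & bar </p></div>"
# The escaped form is accepted.
escaped_case = "<div><p>foo &amp; bar </p></div>"

for case in (comment_case, content_case, escaped_case):
    print(is_well_formed(case), case)
```

So the ampersand needs escaping in content if the page is ever parsed as XML, but not inside comments.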
Scripts and styles are different but you bang on about them at length
later so we'll leave them for now.
Can you give me an example that doesn't pass?
"3. The symbols for less-than, greater-than, and apparently even the
double dash are now interpreted even in comments and scripts. If
you're trying to use JavaScript to print a tag, this can give
unpredictable results. If you use a double-dash as a poor man's
emdash, it can throw off your comments."
External files have been preferable for a long time.
BTW, -- has always ended comments:
<!-- this is a comment -- this isn't -- this is -- this isn't>
Most browsers have got this wrong.
And use of the -- decrement operator in JavaScript has always been
risky if that JS is inside an SGML comment.
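A rough Python sketch of that rule, where every -- inside <! ... > switches between "in a comment" and "not in a comment". The function name is hypothetical and this is a simplification: real SGML only allows whitespace between the comments in a declaration.

```python
def split_sgml_comment_declaration(decl: str):
    """Split an SGML '<! ... >' declaration into (is_comment, text) pieces.

    Illustrative helper only: each "--" toggles comment state.
    """
    if not (decl.startswith("<!") and decl.endswith(">")):
        raise ValueError("not a markup declaration")
    inner = decl[2:-1]
    pieces = []
    # Text before the first "--" is outside a comment; each "--" toggles.
    for index, text in enumerate(inner.split("--")):
        is_comment = index % 2 == 1
        text = text.strip()
        if not is_comment and not text:
            continue  # skip empty gaps between comments
        pieces.append((is_comment, text))
    return pieces


example = "<!-- this is a comment -- this isn't -- this is -- this isn't>"
for is_comment, text in split_sgml_comment_declaration(example):
    print("comment" if is_comment else "NOT comment", "|", text)
```

Run over the example above, the pieces alternate between comment and non-comment, which is why a stray -- (an emdash substitute or a JavaScript decrement) can flip the state for everything after it.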
Note that it is XML parsers that can remove comments, not XHTML
parsers. Most XHTML is actually just HTML wearing a false nose and is
treated as such by browsers.
"4. Character encoding is optional in the XHTML 1.0 spec -- if you
don't include it, it assumes UTF-8. However, the w3.org validator
doesn't assume that, and errors out even though the document is
valid."
A problem with the validator, not with XHTML.
Use the extended interface to the validator and pick your encoding.
"5. If you include the character encoding as the leading XML element,
or use XML stylesheet links, some PHP parsers will barf. Since my
first loyalty is to PHP, I find myself unable to use certain parts of
XHTML even though I want to do the right thing."
A problem with those PHP parsers, not with XHTML.
"6. XHTML clients are allowed to discard comments before rendering. So
if I wrap my JavaScript code in an HTML-style comment, new XHTML
clients may not execute that code. If I don't wrap my JavaScript code
in a comment, then old browsers and those with JavaScript disabled may
display the code right in browser. Lose-lose."
The habit of hiding script inside SGML comments was invented to cater
for Netscape 1. There may be one or two equally old browsers that also
fail to understand what a <script> element is (note that they don't
need to understand any specific scripting language, just what the
element is) but script and style were both in HTML 3.2 as placeholders
so any browser that claims compliance with HTML 3.2 or higher should
not display the contents of those elements. And any browser that
displays the contents of script just because scripting has been
disabled is very broken. Anyway, external files are a much better
idea.
7 isn't a point at all.
"8. The XML-style links for stylesheets appear to be too burdensome
even for w3.org. Their own validator uses the style tag without
assigning it an id and using XML at the top to provide the XML-style
link."
Do you want your page parsed as XML or XHTML or HTML? For most web
pages you don't want it parsed as XML, so all that <?xml-stylesheet
....> stuff isn't needed. Don't worry about it.
"9. In my opinion, having to declare your character encoding is an
unreasonable burden on the developer. Most developers do not memorize
the various characters used in character sets. I certainly cannot look
at any Web page and tell you if it is ISO 8859-1 compliant, for
example. Character sets should be provided by the server, as it is
less error prone."
Declaring the character encoding via the HTTP content-type header
always has been, and still is, the preferred option. You only need to
declare it in the XML declaration if it is not UTF-8 or UTF-16 (ASCII
is a subset of UTF-8 and so can also be used undeclared; ISO-8859-1 is
not, once you go beyond the ASCII range, so it does need declaring).
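A quick way to check how those encodings relate to UTF-8 at the byte level, as a Python sketch (the sample strings and the helper name decodes_as_utf8 are made up for illustration):

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_text = "plain ASCII"
accented_text = "caf\u00e9"  # the final character lies outside ASCII

# Pure ASCII text produces the same bytes in ASCII and in UTF-8.
ascii_is_utf8 = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# The same accented text produces different bytes in ISO-8859-1 and UTF-8.
latin1_bytes = accented_text.encode("iso-8859-1")
utf8_bytes = accented_text.encode("utf-8")

print(ascii_is_utf8)
print(latin1_bytes, decodes_as_utf8(latin1_bytes))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))
```

An ASCII-only page is therefore safe to leave undeclared, while an ISO-8859-1 page with accented characters is not valid UTF-8 and needs its encoding stated somewhere, preferably in the HTTP header.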
"10. XHTML 1.1 seems to be enough of a pain in the ass that adoption
is painfully slow. The w3.org specs seem to have lost at least some
developer-friendly focus -- enough that developers don't seem to be
interested much beyond a very simple baseline."
The fact that IE can't cope with XHTML 1.1 at all makes its adoption
pointless for a www site. And if you think XHTML 1.1 is bad take a
look at XHTML 2.0...
Steve