Top 10 things I hate about XHTML

Steve Pugh

Anthony Boyd said:
http://www.outshine.com/top10.xhtml

Can any of the issues I outline at the linked page be fixed? I feel
like I just don't know enough about XHTML to have sorted out all the
tricks & tips. How are other people getting around these issues?

"1. Including the SYSTEM value (the URL in the DOCTYPE at the start of
an XHTML document) causes Internet Explorer 4 and a few other lesser
browsers to actually go to www.w3.org and download the DTD every
time."

I don't believe this. What content-type are you serving your pages as?
If you're serving them as XML rather than XHTML or HTML then I suppose
some browsers may fetch the DTD but why are you serving it as XML?

"If I leave the SYSTEM value out of the DOCTYPE, the XHTML doesn't
validate."

Rubbish. A page starting with, for example, <!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"> will validate.

Or are you using a custom DTD
<!DOCTYPE html SYSTEM "http://www.example.com/mydtd.dtd">

Or have you mucked up the FPI so that the validator can't find it in
its catalogue and needs to look up the DTD by URL?
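
To make the two identifiers concrete: the quoted string after PUBLIC is the FPI, and the optional URL after it (or the one after SYSTEM) is the system identifier. Here is a minimal Python sketch that pulls them apart. The regex and the `describe_doctype` helper are illustrative assumptions, not how any validator actually does it, and the pattern only handles double-quoted identifiers with no internal subset.

```python
import re

# Hypothetical pattern for this sketch: root name, then either
# PUBLIC "fpi" ["uri"] or SYSTEM "uri".
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+'
    r'(?:PUBLIC\s+"(?P<fpi>[^"]*)"(?:\s+"(?P<public_uri>[^"]*)")?'
    r'|SYSTEM\s+"(?P<system_uri>[^"]*)")'
    r'\s*>'
)


def describe_doctype(doctype):
    """Split a DOCTYPE declaration into root element, FPI and system URI."""
    match = DOCTYPE_RE.match(doctype.strip())
    if match is None:
        raise ValueError("not a DOCTYPE this sketch understands")
    return {
        "root": match.group("root"),
        "fpi": match.group("fpi"),
        "uri": match.group("public_uri") or match.group("system_uri"),
    }


# FPI only: anything resolving it has to use its own catalogue.
fpi_only = describe_doctype(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">'
)

# SYSTEM only: there is no FPI, so the DTD can only be found by URL.
system_only = describe_doctype(
    '<!DOCTYPE html SYSTEM "http://www.example.com/mydtd.dtd">'
)

print(fpi_only)
print(system_only)
```

If the FPI is present and known to the catalogue, the URL never has to be consulted, which is the case being argued above.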

"2. The ampersand must be written as a character entity now (&amp;
rather than &) even in comments and scripts. Yet comments and scripts
are not parsed for display! So my comments, meant to be viewed as raw
source, are now more difficult for a human to read."

I just validated a page that contained
<!-- > & -->
<p>foo & bar </p>
and it passed. The validator did comment on the second & but didn't
flag it as an error.
Scripts and styles are different but you bang on about them at length
later so we'll leave them for now.

Can you give me an example that doesn't pass?
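
For comparison, here is a minimal sketch of how a strict XML parser treats the same two lines, using Python's standard `xml.etree.ElementTree`. It only shows what an XML parser does; the W3C validator is a separate tool and may report the same input differently. The `is_well_formed` helper and the wrapping `<root>` element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


comment_with_amp = "<!-- > & -->"        # raw & inside a comment
bare_amp = "<p>foo & bar </p>"           # raw & in element content
escaped_amp = "<p>foo &amp; bar </p>"    # the same text, escaped

for sample in (comment_with_amp, bare_amp, escaped_amp):
    print(repr(sample), "->", is_well_formed(sample))
```

In this sketch the comment is accepted as-is, the bare ampersand in the paragraph is rejected, and the escaped version comes back as the text "foo & bar ".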

"3. The symbols for less-than, greater-than, and apparently even the
double dash are now interpreted even in comments and scripts. If
you're trying to use JavaScript to print a tag, this can give
unpredictable results. If you use a double-dash as a poor man's
emdash, it can throw off your comments."

External files have been preferable for a long time.

BTW
-- has always ended comments
<!-- this is a comment -- this isn't -- this is -- this isn't>
Most browsers have got this wrong.
And use of the -- decrement operator in JavaScript has always been
risky if that JS is inside an SGML comment.

Note that it is XML parsers that can remove comments, not XHTML
parsers. Most XHTML is actually just HTML wearing a false nose and is
treated as such by browsers.
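
Both halves of that can be tried against a real XML parser. This is a small Python sketch using the standard library's `xml.etree.ElementTree`; the sample markup and the `parses` helper are made up for illustration, and it says nothing about how any particular browser behaves.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if doc is accepted by the XML parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# "--" used as a poor man's dash inside a comment.
dash_comment = "<div><!-- old style -- do not use --><p>text</p></div>"

# A script hidden in a comment, using the decrement operator.
hidden_script = "<script><!--\nvar i = 10; i--;\n//--></script>"

# A well-behaved comment parses, but the default parser discards it.
plain_comment = "<div><!-- hidden --><p>text</p></div>"
serialized = ET.tostring(ET.fromstring(plain_comment), encoding="unicode")

print(parses(dash_comment), parses(hidden_script))
print(serialized)
```

The first two samples are refused because of the "--" inside the comment, and the third round-trips with the comment gone, which is exactly what would happen to a script hidden inside one.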

"4. Character encoding is optional in the XHTML 1.0 spec -- if you
don't include it, it assumes UTF-8. However, the w3.org validator
doesn't assume that, and errors out even though the document is
valid."

A problem with the validator, not with XHTML.
Use the extended interface to the validator and pick your encoding.

"5. If you include the character encoding as the leading XML element,
or use XML stylesheet links, some PHP parsers will barf. Since my
first loyalty is to PHP, I find myself unable to use certain parts of
XHTML even though I want to do the right thing."

A problem with those PHP parsers, not with XHTML.

"6. XHTML clients are allowed to discard comments before rendering. So
if I wrap my JavaScript code in an HTML-style comment, new XHTML
clients may not execute that code. If I don't wrap my JavaScript code
in a comment, then old browsers and those with JavaScript disabled may
display the code right in browser. Lose-lose."

The habit of hiding script inside SGML comments was invented to cater
for Netscape 1. There may be one or two equally old browsers that also
fail to understand what a <script> element is (note that they don't
need to understand any specific scripting language, just what the
element is) but script and style were both in HTML 3.2 as placeholders
so any browser that claims compliance with HTML 3.2 or higher should
not display the contents of those elements. And any browser that
displays the contents of script just because scripting has been
disabled is very broken. Anyway, external files are a much better
idea.

7 isn't a point at all.

"8. The XML-style links for stylesheets appear to be too burdensome
even for w3.org. Their own validator uses the style tag without
assigning it an id and using XML at the top to provide the XML-style
link."

Do you want your page parsed as XML or XHTML or HTML? For most web
pages you don't want it parsed as XML, so all that <?xml-stylesheet
....> stuff isn't needed. Don't worry about it.

"9. In my opinion, having to declare your character encoding is an
unreasonable burden on the developer. Most developers do not memorize
the various characters used in character sets. I certainly cannot look
at any Web page and tell you if it is ISO 8859-1 compliant, for
example. Character sets should be provided by the server, as it is
less error prone."

Declaring the character encoding via the HTTP content-type header
always has been, and still is, the preferred option. You only need to
declare it in the XML declaration if it is not UTF-8 or UTF-16 (plain
ASCII is a subset of UTF-8 and so can also be used undeclared;
ISO-8859-1 is not, so it does need declaring once characters outside
ASCII appear).
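
A minimal Python sketch of both points: reading the charset from a Content-Type header value, and checking at the byte level which encodings can pass as UTF-8 without a declaration. The helper names and the sample header values are assumptions for illustration; the parsing uses the standard library's `email.message.Message`.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The server states the encoding, so the document does not have to.
declared = charset_from_content_type("text/html; charset=ISO-8859-1")
# No charset parameter: fall back to the assumed default.
undeclared = charset_from_content_type("application/xhtml+xml")

# Pure ASCII text is byte-for-byte identical in UTF-8.
ascii_bytes = "tricks & tips".encode("ascii")
# An accented character encoded as ISO-8859-1 is a single byte above 127.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

print(declared, undeclared)
print(is_valid_utf8(ascii_bytes), is_valid_utf8(latin1_bytes))
```

The last line is the practical test: bytes that are not valid UTF-8 need their encoding stated somewhere, preferably in the header.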

"10. XHTML 1.1 seems to be enough of a pain in the ass that adoption
is painfully slow. The w3.org specs seem to have lost at least some
developer-friendly focus -- enough that developers don't seem to be
interested much beyond a very simple baseline."

The fact that IE can't cope with XHTML 1.1 at all makes its adoption
pointless for a www site. And if you think XHTML 1.1 is bad take a
look at XHTML 2.0...

Steve
 
Hywel Jenkins

Anthony Boyd said:
http://www.outshine.com/top10.xhtml

Can any of the issues I outline at the linked page be fixed? I feel
like I just don't know enough about XHTML to have sorted out all the
tricks & tips. How are other people getting around these issues?

1 - There's a lesser browser than IE4?
2 - Move your scripts to source files. &amp; isn't that hard to read.
Could you use "and" instead?
3 - Move your scripts to source files.
3 - emdash - does using the character entity instead fix this?
5 - Which PHP parsers?
6 - Move your scripts to source files.
7 - Move your scripts to source files.
9 - Do you really need to remember them, or can you just copy & paste them
from somewhere?
10 - Perhaps, but it's worth trying anyway.
 
Toby A Inkster

Anthony said:

Steve has already addressed most of these points, but here are a few other
things...

5. Use this:

<?= ('<?xml version="1.0" encoding="utf-8"?>' . "\n") ?>

2, 3, 6 & 7. Then stop bloody commenting out the styles and scripts then!
It *isn't* needed! Never really has been (apart from for some beta of
Netscape 3 Gold)!

4 & 9. You don't need to.
 
Mark Parnell

The habit of hiding script inside SGML comments was invented to cater
for Netscape 1.

As Toby said, it was actually a beta version of NS3 Gold, but the point
is well made. :)
 
Toby A Inkster

Mark said:
As Toby said, it was actually a beta version of NS3 Gold, but the point
is well made. :)

To be honest, Netscape 1 and IE 1 and 2 will also print the Javascript
code to the screen, but ONLY IF IT APPEARS IN THE BODY. Which is why it's
good practice to put all scripting in the HEAD (or even better: in an
external file).
 
Mark Parnell

To be honest, Netscape 1 and IE 1 and 2 will also print the Javascript
code to the screen, but ONLY IF IT APPEARS IN THE BODY.

I didn't know that. Thanks.
 
Jukka K. Korpela

Steve Pugh said:
"2. The ampersand must be written as a character entity now (&amp;
rather than &) even in comments and scripts. Yet comments and scripts
are not parsed for display! So my comments, meant to be viewed as raw
source, are now more difficult for a human to read."

I just validated a page that contained
<!-- > & -->
<p>foo & bar </p>
and it passed. The validator did comment on the second & but didn't
flag it as an error.

It is actually an error, so the validator is misleading when it issues
just a warning. It is however not a violation of a well-formedness
constraint or a validity constraint, so an XML validator is not required
to report it and must not report a document as invalid just because it
contains such an error.
 
Anthony Boyd

Steve said:
I don't believe this. What content-type are you serving your pages as?

It's text/html.

If you're serving them as XML rather than XHTML or HTML then I suppose
some browsers may fetch the DTD but why are you serving it as XML?

I'm not. I think MSIE 4 is broken. I can ignore MSIE 4, but it makes
XHTML adoption a little painful. Not much. Just a little.

"If I leave the SYSTEM value out of the DOCTYPE, the XHTML doesn't
validate."

Rubbish. A page starting with, for example, <!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"> will validate.

Only with Transitional, and even then it will throw up some text about
the missing SYSTEM. When I try my DOCTYPE of Strict, it will error out.

I just validated a page that contained
<!-- > & -->
<p>foo & bar </p>
and it passed.

Really? Wow. The XHTML 1.0 spec (the rev from 2002) says "&" must be
written as &amp; (if I'm reading it right).

So do you think the validator is wrong and I should continue to use
&amp; or do you think section C 12 of the spec is wrong? Here's a part
of that:

"In order to ensure that documents are compatible with historical HTML
user agents and XML-based user agents, ampersands used in a document
that are to be treated as literal characters must be expressed
themselves as an entity reference (e.g. '&amp;')."

And use of the -- decrement operator in JavaScript has always been
risky if that JS is inside an SGML comment.

Good point.

"5. If you include the character encoding as the leading XML element,
or use XML stylesheet links, some PHP parsers will barf."

A problem with those PHP parsers not with XHTML.

I don't know. PHP predates XHTML and XML by a few years. They should
have picked something exclusive. Because they didn't, I have to pick
sides or patch parsers. Yuck.

Do you want your page parsed as XML or XHTML or HTML? For most web
pages you don't want it parsed as XML, so all that <?xml-stylesheet
...> stuff isn't needed. Don't worry about it.

OK.

Declaring the character encoding via the HTTP content-type header
always has been, and still is, the preferred option. You only need to
declare it in the XML declaration if it is not UTF-8 or UTF-16 (plain
ASCII is a subset of UTF-8 and so can also be used undeclared;
ISO-8859-1 is not, so it does need declaring once characters outside
ASCII appear).

I know. Of course, that gets back to the part about w3's validator
being busted. I just pulled up their source code to send in a patch,
but in the code it's pretty obvious that they've designed it to only
auto-detect UTF if you tell it to or if you set the type to text/xml.
I'm not sure why text/html wouldn't qualify. Maybe that's in the spec
and I missed it. In that case, no patch needed for the validator,
although I might want to patch the spec, then.

The fact that IE can't cope with XHTML 1.1 at all makes its adoption
pointless for a www site. And if you think XHTML 1.1 is bad take a
look at XHTML 2.0...

Agreed. That's sort of my point: who is clamoring for this stuff? Who
is pressuring the browser companies/organizations to implement it? No
one, far as I can tell. The spec is too fragmented (deliberately, as
modules) and too difficult to read (deliberately again, but at a cost).

If IE were to implement it, I suspect slogging through their MSDN docs
would be an easier way to learn about it. Coupled with the whole patent
fiasco last year, w3 just doesn't seem to be interested in what I'm
interested in anymore. I don't know how to fix it, other than to say
that the w3 of 2000 seemed to be a more successful organization.
 
Steve Pugh

Anthony Boyd said:
Only with Transitional, and even then it will throw up some text about
the missing SYSTEM. When I try my DOCTYPE of Strict, it will error out.

Then there must be something else wrong with your code.
Strict doctype, no URL, no problem:-
http://validator.w3.org/check?uri=http://steve.pugh.net

Really? Wow. The XHTML 1.0 spec (the rev from 2002) says "&" must be
written as &amp; (if I'm reading it right).

As Jukka pointed out, it is an error, but not a validity error.

It has always been best practice to write & as &amp; even where it
wasn't required (as in my example above); now it's required as well.

As pointed out elsewhere, moving your scripts and styles to separate
files is the best approach, so that just leaves your complaint about
making comments harder to read. Are there many comments in your code
where & _has_ to be used rather than 'and'?

Steve
 
