I think you guys converted me...

John Salerno

I've been doing a little research of my own into this whole HTML vs.
XHTML business (I know, old news to all of you) and I finally am
beginning to see what the real problem is. Naturally I understood that
serving XHTML as text/html means you aren't even taking advantage of the
benefits of XHTML, but I figured I was at least "preparing" myself for
the future.

Anyhow, after finding several sites very much against XHTML (but some in
favor), and especially after reading: http://hixie.ch/advocacy/xhtml it
makes a lot more sense to me now why XHTML is considered more or less a
useless language.

But I do have one question still: I know that the main benefit of XHTML
will eventually be that you can integrate other XML-based languages
inside the markup, but what are some simple examples of why/how you
would even do this? Is this something a normal person might do on his
site, or is this a more advanced area that is used by businesses, etc.?

Thanks.
 
Spartanicus

John Salerno said:
But I do have one question still: I know that the main benefit of XHTML
will eventually be that you can integrate other XML-based languages
inside the markup, but what are some simple examples of why/how you
would even do this?

How: http://www.spartanicus.utvinternet.ie/mixed_namespace.xhtml
(requires an SVG enabled browser such as Opera)

Note that you can achieve the same result with HTML by embedding content
such as SVG: http://www.spartanicus.utvinternet.ie/single_namespace.htm
(again requires an SVG enabled browser such as Opera)

I'm not aware of any significant advantage that the mixed namespace
method has over the embedding method.
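For anyone who cannot load the pages above, the mixed namespace idea can be sketched roughly as follows. The document here is a made-up stand-in, not the contents of mixed_namespace.xhtml, and Python's standard xml.etree.ElementTree module is used only as an example of a namespace-aware XML parser. The helper name count_by_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical mixed-namespace document: SVG placed inline in XHTML.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed namespace demo</title></head>
  <body>
    <p>A circle drawn inline:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40"/>
    </svg>
  </body>
</html>"""

def count_by_namespace(source):
    """Count elements per namespace URI in a well-formed XML document."""
    root = ET.fromstring(source)
    counts = {}
    for el in root.iter():
        # ElementTree spells qualified names as "{namespace}local".
        ns = el.tag[1:].split("}")[0] if el.tag.startswith("{") else ""
        counts[ns] = counts.get(ns, 0) + 1
    return counts

counts = count_by_namespace(DOC)
print(counts)
```

One parser pass sees both vocabularies in a single tree, which is the point of the mixed namespace method. Whether that is worth more than embedding a separate SVG file is a separate question.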
 
Andy Dingley

Anyhow, after finding several sites very much against XHTML (but some in
favor), and especially after reading: http://hixie.ch/advocacy/xhtml it
makes a lot more sense to me now why XHTML is considered more or less a
useless language.

XHTML is demonstrably very far from useless.

I work mainly on content-management systems, not hand-authoring directly
to the web. My page content might move through half-a-dozen separate
processes before finally reaching the web. Through all of these steps,
using XHTML is a _massive_ benefit to me, rather than HTML. There is
just no question of this - XML processing tools are cheap, easy and
powerful, everything that SGML failed with completely.
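As a rough illustration of the "cheap and easy" claim, here is a minimal sketch of one processing step done with a stock XML parser. It assumes the page is well-formed XHTML. The sample page, the prefix "x" and the function name extract_links are all invented for the example, and Python's standard library stands in for whatever tools a real pipeline would use.

```python
import xml.etree.ElementTree as ET

# Prefix map for XPath-style lookups; "x" is an arbitrary local choice.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page standing in for content moving through a CMS pipeline.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title></head>
  <body>
    <h1>News</h1>
    <p>See <a href="/a.html">first</a> and <a href="/b.html">second</a>.</p>
  </body>
</html>"""

def extract_links(xhtml):
    """Return (href, link text) pairs for every XHTML <a> element."""
    root = ET.fromstring(xhtml)
    return [(a.get("href"), "".join(a.itertext()))
            for a in root.iterfind(".//x:a", NS)]

links = extract_links(PAGE)
print(links)
```

No DTD, no error recovery and no HTML-specific knowledge is needed for this step, which is what makes generic XML tooling attractive inside a multi-stage process.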

For the final output, I can transform to HTML output and in some ways
this is even easier than XHTML (it's hard to generate good Appendix C
XHTML from most XSLT tools)

Appendix C XHTML is a kludge, but it's a kludge that works on the web
and allows XHTML to be served in a way that's at no disadvantage
compared to HTML. Now if I'm already going to be using XHTML
internally, then who benefits from pushing it into yet another format
just to serve out ? I need a good argument for going back to HTML over
Appendix C.


Hixie's position is pure sophistry. He constructs a valid argument
against a proposition no-one is really advocating (Authoring XHTML so it
can be treated as either XML or HTML simultaneously) then he attacks the
real situation (serving XHTML to existing HTML browsers) on the basis of
spec subtleties that no credible browser ever supported. I'm still
waiting for a screenshot of a browser demonstrating the "infamous
SHORTTAG bug".

In particular, he creates two notable straw men:
* The "xmlns" attribute is invalid HTML4.
* The XHTML DOCTYPEs are not valid HTML4 DOCTYPEs.

Why should we care that the xmlns attribute is invalid ? Why should we
care about validation at all ? Validation is useful because it's
objective and simple (you're either valid or not), but equally there is
no benefit accruing from validation _itself_, only from achieving a
standard of "objective compatibility" that is most easily achieved by
demonstrating validity. In the case of xmlns though, the
well-established rules on ignoring unknown attributes can handle things
perfectly adequately.


As to the doctype validity issue, this is bogus. The XHTML doctype
is perfectly valid, it's merely different. It's also well-established
for some years now and widely understood by browsers (to the extent that
browsers do anything useful with a doctype anyway).

If we take Hixie's own position of "Ivory tower SGML purist who hasn't
even noticed the M$oft barbarians at the gate", then doctypes have
always been flexible and extensible by SGML's rules. Now in web terms
this was a bad approach and never really worked (despite the number of
late-'90s HTML editors that tried). A custom doctype gains you nothing
on the web, it's not used by an SGML parser to extend HTML, and the very
best you can hope for is that a doctype is seen as one of a handful of
magic identifier strings that the browser recognises. Perhaps it's a
sadness that the web chose to never use this feature and go down this
route (personally I don't think so, but that's for another thread).
However I do not have to listen to arguments against varying a doctype
in a respectable way from someone who is simultaneously castigating me for
SHORTTAG ! Take your pick - the ivory standards or the real web
practices - you can't pick and choose whichever happen conveniently to
support your own position.


Hixie is asking me to throw away the processing capabilities of XML in
favour of pleasing the tiny handful of SGML-anoraks who even understand
what the problem is. This is no bargain.

Meanwhile the rest of the world sees MS Office and Dreamweaver as
appropriate HTML authoring tools, despite their absolutely glaring
holes. The enemy here is bad and bogus markup with no structure
whatsoever, not XHTML.
 
David Segall

[Convincing pro XHTML argument snipped]
Meanwhile the rest of the world sees MS Office and Dreamweaver as
appropriate HTML authoring tools, despite their absolutely glaring
holes.

What is wrong with Dreamweaver 8 in the context of your argument?
Pages that I write using it validate using both the built-in validator
and the one at http://validator.w3.org/.
 
Alan J. Flavell

On Sat, 4 Feb 2006, Andy Dingley wrote:

[...]
For the final output, I can transform to HTML output and in some
ways this is even easier than XHTML (it's hard to generate good
Appendix C XHTML from most XSLT tools)

Good point.

Appendix C XHTML is a kludge, but it's a kludge that works on the
web and allows XHTML to be served in a way that's at no disadvantage
compared to HTML.

In practical terms, yes, although it can reasonably be argued that it
*relies* on at least one browser bug.

Now if I'm already going to be using XHTML internally, then who
benefits from pushing it into yet another format just to serve out ?

Didn't you just give at least one answer to that point, a moment ago?

I need a good argument for going back to HTML over Appendix C.

Hixie's position is pure sophistry.

Whether you agree with his supporting arguments or not, there's
certainly one point where he's got it spot-on. Vast swathes of
so-called Appendix-C XHTML are in fact unfit to be called XHTML -
they're nothing more than XHTML-ish-flavoured tag-soup - the very
thing that XML claimed it was going to save us from.

The clue is that those who promote the use of XHTML - amongst authors
who have no idea why they are making that choice - have taken us from
a situation where there was one horrible legacy of HTML-flavoured tag
soup, to a situation where there are two horrible legacies of tag
soup, with none of the benefits that were claimed for XHTML. Most of
that stuff is useless as real XHTML anyway - it only gets rendered
tolerably because it's being parsed as "HTML with a deliberate bug".

As you have said yourself, it's easier to emit good HTML than it is to
emit good Appendix-C-compatible XHTML/1.0, *even* when your internal
process is XML-based. And, since the latter offers *no benefits
whatever to the existing web* as compared to the former (and even
relies on a widespread browser bug, and brings with it some quite
unnecessary additional complications), why not just keep on emitting
HTML, *until* the web is ready to deploy real XHTML with some real
additional benefits relative to either flavour of "text/html" ?

Otherwise, I'd venture a hunch that XHTML (at least most of what
currently purports to be XHTML) is due to fester in its own dreck,
alongside the festering HTML-flavoured tag soup legacy, and we'll need
some alternative clean solution (don't ask me what it might be), in
place of the one which XML claimed to offer but which seems to be
failing - except for a few commendable exceptions ("present company",
and all that).

I'm still waiting for a screenshot of a browser demonstrating the
"infamous SHORTTAG bug"

A pity, then, that I didn't keep a screen shot of emacs-w3 before it
got deliberately broken to avoid the problem. You don't have to
believe me, but it's nevertheless true. A web search reminds me that
we were discussing it in 2001, but I'm not sure just when emacs-w3 got
nobbled in that way. At that time, Toby Speight (for one) evidently
considered that the popular browsers were broken because of their
failure to implement this non-optional feature of SGML.

If we take Hixie's own position of "Ivory tower SGML purist who
hasn't even noticed the M$oft barbarians at the gate", then doctypes
have always been flexible and extensible by SGML's rules.

Then we get into *real* sophistry, for example that HTML purports to
be an application of SGML while at the same time ruling-out constructs
which SGML forbids to be ruled-out. But this line of argument would
get us nowhere, if you only care about "what works in practice" never
mind the theory.

Hixie is asking me to throw away the processing capabilities of XML
in favour of pleasing the tiny handful of SGML-anoraks who even
understand what the problem is. This is no bargain.

I don't think so. Here's his key advice:

|| If you use XHTML, you should deliver it with the
|| application/xhtml+xml MIME type. If you do not do so, you should
|| use HTML4 instead of XHTML.
^^^

If you interpret that word "use" to refer to what you deliver to the
web, *irrespective* of your internal process, then it seems to me to
be good advice, and consistent with what you said already.
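One way to read that advice in practice is to choose the MIME type per request. The sketch below is only an illustration under stated assumptions: the function name pick_content_type is hypothetical, the q-value handling is deliberately crude, and it presumes the server can emit a real HTML4 serialisation for the text/html case rather than relabelling the same XHTML bytes.

```python
def pick_content_type(accept_header):
    """Pick a MIME type from an HTTP Accept header value.

    Returns application/xhtml+xml only when the client lists it
    explicitly with a non-zero q-value; everything else gets text/html.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"

print(pick_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(pick_content_type("text/html, */*"))
```

A wildcard such as */* is intentionally not treated as support for application/xhtml+xml here, since a client that sends only a wildcard has not said it can handle XML.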

He's asking you, for the time being, to do what you already described
above - have your process emit good HTML. You evidently don't have
any sympathy for the various pillars of the argument which he used to
support that advice, but it seems, from what you said above, that this
part of the advice is consistent with what you yourself said.

Your internal processes may be interesting to discuss, but in the
final analysis they're no concern of the web user: *their* only
justified concern is the quality of your final product as emitted from
your web server. As far as I'm concerned, you'd be welcome to code in
well-structured LaTeX, whatever, and convert that to HTML for the web
- the criterion being the quality of the final result, no matter what
your internal process.

Meanwhile the rest of the world sees MS Office and Dreamweaver as
appropriate HTML authoring tools, despite their absolutely glaring
holes. The enemy here is bad and bogus markup with no structure
whatsoever, not XHTML.

Indeed. But we now have a widespread practical demonstration (as if
it wasn't obvious that this was going to happen) that encouraging
tag-soup cooks to cook a different flavour of tag-soup goes nowhere
towards improving the quality of the web.

I'd have to blame it on the W3C for failing to foresee the
consequences of them offering a transition path from HTML to so-called
XHTML, instead of making it plain that it was meant to be a clean
break from an unwelcome legacy. That they would offer specifications
for "Transitional" and "Frameset" XHTML just made things worse.

regards
 
Andy Dingley

On Sat, 4 Feb 2006, Andy Dingley wrote:

[...]
For the final output, I can transform to HTML output and in some
ways this is even easier than XHTML (it's hard to generate good
Appendix C XHTML from most XSLT tools)
Now if I'm already going to be using XHTML internally, then who
benefits from pushing it into yet another format just to serve out ?

Didn't you just give at least one answer to that point, a moment ago?

No, I gave one answer for one possible set of circumstances (simplistic
use of XSLT). There's more to XML than XSLT. There are other ways to
serialise XSLT's output than the default.

Whether you agree with his supporting arguments or not, there's
certainly one point where he's got it spot-on. Vast swathes of
so-called Appendix-C XHTML are in fact unfit to be called XHTML -
they're nothing more than XHTML-ish-flavoured tag-soup -

This is certainly true, but is it any worse than HTML ?

Is XHTML expected to be any more parseable by a non error-correcting
XML parser than a similar situation for HTML with an SGML parser ? In
many ways XHTML _is_ better here - the well-formedness condition is
self-evident in the absence of a DTD and is easily tested by even a
crude editor. Mangled tags are the sort of trivia that's either
perfect, or else we're allowed to be brutal in error recovery from it.
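To make "easily tested" concrete, here is a minimal sketch of a DTD-free well-formedness check. It uses Python's standard XML parser as an arbitrary example of a non error-correcting parser. The function name well_formedness_error is invented for the illustration.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(markup):
    """Return None if markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None

# XML-style empty element: parses without any DTD.
print(well_formedness_error("<p>one<br/>two</p>"))

# HTML-style unclosed <br>: rejected outright, with no attempt at recovery.
print(well_formedness_error("<p>one<br>two</p>"))
```

The check needs no knowledge of XHTML at all, only of XML syntax, which is why even a crude editor can offer it.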

The more subtle problem, and from where tag soup really arises, is with
SGML. Clever DTD-based parsing rules are all very well when they're done
properly, but how often are they?

I saw this fragment (abbreviated) lately, together with a highly
confusing validation report (maybe in this ng.) and a plaintive cry
about CSS problems.

<html><title><basefont><link><body>...

Now why does the validator claim so vehemently that <link> has the
problem ? Only someone who is familiar with the obscure <basefont>
_and_ with SGML parsing behaviour can understand this.

This is a problem inherent in the use of optional elements (sometimes),
or particularly in optional closing tags. In XML they're mandatory, so
that the document can be correctly parsed into its infoset, even without
knowing the DTD.

In XML, <basefont> could never follow <head> directly, there would
always have to be an explicit <body>. An XML parser would thus report
the error to be about <basefont> having been placed into the <head>
(and <link> is thus correct), rather than SGML's behaviour of seeing
<basefont> as implying the automatic position of <body> and thus
(incorrectly) seeing <link> as mis-placed.
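The difference in where the error lands can be shown with a toy model. This is emphatically not an SGML or XML parser: the two content sets are a hand-picked assumption rather than the real HTML 4 DTD, and both function names are hypothetical. It only mimics the one rule at issue, namely whether a body-only element may silently open <body>.

```python
# Toy content model, assumed for illustration only (not the real HTML 4 DTD).
HEAD_CONTENT = {"title", "base", "meta", "link"}
BODY_CONTENT = {"basefont", "p", "h1"}

def first_error_with_inference(tags):
    """SGML-like: a body-only element seen in the head silently opens <body>."""
    in_body = False
    for tag in tags:
        if tag in ("html", "head"):
            continue
        if tag == "body":
            if in_body:
                return tag  # body was already opened by inference
            in_body = True
        elif tag in BODY_CONTENT:
            in_body = True  # implied </head><body>
        elif tag in HEAD_CONTENT and in_body:
            return tag  # head-only element now appears inside the body
    return None

def first_error_explicit(tags):
    """XML-like: only an explicit <body> tag moves us out of the head."""
    in_body = False
    for tag in tags:
        if tag == "body":
            in_body = True
        elif tag in BODY_CONTENT and not in_body:
            return tag  # body-only element placed in the head
        elif tag in HEAD_CONTENT and in_body:
            return tag
    return None

# The abbreviated fragment quoted above, as a flat list of start tags.
FRAGMENT = ["html", "title", "basefont", "link", "body"]

print(first_error_with_inference(FRAGMENT))
print(first_error_explicit(FRAGMENT))
```

With inference the blame falls on link, the innocent element; with explicit structure it falls on basefont, the element that is actually misplaced.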

SGML is all very clever, but it's no bloody use ! Real people, in suits
and ties, just can't work it.

the very thing that XML claimed it was going to save us from.

I don't recall XML ever claiming that. XHTML might have done, but this
is an aberration from the HTML "random hand-coding with bad editors"
camp. XML (~HTML, ~web) has usually been quite reasonable about
compliance, well-formed at least if not actually valid.

RSS seems to have suffered from HTML contagion by proximity and is
probably the most badly formed dialect out there.

The clue is that those who promote the use of XHTML - amongst authors
who have no idea why they are making that choice - have taken us from
a situation where there was one horrible legacy of HTML-flavoured tag
soup, to a situation where there are two horrible legacies of tag
soup, with none of the benefits that were claimed for XHTML.

_Three_ flavours of tag soup! Let's not leave RSS out of this - as far
as character-level and syntactic encoding goes, it's by far the worst
offender.

As you have said yourself, it's easier to emit good HTML than it is to
emit good Appendix-C-compatible XHTML/1.0, *even* when your internal
process is XML-based.

No, only in the case of trivial XSLT use.

There are many other ways I could be generating XHTML for output. The
popular PHP & template methods, even the expensive Obtree CMS, generate
garbage with no pretence at XML well-formedness because they really are
pure-text writeln-based output.
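The gap between writeln-style output and tree-based output can be sketched like this. It is a generic illustration, not a description of how PHP templates or Obtree work internally. Both function names are made up, and Python's standard library is used only because it ships an XML serialiser.

```python
import xml.etree.ElementTree as ET

def render_with_strings(title):
    """writeln-style output: the text is pasted in with no escaping."""
    return "<h1>" + title + "</h1>"

def render_with_tree(title):
    """Tree-based output: the serialiser escapes markup characters."""
    h1 = ET.Element("h1")
    h1.text = title
    return ET.tostring(h1, encoding="unicode")

def is_well_formed(markup):
    """True if a stock XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

TITLE = "Fish & Chips <fresh>"

print(render_with_strings(TITLE), is_well_formed(render_with_strings(TITLE)))
print(render_with_tree(TITLE), is_well_formed(render_with_tree(TITLE)))
```

The string version is only as good as every author's memory for escaping rules; the tree version cannot emit a stray ampersand or an unclosed element in the first place.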
Otherwise, I'd venture a hunch that XHTML (at least most of what
currently purports to be XHTML) is due to fester in its own dreck,
alongside the festering HTML-flavoured tag soup legacy,

Certainly. But will the solution to this necessarily require the
protocol itself to be thrown away ?

IMHO, we _will_ gradually improve average validation quality of most web
sites. This will be driven by non-desktop devices and the resultant
quality of the auto-transcoding of content onto them. Once big operators
realise that a valid and fluid site looks good on a phone as well as a
PowerPoint presentation, then they'll slowly start to drop the rigid
pixelated PSD designs of recent years and look towards validity too.
Geocities homepages won't even notice.

Hixie's key point seems to be that premature use of XHTML, done badly,
will be damaging to XHTML in the long-run. This is a reasonable view,
although I don't believe it myself. I also doubt that Hixie believes it
either - given his attempts to really throw a clog into XHTML with his
HTML 5 schism.

Then we get into *real* sophistry, for example that HTML purports to
be an application of SGML

Does it? I'd always understood that it was inspired by SGML, but long
conceded that it wasn't strictly a valid SGML application. I don't weep
for the passing of SHORTTAG certainly, because (for whatever reason) it
clearly is no longer part of HTML.

I don't much care whether doctypes are references or identifiers either.
Identifiers are obviously less flexible, but they seem to be adequate
for the web's purposes. There's also a long and complex argument that a
flexible DTD conveys no benefit anyway, unless you also bundle some sort
of processing model along with it - <marquee> doesn't become renderable
just because you've added it to a DTD, only if you've also bound it to
some rendering behaviour.

The XHTML doctypes though _are_ already widespread and recognised.
Hixie's position fails because they're either permitted by SGML's rules,
or they're already commonplace enough to stand as opaque identifiers.

I'd have to blame it on the W3C for failing to foresee the
consequences of them offering a transition path from HTML to so-called
XHTML, instead of making it plain that it was meant to be a clean
break from an unwelcome legacy.

Isn't that what XHTML 2.0 is about ? And that's _far_ worse !
 
Leonard Blaisdell

Alan J. Flavell said:
Just for the record, both of your examples are also rendered by
firefox 1.5.

And they're also rendered properly in Firefox's little brother Camino
(v1.0b2).

leo
 
Alan J. Flavell

On Sun, 5 Feb 2006, Andy Dingley wrote:

[you made some points that I respectfully disagree with, but
there seems nothing to be gained by anyone if we get bogged down
in them, so I'll leave them be. But a few points seem to call for
comment.]

This is certainly true, but is it any worse than HTML ?

It depends what criteria you take into account. It's certainly no
better than HTML, but I'd say that in a number of respects it's worse.

Bearing in mind that - in a practical sense - HTML served as text/html
has to be parsed by some kind of tag-soup slurper with masses of error
fixup code; whereas we were told (by some, at least) that XHTML was
going to put an end to the need for all that fixup code - just a
simple parser, and predictable rendering routines.

It seems to me inevitable that when the masses do get it into their
heads to switch from text/html to application/xhtml+xml, there's going
to be massive clamouring for all these tag-soup documents to be
rendered "correctly" (in *their* sense of correctly, i.e. "looks the
same as what MSIE used to do"), just like the mess that developed with
HTML.

The more subtle problem, and from where tag soup really arises, is
with SGML. Clever DTD-based parsing rules are all very well when
they're done properly, but how often are they?

If you wanted HTML without omitted tags, you could have had it with
SGML all along. If you wanted to eliminate SHORTTAGS, you can do so
in SGML.

I'm not proposing that one should start on that now; I'm just saying
that you shouldn't use problems for which SGML *does have* a solution,
as your basis for saying that SGML is unsuitable.

This is a problem inherent in the use of optional elements (sometimes),
or particularly in optional closing tags. In XML they're mandatory, so
that the document can be correctly parsed into its infoset, even without
knowing the DTD.

Taking out the parts for which SGML does have a solution, then, your
argument is based just on XML's concept of well-formedness.

SGML is all very clever, but it's no bloody use ! Real people, in
suits and ties, just can't work it.

There's certainly far more in SGML than HTML needs.

Hixie's key point seems to be that premature use of XHTML, done
badly, will be damaging to XHTML in the long-run.

That's what these detailed arguments boil down to, indeed.

This is a reasonable view,

Yup

although I don't believe it myself. I also doubt that Hixie
believes it either - given his attempts to really throw a clog into
XHTML with his HTML 5 schism.

I haven't quite worked out how that fits into any picture yet, so I'm
reserving judgment.

How else would you interpret this, then?
http://www.w3.org/TR/REC-html40/conform.html#h-4.1

|| An HTML document is an SGML document that meets the constraints of
|| this specification.

cheers
 
Andy Dingley

Alan said:
Bearing in mind that - in a practical sense - HTML served as text/html
has to be parsed by some kind of tag-soup slurper with masses of error
fixup code; whereas we were told (by some, at least) that XHTML was
going to put an end to the need for all that fixup code - just a
simple parser, and predictable rendering routines.

I don't recall ever hearing this, outside of the mobile devices crowd.
Certainly lightweight parsers _could_ be made that operated on XHTML,
but these were generally seen as being parsers without even display
renderers - smart agents jabbering away at each other, not
lighter-weight desktop browsers. After all, XHTML certainly wasn't
going to _replace_ HTML tag soup any time soon, so if browsers did
choose to do anything smart with it, they'd have to do it in a modal
manner.

I remember early XHTML as pure geekthusiasm and the feeling that
"everything had to be ported to XML, just because it _could_ be".
There's no need to ascribe reason or logic to this, other than the lure
of c00l.

Early sightings of "XHTML will lead to simpler browsers" claims would
certainly be interesting from the historical aspect.

It seems to me inevitable that when the masses do get it into their
heads to switch from text/html to application/xhtml+xml, there's going
to be massive clamouring for all these tag-soup documents to be
rendered "correctly" (in *their* sense of correctly, i.e. "looks the
same as what MSIE used to do"), just like the mess that developed with
HTML.

First we kill all the lawyers. Then we start on the marketing
department and anyone who thinks .psd bitmaps are an appropriate design
tool for the web. This jihad is _long_ overdue.

There's certainly a risk here, and the scenario you describe would be
appalling. Fortunately I think even M$oft are smart enough to avoid
doing it - we're just left with the risk of Apple or Winer doing it to
make podcasting "down with the kids".

If you wanted HTML without omitted tags, you could have had it with
SGML all along. If you wanted to eliminate SHORTTAGS, you can do so
in SGML.

I don't understand why HTML didn't do that (it's before my time, and
you know I'm no SGML geek). <rhetorical>Was HTML ever intended to
actually support SHORTTAG and use it? I've never heard of it being
used deliberately, only heard by vague repute of it even being
implemented, and it does seem contrary to the idea of "HTML as simple".</rhetorical>

Taking out the parts for which SGML does have a solution, then, your
argument is based just on XML's concept of well-formedness.

Exactly. SGML was a failure and HTML and XML have both been successful,
if we judge them in terms of live desktops and bodies running the tech
daily. Having architected well-budgeted projects within SGML's core
competency and _still_ been unable to justify using it ('97 and we went
with PDF) I'm extremely unimpressed with SGML as any sort of
_practical_ solution to almost anything outside aerospace or
government. When a telco can't get it to work and looks for something
simpler, then that's a technology that has made itself unapproachable
by the masses. My_Cat_Mittens@Geocities is the best thing about the web
- worldwide publishing for grannies without budgets.

There's certainly far more in SGML than HTML needs.

There's more in SGML than _people_ need.

How else would you interpret this, then?
http://www.w3.org/TR/REC-html40/conform.html#h-4.1

|| An HTML document is an SGML document that meets the constraints of
|| this specification.

The fact that {an SGML document that meets the constraints of the HTML
spec} is also a HTML document does not imply that all HTML documents
are necessarily SGML documents. It may even be so (my understanding of
SGML minimal conformance requirements is sketchy), but as I understand
things HTML is a sufficiently reduced subset of SGML that it can no
longer claim real conformance. Of course they're hugely similar, but
this isn't enough to justify some of the claims that Hixie makes (those
of the form "SGML can theoretically do this, therefore the web should
be doing it right now").
 
Alan J. Flavell

I don't understand why HTML didn't do that (it's before my time, and
you know I'm no SGML geek).

There's an attempt at an explanation in the thread which contains mid=
(e-mail address removed)

<rhetorical>Was HTML ever intended to
actually support SHORTTAG and use it?

*Some* of the features of SHORTTAG certainly did get used: omission of
quotes from certain attribute values, and omission of attribute names,
for a start. Otherwise you would always have had to write
SELECTED="SELECTED", instead of just SELECTED, and so forth.
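The two minimisations mentioned here are easy to see side by side. In the sketch below, Python's html.parser stands in for a lenient HTML parser (it is not an SGML parser and knows nothing of SHORTTAG as such), and xml.etree.ElementTree stands in for a strict XML parser. The class and function names are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class AttrCollector(HTMLParser):
    """Record every start tag together with its attribute list."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))

def html_attrs(markup):
    """Parse markup leniently and return the (tag, attrs) pairs found."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen

def is_well_formed_xml(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Attribute name omitted (selected) and quotes omitted (value=a).
MINIMISED = "<option selected value=a>A</option>"
# The fully spelled-out form that XML requires.
EXPLICIT = '<option selected="selected" value="a">A</option>'

print(html_attrs(MINIMISED))
print(is_well_formed_xml(MINIMISED), is_well_formed_xml(EXPLICIT))
```

The HTML-style parser reports the bare attribute with no value, while the XML parser refuses the minimised form entirely and accepts only the explicit one.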

At the time, it was inevitable that turning on this SGML option
also dragged along with it some other features, unwanted in SGML.

Due to the "Web TC", these features can now be turned on separately,
although I'd have to RTFM to find the precise details.

The fact that {an SGML document that meets the constraints of the
HTML spec} is also a HTML document

But that isn't what it says (hint: it's a conformance definition of
HTML); so your conclusion is flawed.

But that definition of an HTML document is itself flawed, since some
of the "constraints" of the HTML spec forbid things which SGML does
not allow to be forbidden. So we're back again at an exercise in
sophistry.

cheers
 
Alan J. Flavell

On Mon, 6 Feb 2006, Alan J. Flavell wrote:

[SHORTTAG]
At the time, it was inevitable that turning on this SGML
option also dragged along with it some other features,
unwanted in SGML.
^^^^^^^^^^^^^^^^

"Unwanted in HTML", that was supposed to say. Sorry.

Due to the "Web TC", these features can now be turned on separately,

cheers
 
