Technical DOCTYPE question for technophiles

David Håsäther

Is this (with lowercase "html")
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

the same as (with uppercase "HTML"):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

Are both technically valid and do both (technically) mean the same
thing? I would like to know on a technical web standards level.

It would be wrong to call a DOCTYPE declaration valid, but yes, both
are correct and mean exactly the same thing.

As you may know, the first parameter of a DOCTYPE declaration is the
document element (or root element). Since HTML uses NAMECASE GENERAL
YES in the NAMING section of the SGML declaration, "html" will be case-
folded to "HTML".
 
Jukka K. Korpela

Is this (with lowercase "html")
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

the same as (with uppercase "HTML"):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

Are both technically valid and do both (technically) mean the same
thing? I would like to know on a technical web standards level.

Someone might find it odd that the seller of a purported "HTML validator"
does not know such a thing, and does not know where to find authoritative
answers on such matters. How can a validator do anything if it does not
even know what to do with a document type declaration?

As David Håsäther explained, the root element name is case-folded in SGML,
so it is case insensitive there - but not in XML. The quoted strings in the
declaration are case sensitive. The keywords DOCTYPE and PUBLIC are case-
insensitive in SGML but case-sensitive in XML and need to appear in upper
case there (contrary to the popular mantra "XML is all lowercase").
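
A minimal Python sketch of that distinction (an illustration, not part of the original post; the helper name `normalize` and its tokenizing regex are assumptions, and a real SGML or XML parser does far more than this):

```python
import re

# Quoted literals are matched first; anything else that is not whitespace or
# markup punctuation is treated as a name or keyword.
_TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|[^\s"\'<>!]+')

def normalize(decl: str, rules: str) -> list[str]:
    """Return the declaration's tokens as they compare under 'sgml' or 'xml' rules."""
    tokens = _TOKEN.findall(decl)
    if rules == "sgml":
        # NAMECASE GENERAL YES: names and keywords are folded to upper case,
        # quoted strings are left exactly as written.
        return [t if t[0] in "\"'" else t.upper() for t in tokens]
    return tokens  # XML: nothing is folded

TAIL = ' PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n"http://www.w3.org/TR/html4/loose.dtd">'
lower = "<!DOCTYPE html" + TAIL
upper = "<!DOCTYPE HTML" + TAIL

print(normalize(lower, "sgml") == normalize(upper, "sgml"))  # True
print(normalize(lower, "xml") == normalize(upper, "xml"))    # False
```

Under the SGML rules the two declarations from the question compare equal; under XML rules they do not, because "html" and "HTML" are different names there.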
 
Albert Wiersch

Jukka K. Korpela said:
Someone might find it odd that the seller of a purported "HTML validator"
does not know such a thing, and does not know where to find authoritative
answers on such matters. How can a validator do anything if it does not
even know what to do with a document type declaration?

I thought someone might bring that up. :) Even professionals don't know
everything and rely on their peers for help. Besides, I've explained before
that CSE HTML Validator is not a DTD validator. Even you said somewhere on
your web site that, basically, it's probably a lost cause to argue about
your definition of validator being the only correct one.
As David Håsäther explained, the root element name is case-folded in SGML,
so it is case insensitive there - but not in XML. The quoted strings in the
declaration are case sensitive. The keywords DOCTYPE and PUBLIC are case-
insensitive in SGML but case-sensitive in XML and need to appear in upper
case there (contrary to the popular mantra "XML is all lowercase").

Thank you.

Albert Wiersch
 
Benjamin Niemann

Jukka said:
Someone might find it odd that the seller of a purported "HTML validator"
does not know such a thing, and does not know where to find authoritative
answers on such matters. How can a validator do anything if it does not
even know what to do with a document type declaration?

The problem is probably that the SGML specification is not freely (as in
beer) available - in contrast to the XML specification. You'd have to pay
some bucks to the ISO or buy the SGML Handbook in order to read it (but
he'd just have to sell two 'Professional Editions', so that's no real
excuse...).
And the SGML spec is really a monstrosity, so I would not blame him for
seeking help here, before he dives into the spec.
But I must admit that the answer is pretty obvious for everyone with some
understanding of HTML and SGML - even without reading the spec.
 
Jukka K. Korpela

I thought someone might bring that up. :)

Usually you pop up once a year to advertise your purported validator, and
regulars do remember you.
Even professionals don't know
everything and rely on their peers for help.

Pruhfessionals surely don't know everything.

Is it professional to ask about the very basics of validation _years_ after
you've started marketing something that you call a validator?
Besides, I've explained
before that CSE HTML Validator is not a DTD validator.

And despite the fact that the name is thus seriously misleading, you keep
using it.
Even you said
somewhere on your web site that, basically, it's probably a lost cause
to argue about your definition of validator being the only correct one.

Your prose seems to be as sloppy as your false validator. You attribute a
statement to me, without citing any specific page, so that people really
cannot check _how_ badly you misrepresent me in your attempt to defend your
product and its false labeling. And you make foolish remarks by calling the
definition of validator in markup context _my_ definition.
 
Jukka K. Korpela

Benjamin Niemann said:
The problem is probably that the SGML specification is not freely (as
in beer) available - in contrast to the XML specification.

I would expect someone to spend a few bucks on the very basic documents, if
he intends to write software like a validator.
And the SGML spec is really a monstrosity,

No doubt about that. That's one reason why people shouldn't try to write
software like validators light-heartedly.
But I must admit that the answer is pretty obvious for everyone with
some understanding of HTML and SGML - even without reading the spec.

In fact, it can only be known from the SGML standard - either by actually
reading the standard, or consulting someone you trust to have read it,
understood it, and be willing to help you. As far as I remember, the HTML
specifications, for example, don't mention this issue at all - in
principle, the reference to SGML or XML is sufficient, and the authors of
HTML specs didn't bother mentioning this particular detail. It is true that
e.g. the HTML 4.01 specification explicitly mentions that element names are
case insensitive. But it does not say that the "html" in a DOCTYPE
declaration is an element name (still less that it is a generic identifier,
which is the SGML jargon for element name).

Reading the HTML spec can actually make you doubt what it wants to say.
The hilariously titled "HTML version information" section,
http://www.w3.org/TR/html4/struct/global.html#h-7.2
says: "HTML 4.01 specifies three DTDs, so authors must include one of the
following document type declarations in their documents." This sentence
seems to present a logical implication (with "so"), but it's a non
sequitur. The use of one of three document type definitions does not imply
that you need to use one of three document type declarations. (Anyone who
does not see this should not dream of writing a validator, or feel
competent to discuss what is a validator and what is not, before spending
quite some time in a study room with good books.)

From the SGML viewpoint, the wording is best understood to mean that the
three document type declarations are just _examples_. But we know that
browsers have based "doctype sniffing" on the document type declarations,
so that e.g. the presence or absence of a URL can be decisive.
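
How the URL can be decisive, as a rough Python sketch. The rules are borrowed from the mode-selection table later written into the HTML Standard and cut down to the HTML 4.01 public identifiers, so this is an assumption about typical behaviour rather than a description of any particular browser; the function name `rendering_mode` is made up for the example.

```python
from typing import Optional

# Public identifier prefixes for which the system identifier (the URL) matters.
LOOSE_PREFIXES = (
    "-//w3c//dtd html 4.01 transitional//",
    "-//w3c//dtd html 4.01 frameset//",
)

def rendering_mode(public_id: str, system_id: Optional[str]) -> str:
    """Simplified sniffing; only meaningful for the HTML 4.01 public identifiers."""
    pid = public_id.lower()  # sniffing compares the identifier case-insensitively
    if pid.startswith(LOOSE_PREFIXES):
        return "quirks" if system_id is None else "limited-quirks"
    return "no-quirks"

transitional = "-//W3C//DTD HTML 4.01 Transitional//EN"
print(rendering_mode(transitional, None))
print(rendering_mode(transitional, "http://www.w3.org/TR/html4/loose.dtd"))
print(rendering_mode("-//W3C//DTD HTML 4.01//EN", None))
```

With these simplified rules the same Transitional public identifier lands in a different mode depending only on whether a URL follows it, while the Strict identifier does not depend on the URL at all.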

If the words "authors must include one of the following document type
declarations in their documents" are to be read as an independent
requirement imposed on conforming documents (and not as a logical
implication from something else), the next question is: how literally shall
this be interpreted? When presenting
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
as one of the three permitted options, does the specification mandate this
exact format with
- the redundant URL included
- uppercase in keywords DOCTYPE, HTML, and PUBLIC
- that particular use of white space?
Even the last point is not self-evident. By SGML rules, the amount and kind
of white space between the two quoted strings here is not significant; but
it seems that the spec wants to impose an _additional_ requirement,
mandating a very specific document type declaration.

Not yet sufficiently confused? Then please read how the online version of
the HTML specification itself starts at source level:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
(I'm not referring to the Transitional vs. Strict issue, in which the
specification actually practices something other than what it teaches; here
I'm referring to the use of a document type declaration without a URL.)
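
The two SGML-level points above (white space between the parts is not significant, and the URL is optional) can be illustrated with a small Python sketch. The names `Doctype` and `parse_doctype` are hypothetical, and the regex only covers PUBLIC declarations with double-quoted literals; it is not a substitute for an SGML parser.

```python
import re
from typing import NamedTuple, Optional

class Doctype(NamedTuple):
    root: str
    public_id: str
    system_id: Optional[str]  # None when no URL is given

# Keywords are matched case-insensitively; any run of white space separates parts.
_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)

def parse_doctype(decl: str) -> Doctype:
    m = _DOCTYPE.fullmatch(decl.strip())
    if m is None:
        raise ValueError("not a PUBLIC document type declaration")
    root, public_id, system_id = m.groups()
    return Doctype(root.upper(), public_id, system_id)  # root name is case-folded

two_lines = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
             '"http://www.w3.org/TR/html4/strict.dtd">')
one_line = two_lines.replace("\n", " ")
no_url = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'

print(parse_doctype(two_lines) == parse_doctype(one_line))
print(parse_doctype(no_url).system_id)
```

The first two declarations parse to the same value regardless of the line break, and the declaration without a URL still parses, with `system_id` left as `None`.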
 
Neredbojias

With neither quill nor qualm, Jukka K. Korpela quothed:
Is it professional to ask about the very basics of validation _years_ after
you've started marketing something that you call a validator?

Oh, come on! Microsoft never asks anybody anything and look where
they've ended up. I've heard they employ only those people who were in
the top 10% of their classes. Ergo, there must be a lot of tall techies
in Seattle.
 
Albert Wiersch

Jukka K. Korpela said:
And despite the fact that the name is thus seriously misleading, you keep
using it.

It's only misleading to you and a few others who insist on only one
definition of validator. To the vast majority of people, it's not
misleading. An HTML validator is simply a program that checks HTML documents
for problems.
Your prose seems to be as sloppy as your false validator. You attribute a
statement to me, without citing any specific page, so that people really
cannot check _how_ badly you misrepresent me in your attempt to defend your
product and its false labeling. And you make foolish remarks by calling the
definition of validator in markup context _my_ definition.

It is _your_ definition because you've chosen to make it your only one. When
I say _your_, I don't mean that you were the source of the definition, I
mean that you have chosen to make it yours by defending it as you do. Life
gets very complicated when you look at everything so technically!

Here's the page I was referring to (at the bottom - final notes section):
http://www.cs.tut.fi/~jkorpela/html/validation.html

Albert
 
Jukka K. Korpela

Albert Wiersch said:
Here's the page I was referring to (at the bottom - final notes section):
http://www.cs.tut.fi/~jkorpela/html/validation.html

Thank you. Now it is easy for anyone interested to see that you
misrepresented what I have written. This is not surprising, since you keep
misrepresenting your product (which basically just reports what _you_
regard as an "error" or as worthy of a warning) as a validator.

I think I don't want to know why you asked your "technical DOCTYPE
question".
 
Mark Parnell

Previously in alt.html, Albert Wiersch said:
An HTML validator is simply a program that checks HTML documents
for problems.

No, it's a program that checks HTML documents for *validity*. Otherwise
it would be a problemator.

You can't determine whether something is valid unless you have a
reference to base that determination on. In the case of HTML, that's the
DTD. I'd be interested to know what other basis of validation there
could possibly be for HTML. Surely not some random list of "problems"
that an individual software author comes up with.
 
Albert Wiersch

Mark Parnell said:
You can't determine whether something is valid unless you have a
reference to base that determination on. In the case of HTML, that's the
DTD. I'd be interested to know what other basis of validation there
could possibly be for HTML. Surely not some random list of "problems"
that an individual software author comes up with.

The basis used by CSE HTML Validator is a combination of the HTML
specification, DTDs, browser extensions, user issues and problems that have
come up over time, and what works in the real world. It's put together by
myself and is extremely useful in writing good HTML. Much more so than just
a DTD validator, at least in my opinion (and many others). I certainly would
never limit myself to checking HTML documents using only a DTD validator.

For example, a document like the following is completely valid according to
a DTD validator, even though it contains numerous problems. For more info
see
http://www.htmlvalidator.com/htmlval/whycseisbetter.html


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled</title>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="generator" content="CSE HTML Validator Professional
(http://www.htmlvalidator.com/)">

<style type="text/csss">
p { color: black}
</style>

</head>

<body bgcolor="fffff">

<p style="color: greeen"></p>

img src="somepic.jpg">

<img src="somepic.jpg" alt="description" width="50text">

<a href="http://www.domain.com/images\image.jpg">...</a>

<a href="htp://www.domain.com/">...</a>

<a href="(e-mail address removed)">[email protected]</a>

<a href="mailto:[email protected]">[email protected]</a>

<p>He said &quot;I'll be there!&quot</p>

<a href="#chapter5">...</a>

<a href="http://www.domain.com/" target="_new">...</a>

<a href="http://www.domain.com/ ">...</a>

<a name="atagid" id="atagid2">...</a>


<table><tr><td>...</table>

<!-- This is a comment with an ending style that may cause problems in
browsers -- >

</body>
</html>
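
As an illustration of why such a document passes DTD validation, and of what checks beyond the DTD might look like: attribute values such as bgcolor, type and href are declared as CDATA, so a DTD cannot constrain their content. The Python sketch below is hypothetical and is not how CSE HTML Validator works; the class name `ProblemFinder`, the short scheme list, and the three checks are assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# The sixteen colour names defined by HTML 4.
HTML4_COLOR_NAMES = {
    "black", "silver", "gray", "white", "maroon", "red", "purple", "fuchsia",
    "green", "lime", "olive", "yellow", "navy", "blue", "teal", "aqua",
}
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto"}  # deliberately short list

class ProblemFinder(HTMLParser):
    """Collects a few attribute-value problems that a DTD cannot express."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            value = value or ""
            if name == "bgcolor" and not self._is_color(value):
                self.problems.append(f'{tag}: bgcolor "{value}" is not a colour')
            elif tag == "style" and name == "type" and value != "text/css":
                self.problems.append(f'style: unknown type "{value}"')
            elif name == "href":
                scheme = urlsplit(value).scheme
                if scheme and scheme not in KNOWN_SCHEMES:
                    self.problems.append(f'{tag}: unknown URL scheme "{scheme}"')

    @staticmethod
    def _is_color(value: str) -> bool:
        return value.lower() in HTML4_COLOR_NAMES or bool(
            re.fullmatch(r"#[0-9a-fA-F]{6}", value)
        )

finder = ProblemFinder()
finder.feed('<style type="text/csss"></style><body bgcolor="fffff">'
            '<a href="htp://www.domain.com/">...</a><a href="#chapter5">...</a>')
for problem in finder.problems:
    print(problem)
```

Fed a few fragments from the sample, the sketch reports the misspelled style type, the five-digit colour and the "htp" scheme, and leaves the fragment link alone.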
 
Mark Parnell

Previously in alt.html, Albert Wiersch said:
The basis used by CSE HTML Validator is a combination of the HTML
specification, DTDs, browser extensions, user issues and problems that have
come up over time, and what works in the real world.

An admirable list.
It's put together by
myself and is extremely useful in writing good HTML.

I'll not deny that.
Much more so than just
a DTD validator, at least in my opinion (and many others). I certainly would
never limit myself to checking HTML documents using only a DTD validator.

I have no problem with that. Though no program is ever going to be a
substitute for testing it properly in browsers yourself.
For example, a document like the following is completely valid according to
a DTD validator, even though it contains numerous problems.

Yes. But it *is* still valid. There is a big difference between validity
and good practice (which you obviously realise), and CSE attempts to
bridge that gap. But that doesn't make it a validator.

I'm sure that when I've looked at CSE in the past, it didn't actually
check the validity of the document. Looking at that page, it looks like
perhaps it now does.

Does CSE check the validity of the HTML against the DTD, as well as
checking for other potential problems? If so, then I have no problem
with it being called a validator, though it certainly is much more than
that.
 
Albert Wiersch

Mark Parnell said:

Does CSE check the validity of the HTML against the DTD, as well as
checking for other potential problems? If so, then I have no problem
with it being called a validator, though it certainly is much more than
that.

Hi Mark,

Technically, no. It doesn't use an SGML parser for "official" validation.
There's been virtually NO demand to add DTD validation to the program so I
have not done much research into adding it. If and when there is a good
demand for it (which I believe will be unlikely), then I would definitely
investigate it. There was enough demand to add HTML Tidy support, so it now
has the option and capability to display HTML Tidy's messages with CSE's
messages or by themselves. I could do something similar with the results
from a real SGML parser, but, like I said, it's not what people really want.
 
Albert Wiersch

Toby Inkster said:
CSE comes up with three "errors" that are certainly not really errors when
validating:

http://tobyinkster.co.uk/

See if you can fix that, will you?

I ran that URL through CSE and there was only one error relating to
accessibility and Section 508 accessibility standards.

What three errors are you referring to that CSE comes up with?

Are you referring to this?
[41] The "div" tag has no attributes. Attributes are normally used with the
"div" element to provide functionality.

Those are considered informational messages and are not counted as errors by
CSE HTML Validator. If you don't care to see that message, then you just
need to right-click on it and disable it.
 
Jukka K. Korpela

Mark Parnell said:
An admirable list.

The bottom line is that it checks what Albert Wiersch regards as rules for
HTML. This includes ignoring parts of the specification he does not like or
does not understand as well as things he just made up in his mind.
Undoubtedly some features happen to coincide with expert consensus on what
is actually good or bad. But poor users have little way of distinguishing
such features from what the phony validator reports as errors just because
Albert Wiersch thinks so.
Does CSE check the validity of the HTML against the DTD,

We already got the answer that it doesn't, and how could it, when its
author does not know SGML?
- - as well as
checking for other potential problems? If so, then I have no problem
with it being called a validator, though it certainly is much more than
that.

I would have no problem with saying that it is not a validator even if it
did that. The point is that it reports as _errors_ things that are not
reportable markup errors. If it reported them as warnings, we might say
that it is a validator with some extra features (as the W3C and WDG
validators actually are).
 
Toby Inkster

Albert said:
I ran that URL through CSE and there was only one error relating to
accessibility and Section 508 accessibility standards.

http://online.htmlvalidator.com/php/onlinevallite.php says:

"The character reference "ト" is not recognized. Did you misspell
it? Did you forget to end the reference with a semicolon?"

Ummm... of course I didn't miss the semicolon -- the semicolon is right
there in the bit you quoted!

Ditto: "The character reference "ビ" is not recognized. Did you
misspell it? Did you forget to end the reference with a semicolon?"

Ditto: "The character reference "–" is not recognized. Did you
misspell it? Did you forget to end the reference with a semicolon?"

All them be proper character references.
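
In this copy of the thread the quoted references show up already decoded, so their original spelling is not recoverable. Assuming they were ordinary numeric or named references (the three strings below are guesses at that spelling, not quotes from the page), a short Python check with the standard library shows that references of this kind are well-formed and resolve to the characters shown:

```python
import html

# Assumed spellings; the thread only preserves the decoded characters.
references = ["&#12488;", "&#x30D3;", "&ndash;"]

for ref in references:
    char = html.unescape(ref)  # unknown references would come back unchanged
    print(f"{ref} -> {char} (U+{ord(char):04X})")
```

Each reference ends with its semicolon and decodes to a single character, which is consistent with the complaint that the reported "errors" are not errors.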
 
