Eric said:
I am not confusing anything.
Yes, you are. The Document Character Set is the set of characters that can
be displayed with a document. On the other hand, the character encoding is
how these characters or references thereto are encoded. For example,
you can use each of the encodings US-ASCII, ISO-8859-1, UTF-7, UTF-8,
UTF-16LE, UTF-16BE, and UTF-32, among others, to encode HTML source code
that is used to display characters in UCS-2; the character entity reference
&hellip; or &#8230;, one of its corresponding character references, requires
only US-ASCII as an encoding to be used, but it refers to a character in
UCS-2 and therefore requires UCS-2 as DCS to be displayed.
http://www.w3.org/TR/html401/charset.html
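To illustrate the difference, here is a minimal sketch in Python. The sample string is made up for this example, and html.unescape() merely stands in for what a user agent does when it resolves references; it is not meant as a description of any particular implementation. It uses the named reference &hellip; together with its numeric forms &#8230; and &#x2026;.

```python
import html

# Source text that consists of US-ASCII characters only (hypothetical sample).
source = "Wait&hellip; or wait&#8230; or wait&#x2026;"

# Encoding the source as US-ASCII succeeds: every character is below U+0080.
encoded = source.encode("ascii")

# Resolving the references yields U+2026, a character that US-ASCII cannot
# encode directly but that the document character set does contain.
resolved = html.unescape(encoded.decode("ascii"))
ellipsis_count = resolved.count("\u2026")

# The resolved text itself can no longer be encoded as US-ASCII.
try:
    resolved.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ellipsis_count, ascii_ok)
```

So the encoding only has to cover the characters of the source code, whereas the character that the reference refers to has to be in the document character set.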
If you want to say HTTP charset parameter, you could just do so.
I could, nevertheless this is but a synonym for the former, and a
misleading/confusing one when talking about the differences between the DCS
and the character encoding. You want to leave it to me which synonym I choose.
Or say encoding, because that's less ambiguous.
That would be wrong in this context, because it is not a synonym for the
former. The character encoding of a document resource may differ from what
was declared, which is the entire point of this thread.
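A minimal sketch in Python of what such a mismatch can look like. The byte string is made up for this example and is not taken from any document discussed in this thread; the assumption is a resource declared as ISO-8859-1 whose bytes were actually produced as Windows-1252.

```python
# Hypothetical bytes as they might arrive in an HTTP response body.
raw = b"caf\xe9 \x85"

# Decoded as declared (ISO-8859-1): 0x85 becomes the control character U+0085.
as_declared = raw.decode("iso-8859-1")

# Decoded as actually encoded (Windows-1252): 0x85 becomes U+2026.
as_cp1252 = raw.decode("cp1252")

# 0xE9 maps to U+00E9 in both encodings, so only the last character differs.
print(repr(as_declared[-1]), repr(as_cp1252[-1]))
```

The declaration and the actual encoding agree on most bytes here, which is why such a mismatch can go unnoticed for a long time.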
After all charset is like referer, it's too old to be fixed.
You are not making sense.
In the SGML declaration CHARSET means document character set, by the way
(and in the public identifier of a document type declaration, DTD means
public text class ‘document type declaration subset’, not ‘document type
definition’; so much jargon, so few shoulders to stand on).
You are only confusing more things.
http://www.w3.org/TR/html401/sgml/dtd.html
OTOH, I was just joking. When comparing comments you make and your
reactions on comments you get, you seem to be pretty undecided on
pedantry for its own sake after all. Good.
I know what I am talking about, whereas you obviously know only half the
things you are talking about. However unfortunate, the latter is not bad in
itself. But it becomes bad when it causes you to add just more confusion to
an already hard-to-explain issue, and to give bad advice.
Nonsense. It describes the range of legal SGML characters an SGML parser
has to be able to deal with.
There is no contradiction, that is included in "can be displayed with an
HTML document". "to display" in English does not have the sole meaning of
"to show on a screen" (compare "dargestellt durch" in German).
Displaying is the job of the application; the latter might even be able
to deal with non-SGML characters. Why else would it be valid to use
character references to non-SGML characters in the document instance set?
Using character references to represent a character (of the DCS) is included
in "can be displayed with an HTML document".
LOL. In *practice*, it's Windows 1252, at least with a western European
locale. Worked for me on Mac OS 9, several GNU/Linux distributions, OS X,
maybe even Windows (<- heads up, joke). Supposedly for many other people
too, even on Solaris, but that's just hearsay.
Nonsense. You really want to read the Specification about this:
http://www.w3.org/TR/html401/charset.html#h-5.2.2
Engineering tends to gravitate towards either reality or the bit bucket.
See above.
Don't be ridiculous. This was a statement, not a command. The explanation
for it followed below.
You should try reading ISO 8879 one day, to find out how it defines a
conforming application of SGML.
You should read the HTML 4.01 Specification more thoroughly, and test your
extended documents in some user agents once in a while (BTDT). What you
failed to observe to date is that the Specification prose is normative, too,
except where it is marked as informative. HTML is neither solely
defined by its DTD(s) nor is it implemented so.
http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2
The primary reason that I cannot do that is not because you say so (sorry
about that), but because the document type declaration subset is located
in the document type declaration, not the SGML declaration.
Nonsense. The document of an SGML application usually may contain the
declaration of an internal subset in its DOCTYPE declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd"
  [
    <!ATTLIST img
      onload CDATA #IMPLIED>
  ]>
But that is not allowed in HTML:
http://www.w3.org/TR/html401/struct/global.html#h-7.2
You are half right; because actual UAs don't support SGML, I've always
done that in the external subset (‘only five lines’).
<http://lists.w3.org/Archives/Public/www-validator/2006Sep/0010.html>
(the details of w3c validation service output are like the seasons,
subject to arbitrary change)
Only that this is no longer HTML as well, and therefore the least you can
expect is that user agents use Quirks mode:
,<http://validator.w3.org/check?uri=http://bednarz.nl/tmp/nobr/&ss=1>
|
| [...]
|
| Potential Issues
|
| The following missing or conflicting information caused the validator
| to perform guesswork prior to validation. If the guess or fallback is
| incorrect, it may make validation results entirely incoherent. It is
| highly recommended to check these potential issues, and, if necessary,
| fix them and re-validate the document.
|
| /!\ Warning Unable to Determine Parse Mode!
|
| The validator can process documents either as XML (for document types
| such as XHTML, SVG, etc.) or SGML (for HTML 4.01 and prior versions).
| For this document, the information available was not sufficient
| to determine the parsing mode unambiguously, because:
|
| * the MIME Media Type (text/html) can be used for XML or SGML
| document types
| * the Document Type (http://bednarz.nl/tmp/nobr/www.dtd) is not
| in the validator's catalog
| * No XML declaration (e.g <?xml version="1.0"?>) could be found
| at the beginning of the document.
|
| As a default, the validator is falling back to SGML mode.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PointedEars