Character data not allowed here? What?

Joshua Beall

Hi All,

The W3C's validator seems to return "character data not allowed here" for
HTML documents when you have a tag in <head> section that has no end tag,
and you put "/>" to close that tag. E.g., <meta http-equiv="Content-Type"
content="text/html;charset=iso-8859-1"/>

Here's an example page (ibm.com):


http://validator.w3.org/check?uri=http://jbeall.com/pages/dev/test.htm

The exact same document, only with the "/>" changed to ">" in the <head>
section

http://validator.w3.org/check?uri=h...automatically)&doctype=(detect+automatically)

Why is this? I knew that the "/>" for tags that do not have an end tag was
not *required* in HTML4, but I did not know it was *disallowed*?
Furthermore, why is it complaining about "character data"? There is no
character data, is there?

-Josh
 
Joshua Beall

Here's an example page (ibm.com):

Er, not ibm.com... sorry, did a revision with my own documents as examples,
and forgot to change that bit about ibm.com. I was going to use ibm.com as
an example because pointing the validator at their site does the same thing
I am talking about.
 
David Dorward

Joshua said:
The W3C's validator seems to return "character data not allowed here" for
HTML documents when you have a tag in <head> section that has no end tag,
and you put "/>" to close that tag. E.g., <meta http-equiv="Content-Type"
content="text/html;charset=iso-8859-1"/>

In XML <foo /> means <foo></foo>
In SGML <foo /> means <foo>&gt;

http://www.cs.tut.fi/~jkorpela/html/empty.html
 
Joshua Beall

David Dorward said:
In XML <foo /> means <foo></foo>
In SGML <foo /> means <foo>&gt;

So why does the validator not complain about <br/> and <img/>? I use both
of those all the time. I just tested <hr/> and it does, indeed, complain
for that one.

What gives?
 
David Dorward

Joshua said:
So why does the validator not complain about <br/> and <img/>? I use both
of those all the time. I just tested <hr/> and it does, indeed, complain
for that one.

What gives?

Anywhere that you are allowed a line break or an image, you are allowed
character data (i.e. a > character).

<!-- A > is not allowed here -->
<div>
<!-- A > is allowed here -->
</div>
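
A minimal Python sketch of the reading described above, assuming the HTML 4 interpretation in which an unquoted "/" inside a start tag closes that tag. The function name split_shorttag is hypothetical, and this is a toy model rather than a real SGML parser; it only shows why the trailing ">" is left over as character data.

```python
def split_shorttag(markup: str) -> tuple[str, str]:
    """Split one tag into (start tag, leftover text), toy SGML-style reading.

    Assumption: an unquoted "/" ends the start tag, as the thread describes
    for HTML 4. Whatever follows it is returned as leftover text.
    """
    if not markup.startswith("<"):
        raise ValueError("expected markup that begins with '<'")
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(markup[1:], start=1):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "/":
            # The solidus closes the tag; the rest is no longer markup.
            return markup[:i].rstrip() + ">", markup[i + 1:]
        elif ch == ">":
            # Ordinary HTML-style close: nothing is left over.
            return markup[:i + 1], markup[i + 1:]
    raise ValueError("unterminated tag")


if __name__ == "__main__":
    print(split_shorttag("<br/>"))  # leftover is the stray ">"
    print(split_shorttag("<br>"))   # leftover is empty
```

The slash inside a quoted attribute value such as content="text/html" is skipped, so only the final "/" before ">" triggers the split.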
 
Eric B. Bednarz

Gidday. Please consider fixing your quoting style by either upgrading
to a news reader or installing something like quotefix. Likewise, the
attribution line doesn't benefit from duplicating stuff like message
IDs. Thanks.

Joshua Beall said:
So why does the validator not complain about <br/> and <img/>? I use both
of those all the time.

While character data in the HEAD is always invalid, the situation for
children of BODY depends on the context. If BR and IMG appear as a
descendant of another element type in HTML 4 strict, the character '>'
is certainly valid, and in HTML 4 transitional character data is even
allowed as a child of BODY.

Joshua Beall said:
I just tested <hr/> and it does, indeed, complain
for that one.

Then you are using strict and HR is a child of body. Why would you want
to use incompatible syntax in the first place?
 
Jukka K. Korpela

Joshua Beall said:
So why does the validator not complain about <br/> and <img/>? I use
both of those all the time.

Because you use them in a context where character data _is_ allowed.

Joshua Beall said:
I just tested <hr/> and it does, indeed,
complain for that one.

It depends. If <hr/> appears directly as a sub-element of <body>, then
you get a syntax error when using Strict DTD, which disallows character
data directly inside <body>, but not when using Transitional DTD.
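
The context dependence described above can be sketched as a small lookup table in Python. The table name PCDATA_ALLOWED and the function stray_gt_allowed are hypothetical, and only a handful of HTML 4 parent elements are listed as an illustration; the DTDs themselves remain the authority.

```python
# (parent element, DTD flavour) -> may character data appear directly inside?
# Illustrative subset only, following the cases discussed in this thread.
PCDATA_ALLOWED = {
    ("head", "strict"): False,
    ("head", "transitional"): False,
    ("body", "strict"): False,
    ("body", "transitional"): True,
    ("div", "strict"): True,
    ("div", "transitional"): True,
    ("p", "strict"): True,
    ("p", "transitional"): True,
}


def stray_gt_allowed(parent: str, dtd: str) -> bool:
    """Return True if the ">" left over from an XHTML-style tag is valid here."""
    return PCDATA_ALLOWED[(parent.lower(), dtd.lower())]


if __name__ == "__main__":
    for parent, dtd in [("head", "transitional"), ("body", "strict"),
                        ("body", "transitional"), ("p", "strict")]:
        print(parent, dtd, stray_gt_allowed(parent, dtd))
```

Under these assumptions, <hr/> directly inside <body> fails only with the Strict DTD, while <meta .../> inside <head> fails with both.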
 
Joshua Beall

David Dorward said:
Anywhere that you are allowed a line break or an image, you are allowed
character data (i.e. a > character).

Ah, I see.

And all this time I thought a valid XHTML document would also be a valid
HTML4 document (aside from doctype changes and things of that sort). I've
been living a lie! What a bummer.
 
Dennis Marks

Jukka K. said:
Because you use them in a context where character data _is_ allowed.


It depends. If <hr/> appears directly as a sub-element of <body>, then
you get a syntax error when using Strict DTD, which disallows character
data directly inside <body>, but not when using Transitional DTD.

I thought a space was required before the slash <br />.
Am I wrong?
 
David Dorward

Dennis said:
I thought a space was required before the slash <br />.
Am I wrong?

It is not required in XHTML, but it is required in the HTML compatibility
guidelines (Appendix C of the XHTML 1.0 spec).
 
Toby A Inkster

David said:
In XML <foo /> means <foo></foo>
In SGML <foo /> means <foo>&gt;

Close, but no cigar.

In SGML, '<foo />' can mean '<foo></foo>' *or* '<foo>&gt;' depending on
various gubbins in the DTD. Specifically, if the SHORTTAGS feature is
switched on by the DTD (which is the case for HTML) and the DTD defines
'foo' as an empty element (which is the case for link and meta in HTML),
the latter interpretation is correct.

This is precisely why XML was invented -- it allows you to get an idea of
the structure of a document without having to look at the DTD.
 
Eric B. Bednarz

Toby A Inkster said:
Close, but no cigar.

Smoking can clog the brains.

Toby A Inkster said:
In SGML, '<foo />' can mean '<foo></foo>' *or* '<foo>&gt;' depending on
various gubbins in the DTD.

Don't hesitate to elaborate on that. (In SGML, ':' *could* be NESTC and
')' NET, which effectively allows you to close instances of element
types with a content model of EMPTY with a smiley if EMPTYNRM YES is
specified in the SGML declaration as well.)

Toby A Inkster said:
Specifically, if the SHORTTAGS feature is
switched on by the DTD (which is the case for HTML) and the DTD defines
'foo' as an empty element (which is the case for link and meta in HTML),
the latter interpretation is correct.

I wonder how you would accomplish either of both cases without SHORTTAG
features, Annex K taken in account or not, and precisely what the
relevance of the content model EMPTY is for the latter.

I also wonder what 'the DTD' means in this season.
 
Toby A Inkster

Eric said:
Don't hesitate to elaborate on that. (In SGML, ':' *could* be NESTC and
')' NET, which effectively allows you to close instances of element
types with a content model of EMPTY with a smiley if EMPTYNRM YES is
specified in the SGML declaration as well.)

True, but this is not relevant to the subject of this newsgroup, where
NESTC is effectively always '/' and NET always '>'.

Eric said:
I wonder how you would accomplish either of both cases without SHORTTAG
features

Both cases require SHORTTAG features, I never claimed otherwise. Really,
the difference between the cases is that HTML 'meta' and 'link' are
defined as empty elements with no end tag whereas in XML all elements must
be closed explicitly.

You clearly know a lot more about SGML than I do -- I only really tinker
with the bits relevant to XML, but my point stands -- that "In SGML <foo
/> means <foo>&gt;" is not necessarily correct. In SGML, one can't know
what <foo /> means until you know which delimiters and optional features
are being used, etc. It may mean <foo>&gt;, it may mean <foo></foo>, it
may mean something else entirely.
 
Steve Pugh

Joshua Beall said:
The W3C's validator seems to return "character data not allowed here" for
HTML documents when you have a tag in <head> section that has no end tag,
and you put "/>" to close that tag. E.g., <meta http-equiv="Content-Type"
content="text/html;charset=iso-8859-1"/>

Yep, why would you use XHTML syntax in an HTML page?

And why didn't you check the validator FAQ:
http://validator.w3.org/docs/help.html#faq-linkandmeta

Joshua Beall said:
Why is this? I knew that the "/>" for tags that do not have an end tag was
not *required* in HTML4, but I did not know it was *disallowed*?

It's not disallowed, but <foo /> means something very different in
HTML to what it means in XHTML.

See http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.7

Joshua Beall said:
Furthermore, why is it complaining about "character data"? There is no
character data, is there?

The > is character data.

The / closes the tag and hence > is left as character data in the
<head> of the page which is an error. A similar construct in the
<body> will not usually trigger an error as character data is usually
allowed there.

Steve
 
Eric B. Bednarz

Toby A Inkster said:
True, but this is not relevant to the subject of this newsgroup, where
NESTC is effectively always '/' and NET always '>'.

I think you are mixing something up here, this group is loosely about
HTML, not XML. For the SGML declaration of the latter, your statement
is correct. For HTML the NET delimiter is a matching pair of *one*
character, namely the solidus (prior to Web SGML Adaptations NESTC doesn't
exist, and the W3C in all of its wisdom decided not to take advantage of
Annex K for HTML).

This is <i/HTML/ (might even partly work in your news client ;-).

Toby A Inkster said:
[...] but my point stands -- that "In SGML <foo
/> means <foo>&gt;" is not necessarily correct.

David could admittedly better have written something like 'SGML
applications using the reference concrete syntax' -- which applies to
the subject of this newsgroup anyway. On the other hand, correct
terminology is a very cumbersome subject; you can also frequently
witness otherwise smart regulars dumping phrases like 'the <foo> tag',
which is utter complete nonsense with infinite half life time, but at
least everybody understands what that means.

(Either of

* start tag of an instance of the element type FOO
* an instance of the element type FOO
* the element type FOO

:)


Cheers
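
For the <i/HTML/ form shown in this post, here is a rough Python sketch of the expansion, assuming the solidus acts as both the opening and the closing short-tag delimiter as described above. The function name expand_net is hypothetical; the pattern ignores attributes and is only meant for elements that have content, so it is not a substitute for an SGML parser.

```python
import re

# "<name/" opens the element, and the next "/" closes it (toy pattern only).
NET_FORM = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/]*)/")


def expand_net(markup: str) -> str:
    """Rewrite <name/content/ into the long form <name>content</name>."""
    return NET_FORM.sub(r"<\1>\2</\1>", markup)


if __name__ == "__main__":
    print(expand_net("This is <i/HTML/"))
```

Text without a "<name/" opener passes through unchanged, which is all this sketch tries to guarantee.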
 
Mark Parnell

Eric B. Bednarz said:
Likewise, the
attribution line doesn't benefit from duplicating stuff like message
IDs. Thanks.

Yes it does. If I don't have the message he is replying to, I can click
on the message ID and attempt to download it.

(Yes, I know I don't quote the message ID, but I do find it useful
sometimes when others do.) :)
 
Eric B. Bednarz

Mark Parnell said:
Yes it does.

No it doesn't. This isn't a Monty Python sketch, is it?-)

Mark Parnell said:
If I don't have the message he is replying to, I can click
on the message ID and attempt to download it.

Oh. I find all IDs of preceding messages of the average Usenet threat
conveniently where I'd expect them without that. Besides, many news
clients have at least more or less prominent commands to fetch the
previous message and to restore the whole threat (the spelling may
differ in some hierarchies) up to patient 0.

Mark Parnell said:
(Yes, I know I don't quote the message ID

There's no point in duplicating header information about date, time and
news groups -- except when crossposting (and|or) setting a follow-up --
either. If you think you must, that's fine, but mostly that's just a
default configuration that adds to the general noise and lowers
readability, especially when several quoted parties of a thread are
attributed with two or more lines (if you wanted to actually improve
things, you could append a standard keyword like 'wrote' or 'writes'
after the name/address to enable collapsing of quote levels in several
agents).
 
Mark Parnell

Eric B. Bednarz said:
No it doesn't. This isn't a Monty Python sketch, is it?-)

Not that I know of. ;-)

Eric B. Bednarz said:
Oh. I find all IDs of preceding messages of the average Usenet threat
conveniently where I'd expect them without that.

In the message headers? It takes up too much room to display all the
headers on screen all the time, so it is easier to have the message ID
there in the message than to have to display the headers - but I see
your point.

Eric B. Bednarz said:
Besides, many news
clients have at least more or less prominent commands to fetch the
previous message and to restore the whole threat (the spelling may
differ in some hierarchies) up to patient 0.

Mine doesn't seem to, (though maybe I just haven't looked hard enough)
but that's obviously a problem with my newsreader; nothing to do with
the messages I am reading. :)

And now for something completely different...
 
