May a CDATA section appear in an attribute value?

Jon Noring

Out of curiosity, may a CDATA section appear within an attribute
value with datatype CDATA? And if so, how about other attribute
value datatypes which accept the XML markup characters?

To me, the XML specification seems a little ambiguous on this, so
I defer to the XML authorities. Refer to sections 2.4 and 2.7 (it all
hinges on if CDATA attribute values are part of markup or not.)

Thanks.

Jon
 
mgungora

As I understand from the XML 1.0 spec, an attribute value is a kind of
literal which cannot contain < or & (unless the & begins a
reference). So, the answer is "no".

Regards,
-murat
 
Peter Flynn

Jon said:
Out of curiosity, may a CDATA section appear within an attribute
value with datatype CDATA?

No. You can't have declaration markup in attribute values.

And if so, how about other attribute
value datatypes which accept the XML markup characters?

No attribute types allow element or declaration markup in their values.

To me, the XML specification seems a little ambiguous on this,

No, it's quite specific: Production 41, Well-Formedness Constraint:
"No < in Attribute Values"

I defer to the XML authorities. Refer to sections 2.4 and 2.7 (it all
hinges on if CDATA attribute values are part of markup or not.)

It doesn't really have anything at all to do with CDATA attribute
values. There is an unfortunate (hereditary) semantic distinction
between what CDATA means in attribute declarations and what CDATA
means in Marked Sections, which you probably don't want to investigate
unless you're a masochist (but it doesn't even have much to do with
that either :)

It's a restriction in XML that you cannot have the open-angle bracket
in an attribute value. Period. Not for any reason. (You *could* do
this in SGML, but this was one of the sacrifices we had to make to
get a more extensible and easily-programmed language).
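
As a rough illustration of the restriction (a sketch only, using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for this example), a literal < in an attribute value is rejected, and so is a CDATA section, since it begins with <:

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc):
    """Return True if the parser accepts doc, False if it reports an error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# A literal "<" inside an attribute value breaks well-formedness.
print(is_well_formed('<foo bar="is x < y ?"/>'))          # False
# A CDATA section starts with "<", so it fails for the same reason.
print(is_well_formed('<foo bar="<![CDATA[x < y]]>"/>'))   # False
# The predefined entity reference is accepted.
print(is_well_formed('<foo bar="is x &lt; y ?"/>'))       # True
```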

If you could give us some idea of what you wanted this for, perhaps
there is another way to solve the problem.

///Peter
 
Jon Noring

Peter Flynn answered:
Jon Noring asked:
... No, it's quite specific: Production 41, Well-Formedness
Constraint: "No < in Attribute Values"

It's a restriction in XML that you cannot have the open-angle
bracket in an attribute value. Period. Not for any reason. (You
*could* do this in SGML, but this was one of the sacrifices we had
to make to get a more extensible and easily-programmed language).

Thanks! Somehow I missed that particular well-formedness constraint
given in production 41. This constraint clearly trumps any other
ambiguities that there may be about using a CDATA section within
attribute values. No question about it -- CDATA sections must not
appear in attribute values.

Now, to address a slightly different issue, in my reading of that
constraint, it seems like the "<" character may not literally appear
(not as part of any markup) in an attribute value, whether directly
encoded, as a numeric character reference, or as part of a defined
general entity. It leaves out the ability of XML document authors to
use that character, in a literal fashion, within attribute values of
datatype CDATA. For example, this appears to not be allowed (where
"&#60;" == "<"):

<header title="Is A &#60; B?"> ... </header>

If you could give us some idea of what you wanted this for, perhaps
there is another way to solve the problem.

I don't have a particular problem. Rather it's simply trying to gain
a thorough understanding of using CDATA sections in XML documents
from an XML document authoring perspective.

But since you mention it, I am curious to know how an XML document
author may include the literal "<" character in a CDATA attribute
value. As noted above, it does not appear it is possible. Assuming
this indeed is the case, then the only way I can think of to get
around this would be to use a similar Unicode character. For example,
from the Unicode code charts the following are similar
characters:

U+2039 single left-pointing angle quotation mark
U+2329 left-pointing angle bracket
U+27E8 mathematical left angle bracket
U+3008 left angle bracket

But this kludge is still not very satisfying and has presentational
issues.

Thanks.

Jon Noring
 
Andrew Thompson

Jon said:
But since you mention it, I am curious to know how an XML document
author may include the literal "<" character in a CDATA attribute
value. As noted above, it does not appear it is possible.

<WAG>
Convert to an HTML entity? E.G. < = &lt;
</WAG>
 
Jon Noring

Andrew said:
Jon Noring wrote:
<WAG>
Convert to an HTML entity? E.G. < = &lt;
</WAG>

My prior message noted what the XML 1.0 Spec seems to say about
putting a literal "<" character into an attribute value: it appears
that it can't be done, even with an entity reference.

Here's the relevant section in XML 1.0:

http://www.w3.org/TR/REC-xml/#sec-starttags

Which says:

"Well-formedness constraint: No < in Attribute Values

"The replacement text of any entity referred to directly or
indirectly in an attribute value MUST NOT contain a <."


Now, being a little dense at times, maybe I'm misinterpreting what
the XML spec is saying, but it seems to me that the "<" character may
*never* appear in the attribute value of a well-formed XML document no
matter how it is done, encoded, directly and indirectly.

Am I right?

Jon
 
Richard Tobin

Jon Noring said:
"The replacement text of any entity referred to directly or
indirectly in an attribute value MUST NOT contain a <."

The replacement text of the lt entity is &#60; which does
not contain a <. Note that &#60; is a character reference,
not an entity reference. You can also use &#60; directly in
attributes.

-- Richard
 
Jon Noring

Richard said:
Jon Noring wrote:
The replacement text of the lt entity is &#60; which does
not contain a <. Note that &#60; is a character reference,
not an entity reference. You can also use &#60; directly in
attributes.

Yes, the XML spec does note that a numeric character reference is not
an entity, nor is "&lt;", which is called a "string" even though its
structure suggests an entity reference.

In addition, the original 1998 XML spec, in rule 41, specifically
notes the following:

"The replacement text of any entity referred to directly or
indirectly in an attribute value (other than "&lt;") must not
contain a <."

So, the original intent was to allow "&lt;" to represent the "<"
character in attribute values (and by section 2.4 also allow the
numeric character reference of &#60; / &#x3C; ). Tim Bray
commented on the above constraint in his well-known Annotated XML
Specification: http://www.xml.com/axml/notes/NoLTinAtt.html

"Banishing the < ... This rule might seem a bit unnecessary, on
the face of it. Since you can't have tags in attribute values,
having an < can hardly be confusing, so why ban it?

"This is another attempt to make life easy for the DPH ["Desperate
Perl Hacker"]. The rule in XML is simple: when you're reading text,
and you hit a <, then that's a markup delimiter. Not just
sometimes, always. When you want one in the data, you have to use
&lt;. Not just sometimes, always. In attribute values too.

"This rule has another unintended beneficial side-effect; it makes
the catching of certain errors much easier. Suppose you have a
chunk of XML as follows:

<a href="notes.html> <img src='notes.gif'></a>

"Notice that the notes.html is missing its closing quote. Without
the no-&lt; rule, it would be really hard to detect this problem
and issue a reasonable error message. Since attribute values can
contain almost anything, no error would be detected until the
processor finds the next quotation mark. Instead, you get an error
message the first time you hit a <, which in the example above, as
in many cases, is almost immediately."
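
The early error described in the quoted annotation can be observed with a small sketch like the following. It is not part of the annotation; it assumes Python's standard xml.etree.ElementTree module, and the helper name first_error is invented here. ParseError.position holds the (line, column) at which the parser gave up.

```python
import xml.etree.ElementTree as ET

# The example from the annotation: the href value is missing its closing quote.
broken = """<a href="notes.html> <img src='notes.gif'></a>"""

def first_error(doc):
    """Return (line, column) of the first parse error, or None if doc parses."""
    try:
        ET.fromstring(doc)
        return None
    except ET.ParseError as err:
        return err.position

# The parser stops inside the unterminated attribute value instead of
# running on to the next quotation mark.
print(first_error(broken))
```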


So, from the possibilities list I previously posted:

1) <foo bar="is x < y ?">

2) <foo bar="is x &lt; y ?">

3) <foo bar="is x &#60; y ?">

4) <foo bar="is x &lessthan; y ?">

a) where in the DTD we have <!ENTITY lessthan "<">

b) where in the DTD we have <!ENTITY lessthan "&lt;">

c) where in the DTD we have <!ENTITY lessthan "&#60;">


It would seem like all are permissible except for #1 and #4a since
they involve the literal "<" character.

Am I right on this?

Thanks.

Jon
 
Peter Flynn

Jon said:
Peter Flynn answered:


Thanks! Somehow I missed that particular well-formedness constraint
given in production 41. This constraint clearly trumps any other
ambiguities that there may be about using a CDATA section within
attribute values. No question about it -- CDATA sections must not
appear in attribute values.

It's more fundamental than that: CDATA sections are for enclosing
pieces of your document *text* that contain markup characters < and &
that you do not want to be interpreted as markup. For example:

<para>To create the header of your web page, type the following:</para>
<programlisting><![CDATA[
<html>
<head>
<title>My first web page</title>
</head>
]]></programlisting>
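
To see that the contents of a CDATA section reach the application as plain text, here is a minimal sketch (assuming Python's standard xml.etree.ElementTree; the shortened listing is made up for the example):

```python
import xml.etree.ElementTree as ET

# A shortened version of the listing above, wrapped in a CDATA section.
doc = "<programlisting><![CDATA[<title>My first web page</title>]]></programlisting>"
elem = ET.fromstring(doc)

# The markup characters arrive as character data, not as child elements.
print(elem.text)   # <title>My first web page</title>
print(len(elem))   # 0
```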

I'm curious to know how the question could arise of such data appearing
in an attribute value. It's always very helpful to documentation writers
to understand the thought-processes or reading experiences that lie
behind people's acquisition of knowledge, because it's something that
rarely comes to light, and it can help make documentation more useful.
(If you have the time to explain...offline :)

Now, to address a slightly different issue, in my reading of that
constraint, it seems like the "<" character may not literally appear
(not as part of any markup) in an attribute value,

Correct.

whether directly
encoded, as a numeric character reference, or as part of a defined
general entity.

The restriction is only on the literal < character itself. The character
entity reference &lt; and the decimal or hexadecimal equivalent are
perfectly valid in CDATA attribute values (indeed some document types
actually rely on this).

It leaves out the ability of XML document authors to
use that character, in a literal fashion, within attribute values of
datatype CDATA. For example, this appears to not be allowed (where
"&#60;" == "<"):

<header title="Is A &#60; B?"> ... </header>

No, that's perfectly valid. So is title="Is A&lt;C" (assuming lt is
declared, either explicitly or implicitly).
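
A small sketch of the point, assuming Python's standard xml.etree.ElementTree (the helper name title_of is hypothetical): the predefined entity reference and both numeric forms all deliver the same < character to the application.

```python
import xml.etree.ElementTree as ET

def title_of(doc):
    """Parse doc and return the value of its title attribute."""
    return ET.fromstring(doc).get("title")

# Entity reference, decimal character reference, hexadecimal character reference.
for ref in ("&lt;", "&#60;", "&#x3C;"):
    print(title_of('<header title="Is A %s B?"/>' % ref))   # Is A < B?
```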

As I mentioned, SGML allowed markup start characters in attributes,

I don't have a particular problem. Rather it's simply trying to gain
a thorough understanding of using CDATA sections in XML documents
from an XML document authoring perspective.

OK...the objective is as above: to stop the parser from interpreting
markup characters as markup. In a CDATA section, < and & are just
text.

But since you mention it, I am curious to know how an XML document
author may include the literal "<" character in a CDATA attribute
value.

As &lt; or the numeric equivalent.

///Peter
 
Jon Noring

Peter said:
[explaining about the issue of "<" in attribute values]

Peter, thanks! You've clarified the issue very well. Very
valuable information.

Jon
Peter said:
[explaining about the issue of "<" in attribute values
in two separate messages.]

Peter, thanks! You've clarified the issue very well. Very
valuable information.

Jon
 
Richard Tobin

Jon Noring said:
So, from the possibilities list I previously posted:

1) <foo bar="is x < y ?">

2) <foo bar="is x &lt; y ?">

3) <foo bar="is x &#60; y ?">

4) <foo bar="is x &lessthan; y ?">

a) where in the DTD we have <!ENTITY lessthan "<">

b) where in the DTD we have <!ENTITY lessthan "&lt;">

c) where in the DTD we have <!ENTITY lessthan "&#60;">


It would seem like all are permissible except for #1 and #4a since
they involve the literal "<" character.

4c is also illegal, because character references (unlike entity
references) are expanded at entity definition time, so the replacement
text of lessthan contains a real "<" character.

This would be legal: <!ENTITY lessthan "&#38;#x003C;"> since its
replacement text is "&#x003C;".
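
The declarations discussed in this thread can be tried out with a sketch along these lines. It assumes Python's standard xml.etree.ElementTree (which sits on the expat parser); the helper name bar_value is made up, and the last case uses a double-escaped declaration in the style the XML spec uses for its own lt entity.

```python
import xml.etree.ElementTree as ET

def bar_value(entity_literal):
    """Declare `lessthan` with the given literal, reference it in an
    attribute, and return the attribute value (None if not well-formed)."""
    doc = (
        '<!DOCTYPE foo [<!ENTITY lessthan "%s">]>'
        '<foo bar="is x &lessthan; y ?"/>' % entity_literal
    )
    try:
        return ET.fromstring(doc).get("bar")
    except ET.ParseError:
        return None

print(bar_value("<"))             # 4a: None (replacement text contains "<")
print(bar_value("&lt;"))          # 4b: is x < y ?
print(bar_value("&#60;"))         # 4c: None (expanded to "<" at declaration)
print(bar_value("&#38;#x003C;"))  # double-escaped: is x < y ?
```

The difference between the last two cases is only in when the character reference is expanded: in 4c it is expanded when the entity is declared, while in the double-escaped form only the &#38; is expanded then, leaving a character reference in the replacement text.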

-- Richard
 
