May a CDATA section appear in an attribute value?

Discussion in 'XML' started by Jon Noring, Nov 14, 2005.

  1. Jon Noring

    Jon Noring Guest

    Out of curiosity, may a CDATA section appear within an attribute
    value with datatype CDATA? And if so, how about other attribute
    value datatypes which accept the XML markup characters?

    To me, the XML specification seems a little ambiguous on this, so
    I defer to the XML authorities. Refer to sections 2.4 and 2.7 (it all
    hinges on if CDATA attribute values are part of markup or not.)

    Thanks.

    Jon
    Jon Noring, Nov 14, 2005
    #1

  2. Jon Noring

    mgungora Guest

    As I understand from the XML 1.0 spec, attribute value is a kind of a
    literal which cannot start with ...<... or ...&...(unless it's a
    reference). So, the answer is "no".

    Regards,
    -murat
    mgungora, Nov 14, 2005
    #2

  3. Jon Noring

    Peter Flynn Guest

    Jon Noring wrote:

    > Out of curiosity, may a CDATA section appear within an attribute
    > value with datatype CDATA?


    No. You can't have declaration markup in attribute values.

    > And if so, how about other attribute
    > value datatypes which accept the XML markup characters?


    No attribute types allow element or declaration markup in their values.

    > To me, the XML specification seems a little ambiguous on this,


    No, it's quite specific: Production 41, Well-Formedness Constraint:
    "No < in Attribute Values"

    > I defer to the XML authorities. Refer to sections 2.4 and 2.7 (it all
    > hinges on if CDATA attribute values are part of markup or not.)


    It doesn't really have anything at all to do with CDATA attribute
    values. There is an unfortunate (hereditary) semantic distinction
    between what CDATA means in attribute declarations and what CDATA
    means in Marked Sections, which you probably don't want to investigate
    unless you're a masochist (but it doesn't even have much to do with
    that either :)

    It's a restriction in XML that you cannot have the open-angle bracket
    in an attribute value. Period. Not for any reason. (You *could* do
    this in SGML, but this was one of the sacrifices we had to make to
    get a more extensible and easily-programmed language).
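
    A quick way to see this restriction in practice is to hand such a
    document to a conforming parser. The sketch below assumes Python's
    standard xml.etree.ElementTree module (which uses expat); the
    sample documents and the is_well_formed helper are made up for
    illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: a CDATA section placed inside an attribute value.
bad = '<header title="<![CDATA[Is A < B?]]>"/>'
# The same element with an ordinary attribute value, for comparison.
good = '<header title="Is A B?"/>'


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(bad))   # the "<" that opens the CDATA section is rejected
print(is_well_formed(good))
```

    The parser never gets as far as recognising a CDATA section: the
    first "<" inside the quotes is already a well-formedness error.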

    If you could give us some idea of what you wanted this for, perhaps
    there is another way to solve the problem.

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
    Peter Flynn, Nov 14, 2005
    #3
  4. Jon Noring

    Jon Noring Guest

    Peter Flynn answered:
    > Jon Noring asked:


    >> Out of curiosity, may a CDATA section appear within an attribute
    >> value with datatype CDATA?


    > ... No, it's quite specific: Production 41, Well-Formedness
    > Constraint: "No < in Attribute Values"
    >
    > It's a restriction in XML that you cannot have the open-angle
    > bracket in an attribute value. Period. Not for any reason. (You
    > *could* do this in SGML, but this was one of the sacrifices we had
    > to make to get a more extensible and easily-programmed language).


    Thanks! Somehow I missed that particular well-formedness constraint
    given in production 41. This constraint clearly trumps any other
    ambiguities that there may be about using a CDATA section within
    attribute values. No question about it -- CDATA sections must not
    appear in attribute values.

    Now, to address a slightly different issue, in my reading of that
    constraint, it seems like the "<" character may not literally appear
    (not as part of any markup) in an attribute value, whether directly
    encoded, as a numeric character reference, or as part of a defined
    general entity. It leaves out the ability of XML document authors to
    use that character, in a literal fashion, within attribute values of
    datatype CDATA. For example, this appears to not be allowed (where
    "&#60;" == "<"):

    <header title="Is A &#60; B?"> ... </header>


    > If you could give us some idea of what you wanted this for, perhaps
    > there is another way to solve the problem.


    I don't have a particular problem. Rather it's simply trying to gain
    a thorough understanding of using CDATA sections in XML documents
    from an XML document authoring perspective.

    But since you mention it, I am curious to know how an XML document
    author may include the literal "<" character in a CDATA attribute
    value. As noted above, it does not appear it is possible. Assuming
    this indeed is the case, then the only way I can think of to get
    around this would be to use a similar Unicode character. For example,
    from the Unicode Basic Latin script chart the following are similar
    characters:

    x2039 single left-pointing angle quotation
    x2329 left-pointing angle bracket
    x27E8 mathematical left angle bracket
    x3008 left angle bracket

    But this kludge is still not very satisfying and has presentational
    issues.

    Thanks.

    Jon Noring
    Jon Noring, Nov 15, 2005
    #4
  5. Jon Noring wrote:

    > But since you mention it, I am curious to know how an XML document
    > author may include the literal "<" character in a CDATA attribute
    > value. As noted above, it does not appear it is possible.


    <WAG>
    Convert to an HTML entity? E.G. < = &lt;
    </WAG>
    Andrew Thompson, Nov 15, 2005
    #5
  6. Jon Noring

    Jon Noring Guest

    Putting a "<" in an attribute value (was about CDATA sections)

    Andrew Thompson wrote:
    >Jon Noring wrote:


    >> But since you mention it, I am curious to know how an XML document
    >> author may include the literal "<" character in a CDATA attribute
    >> value. As noted above, it does not appear it is possible.


    > <WAG>
    > Convert to an HTML entity? E.G. < = &lt;
    > </WAG>


    My prior message noted what the XML 1.0 Spec seems to say about
    putting a literal "<" character into an attribute value: it appears
    that it can't be done, even with an entity reference.

    Here's the relevant section in XML 1.0:

    http://www.w3.org/TR/REC-xml/#sec-starttags

    Which says:

    "Well-formedness constraint: No < in Attribute Values

    "The replacement text of any entity referred to directly or
    indirectly in an attribute value MUST NOT contain a <."


    Now, being a little dense at times, maybe I'm misinterpreting what
    the XML spec is saying, but it seems to me that the "<" character may
    *never* appear in the attribute value of a well-formed XML document no
    matter how it is done, encoded, directly and indirectly.

    Am I right?

    Jon
    Jon Noring, Nov 15, 2005
    #6
  7. Re: Putting a "<" in an attribute value (was about CDATA sections)

    Jon Noring wrote:

    > "The replacement text of any entity referred to directly or
    > indirectly in an attribute value MUST NOT contain a <."


    The replacement text of the lt entity is &#60; which does
    not contain a <. Note that &#60; is a character reference,
    not an entity reference. You can also use &#60; directly in
    attributes.
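
    To illustrate the point, here is a minimal sketch, assuming
    Python's standard xml.etree.ElementTree module. It parses three
    hypothetical one-element documents whose attribute spells the
    less-than sign as the predefined entity reference, a decimal
    character reference, and a hexadecimal character reference.

```python
import xml.etree.ElementTree as ET

# Three ways to spell "<" inside an attribute value without a literal "<".
docs = [
    '<header title="Is A &lt; B?"/>',    # predefined entity reference
    '<header title="Is A &#60; B?"/>',   # decimal character reference
    '<header title="Is A &#x3C; B?"/>',  # hexadecimal character reference
]

# Each document parses, and the application sees the same attribute value.
values = [ET.fromstring(doc).get("title") for doc in docs]
for value in values:
    print(value)
```

    All three are accepted, and the value handed to the application
    contains a real "<" in each case.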

    -- Richard
    Richard Tobin, Nov 15, 2005
    #7
  8. Jon Noring

    Jon Noring Guest

    Re: Putting a "<" in an attribute value (was about CDATA sections)

    Richard Tobin wrote:
    > Jon Noring wrote:


    >> "The replacement text of any entity referred to directly or
    >> indirectly in an attribute value MUST NOT contain a <."


    > The replacement text of the lt entity is &#60; which does
    > not contain a <. Note that &#60; is a character reference,
    > not an entity reference. You can also use &#60; directly in
    > attributes.


    Yes, the XML spec does note that a numeric character reference is not
    an entity, nor is "&lt;", which is called a "string" even though its
    structure suggests an entity reference.

    In addition, the original 1998 XML spec, in rule 41, specifically
    notes the following:

    "The replacement text of any entity referred to directly or
    indirectly in an attribute value (other than "&lt;") must not
    contain a <."

    So, the original intent was to allow "&lt;" to represent the "<"
    character in attribute values (and by section 2.4 also allow the
    numeric character reference of &#60; / &#x3C;). Tim Bray
    commented on the above constraint in his well-known Annotated XML
    Specification: http://www.xml.com/axml/notes/NoLTinAtt.html

    "Banishing the < ... This rule might seem a bit unnecessary, on
    the face of it. Since you can't have tags in attribute values,
    having an < can hardly be confusing, so why ban it?

    "This is another attempt to make life easy for the DPH ["Desperate
    Perl Hacker"]. The rule in XML is simple: when you're reading text,
    and you hit a <, then that's a markup delimiter. Not just
    sometimes, always. When you want one in the data, you have to use
    &lt;. Not just sometimes, always. In attribute values too.

    "This rule has another unintended beneficial side-effect; it makes
    the catching of certain errors much easier. Suppose you have a
    chunk of XML as follows:

    <a href="notes.html> <img src='notes.gif'></a>

    "Notice that the notes.html is missing its closing quote. Without
    the no-&lt; rule, it would be really hard to detect this problem
    and issue a reasonable error message. Since attribute values can
    contain almost anything, no error would be detected until the
    processor finds the next quotation mark. Instead, you get an error
    message the first time you hit a <, which in the example above, as
    in many cases, is almost immediately."
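
    The error-catching benefit described in the quoted annotation can
    be tried out directly. This is only a sketch, assuming Python's
    standard xml.etree.ElementTree module; the first_error helper is
    made up for illustration, and the exact column reported may vary
    between parsers.

```python
import xml.etree.ElementTree as ET

# The example from the annotation: href is missing its closing quote.
broken = """<a href="notes.html> <img src='notes.gif'></a>"""


def first_error(doc):
    """Return the (line, column) of the first parse error, or None if none."""
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        return err.position
    return None


print(first_error(broken))
```

    The parser stops inside the first line instead of scanning ahead
    for the next quotation mark.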


    So, from the possibilities list I previously posted:

    1) <foo bar="is x < y ?">

    2) <foo bar="is x &lt; y ?">

    3) <foo bar="is x &#60; y ?">

    4) <foo bar="is x &lessthan; y ?">

    a) where in the DTD we have <!ENTITY lessthan "<">

    b) where in the DTD we have <!ENTITY lessthan "&lt;">

    c) where in the DTD we have <!ENTITY lessthan "&#60;">


    It would seem like all are permissible except for #1 and #4a since
    they involve the literal "<" character.

    Am I right on this?

    Thanks.

    Jon
    Jon Noring, Nov 15, 2005
    #8
  9. Jon Noring

    Peter Flynn Guest

    Jon Noring wrote:

    > Peter Flynn answered:
    >> Jon Noring asked:

    >
    >>> Out of curiosity, may a CDATA section appear within an attribute
    >>> value with datatype CDATA?

    >
    >> ... No, it's quite specific: Production 41, Well-Formedness
    >> Constraint: "No < in Attribute Values"
    >>
    >> It's a restriction in XML that you cannot have the open-angle
    >> bracket in an attribute value. Period. Not for any reason. (You
    >> *could* do this in SGML, but this was one of the sacrifices we had
    >> to make to get a more extensible and easily-programmed language).

    >
    > Thanks! Somehow I missed that particular well-formedness constraint
    > given in production 41. This constraint clearly trumps any other
    > ambiguities that there may be about using a CDATA section within
    > attribute values. No question about it -- CDATA sections must not
    > appear in attribute values.


    It's more fundamental than that: CDATA sections are for enclosing
    pieces of your document *text* that contain markup characters < and &
    that you do not want to be interpreted as markup. For example:

    <para>To create the header of your web page, type the following:</para>
    <programlisting><![CDATA[
    <html>
    <head>
    <title>My first web page</title>
    </head>
    ]]></programlisting>
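
    As a rough check of what the parser does with such a section, the
    sketch below (assuming Python's standard xml.etree.ElementTree
    module) parses the programlisting element on its own and looks at
    what comes out.

```python
import xml.etree.ElementTree as ET

# The programlisting element from the example, as a standalone document.
doc = """<programlisting><![CDATA[
<html>
<head>
<title>My first web page</title>
</head>
]]></programlisting>"""

listing = ET.fromstring(doc)
print(len(listing))   # number of child elements found inside programlisting
print(listing.text)   # the CDATA content, delivered as plain character data
```

    The tags inside the CDATA section are not parsed as elements; they
    arrive as ordinary text.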

    I'm curious to know how the question could arise of such data appearing
    in an attribute value. It's always very helpful to documentation writers
    to understand the thought-processes or reading experiences that lie
    behind people's acquisition of knowledge, because it's something that
    rarely comes to light, and it can help make documentation more useful.
    (If you have the time to explain...offline :)

    > Now, to address a slightly different issue, in my reading of that
    > constraint, it seems like the "<" character may not literally appear
    > (not as part of any markup) in an attribute value,


    Correct.

    > whether directly
    > encoded, as a numeric character reference, or as part of a defined
    > general entity.


    The restriction is only on the literal < character itself. The character
    entity reference &lt; and the decimal or hexadecimal equivalent are
    perfectly valid in CDATA attribute values (indeed some document types
    actually rely on this).

    > It leaves out the ability of XML document authors to
    > use that character, in a literal fashion, within attribute values of
    > datatype CDATA. For example, this appears to not be allowed (where
    > "&#60;" == "<"):
    >
    > <header title="Is A &#60; B?"> ... </header>


    No, that's perfectly valid. So is title="Is A&lt;C" (assuming lt is
    declared, either explicitly or implicitly).

    As I mentioned, SGML allowed markup start characters in attributes,
    so <header title="Is A<C; B?">...</> would be OK in SGML. But to make
    it easier to write software for XML this feature was withdrawn in XML.

    >> If you could give us some idea of what you wanted this for, perhaps
    >> there is another way to solve the problem.

    >
    > I don't have a particular problem. Rather it's simply trying to gain
    > a thorough understanding of using CDATA sections in XML documents
    > from an XML document authoring perspective.


    OK...the objective is as above: to stop the parser from interpreting
    markup characters as markup. In a CDATA section, < and & are just
    text.

    > But since you mention it, I am curious to know how an XML document
    > author may include the literal "<" character in a CDATA attribute
    > value.


    As &lt; or the numeric equivalent.
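
    In practice an author rarely has to type the escape by hand,
    because XML serializers apply it automatically. A minimal sketch,
    assuming Python's standard xml.etree.ElementTree module and a
    made-up header element:

```python
import xml.etree.ElementTree as ET

# Build an element whose attribute value contains a literal "<" in memory.
header = ET.Element("header", {"title": "Is A < B?"})

# Serializing escapes the "<" so the output stays well-formed.
serialized = ET.tostring(header, encoding="unicode")
print(serialized)

# Parsing the serialized form gives back the original value.
round_trip = ET.fromstring(serialized).get("title")
print(round_trip)
```

    The escaped form exists only in the serialized document; the
    application-level value is the plain "<" character throughout.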

    ///Peter
    Peter Flynn, Nov 15, 2005
    #9
  10. Jon Noring

    Jon Noring Guest

    Peter Flynn wrote:

    > [explaining about the issue of "<" in attribute values
    > in two separate messages.]


    Peter, thanks! You've clarified the issue very well. Very
    valuable information.

    Jon
    Jon Noring, Nov 16, 2005
    #10
  11. Re: Putting a "<" in an attribute value (was about CDATA sections)

    Jon Noring wrote:

    >So, from the possibilities list I previously posted:
    >
    > 1) <foo bar="is x < y ?">
    >
    > 2) <foo bar="is x &lt; y ?">
    >
    > 3) <foo bar="is x &#60; y ?">
    >
    > 4) <foo bar="is x &lessthan; y ?">
    >
    > a) where in the DTD we have <!ENTITY lessthan "<">
    >
    > b) where in the DTD we have <!ENTITY lessthan "&lt;">
    >
    > c) where in the DTD we have <!ENTITY lessthan "&#60;">
    >
    >
    >It would seem like all are permissible except for #1 and #4a since
    >they involve the literal "<" character.


    4c is also illegal, because character references (unlike entity
    references) are expanded at entity definition time, so the replacement
    text of lessthan contains a real "<" character.

    This would be legal: <!ENTITY lessthan "&#38;#x003C;"> since its
    replacement text is "&#x003C;".
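
    The difference between these declarations can be checked with a
    parser that reads the internal DTD subset. The sketch below
    assumes Python's standard xml.etree.ElementTree module (backed by
    expat); the TEMPLATE, DECLS and attr_or_error names are made up
    for illustration. The declarations spell out the literal "<", the
    &lt; reference, a plain character reference (&#60;), and the
    doubly escaped form (&#38;#x003C;).

```python
import xml.etree.ElementTree as ET

# One document per candidate declaration of the "lessthan" entity.
TEMPLATE = (
    '<!DOCTYPE foo [<!ENTITY lessthan {decl}>]>'
    '<foo bar="is x &lessthan; y ?"/>'
)

DECLS = {
    "4a": '"<"',                  # literal "<" in the replacement text
    "4b": '"&lt;"',               # entity reference, expanded only when used
    "4c": '"&#60;"',              # character reference, expanded at definition
    "escaped": '"&#38;#x003C;"',  # replacement text is the reference &#x003C;
}


def attr_or_error(decl):
    """Return the parsed bar attribute, or None if the document is rejected."""
    try:
        return ET.fromstring(TEMPLATE.format(decl=decl)).get("bar")
    except ET.ParseError:
        return None


results = {name: attr_or_error(decl) for name, decl in DECLS.items()}
for name, value in results.items():
    print(name, value)
```

    With this parser, 4a and 4c are rejected at the point where the
    entity is referenced in the attribute, while 4b and the escaped
    form both yield an attribute value containing "<".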

    -- Richard
    Richard Tobin, Nov 16, 2005
    #11