Quick question on the presence of CDATA

Discussion in 'XML' started by Dilip, Oct 25, 2006.

  1. Dilip

    Dilip Guest

    I have been out of the XML world for a while and have sort of forgotten
    the exact difference between:

    <Symbol><![CDATA[IBM]]></Symbol>

    and just:

    <Symbol>IBM</Symbol>

    Can anyone tell me why one is preferred over the other?

    thanks!
    Dilip, Oct 25, 2006
    #1

  2. Followup to the Microsoft list doesn't work through my servers, so
    answering here...


    Dilip wrote:
    > <Symbol><![CDATA[IBM]]></Symbol>
    > <Symbol>IBM</Symbol>


    Identical meaning, since there aren't any special characters in the value.

    <![CDATA[]]> sections are an alternative to character-by-character
    escaping of characters that would otherwise confuse XML syntax (such as
    "<" and "&"). A section escapes its entire contents -- with the exception
    of any ]]> sequences, which require special handling.
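
    A minimal sketch of that special handling, assuming Python's standard
    xml.etree.ElementTree as the parser. The helper name cdata_wrap and the
    <code> element are made up for illustration; the usual trick is to end
    the section after "]]" and start a new one before ">":

```python
import xml.etree.ElementTree as ET


def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    # ']]>' would end the section early, so close the section after ']]'
    # and reopen a new one just before '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


raw = "if (a[b[0]]>1 && c<d) {}"  # contains '<', '&' and a ']]>' sequence
doc = "<code>" + cdata_wrap(raw) + "</code>"

# The parser joins the adjacent sections back into one string.
parsed = ET.fromstring(doc).text

print(doc)
print(parsed == raw)
```

    The parser hands back the original string, so the split is invisible
    to whatever reads the element's value.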

    Generally the only time you care about this is when you're hand-editing
    XML, want to drop non-XML text into the value of an XML element (note
    that you can't use this kluge for attribute values), and are too lazy to
    fix it up by hand. If you build your XML using any XML-aware tool, it
    should take care of the escaping for you and you don't have to care
    whether it escapes individual characters or uses <![CDATA[]]>.
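
    As an illustration of letting the tool do the escaping, here is a
    small sketch assuming Python's xml.etree.ElementTree; the element name
    Formula and the attribute name note are hypothetical. This particular
    library escapes character by character rather than emitting a CDATA
    section, and it escapes attribute values the same way:

```python
import xml.etree.ElementTree as ET

value = "a < b && c"             # text full of XML-special characters
note = 'x < "y" & z'             # attribute value, where CDATA is not allowed

elem = ET.Element("Formula")     # hypothetical element name
elem.text = value                # assigned raw; no manual escaping
elem.set("note", note)

# The serializer inserts the escapes.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the result gives back exactly what was put in.
round_trip = ET.fromstring(serialized)
print(round_trip.text == value, round_trip.get("note") == note)
```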


    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Oct 25, 2006
    #2

  3. Dilip

    Dilip Guest

    Joseph Kesselman wrote:
    > Followup to the Microsoft list doesn't work through my servers, so
    > answering here...
    >
    >
    > Dilip wrote:
    > > <Symbol><![CDATA[IBM]]></Symbol>
    > > <Symbol>IBM</Symbol>

    >
    > Identical meaning, since there aren't any special characters in the value.
    >
    > <![CDATA[]]> sections are an alternative to character-by-character
    > escaping of characters that would otherwise confuse XML syntax (such as
    > "<" and "&"). It escapes its entire contents -- with the exception of
    > any ]]> sequences, which require special handling.
    >
    > Generally the only time you care about this is when you're hand-editing
    > XML, want to drop non-XML text into the value of an XML element (note
    > that you can't use this kluge for attribute values), and are too lazy to
    > fix it up by hand. If you build your XML using any XML-aware tool, it
    > should take care of the escaping for you and you don't have to care
    > whether it escapes individual characters or uses <![CDATA[]]>.


    Just so I have this straight: from the standpoint of the XML parser,
    do the two forms of the element make a difference? I mean, if I use
    XPath to locate that element and retrieve its value, will I get back
    IBM or something else?

    Sorry if the question sounds stupid. I remember what CDATA is about
    but I have forgotten what happens when a parser encounters it. (It
    probably just treats whatever is inside as plain text, right?)
    Dilip, Oct 25, 2006
    #3
  4. Dilip wrote:
    > Just so I have this straight: from the standpoint of the XML parser,
    > do the two forms of the element make a difference? I mean, if I use
    > XPath to locate that element and retrieve its value, will I get back
    > IBM or something else?


    XPath doesn't distinguish the two; both yield IBM.
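
    A quick way to check this, sketched with Python's
    xml.etree.ElementTree (which supports only a subset of XPath). The
    Quotes and Quote wrapper elements are hypothetical, added so both
    forms can sit in one document:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same ticker written both ways.
doc = ET.fromstring(
    "<Quotes>"
    "<Quote><Symbol><![CDATA[IBM]]></Symbol></Quote>"
    "<Quote><Symbol>IBM</Symbol></Quote>"
    "</Quotes>"
)

# './/Symbol' selects every Symbol descendant of the root element.
values = [sym.text for sym in doc.findall(".//Symbol")]
print(values)
```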

    Parsers *CAN* distinguish the two, for the convenience of editors and
    other tools which want to be able to display syntax as well as semantics
    -- but aren't required to and often don't unless you ask them to.

    > probably just treats whatever is inside as plain text, right?)


    Modulo the difference in how escaping is handled, yes, pretty much. A
    SAX parser may tell the application that it's now inside the bounds of a
    CDATA section; the app needs to decide whether to listen for lexical
    events and whether it cares about this one. A DOM (depending on how the
    builder is configured) may represent the data as a CDATASection node
    rather than a Text node, but the former is a subclass of the latter, so
    again that doesn't matter unless the application cares about the difference.
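
    Both behaviours can be observed with Python's standard library; this
    is a sketch under that assumption, and other parsers and DOM builders
    may be configured differently. The names Recorder and sax_events are
    made up for the example. The DOM half uses xml.dom.minidom, the SAX
    half opts in to lexical events through property_lexical_handler:

```python
import io
import xml.sax
from xml.dom import Node, minidom
from xml.sax.handler import ContentHandler, property_lexical_handler

CDATA_FORM = "<Symbol><![CDATA[IBM]]></Symbol>"
PLAIN_FORM = "<Symbol>IBM</Symbol>"

# DOM: minidom records the distinction in the node type.
cdata_node = minidom.parseString(CDATA_FORM).documentElement.firstChild
plain_node = minidom.parseString(PLAIN_FORM).documentElement.firstChild
print(cdata_node.nodeType == Node.CDATA_SECTION_NODE)
print(plain_node.nodeType == Node.TEXT_NODE)
# A CDATASection is still a Text node, and the data is the same.
print(isinstance(cdata_node, minidom.Text), cdata_node.data == plain_node.data)


class Recorder(ContentHandler):
    """Records character data plus the CDATA lexical events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def characters(self, content):
        self.events.append("characters:" + content)

    # Lexical-handler callbacks; only the CDATA pair matters here.
    def startCDATA(self):
        self.events.append("startCDATA")

    def endCDATA(self):
        self.events.append("endCDATA")

    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def sax_events(xml_text):
    """Parse xml_text and return the recorded event names in order."""
    recorder = Recorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(recorder)
    # Opt in to lexical events; otherwise CDATA bounds are not reported.
    parser.setProperty(property_lexical_handler, recorder)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return recorder.events


print(sax_events(CDATA_FORM))
print(sax_events(PLAIN_FORM))
```

    An application that never registers the lexical handler, or never
    checks nodeType, sees the same three characters either way.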

    As far as the XML Infoset is concerned, <![CDATA[&a<]]> is just a
    representation of the character sequence &a< and is identical to
    &amp;a&lt; or &#38;a&#60; or &#x26;a&#x3C; or any of the other possible
    combinations. The Infoset considers the differences between these to be
    No Difference.
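
    The equivalence is easy to confirm with any conforming parser. A
    small sketch, assuming Python's xml.etree.ElementTree and a
    hypothetical wrapper element <v>, with one CDATA spelling and three
    escaped spellings of the same characters:

```python
import xml.etree.ElementTree as ET

# Four spellings of the same three characters: '&', 'a', '<'.
spellings = [
    "<v><![CDATA[&a<]]></v>",  # CDATA section
    "<v>&amp;a&lt;</v>",       # predefined entity references
    "<v>&#38;a&#60;</v>",      # decimal character references
    "<v>&#x26;a&lt;</v>",      # mixed: hex reference plus entity
]

# Every spelling parses to the same character data.
texts = [ET.fromstring(s).text for s in spellings]
print(set(texts))
```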

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Oct 25, 2006
    #4
