Use of < and > in XML

Roedy Green

Is this correct? In garden variety XML you can't have the characters <
and > in your data. You could only have them if you did some voodoo
to define the entities. The Entity definition is part of a DTD, not
the document itself.

Is there a vanilla DTD that will let you use the usual HTML entities?
See http://mindprod.com/jgloss/htmlentities.html
 
Arne Vajhøj

Roedy said:
Is this correct? In garden variety XML you can't have the characters <
and > in your data. You could only have them if you did some voodoo
to define the entities. The Entity definition is part of a DTD, not
the document itself.

No.

&gt; and &lt; (and &amp;) should always work.
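This is easy to check with any conforming parser. A minimal sketch using Python's standard library (Python is just a convenient choice here; the sample string and the `note` element name are made up): `xml.sax.saxutils.escape` turns &, < and > into the predefined entity references, and `xml.etree.ElementTree` reads them back without any DTD.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "if (a < b && b > c) return;"

# escape() replaces & first, then > and <, with the predefined entity references
escaped = escape(raw)
print(escaped)  # if (a &lt; b &amp;&amp; b &gt; c) return;

# No DTD needed: the parser knows &lt;, &gt; and &amp; out of the box
doc = "<note>" + escaped + "</note>"
parsed = ET.fromstring(doc).text
print(parsed == raw)  # True
```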

Arne
 
Stefan Ram

Roedy Green said:
Is this correct? In garden variety XML you can't have the
characters < and > in your data.

There is no »garden variety XML«. There is XML 1.0

http://www.w3.org/TR/2000/REC-xml-20001006.html

and XML 1.1

http://www.w3.org/TR/xml11/

The two versions do not differ with regard to your question.

Roedy Green said:
You could only have them if you did some voodoo
to define the entities. The Entity definition is part of a
DTD, not the document itself.

The word »entity« has a different meaning in XML than the one
you assume.

»An XML document may consist of one or many storage units.
These are called entities;«

http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-physical-struct

For example, the whole XML document is an entity,
the »document entity«.

You can represent »<« and »>« in XML as data characters in
several ways.

The most direct way is a CDATA section.

»<![CDATA[<greeting>Hello, world!</greeting>]]>«

http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-cdata-sect

Another example would be:

»<![CDATA[<greeting>Hello<<<world!</greeting>]]>«
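To see what a parser makes of such a section, here is a small Python sketch. A CDATA section must sit inside an element, so the wrapper `<msg>` is a hypothetical addition; everything between `<![CDATA[` and `]]>` comes back as plain text, not as child elements.

```python
import xml.etree.ElementTree as ET

# <msg> is a hypothetical wrapper element; CDATA cannot stand alone in a document
doc = "<msg><![CDATA[<greeting>Hello<<<world!</greeting>]]></msg>"

root = ET.fromstring(doc)
# Inside CDATA, < and > are ordinary characters; only ]]> ends the section
text = root.text
print(text)       # <greeting>Hello<<<world!</greeting>
print(len(root))  # 0 (no <greeting> child element was created)
```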

You can use character references for them:
»&#60;« for »<«, and »&#62;« for »>«. See:

http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-references
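Character references can be written in decimal or hexadecimal. A quick Python check (the `<p>` element is arbitrary) shows that both spellings yield the literal characters:

```python
import xml.etree.ElementTree as ET

# Decimal (&#60;, &#62;) and hexadecimal (&#x3C;, &#x3E;) character references
# both stand for < and >, with no DTD involved
decimal = ET.fromstring("<p>&#60;b&#62;</p>").text
hexadecimal = ET.fromstring("<p>&#x3C;b&#x3E;</p>").text
print(decimal, hexadecimal)  # <b> <b>
```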

You can use entity references for them:
»&lt;« for »<«, and »&gt;« for »>«.
These are entity /references/, not entities. Also see:

http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-references
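Since the parser resolves entity references, character references and CDATA sections before the application sees the text, all of these spellings should produce the same character data. A sketch in Python (the element name `<t>` is arbitrary):

```python
import xml.etree.ElementTree as ET

# Four spellings of the same content
forms = [
    "<t>&lt;tag&gt;</t>",        # predefined entity references
    "<t>&#60;tag&#62;</t>",      # character references
    "<t><![CDATA[<tag>]]></t>",  # CDATA section
    "<t>&lt;tag></t>",           # only < escaped; a bare > is legal data
]
texts = {ET.fromstring(f).text for f in forms}
print(texts)  # {'<tag>'}
```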

Finally, the use of »>« within character data is allowed:

»CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)«

http://www.w3.org/TR/2000/REC-xml-20001006.html#syntax

Of these two characters, only »<« is excluded from CharData (the
production also excludes »&« and the sequence »]]>«). So this is
well-formed XML:

<?xml version="1.0"?><greeting>Hello>>>world!</greeting>
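A Python sketch confirming the asymmetry: the document above parses, while the same text with bare »<« is rejected. The exact error message depends on the parser, so it is only printed, not quoted here.

```python
import xml.etree.ElementTree as ET

# A bare > in character data is well-formed
ok = '<?xml version="1.0"?><greeting>Hello>>>world!</greeting>'
ok_text = ET.fromstring(ok).text
print(ok_text)  # Hello>>>world!

# The same text with bare < is not: < always starts markup
bad = '<?xml version="1.0"?><greeting>Hello<<<world!</greeting>'
try:
    ET.fromstring(bad)
    well_formed = True
except ET.ParseError as err:
    well_formed = False
    print("rejected:", err)
```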
 
Sherman Pendley

Roedy Green said:
Is this correct? In garden variety XML you can't have the characters <
and > in your data. You could only have them if you did some voodoo
to define the entities. The Entity definition is part of a DTD, not
the document itself.

True in general, but not in the specific case of those two characters,
since XML predefines five standard entities:

&lt;
&amp;
&gt;
&quot;
&apos;

So you can use those five without defining them in a DTD.
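A quick Python check of this point (the `<p>` element is arbitrary): all five predefined references parse with no DTD. By contrast, an HTML-only entity such as `&nbsp;` is rejected as undefined.

```python
import xml.etree.ElementTree as ET

# The five predefined entity references are understood without any DTD
predefined = ET.fromstring("<p>&lt; &amp; &gt; &quot; &apos;</p>").text
print(predefined)  # < & > " '

# &nbsp; is an HTML entity, not an XML one, so an undeclared use is an error
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    nbsp_accepted = True
except ET.ParseError as err:
    nbsp_accepted = False
    print("rejected:", err)
```
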

Roedy Green said:
Is there a vanilla DTD that will let you use the usual HTML entities?
See http://mindprod.com/jgloss/htmlentities.html

It's not exactly "vanilla," but the W3C has split the definitions of
XHTML entities out into separate files from the main DTD:

<http://www.w3.org/TR/xhtml1/#h-A2>

At a quick glance, those don't appear to have any dependencies on the
main XHTML DTD.
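For just one or two entities, an external DTD is not strictly required: a DOCTYPE declaration can carry an internal subset that declares them inside the document itself. A minimal Python sketch, assuming you only need `&nbsp;` (160 is the code point of the no-break space, U+00A0; the root element `p` is arbitrary):

```python
import xml.etree.ElementTree as ET

# The internal DTD subset lives inside the document's own DOCTYPE declaration,
# so no external DTD file is fetched or needed for this one entity
doc = """<!DOCTYPE p [
  <!ENTITY nbsp "&#160;">
]>
<p>a&nbsp;b</p>"""

text = ET.fromstring(doc).text
print(repr(text))
```

For the full HTML set, pulling in the W3C entity files is less tedious than declaring each entity by hand.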

sherm--
 
Roedy Green

I have composed a section in the Java glossary on this topic based on
what you two told me. See http://mindprod.com/jgloss/xml.html#AWKWARD
 
