Using XHTML entities in XML documents: Legal?

  • Thread starter Peter C. Chapin

Peter C. Chapin

I have a need to include Greek letters in some of my XML documents (the
documents contain astronomical information and many stars are named using
Greek letters). Following some earlier postings on the subject of
entities, I did the following:

---- top of file ----
<?xml version="1.0"?>

<!-- I added this to an existing document. -->
<!DOCTYPE observation-set [
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;
]>

<?xml-stylesheet type="text/xsl" href="AOML.xsl"?>

<!-- This is the existing document root. -->
<observation-set
xmlns="http://www.ecet.vtc.edu/~pchapin/AOML_0.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ecet.vtc.edu/~pchapin/AOML_0.0
AOML.xsd">

<!-- Now I believe I can use &alpha;, &beta;, etc. here. -->

</observation-set>
---- end of file ----

I'm attempting to borrow the entity definitions that were created for
XHTML. I downloaded the file xhtml-symbol.ent from the W3C and have a
copy locally in the same folder as the XML document that references it.
My desire was to now be able to use things like &alpha; and &beta; in my
XML document.

This mostly works. In particular, it works fine with IE 6. My XML
documents also validate (no complaints about undefined entities) with XSV
and XMLSpy (using MSXML, I believe). Also, if I use Xalan to style
the document, it generates appropriate HTML. In fact, I was able to prove
that Xalan is reading the external file containing the entity
definitions: I temporarily changed the definition of &alpha; to be the
same as &beta;. When Xalan wrote its output it serialized the character I
had written as "&alpha;" in the XML document into "&beta;" in the output
HTML document. Very cool.

However, with Mozilla v1.3 I get "undefined entity" errors. Even if I
include in the internal subset an explicit definition of the entities I'm
using, Mozilla still doesn't seem to notice them. Is this a problem with
Mozilla or am I missing something in my document? It is my desire to
support Mozilla so disregarding this problem is not really an option.

On a possibly related note, the Xerces (v2.3.0) parser seems to notice
the entities but it produces errors of this sort:

[Error] AO-2003-06-16.xml:16:75: Element type "observation-set" must be
declared.

The (line, column) of the error points to the end of the opening
observation-set tag. This error does not occur if I remove the <!DOCTYPE
observation-set [...]>. It almost seems as if Xerces sees the DOCTYPE
declaration and commits itself to the idea that a DTD is being used when,
in fact, the document uses an XML Schema. (It complains about all the
other elements as well, not just the document element). However, neither
XSV nor MSXML seemed to have that problem. Is this an issue with Xerces
or is mixing DOCTYPE and XML Schemas a bad thing?

Thanks for any clarification you can provide.

Peter
 

Martin Honnen

Peter said:
I have a need to include Greek letters in some of my XML documents (the
documents contain astronomical information and many stars are named using
Greek letters). Following some earlier postings on the subject of
entities, I did the following:

---- top of file ----
<?xml version="1.0"?>

Since you don't specify an encoding for your XML, it is treated as UTF-8
or UTF-16, both of which can encode Greek letters directly without the
need for entities.
So why do you need to use entities?
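
To illustrate the point about the default encoding, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The <star> element is only a placeholder, not part of the AOML vocabulary. The document has no encoding declaration and no byte order mark, so the parser treats the bytes as UTF-8, and the letter arrives intact without any entity.

```python
import xml.etree.ElementTree as ET

# "\u03b1" is GREEK SMALL LETTER ALPHA. It is written as an escape here only
# so that this snippet stays pure ASCII; in the XML file it is the letter itself.
raw = '<?xml version="1.0"?><star>\u03b1 Centauri</star>'.encode("utf-8")

# In UTF-8 the letter occupies the two bytes 0xCE 0xB1.
print(raw)

# No encoding declaration and no BOM: the parser assumes UTF-8.
star = ET.fromstring(raw)
print(ascii(star.text))  # ascii() keeps the output printable on any console
```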
 

Peter C. Chapin

Since you don't specify an encoding for your XML, it is treated as UTF-8
or UTF-16, both of which can encode Greek letters directly without the
need for entities.
So why do you need to use entities?

Well, I don't have an editor that allows me to easily enter or view Greek
letters. I have been meaning to look into the matter of editing "Unicode"
files (that is, files that use characters above U+007F to a non-trivial
extent). I haven't walked that road as yet and I guess I figured the
entity solution would address the matter for the half dozen or so Greek
characters that I need per document in my current situation.

Since posting my original note I spent some time with the Mozilla bug
database. It turns out that Mozilla (at least in older versions) doesn't
read external entities (apparently non-validating parsers are not required
to do so). Furthermore, once it encounters a reference to an external
parameter entity that it does not read, it stops processing the
declarations that follow in the internal DTD subset. Apparently this is
according to the XML specification.

However, contrary to my earlier assertion, Mozilla does read the internal
DTD subset. The reason it didn't notice the Greek entity definitions that
I tried before was that I put them *after* the reference to the external
entity. When I remove the external entity entirely, it works fine.
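
The ordering effect can be reproduced outside the browser. The following is a sketch under some assumptions: it uses Python's xml.etree.ElementTree, which is built on the Expat parser and by default does not read external parameter entities (as far as I know Mozilla's XML parser is Expat-based as well, but treat that as an assumption). The namespace declarations are left out to keep the test document short, and try_parse is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE observation-set [
{subset}
]>
<observation-set>&alpha;</observation-set>
"""

# External parameter entity plus its reference, as in the original document.
EXTERNAL = ('<!ENTITY % HTMLsymbol PUBLIC\n'
            '  "-//W3C//ENTITIES Symbols for XHTML//EN"\n'
            '  "xhtml-symbol.ent">\n'
            '%HTMLsymbol;')

# Explicit local definition of the one entity the document uses.
LOCAL = '<!ENTITY alpha "&#945;">'

def try_parse(subset):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(TEMPLATE.format(subset=subset)).text
    except ET.ParseError:
        return None

# Definition placed before the unread external reference: still processed.
declared_before = try_parse(LOCAL + "\n" + EXTERNAL)
# Definition placed after it: skipped, so &alpha; is reported as undefined.
declared_after = try_parse(EXTERNAL + "\n" + LOCAL)

print("declared before:", ascii(declared_before))
print("declared after: ", ascii(declared_after))
```

The file xhtml-symbol.ent is never opened in this sketch, which is the point: a parser that skips the external entity also ignores whatever is declared after the reference to it.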

Thus I can get the effect I want if I define all the Greek letter
entities in the internal DTD subset of each document that I produce. That
is not ideal but it is workable, I think.
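
Since the same declarations have to be pasted into every document, they could be generated rather than typed. A rough sketch in Python follows; the NEEDED list, the internal_subset helper, and the <star> elements are all made up for illustration, and the code points come from the standard html.entities table rather than from xhtml-symbol.ent. The namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Hypothetical list: the handful of Greek letters one document needs.
NEEDED = ["alpha", "beta", "gamma"]

def internal_subset(names):
    """Return one <!ENTITY ...> declaration per name, using numeric references."""
    return "\n".join(
        '  <!ENTITY %s "&#%d;">' % (name, name2codepoint[name])
        for name in names
    )

DOC = """<?xml version="1.0"?>
<!DOCTYPE observation-set [
%s
]>
<observation-set>
  <star>&alpha; Centauri</star>
  <star>&gamma; Draconis</star>
</observation-set>
""" % internal_subset(NEEDED)

# Show the generated declarations, then check that the entities resolve.
print(internal_subset(NEEDED))
root = ET.fromstring(DOC)
names = [star.text for star in root.findall("star")]
print(ascii(names))  # ascii() keeps the output printable on any console
```

Because nothing external is referenced, a parser that never reads external entities should still see every declaration.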

Peter
 

Peter C. Chapin

Then use references.

The document is far more readable and writable using, for example,
"&alpha;" than it is using "&#x3B1;". While the numeric references do work
they don't really seem like a very nice solution in this case. I read and
write these documents manually.
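
For comparison, a small sketch (Python's standard parser, placeholder <star> element) showing that the decimal and hexadecimal forms of the numeric reference both resolve to the same letter with no DOCTYPE at all:

```python
import xml.etree.ElementTree as ET

# &#945; (decimal) and &#x3B1; (hexadecimal) both denote U+03B1.
# Numeric character references need no DTD, so there is no DOCTYPE here.
star = ET.fromstring("<star>&#945; and &#x3B1;</star>")

same_letter = star.text == "\u03b1 and \u03b1"
print(same_letter)
```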

Peter
 
