Using XHTML entities in XML documents: Legal?

Discussion in 'XML' started by Peter C. Chapin, Jul 5, 2003.

  1. I have a need to include Greek letters in some of my XML documents (the
    documents contain astronomical information and many stars are named using
    Greek letters). Following some earlier postings on the subject of
    entities, I did the following:

    ---- top of file ----
    <?xml version="1.0"?>

    <!-- I added this to an existing document. -->
    <!DOCTYPE observation-set [
    <!ENTITY % HTMLsymbol PUBLIC
    "-//W3C//ENTITIES Symbols for XHTML//EN"
    "xhtml-symbol.ent">
    %HTMLsymbol;
    ]>

    <?xml-stylesheet type="text/xsl" href="AOML.xsl"?>

    <!-- This is the existing document root. -->
    <observation-set
    xmlns="http://www.ecet.vtc.edu/~pchapin/AOML_0.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ecet.vtc.edu/~pchapin/AOML_0.0
    AOML.xsd">

    <!-- Now I believe I can use &alpha;, &beta;, etc. here. -->

    </observation-set>
    ---- end of file ----

    I'm attempting to borrow the entity definitions that were created for
    XHTML. I downloaded the file xhtml-symbol.ent from the W3C and have a
    copy locally in the same folder as the XML document that references it.
    My desire was to now be able to use things like &alpha; and &beta; in my
    XML document.

    This mostly works. In particular, it works fine with IEv6. My XML
    documents also validate (no complaints about undefined entities) with XSV
    and XMLSpy (using MSXML, I believe). Also if I use Xalan to style
    the document, it generates appropriate HTML. In fact I was able to prove
    that Xalan is reading the external file containing the entity
    definitions: I temporarily changed the definition of &alpha; to be the
    same as &beta;. When Xalan wrote its output it serialized the character I
    had written as "&alpha;" in the XML document into "&beta;" in the output
    HTML document. Very cool.

    However, with Mozilla v1.3 I get "undefined entity" errors. Even if I
    include in the internal subset an explicit definition of the entities I'm
    using, Mozilla still doesn't seem to notice them. Is this a problem with
    Mozilla or am I missing something in my document? It is my desire to
    support Mozilla so disregarding this problem is not really an option.

    On a possibly related note, the Xerces (v2.3.0) parser seems to notice
    the entities but it produces errors of this sort:

    [Error] AO-2003-06-16.xml:16:75: Element type "observation-set" must be
    declared.

    The (line, column) of the error points to the end of the opening
    observation-set tag. This error does not occur if I remove the <!DOCTYPE
    observation-set [...]>. It almost seems as if Xerces sees the DOCTYPE
    declaration and commits itself to the idea that a DTD is being used when,
    in fact, the document uses an XML Schema. (It complains about all the
    other elements as well, not just the document element). However, neither
    XSV nor MSXML seemed to have that problem. Is this an issue with Xerces
    or is mixing DOCTYPE and XML Schemas a bad thing?

    Thanks for any clarification you can provide.

    Peter
     
    Peter C. Chapin, Jul 5, 2003
    #1

  2. Peter C. Chapin wrote:
    > I have a need to include Greek letters in some of my XML documents (the
    > documents contain astronomical information and many stars are named using
    > Greek letters). Following some earlier postings on the subject of
    > entities. I did the following
    >
    > ---- top of file ----
    > <?xml version="1.0"?>


    As you don't specify an encoding for your XML, you are using UTF-8 or
    UTF-16, both of which can encode Greek letters without the need to
    use entities.
    So why do you need to use entities?
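
As a minimal sketch of that point (Python is used only for illustration, and the element name and star are made up), a document with no encoding declaration can carry the Greek letter directly as UTF-8 bytes and still parse without any entity declarations:

```python
import xml.etree.ElementTree as ET

# U+03B1 (Greek small letter alpha) written directly into the document.
# With no encoding declaration and no byte order mark, a parser assumes UTF-8.
doc = '<?xml version="1.0"?><star>\u03b1 Centauri</star>'.encode("utf-8")

def star_name(xml_bytes):
    """Parse the bytes and return the text of the root element."""
    return ET.fromstring(xml_bytes).text

print(star_name(doc) == "\u03b1 Centauri")  # the letter survives parsing
print("\u03b1".encode("utf-8"))             # the bytes actually stored in the file
```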




    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Jul 5, 2003
    #2

  3. In article <>,
    says...

    > As you don't specify an encoding for your XML you use UTF-8 or UTF-16
    > both of which are capable to encode Greek letters without the need to
    > use entities.
    > So why do you need to use entities?


    Well, I don't have an editor that allows me to easily enter or view Greek
    letters. I have been meaning to look into the matter of editing "Unicode"
    files (that is, files that use characters above U+007F to a non-trivial
    extent). I haven't walked that road as yet and I guess I figured the
    entity solution would address the matter for the half dozen or so Greek
    characters that I need per document in my current situation.

    Since posting my original note I spent some time with the Mozilla bug
    database. It turns out that Mozilla (at least older versions) doesn't read
    external entities (apparently non-validating parsers are not required to
    do so). Furthermore once it encounters a reference to an external entity
    it stops processing the internal DTD subset. Apparently this is according
    to the XML specification.

    However, contrary to my earlier assertion, Mozilla does read the internal DTD
    subset. The reason it didn't notice the Greek entity definitions that I
    tried before was because I put them *after* the reference to the external
    entity. When I remove the external entity entirely it works fine.
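
The ordering effect can probably be reproduced outside the browser. The sketch below assumes Python's ElementTree, which sits on the expat parser (Mozilla's XML parser is also expat-based) and does not fetch external parameter entities. The documents and the file name "greek.ent" are hypothetical; the file is never opened.

```python
import xml.etree.ElementTree as ET

# Entity declared BEFORE the external parameter entity reference.
DECLARED_BEFORE = """<?xml version="1.0"?>
<!DOCTYPE star [
  <!ENTITY alpha "&#945;">
  <!ENTITY % ext SYSTEM "greek.ent">
  %ext;
]>
<star>&alpha; Lyrae</star>"""

# Same declaration placed AFTER the reference: a non-validating parser that
# skipped the external entity stops processing declarations at that point.
DECLARED_AFTER = """<?xml version="1.0"?>
<!DOCTYPE star [
  <!ENTITY % ext SYSTEM "greek.ent">
  %ext;
  <!ENTITY alpha "&#945;">
]>
<star>&alpha; Lyrae</star>"""

def try_parse(doc):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

for label, doc in (("before", DECLARED_BEFORE), ("after", DECLARED_AFTER)):
    print(label, "parsed" if try_parse(doc) is not None else "undefined entity")
```

If this behaves as expected, only the first document parses, which matches what I saw in Mozilla.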

    Thus I can get the effect I want if I define all the Greek letter
    entities in the internal DTD subset of each document that I produce. That
    is not ideal but it is workable, I think.
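
Typing those declarations by hand for every document is tedious, so they could be generated. This is only a sketch: it assumes Python's standard `html.entities` table as the source of code points, and the function name and the choice of letters are made up for illustration.

```python
from html.entities import name2codepoint

# Hypothetical selection; extend with whichever letters the documents use.
GREEK = ["alpha", "beta", "gamma", "delta", "epsilon"]

def internal_subset(root_name, names):
    """Build a DOCTYPE whose internal subset declares each named entity
    as a numeric character reference."""
    lines = ["<!DOCTYPE %s [" % root_name]
    for name in names:
        lines.append('  <!ENTITY %s "&#%d;">' % (name, name2codepoint[name]))
    lines.append("]>")
    return "\n".join(lines)

print(internal_subset("observation-set", GREEK))
```

The printed block would be pasted in place of the DOCTYPE that references xhtml-symbol.ent.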

    Peter
     
    Peter C. Chapin, Jul 5, 2003
    #3
  4. Andreas Prilop, Jul 5, 2003
    #4
  5. In article <050720032109527832%-hannover.de>,
    -hannover.de says...

    > > Well, I don't have an editor that allows me to easily enter or view Greek
    > > letters.

    >
    > Then use references.


    The document is far more readable and writable using, for example
    "&alpha;" than it is using "&#945;". While the numeric references do work
    they don't really seem like a very nice solution in this case. I read and
    write these documents manually.
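
One possible compromise, sketched here under the assumption that a small preprocessing step is acceptable: keep writing the names by hand and convert them to numeric references just before publishing. The function name is hypothetical, and the code points come from Python's `html.entities` table rather than from xhtml-symbol.ent.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def named_to_numeric(text):
    """Replace known named references such as &alpha; with numeric ones."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # unknown or predefined: keep as written
        return "&#%d;" % name2codepoint[name]
    return NAMED_REF.sub(repl, text)

print(named_to_numeric("<star>&alpha; Cen &amp; &beta; Cen</star>"))
```

The hand-edited source stays readable, and the converted copy needs no DTD at all.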

    Peter
     
    Peter C. Chapin, Jul 6, 2003
    #5
