Re: Emitting &nbsp; to HTML Output

Discussion in 'XML' started by Peter C. Chapin, Jun 26, 2003.

  1. In article <bd71va$1fc4$>,
    says...

    > You may notice that people are reluctant to give you a straight
    > answer. This is because it's not really in the spirit of XSLT to do
    > such a thing. XSLT stylesheets transform XML documents as trees, not
    > as text. Messing with the text output can produce documents that are
    > not well-formed or (as in this case) refer to entities that may not be
    > defined.
    >
    > But if you really want to do it,
    >
    > <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>


    This information was helpful to me. I've been trying to include a &copy;
    character in my output but when I include "&copy;" in my XSL style sheet
    I get errors about the undefined entity. Based on your posting I did

    <xsl:text disable-output-escaping="yes">&amp;copy;</xsl:text>

    and with Xalan it works just fine. Thanks!

    Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
    displayed. So apparently Mozilla v1.3 does not know what to do with the
    "disable-output-escaping" attribute. I haven't tried it with IE yet, but
    I may do so later.
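    The undefined-entity error happens at parse time, before any XSLT processing. A stylesheet is itself an XML document, and XML predefines only "&lt;", "&gt;", "&amp;", "&apos;" and "&quot;". The sketch below uses Python's standard xml.etree.ElementTree parser as a stand-in for the parser inside an XSLT engine; the <p> snippets are only illustrative. With no DTD, the named reference fails and a numeric character reference parses:

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# &copy; is an HTML entity, not one of XML's five predefined entities,
# so without a DTD declaring it the document is not well-formed.
named = "<p>&copy; 2003</p>"

# A numeric character reference needs no declaration.
numeric = "<p>&#169; 2003</p>"

print(parses(named))    # False
print(parses(numeric))  # True
print(ET.fromstring(numeric).text == "\u00a9 2003")  # True
```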

    Peter
     
    Peter C. Chapin, Jun 26, 2003
    #1

  2. "Peter C. Chapin" <> schrieb im Newsbeitrag
    news:...
    >
    > In article <bd71va$1fc4$>,
    > says...
    >
    > > You may notice that people are reluctant to give you a straight
    > > answer. This is because it's not really in the spirit of XSLT to do
    > > such a thing. XSLT stylesheets transform XML documents as trees, not
    > > as text. Messing with the text output can produce documents that are
    > > not well-formed or (as in this case) refer to entities that may not be
    > > defined.
    > >
    > > But if you really want to do it,
    > >
    > > <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>

    >
    > This information was helpful to me. I've been trying to include a &copy;


    Why? Just use "©"

    > character in my output but when I include "&copy;" in my XSL style sheet
    > I get errors about the undefined entity. Based on your posting I did
    >
    > <xsl:text disable-output-escaping="yes">&amp;copy;</xsl:text>
    >
    > and with Xalan it works just fine. Thanks!
    >
    > Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
    > displayed. So apparently Mozilla v1.3 does not know what to do with the
    > "disable-output-escaping" attribute. I haven't tried it with IE yet, but
    > I may do so later.


    d-o-e is an *optional* XSLT feature. Some engines do not support it at all
    (Mozilla/TransforMiiX). Others only support it in specific cases. Do not rely
    on it.
     
    Julian F. Reschke, Jun 26, 2003
    #2

  3. "Peter C. Chapin" <> schrieb im Newsbeitrag
    news:...
    > In article <bde4vg$sbp68$>,
    > says...
    >
    > > > This information was helpful to me. I've been trying to include a &copy;
    > >
    > > Why? Just use "©"

    >
    > That's less readable. Also, since &copy; is more abstract, if a new
    > symbol was widely adopted for copyright the meaning of &copy; could be
    > changed accordingly in the specification and my documents would
    > automatically be upgraded. I admit that's probably not too likely to be


    Alas, this won't happen, because these code mappings are standardized.

    > an issue in this case. However, it seems a shame for HTML to have all
    > sorts of nice character entities and yet not be able to use them in an
    > XSLT style sheet without redefining them all. This seems like a
    > deficiency of XSLT to me.
    >
    > > d-o-e is an *optional* XSLT feature.

    >
    > Good to know. Thanks.
    >
    > Peter
    >
     
    Julian F. Reschke, Jun 26, 2003
    #3
  4. Peter C. Chapin wrote:
    > In article <bde4vg$sbp68$>,
    > says...
    >
    >
    >>>This information was helpful to me. I've been trying to include a &copy;

    >>
    >>Why? Just use "©"

    >
    >
    > That's less readable.


    If you want to _use_ &copy; in XSLT, you have to define it in an
    internal DTD subset, as someone has already posted in this thread.
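    Here is a minimal sketch of the internal-subset approach, again using Python's xml.etree.ElementTree as a stand-in for an XSLT engine's parser. The "root" element name and the one-entity subset are illustrative; in a stylesheet the DOCTYPE would name xsl:stylesheet instead. Once "copy" is declared, the same reference is well-formed and the parser expands it to U+00A9:

```python
import xml.etree.ElementTree as ET

# Declare &copy; in an internal DTD subset. The character reference inside
# the entity value is resolved when the declaration is read, so the
# entity's replacement text is the single character U+00A9.
doc = """<!DOCTYPE root [
  <!ENTITY copy "&#169;">
]>
<root>&copy; 2003</root>"""

root = ET.fromstring(doc)
print(hex(ord(root.text[0])))      # 0xa9
print(root.text == "\u00a9 2003")  # True
```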
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
     
    Johannes Koch, Jun 26, 2003
    #4
  5. In article <bdencq$rn0r8$>,
    says...

    > > > Why? Just use "©"

    > >
    > > That's less readable. Also, since &copy; is more abstract, if a new
    > > symbol was widely adopted for copyright the meaning of &copy; could be
    > > changed accordingly in the specification and my documents would
    > > automatically be upgraded. I admit that's probably not too likely to be

    >
    > Alas, this won't happen, because these code mappings are standardized.


    Well, suppose the publishing industry decided that, for whatever reason,
    they wanted to use a different symbol for copyright. Imagine a symbol
    resembling "-c-" instead of "(c)". Precisely because the current code
    mappings are standardized it probably wouldn't be a great idea to change
    the "normal" appearance of the character that currently looks like "(c)".
    Thus one might be tempted to introduce a new character with a different
    mapping for the new symbol (wasn't something like this done for the
    Euro?). One imagines that in this case the HTML specification would be
    eventually revised so that the entity &copy; would refer to the new
    symbol. However a reference to © would not follow the new convention
    (it would still be the old symbol, of course) and documents using it
    would need to be edited.

    I'm not saying that this is a likely scenario. In fact, I'd say it's
    pretty darn unlikely in this case. However, the point is this: named
    entities offer a layer of abstraction over the numeric characters they
    represent. Such a layer allows, in general, for more robust documents in
    the face of changing standards. It's exactly the same issue as occurs in,
    for example, C programming:

    #define MAX_BUFFER_SIZE 1024 // Might want to change this later.

    ...

    if (index >= MAX_BUFFER_SIZE) error();

    Using 1024 in the body of the program is not recommended because if a
    change to that value is made the program must be (in general) manually
    updated. If many programs depend on this parameter the work involved
    could be considerable.

    Defining the entities in the document, as has been suggested, doesn't
    really address this matter. If in my XSLT stylesheet I define &copy; to
    be &#169; then the character U+00A9 will be inserted into my HTML.
    However, if that character stops being appropriate I'll have to modify my
    stylesheet... exactly as if I had used &#169; directly (I can see that the
    modification would be somewhat easier to make, however).
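    You can check the point about defining the entity directly. Once the parser expands "&copy;", the tree holds only the character U+00A9, and nothing records that the source wrote it as a named reference. The sketch below again uses Python's xml.etree.ElementTree as a stand-in for an XSLT engine's input and output stages, with an illustrative sample document. It serializes the parsed tree two ways:

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE root [
  <!ENTITY copy "&#169;">
]>
<root>&copy; 2003</root>"""

# After parsing, the tree holds only the character U+00A9; nothing
# records that the source spelled it as the entity reference &copy;.
root = ET.fromstring(doc)

# Serialized as a str, the character itself is written out.
as_text = ET.tostring(root, encoding="unicode")

# In an encoding without U+00A9, ElementTree substitutes a numeric
# character reference. Either way, "&copy;" does not come back.
as_ascii = ET.tostring(root, encoding="us-ascii")

print(as_ascii)             # b'<root>&#169; 2003</root>'
print("&copy;" in as_text)  # False
```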

    It seems to me like the "right" solution would be for XSLT to pass
    undefined entities directly to the target document literally somehow. I
    shouldn't have to know how HTML (or any other target markup) has defined
    a character entity to use it in a style sheet. The solution involving
    disable-output-escaping meets the requirements... but if it's optional in
    XSLT then it isn't as good a solution as it might be.

    Peter
     
    Peter C. Chapin, Jun 26, 2003
    #5
  6. In article <>,
    Peter C. Chapin <> wrote:

    >Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
    >displayed. So apparently Mozilla v1.3 does not know what to do with the
    >"disable-output-escaping" attribute.


    Output escaping - and not escaping - only makes sense if you're
    outputting the data as XML (or HTML). In a browser, the transformed
    tree is not output in that sense, but displayed.

    Or to put it another way, disabling output escaping is a trick that
    lets you output something which will have a different syntactic
    significance when read in again; since it never gets read again
    in a browser, that significance never applies.
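    Richard's distinction can be sketched with a toy result tree. The code below is hypothetical: the TextNode class and the serialize and display functions are illustrative, not any real XSLT engine's API. Each text node carries a disable-output-escaping flag that only the serializer consults. Displaying the tree directly shows the literal characters "&copy;". It becomes the copyright sign only after the markup is written out and read back in; Python's html.unescape stands in for an HTML parser here.

```python
import html
from dataclasses import dataclass

@dataclass
class TextNode:
    text: str
    disable_output_escaping: bool = False  # hypothetical per-node flag

def serialize(nodes: list[TextNode]) -> str:
    """Write text nodes as markup, escaping unless the flag is set."""
    out = []
    for node in nodes:
        if node.disable_output_escaping:
            out.append(node.text)  # written verbatim
        else:
            out.append(html.escape(node.text, quote=False))
    return "".join(out)

def display(nodes: list[TextNode]) -> str:
    """Render the tree directly: no serialization, so the flag is ignored."""
    return "".join(node.text for node in nodes)

# Roughly what <xsl:text disable-output-escaping="yes">&amp;copy;</xsl:text>
# puts in the result tree: the six characters "&copy;".
tree = [TextNode("&copy;", disable_output_escaping=True), TextNode(" 2003")]

print(display(tree))  # &copy; 2003
print(html.unescape(serialize(tree)) == "\u00a9 2003")  # True

# Without the flag the ampersand is escaped, so re-reading gives "&copy;".
print(serialize([TextNode("&copy;")]))  # &amp;copy;
```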

    -- Richard
    --
    Spam filter: to mail me from a .com/.net site, put my surname in the headers.

    FreeBSD rules!
     
    Richard Tobin, Jun 26, 2003
    #6
  7. Peter C. Chapin

    Micah Cowan Guest

    Peter C. Chapin <> writes:

    > In article <bde4vg$sbp68$>,
    > says...
    >
    > > > This information was helpful to me. I've been trying to include a &copy;

    > >
    > > Why? Just use "©"

    >
    > That's less readable. Also, since &copy; is more abstract, if a new
    > symbol was widely adopted for copyright the meaning of &copy; could be
    > changed accordingly in the specification and my documents would
    > automatically be upgraded. I admit that's probably not too likely to be
    > an issue in this case. However, it seems a shame for HTML to have all
    > sorts of nice character entities and yet not be able to use them in an
    > XSLT style sheet without redefining them all. This seems like a
    > deficiency of XSLT to me.


    Nonsense. You don't need to *explicitly* redefine them all. Just alter
    the DTD to include them. My stylesheets typically start with something
    like:

    <!DOCTYPE xsl:stylesheet [
    <!ENTITY % HTMLlat1 PUBLIC
    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
    %HTMLlat1;

    <!ENTITY % HTMLsymbol PUBLIC
    "-//W3C//ENTITIES Symbols for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
    %HTMLsymbol;

    <!ENTITY % HTMLspecial PUBLIC
    "-//W3C//ENTITIES Special for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
    %HTMLspecial;
    ]>

    This should shut up any problems you have using HTML entities in your
    XSLT source. Of course, the DTD above won't make your XSLT source
    valid XML (as opposed to well-formed), but it wasn't anyway. (Making
    XSLT source valid XML is *way* too much work to be worth it.)

    -Micah
     
    Micah Cowan, Jun 27, 2003
    #7
  8. Peter C. Chapin

    Micah Cowan Guest

    "Julian F. Reschke" <> writes:

    > "Peter C. Chapin" <> schrieb im Newsbeitrag
    > news:...
    > > In article <bde4vg$sbp68$>,
    > > says...
    > >
    > > > > This information was helpful to me. I've been trying to include a &copy;
    > > >
    > > > Why? Just use "©"

    > >
    > > That's less readable. Also, since &copy; is more abstract, if a new
    > > symbol was widely adopted for copyright the meaning of &copy; could be
    > > changed accordingly in the specification and my documents would
    > > automatically be upgraded. I admit that's probably not too likely to be

    >
    > Alas, this won't happen, because these code mappings are standardized.


    And it shouldn't. If a new symbol were widely adopted with the same
    semantics, it really ought to be implemented as a font change, rather
    than getting its own code-point.

    -Micah
     
    Micah Cowan, Jun 27, 2003
    #8
  9. Peter C. Chapin

    Micah Cowan Guest

    Peter C. Chapin <> writes:

    > In article <bdencq$rn0r8$>,
    > says...
    >
    > > > > Why? Just use "©"
    > > >
    > > > That's less readable. Also, since &copy; is more abstract, if a new
    > > > symbol was widely adopted for copyright the meaning of &copy; could be
    > > > changed accordingly in the specification and my documents would
    > > > automatically be upgraded. I admit that's probably not too likely to be

    > >
    > > Alas, this won't happen, because these code mappings are standardized.

    >
    > Well, suppose the publishing industry decided that, for whatever reason,
    > they wanted to use a different symbol for copyright. Imagine a symbol
    > resembling "-c-" instead of "(c)".


    There is absolutely no reason why © couldn't be used for that as
    well: Unicode (in general) does not specify how the glyph should look,
    it only decrees that U+00A9 corresponds to a character with the
    semantic of representing "a copyright symbol", whatever that may
    mean. That symbol can look like anything the font-writer wants,
    provided it conveys the intended meaning.

    > Thus one might be tempted to introduce a new character with a different
    > mapping for the new symbol (wasn't something like this done for the
    > Euro?).


    Ah, but the euro is a completely new currency symbol -- a completely
    new semantic meaning. Or did you mean an alternative glyph?

    Unicode is supposed to avoid having different code-points for a single
    semantic meaning, but in practice it was unavoidable that many
    alternate versions be encoded, in order to support round-trip encoding
    for as many encodings as possible (i.e., that a document in a
    non-Unicode encoding could be transliterated into Unicode and back
    again without change).

    However, I believe they would avoid it in all cases which do not
    affect compatibility with other encodings.

    > I'm not saying that this is a likely scenario. In fact, I'd say it's
    > pretty darn unlikely in this case. However, the point is this: named
    > entities offer a layer of abstraction over the numeric characters they
    > represent. Such a layer allows, in general, for more robust documents in
    > the face of changing standards. It's exactly the same issue as occurs in,
    > for example, C programming:
    >
    > #define MAX_BUFFER_SIZE 1024 // Might want to change this later.
    >
    > ...
    >
    > if (index >= MAX_BUFFER_SIZE) error();


    In general, yes; but in the case of the HTML character entities, not
    really. They serve more as a mnemonic than anything else: I doubt very
    much that ISO/W3C/Mr. Berners-Lee had any intentions of changing these
    once released; otherwise, they'd have said so.
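    Micah says the HTML entity names are mnemonics for fixed code points. Python's standard html.entities tables, which map the HTML 4 entity names to Unicode code points, show this. Each name resolves to exactly one code point, and the named, decimal and hexadecimal spellings all decode to the same character. This is a sketch; unicodedata supplies the official Unicode character names.

```python
import html
import unicodedata
from html.entities import name2codepoint

# Each HTML entity name is a fixed key into Unicode.
for name in ("copy", "nbsp", "euro"):
    cp = name2codepoint[name]
    print(f"&{name}; -> U+{cp:04X} {unicodedata.name(chr(cp))}")

# Named, decimal and hexadecimal references all denote one character.
spellings = ["&copy;", "&#169;", "&#xA9;"]
print({html.unescape(s) for s in spellings} == {"\u00a9"})  # True
```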

    > It seems to me like the "right" solution would be for XSLT to pass
    > undefined entities directly to the target document literally
    > somehow.


    It can't do this and still produce well-formed XML, which is a
    reasonable expectation.

    > shouldn't have to know how HTML (or any other target markup) has defined
    > a character entity to use it in a style sheet. The solution involving
    > disable-output-escaping meets the requirements... but if it's optional in
    > XSLT then it isn't as good a solution as it might be.


    Include the appropriate external entities in your DTD instead (see my
    other post).

    -Micah
     
    Micah Cowan, Jun 27, 2003
    #9
  10. In article <>,
    says...

    > > Well, suppose the publishing industry decided that, for whatever reason,
    > > they wanted to use a different symbol for copyright. Imagine a symbol
    > > resembling "-c-" instead of "(c)".

    >
    > There is absolutely no reason why © couldn't be used for that as
    > well: Unicode (in general) does not specify how the glyph should look,
    > it only decrees that U+00A9 corresponds to a character with the
    > semantic of representing "a copyright symbol", whatever that may
    > mean.


    I understand that. However, in the event of a character changing its
    traditional glyph it seems more likely to me that a new code point would
    be allocated. Some documents might specifically want to continue using
    the old form of the character for historical or compatibility reasons.
    Yet other documents would, one assumes, want to use the new version of
    the character instead. Thus both glyphs would probably have to be
    available. I don't know if this situation has ever really come up but if
    it does (or has), I can imagine an argument like this being made by those
    dealing with the relevant standards.

    > Ah, but the euro is a completely new currency symbol -- a completely
    > new semantic meaning. Or did you mean an alternative glyph?


    My example of the euro was not totally accurate. I was only referring to
    the idea that new symbols (not necessarily existing things) do get
    introduced now and then. Thus the idea of a new symbol for copyright
    coming along isn't as crazy as it might at first seem.

    > In general, yes; but in the case of the HTML character entities, not
    > really. They serve more as a mnemonic than anything else: I doubt very
    > much that ISO/W3C/Mr. Berners-Lee had any intentions of changing these
    > once released; otherwise, they'd have said so.


    Perhaps, but what of other markups besides HTML? I could imagine a DTD
    author defining entities specifically to hide their representations so
    that later changes to the spec could be made without requiring documents
    to be edited. It seems like a powerful and useful feature of entities in
    general and one that the community should endeavor to support.

    > Include the appropriate external entities in your DTD instead (see my
    > other post).


    Yes, this seems like a reasonable solution. I've made a note of your
    other posting for future reference. Thanks! I still think it's less than
    ideal to require that the XSLT engine expand all entities before writing
    them into the output tree (or output document). For example, suppose one
    uses XSLT to produce a large collection of documents and then later the
    expansion of an entity changes. I'd have to reprocess the original XML
    again to make new documents; the documents I produced before won't
    contain the entity references and thus won't be automatically updated by
    the change in the entity expansion.

    In a different post Richard Tobin pointed out that output escaping only
    makes sense when one is outputting a document and not when one is acting
    directly on the output tree. However, I dispute that. For example, Xalan
    must internally mark in the tree somehow which text regions are to be
    free of escaping when it outputs the final document. (Recall that Xalan
    seems to implement the disable-output-escaping mechanism). Thus the
    information about what is and is not escaped needs to be stored in the
    tree in some sort of implementation defined way. It thus seems reasonable
    that a program like Mozilla could also store that information and then
    act on it accordingly if it chose to do so. In particular if I get the
    text

    &copy;

    into the output tree marked in such a way as to indicate that no output
    escaping should take place, I'd like to think that Mozilla could treat
    the '&' literally and then notice that '&copy;' is a valid HTML
    entity reference. Of course it doesn't currently do that (or so it
    appears) and if disable-output-escaping is optional then it is within its
    rights to ignore the feature. However, I think it is at least
    *meaningful* to talk about implementing that feature.

    I can see that there might be problems with doing what I'm talking about.
    Right now neither Xalan nor Mozilla need to interpret the character nodes
    in the output tree. For Mozilla to recognize HTML entities in the tree
    directly it would have to look for entities in the character nodes of the
    tree. I'm guessing that doing so would introduce some serious issues, but
    I'm not really sure. I seem to recall reading someplace that the DOM does
    not contain entities... so this would be a violation of that policy.
    Right?

    I suppose what all this means, in general, is that entities don't really
    work as well as one might like. Could this be a fundamental problem with
    using an XML format to control styling? Or is it a limitation with the
    DOM? Perhaps my real issue is that I'm trying to make a *transformation*
    standard do things that don't really make sense for it to do. Hmmm.

    I'm rambling. I'll stop now. :)

    Peter
     
    Peter C. Chapin, Jun 27, 2003
    #10
  11. Peter C. Chapin

    Micah Cowan Guest

    Peter C. Chapin <> writes:

    > In article <>,
    > says...
    >
    > > > Well, suppose the publishing industry decided that, for whatever reason,
    > > > they wanted to use a different symbol for copyright. Imagine a symbol
    > > > resembling "-c-" instead of "(c)".

    > >
    > > There is absolutely no reason why © couldn't be used for that as
    > > well: Unicode (in general) does not specify how the glyph should look,
    > > it only decrees that U+00A9 corresponds to a character with the
    > > semantic of representing "a copyright symbol", whatever that may
    > > mean.

    >
    > I understand that. However, in the event of a character changing its
    > traditional glyph it seems more likely to me that a new code point would
    > be allocated. Some documents might specifically want to continue using
    > the old form of the character for historical or compatibility reasons.
    > Yet other documents would, one assumes, want to use the new version of
    > the character instead.


    That's what fonts are for: Unicode specifically tries to avoid this.

    > Thus both glyphs would probably have to be
    > available.


    Not unless somewhere in the book, the author specifically wanted to
    contrast the two glyphs (as you are now), in which case a font change
    would still be more appropriate.

    <snip>

    > > In general, yes; but in the case of the HTML character entities, not
    > > really. They serve more as a mnemonic than anything else: I doubt very
    > > much that ISO/W3C/Mr. Berners-Lee had any intentions of changing these
    > > once released; otherwise, they'd have said so.

    >
    > Perhaps, but what of other markups besides HTML? I could imagine a DTD
    > author defining entities specifically to hide their representations so
    > that later changes to the spec could be made without requiring documents
    > to be edited. It seems like a powerful and useful feature of entities in
    > general and one that the community should endeavor to support.


    Absolutely.

    <snip>

    > In a different post Richard Tobin pointed out that output escaping only
    > makes sense when one is outputting a document and not when one is acting
    > directly on the output tree. However, I dispute that. For example, Xalan
    > must internally mark in the tree somehow which text regions are to be
    > free of escaping when it outputs the final document.


    In regards to the node tree defined in the XSLT spec, Mr. Tobin is
    correct. How Xalan's tree differs from that tree is inconsequential.

    -Micah
     
    Micah Cowan, Jun 28, 2003
    #11

Similar Threads
  1. John Livermore

    Client side validators not emitting

    John Livermore, Jul 18, 2003, in forum: ASP .Net
    Replies:
    1
    Views:
    395
    John Livermore
    Jul 18, 2003
  2. Klaus Johannes Rusch

    Re: Emitting &nbsp; to HTML Output

    Klaus Johannes Rusch, Jun 24, 2003, in forum: XML
    Replies:
    5
    Views:
    3,061
    Micah Cowan
    Jun 27, 2003
  3. Andy Jefferies

    Re: Emitting &nbsp; to HTML Output

    Andy Jefferies, Jun 25, 2003, in forum: XML
    Replies:
    0
    Views:
    1,373
    Andy Jefferies
    Jun 25, 2003
  4. Paul Boddie
    Replies:
    0
    Views:
    1,361
    Paul Boddie
    Jun 24, 2003
  5. hotkitty

    Parsing CSV and "&nbsp;&nbsp;"

    hotkitty, Oct 9, 2008, in forum: Perl Misc
    Replies:
    9
    Views:
    353