numeric entities in XSL

Discussion in 'XML' started by Simon Brooke, Mar 14, 2007.

  1. Simon Brooke

    More silly questions, I'm afraid.

    Consider the following stylesheet:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

    <xsl:output indent="yes" method="text"/>

    <xsl:template match="/">
    &#160;&#163;
    </xsl:template>

    <xsl:template match="*">
    <!-- nothing -->
    </xsl:template>
    </xsl:stylesheet>

    When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:

    Â Â£

    (that is not seven-bit clean - if it is not correctly transmitted by NNTP,
    it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
    If the output method is changed to 'xml', the output is the same. If the
    output method is changed to 'html', however, xsltproc outputs exactly the
    same, but Xalan2 outputs:

    &nbsp;&pound;

    If we now change the stylesheet to:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xsl:stylesheet [
    <!ENTITY nobreak "&#160;">
    <!ENTITY poundsign "&#163;">
    ]>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

    <xsl:output indent="yes" method="html"/>

    <xsl:template match="/">
    &nobreak;&poundsign;
    </xsl:template>

    <xsl:template match="*">
    <!-- nothing -->
    </xsl:template>
    </xsl:stylesheet>

    then the behaviour is exactly the same as before.
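That is expected: entity references are resolved by the XML parser before the XSLT processor ever sees the stylesheet, so &nobreak; and &#160; both arrive as the same U+00A0 character. A minimal check, using Python's standard xml.etree.ElementTree purely as a stand-in XML parser (the root element r is illustrative):

```python
import xml.etree.ElementTree as ET

# The template content written with numeric character references.
direct = "<r>&#160;&#163;</r>"

# The same content written via internal entities, as in the second stylesheet.
# The entity values are themselves numeric references, expanded at declaration time.
via_entities = """<!DOCTYPE r [
<!ENTITY nobreak "&#160;">
<!ENTITY poundsign "&#163;">
]>
<r>&nobreak;&poundsign;</r>"""

a = ET.fromstring(direct).text
b = ET.fromstring(via_entities).text

# Both parse to NO-BREAK SPACE (U+00A0) followed by POUND SIGN (U+00A3).
print(a == b == "\u00a0\u00a3")  # True
```

Since the parsed trees are identical, no XSLT processor can tell the two stylesheets apart.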

    So, questions:

    (1) Where does the uppercase A circumflex come from? What do I have to do
    to avoid it?
    (2) Where does Xalan2 magically get the HTML entity names from, and is it
    in accord with the standard in printing them?

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    ;; When all else fails, read the distractions.
     
    Simon Brooke, Mar 14, 2007

  2. * Simon Brooke wrote in comp.text.xml:
    > <xsl:template match="/">
    >  &#160;&#163;
    > </xsl:template>


    >When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:
    >
    >  Â Â£
    >
    >(that is not seven-bit clean - if it is not correctly transmitted by NNTP,
    >it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
    >If the output method is changed to 'xml', the output is the same. If the
    >output method is changed to 'html', however, xsltproc outputs exactly the
    >same, but Xalan2 outputs:


    >(1) Where does the uppercase A circumflex come from? What do I have to do
    >to avoid it?


    You are seeing UTF-8 interpreted as some other encoding like ISO-8859-1.
    The problem is that you are using the wrong tool to inspect the result,
    or failed to configure the tool correctly. Use a tool with UTF-8 support
    or tell the tool the content is UTF-8 encoded or pick a different
    encoding using xsl:output encoding='...'.
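A short Python sketch of that misreading: the template produces U+00A0 and U+00A3, a UTF-8 serializer writes each as two bytes starting with 0xC2, and a viewer that assumes ISO-8859-1 turns every byte into one character, so 0xC2 shows up as Â.

```python
# The characters the template outputs: NO-BREAK SPACE and POUND SIGN.
text = "\u00a0\u00a3"

# Bytes written when the output encoding is UTF-8: each character takes two bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0\xc2\xa3'

# A viewer that wrongly assumes ISO-8859-1 maps each byte to one character,
# so the UTF-8 lead byte 0xC2 is displayed as 'Â' (U+00C2).
misread = utf8_bytes.decode("iso-8859-1")
print(misread == "\u00c2\u00a0\u00c2\u00a3")  # True

# Telling the viewer the content is UTF-8 recovers the original text.
print(utf8_bytes.decode("utf-8") == text)  # True
```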

    >(2) Where does Xalan2 magically get the HTML entity names from, and is it
    >in accord with the standard in printing them?


    It presumably has an internal character<->entity table where it looks
    it up, and yes, that's in accord with the HTML output method, see the
    XSLT 1.0 spec, <http://www.w3.org/TR/xslt#section-HTML-Output-Method>.
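To illustrate the idea of a character-to-entity table, here is a hedged Python sketch that uses the standard html.entities.codepoint2name mapping as a stand-in; it is not Xalan's own table, and the helper name to_html_entities is hypothetical. Only non-ASCII characters are replaced, which is enough to reproduce the &nbsp;&pound; output.

```python
from html.entities import codepoint2name

def to_html_entities(s: str) -> str:
    """Hypothetical helper: replace non-ASCII characters that have an
    HTML 4 entity name (e.g. U+00A3 -> 'pound') with &name; references."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp > 127 and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out)

print(to_html_entities("\u00a0\u00a3"))  # &nbsp;&pound;
```

The spec only says the html output method "may" use an entity reference where HTML defines one, so xsltproc writing the raw characters is conformant too.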
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
     
    Bjoern Hoehrmann, Mar 14, 2007

  3. Simon Brooke wrote:

    > When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:
    >
    >  Â Â£


    You get that result if the output is UTF-8 encoded but you look at it
    with a tool/editor that assumes ISO-8859-1 to decode.
    You might want to use e.g.
    <xsl:output encoding="ISO-8859-1"/>
    in your stylesheet if you want that encoding, or if that is what your
    editor assumes.
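With encoding="ISO-8859-1" both characters fit in one byte each, so an ISO-8859-1 viewer shows them correctly. A Python sketch of the byte-level difference (the euro sign is only an example of a character ISO-8859-1 cannot represent at all):

```python
text = "\u00a0\u00a3"

# Under ISO-8859-1 each of these characters is a single byte.
latin1_bytes = text.encode("iso-8859-1")
print(latin1_bytes)  # b'\xa0\xa3'

# A viewer assuming ISO-8859-1 now decodes the bytes back to the original text.
print(latin1_bytes.decode("iso-8859-1") == text)  # True

# Characters outside ISO-8859-1, such as the euro sign, have no byte at all.
try:
    "\u20ac".encode("iso-8859-1")
except UnicodeEncodeError as err:
    print("not representable:", err.reason)
```

For such characters the xml and html output methods have to fall back to character references.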


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Mar 14, 2007
  4. In article <45f83d3d$0$20290$-online.net>,
    Martin Honnen <> wrote:

    >You might want to use e.g.
    > <xsl:output encoding="ISO-8859-1"/>


    Or even encoding="ascii".
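With an ASCII output encoding every non-ASCII character must be escaped. Python's xmlcharrefreplace error handler is a rough analogue of what an XML serializer does; real processors may instead write hexadecimal references such as &#xA0; or, with method="html", entity names like &pound;. Parsing the escaped text gives back the same characters:

```python
import xml.etree.ElementTree as ET

text = "\u00a0\u00a3"

# Characters that ASCII cannot represent become decimal character references.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'&#160;&#163;'

# An XML parser turns the references back into the original characters.
print(ET.fromstring(b"<r>" + ascii_bytes + b"</r>").text == text)  # True
```

This escape route exists only for the markup output methods. With method="text" there is no markup to escape into, and the XSLT 1.0 spec says the processor should signal an error for a character the output encoding cannot represent.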

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
     
    Richard Tobin, Mar 14, 2007
  5. Simon Brooke

    in message <et9ejl$1igh$>, Richard Tobin
    ('') wrote:

    > In article <45f83d3d$0$20290$-online.net>,
    > Martin Honnen <> wrote:
    >
    >>You might want to use e.g.
    >> <xsl:output encoding="ISO-8859-1"/>

    >
    > Or even encoding="ascii".


    Ah! Thank you. Or even

    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">

    in the generated HTML; or even (better) fix it in the servlet config so
    that it sends that in the real HTTP header.
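Declaring the charset lets the client decode the bytes correctly instead of guessing. A minimal Python sketch of the client side, using the standard email.message.Message only to parse the Content-Type parameters; the decode_body helper is hypothetical, and its ISO-8859-1 fallback loosely models the old HTTP/1.1 (RFC 2616) default for text types:

```python
from email.message import Message

def decode_body(body: bytes, content_type: str, fallback: str = "iso-8859-1") -> str:
    """Hypothetical helper: decode a response body using the charset
    parameter of its Content-Type header, or the fallback if none is given."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    return body.decode(charset)

body = "\u00a0\u00a3".encode("utf-8")

# With the charset declared, the text comes back intact.
print(decode_body(body, "text/html;charset=utf-8") == "\u00a0\u00a3")  # True

# Without it, the fallback reproduces the A-circumflex mojibake.
print(decode_body(body, "text/html") == "\u00c2\u00a0\u00c2\u00a3")  # True
```

For method="html", the XSLT 1.0 spec also says the processor should itself add a META element right after the HEAD start-tag, specifying the encoding actually used.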

    Many thanks indeed.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    ;; Conservatives are not necessarily stupid,
    ;; but most stupid people are conservatives -- J S Mill
     
    Simon Brooke, Mar 15, 2007
