numeric entities in XSL

Simon Brooke · Mar 14, 2007

More silly questions, I'm afraid.

Consider the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">

<xsl:output indent="yes" method="text"/>

<xsl:template match="/">
&#160;&#163;
</xsl:template>

<xsl:template match="*">

</xsl:template>
</xsl:stylesheet>

When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:

Â Â£

(that is not seven-bit clean - if it is not correctly transmitted by NNTP,
it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
If the output method is changed to 'xml', the output is the same. If the
output method is changed to 'html', however, xsltproc outputs exactly the
same, but Xalan2 outputs:

&nbsp;&pound;

If we now change the stylesheet to:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE stylesheet [
<!ENTITY nobreak "&#160;">
<!ENTITY poundsign "&#163;">
]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">

<xsl:output indent="yes" method="html"/>

<xsl:template match="/">
&nobreak;&poundsign;
</xsl:template>

<xsl:template match="*">

</xsl:template>
</xsl:stylesheet>

then the behaviour is exactly the same as before.
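One likely reason the DTD version changes nothing: entity and character references are expanded by the XML parser before the XSLT processor sees the stylesheet tree. Whether the characters are written as numeric character references (&#160;&#163;) or through the declared entities &nobreak;&poundsign;, the template ends up containing the same two characters. A small Python sketch using the standard library's ElementTree (which expands entities declared in an internal DTD subset); the element name `r` is a hypothetical stand-in for the template body.

```python
import xml.etree.ElementTree as ET

# Numeric character references, as in the first stylesheet.
numeric = "<r>&#160;&#163;</r>"

# Named entities declared in an internal DTD subset, as in the second one.
named = """<!DOCTYPE r [
<!ENTITY nobreak "&#160;">
<!ENTITY poundsign "&#163;">
]>
<r>&nobreak;&poundsign;</r>"""

# The parser replaces both forms with the same two characters:
# U+00A0 NO-BREAK SPACE and U+00A3 POUND SIGN.
text_numeric = ET.fromstring(numeric).text
text_named = ET.fromstring(named).text

print(text_numeric == text_named)  # True
```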

So, questions:

(1) Where does the uppercase A circumflex come from? What do I have to do
to avoid it?
(2) Where does Xalan2 magically get the HTML entity names from, and is it
in accord with the standard in printing them?

Bjoern Hoehrmann · Mar 14, 2007

* Simon Brooke wrote in comp.text.xml:

<xsl:template match="/">
&#160;&#163;
</xsl:template>

When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:

Â Â£

(that is not seven-bit clean - if it is not correctly transmitted by NNTP,
it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
If the output method is changed to 'xml', the output is the same. If the
output method is changed to 'html', however, xsltproc outputs exactly the
same, but Xalan2 outputs:

(1) Where does the uppercase A circumflex come from? What do I have to do
to avoid it?

You are seeing UTF-8 interpreted as some other encoding like ISO-8859-1.
The problem is that you are using the wrong tool to inspect the result,
or failed to configure the tool correctly. Use a tool with UTF-8 support
or tell the tool the content is UTF-8 encoded, or pick a different
encoding using xsl:output encoding='...'.
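The effect described here is easy to reproduce: encode the two characters as UTF-8, then decode the bytes as if they were ISO-8859-1. Each character becomes two bytes, and the lead byte 0xC2 shows up as "Â" in front of each one. A short Python illustration:

```python
# U+00A0 NO-BREAK SPACE followed by U+00A3 POUND SIGN,
# the characters the stylesheet emits.
original = "\u00a0\u00a3"

# Written as UTF-8 (the default output encoding): two bytes per character.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0\xc2\xa3'

# A viewer assuming ISO-8859-1 maps each byte to one character,
# so 0xC2 appears as "Â" before each intended character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread == "\u00c2\u00a0\u00c2\u00a3")  # True

# Decoding with the correct encoding recovers the original text.
print(utf8_bytes.decode("utf-8") == original)  # True
```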

(2) Where does Xalan2 magically get the HTML entity names from, and is it
in accord with the standard in printing them?

It presumably has an internal character<->entity table where it looks
it up, and yes, that's in accord with the HTML output method, see the
XSLT 1.0 spec, <http://www.w3.org/TR/xslt#section-HTML-Output-Method>.
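A minimal sketch of such a character-to-entity lookup, using Python's html.entities.codepoint2name (the HTML 4 entity names) as a stand-in for whatever table Xalan actually uses internally; the thread does not show Xalan's implementation. The function name html_escape_named is hypothetical, characters without a known name fall back to a numeric character reference, and escaping of markup characters such as & and < is deliberately left out to keep it short.

```python
from html.entities import codepoint2name


def html_escape_named(text: str) -> str:
    """Replace non-ASCII characters with HTML entity references.

    Hypothetical helper: uses the standard library's HTML 4 entity
    names, and a decimal reference when no name exists.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. 160 -> &nbsp;
        else:
            out.append(f"&#{cp};")  # no name known: numeric reference
    return "".join(out)


print(html_escape_named("\u00a0\u00a3"))  # &nbsp;&pound;
```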

Martin Honnen · Mar 14, 2007

Simon said:
When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:

Â Â£

You get that result if the output is UTF-8 encoded but you look at it
with a tool/editor that assumes ISO-8859-1 to decode.
You might want to use e.g.
<xsl:output encoding="ISO-8859-1"/>
in your stylesheet if you want that encoding, or if your editor
assumes it.
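With encoding="ISO-8859-1" each of these two characters fits in a single byte whose value equals its Unicode code point, so a viewer that assumes ISO-8859-1 shows exactly what was intended. A quick Python check of the bytes (an illustration only; the stylesheet itself only needs the xsl:output change):

```python
text = "\u00a0\u00a3"  # no-break space, pound sign

# In ISO-8859-1 both characters are single bytes with the same
# values as their code points.
latin1 = text.encode("iso-8859-1")
print(latin1)  # b'\xa0\xa3'

# A viewer that assumes ISO-8859-1 reads them back unchanged.
print(latin1.decode("iso-8859-1") == text)  # True
```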

Richard Tobin · Mar 14, 2007

Martin Honnen said:
You might want to use e.g.
<xsl:output encoding="ISO-8859-1"/>

Or even encoding="ascii".
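encoding="ascii" works with the xml or html output method because XSLT 1.0 says a character that cannot be represented in the output encoding should be written as a character reference, so the output becomes seven-bit clean. Python's "xmlcharrefreplace" error handler performs the same substitution and serves here as a rough model of that serializer behaviour, not as what Xalan or xsltproc literally run:

```python
text = "\u00a0\u00a3"  # no-break space, pound sign

# Non-ASCII characters become decimal character references,
# roughly what a serializer does for encoding="ascii".
ascii_out = text.encode("ascii", "xmlcharrefreplace")
print(ascii_out)  # b'&#160;&#163;'

# Every byte is now below 128, i.e. seven-bit clean.
print(all(b < 128 for b in ascii_out))  # True
```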

-- Richard

Simon Brooke · Mar 15, 2007

Richard Tobin said:
Or even encoding="ascii".

Ah! Thank you. Or even

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

in the generated HTML; or even (better) fix it in the servlet config so
that it sends that in the real HTTP header.

Many thanks indeed.

XSL and entities	11	Feb 20, 2005
${...} values in attributes of an imported XML in XSL ...	5	Nov 7, 2006
Identity Transform with preservation of entities	1	Oct 4, 2005
XML to XML using XSLT	1	Aug 18, 2011
XSL: I'm doing something wrong, and I can't see it!	5	Feb 26, 2007
Namespace ins XSL	5	May 28, 2008
XSL Calculation	0	May 6, 2012
newbie using group by in XSL on XLM	6	Feb 17, 2011
