numeric entities in XSL

Simon Brooke

More silly questions, I'm afraid.

Consider the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">

<xsl:output indent="yes" method="text"/>

<xsl:template match="/">
&#160;&#163;
</xsl:template>

<xsl:template match="*">
<!-- nothing -->
</xsl:template>
</xsl:stylesheet>

When processed by Xalan 2.7.0 or by xsltproc 1.1.19, both output:

Â Â£

(that is not seven-bit clean - if it is not correctly transmitted by NNTP,
it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
If the output method is changed to 'xml', the output is the same. If the
output method is changed to 'html', however, xsltproc outputs exactly the
same, but Xalan2 outputs:

&nbsp;&pound;

If we now change the stylesheet to:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE stylesheet [
<!ENTITY nobreak "&#160;">
<!ENTITY poundsign "&#163;">
]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">

<xsl:output indent="yes" method="html"/>

<xsl:template match="/">
&nobreak;&poundsign;
</xsl:template>

<xsl:template match="*">
<!-- nothing -->
</xsl:template>
</xsl:stylesheet>

then the behaviour is exactly the same as before.

So, questions:

(1) Where does the uppercase A circumflex come from? What do I have to do
to avoid it?
(2) Where does Xalan2 magically get the HTML entity names from, and is it
in accord with the standard in printing them?
 
Bjoern Hoehrmann

* Simon Brooke wrote in comp.text.xml:
> <xsl:template match="/">
> &#160;&#163;
> </xsl:template>
>
> When processed by Xalan 2.7.0 or by xsltproc 1.1.19, both output:
>
> Â Â£
>
> (that is not seven-bit clean - if it is not correctly transmitted by NNTP,
> it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
> If the output method is changed to 'xml', the output is the same. If the
> output method is changed to 'html', however, xsltproc outputs exactly the
> same, but Xalan2 outputs:

> (1) Where does the uppercase A circumflex come from? What do I have to do
> to avoid it?

You are seeing UTF-8 interpreted as some other encoding, such as
ISO-8859-1. The problem is that you are using the wrong tool to inspect
the result, or have not configured the tool correctly. Use a tool with
UTF-8 support, tell the tool the content is UTF-8 encoded, or pick a
different encoding using xsl:output encoding='...'.
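
The misreading described above can be reproduced outside any XSLT processor. The following is a minimal Python sketch, not part of the original thread, assuming the two characters in question are NO-BREAK SPACE (U+00A0) and POUND SIGN (U+00A3). It encodes them as UTF-8 and then decodes the bytes as ISO-8859-1, the way a misconfigured viewer would.

```python
# The two characters from the stylesheet: NO-BREAK SPACE and POUND SIGN.
text = "\u00a0\u00a3"

# What a processor writes when the output encoding is UTF-8:
# each of these characters becomes two bytes, the first being 0xC2.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex())  # c2a0c2a3

# What a viewer shows if it wrongly decodes those bytes as ISO-8859-1:
# every byte becomes one character, and 0xC2 is A circumflex.
misread = utf8_bytes.decode("iso-8859-1")
print([f"U+{ord(ch):04X}" for ch in misread])

# Decoding with the correct encoding recovers the original text.
roundtrip = utf8_bytes.decode("utf-8")
```

The stray A circumflex is the lead byte 0xC2 of each two-byte UTF-8 sequence, shown as if it were a character of its own.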
> (2) Where does Xalan2 magically get the HTML entity names from, and is it
> in accord with the standard in printing them?

It presumably has an internal character<->entity table where it looks
it up, and yes, that's in accord with the HTML output method, see the
XSLT 1.0 spec, <http://www.w3.org/TR/xslt#section-HTML-Output-Method>.
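
To illustrate what such a character-to-entity table might look like, here is a small Python sketch using the standard library's html.entities.codepoint2name mapping. It is only an analogy: Xalan's real table is internal to Xalan, and the helper name html_escape_named is hypothetical.

```python
from html.entities import codepoint2name


def html_escape_named(text):
    """Replace non-ASCII characters with HTML named entities where one exists.

    Hypothetical helper for illustration; not how Xalan is implemented.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 127 and cp in codepoint2name:
            # Look the code point up in the character -> entity name table.
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out)


print(html_escape_named("\u00a0\u00a3"))  # &nbsp;&pound;
```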
 
Martin Honnen

Simon said:
> When processed by Xalan 2.7.0 or by xsltproc 1.1.19, both output:
>
> Â Â£

You get that result if the output is UTF-8 encoded but you look at it
with a tool/editor that assumes ISO-8859-1 when decoding.
You might want to use e.g.
<xsl:output encoding="ISO-8859-1"/>
in your stylesheet if you want that encoding, or if your editor
assumes it.
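
As a rough illustration of what changing the output encoding does to the bytes, the Python sketch below (an addition, not from the thread) encodes the same two characters as ISO-8859-1 and as ASCII. For ASCII it uses Python's "xmlcharrefreplace" error handler, on the assumption that a serializer falls back to numeric character references for characters the output encoding cannot represent, which is what the XSLT 1.0 spec recommends for text nodes.

```python
# NO-BREAK SPACE and POUND SIGN, as in the stylesheet.
text = "\u00a0\u00a3"

# In ISO-8859-1 each character is a single byte, so a viewer that
# assumes ISO-8859-1 displays the output correctly.
latin1_bytes = text.encode("iso-8859-1")
print(latin1_bytes.hex())  # a0a3

# ASCII cannot represent either character; numeric character
# references keep the output seven-bit clean.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'&#160;&#163;'
```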
 
Simon Brooke

Richard Tobin said:
> Or even encoding="ascii".

Ah! Thank you. Or even

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

in the generated HTML; or even (better) fix it in the servlet config so
that it sends that in the real HTTP header.

Many thanks indeed.
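
For readers following along, here is a hedged Python sketch of why the charset declaration matters on the receiving side: a client reads the charset parameter from the Content-Type value and uses it to decode the body. The helper charset_from_content_type is hypothetical, and the ISO-8859-1 fallback is an assumption made for the example, not something stated in the thread.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value.

    Hypothetical helper; the default is an assumption for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


# Body bytes as a UTF-8 serializer would produce them.
body = "\u00a0\u00a3".encode("utf-8")

# With the charset declared, the body decodes to the intended characters.
charset = charset_from_content_type("text/html;charset=utf-8")
decoded = body.decode(charset)

# Without it, the assumed fallback yields the A-circumflex garbage.
fallback = charset_from_content_type("text/html")
garbled = body.decode(fallback)
```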
 
