XSL and entities

Discussion in 'XML' started by Tjerk Wolterink, Feb 20, 2005.

  1. I've a problem in an xsl transformation.
    My xml input:

    --- input.xml ---

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE xc:content [
    <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    %xhtml;
    ]>
    <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="news">
    <xc:text type="html">
    leuk he jazeker ãôé<br/>
    </xc:text>
    </xc:xcontent>

    ----

    And an xsl file:

    -- style.xsl ---

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
    xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

    <xsl:output method="xml" indent="yes"/>

    <!--
    ! All html should remain html
    !-->
    <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
    <xsl:copy>
    <xsl:for-each select="@*">
    <xsl:copy/>
    </xsl:for-each>
    <xsl:apply-templates select="./node()"/>
    </xsl:copy>
    </xsl:template>

    <xsl:template match="/xc:xcontent">
    <page:page type="module">
    <p>
    <xsl:apply-templates select="xc:text"/>
    </p>
    </page:page>
    </xsl:template>

    </xsl:stylesheet>

    ---



    The output here is:

    ---
    <page:page type="module">
    <p>
    leuk he jazeker<br/>
    </p>
    </page:page>

    ---


    But i expect this as output


    ---
    <page:page type="module">
    <p>
    leuk he jazeker ãôé<br/>
    </p>
    </page:page>
    ---


    How can that be? Why are the characters ãôé gone?
    Is there something wrong with my encoding?
    Note: I don't know whether the files are really encoded in ISO-8859-1, but it did work for me.
    My editor says the encoding is ISO-8859-1, so I think that is right. Or did the editor get that
    information from the XML prolog?
     
    Tjerk Wolterink, Feb 20, 2005
    #1

  2. > cut

    Well, my subject line was not really a good choice: there are no entities involved.
     
    Tjerk Wolterink, Feb 20, 2005
    #2

  3. Are they really gone, or are you just looking at the file in some program
    that doesn't understand the encoding? They appear to be gone in your
    posted output, but that doesn't match what XSLT should have done.
    That output is also missing a namespace declaration for XHTML; is it
    really the output you got from XSLT?

    If you want iso-8859-1 output add
    <xsl:output encoding="iso-8859-1"/>
    to your stylesheet.

    Incidentally, although you referred to entities in the
    subject line, there are no entity references in your input (except the
    parameter entity reference %xhtml;). If you enter all your characters
    directly as character data, there's no need to reference the XHTML DTD,
    and dropping it might have a very noticeable effect on parsing speed, especially
    if you really are fetching the DTD off the W3C site each time.
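
    A minimal sketch of the input with the DOCTYPE removed, assuming the file
    really is saved as ISO-8859-1 as its XML declaration claims. The comment
    shows the same three characters written as numeric character references
    (ã is 227, ô is 244, é is 233), which need no DTD and do not depend on the
    file's encoding:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- No DOCTYPE: nothing has to be fetched from w3.org at parse time. -->
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="news">
  <xc:text type="html">
    <!-- Encoding-independent spelling of the same text:
         leuk he jazeker &#227;&#244;&#233; -->
    leuk he jazeker ãôé<br/>
  </xc:text>
</xc:xcontent>
```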

    David
     
    David Carlisle, Feb 20, 2005
    #3
  4. Tjerk Wolterink wrote:

    > I've a problem in an xsl transformation.
    > My xml input:
    >
    > --- input.xml ---
    >
    > <?xml version="1.0" encoding="ISO-8859-1"?>
    > <!DOCTYPE xc:content [

    ^^^^^^^^^^

    > <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    > %xhtml;
    > ]>
    > <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"


    If the DOCTYPE declaration says the root element is xc:content then you
    should have that but you have xc:xcontent so one needs to be changed.

    > xmlns="http://www.w3.org/1999/xhtml" module="news">
    > <xc:text type="html">
    > leuk he jazeker ãôé<br/>
    > </xc:text>
    > </xc:xcontent>
    >
    > ----
    >
    > And an xsl file:
    >
    > -- style.xsl ---
    >
    > <?xml version="1.0" encoding="ISO-8859-1"?>
    > <xsl:stylesheet version="1.0"
    > xmlns="http://www.w3.org/1999/xhtml"
    > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    > xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
    > xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">
    >
    > <xsl:output method="xml" indent="yes"/>


    What output encoding do you want?

    > <!--
    > ! All html should remain html
    > !-->
    > <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
    > <xsl:copy>
    > <xsl:for-each select="@*">
    > <xsl:copy/>
    > </xsl:for-each>
    > <xsl:apply-templates select="./node()"/>
    > </xsl:copy>
    > </xsl:template>


    This could be done more easily and efficiently:

    <xsl:template match="xhtml:*">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()"/>
      </xsl:copy>
    </xsl:template>

    where the prefix xhtml is bound to the namespace URI for XHTML earlier
    in the document.
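
    As a sketch of what that binding might look like in the posted stylesheet:
    the xmlns:xhtml declaration is the only thing the match pattern needs, and
    the encoding attribute on xsl:output is an assumption about the output you
    want, not something the original stylesheet contained.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
    xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

  <!-- Assumption: ISO-8859-1 output is wanted. -->
  <xsl:output method="xml" indent="yes" encoding="ISO-8859-1"/>

  <!-- All XHTML elements are copied through with their attributes. -->
  <xsl:template match="xhtml:*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/xc:xcontent">
    <page:page type="module">
      <p>
        <xsl:apply-templates select="xc:text"/>
      </p>
    </page:page>
  </xsl:template>

</xsl:stylesheet>
```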


    > The output here is:
    >
    > ---
    > <page:page type="module">
    > <p>
    > leuk he jazeker<br/>
    > </p>
    > </page:page>
    >
    > ---
    >
    >
    > But i expect this as output
    >
    >
    > ---
    > <page:page type="module">
    > <p>
    > leuk he jazeker ãôé<br/>
    > </p>
    > </page:page>
    > ---


    What XSLT processor are you using, how exactly do you run the
    transformation?

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Feb 20, 2005
    #4
  5. Martin Honnen wrote:
    >
    >
    > Tjerk Wolterink wrote:
    >
    >> I've a problem in an xsl transformation.
    >> My xml input:
    >>
    >> --- input.xml ---
    >>
    >> <?xml version="1.0" encoding="ISO-8859-1"?>
    >> <!DOCTYPE xc:content [

    >
    > ^^^^^^^^^^
    >
    >> <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    >> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    >> %xhtml;
    >> ]>
    >> <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"

    >
    >
    > If the DOCTYPE declaration says the root element is xc:content then you
    > should have that but you have xc:xcontent so one needs to be changed.
    >


    You're right, that was a typing error. The XML reader does not complain, so I did not notice it.

    >> xmlns="http://www.w3.org/1999/xhtml" module="news">
    >> <xc:text type="html">
    >> leuk he jazeker ãôé<br/>
    >> </xc:text>
    >> </xc:xcontent>
    >>
    >> ----
    >>
    >> And an xsl file:
    >>
    >> -- style.xsl ---
    >>
    >> <?xml version="1.0" encoding="ISO-8859-1"?>
    >> <xsl:stylesheet version="1.0"
    >> xmlns="http://www.w3.org/1999/xhtml"
    >> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >> xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
    >> xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">
    >>
    >> <xsl:output method="xml" indent="yes"/>

    >
    >
    > What output encoding do you want?
    >
    >> <!--
    >> ! All html should remain html
    >> !-->
    >> <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
    >> <xsl:copy>
    >> <xsl:for-each select="@*">
    >> <xsl:copy/>
    >> </xsl:for-each>
    >> <xsl:apply-templates select="./node()"/>
    >> </xsl:copy>
    >> </xsl:template>

    >
    >
    > This could be done more easily and efficiently:
    >
    > <xsl:template match="xhtml:*">
    > <xsl:copy>
    > <xsl:copy-of select="@*"/>
    > <xsl:apply-templates select="node()"/>
    > </xsl:copy>
    > </xsl:template>
    >
    > where the prefix xhtml is bound to the namespace URI for XHTML earlier
    > in the document.
    >


    that is a solution, but they both work.

    >
    >> The output here is:
    >>
    >> ---
    >> <page:page type="module">
    >> <p>
    >> leuk he jazeker<br/>
    >> </p>
    >> </page:page>
    >>
    >> ---
    >>
    >>
    >> But i expect this as output
    >>
    >>
    >> ---
    >> <page:page type="module">
    >> <p>
    >> leuk he jazeker ãôé<br/>
    >> </p>
    >> </page:page>
    >> ---

    >
    >
    > What XSLT processor are you using, how exactly do you run the
    > transformation?
    >


    I'm using Sablotron for XSL transformations.
    But I think the problem is more complex than I thought.
     
    Tjerk Wolterink, Feb 20, 2005
    #5
  6. > [cut]

    Well,

    The example I gave you was a bad one.
    The problem I have does not occur in my examples.

    Here is an example where the problem does occur:

    I have an xml document:
    ---
    <?xml version="1.0" encoding="ISO-8859-1"?>

    <!DOCTYPE xc:xcontent [
    <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    %xhtml;
    ]>
    <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
    <xc:text type="html">
    <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,
    </p> </xc:text>
    </xc:xcontent>
    ---


    And when i put this together with this xsl document:


    -- style.xsl ---

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
    xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

    <xsl:output method="xml" indent="yes"/>

    <!--
    ! All html should remain html
    !-->
    <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
    <xsl:copy>
    <xsl:for-each select="@*">
    <xsl:copy/>
    </xsl:for-each>
    <xsl:apply-templates select="./node()"/>
    </xsl:copy>
    </xsl:template>

    <xsl:template match="/xc:xcontent">
    <page:page type="module">
    <xsl:apply-templates select="xc:text"/>
    </page:page>
    </xsl:template>

    </xsl:stylesheet>
    ---


    Then the output will be:

    --
    <page:page type="module">
    <p>
    Caf de Kletskop is gevestigd in een oud lichtenvoords pander,
    </p>
    </page:page>
    --


    The &eacute; in my XML is gone from the transformation output.

    Sorry that I gave a bad example; now the problem should be clear.

    How can I solve this problem?
     
    Tjerk Wolterink, Feb 20, 2005
    #6
  7. A non-validating parser is allowed by the XML recommendation to _not_
    fetch external DTD files and just report entity references as undefined.

    However, the XPath model does not support undefined entities, so in this
    case I would expect you to get a parsing error on input saying that the
    entity reference cannot be resolved. Your system seems to be silently
    dropping the entities, which looks like a bug to me.

    Can't suggest what you can do other than raise it with maintainers.
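
    One possible workaround, sketched here as an assumption and not tested
    with Sablotron: declare the entities the document actually uses in the
    internal subset, which a conforming parser has to read even when it
    fetches nothing external. The value 233 is the character number of é
    (U+00E9).

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:xcontent [
  <!-- Declared locally, so no external DTD has to be fetched. -->
  <!ENTITY eacute "&#233;">
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
  <xc:text type="html">
    <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,</p>
  </xc:text>
</xc:xcontent>
```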

    David
     
    David Carlisle, Feb 20, 2005
    #7
  8. David Carlisle wrote:

    > A non-validating parser is allowed by the XML recommendation to _not_
    > fetch external DTD files and just report entity references as undefined.
    >
    > However, the XPath model does not support undefined entities, so in this
    > case I would expect you to get a parsing error on input saying that the
    > entity reference cannot be resolved. Your system seems to be silently
    > dropping the entities, which looks like a bug to me.


    Is there no way to match entities in xsl?
    What is the default behavior of xsl systems when it comes to entities?

    > Can't suggest what you can do other than raise it with maintainers.


    raise it with maintainers??
    You mean to report it as a bug

    >
    > David
     
    Tjerk Wolterink, Feb 20, 2005
    #8
  9. Tjerk Wolterink writes:

    > David Carlisle wrote:
    >
    > > A non-validating parser is allowed by the XML recommendation to _not_
    > > fetch external DTD files and just report entity references as undefined.
    > >
    > > However, the XPath model does not support undefined entities, so in this
    > > case I would expect you to get a parsing error on input saying that the
    > > entity reference cannot be resolved. Your system seems to be silently
    > > dropping the entities, which looks like a bug to me.

    >
    > Is there no way to match entities in xsl?


    No, they are expanded by the XML parser before XSLT starts, so the
    input tree has all entities expanded.

    > What is the default behavior of xsl systems when it comes to entities?

    If the parser expands them, they are not there as far as XSLT is
    concerned; if it doesn't, it's a fatal error and nothing is transformed.
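
    A small illustration of that point, with a locally declared entity so the
    example does not depend on any external DTD (the p element here is just a
    stand-in document):

```xml
<!DOCTYPE p [
  <!ENTITY eacute "&#233;">
]>
<!-- After parsing, the entity reference, the decimal reference and the
     hexadecimal reference all yield the same character, U+00E9.
     The stylesheet only ever sees the resulting text node and cannot
     tell which spelling was used in the source. -->
<p xmlns="http://www.w3.org/1999/xhtml">Caf&eacute; Caf&#233; Caf&#xE9;</p>
```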

    >
    > > Can't suggest what you can do other than raise it with maintainers.


    Try a different XSLT engine?

    >
    > raise it with maintainers??
    > You mean to report it as a bug


    yes.

    >
    > >
    > > David


    David
     
    David Carlisle, Feb 20, 2005
    #9
  10. David Carlisle wrote:
    > Tjerk Wolterink writes:
    >
    > [cut]



    I could set the following option of the xslt-parser:

    XSLT_SABOPT_PARSE_PUBLIC_ENTITIES = on
    Tell the processor to parse public entities. By default this has been turned off.

    But now when i do the following xsltransformation:

    xml:
    --

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <page:page xmlns:page="http://www.wolterinkwebdesign.com/xml/page">
    <page:content>

    <page:module
    module="agenda"
    stylesheet="agenda.xsl">

    <page:multiple-settings multiple="agendapunt" max="30" order-by="datum" direction="desc"/>
    </page:module>
    </page:content>
    </page:page>

    --


    xsl:
    --
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE xsl:stylesheet [
    <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    %xhtml;
    ]>

    <xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
    xmlns:menu="http://www.wolterinkwebdesign.com/xml/menu"
    xmlns:r="http://www.wolterinkwebdesign.com/xml/roles">


    [rest does not matter]

    </xsl:stylesheet>
    --


    Now i get the following error:

    ["msgtype"]=> string(5) "error"
    ["code"]=> string(1) "2"
    ["module"]=> string(9) "Sablotron"
    ["URI"]=> string(49) "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
    ["line"]=> string(1) "1"
    ["msg"]=> string(51) "XML parser error 4: not well-formed (invalid token)"


    So the DTD on w3c.org is not valid??
    How can I solve this?

    What I want is for XHTML entities like &nbsp; to be converted to numeric character references like &#209;
    (I don't know if 209 = nbsp, but you know what I mean).

    What should I do?
     
    Tjerk Wolterink, Feb 22, 2005
    #10
  11. > So the DTD on w3c.org is not valid??

    I just tested the file you posted with rxp and it reported it as being
    well formed.

    > How can I solve this?

    Report it as a bug to the parser maintainers?

    You don't need to load the whole XHTML DTD, just the entity definitions.
    E.g. the DTD you quoted uses
    <!ENTITY % HTMLlat1 PUBLIC
    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "xhtml-lat1.ent">
    %HTMLlat1;

    <!ENTITY % HTMLsymbol PUBLIC
    "-//W3C//ENTITIES Symbols for XHTML//EN"
    "xhtml-symbol.ent">
    %HTMLsymbol;

    <!ENTITY % HTMLspecial PUBLIC
    "-//W3C//ENTITIES Special for XHTML//EN"
    "xhtml-special.ent">
    %HTMLspecial;


    so
    http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
    for Latin-1, for example. You might like to try loading just those, or
    the versions of the entity files you will find at
    http://www.w3.org/2003/entities (which I personally prefer, being
    biased :) ), instead of loading the XHTML DTD.


    But as I said at the beginning, not using a <!DOCTYPE and not using
    entity references in your stylesheet really will make your life simpler.

    At the very least you ought to make local copies of the files and
    reference those. Referencing the W3C site to download the XHTML DTD
    every time you do a transformation is going to slow your transformation
    down dramatically.
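
    A sketch of that last suggestion, loading only the Latin-1 entity set from
    a local copy. The relative system identifier assumes xhtml-lat1.ent has
    been saved in the same directory as the document (a hypothetical
    location), and the processor must still be set up to read external
    entities at all:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:xcontent [
  <!-- Only the Latin-1 entity set, read from a local copy
       instead of downloading the whole XHTML DTD from w3.org. -->
  <!ENTITY % HTMLlat1 PUBLIC
      "-//W3C//ENTITIES Latin 1 for XHTML//EN"
      "xhtml-lat1.ent">
  %HTMLlat1;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
  <xc:text type="html">
    <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,</p>
  </xc:text>
</xc:xcontent>
```

    In that entity file &eacute; is defined as &#233; and &nbsp; as &#160;, so
    the parser hands the stylesheet the plain characters.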

    David
     
    David Carlisle, Feb 22, 2005
    #11
  12. David Carlisle wrote:

    > [cut]



    I think I solved the problem. The XSL engine was not able to load DTDs from other servers.

    David,
    thanks for your help!
     
    Tjerk Wolterink, Feb 22, 2005
    #12