[XSL] How do I get &#160; to pass through?

Discussion in 'XML' started by Collin VanDyck, Nov 13, 2003.

  1. I have a basic understanding of this, so forgive me if I am overly
    simplistic in my explanation of my problem.

    I am trying to get a Java/Xalan transform to pass through a numeric
    character reference (i.e. &#160;) and it seems to be converting the
    reference to the actual Unicode character.

    Take this source XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <sourcexml>
    some&#160;space&#160;separated&#160;text
    </sourcexml>

    And this stylesheet:

    <?xml version='1.0' encoding='UTF-8'?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="@*|*|text()|processing-instruction()">
    <xsl:copy>
    <xsl:apply-templates select="@*|*|text()|processing-instruction()"/>
    </xsl:copy>
    </xsl:template>
    </xsl:stylesheet>

    I am trying to get it to regurgitate the original document, with the
    &#160;'s intact. Instead I am getting bizarre characters (copied from a
    Windows CMD window):

    <?xml version="1.0" encoding="UTF-8"?>
    <sourcexml>
    someáspaceáseparatedátext
    </sourcexml>

    Here is how I am doing my transform (java code):

    SAXSource in = new SAXSource(new InputSource(new
    StringReader(this.xmlDocument)));

    // build the out result
    StringWriter writer = new StringWriter();
    StreamResult out = new StreamResult(writer);

    // build the transformer
    SAXSource stylesheetIn = new SAXSource(new InputSource(new
    StringReader(this.xslStylesheet)));
    Transformer transformer =
    TransformerFactory.newInstance().newTransformer(stylesheetIn);

    // transform the string.
    transformer.transform(in,out);

    // return the transformation result.
    return writer.toString();

    Any ideas? Any help would be very appreciated. Thanks :)
     
    Collin VanDyck, Nov 13, 2003
    #1

  2. Collin VanDyck <> scribbled the following
    on comp.lang.java.programmer:
    > I have a basic understanding of this, so forgive me if I am overly
    > simplistic in my explanation of my problem..


    > I am trying to get a Java/Xalan transform to pass through a numeric
    > character reference (i.e. &#160;) and it seems to be converting the
    > character to its UNICODE representation.


    > Take this source XML document:


    > <?xml version="1.0" encoding="UTF-8"?>
    > <sourcexml>
    > some&#160;space&#160;separated&#160;text
    > </sourcexml>


    I'm not sure if this is what you want, but one way to get the literal
    string "&#160;" to appear in the output is to write it as:
    "&amp;#160;" in the source code. To get *that* to appear, write it as
    "&amp;amp;#160;" and so on. Of course I could be trying to solve the
    wrong problem here.

    --
    /-- Joona Palaste () ------------- Finland --------\
    \-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
    "This isn't right. This isn't even wrong."
    - Wolfgang Pauli
     
    Joona I Palaste, Nov 13, 2003
    #2

  3. Hi...

    Thanks for the reply, but I tried that, and it produced:

    <?xml version="1.0" encoding="UTF-8"?>
    <sourcexml>
    some&amp;amp;#160;space&amp;amp;#160;separated&amp;amp;#160;text
    </sourcexml>

    where what I am aiming to get is:

    <?xml version="1.0" encoding="UTF-8"?>
    <sourcexml>
    some&#160;space&#160;separated&#160;text
    </sourcexml>

    thanks,

    "Joona I Palaste" <> wrote in message
    news:bp0o90$sp$...
    > I'm not sure if this is what you want, but one way to get the literal
    > string "&#160;" to appear in the output is to write it as:
    > "&amp;#160;" in the source code. To get *that* to appear, write it as
    > "&amp;amp;#160;" and so on. Of course I could be trying to solve the
    > wrong problem here.
     
    Collin VanDyck, Nov 13, 2003
    #3
  4. Anton Spaans

    I thought these character entity-references need 4 digits if you want to
    specify a decimal value
    (character-code 160 as the &nbsp; in HTML) (and not 3, as in your example).

    Try

    &#0160;

    (instead of &#160;)
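    For reference, XML's character-reference syntax does not require a fixed
    number of digits: leading zeros are allowed and a hexadecimal form exists,
    so all of the forms below denote the same character once parsed. A small
    sketch using the JDK's built-in DOM parser; the class name CharRefForms is
    hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class CharRefForms {

    // Parses a one-element document and returns the text content of its root element.
    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Decimal without and with leading zeros, then hexadecimal.
        for (String ref : new String[] {"&#160;", "&#0160;", "&#xA0;"}) {
            String text = rootText("<a>" + ref + "</a>");
            // Prints U+00A0 on every line: the parser keeps no trace of which form was used.
            System.out.printf("%-8s -> U+%04X%n", ref, (int) text.charAt(0));
        }
    }
}
```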

    -- Anton.

    "Collin VanDyck" <> wrote in message
    news:...
    > I have a basic understanding of this, so forgive me if I am overly
    > simplistic in my explanation of my problem..
    >
    > I am trying to get a Java/Xalan transform to pass through a numeric
    > character reference (i.e. &#160;) and it seems to be converting the
    > character to its UNICODE representation.
     
    Anton Spaans, Nov 13, 2003
    #4
  5. Thanks, but same result. I ended up deciding to pass everything through an
    xml encoder-decoder that would do a regex replaceAll on

    &#([0-9]+);

    to

    [unicode]$1[/unicode]

    And after the transform was done, reverse it back into the character
    reference syntax.
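    A minimal sketch of that protect/restore round trip, using the pattern
    &#([0-9]+); and the [unicode]...[/unicode] marker described above. It
    assumes only decimal references need protecting (hexadecimal ones such as
    &#xA0; are left untouched), and CharRefShield, protect and restore are
    hypothetical names:

```java
import java.util.regex.Pattern;

final class CharRefShield {

    // A decimal numeric character reference: "&#", one or more digits, ";".
    private static final Pattern CHAR_REF = Pattern.compile("&#([0-9]+);");
    // The plain-text marker that survives parsing and serialization unchanged.
    private static final Pattern MARKER = Pattern.compile("\\[unicode\\]([0-9]+)\\[/unicode\\]");

    // Before the transform: hide each decimal reference inside a marker.
    static String protect(String xml) {
        return CHAR_REF.matcher(xml).replaceAll("[unicode]$1[/unicode]");
    }

    // After the transform: turn each marker back into a character reference.
    static String restore(String xml) {
        return MARKER.matcher(xml).replaceAll("&#$1;");
    }

    public static void main(String[] args) {
        String source = "<sourcexml>some&#160;space</sourcexml>";
        String shielded = protect(source);
        System.out.println(shielded);          // <sourcexml>some[unicode]160[/unicode]space</sourcexml>
        System.out.println(restore(shielded)); // <sourcexml>some&#160;space</sourcexml>
    }
}
```

    The marker works because it contains no characters that the parser or
    serializer rewrites. The trade-off is that any text in the source which
    already looks like [unicode]NNN[/unicode] would also be turned into a
    reference by restore.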


    "Anton Spaans" <aspaans at(noSPAM) smarttime dot(noSPAM) com> wrote in
    message news:...
    > I thought these character entity-references need 4 digits if you want to
    > specify a decimal value
    > (character-code 160 as the &nbsp; in HTML) (and not 3, as in your

    example).
    >
    > Try
    >
    >  
    >
    > (instead of  )
    >
    > -- Anton.
     
    Collin VanDyck, Nov 13, 2003
    #5
  6. "Collin VanDyck" <> wrote in message
    news:...

    > I am trying to get a Java/Xalan transform to pass through a numeric
    > character reference (i.e. &#160;) and it seems to be converting the
    > character to its UNICODE representation.


    Probably UTF-8 encoded. Try specifying the encoding to generate in
    your xsl:output element. ISO-8859-1 should do what you want.

    Groetjes,
    Maarten Wiltink
     
    Maarten Wiltink, Nov 13, 2003
    #6
  7. Collin VanDyck wrote:

    > I have a basic understanding of this, so forgive me if I am overly
    > simplistic in my explanation of my problem..
    >
    > I am trying to get a Java/Xalan transform to pass through a numeric
    > character reference (i.e. &#160;) and it seems to be converting the
    > character to its UNICODE representation.
    >
    > I am trying to get it to regurgitate the original document, with the
    > &#160;'s intact. Instead I am getting bizarre characters (copied from
    > windows CMD window):


    That's how the XPath data model works - the information about whether a
    character was originally entered as a numeric character reference isn't
    available - so there's no chance to preserve that bit of information using XSLT.
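    This is easy to reproduce. By the time the stylesheet runs, the parser has
    already replaced &#160; with U+00A0, and with the default UTF-8 output
    encoding the serializer has no reason to write it back as a reference. A
    sketch using the copy stylesheet from the original post, inlined as a
    string, and the JDK's built-in TransformerFactory (assumed to stand in for
    Xalan); DataModelDemo is a hypothetical name:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DataModelDemo {

    // The copy-everything stylesheet from the original post.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml'/>"
        + "<xsl:template match='@*|*|text()|processing-instruction()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|*|text()|processing-instruction()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String out = transform("<sourcexml>some&#160;space</sourcexml>");
        // The stylesheet only ever saw U+00A0; the reference itself is gone.
        System.out.println(out.contains("&#160;"));      // false
        System.out.println(out.indexOf('\u00A0') >= 0);  // true
    }
}
```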

    (amazing how many wrong suggestions were made :)

    Julian
     
    Julian Reschke, Nov 13, 2003
    #7
  8. yzzzzz

    Collin VanDyck wrote:

    > I have a basic understanding of this, so forgive me if I am overly
    > simplistic in my explanation of my problem..
    >
    > I am trying to get a Java/Xalan transform to pass through a numeric
    > character reference (i.e. &#160;) and it seems to be converting the
    > character to its UNICODE representation.
    >
    > I am trying to get it to regurgitate the original document, with the
    > &#160;'s intact. Instead I am getting bizarre characters (copied from
    > windows CMD window):
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <sourcexml>
    > someáspaceáseparatedátext
    > </sourcexml>


    This is understandable behaviour:

    * The &#160; is parsed by the XML parser and is interpreted as the
    NO-BREAK SPACE Unicode character. See here:
    <http://www.unicode.org/charts/PDF/U0080.pdf>, character 0xA0.

    * The XML is transformed with the XSL, and the NO-BREAK SPACE Unicode
    character is kept unchanged in the result XML document.

    * The output document is encoded as ISO-8859-1 (Latin 1) for some reason
    (this is the default character encoding on many platforms), instead of
    UTF-8, and in ISO-8859-1, the NO-BREAK SPACE character is encoded as a
    single 0xA0 byte.

    * When the CMD window tries to display the character, it understands the
    XML document as being encoded in the CP437 character set (the *very* old
    DOS character set, for compatibility). It gets the 0xA0 byte, and in
    CP437 the 0xA0 byte represents the LATIN SMALL LETTER A WITH ACUTE
    Unicode character, which is what you see. See here:
    <http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT>


    Of course in UTF-8 the output would have been the following bytes:
    someÂ spaceÂ separatedÂ text (shown as Latin 1 characters)
    which in CMD would be displayed as something I can't type here, because
    it involves 0xC2 'BOX DRAWINGS LIGHT DOWN AND HORIZONTAL' characters.
    It would probably have looked like this:
    some|áspace|áseparated|átext
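    The byte-level part of this explanation can be checked directly. A sketch
    that encodes U+00A0 as ISO-8859-1 and as UTF-8, then decodes those bytes
    as code page 437 the way the console would; it prints code points rather
    than raw characters so the console's own encoding cannot interfere. It
    assumes the running JDK supports the IBM437 charset, and NbspBytes is a
    hypothetical name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NbspBytes {

    // Formats bytes as space-separated hex, e.g. "C2 A0".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Lists the code points of a string, e.g. "U+252C U+00E1".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // How a console using code page 437 would interpret the bytes (assumes IBM437 is available).
    static String asCp437(byte[] bytes) {
        return new String(bytes, Charset.forName("IBM437"));
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // NO-BREAK SPACE, the character &#160; refers to
        byte[] latin1 = nbsp.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = nbsp.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes:    " + hex(latin1));                 // A0
        System.out.println("UTF-8 bytes:         " + hex(utf8));                   // C2 A0
        System.out.println("ISO-8859-1 as CP437: " + codePoints(asCp437(latin1))); // U+00E1 (a with acute)
        System.out.println("UTF-8 as CP437:      " + codePoints(asCp437(utf8)));   // U+252C U+00E1
    }
}
```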

    --
    Laurent
     
    yzzzzz, Nov 13, 2003
    #8
  9. "Maarten Wiltink" <> wrote in message
    news:3fb3fc05$0$58708$4all.nl...
    > "Collin VanDyck" <> wrote in message
    > news:...
    >
    > > I am trying to get a Java/Xalan transform to pass through a numeric
    > > character reference (i.e. &#160;) and it seems to be converting the
    > > character to its UNICODE representation.

    >
    > Probably UTF-8 encoded. Try specifying the encoding to generate in
    > your xsl:output element. Iso-8859-1 should do what you want.


    On second thoughts, it shouldn't. Since U+00A0 is a valid character
    in ISO-8859-1, it will be output verbatim. Asking for output in
    (7-bit) US-ASCII should cause the processor to produce a numeric
    character reference.
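    A minimal sketch of this suggestion: set the encoding on xsl:output in a
    copy-everything stylesheet and compare ISO-8859-1 with US-ASCII. It uses
    the JDK's built-in JAXP TransformerFactory, assumed here to behave like
    Xalan; OutputEncodingDemo is a hypothetical name:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputEncodingDemo {

    // Copy-everything stylesheet whose xsl:output encoding is supplied by the caller.
    static String stylesheet(String encoding) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='xml' encoding='" + encoding + "'/>"
             + "<xsl:template match='@*|node()'>"
             + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    static String transform(String xml, String encoding) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(encoding))));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String source = "<sourcexml>some&#160;space</sourcexml>";
        // U+00A0 exists in ISO-8859-1, so it is written as a raw character.
        System.out.println(transform(source, "ISO-8859-1").contains("&#160;")); // false
        // U+00A0 is outside US-ASCII, so the serializer falls back to a character reference.
        System.out.println(transform(source, "US-ASCII").contains("&#160;"));   // true
    }
}
```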

    Groetjes,
    Maarten Wiltink
     
    Maarten Wiltink, Nov 14, 2003
    #9
  10. In article <>,
    Collin VanDyck <> wrote:

    >I am trying to get a Java/Xalan transform to pass through a numeric
    >character reference (i.e.  ) and it seems to be converting the
    >character to its UNICODE representation.


    This is normal. *Why* do you want to have it output as &#160;? It
    shouldn't make any difference to the programs that use the output, if
    they read it as XML.

    If you want it for readability, you could specify that the output
    encoding should be ascii.
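    Both points can be shown together: request ASCII output from Java rather
    than in the stylesheet, then parse the result again and check that an XML
    reader sees exactly the same text as before. A sketch assuming the JDK's
    built-in JAXP classes and the no-stylesheet identity Transformer;
    AsciiRoundTrip is a hypothetical name:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

final class AsciiRoundTrip {

    // Identity transform (no stylesheet) with US-ASCII requested as the output encoding.
    static String toAscii(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // What a program reading the document as XML sees: the root element's text.
    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String source = "<sourcexml>some\u00A0space</sourcexml>"; // raw NO-BREAK SPACE, no reference
        String ascii = toAscii(source);
        System.out.println(ascii.contains("some&#160;space"));        // true
        System.out.println(rootText(ascii).equals(rootText(source))); // true
    }
}
```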

    -- Richard

    --
    Spam filter: to mail me from a .com/.net site, put my surname in the headers.

    FreeBSD rules!
     
    Richard Tobin, Nov 14, 2003
    #10