Re: perfidious pound sign

Discussion in 'HTML' started by Jukka K. Korpela, Aug 20, 2013.

  1. 2013-08-20 11:56, lipska the kat wrote:

    > The various totals and sub-totals are prefixed by the UK Pound sterling
    > prefix £
    > I use the exact same process to transform the xml in two places, once
    > when I show the invoice to a customer
    > just before they complete payment and once when I print the invoice.


    What does the generated HTML (as seen e.g. in a browser with View
    Source) look like?

    > The problem comes when I try to display the £ sign
    > I'm currently using the html character code for currency characters
    > (in hex) &#xA3;


    That should work.

    > here's an example usage in invoice.xsl
    >
    > <td class='subtot'>
    > &#xA3;<xsl:value-of select='@lineTotal'/>
    > </td>


    However, what happens when this is processed to generate HTML? My guess
    is that the “expansion” of &#xA3;, i.e. the “£” character, gets written
    into the HTML document. That is, at the HTML level, you don’t have the
    character reference but the character itself, in some encoding. And then
    you may have an encoding problem.
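
A minimal sketch of the point above, assuming Python 3: once the reference has been expanded, the bytes that end up in the document depend entirely on the encoding the serializer uses. The helper name `encoded_bytes` is made up for this illustration.

```python
# The same character, U+00A3 POUND SIGN, serialized in three encodings.
POUND = "\u00a3"


def encoded_bytes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


for name in ("utf-8", "iso-8859-1", "windows-1252"):
    # UTF-8 needs two bytes; the two 8-bit encodings use the single byte A3.
    print(f"{name:13} {encoded_bytes(POUND, name)}")
```

If the serializer and the declared charset disagree about which of these forms is in use, the browser decodes the wrong byte sequence.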

    > This works perfectly when I view the invoice to print it
    > but results in a ? when I display it to the user just before they checkout


    Is it literally the question mark “?” or in fact a white question mark
    in a lozenge, as seen e.g. at
    http://www.fileformat.info/info/unicode/char/fffd/index.htm ?
    Especially in the latter case, this looks like an encoding mismatch.
    That is, the actual encoding of the HTML data differs from the encoding
    that the browser infers from Content-Type header, meta tag, heuristic
    algorithm, or its defaults. For generalities on this, see
    http://www.w3.org/International/O-charset

    In particular, if the actual encoding (as set by the process that
    generates the HTML document) is ISO-8859-1 or windows-1252, then the “£”
    character is internally represented as the byte A3 (hexadecimal). If the
    document is (mis)interpreted as being in UTF-8 encoding, then that octet
    is invalid data (the coded UTF-8 representation of any character cannot
    start with the byte A3), and the browser typically indicates this
    character-level data error using the REPLACEMENT CHARACTER (U+FFFD).
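
The mismatch can be reproduced outside a browser. This is only a sketch, assuming Python 3: `errors="replace"` stands in for what a browser does with invalid input, and the function name `decode_like_browser` is hypothetical.

```python
# A lone byte A3, as an ISO-8859-1 or windows-1252 serializer would write "£".
RAW = b"\xa3"


def decode_like_browser(data: bytes, encoding: str) -> str:
    """Decode `data`, substituting U+FFFD for bytes invalid in `encoding`."""
    return data.decode(encoding, errors="replace")


# Decoded with the encoding it was written in, the byte is the pound sign.
print(ascii(decode_like_browser(RAW, "iso-8859-1")))

# Decoded as UTF-8, the byte is invalid and becomes U+FFFD.
print(ascii(decode_like_browser(RAW, "utf-8")))

# A strict decoder refuses the data instead of substituting.
try:
    RAW.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 decoding failed:", exc.reason)
```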

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Aug 20, 2013
    #1

  2. 2013-08-20 16:42, lipska the kat wrote:

    > If I view source in Firefox I see the ?, in fact you are right, it is a
    > ? in a lozenge - the REPLACEMENT CHARACTER apparently.
    >
    > So, if I copy the character into a text file and open the file in a hex
    > viewer I see 0xEF 0xBF 0xBD which is UTF8 hex for the REPLACEMENT
    > CHARACTER so it looks like the replacement character is being written to
    > the output stream ... what does this mean?


    The actual HTML document could still contain the byte 0xA3. I tested
    with a document containing such a byte (which means “£” in ISO-8859-1)
    but declared to be UTF-8, and Firefox shows it as REPLACEMENT CHARACTER,
    so when I copy and paste it, I get that character. When saved in UTF-8
    format, I get its UTF-8 encoded form. So apparently Firefox internally
    replaces the erroneous data by REPLACEMENT CHARACTER. In a sense it has
    to, since in the internal representation, the DOM, text data consists of
    characters, not byte sequences.
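
A rough sketch of that round trip, assuming Python 3 and a made-up page fragment (the amount 12.50 is purely illustrative): the byte that arrived over the wire and the bytes obtained by copying from the rendered page are not the same, so a hex dump of the copied text says nothing about what the server sent.

```python
# Bytes as the server might have sent them: "£" written as the single byte A3.
page_bytes = b"Total: \xa3" + b"12.50"

# The browser decodes the page as UTF-8 and substitutes U+FFFD for the bad byte.
dom_text = page_bytes.decode("utf-8", errors="replace")

# Copying the text and saving it as UTF-8 encodes U+FFFD, not the original byte.
saved = dom_text.encode("utf-8")

print(page_bytes)
print(saved)
```

To see what was actually transmitted, the raw response has to be captured before any decoding takes place.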

    > ========== the xsl file has the following
    >
    > <?xml version='1.0' encoding='UTF-8'?>
    > ...
    > <xsl:output encoding="UTF-8"/>
    >
    > ========== and the transformer component has
    >
    > pageContext.getResponse().setCharacterEncoding("UTF-8");
    > pageContext.getResponse().setContentType("text/html; charset=UTF-8");


    I don’t have much experience with XSL, but as far as I can see, this
    should work. But it seems very probable now that *something* causes the
    data to appear in a non-UTF-8 encoding in the HTML document.

    I wonder what happens if you set the encoding, in all contexts, to
    windows-1252. It would mean moving in the wrong direction in principle,
    but it might magically fix the problem at hand.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Aug 20, 2013
    #2

  3. 2013-08-20 18:37, lipska the kat wrote:

    > In Firefox it's possible to change the character encoding on the fly
    > so, if I change the character encoding of the *u*k*n* page causing the
    > problem to windows-1252 or ISO-8859-1 or indeed *anything other than*
    > UTF-8 then the pound sign appears (well not Chinese simplified :)
    >
    > What does this mean?


    It means that the actual data in the HTML document contains byte 0xA3
    there. That byte does not really mean “£” in every encoding other than
    UTF-8 and the Chinese encodings, but it does mean “£” in several 8-bit
    encodings, such as a few ISO 8859 encodings; see
    http://www.cs.tut.fi/~jkorpela/iso8859/maps.htm8
    For example, if you set the encoding to ISO-8859-2 (“Middle European”),
    0xA3 is “Ł”; in ISO-8859-4, it is “Ŗ”, etc.
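
The mapping of that one byte can be checked with a short script. This is a sketch assuming Python 3 and its bundled codecs; the helper name `meaning` is invented here.

```python
import unicodedata

# The single byte found in the HTML document.
BYTE = b"\xa3"


def meaning(encoding: str) -> str:
    """Describe the character that BYTE denotes in the given 8-bit encoding."""
    ch = BYTE.decode(encoding)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for enc in ("iso-8859-1", "windows-1252", "iso-8859-2", "iso-8859-4"):
    # Same byte, different character depending on the encoding assumed.
    print(f"{enc:13} {meaning(enc)}")
```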

    > I think it means that somehow, the output is being written through the
    > transformer and the transformer is setting the encoding of the page to
    > something other than UTF-8 despite my best efforts ...


    That must be the reason. But I can’t guess why that happens.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Aug 20, 2013
    #3
  4. 2013-08-20 12:42, Jukka K. Korpela wrote:

    > My guess is that the “expansion” of &#xA3;, i.e. the “£” character, gets
    > written into the HTML document.


    Just an idea: I wonder whether the use of &pound; instead of &#xA3;
    could be a workaround. After all, in generic XML, &pound; is an
    undefined entity reference, so maybe, just maybe, it gets written as
    such into the generated HTML document. And there it would work
    perfectly, independently of character encoding.
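
Why the entity reference would be immune to the mismatch can be sketched as follows, assuming Python 3. The markup reuses the `subtot` cell from the stylesheet with a made-up amount, and whether the XSLT processor really passes `&pound;` through untouched remains the open question.

```python
import html

# Hypothetical generated markup that still contains the entity reference.
MARKUP = "<td class='subtot'>&pound;12.50</td>"
ENCODINGS = ("utf-8", "iso-8859-1", "windows-1252", "iso-8859-2")


def serialized_forms(markup: str, encodings) -> set:
    """Collect the distinct byte sequences the markup has across encodings."""
    return {markup.encode(enc) for enc in encodings}


# The markup is pure ASCII, so every ASCII-compatible encoding yields the same bytes.
forms = serialized_forms(MARKUP, ENCODINGS)
print(len(forms))

# An HTML parser resolves the reference to U+00A3 whatever the charset was.
print(ascii(html.unescape("&pound;")))
```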

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Aug 20, 2013
    #4
  5. 2013-08-21 11:37, lipska the kat wrote:

    > The thing I really can't get my head around is that exactly the same
    > xsl, xml, transformer code and custom tag is used to output the invoice
    > in two different places and one of the instances works and the other
    > doesn't. The only other difference is the stylesheet but I don't see how
    > this can affect the result.


    Do you mean CSS stylesheet (and not XSL “stylesheet”)? CSS cannot change
    the character encoding of an HTML document, which seems to be the issue
    here.

    Can you reduce the problem to a case where you have the simplest
    possible XML file and a transformation that produces an HTML document
    where the problem then appears? It seems that all content beyond the
    &#xA3; is irrelevant here, so the important thing is to track down what
    happens to it.

    And what happens if you use just “£” in the XSL file? The same, I would
    guess, but perhaps worth checking.
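
When tracking this down, it may help to look at the raw bytes of each generated page (saved with a tool that does not decode them) rather than at what the browser displays. A small sketch of such a check, assuming Python 3; the function name `inspect_body` and the sample fragments are invented for illustration.

```python
def inspect_body(raw: bytes) -> str:
    """Report whether raw response bytes are consistent with a UTF-8 declaration."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that is not valid UTF-8.
        bad = raw[exc.start]
        return f"not valid UTF-8: byte 0x{bad:02X} at offset {exc.start}"
    if b"\xef\xbf\xbd" in raw:
        # The replacement character was written upstream, before the browser.
        return "valid UTF-8, but U+FFFD is already in the data"
    return "valid UTF-8"


# Hypothetical fragments of the two pages, with the pound sign encoded differently.
print(inspect_body(b"<td>\xa312.50</td>"))
print(inspect_body("<td>\u00a312.50</td>".encode("utf-8")))
```

Running the same check on the working page and the failing page should show whether the two outputs really differ at the byte level.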

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Aug 21, 2013
    #5
