Character Sets in Polish Windows

Discussion in 'HTML' started by mccreeryd@yahoo.com, Jan 19, 2009.

  1. Guest

    I manage an application that consists of a web front end to a MS-SQL
    database for input of data. The application is in english and I only
    want it to work in english, displaying english characters and
    accepting english character as inputs.

    One of the users has a Polish version of Windows running Internet
    Explorer. Can someone explain why, on the Polish installation, junk
    characters are returned from the database? It obviously interprets the
    characters as Polish, but I don't want it to interpret anything, just
    display them exactly as on the English version of Windows.

    How do I explain a solution/workaround to the developers?

    Thanks for any assistance

    Daithi

    ADJMC
     
    , Jan 19, 2009
    #1

  2. Andy Dingley Guest

    On 19 Jan, 13:56, wrote:
    > I manage an application that consists of a web front end to a MS-SQL
    > database for input of data. The application is in english and I only
    > want it to work in english, displaying english characters and
    > accepting english character as inputs.


    "English" is a language. Your problem is more about character sets.
    Wikipedia is pretty readable on these topics and you might start from
    the article on "Windows-1250".

    > One of the users has a polish version of windows running internet
    > explorer.  Can someone explain why on the polish installation junk
    > characters returned from the database.


    No, we need more information. A URL would be good, and so would
    details such as which Windows codepage they're using, which HTTP
    request headers their browser sends, what your server returns and also
    whether your server actually changes its behaviour depending on the
    headers in the request, or if it's just coded to return the same to
    all requests. Even knowing the database, web server and web scripting
    language would be good.

    Most of all, samples of the HTML content returned are pretty important
    - although it's hard to show these, as you'd have to deliver them to
    us through a medium that's "encoding clean" and wouldn't change them
    further (this is always a problem in remote debugging this sort of
    bug).


    I'm assuming that the "Polish browser" is running on Windows codepage
    1250 (a UK computer would probably be running 1252 instead). This
    means I'd expect it to send an HTTP request header that looks a bit
    like this:

    Accept-Language: pl
    Accept-Charset: ISO-8859-2

    I can't say more than this with any confidence, without seeing your
    examples. However:

    I might assume your server _doesn't_ do anything with these headers.
    It assumes that everyone is English, uses the "english" character sets
    and it will return the same content no matter who asked for it. In
    that case, it would return content in the English language, would use
    an "English-friendly" character set that's probably ISO-8859-1 (could
    be others, but that's popular), and would label it as being in the
    character set that it had actually used.
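
    As a rough illustration of "label it as the character set you actually used", here is a minimal sketch. The function name build_response is made up for this example, and ISO-8859-1 is only the guess from the paragraph above, not something known about the real server. The point is that the body is encoded exactly once and the header names that same encoding.

```python
def build_response(text, charset="iso-8859-1"):
    """Encode the page once and label it with that same charset.

    Hypothetical helper: the name and the default charset are
    assumptions made for this sketch.
    """
    # Raises UnicodeEncodeError if the text does not fit the charset,
    # which is better than silently sending mislabelled octets.
    body = text.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_response("<p>Hello, world</p>")
print(headers["Content-Type"])
```

    With this arrangement a Polish name such as "Łukasz" cannot be sent under the ISO-8859-1 label at all: the encode step fails loudly instead of producing garbage.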

    If that's the case, then everything would "work". Your Poles would see
    correctly displayed English. They wouldn't see Polish language, and
    they wouldn't even be able to save Polish-specific characters (such as
    those in their own names) into the database and retrieve them
    correctly without these characters (and those alone) being corrupted.
    However they would have workable read-only access to English content,
    without garbage.

    Now as I understand you, then this isn't what's happening. Instead
    your Polish read-only users are seeing English text being corrupted.
    That's weird - it should never happen in a correctly implemented
    system, even a system that makes no attempt to support anything beyond
    English in ASCII.

    It looks like you might be falling foul of Yoda's Law of Character
    Encoding here, "Do, or do not. There is no 'try.'"

    You can build a system that _doesn't_ do foreign encodings, and it
    will work. Or you can build a system that _does_ do foreign encodings,
    and it will work, and it will work for its foreign encodings too.
    Where things go "wrong" (meaning garbage, not just languages that
    deliberately aren't supported) it's usually caused by "half-encoding"
    something: either not encoding things at all but sending headers as
    if they had been, or vice versa. It's trying to do encoding and only
    implementing half of it that causes the trouble. A favourite is
    allowing characters to be input from a <form>, using the server's
    built-in features to recognise the browser's "non-English" encoding
    before storing them, and then spitting these same octets back under an
    "English" encoding regardless of how they were intended.
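
    The "half-encoding" failure can be reproduced in a few lines. This is only a sketch: the sample name is invented, and Windows-1250 in / ISO-8859-1 out is one assumed combination, not a diagnosis of the poster's system.

```python
# A name typed into a <form> by a user on a Windows-1250 machine.
name = "Łukasz Żółć"

# The octets the browser submits, and which the server stores untouched.
octets = name.encode("cp1250")

# Half-encoding: the same octets sent back labelled as ISO-8859-1.
shown_wrong = octets.decode("iso-8859-1")

# Doing it properly: decode with the encoding the octets were written in.
shown_right = octets.decode("cp1250")

print(repr(shown_wrong))
print(repr(shown_right))
```

    Note that the plain ASCII letters ("ukasz") survive in both cases. Only the characters above 127 are mangled, which is why corrupted English text is the surprising symptom here.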

    > It obviously interprets the
    > characters as polish but I dont want it to interpret anything just
    > display them just like on the english version of windows.


    This is odd, because ASCII is consistent across most encodings around
    (most non-ASCII encodings work by using the "upper" characters above
    127). I really shouldn't try to guess any more without hard data,
    particularly if Jukka is watching. :cool:
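
    The claim that ASCII is consistent across most encodings is easy to check. A small sketch, with an invented sample string and a list of encodings chosen for this example:

```python
ENCODINGS = ["ascii", "cp1250", "cp1252", "iso-8859-1", "iso-8859-2", "utf-8"]

sample = "Plain English text: order #42, total $19.99"

# Encode the same ASCII-only string under every encoding in the list.
encoded = {enc: sample.encode(enc) for enc in ENCODINGS}

# True if every encoding produced exactly the same octets.
all_same = len(set(encoded.values())) == 1
print(all_same)

# "Most", not "all": UTF-16 uses two octets per character even for ASCII.
utf16_differs = sample.encode("utf-16-le") != sample.encode("ascii")
print(utf16_differs)
```

    So if pure English text really is being garbled, something other than a simple 1250 / 1252 / 8859-x mix-up is probably involved, such as a UTF-16 mismatch or text that is not as pure ASCII as it looks (curly quotes, the pound sign, and so on).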

    However (as a wild guess) there _are_ a few differences between the
    encodings used for Windows-1250 and ISO-8859-2. It's possible that a
    web server _might_ receive content in Windows-1250, recognise it thus
    as being sufficiently "Polish" to not treat it as English any more, but
    then send it back as ISO-8859-2 as its favoured approach for handling
    Polish. But that's a guess - we'd have to see headers.
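
    To see which Polish letters are actually affected by a Windows-1250 / ISO-8859-2 mix-up, you can compare the two codecs directly. A minimal sketch using Python's built-in codecs; the letter list is simply the Polish-specific alphabet:

```python
POLISH_LETTERS = "ĄąĆćĘęŁłŃńÓóŚśŹźŻż"

# Letters whose octet differs between the two encodings.
differing = [
    ch for ch in POLISH_LETTERS
    if ch.encode("cp1250") != ch.encode("iso-8859-2")
]
print("".join(differing))

# What a reader sees when Windows-1250 octets are labelled ISO-8859-2.
original = "gęś, źdźbło, wąż"
mangled = original.encode("cp1250").decode("iso-8859-2")
print(repr(mangled))
```

    Only six letters (Ą, ą, Ś, ś, Ź, ź) differ, so this particular mix-up produces text that is mostly right with a few wrong characters, rather than wholesale garbage.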


    > How do I explain a solution/workaround to the developers.


    For a happy, peaceful life you do these things:

    * You abandon serious support for old browsers (where "old" currently
    means "old enough this just isn't a problem any more")

    * You code on the server in a language that understands Unicode.

    * You switch to using Unicode with UTF-8 encoding throughout (for the
    HTTP at least). No Windows encodings. No ISO-8859-* encodings.

    * Everything Just Works. Really. It's great - so much easier than
    doing it the old way.

    * You rigidly police this through your developers (this is my day job,
    I hate it - I simultaneously support English, Spanish, Czech,
    Afrikaans, Arabic and a bunch more languages). They _will_ keep
    breaking your encoding, even though it's the simplest way to work. Cut
    their hands off if they do. I insist on "Copyright © FooCo" comments
    at the top of ALL source files, just to keep an eye
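
    The copyright-comment trick can be policed automatically. The following is a sketch under assumptions: the function name, the return strings and the exact canary text are invented for this example, and it only checks one file at a time.

```python
from pathlib import Path

# The canary includes the word before the symbol, so that a file which
# was mis-decoded and re-saved ("Copyright Â© ...") fails the check too.
CANARY = "Copyright ©"


def check_source_file(path: Path) -> str:
    """Report whether a source file is valid UTF-8 with an intact canary."""
    try:
        text = path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        # Typically a file saved in a Windows or ISO-8859-* encoding.
        return "not valid UTF-8"
    if CANARY not in text:
        # Comment removed, or the symbol was damaged by a bad re-save.
        return "canary missing"
    return "ok"
```

    Run over every file in the repository as part of a commit hook or build step, this flags a broken encoding as soon as someone's editor saves a file the wrong way.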
     
    Andy Dingley, Jan 19, 2009
    #2
