UTF-8 encoding/decoding not working with Danish characters

Discussion in 'XML' started by LarsM, Feb 10, 2005.

  1. LarsM

    LarsM Guest

    Hi all,
    I am new to XML, but I use it for an RSS feed.

    I have one problem, which I have really been struggling with.

    My XML document is generated from the contents of a MySQL database. It is
    UTF-8 encoded.

    However, the Danish special characters appear wrong.

    For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".

    See an example here:
    http://netm.dk/blog/rss/index_rss2.xml

    I thought that it could be because the encoding was not set in the document,
    so I added this:
    <?xml version="1.0" encoding="UTF-8" ?>
    However, that did not make any difference, as can be seen here:
    http://netm.dk/blog/rss/test_rss2.xml

    The text decodes correctly on my regular web pages on http://netm.dk/

    What am I doing wrong?

    Regards,
    Lars
    www.netm.dk
    LarsM, Feb 10, 2005
    #1

  2. LarsM

    Malte Guest

    LarsM wrote:
    > Hi all,
    > I am new to XML, but I use it for an RSS feed.
    >
    > I have one problem, which I have really been struggling with.
    >
    > My XML document is generated from the contents of a MySQL database. It is
    > UTF-8 encoded.
    >
    > However, the Danish special characters appear wrong.
    >
    > For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".
    >
    > See an example here:
    > http://netm.dk/blog/rss/index_rss2.xml
    >
    > I thought that it could be because the encoding was not set in the document,
    > so I added this:
    > <?xml version="1.0" encoding="UTF-8" ?>
    > However, that did not make any difference, as can be seen here:
    > http://netm.dk/blog/rss/test_rss2.xml
    >
    > The text decodes correctly on my regular web pages on http://netm.dk/
    >
    > What am I doing wrong?
    >
    > Regards,
    > Lars
    > www.netm.dk
    >
    >
    >
    >

    This is not limited to XML. I am trying to send mail with JavaMail. When I
    run the program on a Windows PC, Danish characters are garbled; when I run
    the exact same program on Linux, the characters get through fine.

    Hope we get rid of those ¤%@£¥ darned NLS issues sometime in my lifetime,
    but I doubt it.
    Malte, Feb 10, 2005
    #2

  3. LarsM wrote:

    > My XML document is generated from the contents of a MySQL database. It is
    > UTF-8 encoded.


    You have to make sure that *every* tool in the toolchain
    knows how to handle UTF-8 correctly. Could you give us
    a list of the tools involved?

    > The text decodes correctly on my regular web pages on http://netm.dk/


    Your web page looks OK to me.
    I bet it is in the database or shortly thereafter.
    Jürgen Kahrs, Feb 10, 2005
    #3
  4. LarsM

    LarsM Guest

    "Jürgen Kahrs" wrote:
    >
    > Could you give us a list of the tools involved?


    Thanks Jürgen,
    The RSS feed is being generated by the same Blog application
    ("Boastmachine"), which I use to generate the Web pages. As far as I know it
    accesses the database in the same way as for the "real" pages.
    But I will check up on that.
    -Lars
    LarsM, Feb 10, 2005
    #4
  5. LarsM wrote:

    > The RSS feed is being generated by the same Blog application
    > ("Boastmachine"), which I use to generate the Web pages. As far as I know it
    > accesses the database in the same way as for the "real" pages.


    So the problem should be in the Blog application.

    > But I will check up on that.


    Good idea. Maybe there is simply a bug in the RSS
    extraction mechanism.
    Jürgen Kahrs, Feb 10, 2005
    #5
  6. LarsM

    Nick Kew Guest

    LarsM wrote:

    > Hi all,
    > I am new to XML, but I use it for an RSS feed.
    >
    > I have one problem, which I have really been struggling with.
    >
    > My XML document is generated from the contents of a MySQL database. It is
    > UTF-8 encoded.


    No. It's ASCII encoded before an agent even looks at the document itself.
    See RFC3023 for details.

    The good news is that the fix is a single line in httpd.conf.
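
    For readers who want the RFC 3023 rule spelled out, here is a minimal Python sketch of how a receiving agent might pick the charset from the Content-Type header alone. The function name `charset_from_content_type` is made up for illustration, and the rule it encodes (an explicit charset parameter wins, text/xml without one defaults to US-ASCII, application/xml without one defers to the BOM or XML declaration) is one reading of the RFC, so check the RFC text before relying on it.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset implied by a Content-Type header, or None.

    None means "look inside the document" (BOM / XML declaration).
    Hypothetical helper: only the text/xml and application/xml cases
    discussed in this thread are handled.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    explicit = msg.get_param("charset")
    if isinstance(explicit, str):
        # An explicit charset parameter is authoritative.
        return explicit.lower()
    if msg.get_content_type() == "text/xml":
        # text/xml without a charset parameter defaults to US-ASCII.
        return "us-ascii"
    # application/xml without a charset: defer to the document itself.
    return None


for value in ("text/xml", "text/xml; charset=UTF-8", "application/xml"):
    print(value, "->", charset_from_content_type(value))
```

    Under this reading, a feed served as plain text/xml would be treated as US-ASCII no matter what its XML declaration says, which is why the server configuration matters.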

    --
    Nick Kew
    Nick Kew, Feb 10, 2005
    #6
  7. LarsM

    Malte Guest

    LarsM wrote:
    > Hi all,
    > I am new to XML, but I use it for an RSS feed.
    >
    > I have one problem, which I have really been struggling with.
    >
    > My XML document is generated from the contents of a MySQL database. It is
    > UTF-8 encoded.
    >
    > However, the Danish special characters appear wrong.
    >
    > For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".
    >
    > See an example here:
    > http://netm.dk/blog/rss/index_rss2.xml
    >
    > I thought that it could be because the encoding was not set in the document,
    > so I added this:
    > <?xml version="1.0" encoding="UTF-8" ?>
    > However, that did not make any difference, as can be seen here:
    > http://netm.dk/blog/rss/test_rss2.xml
    >
    > The text decodes correctly on my regular web pages on http://netm.dk/
    >
    > What am I doing wrong?
    >
    > Regards,
    > Lars
    > www.netm.dk
    >
    >
    >
    >

    Pointing my (Linux) Firefox browser at your web site with the encoding
    set to UTF-8, I see your page fine. Setting the encoding to ISO-8859-1
    generates the Ã¥ stuff. One never knows how users' browsers are set up.

    Look at this page: www.vietbao.com

    Great-looking, authentic Vietnamese text with UTF-8. Obviously it does not
    look good with ISO-8859-1 (Vietnamese characters are not part of it).
    Malte, Feb 10, 2005
    #7
  8. LarsM

    LarsM Guest

    "Malte" wrote:
    > Pointing my (Linux) Firefox browser at your web site with the encoding
    > set to UTF-8, I see your page fine. Setting the encoding to ISO-8859-1
    > generates the Ã¥ stuff. One never knows how users' browsers are set up.


    Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

    -Lars
    LarsM, Feb 10, 2005
    #8
  9. /LarsM/:

    > My XML document is generated from the contents of a MySQL database. It is
    > UTF-8 encoded.
    >
    > However, the Danish special characters appear wrong.
    >
    > For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".
    >
    > See an example here:
    > http://netm.dk/blog/rss/index_rss2.xml


    Sounds like a MySQL configuration issue to me.

    --
    Stanimir
    Stanimir Stamenkov, Feb 10, 2005
    #9
  10. LarsM

    LarsM Guest

    "Nick Kew" wrote:
    >
    > The good news is that the fix is a single line in httpd.conf.


    I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
    can I make the configuration change, then?

    -Lars
    LarsM, Feb 10, 2005
    #10
  11. In article <420b5294$0$48698$>,
    "LarsM" <> wrote:

    > I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
    > can I make the configuration change, then?


    In a .htaccess file if your host allows it. Failing that, you could ask
    your host to map .xml to application/xml. Failing that, I recommend
    switching to another host.

    --
    Henri Sivonen

    http://iki.fi/hsivonen/
    Henri Sivonen, Feb 10, 2005
    #11
  12. LarsM

    Malte Guest

    LarsM wrote:
    > "Malte" wrote:
    >
    >>Pointing my (Linux) Firefox browser at your web site with the encoding
    >>set to UTF-8, I see your page fine. Setting the encoding to ISO-8859-1
    >>generates the Ã¥ stuff. One never knows how users' browsers are set up.

    >
    >
    > Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?
    >
    > -Lars
    >
    >


    That gives me the funny looking chars as well, regardless of encoding
    settings in the browser.

    BTW, solved my JavaMail NLS problem. Had Tomcat start with the
    -DEncoding parm set.
    Malte, Feb 10, 2005
    #12
  13. LarsM

    LarsM Guest

    "Henri Sivonen" wrote:
    >


    > In a .htaccess file if your host allows it. Failing that, you could ask
    > your host to map .xml to application/xml. Failing that, I recommend
    > switching to another host.


    I've been reading through the RFC, but please enlighten me. What would the
    syntax be for setting this? Please be as specific as possible.

    Regards,
    Lars
    LarsM, Feb 10, 2005
    #13
  14. Hi there


    Henri Sivonen wrote:

    > In a .htaccess file if your host allows it. Failing that, you could ask
    > your host to map .xml to application/xml. Failing that, I recommend
    > switching to another host.


    lynx -head http://netm.dk/blog/rss/index_rss2.xml
    HTTP/1.0 200 OK
    Date: Thu, 10 Feb 2005 14:46:31 GMT
    Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.9
    Last-Modified: Tue, 08 Feb 2005 08:03:13 GMT
    ETag: "bd67c1-1141-42087241"
    Accept-Ranges: bytes
    Content-Length: 4417
    Content-Type: application/xml
    Age: 704
    X-Cache: HIT from www.sput.nl
    X-Cache-Lookup: HIT from www.sput.nl:8080
    Proxy-Connection: close

    lynx -head http://netm.dk/blog/rss/test_rss2.xml
    HTTP/1.0 200 OK
    Date: Thu, 10 Feb 2005 14:48:18 GMT
    Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.9
    Last-Modified: Mon, 07 Feb 2005 18:45:44 GMT
    ETag: "11e2dc0-1022-4207b758"
    Accept-Ranges: bytes
    Content-Length: 4130
    Content-Type: application/xml
    Age: 624
    X-Cache: HIT from www.sput.nl
    X-Cache-Lookup: HIT from www.sput.nl:8080
    Proxy-Connection: close

    This one is on my box:
    lynx -head http://www.sput.nl/software/leased-line/leased-line.xml
    HTTP/1.1 200 OK
    Date: Thu, 10 Feb 2005 15:00:02 GMT
    Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
    Last-Modified: Sun, 30 Jan 2005 07:44:42 GMT
    ETag: "2787c-4840-41fc906a"
    Accept-Ranges: bytes
    Content-Length: 18496
    Connection: close
    Content-Type: text/xml; charset=UTF-8

    However, my browser does consider all these files to be UTF-8 XML.
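
    The same header check can be scripted instead of run by hand. A minimal Python sketch, assuming only the standard library; `build_head_request` and `fetch_content_type` are names invented here, and the sketch makes no claim about how lynx itself issues the request.

```python
import urllib.request
from typing import Optional


def build_head_request(url: str) -> urllib.request.Request:
    """Build a HEAD request so only the headers are transferred."""
    return urllib.request.Request(url, method="HEAD")


def fetch_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the Content-Type header the server sends for url, if any.

    This performs a real network request when called.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return resp.headers.get("Content-Type")
```

    Calling `fetch_content_type` on a feed URL returns the raw header value, so it is easy to see whether a charset parameter is present.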


    Regards,
    Rob
    --
    +----------------------------------------------------------------------+
    | The EU constitution will turn the EU into a SU |
    | Vote against the EU constitution in the referendum |
    +----------------------------------------------------------------------+
    Rob van der Putten, Feb 10, 2005
    #14
  15. LarsM

    LarsM Guest

    Sorry, but exactly how do I make the setting that Nick Kew and Henri
    Sivonen suggested?

    I have been reading through the RFC, but it is not completely clear to me...

    Cheers,
    Lars
    www.netm.dk
    LarsM, Feb 10, 2005
    #15
  16. /LarsM/:

    >>> My XML document is generated from the contents of a MySQL database. It is UTF-8 encoded.
    >>>
    >>> However, the Danish special characters appear wrong.
    >>>
    >>> For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".
    >>>
    >>> See an example here:
    >>> http://netm.dk/blog/rss/index_rss2.xml

    >>
    >> Sounds like a MySQL configuration issue to me.

    >
    > Sorry, but exactly how do I make the setting that Nick Kew and Henri
    > Sivonen suggested?
    >
    > I have been reading through the RFC, but it is not completely clear to me...


    Please, quote at least some relevant text from the post you're
    replying to.

    What I meant is that, AFAIK, MySQL versions prior to 4.1 don't handle
    Unicode characters. I have no experience with version 4.1, but it
    seems the encoding configuration could be tricky with it, too.

    It could happen that text is inserted into the DB using one
    encoding and read using another (depending on the connection driver
    configuration), producing different results. So, I guess, the data is
    somehow inserted UTF-8 encoded but then read as ISO-8859-1, for
    example. Generally, this has nothing to do with RFCs but with
    MySQL-specific configuration.

    I've worked on an application which used MySQL 4.0 as its data store, and
    because it was targeted at the Japanese market we had to configure
    the connection driver specifically to encode/decode using a
    Shift_JIS encoding.
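
    The write-with-one-encoding, read-with-another scenario can be modelled without a database. The following Python sketch is only an illustration: `store` and `load` are hypothetical stand-ins for whatever the connection driver does, and the sample word is arbitrary.

```python
def store(text: str, write_codec: str) -> bytes:
    """Stand-in for writing text to the database using write_codec."""
    return text.encode(write_codec)


def load(raw: bytes, read_codec: str) -> str:
    """Stand-in for reading the stored bytes back using read_codec."""
    return raw.decode(read_codec)


original = "blåbærgrød"

# Same codec on both sides: the text round-trips unchanged.
matched = load(store(original, "utf-8"), "utf-8")

# Written as UTF-8 but read as ISO-8859-1: each non-ASCII letter
# turns into two wrong characters.
mismatched = load(store(original, "utf-8"), "iso-8859-1")

# Reversing the wrong step recovers the text, because ISO-8859-1
# maps every byte value to exactly one character.
repaired = mismatched.encode("iso-8859-1").decode("utf-8")

print(matched)
print(mismatched)
print(repaired)
```

    The repair step is a diagnostic aid rather than a fix; the durable solution is to make the write and read sides agree on one encoding.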

    --
    Stanimir
    Stanimir Stamenkov, Feb 10, 2005
    #16
  17. LarsM

    LarsM Guest

    "Stanimir Stamenkov" wrote :
    >
    >
    > What I meant is that, AFAIK, MySQL versions prior to 4.1 don't handle
    > Unicode characters. I have no experience with version 4.1, but it seems
    > the encoding configuration could be tricky with it, too.
    >

    Thank you Stanimir. I think my Web host is on 4.0 only. I will look into
    that and maybe go for another encoding all the way through...
    Sorry about not quoting correctly...

    Regards,
    Lars
    www.netm.dk
    LarsM, Feb 11, 2005
    #17
  18. Hi there


    LarsM wrote:

    > I am new to XML, but I use it for an RSS feed.
    >
    > I have one problem, which I have really been struggling with.
    >
    > My XML document is generated from the contents of a MySQL database. It is
    > UTF-8 encoded.
    >
    > However, the Danish special characters appear wrong.
    >
    > For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".


    In ISO-8859-1, a-ring (å) is 0xE5; in UTF-8 it is 0xC3 0xA5.
    The bytes 0xC3 0xA5 read as ISO-8859-1 are A-tilde (Ã) followed by the
    yen sign (¥). The same applies to the other example.

    So maybe the data gets stored as UTF-8 but retrieved as ISO-8859-1 and
    then converted to UTF-8.
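
    These byte values are easy to verify. A short Python sketch, with variable names chosen only for this example, that reproduces the single and double encoding of å:

```python
a_ring = "å"

utf8_bytes = a_ring.encode("utf-8")         # the two bytes 0xC3 0xA5
latin1_bytes = a_ring.encode("iso-8859-1")  # the single byte 0xE5

# Reading the UTF-8 bytes as ISO-8859-1 yields A-tilde plus yen sign.
misread = utf8_bytes.decode("iso-8859-1")

# Converting that misread text to UTF-8 again doubles the damage:
# four bytes now stand where two were intended.
double_encoded = misread.encode("utf-8")

print(utf8_bytes.hex(), latin1_bytes.hex(), misread, double_encoded.hex())
```

    A feed that contains the four-byte sequence shows "Ã¥" even in a client that decodes UTF-8 correctly, which matches the symptom described at the top of the thread.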


    Vr.Gr,
    Rob
    --
    +----------------------------------------------------------------------+
    | The EU constitution will turn the EU into a SU |
    | Vote against the EU constitution in the referendum |
    +----------------------------------------------------------------------+
    Rob van der Putten, Feb 11, 2005
    #18
  19. On Thu, 10 Feb 2005, LarsM wrote:

    > X-Newsreader: Microsoft Outlook Express 6.00.2900.2180
    >
    > However, the Danish special characters appear wrong.
    > For example the letter ? becomes "??", the letter ? becomes "??"


    As long as you are unable to post special, non-ASCII characters
    with appropriate MIME header in your newsreader^W Outlook Express,
    don't expect anything.

    You need to make these settings:

    Tools > Options > Send
    Mail Sending Format > Plain Text Settings > Message format MIME
    News Sending Format > Plain Text Settings > Message format MIME
    Encode text using: None

    Better yet, get a newsreader instead of OE.

    --
    Top-posting.
    What's the most irritating thing on Usenet?
    Andreas Prilop, Feb 11, 2005
    #19