MySql+UTF8 woes

Discussion in 'Ruby' started by Ronald Fischer, Jul 26, 2007.

  1. I am experiencing a strange problem when retrieving Unicode values from
    my MySql5 database on Windows. Here is what I have:

    * All programs/modules in my application require in the beginning a file =
    containing

    $KCODE=3D'u'
    require 'jcode'

    in the beginning, ensuring that "everything is done in Unicode".

    * I checked that the MySql database is "as UTF-8 as possible", in
    particular
    - in the table editor of the MySql Administrator, the character set
    for the rows is listed as utf8 (with sorting order utf8_general_ci)
    - The "start variables" tab of the Administrator says utf8 for the =
    default
    character set
    - mysql.ini says=20
    default-character-set=3Dutf8
    - On the mysql command line, I see

    mysql> SHOW VARIABLES LIKE 'character\_set\_%';
    +--------------------------+--------+
    | Variable_name | Value |
    +--------------------------+--------+
    | character_set_client | utf8 |=20
    | character_set_connection | utf8 |=20
    | character_set_database | utf8 |=20
    | character_set_filesystem | binary |=20
    | character_set_results | utf8 |=20
    | character_set_server | utf8 |=20
    | character_set_system | utf8 |=20
    +--------------------------+--------+
    =20

    Indeed, the following works well:

    When I store (from Ruby, via 'mysql' gem) a UTF-8 string
    into the database, and later retrieve it, I get back exactly
    what I had put in. This works equally well, whether I type
    (using my UTF-8 enabled text editor) the character (for=20
    example, a German umlaut) directly into the Ruby string,=20
    which I then store, i.e. (table X having only one column)

    sqlstatement=3D"INSERT INTO X VALUES ('=F6')"
    dbhandle.query(sqlstatement)

    or whether I write the code point in hexadecimal:
    =20
    uml_o=3D"\xc3\xb6" # =F6, i.e. German umlaut-o in UTF-8 encoding
    sqlstatement=3D"INSERT INTO X VALUES ('#{uml_o}')"
    dbhandle.query(sqlstatement)

    In both cases, when I retrieve the rows with=20
    dbhandle.query("SELECT * FROM X")
    I eventually get back the row with my "=F6" as a UTF-8=20
    2-Byte-sequence C3 B6.

    So far OK. But there are several things which do not work:

    (1) When I look at the database table using the SQL Administrator,
    I do not see the letter '=F6', but instead the characters =
    corresponding
    to the Windows ANSI code sequence for C3 and B6 characters.

    (2) When I manually enter data into the database using MySql
    Administrator, by submitting the SQL statement
    INSERT INTO X VALUES ('=F6')
    the =F6 is displayed fine when looking at the table with a SELECT
    using the Administrator, but when I read the table from Ruby, I do
    not get back a 2 byte UTF-8 sequence for the umlaut, but only one=20
    byte with hex code F6. Note that this F6 corresponds to the ANSI=20
    code for =F6 in Windows.

    (3) When I put the unicode data into a file, and use a Java application
    to read the file and write the data into the database (using JDBC), =
    and=20
    I look at the database then with MySql Administrator, I can see the=20
    German characters; and when I then read them back with my Ruby=20
    application, I get it encoded in Windows ANSI, not Unicode.

    Observations (1) and (2) could be explained in the way that maybe the=20
    MySql administrator interface was not written with UTF-8 in minde. What
    surprises me more is that when writing data using Java (which is =
    inherently
    Unicode aware) and reading it back with Ruby, causes somewhere a =
    conversion
    from UTF-8 into Windows ANSI.=20

    Does someone have experienced a similar problem and can point out a =
    possible
    reason for it?


    Ronald
    --=20
    Ronald Fischer <>
    Phone: +49-89-452133-162
    Ronald Fischer, Jul 26, 2007
    #1
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. gry
    Replies:
    2
    Views:
    707
    Alf P. Steinbach
    Mar 13, 2012
  2. Guest

    XML::Simple and utf8 woes

    Guest, Mar 18, 2006, in forum: Perl Misc
    Replies:
    16
    Views:
    941
    Guest
    Mar 29, 2006
  3. Guest

    Text::Levenshtein and utf8 woes

    Guest, Mar 26, 2006, in forum: Perl Misc
    Replies:
    3
    Views:
    235
    Guest
    Mar 26, 2006
  4. John

    UTF8 European characters in MySQL

    John, Apr 26, 2007, in forum: Perl Misc
    Replies:
    10
    Views:
    198
  5. kbin
    Replies:
    2
    Views:
    153
    Joost Diepenmaat
    Mar 6, 2008
Loading...

Share This Page