Using Unicode in C programs

Discussion in 'C Programming' started by Marco Iannaccone, Sep 1, 2005.

  1. I'd like to start using Unicode (especially UTF-8) in my C programs, and
    would like some information on how to get started.
    Can you point me to some documents (possibly online) explaining Unicode
    and UTF-8, and how I can use them in my programs (writing to and reading
    from files and the console, processing Unicode strings and characters
    inside the program, etc.)?

    Thanx
    Marco Iannaccone, Sep 1, 2005

  2. Marco Iannaccone wrote:
    > I'd like to start using Unicode (especially UTF-8) in my C programs, and
    > would like some information on how to get started.
    > Can you point me to some documents (possibly online) explaining Unicode
    > and UTF-8, and how I can use them in my programs (writing to and reading
    > from files and the console, processing Unicode strings and characters
    > inside the program, etc.)?


    C provides a concept of wide characters (arrays of wchar_t) and
    multibyte characters (arrays of char where each character may take up
    more than one byte). The C standard defines functions for converting
    between wide and multibyte representations. The standard does not
    specify what encoding these two representational forms take.
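
    As a rough sketch of that conversion step: the helpers below go from
    multibyte to wide and back using mbsrtowcs() and wcsrtombs(), the
    restartable conversion functions from <wchar.h>. The helper names
    mb_to_wide and wide_to_mb are made up for illustration, and the
    encodings involved are whatever the current locale selects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a multibyte string to a newly allocated wide string, using the
   encoding of the current locale. Returns NULL on an invalid sequence or
   allocation failure; the caller frees the result.
   (Hypothetical helper, not part of the standard library.) */
wchar_t *mb_to_wide(const char *mb)
{
    assert(mb != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);              /* initial shift state */

    const char *src = mb;
    size_t n = mbsrtowcs(NULL, &src, 0, &state);  /* length query only */
    if (n == (size_t)-1)
        return NULL;                              /* invalid input */

    wchar_t *wide = malloc((n + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    src = mb;
    memset(&state, 0, sizeof state);
    mbsrtowcs(wide, &src, n + 1, &state);         /* converts and terminates */
    return wide;
}

/* The reverse direction: wide string to newly allocated multibyte string.
   (Hypothetical helper.) */
char *wide_to_mb(const wchar_t *wide)
{
    assert(wide != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);

    const wchar_t *src = wide;
    size_t n = wcsrtombs(NULL, &src, 0, &state);  /* bytes needed */
    if (n == (size_t)-1)
        return NULL;                              /* unrepresentable character */

    char *mb = malloc(n + 1);
    if (mb == NULL)
        return NULL;

    src = wide;
    memset(&state, 0, sizeof state);
    wcsrtombs(mb, &src, n + 1, &state);
    return mb;
}
```

    In the default "C" locale only plain single-byte text is sure to
    convert; after a successful setlocale() to a UTF-8 locale, the same
    calls should accept UTF-8 input on platforms that provide such a locale.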

    On at least one platform, depending on the current locale setting, the
    wide characters built in to C represent Unicode characters, and the
    multibyte characters represent the UTF-8 form.

    The following program attempts to set the locale to en_AU.UTF-8, which
    means Australian English in UTF-8 encoding. The language portion doesn't
    matter, just the encoding does. It then takes a UTF-8 string (which
    happens to contain Simplified Chinese characters), and converts it to
    the wide character representation, which on my platform is equivalent to
    Unicode.

    #include <locale.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        /* Room for four wide characters plus the terminating null */
        wchar_t ucs2[5];

        if (!setlocale(LC_ALL, "en_AU.UTF-8"))
        {
            fprintf(stderr, "Unable to set locale to Australian English in UTF-8\n");
            return EXIT_FAILURE;
        }

        /* The UTF-8 representation of string "水调歌头"
           (four Chinese characters pronounced shui3 diao4 ge1 tou2) */
        const char *utf8 = "\xE6\xB0\xB4\xE8\xB0\x83\xE6\xAD\x8C\xE5\xA4\xB4";

        /* mbstowcs returns (size_t)-1 on an invalid multibyte sequence */
        if (mbstowcs(ucs2, utf8, sizeof ucs2 / sizeof *ucs2) == (size_t)-1)
        {
            fprintf(stderr, "Invalid multibyte sequence\n");
            return EXIT_FAILURE;
        }

        printf("UTF-8: ");
        for (const char *p = utf8; *p; p++)
            printf("%02X ", (unsigned)(unsigned char)*p);
        printf("\n");

        printf("Unicode: ");
        for (const wchar_t *p = ucs2; *p; p++)
            printf("U+%04lX ", (unsigned long)*p);
        printf("\n");

        return 0;
    }

    [sbiber@eagle c]$ c99 -Wall utf8ucs2.c -o utf8ucs2
    [sbiber@eagle c]$ ./utf8ucs2
    UTF-8: E6 B0 B4 E8 B0 83 E6 AD 8C E5 A4 B4
    Unicode: U+6C34 U+8C03 U+6B4C U+5934

    I'd be interested to know how widely this technique works. Is it
    portable?
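
    One way to sidestep the portability question is to decode UTF-8 by hand
    rather than rely on the locale. The following is a minimal sketch of
    that alternative, not the technique used in the program above; the
    function name utf8_decode is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s. On success, stores the code
   point in *cp and returns the number of bytes consumed (1 to 4); a NUL
   byte decodes as code point 0 with length 1. Returns 0 for malformed,
   overlong, surrogate, or out-of-range input.
   (Hypothetical helper, independent of setlocale().) */
size_t utf8_decode(const char *s, unsigned long *cp)
{
    assert(s != NULL && cp != NULL);

    const unsigned char *p = (const unsigned char *)s;
    unsigned long c;
    size_t len;

    if (p[0] < 0x80) {                       /* 0xxxxxxx: one byte */
        *cp = p[0];
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) {      /* 110xxxxx: two bytes */
        c = p[0] & 0x1F; len = 2;
    } else if ((p[0] & 0xF0) == 0xE0) {      /* 1110xxxx: three bytes */
        c = p[0] & 0x0F; len = 3;
    } else if ((p[0] & 0xF8) == 0xF0) {      /* 11110xxx: four bytes */
        c = p[0] & 0x07; len = 4;
    } else {
        return 0;                            /* stray continuation or bad lead */
    }

    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;                        /* also stops at a terminating NUL */
        c = (c << 6) | (p[i] & 0x3F);
    }

    /* Smallest code point that legitimately needs each sequence length */
    static const unsigned long min_for_len[] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (c < min_for_len[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}
```

    Fed the first three bytes of the string above (E6 B0 B4), this yields
    U+6C34 and a length of 3, matching the first value in the mbstowcs
    output.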

    --
    Simon.
    Simon Biber, Sep 1, 2005

  3. "Marco Iannaccone" wrote:
    > I'd like to start using Unicode (especially UTF-8) in my C programs, and
    > would like some information on how to get started.
    > Can you point me to some documents (possibly online) explaining Unicode
    > and UTF-8, and how I can use them in my programs (writing to and reading
    > from files and the console, processing Unicode strings and characters
    > inside the program, etc.)?


    The best and most authoritative source of information on all aspects of
    Unicode is www.unicode.org.
    At the very least, read the Unicode FAQ and the article "To the BMP and
    beyond!" by Eric Muller of Adobe Systems (it should be linked somewhere
    at unicode.org, or just google for it). Read that material carefully.
    Unicode support is not guaranteed in every compiler on every system, in
    the way plain ASCII effectively is. But, to the best of my knowledge,
    recent Linux distros support UTF-8 in functions like printf() and fopen().
    Once again, make use of www.unicode.org.
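
    For plain file I/O, a minimal sketch (assuming the data is already
    UTF-8 and no conversion is wanted) is to treat the text as raw bytes,
    which the stdio functions pass through unchanged. The helper names
    write_utf8 and read_utf8 are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a NUL-terminated UTF-8 string to an open stream as raw bytes.
   Returns 0 on success, -1 on a short write. (Hypothetical helper.) */
int write_utf8(FILE *fp, const char *text)
{
    assert(fp != NULL && text != NULL);

    size_t len = strlen(text);               /* byte count, not characters */
    return fwrite(text, 1, len, fp) == len ? 0 : -1;
}

/* Read up to size - 1 bytes from the stream into buf and NUL-terminate it.
   Returns the number of bytes read. (Hypothetical helper.) */
size_t read_utf8(FILE *fp, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size > 0);

    size_t n = fread(buf, 1, size - 1, fp);
    buf[n] = '\0';
    return n;
}
```

    A stream opened with fopen(path, "wb") or fopen(path, "rb") can be
    passed straight to these. Note that a buffer that is too small may cut
    a multibyte sequence in half, so real code would need to handle that.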

    Alex
    Alexei A. Frounze, Sep 1, 2005
  4. On Thu, 01 Sep 2005 02:53:57 -0700, Marco Iannaccone wrote:

    > I'd like to start using Unicode (especially UTF-8) in my C programs, and
    > would like some information on how to get started.
    > Can you point me to some documents (possibly online) explaining Unicode
    > and UTF-8, and how I can use them in my programs (writing to and reading
    > from files and the console, processing Unicode strings and characters
    > inside the program, etc.)?


    If you would like a quick intro to what you're up against, see this link:

    http://www.ioplex.com/~miallen/libmba/dl/docs/ref/text_details.html

    It describes an API for improving the portability of code across
    different platforms, which you may or may not be concerned with, but it
    also covers the basics of working with Unicode in C.
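
    One of those basics, sketched here independently of the linked page, is
    that strlen() counts bytes rather than characters once a string holds
    UTF-8. A rough way to count code points, assuming the input is valid
    UTF-8, is to skip continuation bytes. The name utf8_length is made up
    for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count the code points in a NUL-terminated string that is assumed to be
   valid UTF-8: every byte except 10xxxxxx continuation bytes starts a new
   character. (Hypothetical helper; performs no validation.) */
size_t utf8_length(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) != 0x80)     /* not a continuation byte */
            count++;
    }
    return count;
}
```

    For the 12-byte string "\xE6\xB0\xB4\xE8\xB0\x83\xE6\xAD\x8C\xE5\xA4\xB4"
    this returns 4, whereas strlen() returns 12.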

    Mike
    Michael B Allen, Sep 2, 2005
  5. Thanx a lot! :) (and thanx to everyone for helping! I'll start
    studying...! :p)
    Marco Iannaccone, Sep 2, 2005