wide characters

Discussion in 'C Programming' started by Bill Cunningham, Oct 15, 2008.

  1. I want to print out the Chinese character meaning water which is decimal
    27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
    far I haven't gotten anything to work.

    Bill
     
    Bill Cunningham, Oct 15, 2008
    #1

  2. On 15 Oct 2008 at 21:23, Bill Cunningham wrote:
    > I want to print out the Chinese character meaning water which is
    > decimal 27750 I believe. Do I use wprintf to do this and just include
    > wchar.h ? So far I haven't gotten anything to work.


    To be honest, internationalization in "standard" C is a complete mess,
    hacked on imperfectly to the language at the last possible minute. The
    wchar_t representation of a string is platform *and locale* dependent,
    so bad things can happen if the run-time locale of your program is
    different from the compile-time locale.

    The best advice is to take advantage of an existing Unicode library:
    someone else has already made the mistakes you're likely to make,
    debugged them, and put the resulting code in a library for you to use,
    so why reinvent the wheel?

    A good option could be the ICU library (http://www.icu-project.org)
    developed at IBM.
     
    Antoninus Twink, Oct 16, 2008
    #2

  3. Antoninus Twink writes:

    > On 15 Oct 2008 at 21:23, Bill Cunningham wrote:
    >> I want to print out the Chinese character meaning water which is
    >> decimal 27750 I believe. Do I use wprintf to do this and just include
    >> wchar.h ? So far I haven't gotten anything to work.

    >
    > To be honest, internationalization in "standard" C is a complete mess,
    > hacked on imperfectly to the language at the last possible minute. The
    > wchar_t representation of a string is platform *and locale* dependent,
    > so bad things can happen if the run-time locale of your program is
    > different from the compile-time locale.


    I may regret this but I can't see what you mean by this. The only
    meaning I can put on it applies equally to programs that use a library
    like ICU.

    > The best advice is to take advantage of an existing Unicode library:
    > someone else has already made the mistakes you're likely to make,
    > debugged them, and put the resulting code in a library for you to use,
    > so why reinvent the wheel?
    >
    > A good option could be the ICU library (http://www.icu-project.org)
    > developed at IBM.


    Do you really think that is easier than either of the methods
    illustrated here:

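    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>

    int main(void)
    {
        wchar_t water = 27750;

        /* Adopt the user's locale so %lc can be converted to multibyte output. */
        setlocale(LC_ALL, "");

        /* Method 1: a literal in the source file's own encoding. */
        printf("汦");
        /* Method 2: convert a wide character; %lc expects a wint_t argument. */
        printf("%lc\n", (wint_t)water);
        return 0;
    }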


    Of course, there are numerous ways in which this can go wrong, but that
    also applies to using ICU.

    --
    Ben.
     
    Ben Bacarisse, Oct 16, 2008
    #3
  4. Bill Cunningham wrote:
    > I want to print out the Chinese character meaning water which is decimal
    > 27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
    > far I haven't gotten anything to work.
    >
    > Bill
    >
    >

    If you use UTF-8, then the original C library is already enough.
     
    Michael, Oct 16, 2008
    #4
  5. On Oct 16, 1:00 pm, Michael wrote:
    > Bill Cunningham wrote:
    > > I want to print out the Chinese character meaning water which is decimal
    > > 27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
    > > far I haven't gotten anything to work.

    >
    > If you use UTF-8, then the original C library is already enough.


    Yes. I can print the Chinese word for water as I print ascii on my
    machine.

    (btw, the Chinese word for water is 水.
    http://www.chinese-tools.com/tools/calligrapher.html?cn=水,
    http://www.chinese-tools.com/tools/sinograms.html?q=水 )

    $ cat a.c
    #include <stdlib.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        if (!argv[1]) return EXIT_FAILURE;
        printf("%s\n", argv[1]);
        return EXIT_SUCCESS;
    }

    $ make && ./a.out "hello 水"
    gcc -ansi -pedantic -Wall -W -c -o a.o a.c
    a.c:4: warning: unused parameter 'argc'
    gcc a.o -o a.out
    hello 水
    $
     
    0m, Oct 16, 2008
    #5
  6. Ben, I am not seeing what you and Antoninus are meaning by saying
    "locale". I understand run-time and compile-time but I've never used the
    term "locale".

    Bill
     
    Bill Cunningham, Oct 16, 2008
    #6
  7. "Bill Cunningham" writes:

    > Ben, I am not seeing what you and Antoninus are meaning by saying
    > "locale". I understand run-time and compile-time but I've never used the
    > term "locale".


    I did not use the term, and I said that I could not understand what
    Antoninus Twink meant by his posting. Unless he comes back to explain
    what he meant, I suggest you ignore the term (as he used it).

    --
    Ben.
     
    Ben Bacarisse, Oct 16, 2008
    #7
  8. On 16 Oct 2008 at 20:40, Ben Bacarisse wrote:
    > "Bill Cunningham" writes:
    >> Ben, I am not seeing what you and Antoninus are meaning by saying
    >> "locale". I understand run-time and compile-time but I've never used
    >> the term "locale".

    >
    > I did not use the term, and I said that I could not understand what
    > Antoninus Twink meant by his posting. Unless he comes back to explain
    > what he meant, I suggest you ignore the term (as he used it).


    I have the impression (perhaps it's just an unfounded prejudice) that
    trying to work portably with wide characters in raw C is fraught with
    difficulty, and relying on intelligent library routines is a safer
    option.

    Here's a quote from the wprintf manpage:

    glibc represents wide characters using their Unicode (ISO-10646)
    code point, but other platforms don’t do this. Also, the use of C99
    universal character names of the form \unnnn does not solve this
    problem. Therefore, in internationalized programs, the format string
    should consist of ASCII wide characters only, or should be
    constructed at run time in an internationalized way (e.g., using
    gettext(3) or iconv(3), followed by mbstowcs(3)).
     
    Antoninus Twink, Oct 16, 2008
    #8
  9. "Antoninus Twink" wrote in message
    news:...

    [snip]

    > Here's a quote from the wprintf manpage:
    >
    > glibc represents wide characters using their Unicode (ISO-10646)
    > code point, but other platforms don't do this. Also, the use of C99
    > universal character names of the form \unnnn does not solve this
    > problem. Therefore, in internationalized programs, the format string
    > should consist of ASCII wide characters only, or should be
    > constructed at run time in an internationalized way (e.g., using
    > gettext(3) or iconv(3), followed by mbstowcs(3)).


    I have gettext and FSF's libiconv on my system. I will have to find out
    what mbstowcs is. Ok I see what you're trying to say. Basically stay away
    from C's wchar.h functions and use something better.

    Bill
     
    Bill Cunningham, Oct 16, 2008
    #9
  10. "Bill Cunningham" writes:

    > "Antoninus Twink" wrote in message
    > news:...
    >
    > [snip]
    >
    >> Here's a quote from the wprintf manpage:
    >>
    >> glibc represents wide characters using their Unicode (ISO-10646)
    >> code point, but other platforms don't do this. Also, the use of C99
    >> universal character names of the form \unnnn does not solve this
    >> problem. Therefore, in internationalized programs, the format string
    >> should consist of ASCII wide characters only, or should be
    >> constructed at run time in an internationalized way (e.g., using
    >> gettext(3) or iconv(3), followed by mbstowcs(3)).

    >
    > I have gettext and FSF's libiconv on my system. I will have to find out
    > what mbstowcs is. Ok I see what you're trying to say. Basically stay away
    > from C's wchar.h functions and use something better.


    That can't be what he is saying because mbstowcs is, roughly speaking,
    one of "C's wchar.h functions".

    I think, from the sort of programs I've seen you write, you will be
    fine with standard C for a while yet.

    There *is* a problem with wide character support but it is not fixed
    by using other libraries. If there is going to be a mismatch
    between the wide character representation used by your compiler and
    that used by your run-time, then you will have trouble. The solution
    is to use only run-time strings (this is what the quote is saying but
    I have translated it from the system specific language of glibc,
    gettext etc.). This applies to any program using any such facilities,
    including the standard ones[1].

    If you can assume that there is no such mismatch, then all is well.

    [1] In fact it applies to all programs that use any character data, it
    is just that we all assume that the execution and source character
    sets are the same these days. In the old days, this problem occurred
    even with printf("Hello world.\n");

    --
    Ben.
     
    Ben Bacarisse, Oct 16, 2008
    #10