wide characters

Bill Cunningham

I want to print out the Chinese character meaning water which is decimal
27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
far I haven't gotten anything to work.

Bill
 
Antoninus Twink

Bill Cunningham said:
I want to print out the Chinese character meaning water which is
decimal 27750 I believe. Do I use wprintf to do this and just include
wchar.h ? So far I haven't gotten anything to work.

To be honest, internationalization in "standard" C is a complete mess,
hacked on imperfectly to the language at the last possible minute. The
wchar_t representation of a string is platform *and locale* dependent,
so bad things can happen if the run-time locale of your program is
different from the compile-time locale.

The best advice is to take advantage of an existing Unicode library:
someone else has already made the mistakes you're likely to make,
debugged them, and put the resulting code in a library for you to use,
so why reinvent the wheel?

A good option could be the ICU library (http://www.icu-project.org)
developed at IBM.
 
Ben Bacarisse

Antoninus Twink said:
To be honest, internationalization in "standard" C is a complete mess,
hacked on imperfectly to the language at the last possible minute. The
wchar_t representation of a string is platform *and locale* dependent,
so bad things can happen if the run-time locale of your program is
different from the compile-time locale.

I may regret this but I can't see what you mean by this. The only
meaning I can put on it applies equally to programs that use a library
like ICU.

The best advice is to take advantage of an existing Unicode library:
someone else has already made the mistakes you're likely to make,
debugged them, and put the resulting code in a library for you to use,
so why reinvent the wheel?

A good option could be the ICU library (http://www.icu-project.org)
developed at IBM.

Do you really think that is easier than either of the methods
illustrated here:

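#include <wchar.h>
#include <locale.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    wchar_t water = 27750;
    /* Adopt the locale named by the user's environment. */
    setlocale(LC_ALL, "");
    /* Method 1: the character written directly in the source file. */
    printf("汦");
    /* Method 2: a wide character converted by printf per the locale. */
    printf("%lc\n", water);
    return 0;
}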


Of course, there are numerous ways in which this can go wrong, but that
also applies to using ICU.
 
Michael

Bill said:
I want to print out the Chinese character meaning water which is decimal
27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
far I haven't gotten anything to work.

Bill

If you use UTF-8, then the original C library is already enough.
 
lovecreatesbeauty

Michael said:
If you use UTF-8, then the original C library is already enough.

Yes. I can print the Chinese word for water as I print ascii on my
machine.

(btw, the Chinese word for water is 水.
http://www.chinese-tools.com/tools/calligrapher.html?cn=水,
http://www.chinese-tools.com/tools/sinograms.html?q=水 )

$ cat a.c
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    if (!argv[1]) return EXIT_FAILURE;
    printf("%s\n", argv[1]);
    return EXIT_SUCCESS;
}

$ make && ./a.out "hello 水"
gcc -ansi -pedantic -Wall -W -c -o a.o a.c
a.c:4: warning: unused parameter 'argc'
gcc a.o -o a.out
hello 水
$
 
Bill Cunningham

Ben, I am not seeing what you and Antoninus mean by "locale". I
understand run-time and compile-time, but I've never used the term
"locale".

Bill
 
Ben Bacarisse

Bill Cunningham said:
Ben, I am not seeing what you and Antoninus mean by "locale". I
understand run-time and compile-time, but I've never used the term
"locale".

I did not use the term and I claimed that I could not understand what
Antoninus Twink meant by his posting. Unless he comes back to explain
what he meant, I suggest you ignore the term (as he used it).
 
Antoninus Twink

Ben Bacarisse said:
I did not use the term and I claimed that I could not understand what
Antoninus Twink meant by his posting. Unless he comes back to explain
what he meant, I suggest you ignore the term (as he used it).

I have the impression (perhaps it's just an unfounded prejudice) that
trying to work portably with wide characters in raw C is fraught with
difficulty, and relying on intelligent library routines is a safer
option.

Here's a quote from the wprintf manpage:

glibc represents wide characters using their Unicode (ISO-10646)
code point, but other platforms don’t do this. Also, the use of C99
universal character names of the form \unnnn does not solve this
problem. Therefore, in internationalized programs, the format string
should consist of ASCII wide characters only, or should be
constructed at run time in an internationalized way (e.g., using
gettext(3) or iconv(3), followed by mbstowcs(3)).
 
Bill Cunningham

[snip]
Here's a quote from the wprintf manpage:

glibc represents wide characters using their Unicode (ISO-10646)
code point, but other platforms don't do this. Also, the use of C99
universal character names of the form \unnnn does not solve this
problem. Therefore, in internationalized programs, the format string
should consist of ASCII wide characters only, or should be
constructed at run time in an internationalized way (e.g., using
gettext(3) or iconv(3), followed by mbstowcs(3)).

I have gettext and FSF's libiconv on my system. I will have to find out
what mbstowcs is. Ok I see what you're trying to say. Basically stay away
from C's wchar.h functions and use something better.

Bill
 
Ben Bacarisse

Bill Cunningham said:
[snip]
Here's a quote from the wprintf manpage:

glibc represents wide characters using their Unicode (ISO-10646)
code point, but other platforms don't do this. Also, the use of C99
universal character names of the form \unnnn does not solve this
problem. Therefore, in internationalized programs, the format string
should consist of ASCII wide characters only, or should be
constructed at run time in an internationalized way (e.g., using
gettext(3) or iconv(3), followed by mbstowcs(3)).

I have gettext and FSF's libiconv on my system. I will have to find out
what mbstowcs is. Ok I see what you're trying to say. Basically stay away
from C's wchar.h functions and use something better.

That can't be what he is saying because mbstowcs is, roughly speaking,
one of "C's wchar.h functions".

I think, from the sort of programs I've seen you write, you will be
fine with standard C for a while yet.

There *is* a problem with wide character support, but it is not fixed
by using other libraries. If there is a mismatch between the wide
character representation used by your compiler and that used by your
run-time, then you will have trouble. The solution is to use only
run-time strings (this is what the quote is saying, but I have
translated it from the system-specific language of glibc, gettext,
etc.). This applies to any program using any such facilities,
including the standard ones[1].

If you can assume that there is no such mismatch, then all is well.

[1] In fact it applies to all programs that use any character data, it
is just that we all assume that the execution and source character
sets are the same these days. In the old days, this problem occurred
even with printf("Hello world.\n");
 
