writing wide chars

Elie Roux

Hello,

I would like to print a wide-character string with printf, but I do not
really understand the behaviour I get with this basic test program, for
example:

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    char *syllable = "abà";
    wchar_t *text = malloc(200 * sizeof(wchar_t));
    size_t text_max_length = 200;
    mbstowcs(text, syllable, text_max_length);
    printf("%ls\n", text);
    free(text);
    return 0;
}

It does not print the last character, whereas printf("%s\n", syllable)
prints it.
If I replace "abà" with a string containing only ASCII characters, I get
the correct output, but the string I need to write will contain non-ASCII
characters.

Does anyone have a solution for printing it?

I'm on Ubuntu Dapper, and my locale is UTF-8...

Thanks,
 
Simon Biber

Elie said:
Hello,

I would like to print a wide-character string with printf, but I do not
really understand the behaviour I get with this basic test program, for
example:

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>   /* added: declares setlocale */

int main(void) {
    char *syllable = "abà";
    wchar_t *text = malloc(200 * sizeof(wchar_t));
    size_t text_max_length = 200;

    setlocale(LC_CTYPE, "");   /* added: use the locale from the environment */
    mbstowcs(text, syllable, text_max_length);
    printf("%ls\n", text);
    free(text);
    return 0;
}

The problem is that C programs start in the "C" locale, which is only
required to support the basic execution character set, in this case ASCII.
When the individual bytes were passed through "%s" they were unchanged, so
the UTF-8 still came through. However, mbstowcs was operating under the
assumption that all characters should be in the range 0-127, and so it
failed to convert the UTF-8 sequence for "à".

With my additions above, the locale is set to the system default, which
in your case is probably "en_US.utf8" or "fr_FR.utf8". These locales
support UTF-8, so mbstowcs correctly converts the characters to
UTF-32, and printf correctly converts them back to UTF-8.
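
The test program never looks at the return value of mbstowcs, so a failed conversion goes unnoticed. As a rough sketch only, the conversion could be wrapped in a small helper that reports failure and always terminates the buffer. The name to_wide and its parameters are made up for illustration and are not part of the original program. The helper assumes the caller has already called setlocale(LC_CTYPE, "") when non-ASCII input is expected.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not from the original program): convert the
 * multibyte string `mb` into `out`, which has room for `cap` wide
 * characters including the terminator.
 * Returns the number of wide characters written, not counting the
 * terminator, or (size_t)-1 if `mb` contains a sequence that is not
 * valid in the current LC_CTYPE locale. */
size_t to_wide(const char *mb, wchar_t *out, size_t cap)
{
    assert(cap > 0);

    /* Leave one slot free: mbstowcs does not write a terminator
     * when it stops because the limit was reached. */
    size_t n = mbstowcs(out, mb, cap - 1);
    if (n == (size_t)-1) {
        out[0] = L'\0';   /* leave a valid empty string on failure */
        return n;
    }
    out[n] = L'\0';
    return n;
}
```

In the test program this would be called as to_wide(syllable, text, text_max_length) right after setlocale, printing text only when the result is not (size_t)-1.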

Note that not all systems provide UTF-8 locales, and not all systems use
UTF-32 as their wide character set, although recent Linux systems
generally do. Windows systems tend to use UTF-16 for wide characters,
leaving the problem of surrogate pairs up to the user to deal with.
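
To make the surrogate pair remark concrete, here is a small sketch of what "dealing with it" means when wchar_t is only 16 bits wide. The function names wchar_bits and to_surrogates are invented for this example. The arithmetic is the standard UTF-16 encoding of code points above U+FFFF.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width of wchar_t in bits on this platform (typically 32 on Linux
 * and 16 on Windows, but check rather than assume). */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: split a code point in the range
 * U+10000..U+10FFFF into a UTF-16 surrogate pair.
 * Returns 1 on success, 0 if `cp` is outside that range and
 * therefore does not need (or cannot have) a surrogate pair. */
int to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(high != NULL && low != NULL);
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                              /* 20 bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
    return 1;
}
```

For example, U+1F600 splits into the pair 0xD83D 0xDE00, whereas a character such as "à" (U+00E0) fits in a single 16-bit unit and needs no pair.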
 
Elie Roux

Thank you for this answer, I really did not know that...
 
