writing wide chars

Elie Roux · Aug 14, 2006

Hello,

I would like to write a wide-character string with printf, but I do not
really understand the behaviour I get with this basic test program, for
example:

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
int main () {
char *syllable="abà";
wchar_t *text=malloc (200 * sizeof(wchar_t));
size_t text_max_length=200;
mbstowcs(text,syllable,text_max_length);
printf ("%ls\n", text);
free(text);
return 0;
}

It does not print the last character, whereas printf("%s\n",syllable)
prints it.
If I replace "abà" with something that contains only ASCII characters, I
get the correct output, but the string I need to write will contain
non-ASCII characters.

Does anyone have a solution for printing it?

I'm on Ubuntu Dapper, and my locale is UTF-8...

Thanks,

Simon Biber · Aug 15, 2006

Elie said:
Hello,

I would like to write a wide-character string with printf, but I do not
really understand the behaviour I get with this basic test program, for
example:

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>

#include <locale.h>
int main () {
char *syllable="abà";
wchar_t *text=malloc (200 * sizeof(wchar_t));
size_t text_max_length=200;

setlocale(LC_CTYPE, "");

mbstowcs(text,syllable,text_max_length);
printf ("%ls\n", text);
free(text);
return 0;
}

The problem is that C programs start in the "C" locale, which is only
guaranteed to support the basic execution character set, in this case
ASCII. With "%s" the individual bytes are passed through unchanged, so the
UTF-8 still comes through. mbstowcs, however, works on the assumption that
every character is in the range 0-127, so it fails on the UTF-8 bytes of
"à". It reports that failure by returning (size_t)-1, which your program
does not check.
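
To see which locale a program is actually running in, you can query it
without changing it. This is only a small sketch; the helper name
ctype_is_c_locale is made up for illustration. Passing NULL as the second
argument to setlocale returns the current locale name and leaves it alone.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns nonzero while LC_CTYPE is still the
   startup "C" locale, i.e. before any setlocale(LC_CTYPE, "") call. */
int ctype_is_c_locale(void) {
    /* A NULL second argument queries the locale without modifying it. */
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}
```

In the original program this would return nonzero at the point where
mbstowcs is called, which is why the conversion fails.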

With my additions above (the #include <locale.h> and the setlocale call),
the locale is set to the system default, which in your case is probably
"en_US.utf8" or "fr_FR.utf8". These locales support UTF-8, so mbstowcs
correctly converts the characters to UTF-32, and printf correctly converts
them back to UTF-8.
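
It is also worth checking the return value of mbstowcs, since it returns
(size_t)-1 when it meets a byte sequence that is invalid in the current
locale. Here is a minimal sketch of a conversion helper that does this. The
name to_wide is hypothetical, and it sizes the buffer from the byte length,
on the reasoning that a wide string never has more elements than the
multibyte string has bytes.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts a multibyte string to a newly allocated
   wide string using the current LC_CTYPE locale. Returns NULL if the
   allocation fails or the input is not valid in that locale.
   The caller must free() the result. */
wchar_t *to_wide(const char *src) {
    /* Each wide character comes from at least one byte, so strlen(src)
       elements plus one for the terminator is always enough. */
    size_t cap = strlen(src) + 1;
    wchar_t *dst = malloc(cap * sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }

    size_t n = mbstowcs(dst, src, cap);
    if (n == (size_t)-1) {
        /* Invalid multibyte sequence for the current locale. */
        free(dst);
        return NULL;
    }

    dst[n] = L'\0'; /* n < cap here, so this index is in bounds */
    return dst;
}
```

After setlocale(LC_CTYPE, ""), the call to_wide("abà") should give a
string you can pass to printf("%ls\n", ...). A NULL result tells you the
conversion failed instead of silently printing a truncated string.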

Note that not all systems offer UTF-8 locales, and not all systems use
UTF-32 for wide characters, although recent Linux systems generally do.
Windows systems tend to use UTF-16 for wide characters, leaving the
problem of surrogate pairs for the user to deal with.
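
To illustrate what dealing with surrogate pairs means in practice, here is
a rough sketch that counts code points in a wide string. The function name
count_code_points is made up for this example. A UTF-16 high surrogate
(0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF) is counted as
one character. On a UTF-32 system valid strings contain no surrogates, so
the same code simply counts elements.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: counts code points in a wide string, treating a
   UTF-16 surrogate pair as a single code point. */
size_t count_code_points(const wchar_t *s) {
    size_t count = 0;
    while (*s != L'\0') {
        unsigned long unit = (unsigned long)s[0];
        /* s[0] is not the terminator, so reading s[1] is in bounds. */
        unsigned long next = (unsigned long)s[1];
        if (unit >= 0xD800UL && unit <= 0xDBFFUL &&
            next >= 0xDC00UL && next <= 0xDFFFUL) {
            s += 2; /* high surrogate + low surrogate = one code point */
        } else {
            s += 1; /* ordinary unit (or an unpaired surrogate) */
        }
        count++;
    }
    return count;
}
```

Unpaired surrogates are counted as one unit each here; real code would
probably want to treat them as an error.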

Elie Roux · Aug 15, 2006

Thank you for this answer, I really did not know that...
