Why printf() does not care of my locale settings ?

  • Thread starter Guilbert STABILO

Guilbert STABILO

Hi all,

I have to manage the internationalization on a C++ application which
has to work with different countries and regions.
I also have to work with ANSI 8-bit characters (no Unicode allowed in
my project).
My first test was to print the 0x9A character using the French code
page (1252), where it stands for U-Trema.
Then I changed the code page to Russian (1251) so my printf should
display a Cyrillic U but it still displays a U-Trema.
I also tried to printf all the ANSI characters but the result was the
same in both cases (French then Russian).

=> Did I miss something ?
=> How can I manage so my printf() displays the wanted character from
the wanted code page ?

Thanks in advance for your help.


I built this small code under Windows but I am using standard calls
which work the same under Unix:

#include <locale.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
// 0x9A is U-Trema when using the French 1252 code page.

printf(setlocale(LC_ALL, "french"));
printf(": '%c'\n", 0x9A);

// 0x9A should become cyrillic U when using the Russian 1251 code page.

printf(setlocale(LC_ALL, "russian"));
printf(": '%c'\n", 0x9A);

return(0);
}


And here is the program output:

French_France.1252: 'Ü'
Russian_Russia.1251: 'Ü'
 
David Resnick

I believe the issue is that C is not in charge of which character
set your terminal displays. If you look up setlocale you'll see that
it influences things like collation and the ctype macros. Your
terminal is probably set for the Windows equivalent of ISO-8859-1 or
-15, roughly speaking. I have a vague and perhaps faulty memory that
the DOS command chcp (CHange Code Page) might be relevant to changing
what the terminal will display. You could try chcp 1252 in a DOS window
before executing your program in the same console. If you have
questions about that you should try a Windows group...

-David
 
Jens Thoms Toerring

Guilbert STABILO said:
I have to manage the internationalization on a C++ application which
has to work with different countries and regions.
I also have to work with ANSI 8-bits characters (no Unicode allowed in
my project).
My first test was to print the 0x9A character using the French code
page (1252) which stands for U-Trema.
Then I changed the page code to Russian (1251) so my printf should
display a Cyrillic U but it still displays a U-Trema.

This isn't related to C but to the program that shows the
output of your C program. If you tell printf() to output the
character 0x9A then it will output this character (redirect the
output to a file and you will find it in that file if you look
at it e.g. with a hex editor). The important question is how
the program that actually shows the character interprets it -
and that's unrelated to any settings within your C program. So
you have to set up the program that shows the output (on Unix
e.g. xterm, I don't know what's normally used under Windows) to
interpret the character according to the code page you want.

Guilbert STABILO said:
int main(int argc, char *argv[])
{
// 0x9A is U-Trema when using the French 1252 code page.
printf(setlocale(LC_ALL, "french"));
printf(": '%c'\n", 0x9A);
// 0x9A should become cyrillic U when using the Russian 1251 code page.
printf(setlocale(LC_ALL, "russian"));
printf(": '%c'\n", 0x9A);

No, that's a misconception. Setting the locale has some influence
on how printf() outputs things (e.g. whether it uses a decimal point
or a comma when outputting floating-point numbers), but it can't
change anything about the interpretation of its output by a
different program, the program that makes the output from your C
program appear on your screen.

Regards, Jens
 
Tim Harig

Subject: Re: Why printf() does not care of my locale settings ?

The C language itself doesn't specify any encoding. It also doesn't
interpret any particular character set other than the standard escapes.

Guilbert STABILO said:
I also have to work with ANSI 8-bits characters (no Unicode allowed in
my project).

C does not understand ANSI, 8859-1, UTF-8 or any other encoding, but
it will process whatever binary representation you give it. It is
almost totally agnostic as to what the representation stands for.

Guilbert STABILO said:
My first test was to print the 0x9A character using the French code
page (1252) which stands for U-Trema.
Then I changed the page code to Russian (1251) so my printf should
display a Cyrillic U but it still displays a U-Trema.

In both cases printf wrote the 0x9A to stdout. stdout, by default, is
normally directed at some kind of text terminal (or console in Windows
parlance). The terminal is then responsible for displaying the character
with a glyph in a particular font.

Guilbert STABILO said:
I also tried to printf all the ANSI characters but the result was the
same in both cases (French then Russian).

The terminal settings were the same in both cases, so it displayed
the glyph that is associated with 0x9A for the encoding and font
that it is set to.

Guilbert STABILO said:
=> Did I miss something ?
=> How can I manage so my printf() displays the wanted character from
the wanted code page ?

printf() doesn't display anything. It writes a string of 8-bit characters
to stdout. Your terminal gives it meaning.

Guilbert STABILO said:
printf(setlocale(LC_ALL, "french"));

setlocale() doesn't change what printf() writes for a %c conversion, as
printf() just regurgitates whatever byte you feed it. It does, in some
cases, affect functions such as iswlower() and towupper() for wide
characters.

Guilbert STABILO said:
printf(": '%c'\n", 0x9A);

printf() writes
French_France.1252: 'Ü'
Russian_Russia.1251: 'Ü'

If you redirect the output to a file and view it with a hex editor,
you should see that both calls wrote 0x9A to the file. If you open the
file in different viewers using different encoding settings, the byte
will appear as different glyphs depending on the encoding.
 
Tim Harig

David Resnick said:
I believe that the issue is that C is not in charge of what character
set your terminal displays. If you look up setlocale you'll see that
it influences things like collation and the ctype macros. Your
terminal is probably set for the Windows equivalent of iso-8859-1 or
-15 roughly speaking. I have a vague and perhaps faulty memory that
the dos command chcp (CHange Code Page) might be relevant to changing
what terminal will display. You could try chcp 1252 in a DOS window
before executing your program in the same console. If you have
questions about that you should try a windows group...

You can do the same thing from inside your program using
SetConsoleOutputCP:

http://msdn.microsoft.com/en-us/library/ms686036(VS.85).aspx
 
Guilbert STABILO

Thanks to all for your contributions, which made me understand how it
works.
I realized that I had misunderstood the way characters are sent to
the terminal: the C program only sends binary data, which is
interpreted by the receiving application (the terminal, a GUI ...)
depending on its encoding configuration.
Under Windows, I succeeded in changing the ANSI code page to Russian
but I had to reboot the computer for the setting to be applied.
I found no way to do it at run time. I tried SetConsoleCP(1251) and
SetConsoleOutputCP(1251) => the functions returned OK but the setting
was not effective.
I am going to dig deeper on a Windows forum.
 
