Locale name strings on Windows XP

himanshu.garg · Apr 2, 2008

Hello,

I have a std c++ program that uses char/string everywhere and
works well with single byte characters.

The program depends on what a char is so to make it work for
utf-8 I assume I just have to do the following :-

replace char by wchar_t
set the locale to unicode/utf-8 using
std::locale::global(std::locale("<locale name>"));

The following program on my system outputs :-
C
2

#include <iostream>
#include <locale>

int main()
{
    std::locale loc;                            // copy of the current global locale
    std::cout << loc.name() << std::endl;
    std::cout << sizeof(wchar_t) << std::endl;
}

Is there a way I can find out the name of available locales for
use with the locale constructor? Will my approach work? If I read a
utf-8 file will wchar_t store the character code for the corresponding
characters?

Thank You,
Himanshu

James Kanze · Apr 2, 2008

I have a std c++ program that uses char/string everywhere and
works well with single byte characters.

For what definition of "works well"?

This is more relevant than you might think. I'll bet it doesn't
handle all possible accented characters correctly. Or Japanese
or Chinese characters. You probably expect this, however, if
you move to Unicode. In other words, you're adding
functionality. And that functionality will almost certainly
require additional code.

The program depends on what a char is so to make it work for
utf-8 I assume I just have to do the following :-

replace char by wchar_t

UTF-8 is stored in a char, not a wchar_t. On many systems,
wchar_t can be used for UTF-16 or UTF-32. Note, however, that
UTF-16 is also a multiunit encoding, and if you're really
dealing in characters, you have to deal with multiple code
points for a single character in UTF-32 as well.
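
To make the "multiunit" point concrete, here is a minimal sketch of decoding UTF-8 held in ordinary chars into code points. The helper name decode_utf8 is made up for illustration, it assumes a C++11 compiler (for char32_t), and it does not reject overlong forms or surrogate values, so treat it as a demonstration rather than a validating decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string (stored in plain chars)
// into Unicode code points. Throws std::runtime_error on malformed input.
std::vector<char32_t> decode_utf8(const std::string& bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        // The lead byte says how many bytes make up this code point.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    assert(i == bytes.size());  // every byte was consumed exactly once
    return out;
}
```

Calling decode_utf8("e\xCC\x81") yields two code points (U+0065 followed by the combining acute accent U+0301) even though a reader sees one character, which is the UTF-32 caveat mentioned above.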

set the locale to unicode/utf-8 using
std::locale::global(std::locale("<locale name>"));

This may or may not be necessary, depending on what you're
doing. It is almost certainly not sufficient.

The following program on my system outputs :-
C
2

#include <iostream>
#include <locale>

int main()
{
    std::locale loc;                            // copy of the current global locale
    std::cout << loc.name() << std::endl;
    std::cout << sizeof(wchar_t) << std::endl;
}

The first result is required by the standard. On start up, the
global locale is set to "C".
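
For comparison, a small sketch that separates the global locale from the user's preferred one. The function names are invented for this example; std::locale("") asks the implementation for the locale configured in the environment, and what name it reports (or whether it can be constructed at all) varies by system.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Name of the current global locale; this is "C" until
// std::locale::global() has been called.
std::string global_locale_name()
{
    return std::locale().name();
}

// Name of the user's preferred locale (taken from the environment), or an
// empty string if the implementation cannot construct it.
std::string user_locale_name()
{
    try {
        return std::locale("").name();
    } catch (const std::runtime_error&) {
        return std::string();
    }
}
```

Printing both at the top of main() shows whether the environment asks for something other than "C"; the program only starts using it after std::locale::global() or an explicit imbue().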

Is there a way I can find out the name of available locales
for use with the locale constructor?

Not portably. Under Unix and Unix look-alikes, I've found that
looking at the contents of a directory called "/usr/lib/locale"
or "/usr/share/locale" will often help (but these directories
may also contain additional files), and Unix has a formal naming
convention as well. I have no idea what the situation is under
Windows. (Language names, e.g. "french" or "german", seem to
work, but I don't know how you'd specify an encoding.)

The only portable way I know of finding out exactly which locales
work is by an exhaustive search. Fairly easy to write, but
don't expect the program to finish anytime soon (say, anytime in
the next couple of centuries).
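
A more practical variant is to probe a short list of names you actually care about. The sketch below relies only on the fact that the std::locale constructor throws std::runtime_error for a name it does not recognise; the function name usable_locales is made up for this example.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Return the subset of candidate names that std::locale accepts here.
std::vector<std::string> usable_locales(const std::vector<std::string>& candidates)
{
    std::vector<std::string> ok;
    for (const std::string& name : candidates) {
        try {
            std::locale loc(name.c_str());  // throws if the name is unknown
            ok.push_back(name);
        } catch (const std::runtime_error&) {
            // Not supported by this implementation; skip it.
        }
    }
    return ok;
}
```

Feeding it something like {"C", "french", "german", "en_US.UTF-8"} and printing the result tells you which of those spellings a given system accepts. It says nothing about names you did not think to try.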

Will my approach work?

Not without some additional work.

If I read a utf-8 file will wchar_t store the character code
for the corresponding characters?

It might, if you use the appropriate locale. If it does,
however, you've probably still got some additional work before
you can say that your program "works well".
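
As a rough sketch of the "appropriate locale" route: imbue a wide file stream with a named locale before opening the file, so that the locale's code conversion facet turns the bytes into wchar_t values. The function name read_wide is hypothetical, and whether a UTF-8 locale such as "en_US.UTF-8" exists, and what its wchar_t values mean, depends entirely on the platform.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <locale>
#include <stdexcept>
#include <string>

// Read a whole file into a wide string, converting through the named locale.
// Returns false if the locale is unavailable or the file cannot be opened.
bool read_wide(const std::string& path, const std::string& locale_name,
               std::wstring& out)
{
    std::locale loc;
    try {
        loc = std::locale(locale_name.c_str());
    } catch (const std::runtime_error&) {
        return false;  // this system does not know the locale name
    }

    std::wifstream in;
    in.imbue(loc);            // set the locale before the file is opened
    in.open(path.c_str());
    if (!in)
        return false;

    out.assign(std::istreambuf_iterator<wchar_t>(in),
               std::istreambuf_iterator<wchar_t>());
    return true;
}
```

Even when this succeeds, the result is a sequence of wchar_t units, not "characters" in the user's sense; with a 2-byte wchar_t, as in the output shown above, anything outside the Basic Multilingual Plane still needs two units.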

himanshu.garg · Apr 2, 2008

For what definition of "works well"?

This is more relevant than you might think. I'll bet it doesn't
handle all possible accented characters correctly. Or Japanese
or Chinese characters. You probably expect this, however, if
you move to Unicode. In other words, you're adding
functionality. And that functionality will almost certainly
require additional code.

Yes, it doesn't handle the characters you mentioned. It works only
when each character is a single byte.

UTF-8 is stored in a char, not a wchar_t. On many systems,
wchar_t can be used for UTF-16 or UTF-32. Note, however, that
UTF-16 is also a multiunit encoding, and if you're really
dealing in characters, you have to deal with multiple code
points for a single character in UTF-32 as well.

This may or may not be necessary, depending on what you're
doing. It is almost certainly not sufficient.

I wrote the following on GNU/Linux, and for a UTF-8 Arabic file it
outputs nothing:
#include <locale>
#include <iostream>
#include <string>

int main()
{
    std::locale::global(std::locale("en_US.UTF-8"));
    wchar_t c;
    std::wcin >> c;
    std::wcout << c;
}

The first result is required by the standard. On start up, the
global locale is set to "C".

Not portably. Under Unix and Unix look-alikes, I've found that
looking at the contents of a directory called "/usr/lib/locale"
or "/usr/share/locale" will often help (but these directories
may also contain additional files), and Unix has a formal naming
convention as well. I have no idea what the situation is under
Windows. (Language names, e.g. "french" or "german", seem to
work, but I don't know how you'd specify an encoding.)

The only portable way I know of finding out exactly which locales
work is by an exhaustive search. Fairly easy to write, but
don't expect the program to finish anytime soon (say, anytime in
the next couple of centuries).

Not without some additional work.

It might, if you use the appropriate locale. If it does,
however, you've probably still got some additional work before
you can say that your program "works well".

Thanks for your reply. My understanding of the problem has hopefully
improved.

Thank You,
Himanshu
