Locale name strings on Windows XP

himanshu.garg

Hello,

I have a standard C++ program that uses char/std::string everywhere
and works well with single-byte characters.

The program depends on what a char is, so to make it work for
UTF-8 I assume I just have to do the following:

- replace char with wchar_t
- set the locale to Unicode/UTF-8 using
  std::locale::global(std::locale("<locale name>"));

The following program outputs, on my system:
C
2

#include <iostream>
#include <locale>

int main()
{
    std::locale loc;  // a copy of the current global locale
    std::cout << loc.name() << std::endl;
    std::cout << sizeof(wchar_t) << std::endl;
}

Is there a way I can find out the names of the locales available
for use with the locale constructor? Will my approach work? If I
read a UTF-8 file, will wchar_t store the character codes of the
corresponding characters?

Thank You,
Himanshu
 
James Kanze

> I have a std c++ program that uses char/string everywhere and
> works well with single byte characters.

For what definition of "works well"?

This is more relevant than you might think. I'll bet it doesn't
handle all possible accented characters correctly. Or Japanese
or Chinese characters. You probably expect this, however, if
you move to Unicode. In other words, you're adding
functionality. And that functionality will almost certainly
require additional code.

> The program depends on what a char is so to make it work for
> utf-8 I assume I just have to do the following :-
> replace char by wchar_t

UTF-8 is stored in a char, not a wchar_t. On many systems,
wchar_t can be used for UTF-16 or UTF-32. Note, however, that
UTF-16 is also a multiunit encoding, and if you're really
dealing in characters, you have to deal with multiple code
points for a single character in UTF-32 as well.
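
To make the distinction concrete, here is a minimal sketch, assuming the input is valid UTF-8 held in an ordinary std::string. The helper name count_code_points is made up for illustration. It counts code points by skipping continuation bytes (those of the form 10xxxxxx), so the result differs from size() as soon as a character needs more than one byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Assumes the input is valid UTF-8; it does no validation.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "\xC3\xA9" (a precomposed e with acute accent) size() is 2 while the function returns 1. For "e\xCC\x81" (an e followed by a combining acute accent) it returns 2, although a reader sees one character. That is the "multiple code points for a single character" problem, and it exists in UTF-32 as well.
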

> set the locale to unicode/utf-8 using
> std::locale::global(std::locale("<locale name>"));

This may or may not be necessary, depending on what you're
doing. It is almost certainly not sufficient.

> The following program on my system outputs :-
> C
> 2
> int main()
> {
>     std::locale loc;
>     std::cout << loc.name() << std::endl;
>     std::cout << sizeof(wchar_t) << std::endl;
> }

The first result is required by the standard. On start up, the
global locale is set to "C".
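
The second result is implementation-defined: wchar_t is 2 bytes with Microsoft's compilers and typically 4 bytes with GCC on Linux. The first one can be checked directly. As a small sketch (the function name startup_locale_name is only for illustration), a default-constructed std::locale is a copy of the global locale, and until something calls std::locale::global() it compares equal to std::locale::classic():

```cpp
#include <locale>
#include <string>

// Illustrative helper: returns the name of the current global locale.
// A default-constructed std::locale is a copy of the global locale.
std::string startup_locale_name()
{
    return std::locale().name();
}
```

Called before any std::locale::global(), this returns "C".
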

> Is there a way I can find out the name of available locales
> for use with the locale constructor?

Not portably. Under Unix and Unix look alikes, I've found that
looking at the contents of a directory called "/usr/lib/locale"
or "/usr/share/locale" will often help (but these directories
may also contain additional files), and Unix has a formal naming
convention as well. I have no idea what the situation is under
Windows. (Language names, e.g. "french" or "german", seem to
work, but I don't know how you'd specify an encoding.)

The only portable way I know of finding out exactly which locales
work is an exhaustive search. Fairly easy to write, but
don't expect the program to finish anytime soon (say, anytime in
the next couple of centuries).
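
A more practical compromise is to probe a short list of likely names rather than every possible string. The following is only a sketch: the helper names locale_available and available_among are invented here, and any candidate names you pass in are guesses, not a list the implementation provides. It relies on the standard guarantee that the std::locale constructor throws std::runtime_error when the name is not valid.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: true if std::locale accepts the given name.
bool locale_available(const std::string& name)
{
    try {
        std::locale probe(name);  // throws std::runtime_error if invalid
        (void)probe;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Hypothetical helper: filters a list of guessed names down to the
// ones this implementation accepts.
std::vector<std::string>
available_among(const std::vector<std::string>& candidates)
{
    std::vector<std::string> found;
    for (const std::string& name : candidates) {
        if (locale_available(name)) {
            found.push_back(name);
        }
    }
    return found;
}
```

Typical guesses would be Unix-style names such as "en_US.UTF-8" or "fr_FR.UTF-8" and, on Windows, names such as "french" or "French_France.1252". Which of them succeed depends entirely on the system.
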

> Will my approach work?

Not without some additional work.

> If I read a utf-8 file will wchar_t store the character code
> for the corresponding characters?

It might, if you use the appropriate locale. If it does,
however, you've probably still got some additional work before
you can say that your program "works well".
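
As a sketch of what "the appropriate locale" can look like in practice: imbue the wide file stream with a UTF-8 locale before opening the file, so that its codecvt facet does the conversion from bytes to wchar_t. The function name read_wide_line is hypothetical, and the locale name is passed in by the caller because whether a name such as "en_US.UTF-8" exists is an assumption about the system.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the first line of a UTF-8 file as wide
// characters. Returns false if the locale is not available, the file
// cannot be opened, or nothing could be read.
bool read_wide_line(const std::string& path,
                    const std::string& locale_name,
                    std::wstring& out)
{
    std::wifstream in;
    try {
        // Imbue before opening, so the stream buffer uses this
        // locale's codecvt facet from the first byte.
        in.imbue(std::locale(locale_name));
    } catch (const std::runtime_error&) {
        return false;  // the named locale is not installed
    }
    in.open(path);
    if (!in) {
        return false;
    }
    return static_cast<bool>(std::getline(in, out));
}
```

Even when this succeeds, each wchar_t holds one code unit: with a 2-byte wchar_t, characters outside the Basic Multilingual Plane arrive as surrogate pairs, so the remaining work mentioned above still applies.
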
 
himanshu.garg

> For what definition of "works well"?
>
> This is more relevant than you might think. I'll bet it doesn't
> handle all possible accented characters correctly. Or Japanese
> or Chinese characters. You probably expect this, however, if
> you move to Unicode. In other words, you're adding
> functionality. And that functionality will almost certainly
> require additional code.

Yes, it doesn't handle the characters you mentioned. It works only
when characters are single-byte.

> UTF-8 is stored in a char, not a wchar_t. On many systems,
> wchar_t can be used for UTF-16 or UTF-32. Note, however, that
> UTF-16 is also a multiunit encoding, and if you're really
> dealing in characters, you have to deal with multiple code
> points for a single character in UTF-32 as well.

> This may or may not be necessary, depending on what you're
> doing. It is almost certainly not sufficient.

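I wrote the following on GNU/Linux; given a UTF-8 Arabic file as
input, it outputs nothing:

#include <iostream>
#include <locale>
#include <string>

int main()
{
    // Throws std::runtime_error if this locale is not installed.
    std::locale::global(std::locale("en_US.UTF-8"));
    wchar_t c;
    std::wcin >> c;   // read one wide character from standard input
    std::wcout << c;  // and echo it
}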

> The first result is required by the standard. On start up, the
> global locale is set to "C".

> Not portably. Under Unix and Unix look alikes, I've found that
> looking at the contents of a directory called "/usr/lib/locale"
> or "/usr/share/locale" will often help (but these directories
> may also contain additional files), and Unix has a formal naming
> convention as well. I have no idea what the situation is under
> Windows. (Language names, e.g. "french" or "german", seem to
> work, but I don't know how you'd specify an encoding.)
>
> The only portable way I know of finding out exactly which locales
> work is an exhaustive search. Fairly easy to write, but
> don't expect the program to finish anytime soon (say, anytime in
> the next couple of centuries).

> Not without some additional work.

> It might, if you use the appropriate locale. If it does,
> however, you've probably still got some additional work before
> you can say that your program "works well".

Thanks for your reply. My understanding of the problem has hopefully
improved.

Thank You,
Himanshu
 
