Localization issue in C++

Terry IT

Hi,
I have some C and C++ code written. I'll talk about the C++ code
here.

The whole text user interface is done with printf and scanf calls.
All the strings are ASCII, and they work well in English. Now I have
to change statements like
cout << "Select 1 for choosing option billing\n";
to Japanese, Chinese, etc.

One option would be to use GNU gettext, but there are issues with GPL
code, so I have to use plain C++.

My idea is to store, for each language, a prefix code and the string
in a separate file.

For Japanese, for example, str_ja would contain:
1 "Japanese string for choosing option billing"
2 "Japanese string for choosing the payment mode"

str_cn would be similar.


Will replacing string with wstring, cout with wcout, and reading
strings with wcin work, or is there something else to be done along
with wcout?

I fail to understand the relevance of converting from wstring to
string. Isn't wstring supposed to accommodate strings? Then why do
people do that, or the other way round?

What do you guys use or recommend in such situations, and how would
you design such tasks?
 
James Kanze

> I have some C and C++ code written. I'll talk about the C++
> code here.
> The whole text user interface is done with printf and scanf
> calls. All the strings are ASCII, and they work well in English.
> Now I have to change statements like
> cout << "Select 1 for choosing option billing\n";
> to Japanese, Chinese, etc.
> One option would be to use GNU gettext, but there are issues
> with GPL code, so I have to use plain C++.

What is your platform? The function gettext() isn't GNU; it's
traditional Unix, and is present under Solaris, and possibly other
Unixes as well. Otherwise, there are probably equivalent
system-dependent solutions.

Alternatively, there is a facet for messages in <locale>
(std::messages), which should be implemented. You'll have to find
the documentation for your compiler, however, for information on how
to install messages for a given locale. That is supposing it is
actually supported; most systems seem to prefer other mechanisms.
(The documentation of VC++ says that "Currently, while the messages
class is implemented, there are no messages." It also fails to
say how you could install any of your own.)
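
As a rough illustration of the <locale> route, here is a minimal sketch of a lookup through std::messages. The helper name translate and the catalog name passed to it are made up for this example. How catalogs are installed, and what the set and message id arguments mean, is implementation-defined, so check your compiler's documentation. The helper falls back to the default string whenever no catalog or translation is found.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: look up a message through the std::messages facet
// of `loc`, returning `fallback` when nothing better is available.
std::string translate(const std::locale& loc,
                      const std::string& catalog_name,
                      const std::string& fallback)
{
    typedef std::messages<char> Messages;
    const Messages& facet = std::use_facet<Messages>(loc);

    // open() returns a negative value if the catalog cannot be opened.
    Messages::catalog cat = facet.open(catalog_name, loc);
    if (cat < 0)
        return fallback;

    // The set and message id (0, 0 here) are placeholders; their meaning
    // is implementation-defined. The last argument is the default text.
    std::string result = facet.get(cat, 0, 0, fallback);
    facet.close(cat);
    return result;
}
```

With no catalog installed, translate(std::locale::classic(), "myapp", "Select 1 for choosing option billing\n") simply returns the English default, so the program keeps working untranslated.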

> My idea is to store, for each language, a prefix code and the
> string in a separate file.
> For Japanese, for example, str_ja would contain:
> 1 "Japanese string for choosing option billing"
> 2 "Japanese string for choosing the payment mode"
> str_cn would be similar.

If you're implementing something of your own, why not use the
basic gettext interface (or rather dgettext, in order to support
different domains, i.e. the possibility of multiple message
files)?
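
A home-grown catalog with a gettext-like lookup could look roughly like the sketch below. It assumes the one-entry-per-line layout proposed in the question (a numeric id followed by a quoted string), does not handle escaped quotes, and, like gettext, returns the default text when no translation is found. The Catalog class and its member names are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical message catalog for one language.
class Catalog {
public:
    // Reads lines of the form:  <id> "<text>"
    // Malformed lines are skipped. Returns the number of lines accepted.
    std::size_t load(std::istream& in)
    {
        std::size_t accepted = 0;
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            int id = 0;
            if (!(fields >> id))
                continue;                       // no leading numeric id
            std::string::size_type first = line.find('"');
            std::string::size_type last = line.rfind('"');
            if (first == std::string::npos || last <= first)
                continue;                       // no pair of quotes
            entries_[id] = line.substr(first + 1, last - first - 1);
            ++accepted;
        }
        return accepted;
    }

    // Returns the translated text, or `fallback` if `id` is unknown.
    std::string get(int id, const std::string& fallback) const
    {
        std::map<int, std::string>::const_iterator it = entries_.find(id);
        return it == entries_.end() ? fallback : it->second;
    }

private:
    std::map<int, std::string> entries_;
};
```

In the program you would open the file for the chosen language (for example std::ifstream in("str_ja")), pass it to load(), and replace each literal with a call such as catalog.get(1, "Select 1 for choosing option billing\n"). The bytes in the file are passed through unchanged, so the file's encoding (UTF-8, say) has to match what the terminal expects.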

> Will replacing string with wstring, cout with wcout, and reading
> strings with wcin work, or is there something else to be done
> along with wcout?

What are you doing with text in your program? There are two
possible solutions: use char and UTF-8, or use a wider type
and UTF-16 or UTF-32. Depending on what you are actually doing
in the program, and what the system supports, one or the other
solution is preferable. I generally use UTF-8 on char most of
the time (but this is at least partially based on portability
concerns).

If you decide on UTF-16 or UTF-32, depending on the platform,
wchar_t might or might not support one of these. If you're not
concerned with portability, and the platform's wchar_t supports
one that is useful to you, use wchar_t; otherwise, use a typedef
to a convenient integral type. (If you're on Windows, and can get
by with UTF-16, wchar_t should work just fine.)
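
To see which case applies on a given platform, a small check like the following can help. It is only a sketch: the function names are made up, and it tests the range of wchar_t, not which encoding the implementation actually uses for it.

```cpp
#include <climits>
#include <cwchar>

// Number of bits occupied by a wchar_t on this platform.
inline int wchar_bits()
{
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}

// True if a single wchar_t can hold any Unicode code point (up to U+10FFFF),
// i.e. if it is wide enough to be used for UTF-32.
inline bool wchar_holds_any_code_point()
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

On typical Windows compilers wchar_t is 16 bits, which suits UTF-16; on most Unix systems it is 32 bits. Where a fixed 32-bit unit is needed regardless of platform, C++11's char32_t and std::u32string are a portable choice.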

> I fail to understand the relevance of converting from wstring
> to string. Isn't wstring supposed to accommodate strings? Then
> why do people do that, or the other way round?

First of all, neither std::string nor std::wstring represents a
text string. Basically, std::string is a sequence of char, and
std::wstring is a sequence of wchar_t. Typically, a char is 8
or 9 bits, and can only hold a limited number of single-byte
characters, not enough for Japanese or Chinese. There are two
ways of handling this problem: use more than one char for each
character (UTF-8 on char), or use a bigger basic type (UTF-16 on
wchar_t). Which is better depends on what you are doing.
Multibyte characters may cause no difficulty whatsoever, in which
case there's no real argument for using wchar_t. Multibyte
characters may cause problems (e.g. isspace doesn't work on
multibyte characters, since it only sees one char at a time), in
which case UTF-16 or UTF-32 might make life a lot easier. Or, the
most frequent case if you're doing serious text processing, you
have to treat even UTF-16 and UTF-32 as multibyte, in which case
you might as well go back to UTF-8 and char.
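
The following sketch shows why a std::string holding UTF-8 is a sequence of bytes rather than characters, and what a conversion to one element per code point involves. It needs C++11 for std::u32string. The function names are made up for this example, and the decoder is deliberately simplified: it replaces truncated or stray bytes with U+FFFD but does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by counting every byte that is
// not a continuation byte (10xxxxxx). Assumes well-formed input.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Decodes UTF-8 into one char32_t per code point (simplified sketch).
std::u32string utf8_to_utf32(const std::string& s)
{
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::string::size_type i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        int extra = 0;                  // continuation bytes still expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }  // stray byte
        ++i;
        for (; extra > 0 && i < s.size(); --extra, ++i) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            if ((c & 0xC0) != 0x80)
                break;                  // sequence ended too early
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(extra == 0 ? cp : replacement);
    }
    return out;
}
```

For the three-character string 日本語 encoded as UTF-8, size() reports 9, while utf8_length() reports 3 and utf8_to_utf32() yields three elements. Functions such as isspace, which look at one char at a time, only ever see the individual bytes.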

> What do you guys use or recommend in such situations, and how
> would you design such tasks?

Two important things for starters: define exactly what you have
to do with the strings. And learn about the various encodings,
and how they work. The Unicode Consortium site,
http://www.unicode.org, has a lot of information. I found
_Fonts and Encodings_, by Yannis Haralambous very useful; even
if a lot of it may not be relevant to your problem, part I deals
with Unicode in general (as opposed to font management and other
problems).

And be aware that the problem will not be simple. For anything
more than just copying strings back and forth, you'll probably
have to redesign parts of your application.
 
