B
Boris
I'm trying to find out what the steps look like to upgrade a program
(which is used on Windows and Unix) from Windows-1252 (the Windows "ANSI"
code page) to UCS-2. Currently the program reads and writes files encoded
in Windows-1252 but should be able to read files encoded in UCS-2, too.
As I don't want to deal with two character representations in the program
I plan to use UCS-2 internally. I should be able to simply use
std::wstring then? When Windows-1252 encoded files are read I have to
convert the data to UCS-2 though. My understanding is that it depends now
on the implementation of the C++ standard library if and what kind of
conversions are supported? I might need to use a third-party library like
the Dinkum Conversions Library which converts data on the fly or something
like UTF-8 CPP where I can call functions explicitly to convert between
character sets?
After converting everything to UCS-2 and storing it in std::wstring I
suppose I can use the well-known string functions to search, replace,
compare strings (including < and >) etc. Is my understanding correct that
I'm safe to use member functions of std::wstring as long as the character
set used is not multibyte?
Last but not least the program needs to save files again. It might make
sense to use UTF-8 here for backward compatibility (as other programs
might be able to read the files more easily if they support only
Windows-1252). Thus I would need another converter to make sure that
std::wstring is encoded in UTF-8 correctly which means I need a
third-party tool again?
Anything I might have missed?
Boris
(which is used on Windows and Unix) from Windows-1252 (the Windows "ANSI"
code page) to UCS-2. Currently the program reads and writes files encoded
in Windows-1252 but should be able to read files encoded in UCS-2, too.
As I don't want to deal with two character representations in the program
I plan to use UCS-2 internally. I should be able to simply use
std::wstring then? When Windows-1252 encoded files are read I have to
convert the data to UCS-2 though. My understanding is that it depends now
on the implementation of the C++ standard library if and what kind of
conversions are supported? I might need to use a third-party library like
the Dinkum Conversions Library which converts data on the fly or something
like UTF-8 CPP where I can call functions explicitly to convert between
character sets?
After converting everything to UCS-2 and storing it in std::wstring I
suppose I can use the well-known string functions to search, replace,
compare strings (including < and >) etc. Is my understanding correct that
I'm safe to use member functions of std::wstring as long as the character
set used is not multibyte?
Last but not least the program needs to save files again. It might make
sense to use UTF-8 here for backward compatibility (as other programs
might be able to read the files more easily if they support only
Windows-1252). Thus I would need another converter to make sure that
std::wstring is encoded in UTF-8 correctly which means I need a
third-party tool again?
Anything I might have missed?
Boris