get wide character and multibyte character value

George2 · Jan 24, 2008

Hello everyone,

I need to know the wide character (Unicode) and multibyte (UTF-8)
values of a Czech character string. I personally know nothing about
Czech. Is the following approach correct?

1. I put the L prefix on the string literal and watch memory to get
the wide character representation of the string in little-endian
form;

2. I change the computer's region/language to Czech and call
WideCharToMultiByte with CP_ACP as the code page and the L string as
input; the multibyte string comes back in the lpMultiByteStr
parameter.

Are (1) and (2) correct? Is there a more efficient or smarter way?

Thanks in advance,
George

Daniel T. · Jan 24, 2008

George2 said:
> I need to know the wide character (Unicode) and multibyte (UTF-8)
> values of a Czech character string. I personally know nothing about
> Czech. Is the following approach correct?
>
> 1. I put the L prefix on the string literal and watch memory to get
> the wide character representation of the string in little-endian
> form;

I don't think that would work. The C++ compilers that I have used don't
handle Unicode source files well.
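
One way to sidestep the source-encoding problem is to write the non-ASCII characters as universal character names (\uXXXX), so the source file itself stays pure ASCII. The following is only a minimal sketch, assuming a C++11 compiler. The helper name `code_units_hex` and the sample word are illustrative and do not come from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Return the numeric value of each wchar_t in `s` as space-separated hex.
// The values are read as integers, so the byte order in memory is irrelevant.
std::string code_units_hex(const std::wstring& s) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        // setw applies only to the next insertion, so it is repeated each time.
        out << std::setw(4) << static_cast<std::uint32_t>(s[i]);
    }
    return out.str();
}

// "caj" with a caron on the c (Czech for "tea"). \u takes exactly four hex
// digits, so the 'a' that follows \u010D is an ordinary character.
const std::wstring sample = L"\u010Daj";
```

With these assumptions, `code_units_hex(sample)` gives `010D 0061 006A`. Note that where wchar_t is 16 bits wide (as on Windows), a character outside the Basic Multilingual Plane would show up as two surrogate code units rather than one code point.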

> 2. I change the computer's region/language to Czech and call
> WideCharToMultiByte with CP_ACP as the code page and the L string as
> input; the multibyte string comes back in the lpMultiByteStr
> parameter.
>
> Are (1) and (2) correct? Is there a more efficient or smarter way?

If you are using Windows, there is the "Character Map" program; on a Mac,
go to the Edit menu and select "Special Characters". Or you could simply
go to http://unicode.org/charts/. Czech uses the Cyrillic alphabet,
doesn't it?

James Kanze · Jan 25, 2008

> I need to know the wide character (Unicode) and multibyte (UTF-8)
> values of a Czech character string. I personally know nothing about
> Czech. Is the following approach correct?
>
> 1. I put the L prefix on the string literal and watch memory to get
> the wide character representation of the string in little-endian
> form;

(Just a nit, using wchar_t avoids any question of endianness.)

> 2. I change the computer's region/language to Czech and call
> WideCharToMultiByte with CP_ACP as the code page and the L string as
> input; the multibyte string comes back in the lpMultiByteStr
> parameter.

(Just a nit, but there isn't any function WideCharToMultiByte in
C++. Most of the rest of this paragraph doesn't make much sense
to me either.)

> Are (1) and (2) correct? Is there a more efficient or smarter way?

Neither is correct if you want to know the Unicode encodings.
Both depend a lot on the vagaries of the implementation.

What's wrong with just looking the information up in the code
tables at the Unicode site? (Note that there won't be just one
encoding: the actual values will depend on the canonical form
being used.)
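
To illustrate the point about canonical forms, here is a rough sketch of how a code point taken from the Unicode charts maps to UTF-8 bytes. It assumes the Czech letter c with caron, which can be written either precomposed (U+010D) or decomposed (U+0063 followed by the combining caron U+030C). The helper names `to_utf8` and `bytes_hex` are made up for this example, and the encoder does not reject surrogate code points.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode a single Unicode code point as UTF-8.
// Sketch only: surrogate code points (U+D800..U+DFFF) are not rejected.
std::string to_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // precondition: inside the Unicode range
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Format a byte string as space-separated two-digit uppercase hex.
std::string bytes_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Under these assumptions, `bytes_hex(to_utf8(0x010D))` gives `C4 8D` for the precomposed letter, while `bytes_hex(to_utf8(0x0063) + to_utf8(0x030C))` gives `63 CC 8C` for the decomposed sequence. Both render as the same character, which is why a single "UTF-8 value" cannot be stated without first fixing the form.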
