Get wide character and multibyte character values

George2

Hello everyone,


I need to know the wide character (Unicode) and multibyte (UTF-8)
values of a Czech character string. I personally know nothing about
Czech. Is the following approach correct?

1. I prefix the character string with L and inspect memory to get the
wide character representation of the string in little-endian
form;

2. I change the computer's region/language to Czech and call
WideCharToMultiByte with CP_ACP as the code page and the L
character string as input, then read the multibyte character string
from the output parameter lpMultiByteStr.

Are (1) and (2) correct? Are there more efficient or smarter ways?


Thanks in advance,
George
 
Daniel T.

George2 said:
I need to know the wide character (Unicode) and multibyte (UTF-8)
values of a Czech character string. I personally know nothing about
Czech. Is the following approach correct?

1. I prefix the character string with L and inspect memory to get the
wide character representation of the string in little-endian
form;

I don't think that would work. The C++ compilers that I have used don't
handle Unicode source files well.

2. I change the computer's region/language to Czech and call
WideCharToMultiByte with CP_ACP as the code page and the L
character string as input, then read the multibyte character string
from the output parameter lpMultiByteStr.

Are (1) and (2) correct? Are there more efficient or smarter ways?

If you are using Windows, there is the "Character Map" program; on a Mac,
go to the Edit menu and select "Special Characters". Or you could simply
go to http://unicode.org/charts/. Czech uses the Latin alphabet with
diacritics, so the Latin-1 Supplement and Latin Extended-A charts cover it.
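
Once the code points have been looked up, one way to sidestep the source-file encoding problem is to write the string with universal character names (\uXXXX), so the file itself stays plain ASCII. Below is a minimal sketch of that idea, assuming a C++11 compiler; the function name describe and the sample word are made up for illustration.

```cpp
#include <cstdio>
#include <string>

// Format each code point of a UTF-32 string as "U+XXXX", separated by spaces.
// (describe is a hypothetical helper name, not a standard function.)
std::string describe(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        char buf[16];
        std::snprintf(buf, sizeof buf, "U+%04lX", static_cast<unsigned long>(cp));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Illustrative sample: three letters, two of them written as universal
// character names (U+0159 and U+010D) so the source file needs no
// non-ASCII bytes.
const std::u32string sample = U"\u0159e\u010D";
```

Calling describe(sample) should give back the same code points that the charts list, which makes it a cheap sanity check on the escapes.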
 
James Kanze

I need to know the wide character (Unicode) and multibyte (UTF-8)
values of a Czech character string. I personally know nothing about
Czech. Is the following approach correct?

1. I prefix the character string with L and inspect memory to get the
wide character representation of the string in little-endian
form;

(Just a nit, using wchar_t avoids any question of endianness.)
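
To illustrate the nit: reading each wchar_t as a number gives the same values whatever the byte order in memory, so there is no need to inspect raw bytes. A rough sketch, assuming C++11; code_units is a name invented here. Note that the size of wchar_t is implementation-defined (typically 2 bytes on Windows, 4 on Linux), so characters outside the Basic Multilingual Plane would show up differently across platforms.

```cpp
#include <string>
#include <vector>

// Return the numeric value of every wchar_t in the string.
// The values do not depend on how the machine lays out bytes in memory.
// (code_units is a hypothetical helper name.)
std::vector<unsigned long> code_units(const std::wstring& s) {
    std::vector<unsigned long> values;
    values.reserve(s.size());
    for (wchar_t c : s) {
        values.push_back(static_cast<unsigned long>(c));
    }
    return values;
}
```

For example, code_units(L"\u017Elu\u0165") yields the four values 0x17E, 0x6C, 0x75 and 0x165.
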
2. I change the computer's region/language to Czech and call
WideCharToMultiByte with CP_ACP as the code page and the L
character string as input, then read the multibyte character string
from the output parameter lpMultiByteStr.

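(Just a nit, but there isn't any function WideCharToMultiByte in
standard C++; it is a Windows API function. With CP_ACP it converts
to the system's ANSI code page, not to UTF-8, which would need
CP_UTF8. Most of the rest of this paragraph doesn't make much sense
to me either.)

Are (1) and (2) correct? Are there more efficient or smarter ways?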

Neither is correct if you want to know the Unicode encodings.
Both depend a lot on the vagaries of the implementation.

What's wrong with just looking the information up in the code
charts at the Unicode site? (Note that there won't be just one
encoding: the actual values will depend on the normalization form
being used, e.g. precomposed (NFC) or decomposed (NFD).)
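
To make the last point concrete, here is a small sketch that turns code points taken from the charts into UTF-8 bytes by hand, assuming C++11. The names append_utf8 and to_utf8 are invented for this post, and the encoder does no validation (surrogates and values above U+10FFFF are not rejected). The letter "c with caron" can be written as the single code point U+010D or as U+0063 followed by the combining caron U+030C, and the two forms give different byte sequences.

```cpp
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one code point to `out`.
// Sketch only: invalid code points are not rejected.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode a whole UTF-32 string as UTF-8.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, static_cast<std::uint32_t>(cp));
    }
    return out;
}
```

With this, to_utf8(U"\u010D") gives the two bytes C4 8D, while to_utf8(U"c\u030C") gives the three bytes 63 CC 8C, even though both render as the same letter.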
 
