Unicode character in C++

liveshell

Hi all,
In my application, I am reading a file and storing it in an
array of characters. The data should be ASCII, but in certain situations I
get Unicode characters (or, let's say, junk characters). I want to know
whether the data is plain ASCII or Unicode. How can I do that?

Thanks,
LiveShell
 
Michael DOUBEZ

liveshell wrote:
Hi all,
In my application, I am reading a file and storing it in an
array of characters. The data should be ASCII, but in certain situations I
get Unicode characters (or, let's say, junk characters). I want to know
whether the data is plain ASCII or Unicode. How can I do that?

Supposing your junk is UTF-8, you have to look for bytes with the MSB equal to 1. This
is how it is done in UTF-8: bytes 0-127 are the historical ASCII characters,
and in a multi-byte sequence the number of leading ones in the first byte gives
the total number of bytes in the encoding:
US-ASCII: 0xxxxxxx
2 bytes: 110xxxxx xxxxxxxx
3 bytes: 1110xxxx xxxxxxxx xxxxxxxx
4 bytes: 11110xxx xxxxxxxx xxxxxxxx xxxxxxxx
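
As a rough illustration of the "look for the MSB" test, here is a minimal sketch. The function name is_plain_ascii and the pointer-plus-size interface are assumptions made for the example, not something from the original code. It only answers "is every byte below 128?"; it does not tell you which non-ASCII encoding the file uses.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if every byte in the buffer is
// 7-bit ASCII, i.e. its most significant bit is 0.
bool is_plain_ascii(const char* data, std::size_t size)
{
    for (std::size_t i = 0; i < size; ++i) {
        // Go through unsigned char: plain char may be signed, in which
        // case bytes >= 0x80 would show up as negative values.
        if (static_cast<unsigned char>(data[i]) & 0x80) {
            return false;
        }
    }
    return true;
}
```

If this returns false for your buffer, at least one byte is outside the ASCII range, and you then have to decide (or guess) which encoding produced it.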

Michael
 
James Kanze

Michael DOUBEZ wrote:
Supposing your junk is UTF-8, you have to look for bytes with the MSB equal to 1. This
is how it is done in UTF-8: bytes 0-127 are the historical ASCII characters,
and in a multi-byte sequence the number of leading ones in the first byte gives
the total number of bytes in the encoding:
US-ASCII: 0xxxxxxx
2 bytes: 110xxxxx xxxxxxxx
3 bytes: 1110xxxx xxxxxxxx xxxxxxxx
4 bytes: 11110xxx xxxxxxxx xxxxxxxx xxxxxxxx

Note too that the following bytes will always have 10 in their
upper bits, so that should be something like:
2 bytes: 110xxxxx 10xxxxxx
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
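
To make the bit patterns concrete, here is a hedged sketch of a structural check. It assumes the standard UTF-8 layout, in which a lead byte is 0xxxxxxx, 110xxxxx, 1110xxxx, or 11110xxx and every following byte is 10xxxxxx. The names utf8_sequence_length and looks_like_utf8 are invented for this example. It is not a full validator: it does not reject overlong encodings, surrogates, or code points above U+10FFFF.

```cpp
#include <cstddef>

// Hypothetical helper: length of the sequence announced by a lead byte,
// or 0 if the byte cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: US-ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx (continuation) or invalid
}

// Hypothetical helper: true if the buffer follows the UTF-8 bit patterns.
bool looks_like_utf8(const char* data, std::size_t size)
{
    std::size_t i = 0;
    while (i < size) {
        const std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(data[i]));
        // Reject a bad lead byte or a sequence cut off by the end of the buffer.
        if (len == 0 || len > size - i) return false;
        // Every following byte must have 10 in its upper two bits.
        for (std::size_t k = 1; k < len; ++k) {
            if ((static_cast<unsigned char>(data[i + k]) & 0xC0) != 0x80) {
                return false;
            }
        }
        i += len;
    }
    return true;
}
```

A buffer that passes this check is only plausibly UTF-8. Pure ASCII passes as well, and short byte sequences from other encodings can match the patterns by accident.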
 
