Confusion between UTF-8 and Unicode


jeanlutrin

> Let's take the Euro symbol. In Unicode, it's represented as:
> U+20AC
>
> It's represented in UTF-8 in memory as:
> E2 82

Impossible... only code points from U+0080 to U+07FF (inclusive) are
encoded as two bytes in UTF-8. U+20AC is above that range, so it
takes three bytes.
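For reference, the number of bytes UTF-8 uses depends only on which range the code point falls in. The sketch below (the name `utf8_length` is a hypothetical helper, not a library function) writes those ranges out and cross-checks them against Python's built-in UTF-8 codec.

```python
def utf8_length(cp: int) -> int:
    """Return how many bytes UTF-8 needs for code point cp (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0x7F:
        return 1          # U+0000..U+007F: plain ASCII
    if cp <= 0x7FF:
        return 2          # U+0080..U+07FF: the only two-byte range
    if cp <= 0xFFFF:
        return 3          # U+0800..U+FFFF (surrogates excluded in valid text); U+20AC is here
    return 4              # U+10000..U+10FFFF


euro = 0x20AC
print(utf8_length(euro))                  # 3
print(len(chr(euro).encode("utf-8")))     # 3, same answer from the built-in codec
```

The boundaries 0x7F, 0x7FF and 0xFFFF are exactly the largest values that fit in the 7, 11 and 16 payload bits of a one-, two- and three-byte sequence.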

Moreover, the last hex digit of the UTF-8 encoding is always the
same as the last hex digit of the Unicode code point (here U+20AC).
So in this example, the last UTF-8 byte has to end with "C".
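The rule above holds because the final byte of a multi-byte UTF-8 sequence has the form 10xxxxxx, where xxxxxx are the code point's lowest six bits, and a one-byte sequence is the code point itself. Either way the lowest four bits (the last hex digit) are copied unchanged. A brute-force check in Python, skipping surrogates because they cannot be encoded (the helper name is made up for illustration):

```python
def last_hex_digit_matches(cp: int) -> bool:
    """True if the last hex digit of cp's UTF-8 encoding equals cp's last hex digit."""
    last_byte = chr(cp).encode("utf-8")[-1]
    return (last_byte & 0xF) == (cp & 0xF)


# Every Unicode scalar value, i.e. all code points except the surrogate range.
scalars = (cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
print(all(last_hex_digit_matches(cp) for cp in scalars))   # True
print(hex(chr(0x20AC).encode("utf-8")[-1]))                 # 0xac
```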

The correct UTF-8 representation of U+20AC is:

E2 82 AC
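Those three bytes come from splitting the code point's 16 bits into 4 + 6 + 6 and prefixing them with the UTF-8 marker bits 1110, 10 and 10. A minimal sketch covering only the three-byte range (the name `encode_3byte` is a hypothetical helper for illustration, not a library API):

```python
def encode_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as UTF-8 by hand (illustrative helper)."""
    if not 0x800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"{cp:#x} is not a three-byte UTF-8 code point")
    return bytes([
        0xE0 | (cp >> 12),           # 1110xxxx: top 4 bits  (0x2  -> 0xE2)
        0x80 | ((cp >> 6) & 0x3F),   # 10xxxxxx: middle 6 bits (0x02 -> 0x82)
        0x80 | (cp & 0x3F),          # 10xxxxxx: low 6 bits  (0x2C -> 0xAC)
    ])


print(encode_3byte(0x20AC).hex(" ").upper())   # E2 82 AC
```

`bytes.hex` with a separator argument requires Python 3.8 or later.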

Now I know that you just forgot to paste the "AC", but still...

It needed to be corrected :)


Peace,

Jean
 

Bryce

> Now I know that you just forgot to paste the "AC", but still...

Yeah, that was it... :-0

I just opened it in a hex editor and must have misread the results.

> It needed to be corrected :)

Thanks for the correction.
 
