Confusion between UTF-8 and Unicode

jeanlutrin · Mar 18, 2005

Let's take the Euro symbol. In Unicode, it's represented as:

U+20AC

It's represented in UTF-8 in memory as:
E2 82

Impossible... only code points from U+0080 to U+07FF (inclusive)
can be represented in two bytes in UTF-8. U+20AC is above that
range, so it needs three bytes.
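
For anyone who wants to check the ranges, here is a rough Python sketch. The function name utf8_length is my own invention, not something from a library; the range boundaries are the standard UTF-8 ones.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        return 1  # ASCII
    if code_point <= 0x7FF:
        return 2  # U+0080 to U+07FF
    if code_point <= 0xFFFF:
        return 3  # U+0800 to U+FFFF, which includes U+20AC
    return 4      # U+10000 to U+10FFFF


print(utf8_length(0x20AC))  # 3
```

You can cross-check the result against Python's built-in encoder with len(chr(0x20AC).encode("utf-8")).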

Moreover, the last hex digit of the final UTF-8 byte is always
the same as the last hex digit of the Unicode code point (U+20AC).
So in this example, the last UTF-8 byte has to end with "C".
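
The reason is that the final byte carries the low six bits of the code point unchanged (or, for ASCII, the code point itself), so the low four bits always survive. A quick Python check, with a helper name I made up for illustration and a handful of arbitrarily chosen sample code points:

```python
def last_hex_digit_matches(code_point: int) -> bool:
    """True if the final UTF-8 byte ends in the same hex digit as the code point."""
    encoded = chr(code_point).encode("utf-8")
    return (encoded[-1] & 0x0F) == (code_point & 0x0F)


# One sample for each UTF-8 length: 1, 2, 3 and 4 bytes.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = chr(cp).encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{cp:04X} -> {hex_bytes} : {last_hex_digit_matches(cp)}")
```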

The correct UTF-8 representation for U+20AC is:

E2 82 AC
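
If you want to see where those three bytes come from, here is a minimal sketch of the three-byte layout (1110xxxx 10xxxxxx 10xxxxxx) done by hand in Python. The function name encode_three_bytes is hypothetical, and it only handles the three-byte range; in real code you would just call str.encode("utf-8").

```python
def encode_three_bytes(code_point: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("outside the three-byte range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    b1 = 0xE0 | (code_point >> 12)          # 1110xxxx: top 4 bits
    b2 = 0x80 | ((code_point >> 6) & 0x3F)  # 10xxxxxx: middle 6 bits
    b3 = 0x80 | (code_point & 0x3F)         # 10xxxxxx: low 6 bits
    return bytes([b1, b2, b3])


euro = encode_three_bytes(0x20AC)
print(" ".join(f"{b:02X}" for b in euro))  # E2 82 AC
```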

Now I know that you just forgot to paste the "AC", but still...

It needed to be corrected.

Peace,

Jean

Bryce · Mar 21, 2005

Impossible... only code points from U+0080 to U+07FF (inclusive)
can be represented in two bytes in UTF-8. U+20AC is above that
range, so it needs three bytes.

Moreover, the last hex digit of the final UTF-8 byte is always
the same as the last hex digit of the Unicode code point (U+20AC).
So in this example, the last UTF-8 byte has to end with "C".

The correct UTF-8 representation for U+20AC is:

E2 82 AC

Now I know that you just forgot to paste the "AC", but still...

Yea, that was it... :-0

I just opened it in a hex editor, and must have misread the results...

It needed to be corrected.

Thanks for the correction.
