Regarding UTF16

  • Thread starter news.fe.internet.bosch.com

news.fe.internet.bosch.com

Hi ,

I would like information about the UTF-16 format and what its disadvantages
are compared with the UTF-8 format.

TIA

Mohan
 
Vladimir S. Oka

news.fe.internet.bosch.com said:
Hi ,

I wanted information about UTF16 format and what is disadvantage over UTF8
format..


This group (comp.lang.c) is the wrong place to ask.

I believe comp.software.international is the right one (followup-to added).

Cheers

Vladimir
 
those who know me have no need of my name

in comp.lang.c i read:
I wanted information about UTF16 format and what is disadvantage over UTF8
format..

in c the most notable difference would be that utf-16 would usually be
composed of two bytes -- in c bytes need not be 8 bits, but it is very
common -- so then fully half the code space has a byte whose value is 0. a
sequence of such codes cannot be treated as a string. a utf-8 sequence can
be treated as a normal string, and these days it is a common form for an
implementation's mbcs.
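
To make the zero-byte point concrete, here is a minimal sketch, assuming 8-bit bytes and little-endian UTF-16. The names hi_utf8, hi_utf16le, bytes_seen_by_strlen and utf16le_units are made up for illustration. strlen stops at the first zero byte, so it sees only one byte of the UTF-16 data, while the UTF-8 bytes behave like an ordinary string.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* "Hi" as UTF-8: one byte per character here, then the terminator. */
const char hi_utf8[] = { 0x48, 0x69, 0x00 };

/* "Hi" as UTF-16LE (assumed byte order): each code unit is two bytes,
   and for these characters the second byte of each unit is 0. */
const unsigned char hi_utf16le[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };

/* What the str* functions see: bytes before the first zero byte. */
size_t bytes_seen_by_strlen(const unsigned char *p)
{
    return strlen((const char *)p);
}

/* Count UTF-16 code units up to the 16-bit terminator 0x0000,
   reading two bytes at a time. */
size_t utf16le_units(const unsigned char *p)
{
    size_t n = 0;

    assert(CHAR_BIT == 8);  /* assumption of this sketch, not of C */
    while (p[0] != 0 || p[1] != 0) {
        p += 2;
        n++;
    }
    return n;
}
```

With these definitions strlen(hi_utf8) is 2, bytes_seen_by_strlen(hi_utf16le) is 1, and utf16le_units(hi_utf16le) is 2.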

in c we also have wide characters and wide character strings. there is no
requirement that the encoding be utf-16 -- some implementations use it,
some do not; these days i would expect utf-32 (or ucs-4 -- yuck!) the more
common. with a wide character string the embedded null byte pitfall is
avoided but there is other effort required to make them work well.
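
A small sketch of the wide-character side, with hypothetical helper names (wide_payload_bytes, wide_is_iso10646). wcslen counts wchar_t elements rather than bytes, so zero bytes inside an element do not end the string. The size of wchar_t and its encoding are left to the implementation; the standard macro __STDC_ISO_10646__, when defined, indicates that wchar_t values are ISO 10646 code points.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Bytes occupied by the characters of a wide string, excluding the
   terminating null wide character. The result depends on
   sizeof(wchar_t), which differs between implementations. */
size_t wide_payload_bytes(const wchar_t *ws)
{
    assert(ws != NULL);
    return wcslen(ws) * sizeof(wchar_t);
}

/* 1 if the implementation states that wchar_t holds ISO 10646
   (Unicode) code points, 0 if it makes no such promise. */
int wide_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

For example, wcslen(L"Hi") is 2 everywhere, while wide_payload_bytes(L"Hi") is 4 where wchar_t is 2 bytes and 8 where it is 4 bytes.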
 
Jordan Abel

in comp.lang.c i read:


in c the most notable difference would be that utf-16 would usually be
composed of two bytes -- in c bytes need not be 8 bits, but it is very
common -- so then fully half the code space has a byte whose value is 0.

Actually it's roughly one in 128. Of the set of 16-bit values:

There are a total of 65536 values. There are 510 that have exactly one 0
byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
contain a 0 byte.

However, the area with a first byte of 0 and a second byte between 32
and 126 is considered "the most important" for traditional reasons, and
it covers the printable characters of the basic execution character set.
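
The counts given above (65025, 510 and 1) can be checked by brute force. This is only a sketch: it treats each 16-bit value as two 8-bit bytes, and the struct and function names are invented for the example.

```c
#include <assert.h>

/* How many 16-bit values have zero, one, or two bytes equal to 0. */
struct zero_byte_counts {
    unsigned long none;
    unsigned long one;
    unsigned long both;
};

struct zero_byte_counts count_zero_bytes(void)
{
    struct zero_byte_counts c = { 0, 0, 0 };
    unsigned long v;

    for (v = 0; v <= 0xFFFFUL; v++) {
        /* Each comparison yields 0 or 1, so the sum is the number
           of zero bytes in this value. */
        int zeros = ((v >> 8) == 0) + ((v & 0xFFUL) == 0);

        if (zeros == 0)
            c.none++;
        else if (zeros == 1)
            c.one++;
        else
            c.both++;
    }

    /* Every value falls into exactly one bucket. */
    assert(c.none + c.one + c.both == 0x10000UL);
    return c;
}
```

The function returns none = 65025, one = 510 and both = 1, so 511 of the 65536 values contain a zero byte, which is roughly one in 128.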
 
those who know me have no need of my name

in comp.lang.c i read:
On 2006-02-05, those who know me have no need of my name
in c [...] utf-16 would usually be composed of two bytes [...] so then
fully half the code space has a byte whose value is 0.

Actually it's roughly one in 128. Of the set of 16-bit values:

There are a total of 65536 values. There are 510 that have exactly one 0
byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
contain a 0 byte.

err, oops -- thanks for the catch!

However, the area with a first byte of 0 and a second byte between 32
and 126 are considered "the most important" for traditional reasons, and
this encompasses the entire basic execution character set.

just where my mind was at, unfortunately.
 
