in comp.lang.c i read:
> I wanted information about UTF16 format and what is disadvantage over UTF8
> format..
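in c the most notable difference is that a utf-16 code unit is 16 bits
wide, so it usually occupies two bytes -- in c bytes need not be 8 bits,
but it is very common. every character in the ascii and latin-1 range then
has one byte whose value is 0, so a sequence of such codes cannot be
treated as a string: the str* functions stop at the first zero byte.
characters outside the basic multilingual plane also need two code units (a
surrogate pair), so utf-16 is not fixed width either. utf-8 uses a zero
byte only for the null character itself, so a utf-8 sequence can be treated
as a normal string, and these days it is a common form for an
implementation's mbcs.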
in c we also have wide characters and wide character strings. there is no
requirement that the encoding be utf-16 -- some implementations use it,
some do not; these days i would expect utf-32 (or ucs-4 -- yuck!) to be the
more common. with a wide character string the embedded zero byte pitfall is
avoided, because the wcs* functions look for a null wide character rather
than a zero byte, but there is other effort required to make them work
well, e.g., converting to and from the multibyte encoding for i/o.