Regarding UTF16

Discussion in 'C Programming' started by news.fe.internet.bosch.com, Feb 2, 2006.

  1. Hi,

    I would like information about the UTF-16 format and what its
    disadvantages are compared with the UTF-8 format.

    TIA

    Mohan
     
    news.fe.internet.bosch.com, Feb 2, 2006
    #1

  2. news.fe.internet.bosch.com wrote:
    > Hi ,
    >
    > I wanted information about UTF16 format and what is disadvantage over UTF8
    > format..



    This group (c.l.c) is the wrong place to ask.

    I believe comp.software.international is the right one (followup-to added).

    Cheers

    Vladimir
     
    Vladimir S. Oka, Feb 2, 2006
    #2

  4. in comp.lang.c i read:

    >I wanted information about UTF16 format and what is disadvantage over UTF8
    >format..


    in c the most notable difference would be that utf-16 would usually be
    composed of two bytes -- in c bytes need not be 8 bits, but it is very
    common -- so then fully half the code space has a byte whose value is 0. a
    sequence of such codes cannot be treated as a string. a utf-8 sequence can
    be treated as a normal string, and these days it is a common form for an
    implementation's mbcs.
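
A minimal sketch of the embedded-zero pitfall described above, assuming 8-bit bytes and an ASCII-compatible execution character set. The array contents and the helper name `visible_length` are made up for illustration; they are not from the thread.

```c
#include <stddef.h>
#include <string.h>

/* "Hi" as UTF-16LE (assumed byte order): two 8-bit bytes per code unit,
   low byte first. The final two zero bytes are the 16-bit terminator. */
static const char hi_utf16le[] = { 'H', 0, 'i', 0, 0, 0 };

/* "Hi" as UTF-8: identical to the ASCII encoding for these characters. */
static const char hi_utf8[] = "Hi";

/* Hypothetical helper: the length seen by the byte-oriented string functions. */
size_t visible_length(const char *s)
{
    return strlen(s);   /* stops at the first byte whose value is 0 */
}
```

Under those assumptions, `visible_length(hi_utf16le)` stops after the first byte, while `visible_length(hi_utf8)` sees both characters.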

    in c we also have wide characters and wide character strings. there is no
    requirement that the encoding be utf-16 -- some implementations use it,
    some do not; these days i would expect utf-32 (or ucs-4 -- yuck!) to be the
    more common. with a wide character string the embedded null byte pitfall is
    avoided, but other effort is required to make them work well.
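
A small sketch of the wide-character route, using only the standard `<wchar.h>` interface. The helper names `wide_length` and `wide_unit_size` are hypothetical, and the size of `wchar_t` (and therefore its encoding) is implementation-defined, so nothing here assumes UTF-16 or UTF-32.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: counts wide characters up to the terminating L'\0'.
   Zero-valued bytes inside a wchar_t do not end the string. */
size_t wide_length(const wchar_t *ws)
{
    return wcslen(ws);
}

/* Hypothetical helper: bytes per wide character on this implementation. */
size_t wide_unit_size(void)
{
    return sizeof(wchar_t);
}
```

For example, `wide_length(L"Hi")` is 2 regardless of how many bytes each `wchar_t` occupies. Part of the other effort mentioned is that such strings need the `wcs*` functions rather than the `str*` ones.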

    --
    a signature
     
    those who know me have no need of my name, Feb 5, 2006
    #4
  5. Jordan Abel

    On 2006-02-05, those who know me have no need of my name wrote:
    > in comp.lang.c i read:
    >
    >>I wanted information about UTF16 format and what is disadvantage over UTF8
    >>format..

    >
    > in c the most notable difference would be that utf-16 would usually be
    > composed of two bytes -- in c bytes need not be 8 bits, but it is very
    > common -- so then fully half the code space has a byte whose value is 0. a


    Actually it's roughly one in 128. Of the set of 16-bit values:

    There are a total of 65536 values. There are 510 that have exactly one 0
    byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
    contain a 0 byte.

    However, the range with a first byte of 0 and a second byte between 32
    and 126 is considered "the most important" for traditional reasons, and
    it covers every printing character of the basic execution character set
    (assuming ASCII).
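
The counts above can be checked by brute force. This is only a sketch, assuming each 16-bit value is split into two 8-bit bytes; the struct and function names are invented for the example.

```c
/* Hypothetical tally of 16-bit values by how many of their two bytes are 0. */
struct zero_byte_counts {
    unsigned long none;  /* neither byte is 0 */
    unsigned long one;   /* exactly one byte is 0 */
    unsigned long two;   /* both bytes are 0 */
};

struct zero_byte_counts count_zero_bytes(void)
{
    struct zero_byte_counts c = { 0, 0, 0 };
    unsigned long v;

    for (v = 0; v <= 0xFFFFUL; v++) {
        /* Each comparison yields 0 or 1, so the sum is the zero-byte count. */
        int zeros = ((v >> 8) == 0) + ((v & 0xFF) == 0);

        if (zeros == 0)
            c.none++;
        else if (zeros == 1)
            c.one++;
        else
            c.two++;
    }
    return c;
}
```

The 510 + 1 = 511 values that contain at least one zero byte, out of 65536, give the "roughly one in 128" figure.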
     
    Jordan Abel, Feb 5, 2006
    #5
  6. in comp.lang.c i read:
    >On 2006-02-05, those who know me have no need of my name wrote:


    >> in c [...] utf-16 would usually be composed of two bytes [...] so then
    >> fully half the code space has a byte whose value is 0.

    >
    >Actually it's roughly one in 128. Of the set of 16-bit values:
    >
    >There are a total of 65536 values. There are 510 that have exactly one 0
    >byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
    >contain a 0 byte.


    err, oops -- thanks for the catch!

    >However, the area with a first byte of 0 and a second byte between 32
    >and 126 are considered "the most important" for traditional reasons, and
    >this encompasses the entire basic execution character set.


    just where my mind was at, unfortunately.

    --
    a signature
     
    those who know me have no need of my name, Feb 12, 2006
    #6