mbtowc - combining character

Discussion in 'C Programming' started by Old Wolf, Apr 4, 2007.

  1. Old Wolf

    Old Wolf Guest

    As far as I can see, mbtowc and mbstowcs assume that there is
    exactly one wide character for each multi-byte sequence. How are
    you meant to cope with MBS that correspond to two wide characters?

    For example, if it is Unicode and the MBS represents a letter
    with a combining diacritic.
    Old Wolf, Apr 4, 2007
    #1

  2. Old Wolf

    Guest

    On 4 Apr, 02:18, "Old Wolf" <> wrote:
    > As far as I can see, mbtowc and mbstowcs assume that there is
    > exactly one wide character for each multi-byte sequence. How are
    > you meant to cope with MBS that correspond to two wide characters?
    >
    > For example, if it is Unicode and the MBS represents a letter
    > with a combining diacritic.


    You aren't. That's purely implementation-defined.
    , Apr 4, 2007
    #2

  3. Old Wolf

    CBFalconer Guest

    Old Wolf wrote:
    >
    > As far as I can see, mbtowc and mbstowcs assume that there is
    > exactly one wide character for each multi-byte sequence. How are
    > you meant to cope with MBS that correspond to two wide characters?
    >
    > For example, if it is Unicode and the MBS represents a letter
    > with a combining diacritic.


    The same way you convert '\n' to a cr/lf output sequence.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    CBFalconer, Apr 4, 2007
    #3
  4. Old Wolf wrote:
    > As far as I can see, mbtowc and mbstowcs assume that there is
    > exactly one wide character for each multi-byte sequence.


    There is exactly one wide character for each multi-byte sequence.

    > How are
    > you meant to cope with MBS that correspond to two wide characters?
    >
    > For example, if it is Unicode and the MBS represents a letter
    > with a combining diacritic.


    Those are two separate multi-byte sequences. The C functions work on
    the character level, not on the glyph level.
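A minimal sketch of the point above, assuming a UTF-8 locale is available: the string "e" followed by U+0301 (COMBINING ACUTE ACCENT) is three bytes, and mbtowc consumes it as two multibyte characters, yielding two wide characters. The helper names use_utf8_locale and split_wide, and the locale names tried, are hypothetical and chosen only for this illustration; which locale names exist varies by system.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Try a few common UTF-8 locale names (assumed names; availability
   varies by system). Returns 1 if one was set, 0 otherwise. */
int use_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Convert the multibyte string s into at most max wide characters,
   one mbtowc call per multibyte character.
   Returns the number of wide characters stored, or -1 if mbtowc
   reports an invalid or incomplete sequence. */
long split_wide(const char *s, wchar_t *out, size_t max)
{
    assert(s != NULL && out != NULL);
    size_t left = strlen(s);
    long n = 0;

    mbtowc(NULL, NULL, 0);              /* reset any internal shift state */
    while (left > 0 && (size_t)n < max) {
        int used = mbtowc(&out[n], s, left);
        if (used < 0)
            return -1;                  /* invalid or incomplete sequence */
        if (used == 0)
            break;                      /* null character reached */
        s += used;                      /* advance past this one character */
        left -= (size_t)used;
        n++;
    }
    return n;
}
```

If use_utf8_locale() succeeds, split_wide("e\xCC\x81", buf, 4) should store two wide characters: the base letter and then the combining mark. Pairing them back into one displayed glyph is left to the rendering layer or to the application.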
    Harald van Dĳk, Apr 4, 2007
    #4
  5. On Wed, 04 Apr 2007 01:18:46 +0200, Old Wolf <> wrote:
    > As far as I can see, mbtowc and mbstowcs assume that there is
    > exactly one wide character for each multi-byte sequence. How are
    > you meant to cope with MBS that correspond to two wide characters?
    >
    > For example, if it is Unicode and the MBS represents a letter
    > with a combining diacritic.


    Perform canonical decomposition before converting.
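To illustrate what canonical decomposition does, here is a toy sketch that works on already-decoded code points. It assumes a tiny hand-written table of four precomposed Latin letters; the names toy_table and toy_decompose are hypothetical, and real code would rely on a full Unicode normalization library rather than a table like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One precomposed code point and its canonical decomposition
   (base letter followed by a combining mark). */
struct decomp { uint32_t composed, base, mark; };

/* Toy table covering only four letters; illustration only. */
static const struct decomp toy_table[] = {
    { 0x00E9, 0x0065, 0x0301 },  /* e with acute     = e + COMBINING ACUTE ACCENT */
    { 0x00E8, 0x0065, 0x0300 },  /* e with grave     = e + COMBINING GRAVE ACCENT */
    { 0x00F1, 0x006E, 0x0303 },  /* n with tilde     = n + COMBINING TILDE */
    { 0x00FC, 0x0075, 0x0308 },  /* u with diaeresis = u + COMBINING DIAERESIS */
};

/* Copy n code points from in to out (capacity cap), replacing each
   precomposed letter found in toy_table with its base + mark pair.
   Returns the number of code points written, or (size_t)-1 if out
   is too small. */
size_t toy_decompose(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        const struct decomp *d = NULL;
        for (size_t k = 0; k < sizeof toy_table / sizeof toy_table[0]; k++) {
            if (toy_table[k].composed == in[i]) {
                d = &toy_table[k];
                break;
            }
        }
        size_t need = d ? 2 : 1;
        if (cap - w < need)
            return (size_t)-1;          /* output buffer too small */
        if (d) {
            out[w++] = d->base;
            out[w++] = d->mark;
        } else {
            out[w++] = in[i];           /* no entry: copy unchanged */
        }
    }
    return w;
}
```

After this step every wide character in the text is either a base character or a combining mark, so the one-to-one mapping between multibyte sequences and wide characters discussed in this thread holds throughout.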



    Boudewijn Dijkstra, Apr 4, 2007
    #5