Re: UTF-8 and wchar_t

Discussion in 'C Programming' started by Ersek, Laszlo, Mar 2, 2010.

  1. In article <86.com>, Michal Nazarewicz writes:

    > Also, what happens when I pass wprintf() a string containing a wide
    > character that has no representation in the current locale (i.e. some
    > funky Unicode character when the locale uses the ISO-8859-1 encoding)?


    wprintf() will return a negative value [and errno will be set to EILSEQ].
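
    Because the failure only shows up in the return value, it is easy to
    miss. A minimal sketch of checking for it follows; both helper names
    are hypothetical, and it assumes the locale has already been selected
    with setlocale(). The first helper tests a string up front with
    wcstombs(), the second checks wprintf() itself and also flushes, since
    an implementation may defer the conversion until the buffer is written.

    ```c
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    /* Hypothetical helper: returns 1 if every character of ws can be
       converted to the current locale's multibyte encoding, else 0. */
    int is_representable(const wchar_t *ws)
    {
        /* With a null destination, wcstombs() only measures the result;
           it returns (size_t)-1 if some character cannot be converted. */
        return wcstombs(NULL, ws, 0) != (size_t)-1;
    }

    /* Hypothetical helper: prints ws followed by a newline.
       Returns 0 on success, -1 on failure (errno is left as set by the
       library, e.g. EILSEQ for an unrepresentable character). */
    int print_wide_checked(const wchar_t *ws)
    {
        errno = 0;
        if (wprintf(L"%ls\n", ws) < 0 || fflush(stdout) == EOF)
            return -1;
        return 0;
    }
    ```

    Calling is_representable() before printing lets the caller choose a
    fallback without leaving stdout in an error state.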


    > Can I somehow instruct the standard library function to print, say,
    > a question mark in such situations or do I have to handle such cases by
    > myself?


    On second thought, you might be better off converting the output
    with iconv() too, from WCHAR_T to the codeset used by the current
    locale.

    http://www.opengroup.org/onlinepubs/007908775/xsh/iconv.html
    ----v----
    If iconv() encounters a character in the input buffer that is valid, but
    for which an identical character does not exist in the target codeset,
    iconv() performs an implementation-dependent conversion on this
    character.
    ----^----

    (You would have to test this.)
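
    If the implementation-dependent conversion turns out not to be what
    you want, handling the substitution yourself is possible in standard
    C. The following is only a sketch under stated assumptions: the
    function name is hypothetical, the locale's encoding is assumed to be
    stateless (no shift sequences), and '?' is assumed to be a single byte
    in that encoding. It converts one character at a time with wcrtomb()
    and writes '?' whenever a character cannot be converted.

    ```c
    #include <limits.h>
    #include <string.h>
    #include <wchar.h>

    /* Hypothetical helper: converts ws to the current locale's multibyte
       encoding, storing at most dstsize bytes (including the terminating
       NUL) in dst. Unconvertible characters are replaced with '?'.
       Returns the number of bytes stored, not counting the NUL, or
       (size_t)-1 if dst is too small. */
    size_t wcs_to_mbs_subst(char *dst, size_t dstsize, const wchar_t *ws)
    {
        mbstate_t st;
        size_t used = 0;

        memset(&st, 0, sizeof st);          /* initial conversion state */

        for (; *ws != L'\0'; ++ws) {
            char buf[MB_LEN_MAX];
            mbstate_t saved = st;
            size_t n = wcrtomb(buf, *ws, &st);

            if (n == (size_t)-1) {
                st = saved;                 /* state is unspecified after an error */
                buf[0] = '?';
                n = 1;
            }
            if (used + n >= dstsize)        /* keep one byte for the NUL */
                return (size_t)-1;
            memcpy(dst + used, buf, n);
            used += n;
        }
        if (used >= dstsize)
            return (size_t)-1;
        dst[used] = '\0';
        return used;
    }
    ```

    The resulting byte string can then be written with an ordinary
    byte-oriented function such as fputs().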

    You should be able to get the codeset used by the current locale by
    calling

    nl_langinfo(CODESET)

    http://www.opengroup.org/onlinepubs/007908775/xsh/nl_langinfo.html
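
    Putting the two calls together might look like the sketch below. The
    function name is hypothetical, and it assumes an iconv implementation
    that accepts "WCHAR_T" as a source encoding (glibc does) and that
    setlocale() has been called beforehand. As far as I know, glibc fails
    with EILSEQ on an unconvertible character unless a suffix such as
    "//TRANSLIT" is appended to the target name, so the sketch simply
    reports failure in that case.

    ```c
    #include <iconv.h>
    #include <langinfo.h>
    #include <stddef.h>
    #include <wchar.h>

    /* Hypothetical helper: converts ws to the codeset of the current
       locale using iconv(). Stores at most dstsize bytes (including the
       terminating NUL) in dst. Returns the number of bytes stored, not
       counting the NUL, or (size_t)-1 on any failure. */
    size_t wcs_to_locale_iconv(char *dst, size_t dstsize, const wchar_t *ws)
    {
        iconv_t cd;
        char *in = (char *)ws;              /* iconv() takes char **, input is not modified */
        size_t inleft = wcslen(ws) * sizeof(wchar_t);
        char *out = dst;
        size_t outleft;
        size_t rc;

        if (dstsize == 0)
            return (size_t)-1;
        outleft = dstsize - 1;              /* keep one byte for the NUL */

        cd = iconv_open(nl_langinfo(CODESET), "WCHAR_T");
        if (cd == (iconv_t)-1)
            return (size_t)-1;              /* conversion not supported here */

        rc = iconv(cd, &in, &inleft, &out, &outleft);
        if (rc != (size_t)-1)               /* emit any final shift sequence */
            rc = iconv(cd, NULL, NULL, &out, &outleft);
        iconv_close(cd);

        if (rc == (size_t)-1)
            return (size_t)-1;
        *out = '\0';
        return (size_t)(out - dst);
    }
    ```

    On some systems iconv lives in a separate library, so linking with
    -liconv may be necessary.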

    (Sorry for being glibc/SUSv2-specific.)

    Cheers,
    lacos
     
    Ersek, Laszlo, Mar 2, 2010

  2. On Tue, 02 Mar 2010 21:42:07 +0100, Michal Nazarewicz wrote:
    > Thanks for all the links and information. I have been considering
    > iconv() but didn't notice that it can do conversion to/from wchar_t as
    > well and that was my biggest concern. I'll be sure to look more into
    > it.


    To clarify further, iconv() is not necessarily able to do so. The GNU
    implementation does support WCHAR_T, but more generally, the set of
    available iconv sources/targets is implementation-defined.
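
    Since support varies, a program can probe for it at run time rather
    than assume it. A minimal sketch, with a hypothetical function name:
    iconv_open() returns (iconv_t)-1 when the requested conversion is not
    available.

    ```c
    #include <iconv.h>

    /* Hypothetical helper: returns 1 if this iconv implementation can
       convert from "WCHAR_T" to the codeset named by tocode, else 0. */
    int iconv_supports_wchar_t(const char *tocode)
    {
        iconv_t cd = iconv_open(tocode, "WCHAR_T");

        if (cd == (iconv_t)-1)
            return 0;                       /* typically errno == EINVAL */
        iconv_close(cd);
        return 1;
    }
    ```

    A caller could fall back to the standard wcrtomb()/wcstombs()
    functions when the probe returns 0.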

    > however depending on glibc may hurt me a bit as my code won't
    > quite work on, say, BSD then.


    Indeed, I'm not sure whether WCHAR_T is available for iconv there.
    You can probably use GNU libiconv (under the LGPL) there too if you
    like, though.

    (Yeah, getting Unixy, sorry about that; if one continues further,
    probably better to move to comp.unix.programming)

    --
    Mikko Rauhala - http://www.iki.fi/mjr/blog/
    The Finnish Pirate Party - http://piraattipuolue.fi/
    World Transhumanist Association - http://transhumanism.org/
    Singularity Institute - http://singinst.org/
     
    Mikko Rauhala, Mar 3, 2010
