Wide character input/output

Discussion in 'C Programming' started by Ioannis Vranos, Feb 23, 2008.

  1. [The current message encoding is set to Unicode (UTF-8) because it
    contains Greek]


    The following code does not work as expected:


    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stddef.h>

    int main()
    {
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
    printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%s\n", input);

    return 0;
    }


    Under Linux:


    [john@localhost src]$ ./foobar-cpp
    Test
    T
    [john@localhost src]$


    [john@localhost src]$ ./foobar-cpp
    Δοκιμαστικό
    �
    [john@localhost src]$




    Under MS Visual C++ 2008 Express:

    Test
    Test

    Press any key to continue . . .


    Δοκιμαστικό
    ??????ε????

    Press any key to continue . . .


    Am I missing something?
    Ioannis Vranos, Feb 23, 2008
    #1

  2. Ioannis Vranos writes:

    > [The current message encoding is set to Unicode (UTF-8) because it
    > contains Greek]
    >
    >
    > The following code does not work as expected:
    >
    >
    > #include <wchar.h>
    > #include <locale.h>
    > #include <stdio.h>
    > #include <stddef.h>
    >
    > int main()
    > {
    > char *p= setlocale( LC_ALL, "Greek" );
    >
    > wchar_t input[50];
    >
    > if (!p)
    > printf("NULL returned!\n");
    >
    > fgetws(input, 50, stdin);
    >
    > wprintf(L"%s\n", input);


    You need "%ls". This is very important with wprintf since without it
    %s denotes a multi-byte character sequence. printf("%ls\n", input)
    should also work. You need the w version if you want the multi-byte
    conversion of %s or if the format has to be a wchar_t pointer.

    >
    > return 0;
    > }
    >
    >
    > Under Linux:
    >
    >
    > [john@localhost src]$ ./foobar-cpp
    > Test
    > T
    > [john@localhost src]$
    >
    >
    > [john@localhost src]$ ./foobar-cpp
    > Δοκιμαστικό
    > �
    > [john@localhost src]$


    The above may not be the only problem. In cases like this, you need to
    say what encoding your terminal is using.

    <snip>

    --
    Ben.
    Ben Bacarisse, Feb 23, 2008
    #2

  3. Ben Bacarisse wrote:
    >
    > You need "%ls". This is very important with wprintf since without it
    > %s denotes a multi-byte character sequence. printf("%ls\n", input)
    > should also work. You need the w version if you want the multi-byte
    > conversion of %s or if the format has to be a wchar_t pointer.



    Perhaps you may help me understand better. We have the usual char
    encoding which is implementation defined (usually ASCII).

    wchar_t is wide character encoding, which is the "largest character set
    supported by the system", so I suppose Unicode under Linux and Windows.

    What exactly is a multi-byte character?

    I have to say that I am talking about C95 here, not C99.


    >
    >> return 0;
    >> }
    >>
    >>
    >> Under Linux:
    >>
    >>
    >> [john@localhost src]$ ./foobar-cpp
    >> Test
    >> T
    >> [john@localhost src]$
    >>
    >>
    >> [john@localhost src]$ ./foobar-cpp
    >> Δοκιμαστικό
    >> �
    >> [john@localhost src]$

    >
    > The above may not be the only problem. In cases like this, you need to
    > say what encoding your terminal is using.



    You are somehow correct on this. My terminal encoding was UTF-8 and I
    added Greek(ISO-8859-7). Under the last, the following code works OK:


    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stddef.h>

    int main()
    {
    char *p= setlocale( LC_ALL, "Greek" );

    wprintf(L"Δοκιμαστικό\n");

    return 0;
    }

    [john@localhost src]$ ./foobar-cpp
    Δοκιμαστικό
    [john@localhost src]$


    Also the original, fixed according to your suggestion:


    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stddef.h>

    int main()
    {
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
    printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%ls", input);

    return 0;
    }

    works OK too:

    [john@localhost src]$ ./foobar-cpp
    Δοκιμαστικό
    Δοκιμαστικό
    [john@localhost src]$


    It works OK under Terminal UTF-8 default encoding too. So "%ls" is what
    was really needed.


    BTW, how can we define UTF-8 as the locale?


    Thanks a lot.
    Ioannis Vranos, Feb 23, 2008
    #3
  4. Ioannis Vranos wrote:
    >
    > It works OK under Terminal UTF-8 default encoding too. So "%ls" is what
    > was really needed.



    Actually the code:

    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stddef.h>

    int main()
    {
    char *p= setlocale( LC_ALL, "Greek" );

    wprintf(L"Δοκιμαστικό\n");

    return 0;
    }

    works only when I set the Terminal encoding to Greek (ISO-8859-7).



    >
    >
    > BTW, how can we define UTF-8 as the locale?
    >
    >
    > Thanks a lot.
    Ioannis Vranos, Feb 23, 2008
    #4
  5. Ioannis Vranos writes:

    > Ben Bacarisse wrote:
    >>
    >> You need "%ls". This is very important with wprintf since without it
    >> %s denotes a multi-byte character sequence. printf("%ls\n", input)
    >> should also work. You need the w version if you want the multi-byte
    >> conversion of %s or if the format has to be a wchar_t pointer.

    >
    >
    > Perhaps you may help me understand better. We have the usual char
    > encoding which is implementation defined (usually ASCII).
    >
    > wchar_t is wide character encoding, which is the "largest character
    > set supported by the system", so I suppose Unicode under Linux and
    > Windows.
    >
    > What exactly is a multi-byte character?


    It is a confusing term. It means an encoding that uses sequences of
    ordinary bytes (in the C sense -- chars) to encode a large character
    set. The most common example is UTF-8.

    > I have to say that I am talking about C95 here, not C99.
    >
    >
    >>
    >>> return 0;
    >>> }
    >>>
    >>>
    >>> Under Linux:
    >>>
    >>>
    >>> [john@localhost src]$ ./foobar-cpp
    >>> Test
    >>> T
    >>> [john@localhost src]$
    >>>
    >>>
    >>> [john@localhost src]$ ./foobar-cpp
    >>> Δοκιμαστικό
    >>> �
    >>> [john@localhost src]$

    >>
    >> The above may not be the only problem. In cases like this, you need to
    >> say what encoding your terminal is using.

    >
    >
    > You are somehow correct on this.


    Strange, I know!

    > My terminal encoding was UTF-8 and I
    > added Greek(ISO-8859-7). Under the last, the following code works OK:
    >
    >
    > #include <wchar.h>
    > #include <locale.h>
    > #include <stdio.h>
    > #include <stddef.h>
    >
    > int main()
    > {
    > char *p= setlocale( LC_ALL, "Greek" );
    >
    > wprintf(L"Δοκιμαστικό\n");
    >
    > return 0;
    > }
    >
    > [john@localhost src]$ ./foobar-cpp
    > Δοκιμαστικό
    > [john@localhost src]$
    >
    >
    > Also the original, fixed according to your suggestion:
    >
    >
    > #include <wchar.h>
    > #include <locale.h>
    > #include <stdio.h>
    > #include <stddef.h>
    >
    > int main()
    > {
    > char *p= setlocale( LC_ALL, "Greek" );
    >
    > wchar_t input[50];
    >
    > if (!p)
    > printf("NULL returned!\n");
    >
    > fgetws(input, 50, stdin);
    >
    > wprintf(L"%ls", input);
    >
    > return 0;
    > }
    >
    > works OK too:
    >
    > [john@localhost src]$ ./foobar-cpp
    > Δοκιμαστικό
    > Δοκιμαστικό
    > [john@localhost src]$
    >
    >
    > It works OK under Terminal UTF-8 default encoding too. So "%ls" is
    > what was really needed.
    >
    >
    > BTW, how can we define UTF-8 as the locale?


    I *think* this is now off-topic. I don't think C says anything about
    what the locale string means...

    The character encoding is usually specified after a '.'. I use, for
    example, "en-GB.UTF-8". I suspect that if you only specify a part of
    the locale (or one that does not make sense) your C library picks up
    what to do from the execution environment. To me "Greek" looks like
    an odd locale string. I would expect "el-GR.UTF-8" or
    "el-GR.ISO8859-7".

    --
    Ben.
    Ben Bacarisse, Feb 23, 2008
    #5
  6. Ioannis Vranos writes:

    > Ioannis Vranos wrote:
    >>
    >> It works OK under Terminal UTF-8 default encoding too. So "%ls" is
    >> what was really needed.

    >
    >
    > Actually the code:
    >
    > #include <wchar.h>
    > #include <locale.h>
    > #include <stdio.h>
    > #include <stddef.h>
    >
    > int main()
    > {
    > char *p= setlocale( LC_ALL, "Greek" );
    >
    > wprintf(L"Δοκιμαστικό\n");
    >
    > return 0;
    > }
    >
    > works only when I set the Terminal encoding to Greek (ISO-8859-7).


    This sort of thing is almost impossible to investigate over Usenet.
    Your news software will take your code and may or may not encode the
    characters of the L"..." string in the encoding of your post (UTF-8).
    It makes it very hard to know what the program text actually is.

    Another complication is that the locale setting affects the run-time
    behaviour, but your program also depends on what character encoding is
    expected by the compiler that builds the string.

    --
    Ben.
    Ben Bacarisse, Feb 23, 2008
    #6
  7. Ben Bacarisse wrote:
    >> BTW, how can we define UTF-8 as the locale?

    >
    > I *think* this is now off-topic. I don't think C says anything about
    > what the locale string means...
    >
    > The character encoding is usually specified after a '.'. I use, for
    > example, "en-GB.UTF-8". I suspect that if you only specify a part of
    > the locale (or one that does not make sense) your C library picks up
    > what to do from the execution environment. To me "Greek" looks like
    > an odd locale string. I would expect "el-GR.UTF-8" or
    > "el-GR.ISO8859-7".



    I got the idea from:

    http://msdn2.microsoft.com/en-us/library/x99tb11d(VS.80).aspx

    http://msdn2.microsoft.com/en-us/library/39cwe7zf(VS.80).aspx
    Ioannis Vranos, Feb 24, 2008
    #7
  8. Ben Bacarisse wrote:

    >> BTW, how can we define UTF-8 as the locale?

    >
    > I *think* this is now off-topic. I don't think C says anything about
    > what the locale string means...
    >
    > The character encoding is usually specified after a '.'. I use, for
    > example, "en-GB.UTF-8". I suspect that if you only specify a part of
    > the locale (or one that does not make sense) your C library picks up
    > what to do from the execution environment. To me "Greek" looks like
    > an odd locale string. I would expect "el-GR.UTF-8" or
    > "el-GR.ISO8859-7".



    This code works with gcc:

    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stddef.h>

    int main()
    {
    char *p= setlocale( LC_ALL, "greek" );

    wchar_t input[50];

    if (!p)
    printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%ls", input);

    return 0;
    }


    [john@localhost src]$ ./foobar-cpp
    Δοκιμαστικό
    Δοκιμαστικό
    [john@localhost src]$


    When I place el-GR.UTF-8 or el-GR.ISO8859-7 I get:


    [john@localhost src]$ ./foobar-cpp
    NULL returned!

    [john@localhost src]$
    Ioannis Vranos, Feb 24, 2008
    #8
  9. Ioannis Vranos writes:

    > Ben Bacarisse wrote:
    >>> BTW, how can we define UTF-8 as the locale?

    >>
    >> I *think* this is now off-topic. I don't think C says anything about
    >> what the locale string means...
    >>
    >> The character encoding is usually specified after a '.'. I use, for
    >> example, "en-GB.UTF-8". I suspect that if you only specify a part of
    >> the locale (or one that does not make sense) your C library picks up
    >> what to do from the execution environment. To me "Greek" looks like
    >> an odd locale string. I would expect "el-GR.UTF-8" or
    >> "el-GR.ISO8859-7".

    >
    > I got the idea from:
    >
    > http://msdn2.microsoft.com/en-us/library/x99tb11d(VS.80).aspx


    Ah, OK. Anyway, we are off-topic now. I think you'd have to post in
    a Windows group to find out what locale strings mean there.

    --
    Ben.
    Ben Bacarisse, Feb 24, 2008
    #9
  10. Ben Bacarisse wrote:
    > Ioannis Vranos writes:
    >
    >> Ben Bacarisse wrote:
    >>>> BTW, how can we define UTF-8 as the locale?
    >>> I *think* this is now off-topic. I don't think C says anything about
    >>> what the locale string means...
    >>>
    >>> The character encoding is usually specified after a '.'. I use, for
    >>> example, "en-GB.UTF-8". I suspect that if you only specify a part of
    >>> the locale (or one that does not make sense) your C library picks up
    >>> what to do from the execution environment. To me "Greek" looks like
    >>> an odd locale string. I would expect "el-GR.UTF-8" or
    >>> "el-GR.ISO8859-7".

    >> I got the idea from:
    >>
    >> http://msdn2.microsoft.com/en-us/library/x99tb11d(VS.80).aspx

    >
    > Ah, OK. Anyway, we are off-topic now. I think you'd have to post in
    > a Windows group to find out what locale strings mean there.



    I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you suggested
    make setlocale() return NULL. The "greek" and "Greek" suggested by
    MSDN work. So I supposed there is a portable way to do this. Aren't
    there any portable locale encoding strings?
    Ioannis Vranos, Feb 24, 2008
    #10
  11. Clarified:


    > I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you suggested
    > make setlocale() return NULL


    ==> under Linux.

    > The "greek" and "Greek" suggested by MSDN
    > works


    ==> under Linux.

    > So I supposed there is a portable way for this. Aren't any
    > portable locale encoding strings?
    Ioannis Vranos, Feb 24, 2008
    #11
  12. Ioannis Vranos wrote:
    >
    > [The current message encoding is set to Unicode (UTF-8) because
    > it contains Greek]
    >
    > The following code does not work as expected:
    >
    > #include <wchar.h>
    > #include <locale.h>
    > #include <stdio.h>
    > #include <stddef.h>
    >
    > int main() {
    > char *p= setlocale( LC_ALL, "Greek" );
    > wchar_t input[50];
    >
    > if (!p)
    > printf("NULL returned!\n");
    > fgetws(input, 50, stdin);
    > wprintf(L"%s\n", input);
    > return 0;
    > }
    >

    .... snip ...
    >
    > Am I missing something?


    Yes. If setlocale fails, it returns NULL, which you detect, but do
    not immediately exit the program. You also forgot to check for
    errors in executing fgetws or wprintf.

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.



    CBFalconer, Feb 24, 2008
    #12
  13. Ioannis Vranos wrote:
    >

    .... snip ...
    >
    > I have attached a screenshot.


    According to which, I believe, you are using a c++ compiler.

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.



    CBFalconer, Feb 24, 2008
    #13
  14. Ioannis Vranos wrote:
    > Clarified:
    >
    >
    >> I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you
    >> suggested make setlocale() return NULL

    >
    > ==> under Linux.
    >
    >> The "greek" and "Greek" suggested by MSDN works

    >
    > ==> under Linux.
    >
    >> So I supposed there is a portable way for this. Aren't any portable
    >> locale encoding strings?



    Also based on
    http://gcc.gnu.org/onlinedocs/libstdc++/22_locale/locale.html where it
    mentions "locale -a" and provides a list of locales, in my system it
    outputs among other things:


    galego
    galician
    gd_GB
    gd_GB.iso885915
    gd_GB.utf8
    german
    gez_ER
    gez_ER@abegede
    gez_ER.utf8
    gez_ER.utf8@abegede
    gez_ET
    gez_ET@abegede
    gez_ET.utf8
    gez_ET.utf8@abegede
    gl_ES
    gl_ES@euro
    gl_ES.iso88591
    gl_ES.iso885915@euro
    gl_ES.utf8
    ==> greek
    gu_IN
    gu_IN.utf8
    gv_GB
    gv_GB.iso88591
    gv_GB.utf8
    hebrew
    he_IL
    he_IL.iso88598
    he_IL.utf8
    hi_IN
    hi_IN.utf8
    hr_HR
    hr_HR.iso88592
    hr_HR.utf8
    hrvatski
    hsb_DE
    hsb_DE.iso88592
    hsb_DE.utf8
    hu_HU
    hu_HU.iso88592
    hu_HU.utf8
    hungarian


    So "greek" is a valid locale for linux too.
    Ioannis Vranos, Feb 24, 2008
    #14
  15. Ioannis Vranos writes:

    > Ioannis Vranos wrote:
    >> Clarified:
    >>
    >>
    >>> I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you
    >>> suggested make setlocale() return NULL

    >>
    >> ==> under Linux.
    >>
    >>> The "greek" and "Greek" suggested by MSDN works

    >>
    >> ==> under Linux.
    >>
    >>> So I supposed there is a portable way for this. Aren't any portable
    >>> locale encoding strings?

    >
    > Also based on
    > http://gcc.gnu.org/onlinedocs/libstdc++/22_locale/locale.html where it
    > mentions "locale -a" and provides a list of locales, in my system it
    > outputs among other things:
    >
    > galego
    > galician
    > gd_GB

    ....
    > gl_ES.iso885915@euro
    > gl_ES.utf8
    > ==> greek


    Post in comp.unix.programmer. I think you can define anything you
    like under Linux, but what is and is not valid is not specified by C.
    Other standards (like POSIX) probably specify much more.

    > So "greek" is a valid locale for linux too.


    --
    Ben.
    Ben Bacarisse, Feb 24, 2008
    #15
