Converting from Windows wchar_t to Linux wchar_t

Discussion in 'C++' started by yakir22@gmail.com, Aug 14, 2008.

  1. yakir22@gmail.com Guest

    Hello experts,
    I am now porting our server from Windows to Linux; our
    client runs only on Windows machines.
    To avoid the wchar_t size problem (on Windows it is 2 bytes, on Linux
    it is 4 bytes) we defined:

    #ifdef WIN32
    #define t_wchar_t wchar_t
    #else // LINUX
    #define t_wchar_t short
    #endif

    On the server I receive a buffer that contains a Windows t_wchar_t
    string, something like:

    struct user_data
    {
        t_wchar_t name[32];
        .....
        .....
    };

    All the data transfer works fine as long as the server doesn't
    care what's in the string.
    My problem starts when I want to print some logs on the server
    using the content of the buffer.

    My question is: is there a simple way to convert a 2-byte wchar_t
    (Windows version) to a 4-byte wchar_t (Linux version)?

    Thanks
    yakir22@gmail.com, Aug 14, 2008
    #1

  2. Rolf Magnus Guest

    yakir22@gmail.com wrote:

    > My question is: is there a simple way to convert a 2-byte wchar_t
    > (Windows version) to a 4-byte wchar_t (Linux version)?


    I suggest using something like libiconv (http://en.wikipedia.org/wiki/Iconv)
    to convert to a common character set on both sides.
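
    A minimal sketch of what that could look like on the Linux side,
    assuming the client sends UTF-16LE and the server uses glibc (the
    function name, buffer sizing, and error handling here are mine, not
    from this thread):

    #include <iconv.h>
    #include <cerrno>
    #include <cstring>
    #include <stdexcept>
    #include <string>

    // Convert a UTF-16LE byte buffer (as sent by the Windows client)
    // into the 4-byte wchar_t string used by glibc on Linux.
    std::wstring utf16le_to_wstring(const char* in, size_t in_bytes)
    {
        if (in_bytes == 0)
            return std::wstring();

        // "WCHAR_T" is glibc's name for the platform's own wchar_t encoding.
        iconv_t cd = iconv_open("WCHAR_T", "UTF-16LE");
        if (cd == (iconv_t)-1)
            throw std::runtime_error("iconv_open failed");

        // Each 2-byte input unit yields at most one 4-byte output unit,
        // so in_bytes wchar_t's is a safe upper bound.
        std::wstring out(in_bytes, L'\0');
        char* in_ptr = const_cast<char*>(in);
        char* out_ptr = reinterpret_cast<char*>(&out[0]);
        size_t out_left = out.size() * sizeof(wchar_t);

        size_t rc = iconv(cd, &in_ptr, &in_bytes, &out_ptr, &out_left);
        iconv_close(cd);
        if (rc == (size_t)-1)
            throw std::runtime_error(std::strerror(errno));

        // Trim to the number of wchar_t's actually produced.
        out.resize(out.size() - out_left / sizeof(wchar_t));
        return out;
    }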
    Rolf Magnus, Aug 14, 2008
    #2

  3. James Kanze Guest

    On Aug 14, 5:30 pm, Victor Bazarov wrote:
    > yakir22@gmail.com wrote:
    > > Hello experts,
    > > I am now porting our server from Windows to Linux; our
    > > client runs only on Windows machines.
    > > To avoid the wchar_t size problem (on Windows it is 2 bytes, on Linux
    > > it is 4 bytes) we defined:


    > > #ifdef WIN32
    > > #define t_wchar_t wchar_t
    > > #else // LINUX
    > > #define t_wchar_t short
    > > #endif


    > You might be better off with a typedef, although it's not a
    > very significant difference.


    It would be if the second were unsigned short. Something like
    "t_wchar_t( something )" would be legal if it were a typedef,
    but not if it were a #define.
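
    A small illustration of that point (the variable names are mine): a
    functional-style cast needs a single type name, so the macro version
    breaks as soon as the type is spelled with two words:

    typedef unsigned short t_wchar_t;   // OK: t_wchar_t(c) is a valid cast

    // With "#define t_wchar_t unsigned short", t_wchar_t(c) would expand
    // to "unsigned short(c)", which does not compile.

    int main()
    {
        char c = 'A';
        t_wchar_t w = t_wchar_t(c);     // functional cast via the typedef
        return w == 'A' ? 0 : 1;
    }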

    > Also, for some reason I seem to remember that wchar_t is an
    > unsigned type. Since 'char' is often signed (though different
    > from 'signed char', of course), perhaps I remember
    > incorrectly...


    Both are very much implementation defined. In practice, you
    generally shouldn't be using wchar_t in portable code:-(.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 14, 2008
    #3
  4. James Kanze Guest

    On Aug 15, 9:23 am, "Chris Becke" wrote:
    > > My question is: is there a simple way to convert a 2-byte wchar_t
    > > (Windows version) to a 4-byte wchar_t (Linux version)?


    > wchar_t is a particularly useless type: because it's
    > implementation defined, it doesn't have (in portable code) any
    > kind of assurance of what type of character encoding it may be
    > using or capable of using.


    That's partially true of char as well; in addition, the
    character encoding can depend on the source of the data. But at
    least, char is guaranteed to be at least 8 bits, so you know
    that it can hold all useful external encodings. (For better or
    for worse, the external world is 8 bits, and any attempt to do
    otherwise is bound to fail in the long run.)

    > The next point is that *unicode* characters are unsigned.


    I'm not sure what that's supposed to mean. ALL character
    encodings I've ever seen use only non-negative values: ASCII
    doesn't define any negative encodings, nor do any of the ISO
    8859 encodings. The fact that char can be (and often is) a
    signed 8 bit value causes no end of problems because of this.
    The character value isn't really signed or unsigned: it's just a
    value (that happens never to be negative).

    What is true is that the Unicode encoding formats UTF-16 and
    UTF-8 require values in the range of 0-0xFFFF and 0-0xFF,
    respectively, and that if your short is 16 bits or your char 8
    (both relatively frequent cases), those values won't fit in the
    corresponding signed types. (For historical reasons, we still
    manage to make do putting UTF-8, and other 8 bit encodings, in
    an 8 bit signed char. It's a hack, and it's not, at least in
    theory, guaranteed to work, but in practice, it's often the
    least bad choice available.)
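
    A quick demonstration of the signedness problem described above (the
    example bytes are mine): with a signed 8-bit char, UTF-8 lead bytes of
    0x80 and above come out negative, so naive comparisons and table
    lookups misbehave:

    #include <cstdio>

    int main()
    {
        const char utf8[] = "\xC3\xA9";   // U+00E9 ('é') encoded in UTF-8
        int raw = utf8[0];                // typically -61 where char is signed
        int fixed = static_cast<unsigned char>(utf8[0]);  // always 195 (0xC3)
        std::printf("raw=%d fixed=%d\n", raw, fixed);
        return 0;
    }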

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 15, 2008
    #4
