Converting from Windows wchar_t to Linux wchar_t

yakir22

Hello experts,
I am now porting our server from Windows to Linux; our client runs
only on Windows machines.
To work around the difference in wchar_t size (2 bytes on Windows,
4 bytes on Linux) we defined

#ifdef WIN32
#define t_wchar_t wchar_t
#else // LINUX
#define t_wchar_t short
#endif

On the server I receive a buffer that contains a Windows t_wchar_t
string, something like

struct user_data
{
t_wchar_t name[32];
.....
.....
};

All the data transfer works fine as long as the server doesn't care
what's in the string.
My problem starts when I want to print some logs on the server using
the content of the buffer.

My question is: is there a simple way to convert a 2-byte wchar_t
(Windows version) to a 4-byte wchar_t (Linux version)?

Thanks
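
A minimal sketch of one way to do the widening by hand (not from the
thread): it assumes the client sends valid little-endian UTF-16 in the
16-bit fields, that a field such as name holds at most 32 code units and
is NUL-terminated when shorter, and that the server is a typical
Linux/glibc system where wchar_t is a 32-bit UTF-32 value. The name
utf16_to_wstring is only illustrative.

#include <cstddef>
#include <cstdint>
#include <string>

// Widen a fixed-size UTF-16 field (as sent by the Windows client) into the
// 4-byte wchar_t used on Linux, combining surrogate pairs into one code point.
std::wstring utf16_to_wstring(const std::uint16_t* src, std::size_t max_units)
{
    std::wstring out;
    for (std::size_t i = 0; i < max_units && src[i] != 0; ++i) {
        std::uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF            // high surrogate...
            && i + 1 < max_units
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {  // ...then low
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;                                    // consumed two code units
        }
        out.push_back(static_cast<wchar_t>(cp));    // UTF-32 code point
    }
    return out;
}

// usage, given the struct from the question:
//   std::wstring name = utf16_to_wstring(
//       reinterpret_cast<const std::uint16_t*>(data.name), 32);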
 
James Kanze

> You might be better off with a typedef, although it's not a
> very significant difference.

It would be if the second were unsigned short. Something like
"t_wchar_t( something )" would be legal if it were a typedef,
not if it were a #define.
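
To make the functional-cast point concrete, here is a toy example (mine,
not from the post), using two illustrative names rather than the original
t_wchar_t:

typedef unsigned short t_wchar_t_td;     // a single type name
#define t_wchar_t_def unsigned short     // expands to two tokens

int main()
{
    t_wchar_t_td a = t_wchar_t_td(0x41);       // OK: functional cast on a typedef
    // t_wchar_t_def b = t_wchar_t_def(0x41);  // ill-formed: expands to
    //                                         // "unsigned short(0x41)"
    t_wchar_t_def b = (t_wchar_t_def)0x41;     // a C-style cast still compiles
    (void)a; (void)b;
    return 0;
}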
> Also, for some reason I seem to remember that wchar_t is an
> unsigned type. Since 'char' is often signed (though different
> from 'signed char', of course), perhaps I remember
> incorrectly...

Both are very implementation defined. In practice, you
generally shouldn't be using wchar_t in portable code :-(.
 
James Kanze

> wchar_t is a particularly useless type: because it's
> implementation defined, it doesn't have (in portable code) any
> kind of assurance of what type of character encoding it may be
> using or capable of using.

That's partially true of char as well; in addition, the
character encoding can depend on the source of the data. But at
least, char is guaranteed to be at least 8 bits, so you know
that it can hold all useful external encodings. (For better or
for worse, the external world is 8 bits, and any attempt to do
otherwise is bound to fail in the long run.)
> The next point is that *Unicode* characters are unsigned.

I'm not sure what that's supposed to mean. ALL character
encodings I've ever seen use only non-negative values: ASCII
doesn't define any negative encodings, nor do any of the ISO
8859 encodings. The fact that char can be (and often is) a
signed 8 bit value causes no end of problems because of this.
The character value isn't really signed or unsigned: it's just a
value (that happens never to be negative).

What is true is that the Unicode encoding formats UTF-16 and
UTF-8 require values in the range of 0-0xFFFF and 0-0xFF,
respectively, and that if your short is 16 bits or your char 8
(both relatively frequent cases), those values won't fit in the
corresponding signed types. (For historical reasons, we still
manage to make do putting UTF-8, and other 8 bit encodings, in
an 8 bit signed char. It's a hack, and it's not, at least in
theory, guaranteed to work, but in practice, it's often the
least bad choice available.)
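
As a small illustration of the "no end of problems" remark, here is an
example (mine, not from the thread) of a UTF-8 lead byte stored in a
plain char coming out negative on the common signed-char platforms:

#include <cstdio>

int main()
{
    const char utf8[] = "\xC3\xA9";       // "é" encoded as UTF-8
    char c = utf8[0];                     // typically -61 where char is signed

    if (c >= 0x80) {                      // never true when char is a signed
                                          // 8-bit type (some compilers warn)
        std::printf("lead byte of a multi-byte sequence\n");
    }

    unsigned char u = static_cast<unsigned char>(c);   // 0xC3, i.e. 195
    if (u >= 0x80) {                      // behaves as intended
        std::printf("lead byte, value 0x%02X\n", u);
    }
    return 0;
}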
 
