System functions + wchar_t

gamehack

Hi all,

I've been thinking about all the system functions which accept wchar_t.
The point is that they don't define what encoding the wchar_t has to
be. Let us assume that all the external input is UTF-8 and all the
output is also UTF-8 and your internal representation is using wchar_t
encoded using UTF-16. So when you call wcout or other system functions
which accept wide characters, what encoding do they assume?

Regards
 
Victor Bazarov

gamehack said:
> I've been thinking about all the system functions which accept wchar_t.

What system functions are those? Do you mean platform-specific ones?

> The point is that they don't define what encoding the wchar_t has to
> be.

It's probably implementation-defined or platform-defined. Have you tried
reading the documentation?

> Let us assume that all the external input is UTF-8 and all the
> output is also UTF-8 and your internal representation is using wchar_t
> encoded using UTF-16. So when you call wcout or other system functions
> which accept wide characters, what encoding do they assume?

I would venture a guess that _locales_ have something to do with it.

V
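
To make the locale guess concrete, here is a minimal sketch of how a wide string is narrowed through the C library. The helper name narrow_in_current_locale is made up for illustration; std::wcrtomb is the standard function, and its result depends on whatever locale was last installed with std::setlocale.

```cpp
#include <cassert>
#include <climits>   // MB_LEN_MAX
#include <cstddef>   // std::size_t
#include <cwchar>    // std::wcrtomb, std::mbstate_t
#include <string>

// Hypothetical helper (not from the thread): convert a wide string to
// multibyte chars using the currently installed C locale.
// Returns false if some character has no representation in that locale.
bool narrow_in_current_locale(const std::wstring& in, std::string& out) {
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    char buf[MB_LEN_MAX];
    out.clear();
    for (wchar_t wc : in) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1)) {
            return false;  // encoding error in the current locale
        }
        assert(n <= sizeof buf);
        out.append(buf, n);
    }
    return true;
}
```

Calling std::setlocale(LC_ALL, "") beforehand selects the environment's locale, so the same wide string may come out as UTF-8 bytes on one system and fail to convert on another. Plain ASCII input should convert in any locale.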
 
Ron Natalie

gamehack said:
> Hi all,
>
> I've been thinking about all the system functions which accept wchar_t.
> The point is that they don't define what encoding the wchar_t has to
> be. Let us assume that all the external input is UTF-8 and all the
> output is also UTF-8 and your internal representation is using wchar_t
> encoded using UTF-16. So when you call wcout or other system functions
> which accept wide characters, what encoding do they assume?
>
> Regards

Welcome to the piss poor implementation of internationalization in C++.
The implementation punts and assumes that you can always uniquely
convert from wide stream to multibyte using the woefully inadequate
C library function.
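
One way to sidestep the locale-dependent conversion for the scenario in the question (UTF-16 inside, UTF-8 outside) is to do the conversion explicitly and write the resulting bytes to a narrow stream. The following is only a sketch under that assumption: it uses std::u16string rather than wchar_t so the internal encoding is unambiguous, the function names are invented here, and unpaired surrogates are replaced with U+FFFD as a design choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to `out` as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);  // precondition: a valid code point
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8 without consulting any locale.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this, something like std::cout << utf16_to_utf8(text) emits the UTF-8 bytes directly instead of relying on how wcout narrows wide characters.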
 
Ron Natalie

Victor said:
> What system functions are those? Do you mean platform-specific ones?

Anything in the standard that takes a filename, for one (fstreams,
etc.). The arguments to main are another.
 
Guest

Ron said:
> Anything in the standard that takes a filename, for one (fstreams,
> etc.). The arguments to main are another.

I guess all those functions that gamehack has in mind interpret
strings of chars according to the locale of the particular operating
system.

Cheers
 
Ron Natalie

Mateusz said:
> I guess all those functions that gamehack has in mind interpret
> strings of chars according to the locale of the particular operating
> system.

That is a nonsensical statement. There is no guarantee that there
exists a way to map wchar_t-based strings into a string of chars
in any locale.
 
Guest

Ron said:
> That is a nonsensical statement. There is no guarantee that there
> exists a way to map wchar_t-based strings into a string of chars
> in any locale.

I said I guess. So, please explain to me how a function like fopen knows
what the codepage of the ASCII string passed to it is.
I think there must be some trick, because fopen is able to find paths
given in many charsets.

Cheers
 
Ron Natalie

Mateusz said:
> I said I guess. So, please explain to me how a function like fopen knows
> what the codepage of the ASCII string passed to it is.
> I think there must be some trick, because fopen is able to find paths
> given in many charsets.

It works on UNIX because you effectively have an 8-bit clean path.
Any character other than / and \0 is legitimate.
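
As a small illustration of that rule, here is a sketch of a check for a single path component. The function names are hypothetical, and the check deliberately ignores length limits and the special names "." and "..". It only encodes the "anything but / and \0" rule as stated.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `name` could be one UNIX path component
// under the rule above (non-empty, no '/' and no '\0' bytes).
bool is_valid_unix_component(const std::string& name) {
    if (name.empty()) return false;
    for (char c : name) {
        if (c == '/' || c == '\0') return false;
    }
    return true;
}

// Usage sketch: the bytes are never interpreted as any charset.
void check_examples() {
    assert(is_valid_unix_component("\xE9"));      // lone Latin-1 byte
    assert(is_valid_unix_component("\xC3\xA9"));  // UTF-8 byte pair
    assert(!is_valid_unix_component("a/b"));      // '/' separates components
    assert(!is_valid_unix_component(std::string("a\0b", 3)));  // embedded NUL
}
```

Nothing in the check looks at an encoding, which is the point: the name is just a byte sequence.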
 
Guest

Ron said:
> It works on UNIX because you effectively have an 8-bit clean path.
> Any character other than / and \0 is legitimate.

I'm not sure. There is still a possibility that the filesystem is
"incompatible", in terms of charset, with the given path, and the file
cannot be found.

Cheers
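
The mismatch described here can be sketched as follows, assuming a filesystem that compares names byte for byte. The same character encoded as Latin-1 and as UTF-8 yields different byte strings, so a name created under one encoding would not match a lookup made with the other. The function names are invented for illustration.

```cpp
#include <cassert>
#include <string>

// Re-encode a Latin-1 (ISO-8859-1) byte string as UTF-8.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);  // ASCII is identical in both
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Usage sketch: U+00E9 in the two encodings.
void check_examples() {
    const std::string latin1 = "\xE9";
    const std::string utf8 = latin1_to_utf8(latin1);
    assert(utf8 == "\xC3\xA9");  // two bytes instead of one
    assert(utf8 != latin1);      // a byte-wise name lookup would miss
}
```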
 
