Lauri Alanko
Hello.
I find C99's language on internationalization features particularly
hard to decipher, so I'd appreciate some clarifications.
I'm mainly interested in how the execution character set encoding
used for character literals and character string literals relates to
the locale encodings and to the wchar_t encoding.
Firstly, is it possible for a locale to use a different encoding for
the basic execution character set than the compiler uses for the
literals? If not, doesn't this mean that the locale system (and the C
standard) are insufficient in an environment where both ASCII- and
EBCDIC-based encodings can be used?
If it is possible for a locale to use a completely different encoding,
then how can ordinary character literals and character string literals
be converted to the locale's encoding? It is of course possible to
convert between wide characters and the locale, but how do I convert
from the literal encoding into the wchar_t encoding?
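To make the question concrete, here is a minimal sketch of the conversion
I have in mind. It goes through mbstowcs(), so it is only correct under the
assumption that the current locale's multibyte encoding agrees with the
encoding the compiler used for the literal, which is exactly the point I'm
unsure about. The helper name literal_to_wide is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts a narrow string to wide characters using
   the current locale's multibyte encoding (via mbstowcs).
   ASSUMPTION: the literal's encoding matches the locale's encoding.
   Returns the number of wide characters written, not counting the
   terminator, or (size_t)-1 if the input is not valid in the locale. */
static size_t literal_to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    assert(dst_len > 0);

    /* Leave room for the terminator; mbstowcs does not add one when the
       output buffer fills up. */
    size_t n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1)
        return n;

    dst[n] = L'\0';
    return n;
}
```

A caller would typically run setlocale(LC_ALL, "") first and then call
literal_to_wide("hello", buf, sizeof buf / sizeof buf[0]).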
Is it perhaps guaranteed that ((wchar_t) 'a' == L'a')? I haven't seen
any text to suggest this, and such a guarantee would mean that an
implementation couldn't use EBCDIC for character literals and UCS-4
for wchar_t. But maybe someone can give a definitive answer?
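For what it's worth, the equality can at least be checked empirically on a
given implementation. The sketch below compares a handful of sample
characters; the function name and the sample set are my own choices, and a
positive result only describes the compiler at hand, not what the standard
guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if every sampled character has the same value in a narrow
   literal and in a wide literal on this implementation, 0 otherwise.
   The sample is a small, arbitrary subset of the basic character set. */
static int narrow_matches_wide(void)
{
    static const char narrow[] = "abcXYZ019 !?";
    static const wchar_t wide[] = L"abcXYZ019 !?";

    /* Both arrays must have the same number of elements. */
    assert(sizeof narrow / sizeof narrow[0] == sizeof wide / sizeof wide[0]);

    for (size_t i = 0; i < sizeof narrow / sizeof narrow[0]; i++) {
        if ((wchar_t)narrow[i] != wide[i])
            return 0;
    }
    return 1;
}
```

An implementation with EBCDIC literals and UCS-4 wide characters would
presumably make this return 0.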
It is of course possible to define the mapping manually:
    wchar_t char_to_wchar[] = {
        ['a'] = L'a',
        ['b'] = L'b',
        // ... etc for all of the portable basic character set
    };
But this seems like a horrible, redundant hack that I wouldn't like to
use except as a last resort. Is something like this really necessary
in order to print out character string literals correctly in all
locales?
Lauri