Encoding of character literals

Lauri Alanko

Hello.

I find C99's language on internationalization features particularly
hard to decipher, so I'd appreciate some clarifications.

I'm particularly interested in the relationship of the execution
character set encoding of character literals and character string
literals, and its relationship with locale encodings and the wchar_t
encoding.

Firstly, is it possible for a locale to use a different encoding for
the basic execution character set than the compiler uses for the
literals? If not, doesn't this mean that the locale system (and the C
standard) are insufficient in an environment where both ASCII- and
EBCDIC-based encodings can be used?

If it is possible for a locale to use a completely different encoding,
then how can ordinary character literals and character string literals
be converted to the locale's encoding? It is of course possible to
convert between wide characters and the locale, but how do I convert
from the literal encoding into the wchar_t encoding?

Is it perhaps guaranteed that ((wchar_t) 'a' == L'a')? I haven't seen
any text to suggest this, and this would mean that an implementation
couldn't use EBCDIC for character literals and UCS-4 for wchar_t. But
maybe someone can give a definitive answer?

It is of course possible to define the mapping manually:

wchar_t char_to_wchar[] = {
    ['a'] = L'a',
    ['b'] = L'b',
    // ... etc. for all of the portable basic character set
};

But this seems like a horrible, redundant hack that I wouldn't like to
use except as a last resort. Is something like this really necessary
in order to print out character string literals correctly in all
locales?
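
For concreteness, here is a minimal sketch of how such a table might be used to widen a string literal. The helper name `widen_basic` and the handful of table entries are made up for illustration; a real table would list the whole basic character set, and nothing here is claimed to be the recommended approach.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Table indexed by the narrow character value (C99 designated
   initializers). Entries not listed are 0. Only a few characters are
   filled in for this sketch. */
static const wchar_t char_to_wchar[UCHAR_MAX + 1] = {
    ['a'] = L'a', ['b'] = L'b', ['c'] = L'c', [' '] = L' ',
};

/* Hypothetical helper: widens s into out, which has room for n wide
   characters including the terminator. Returns the number of wide
   characters written (excluding the terminator), or (size_t)-1 if a
   character is missing from the table or out is too small. */
size_t widen_basic(const char *s, wchar_t *out, size_t n)
{
    size_t i = 0;

    assert(s != NULL && out != NULL);
    for (; s[i] != '\0'; i++) {
        wchar_t wc = char_to_wchar[(unsigned char)s[i]];
        if (wc == 0 || i + 1 >= n)
            return (size_t)-1;
        out[i] = wc;
    }
    if (n == 0)
        return (size_t)-1;
    out[i] = L'\0';
    return i;
}
```

Because both sides of every table entry are literals, the mapping is fixed at compile time and does not consult the current locale at all, which is the whole point of the hack.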


Lauri
 
James Kuyper

On 11/03/2011 04:41 PM, Lauri Alanko wrote:
> ....
> Is it perhaps guaranteed that ((wchar_t) 'a' == L'a')?

I'm no expert on internationalization - as a US programmer I've never
had any need to worry about it. However, that question at least I can
answer:

C99 says, in effect, that the above expression is guaranteed to be true
if the implementation does not pre-define __STDC_MB_MIGHT_NEQ_WC__ (7.17p2).
6.10.8p1 seems to indicate that definition of the macro with a value of
1 is mandatory - but that might be an example of poor wording or a
misinterpretation on my part. It seems inconsistent with the "if" in 7.17p2.
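
A rough sketch of how a program could act on this, assuming the macro as spelled in the standard, `__STDC_MB_MIGHT_NEQ_WC__`. The function names are invented for the example, and the sketch only demonstrates the compile-time test plus the cast comparison from the question; it does not settle what any particular implementation defines.

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t */

/* Returns 1 when the implementation does not define the macro, i.e.
   when 7.17p2 promises equal narrow and wide values for members of
   the basic character set; returns 0 otherwise. */
int equality_is_guaranteed(void)
{
#ifdef __STDC_MB_MIGHT_NEQ_WC__
    return 0;
#else
    return 1;
#endif
}

/* Returns 1 if casting the narrow value yields the wide value. */
int same_value(char c, wchar_t wc)
{
    return (wchar_t)c == wc;
}

/* Asserts the equality for one pair, but only when it is promised. */
void check_pair(char c, wchar_t wc)
{
    if (equality_is_guaranteed())
        assert(same_value(c, wc));
}
```

Calling `check_pair('a', L'a')` should never fire the assertion on a conforming implementation: either the macro is absent and the values match, or the macro is present and nothing is checked.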
 
Harald van Dijk

> C99 says, in effect, that the above expression is guaranteed to be true
> if the implementation does not pre-define __STDC_MB_MIGHT_NEQ_WC__ (7.17p2).
> 6.10.8p1 seems to indicate that definition of the macro with a value of
> 1 is mandatory - but that might be an example of poor wording or a
> misinterpretation on my part. It seems inconsistent with the "if" in 7.17p2.

It's supposed to be in 6.10.8p2, see DR #333, or a draft of C1x in
which this has been corrected.
 
Lauri Alanko

> It's supposed to be in 6.10.8p2, see DR #333, or a draft of C1x in
> which this has been corrected.

Thanks, that is useful. So C99 mandates that for the basic character
set, chars and the corresponding wchar_t's have the same integer
value, and C1x makes this guarantee conditional on the presence of the
macro.

But is btowc guaranteed to honor this equality in all locales? And, if
__STDC_MB_MIGHT_NEQ_WC__ is defined, and btowc is the only way to convert a
char to wchar_t, is it guaranteed to work correctly on integer
character constants (from the basic character set) in all locales?
That is, is (btowc('a') == L'a') going to be true in all
implementations in all legit locales? And if not, how

The corner case I'm thinking of is of course the situation where the
native encoding used by integer character literals is EBCDIC, but
wchar_t uses UCS-4, and the current locale is ASCII-based. So one
cannot cast from integer character literals to wchar_t, but one also
cannot use locale-dependent conversion functions. Is this a situation
that standard C is even able to support?
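
To make the btowc question concrete, here is a small probe one could run under different locales. It is only an empirical check on a given implementation, not an answer to what the standard guarantees, and the helper name `btowc_agrees` is invented for this sketch.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if, in the current locale, btowc maps the narrow character
   c to the wide character wc; returns 0 otherwise, including when
   btowc reports WEOF because c is not a valid single-byte character
   in the initial shift state. */
int btowc_agrees(char c, wchar_t wc)
{
    wint_t w = btowc((unsigned char)c);
    return w != WEOF && w == (wint_t)wc;
}

/* Switches LC_CTYPE to the named locale and runs the probe for one
   pair. Returns -1 if the locale is not available, otherwise the
   result of btowc_agrees. The locale is left switched. */
int btowc_agrees_in(const char *locale_name, char c, wchar_t wc)
{
    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return btowc_agrees(c, wc);
}
```

A result of 0 from `btowc_agrees_in` for some installed locale would be exactly the corner case described above: the literal encoding and the locale encoding disagree about a basic character.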


Lauri
 
lawrence.jones

Lauri Alanko said:
> Thanks, that is useful. So C99 mandates that for the basic character
> set, chars and the corresponding wchar_t's have the same integer
> value, and C1x makes this guarantee conditional on the presence of the
> macro.

No, there was a production error in N1256 which put the macro in the
wrong paragraph; it was always supposed to have been conditional.
 
