Luca Forlizzi
There are two points in Sec. 6.4.4.4, which describes character constants, that are not entirely clear to me. It may be that I am misreading the text, or that I have not correctly understood the issues of character encodings.
In p10 there is the sentence "The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer". This confirms that a single character of the source character set may be mapped to multiple bytes in the execution character set (and this is consistent with other parts of the standard). But p10 also contains the sentence "If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int". This sentence seems to imply that the value corresponding to a single character (or escape sequence) can fit into a single object of type char, i.e., into a single byte. Doesn't the latter sentence contradict the former (and other parts of the standard)?
In p11 there is the sentence "The value of a wide character constant containing a single multibyte character that maps to a member of the extended execution character set is the wide character corresponding to that multibyte character, as defined by the mbtowc function, with an implementation-defined current locale."
This sentence suggests to me that the function mbtowc maps the multibyte encoding of a character of the *source* character set to a wide character. I find this surprising for the following reasons:
1) the second parameter of mbtowc is a char *, i.e., a pointer to bytes in the execution environment;
2) wctomb operates at run time, so I think it converts a wide character to a multibyte encoding in the execution environment, and I would expect wctomb and mbtowc to be inverses of each other.
One more question: a byte is (Sec. 3.3.6) a unit of data storage of the execution environment. Isn't it possible that the host environment has units of data storage with a different number of bits?