Edward Rutherford
There are two points in Sec. 6.4.4.4, describing character constants,
that are not entirely clear to me. It may be that I have not correctly
understood the issues of character encodings.
In p10 there is the sentence "The value of an integer character
constant containing a single character that maps to a single-byte
execution character is the numerical value of the representation of the
mapped character interpreted as an integer".
This confirms that a single character of the source set may be mapped
to multiple bytes in the execution character set (and this is
consistent with other parts of the standard). But still on p10 there is
the sentence "If an integer character constant contains a single
character or escape sequence, its value is the one that results when an
object with type char whose value is that of the single character or
escape sequence is converted to type int". This sentence seems to imply
that the value corresponding to a single character (or escape sequence)
can fit into a single object of type char, i.e., into a single byte.
Doesn't the latter sentence contradict the former (and other parts of
the standard)?
On p11 there is the sentence "The value of a wide character constant
containing a single multibyte character that maps to a member of the
extended execution character set is the wide character corresponding to
that multibyte character, as defined by the mbtowc function, with an
implementation-defined current locale."
This sentence suggests to me that the function mbtowc maps the
multibyte encoding of a character of the *source* character set to a wide
character.
I find this surprising for the following reasons:
1) the second parameter of mbtowc is a char *, so a pointer to bytes
in the execution environment
2) wctomb operates at runtime, so I think it converts a wide character
to a multibyte encoding in the execution environment; I would expect
wctomb and mbtowc to be inverses of each other.
One more question: a byte is (sec. 3.3.6) a unit of data storage of
the execution environment. Isn't it possible that the host environment
has units of data storage with a different number of bits?