As long as the compiler (or, almost certainly, the preprocessor in this
case) supports the basic source character set, it remains within its rights
to reject any other characters it encounters within the source code.
That is arguably incorrect, Richard.
C89 2.2.1 Character Sets
[...]
In a character constant or string literal, members of the execution
character set shall be represented by corresponding members of
the source character set or by escape sequences consisting of
the backslash \ followed by one or more characters. A byte with all
bits set to 0, called the null character, shall exist in the basic
execution set; it is used to terminate a character string literal.
[...]
In the execution character set, there shall be control characters
representing alert, backspace, carriage return, and new line. If any
other characters are encountered in a source file (except in
a character constant, a string literal, a header name, a comment,
or a preprocessing token that is never converted to a token), the
behaviour is undefined.
3.1.3.4 Character Constants
[...]
An integer character constant is a sequence of one or more multibyte
characters enclosed in single-quotes, as in 'x' or 'ab'. A wide
character constant is the same, except prefixed by the letter L.
With a few exceptions detailed later, the elements of the sequence
are any members of the source character set; they are mapped in an
implementation-defined manner to members of the execution character set.
Thus, character constants (and string literals) are allowed to contain
multibyte characters; the value of those is implementation-defined,
and it is true that the implementation might choose to define the
values as being illegal. You are technically correct about that aspect,
though, in a way, misleading, in that the standard explicitly allows
for multibyte character support, so it is, at least psychologically,
not the same kind of "within its rights" as would be, say, whether
the dollar sign is permitted in identifier names (which would clearly
be an extension).
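
To make the exception for literals concrete, here is a minimal sketch. It assumes an ASCII-based implementation that accepts `$`, `@` and the backquote as extended source characters; the names `extended` and `extended_len` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <string.h>

/* '$', '@' and '`' are not members of the basic source character set.
   Outside a literal they would fall under the "any other characters"
   rule of 2.2.1; inside a string literal they are permitted, and their
   values are implementation-defined.
   ASSUMPTION: an ASCII-based implementation that accepts them. */
static const char extended[] = "$@`";

/* Hypothetical helper: number of characters before the terminator. */
size_t extended_len(void)
{
    return strlen(extended);
}
```

On such an implementation the literal holds three characters plus the terminating null. An implementation with a different source character set could, in principle, treat the same file differently.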
I would, though, argue that your statement is not exactly correct, in that
the C89 standard defines the source character set, and defines the
execution character set, and defines the allowed characters in
literals to include representations of the execution character set,
*and the basic execution character set is defined to include some characters
that do not appear in the basic source character set*. It is thus not
permitted for the compiler to define the representation of those
additional characters (null, alert, backspace, carriage return, and
new line) as being illegal.
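
A small sketch of that point: each of the five characters listed above can be written using nothing but basic source characters, via an escape sequence. The helper name `controls_distinct` is hypothetical, and the claim that the five values differ from one another is an assumption that holds on ASCII-based implementations.

```c
#include <assert.h>
#include <stddef.h>

/* The five execution characters named above (null, alert, backspace,
   carriage return, new line), each spelled as an escape sequence made
   of basic source characters only. */
static const char controls[] = { '\0', '\a', '\b', '\r', '\n' };

/* Hypothetical helper: returns 1 if every entry has a distinct value.
   ASSUMPTION: true on ASCII-based implementations. */
int controls_distinct(void)
{
    size_t i, j;
    size_t n = sizeof controls / sizeof controls[0];

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (controls[i] == controls[j])
                return 0;
    return 1;
}
```

The only value here that the quoted text pins down exactly is the null character, which is all bits zero.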
There is the semantic question of whether (e.g.) \a appearing in
a literal is a single character or a pair of characters for the purpose
of "If any other characters are encountered in a source file", but
notice that 3.1.3.4 specifically notes that there are exceptions to
"the elements of the sequence are any members of the source character set".
I'm not entirely clear, reading the whole of 3.1.3.4, as to which
portions are considered by the standard to be the "exceptions" and
which not, but for the purposes of this present nit, it is enough to
point out that the standard *says* there are exceptions, and
thus that within literals, there are permitted values defined as valid
and yet which are not members of the source character set.
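
The "one character or a pair" question can be observed from inside a program. The following is a minimal sketch, not a reading of the standard's intent: the `SPELLING` macro and both function names are hypothetical, and stringizing is used only as a way to look at the source spelling.

```c
#include <assert.h>
#include <string.h>

/* Stringize the argument to recover its source spelling. */
#define SPELLING(x) #x

/* Source view: '\a' is spelled with four basic source characters
   (quote, backslash, the letter a, quote). */
size_t alert_source_len(void)
{
    return strlen(SPELLING('\a'));
}

/* Execution view: the string literal "\a" holds a single character
   before the terminating null. */
size_t alert_execution_len(void)
{
    return strlen("\a");
}
```

So the pair backslash-a in the source text becomes one member of the execution character set, which is the mapping the "exceptions" wording in 3.1.3.4 has to accommodate.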