On Mon, 25 Sep 2006 07:23:26 +0000, Chris Torek wrote:
In C99, there is a towupper() function that handles wide characters. If
the implementation happens to use Unicode for its wide characters (and
of course supports C99 well enough), this will do it; if not, it will
not.
There is nothing forcing any given implementation to *not* handle
Unicode with toupper(), but there is nothing forcing it to do so either.
How might towupper() handle changing the case of the German eszett (ß,
a funky-looking B to mine eyes)? In the lowercase variant it can be
represented by a single integer (32-bit, 64-bit, 1024-bit, whatever).
However, the uppercase is "SS", necessitating two integers of some type
(regardless of the encoding format: UTF-8, UTF-16, UTF-32). Since
towupper() takes one wint_t and returns one wint_t, it has no way to
hand back both.
Point being, the standard C string manipulation interface CANNOT
fully support Unicode. IMHO the above example is a rather trivial
proof that ISO C cannot support Unicode at any level of sufficiency,
short of a scenario where "Unicode" is used as a fancy term for 7-bit
ASCII.
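
To make the shape of the problem concrete, here is a rough sketch of
what a case conversion that can grow the string might look like. It is
not part of ISO C: upper_expand() and put_wc() are hypothetical names
made up for this post, and the code assumes wchar_t holds Unicode code
points (so that 0x00DF is the eszett), which the standard does not
guarantee. Everything else still goes through towupper(), so only the
one expanding case is special-cased.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: store c at position *n if there is room
   (leaving space for the terminator), and always count it. */
static void put_wc(wchar_t *dst, size_t dstsize, size_t *n, wchar_t c)
{
    if (*n + 1 < dstsize)
        dst[*n] = c;
    ++*n;
}

/* Hypothetical string-to-string upper-casing, not an ISO C function.
   Assumes wchar_t values are Unicode code points.
   Writes at most dstsize - 1 wide characters plus a terminator and
   returns the length the full result needs, snprintf-style. */
size_t upper_expand(wchar_t *dst, size_t dstsize, const wchar_t *src)
{
    size_t n = 0;

    for (; *src != L'\0'; ++src) {
        if (*src == (wchar_t)0x00DF) {
            /* One input character becomes two output characters,
               which towupper() has no way to express. */
            put_wc(dst, dstsize, &n, L'S');
            put_wc(dst, dstsize, &n, L'S');
        } else {
            /* One-to-one mappings are left to the implementation. */
            put_wc(dst, dstsize, &n, (wchar_t)towupper((wint_t)*src));
        }
    }
    if (dstsize > 0)
        dst[n < dstsize ? n : dstsize - 1] = L'\0';
    return n;
}
```

Given the six wide characters of "straße", this returns 7 and stores
"STRASSE" when the buffer is large enough. The output length differs
from the input length, so the caller needs a separate buffer and a
size, which is exactly what a one-in, one-out function like towupper()
cannot offer.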