I'll mark it OT, since we've left C behind quite a bit by now.
P.J. Plauger said:
Here's a coarse scale or two, just from personal experience.
-- Number of address bits required to address a "large" memory:
1960   15   IBM 7090
1970   20   IBM 360
1980   25   VAX 11/780
1990   30   various
2000   35   various
Nice, but this misses a point: there is an upper limit. Address bits
will not continue to grow indefinitely, because there is an upper limit
to the amount of information that will fit in the universe. Or maybe
there isn't, but then we're talking a radical shift in physics, which
may happen but doesn't allow for fair comparison anymore.
-- Number of bits required to represent a (commonly used)
character set:
1960    6   numerous vendor-specific codes
1970    7   7-bit ASCII
1980    8   extended ASCII
1990   16   DBCS and others
2000   21   Unicode
I could make a similar table of "barely adequate" communication
speeds, which also continue to expand exponentially.
But again: it can't go on forever. The question here, therefore, is
whether we've reached the end of the line, not whether exponential
expansion is happening.
So long as you think in terms of linear increases in demand
for bytes or characters, it's easy to believe at each stage
that you're through expanding. After all, you currently have
a bit of headroom, and what possible need can there be for
much larger programs/character sets?
Don't think this question hasn't been asked. This is not the situation
of those people who asserted that "640K ought to be enough for anybody"
(which Bill Gates famously never said) or "16 bits ought to be enough,
since it's better than wasting 32 bits". Unicode doesn't say "21 bits
ought to be enough for anybody". It can say "21 bits is enough for every
character known to man", because it is. Unlike memory, communication
speed and a host of other things that keep growing, the number of
characters has a conceivable upper limit, and it is not that
unreasonable to state we're close to it.
I personally can't imagine that people will ever want to
define common attribute bits for, say:
-- roman, italic, bold, underscore
-- red, green, blue
-- point size
-- font
But if we did, each attribute bit would double the number
of effective character codes, wouldn't it?
That's why Unicode doesn't work that way, and no character set ever has.
They encode *characters*, not *glyphs*. A glyph is what you see on your
screen, and its appearance depends on many properties, including the
formatting characteristics you describe. But a Roman
capital letter A is a Roman capital letter A, no matter what style,
color, size or font it happens to be displayed in. Being able to leave
these things unstated will always remain useful.
Actually, "glyph sets" were (and probably still are) in common use for
display on dumb terminals with hardwired character sets (and probably
some applications for not so dumb terminals, too). Remember when the
character set was 7-bit ASCII and the terminals extended this to an
8-bit glyph set with the upper bit meaning "reverse video"? That's this.
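
For what it's worth, here is a minimal sketch in C of that kind of glyph
code. The layout (low 7 bits for the ASCII character, top bit for reverse
video) is the one described above; the macro and function names are made
up for illustration and don't come from any particular terminal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 7 bits = ASCII character, top bit = reverse video. */
#define GLYPH_CHAR_MASK   0x7Fu
#define GLYPH_REVERSE_BIT 0x80u

/* Build an 8-bit glyph code from a 7-bit character and a reverse-video flag. */
static uint8_t glyph_make(uint8_t ch, int reverse)
{
    assert(ch <= GLYPH_CHAR_MASK);  /* only 7-bit ASCII fits */
    return (uint8_t)(ch | (reverse ? GLYPH_REVERSE_BIT : 0u));
}

/* Recover the character, discarding the attribute bit. */
static uint8_t glyph_char(uint8_t glyph)
{
    return (uint8_t)(glyph & GLYPH_CHAR_MASK);
}

/* Nonzero if the glyph is to be shown in reverse video. */
static int glyph_is_reverse(uint8_t glyph)
{
    return (glyph & GLYPH_REVERSE_BIT) != 0;
}
```

With this layout, glyph_make('A', 1) yields 0xC1 and glyph_char(0xC1)
gives back 'A'. The one attribute bit doubles the code space from 128 to
256 values while the number of characters stays at 128.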
The point is, the comparison stops being meaningful at this point,
because you've shifted the way you look at what a code point represents.
As the Unicode FAQ itself states:
"Both Unicode and ISO 10646 have policies in place that formally limit
future code assignment to the integer range that can be expressed with
current UTF-16 (0 to 1,114,111). Even if other encoding forms (i.e.
other UTFs) can represent larger integers, these policies mean that all
encoding forms will always represent the same set of characters. Over a
million possible codes is far more than enough for the goal of Unicode
of encoding characters, not glyphs. Unicode is not designed to encode
arbitrary data. If you wanted, for example, to give each 'instance of a
character on paper throughout history' its own code, you might need
trillions or quadrillions of such codes; noble as this effort might be,
you would not use Unicode for such an encoding."
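
The arithmetic behind "21 bits" is easy to check. The following is only a
sketch, assuming a C11 compiler (for static_assert); UNICODE_MAX,
bits_needed and is_code_point are names I made up, not part of any
standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper limit quoted in the FAQ: 1,114,111 decimal. */
#define UNICODE_MAX 0x10FFFFul

/* Checked at compile time: the largest code point fits in 21 bits. */
static_assert(UNICODE_MAX < (1ul << 21), "every code point fits in 21 bits");

/* Number of bits needed to represent the value n (0 needs 0 bits). */
static unsigned bits_needed(uint32_t n)
{
    unsigned bits = 0;
    while (n != 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* True if c lies in the range the FAQ says will ever be assigned. */
static bool is_code_point(uint32_t c)
{
    return c <= UNICODE_MAX;
}
```

Here bits_needed(UNICODE_MAX) comes out as 21, and is_code_point rejects
0x110000 and everything above it.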
Here's a more interesting thing to think about than adding "blink" bits:
suppose we encounter extraterrestrial cultures one day, and we want to
synch character sets eventually... *Then* Unicode may become
insufficient. But I don't think it would be fair to blame the current
standard for that.
Nor can I imagine that a large government like China might
thumb its nose at an international standard and, say,
require a parallel set of many ISO 10646 codes.
It already thumbs its nose to some extent. Unicode is still viewed with
great suspicion in some parts of the Eastern world, and alternate
character sets continue to be in use. The Chinese government can demand
of ISO 10646 what it wants, but it's not likely to get it unless the
demand is backed by technical requirements rather than politics.
Maybe you can slip in one character that's spurious that way, but not a
few thousand. Maybe when the Chinese achieve global domination and
abolish our preposterous 21-bit standards, but not before.
For over 40 years I've been reading regular articles by
pundits who explain why larger/faster hardware is a waste
of time and will never sell. They've all been wrong. And
the further back in time you look, the greater the redshift
in the predictions.
These arguments do not cleanly translate to character sets, your little
tables notwithstanding. The upper limit may not be 21 bits, but if
that's not the upper limit, it's pretty close to it in orders of
magnitude. If people one day decide to abandon the concept of "character
set" and go crazy stuffing all sorts of attributes in it (adopting
"glyph sets"), that's a clear change in application, unlike increased
hardware capacity. It will be fueled by the *ability* to use such sets
efficiently, not the *need* to do this.
So, you may well be right that the need for larger
character sets has finally come to an end. I'll wait
and see. Meanwhile, I make sure that the code I write
will work with 32- (not 31-) bit character sets. With
any luck, the code will have adequate capacity until
I retire...
Fortunately for you, writing code that can handle both 21-bit and 32-bit
character sets is hardly a challenge, given the current state of
computer hardware. Even if Unicode had to grow someday (which would have
to mean a new standard, of course), it wouldn't exactly be hard to
implement, at least not as far as code point size is concerned.
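
To illustrate, a minimal sketch in C that stores each character in a full
32-bit unit, so nothing in it depends on the 21-bit limit. The type name
wide_char and both functions are hypothetical; I'm also assuming
zero-terminated strings, as with ordinary C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical character type: holds any code up to 32 bits wide. */
typedef uint32_t wide_char;

/* Length of a zero-terminated string of 32-bit codes. */
static size_t wide_len(const wide_char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* First occurrence of c in s, or NULL if it does not occur. */
static const wide_char *wide_find(const wide_char *s, wide_char c)
{
    assert(s != NULL);
    for (; *s != 0; s++)
        if (*s == c)
            return s;
    return NULL;
}
```

Both functions treat 0x10FFFF and 0xFFFFFFFF alike, so a hypothetical
future standard with wider code points would not require changing them.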
S.