I'm not disagreeing with *most* of what you wrote; just two minor
nitpicks, and an open statement at the end.
There's no such thing as "the current definition of ASCII".
According to Wikipedia,
http://en.wikipedia.org/wiki/Ascii
<quote>
The American Standards Association (ASA, later to become ANSI) first
published ASCII as a standard in 1963. ASCII-1963 lacked the lowercase
letters, and had an up-arrow instead of the caret and a left-arrow instead
of the underscore. The 1967 version added the lowercase letters, changed the
names of a few control characters and moved the two controls ACK and ESC
from the lowercase letters area into the control codes area.
ASCII was subsequently updated and published as ANSI X3.4-1968, ANSI
X3.4-1977, and finally, ANSI X3.4-1986
</quote>
So while it may be pedantic, it would not be incorrect or meaningless to
ask, "Which version of ASCII do you mean?"
While ASCII
is very common, extended ASCII is not.
I believe MS-DOS (I forget which versions) uses extended ASCII, so it
couldn't have been that uncommon (the MS-QBasic program, for example, made
heavy use of characters 176 to 218).
Now, will you persist in insisting that, in your words:
"ASCII is 8 bit"?
The term "ASCII" in the sentence "ASCII is 8 bit" in this context might
refer to multiple things (even if we disregard all versions of ASCII prior
to the ANSI X3.4-1986 standard), one of which might be "The encoding Java
uses when we ask for the 'ASCII' encoding."
Conceptually, we have a string in memory, and we wish to store that
string to disk, using a specific encoding. In our case, the 'ASCII'
encoding. Now when we say "Encoding FOO is n bits", what we usually mean is
either "the encoding uses n bits per character to represent a given string"
or the less restrictive "*on average*, the encoding uses n bits per
character to represent a given string". In this sense, UTF-16 can be said to
be "16 bits" even though certain characters take 32 bits to encode. It's
imprecise (arguably flat out wrong), but you "know what they mean" when they
say it.
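
To make the UTF-16 point concrete, here is a minimal sketch that counts the bits Java emits per character. The class and method names are made up for illustration, and it assumes the big-endian variant without a byte order mark (StandardCharsets.UTF_16BE) so that only the character's own bytes are counted.

```java
import java.nio.charset.StandardCharsets;

class Utf16Width {
    // Bits used by UTF-16 (big-endian, no BOM) to encode one code point.
    static int bitsPerCodePoint(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_16BE).length * Byte.SIZE;
    }

    public static void main(String[] args) {
        // 'A' (U+0041) fits in a single 16-bit code unit.
        System.out.println(bitsPerCodePoint(0x0041));  // prints 16
        // U+1D11E (musical G clef) needs a surrogate pair.
        System.out.println(bitsPerCodePoint(0x1D11E)); // prints 32
    }
}
```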
Now if we had an encoding which was said to be "7 bits", then the
encoding of a 16 character string should be 112 bits. An encoding which is
said to be "8 bits" would use 128 bits to encode that same 16 character
string.
So when you encode a 16 character string in Java using the "ASCII"
encoding, does it result in a bitstream of length 112 or 128? I would guess
128.
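
The guess is easy to check. This is only a sketch: the class and method names are invented, and it assumes that "the ASCII encoding" means the charset Java exposes as StandardCharsets.US_ASCII.

```java
import java.nio.charset.StandardCharsets;

class AsciiBitCount {
    // Number of bits in the byte stream Java produces for s under US-ASCII.
    static int encodedBits(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return bytes.length * Byte.SIZE;
    }

    public static void main(String[] args) {
        String s = "0123456789abcdef"; // 16 characters, all within ASCII
        System.out.println(s.length() + " chars -> " + encodedBits(s) + " bits");
        // prints: 16 chars -> 128 bits
    }
}
```

So Java spends one full byte per character, giving 128 bits rather than 112.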
I think one problem here is that ASCII conflates the concept of
numbering characters and encoding them. There's a clear distinction between
those concepts with Unicode and, say, UTF-8. Unicode merely assigns numbers
to each character, and UTF-8 assigns a mapping between numbers and
bitstreams.
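
The two layers can be seen separately in Java. In this rough sketch (the class and helper names are hypothetical), the code point is the number Unicode assigns, and the hex string is what UTF-8 turns that number into.

```java
import java.nio.charset.StandardCharsets;

class CodePointVsBytes {
    // Hex dump of the UTF-8 bytes for a single Unicode code point.
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("U+0041 -> " + utf8Hex(0x0041)); // prints U+0041 -> 41
        System.out.println("U+00E9 -> " + utf8Hex(0x00E9)); // prints U+00E9 -> C3 A9
        System.out.println("U+20AC -> " + utf8Hex(0x20AC)); // prints U+20AC -> E2 82 AC
    }
}
```

The character numbers stay the same whichever encoding is chosen; only the bytes change.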
When ASCII is used as a character-numbering scheme, there are 128
character-number mappings, and ASCII is a "closed" system, where no new
characters can be added to it, so it might make sense to actually say that
this character-number mapping is inherently 7 bits (contrast this with
Unicode, where more characters may be added in the future, and so the system
does not inherently have a bit size).
When ASCII is used as an encoding, to convert to bitstream, it seems
most implementations use 8 bits per character. So in that sense, it would
seem that "ASCII", the number-to-bitstream mapping system, is 8 bits.
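
Both views show up at once in Java's output. The sketch below (names invented for illustration, and again assuming StandardCharsets.US_ASCII is the encoding in question) checks that every byte written has its top bit clear: 8 bits are stored per character, but only 7 carry information. String.getBytes(Charset) substitutes the charset's replacement byte for characters it cannot map, which for US-ASCII is '?'.

```java
import java.nio.charset.StandardCharsets;

class AsciiHighBit {
    // True if no byte of the US-ASCII encoding of s has its high bit set.
    static boolean highBitAlwaysClear(String s) {
        for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(highBitAlwaysClear("Hello"));     // prints true
        // U+00E9 is outside ASCII, so it is replaced by '?' (0x3F).
        System.out.println(highBitAlwaysClear("caf\u00e9")); // prints true
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.US_ASCII);
        System.out.println(bytes.length + " bytes, last = " + bytes[3]);
        // prints: 4 bytes, last = 63
    }
}
```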
- Oliver