Hello, Roedy Green!
You said:
the name nativeToAscii fails in two respects.
1. The utility does not do a standard "conversion to ASCII". It
converts to a Sun-invented encoding scheme described with ASCII
characters. It is not ASCII any more than Base64 is.
No, that is not a valid comparison. In Base64 the value 0x41
merely encodes some combination of bits and does not signify the
letter A as it does in ASCII. The same is not true of the output
of native2ascii: every numeric value means the same thing as it
does in ASCII.
ASCII only
represents 128 chars. Sun's encoding represents 64K.
Not a valid criticism. Each of ASCII's 128 values maps to one
abstract symbol, but people have been finding ways to use
combinations of those symbols to represent other things for years
now. For example, they might use e^ to signify the letter ê.
Saying the result is not ASCII is like saying prose isn't ASCII
because its letters are combined to form words. ASCII only
specifies the meaning of individual symbols; it does not limit
what meaning you apply to combinations of those symbols.
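To make the point concrete, here is a minimal sketch (my own illustration, not native2ascii's actual source; the helper name toAscii is hypothetical) of the forward direction. Every code unit in the output is ASCII, each carrying its ordinary ASCII meaning; non-ASCII characters are merely spelled out with ASCII characters using the \uXXXX convention:

```java
// Illustration only: a toy version of what native2ascii does in
// the forward direction. Not Sun's real implementation.
public class Native2AsciiSketch {
    static String toAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                out.append(c);  // plain ASCII passes through unchanged
            } else {
                // Spell the character with six ASCII characters: \uXXXX
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("caf\u00ea")); // prints caf\u00ea
    }
}
```

Every code unit of the result is in the ASCII range, and each one still means exactly the ASCII character it names, which is the claim above.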
2. nativeToAscii sometimes converts "ascii" to native, the reverse of
what its name implies.
In which case the command would be native2ascii -reverse, which
seems fairly self-explanatory to me. They could have split that
into two programs, but to my mind that would be inferior to a
single program. And remember that the reverse direction is used
far more rarely. I'm not even sure the reverse option was in the
original version.
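For completeness, the reverse direction can be sketched the same way (again my own illustration with a hypothetical helper name, not the tool's real code): it simply turns each \uXXXX escape back into the character it names.

```java
// Illustration of what native2ascii -reverse does: decode \uXXXX
// escapes back into the characters they name. Toy code only; it
// assumes well-formed escapes with four hex digits.
public class ReverseSketch {
    static String fromAscii(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // An escape needs six characters: backslash, u, four hex digits
            if (s.charAt(i) == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromAscii("caf\\u00ea")); // prints cafê
    }
}
```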
ASCII files are human-readable things with chars 0..127.
So is the output of native2ascii.
I don't
count MIME, Base64 or Sun's Unicode encoding as ASCII even though
they use the ASCII set.
You have to separate the numeric values from the meaning assigned
to specific values. I can't speak for MIME, but you are correct
that Base64 is not ASCII: even though it uses the same range of
values, it does not assign the same meanings that ASCII does. The
output of native2ascii uses the same numeric range and assigns
the same meaning to those values, and is therefore ASCII.
It is no more ASCII than Indonesian is English
because they use the same alphabet.
I don't believe Indonesian actually uses the same alphabet as
English, but that is beside the point.
Once again the issue is not the bit patterns, but the meanings
assigned to those bit patterns. Indonesian is not English because
they do not ascribe the same meanings to the letters. But that
is not the case with native2ascii since it assigns the same
meaning as does ASCII.
A better analogy would be an English text that contained a
foreign word or phrase. Does the text cease to be English because
of this?
This
is NOT the way you encode those characters in ASCII.
No, you don't encode them at all in ASCII. You have to encode
them in some other convention on top of ASCII, using one or more
ASCII characters. However, that is still ASCII, because the
meaning of each code unit is the same.
All the weird
ones should be SUB or ?
I don't see why you think that. Clearly they have to be dropped
or replaced by something else, and I don't see what makes one
replacement more natural than another. If you replace them with
?, that is not truly correct either, since the original file did
not have a ? there. Suppose the replacement were instead the
string "{non-ASCII character}"; would that make the file
non-ASCII? The fact that they chose a replacement that differs
per character and is reversible has no bearing on whether the
result is ASCII.