[*] the phrase is in quotes because comparing ASCII and Unicode doesn't make
sense, as ASCII is both an encoding system and a character set, while
Unicode is a character set without an encoding system.
After spending some time reviewing definitions of Unicode, I see that
many, including the one from unicode.org
(http://www.unicode.org/faq/basic_q.html#a), talk about Unicode being an
encoding as well as a character set.
Hmm, maybe my terminology is a bit loose. I meant that ASCII encodes
characters directly to byte sequences, whereas Unicode is a mapping from
characters to natural numbers (code points, ranging from 0 to 0x10FFFF,
far more than a single byte can hold), and then you need a separate
encoding, like UTF-8, to map from those natural numbers to byte sequences.
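The two-step picture above (character to code point, then code point to bytes via a chosen encoding) can be sketched in Python. This is only an illustration; `describe` is a hypothetical helper, and UTF-16 (big-endian, no byte-order mark) is picked simply as a second encoding to contrast with UTF-8.

```python
import sys

def describe(ch: str) -> tuple[str, str, str]:
    """Return a character's code point and its bytes in two different encodings."""
    code_point = f"U+{ord(ch):04X}"          # the number Unicode assigns to the character
    utf8 = ch.encode("utf-8").hex(" ")       # one way to turn that number into bytes
    utf16 = ch.encode("utf-16-be").hex(" ")  # another way (big-endian, no BOM)
    return code_point, utf8, utf16

for ch in "A\u00e9\u20ac\U0001D11E":  # A, é, €, and U+1D11E (musical G clef)
    print(*describe(ch))

# Code points are bounded: the largest one Python (and Unicode) allows is U+10FFFF.
print(hex(sys.maxunicode))
```

The same code point yields different byte sequences depending on the encoding, which is why "the bytes of a Unicode string" is undefined until an encoding is named.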
Unicode.org's technical introduction starts with:
"The Unicode Standard is the universal character encoding standard used
for representation of text for computer processing."
Perhaps I am completely misunderstanding your point. There is plenty
of content on the internet where ASCII and Unicode are compared. Even
if that comparison is not between two things of the same type, isn't it
fairly clear that when comparing a filename's ASCII byte sequence to a
class name's Unicode byte sequence, one cannot perform a byte-by-byte
match?
A lot of people (subconsciously?) think Unicode maps directly from
characters to byte sequences; it's a common misconception, so it wouldn't
surprise me if there were a large amount of content on the Internet
that makes this mistake or glosses over it. AFAIK, there's no such thing as
a "Unicode byte sequence". You could talk about comparing ASCII byte
sequences to UTF-8 byte sequences, but not to "Unicode byte sequences".
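A minimal Python sketch of that distinction, assuming a hypothetical `same_name` helper and a made-up filename/class name pair: a byte-by-byte comparison only means something once you pick the encoding for the string side, and the answer changes with that choice.

```python
def same_name(filename_bytes: bytes, class_name: str, encoding: str) -> bool:
    """Compare raw bytes with a string by first encoding the string (hypothetical helper)."""
    return filename_bytes == class_name.encode(encoding)

filename = b"Widget"  # hypothetical filename, stored as ASCII bytes

print(same_name(filename, "Widget", "utf-8"))      # ASCII-only text is byte-identical in UTF-8
print(same_name(filename, "Widget", "utf-16-le"))  # UTF-16 uses two bytes per character here
print(filename.decode("ascii") == "Widget")        # or decode and compare code points instead

# Outside ASCII, even two single-byte-friendly encodings disagree:
print(same_name("Caf\u00e9".encode("latin-1"), "Caf\u00e9", "utf-8"))
```

Decoding the bytes and comparing strings sidesteps the question entirely, as long as you know which encoding the bytes were written in.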
- Oliver