Richard Herring
Phlip said: No, because some glyphs might be composite characters.
There are two more important questions:
A. can we do text-in-text-out with no glyph awareness?
B. where do we set the envelope for business goals?
If the answer to A. is Yes, then we can freely pass text through the wcs
functions, except when wcschr() and similar functions become glyph-hostile.
As soon as you need something as mundane as a regular expression, you need
smart character awareness. (That's why Boost's regex opts to bond with ICU,
a Unicode and character-encoding library.)
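To make the wcschr() point concrete, here is a minimal sketch of how a plain search goes wrong on a composite character. The names naive_contains and glyph_aware_contains are made up for this example, and the combining-mark test only covers U+0300..U+036F. That range is an assumption for illustration and is nowhere near real Unicode awareness; a real program would lean on a library such as ICU.

```cpp
#include <cassert>
#include <cwchar>

// "cafe" followed by U+0301 COMBINING ACUTE ACCENT. A reader sees four
// glyphs ("caf" plus an accented e), but the string holds five wchar_t units.
static const wchar_t kDecomposedCafe[] = L"cafe\u0301";

// Naive search: reports a plain 'e' even though the displayed text
// contains only an accented one.
inline bool naive_contains(const wchar_t* text, wchar_t ch) {
    assert(text != nullptr);
    return std::wcschr(text, ch) != nullptr;
}

// Assumption for this sketch: only the Combining Diacritical Marks block
// (U+0300..U+036F) counts as "combining".
inline bool is_combining_diacritic(wchar_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Slightly more careful search: skips a match that is immediately followed
// by a combining mark, because that match is only part of a larger glyph.
inline bool glyph_aware_contains(const wchar_t* text, wchar_t ch) {
    assert(text != nullptr);
    if (ch == L'\0') return false;  // the terminator is not a glyph
    for (const wchar_t* p = std::wcschr(text, ch); p != nullptr;
         p = std::wcschr(p + 1, ch)) {
        if (!is_combining_diacritic(p[1])) return true;
    }
    return false;
}
```

With kDecomposedCafe, naive_contains says the text holds a plain 'e' and glyph_aware_contains says it does not. Both agree on plain L"cafe".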
The answer to B. is you should set technical goals just a little wider than
your business goals. If the business only wants to target the Western
European languages, you should _not_ design for raw Unicode. You should
enable ISO Latin 1, and should write clean code. The cleanest code has its
string literals in resource files for easy replacement, and has only a few
modules that process text. That makes upgrades to more locales easier,
without writing speculative code.
(I once had major fun porting a GUI to Greek. A reputable vendor of
internationalization tools wrote the GUI for Western Europe, and filled it
up with lots of calls to translation functions that did nothing when the
program ran in only one code-page. Activating Greek triggered bugs in every
single one of these speculative calls, because they had been written but
never tested. So, naturally, I got blamed for each bug I encountered.)
If the business side wants to widen their target to, say, all the code-page
oriented locales (Greek, Russian, Arabic, etc.), then you _still_ don't
enable for Unicode. You will use it, sometimes, as an intermediate point
when translating encodings between locales.
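A minimal sketch of Unicode as an intermediate point: KOI8-R bytes go to Unicode code points, and those go to Windows-1251 bytes. The function names are made up for this example, and the KOI8-R table is deliberately a three-letter subset; a real converter needs the full upper half of each code page, or a library that already carries those tables.

```cpp
#include <cassert>
#include <string>

// Step 1: source code page -> Unicode code point.
// Only ASCII and three KOI8-R letters are mapped in this sketch.
inline char32_t koi8r_to_unicode(unsigned char byte) {
    if (byte < 0x80) return byte;      // ASCII range is shared
    switch (byte) {
        case 0xC1: return 0x0430;      // CYRILLIC SMALL LETTER A
        case 0xC2: return 0x0431;      // CYRILLIC SMALL LETTER BE
        case 0xC4: return 0x0434;      // CYRILLIC SMALL LETTER DE
        default:   return 0xFFFD;      // REPLACEMENT CHARACTER: not in this subset
    }
}

// Step 2: Unicode code point -> target code page.
// Windows-1251 stores U+0410..U+044F contiguously at 0xC0..0xFF.
inline unsigned char unicode_to_cp1251(char32_t cp) {
    if (cp < 0x80) return static_cast<unsigned char>(cp);
    if (cp >= 0x0410 && cp <= 0x044F) {
        return static_cast<unsigned char>(cp - 0x0410 + 0xC0);
    }
    return '?';                        // anything else is out of scope here
}

// The pivot: neither code page needs to know about the other.
inline std::string koi8r_to_cp1251(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char byte : in) {
        out.push_back(static_cast<char>(unicode_to_cp1251(koi8r_to_unicode(byte))));
    }
    assert(out.size() == in.size());   // single-byte code pages map byte for byte
    return out;
}
```

The payoff of the pivot is that N code pages need N tables to and from Unicode, not a separate table for every pair of code pages.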
You'd also better ask them whether they really only want to use these
locales one at a time. The "code-page" model doesn't work too well when
you want to display Russian and Arabic simultaneously.
When the business side wants Traditional Chinese, Inuit, Kannada, etc., only
_then_ do you party with your Unicode!
Or when they ask you why your $$$ application isn't as polyglot as their
free web browser.