David Mathog
In English, words like "O'Clock" contain an embedded character (the
apostrophe) which the C function iswpunct() classifies as punctuation.
So in order to tokenize a string of text containing this type of
word properly, one cannot simply use wcstok() with the punctuation
characters as delimiters; special rules like "a quote immediately
preceded and followed by an alphabetic character is not treated as
punctuation" must be added.
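To make the rule concrete, here is a minimal sketch of the kind of
hand-written tokenizer I mean. The names is_embedded_quote() and
next_word() are my own hypothetical helpers, not standard functions. It
only looks at the ASCII apostrophe, and it hard-codes the English rule,
which is exactly the part I would rather get from the locale.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: nonzero if s[i] is an apostrophe with an
   alphabetic character on both sides, i.e. part of a word rather
   than punctuation. Only the ASCII apostrophe is considered. */
int is_embedded_quote(const wchar_t *s, size_t i)
{
    if (s[i] != L'\'')
        return 0;
    return i > 0 && iswalpha((wint_t)s[i - 1]) && iswalpha((wint_t)s[i + 1]);
}

/* Hypothetical helper: finds the next word at or after offset *pos.
   Stores the word's length in *len, advances *pos past the word, and
   returns a pointer to its first character, or NULL if no word remains.
   The input string is not modified (unlike wcstok()). */
const wchar_t *next_word(const wchar_t *s, size_t *pos, size_t *len)
{
    assert(s != NULL && pos != NULL && len != NULL);

    size_t i = *pos;

    /* Skip everything that cannot start a word. */
    while (s[i] != L'\0' && !iswalpha((wint_t)s[i]))
        i++;

    if (s[i] == L'\0') {
        *pos = i;
        *len = 0;
        return NULL;
    }

    size_t start = i;

    /* Extend the word over letters and embedded apostrophes. */
    while (iswalpha((wint_t)s[i]) || is_embedded_quote(s, i))
        i++;

    *len = i - start;
    *pos = i;
    return s + start;
}
```

Called repeatedly on L"It's five o'clock, 'now'!" with *pos starting
at 0, this yields "It's", "five", "o'clock" and "now": the quotes around
"now" are dropped because each has a letter on only one side.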
What I'm wondering is whether there is a standard function to do this
somewhere in the "w" set of functions which were added for multilingual
support. I mean, I know what the rules are for English, but the whole
point of the wide characters is to support other languages portably, and
it would seem that somewhere in the LC_CTYPE information set this
information should be present and accessible. That said, I have yet to
find anything in there which seems appropriate. Is there such a function?
Thanks,
David Mathog