"\w" doesn't match word characters outside of [A-Za-z0-9_].
I suppose this is controlled by the locale in some way.
Is there any way to set the locale at runtime?
alex
The java.util.regex package is not locale-sensitive at all. The
character-class shorthands (\w, \d, \s) and POSIX character classes
(\p{Alpha}, \p{Digit}, etc.) match only ASCII characters by default. If you
want to match non-ASCII characters, you have to use Unicode blocks
like \p{InGreek}, or categories like \p{L} (also written \p{IsL}, or
\pL for short).
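
For what it's worth, here is a minimal sketch of the difference. The class
and method names are made up for illustration. The third pattern assumes
Java 7 or later, where the Pattern.UNICODE_CHARACTER_CLASS flag is
available; on older JDKs only the first two patterns apply.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class UnicodeWordDemo {
    // Default \w is ASCII-only: equivalent to [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Unicode general category L (letters), independent of any locale.
    static final Pattern UNICODE_LETTERS = Pattern.compile("\\p{L}+");

    // Assumes Java 7+: this flag makes \w, \d, \s and the POSIX classes
    // follow their Unicode definitions instead of the ASCII ones.
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    static boolean isUnicodeLetters(String s) {
        return UNICODE_LETTERS.matcher(s).matches();
    }

    static boolean isUnicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "na\u00EFve"; // "naive" with U+00EF (i with diaeresis)
        System.out.println(isAsciiWord(word));      // false
        System.out.println(isUnicodeLetters(word)); // true
        System.out.println(isUnicodeWord(word));    // true
    }
}
```

Note that \p{L} matches letters only, so it is not a drop-in replacement
for \w, which also accepts digits and the underscore.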
Oddly enough, the word-boundary construct, \b, works with *all*
Unicode letters and digits, not just the ASCII ones. That makes sense
when I think about how frustrating it would be if it didn't, but it
makes it seem that much stranger that \w, \d and \s are limited to the
ASCII range.