Setting locale for java.util.regex at runtime


Alex Polite

"\w" doesn't match wordchars outside of [A-Za-z].

I suppose that this is in some way controlled by locale.

Is there any way to change this locale setting at runtime?

alex
 

Alan Moore

The java.util.regex package is not locale-sensitive at all. The
character-class shorthands (\w, \d, \s) and POSIX character classes
(\p{Alpha}, \p{Digit}, etc.) only ever match ASCII characters. If you
want to match non-ASCII characters, you have to use Unicode blocks
like \p{InGreek}, or categories like \p{IsLetter} (which can be
shortened to \pL).

Oddly enough, the word-boundary construct, \b, works with *all*
Unicode letters and digits, not just the ASCII ones. That makes sense
when I think about how frustrating it would be if it didn't, but it
makes it seem that much stranger that \w, \d and \s are limited to the
ASCII range.
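
To illustrate the difference between the ASCII-only shorthand and the Unicode categories, here is a minimal sketch. The class name WordCharDemo and its helper methods are made up for this example. The third helper assumes Java 7 or later, where the Pattern.UNICODE_CHARACTER_CLASS flag makes \w, \d and \s follow Unicode definitions; on older runtimes only the \p{L} approach applies.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class WordCharDemo {

    // Default behaviour: \w is limited to [a-zA-Z_0-9].
    static boolean asciiWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    // Unicode category "Letter" matches letters from any script.
    static boolean unicodeLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    // Assumes Java 7+: the flag switches \w to its Unicode definition.
    static boolean unicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s)
                      .matches();
    }

    public static void main(String[] args) {
        // Three non-ASCII letters, written as escapes to avoid
        // source-encoding problems.
        String word = "\u00e5\u00e4\u00f6";

        System.out.println(asciiWord(word));      // false
        System.out.println(unicodeLetters(word)); // true
        System.out.println(unicodeWord(word));    // true
    }
}
```

None of these depends on the default locale, so there is nothing to set at runtime; the choice is made in the pattern itself or in the flags passed to Pattern.compile.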
 
