problem matching accented chars on OS X


Alex Fenton

Hi

I'm finding words within strings in Western European languages, so I need to account for accented characters, such as ê (e circumflex) and à (a grave). With Ruby 1.8.2 on MS Windows, the following works for me (simplified):

WORD_PATTERN = /^[\w\xC0-\xD6\xD8-\xF6\xF8-\xFF]+$/s

\w gets me a-z and A-Z (plus digits and underscore); the hex ranges are the positions of the accented characters in the ISO-8859-1 encoding. This seems to work, but when I run the same code on OS X, I get:

.../lib/weft/backend/sqlite.rb:533: mismatch multibyte code length in
char-class range: /^[\w\xC0-\xD6\xD8-\xF6\xF8-\xFF]+$/ (SyntaxError)
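To make explicit which characters the hex ranges in the pattern above are meant to cover, here is a minimal sketch that decodes them. It assumes Ruby 1.9 or later (not the 1.8.2 I'm running), since it relies on `Integer#chr` with an encoding argument and `String#encode`; the constant name `RANGES` is just for illustration.

```ruby
# Byte ranges copied from the character class in WORD_PATTERN.
# 0xD7 and 0xF7 are skipped because in ISO-8859-1 they are the
# multiplication and division signs, not letters.
RANGES = [0xC0..0xD6, 0xD8..0xF6, 0xF8..0xFF]

# Interpret each byte as ISO-8859-1 and transcode it to UTF-8 for display.
# Assumption: Ruby 1.9+ (Integer#chr(encoding), String#encode).
letters = RANGES.flat_map(&:to_a).map do |byte|
  byte.chr("ISO-8859-1").encode("UTF-8")
end

puts letters.join    # the accented letters, from A grave to y diaeresis
puts letters.length  # 62
```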

Any pointers? I'm not sure what is going wrong.

Is there a library that can help me match letter characters (ideally in a variety of character sets)? The [:alpha:] regex class seemed to be synonymous with \w, which doesn't match enough.
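For comparison, this is roughly the behaviour I'm after, sketched under the assumption of Ruby 1.9 or later, where strings carry an encoding and the regex engine supports the Unicode property `\p{L}` (any letter). It will not run on 1.8.2. The names `UNICODE_WORD` and `latin1_word?` are hypothetical, made up for this example.

```ruby
# Hypothetical constant: one or more Unicode letters, nothing else.
# The /u flag marks the pattern as UTF-8. Assumption: Ruby 1.9+.
UNICODE_WORD = /\A\p{L}+\z/u

# Hypothetical helper: treat the raw bytes as ISO-8859-1, transcode them
# to UTF-8, then test against the letter-only pattern.
def latin1_word?(bytes)
  utf8 = bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
  !(utf8 =~ UNICODE_WORD).nil?
end

puts latin1_word?("\xEAtre")         # true  (e circumflex + "tre")
puts latin1_word?("d\xE9j\xE0 vu")   # false (contains a space)
```

Unlike \w, this pattern rejects digits and underscores, so it is a letters-only check rather than a drop-in replacement for the original character class.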

cheers
alex
 
