utf-8

julia_2683

I run Perl v5.8.7 and my regular expression is ($txt =~ m/(\w+|é\w+)/g),
which does not match every utf-8 word. How can I make this regular
expression match every utf-8 word?
 
Joost Diepenmaat

> I run Perl v5.8.7 and my regular expression is ($txt =~ m/(\w+|é\w+)/g),
> which does not match every utf-8 word. How can I make this regular
> expression match every utf-8 word?

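Just \w should work, provided you're handling your encodings correctly *and*
your $txt is actually stored as utf-8 internally (i.e. it has been decoded
into a character string). That \w depends on the internal representation at
all is IMO a bug.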

Note that if your script itself is utf8 encoded you need to "use utf8"
somewhere at the top of your script.

For instance:

#!/usr/bin/perl -w
use strict;

# Set the output stream to utf-8 (I have a utf-8 enabled terminal)
binmode STDOUT, ":utf8";

my $str = "\x{e9}";   # "é", not necessarily stored as utf-8 - very likely latin-1
utf8::upgrade($str);  # force the internal representation to utf-8

print "$str was ", ($str =~ /\w+/ ? "" : "not "), "matched\n";
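For the original question (collecting every word), a minimal sketch might look like the following. It assumes $txt arrives as raw utf-8 bytes, for example read from a file without a decoding layer; the sample string in $bytes is made up for illustration. The bytes are decoded with the core Encode module first, after which plain \w+ is enough and the "|é\w+" alternative is not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the raw utf-8 bytes for "café élan naïve", as they
# would arrive from a file or socket read without a decoding layer.
my $bytes = "caf\xc3\xa9 \xc3\xa9lan na\xc3\xafve";

# Decode the bytes into a Perl character string. After this step "é" is a
# single character (U+00E9) and \w treats it as a word character.
my $txt = decode('UTF-8', $bytes);

# In list context, m//g returns every match, so this collects all words.
my @words = ($txt =~ m/(\w+)/g);

# Encode on output so a utf-8 terminal receives utf-8 bytes again.
binmode STDOUT, ':utf8';
print scalar(@words), " words: @words\n";
```

If the text comes from a file instead, opening it with a ":encoding(UTF-8)" layer should do the decoding step for you.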

Joost.
 
