If *both* the pattern *and* the subject (the string matched against) are
not in UTF-8, then, and only then, does \D equal [^0-9].
However, if either of them is in UTF-8 format (which does not
necessarily mean they contain a non-ASCII character), then \D excludes a
lot more than just the digits 0 to 9.
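As a small illustration of that difference, the sketch below checks a few individual code points against both \d and [0-9]. The helper name digit_flags is made up for this example, and the three code points are just samples: an ASCII digit, ARABIC-INDIC DIGIT ZERO and FULLWIDTH DIGIT ONE.

```perl
use strict;
use warnings;

# Hypothetical helper: returns two flags for a code point,
# (matches \d, matches [0-9]).
sub digit_flags {
    my ($cp) = @_;
    my $ch = chr $cp;
    return ( ($ch =~ /\d/ ? 1 : 0), ($ch =~ /[0-9]/ ? 1 : 0) );
}

# U+0035 DIGIT FIVE, U+0660 ARABIC-INDIC DIGIT ZERO, U+FF11 FULLWIDTH DIGIT ONE
for my $cp (0x35, 0x0660, 0xFF11) {
    my ($d, $ascii) = digit_flags($cp);
    printf "U+%04X  \\d: %s  [0-9]: %s\n",
        $cp, $d ? 'yes' : 'no', $ascii ? 'yes' : 'no';
}
```

The two non-ASCII digits should be reported as matching \d but not [0-9], which is exactly the set of characters that \D excludes on top of 0 to 9.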
    $ perl -wE 'chr =~ /[^0-9]/ or $c ++ for 0x00 .. 0xD7FF; say $c'
    10
    $ perl -wE 'chr =~ /\D/ or $c ++ for 0x00 .. 0xD7FF; say $c'
    220
You need to use (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0 .. 0xFFFD) here,
otherwise you miss 10 characters (the "FULLWIDTH DIGIT X" characters,
U+FF10 to U+FF19, in Unicode-speak), which lie above 0xD7FF.
The following gives 230 rather than 220 for the count:
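    #!/usr/bin/perl
    use warnings;
    use strict;
    use Unicode::UCD 'charinfo';

    # Count the BMP characters matching a regex, printing each match's name.
    sub count_match
    {
        my ($re) = @_;
        my $c = 0;    # start at 0 so the summary is defined even with no matches
        # Skip surrogates (D800-DFFF) and noncharacters (FDD0-FDEF, FFFE, FFFF).
        for my $n (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0 .. 0xFFFD) {
            if (chr($n) =~ /$re/) {
                my $ci = charinfo($n);
                printf "%02X which is %s matches\n", $n, $$ci{name};
                $c++;
            }
        }
        print "There are $c characters matching \"$re\".\n";
    }

    count_match('\d');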
However, I got the above list of valid Unicode code point ranges by trial
and error (running with 0x00..0xFFFF and seeing where Perl complained about
"Unicode character xxx is illegal") so there might be something I've
missed.
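One way to cross-check the ranges without trial and error is to derive them from the Unicode definitions: surrogates are U+D800 to U+DFFF, and the noncharacters are U+FDD0 to U+FDEF plus the last two code points of every plane. The following is only a sketch under that assumption. The helper name is_matchable is invented for this example, and the resulting count depends on the Unicode version shipped with your Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: true for code points that are neither
# surrogates nor noncharacters.
sub is_matchable {
    my ($n) = @_;
    return 0 if $n >= 0xD800 && $n <= 0xDFFF;   # surrogates
    return 0 if $n >= 0xFDD0 && $n <= 0xFDEF;   # noncharacter block
    return 0 if ($n & 0xFFFE) == 0xFFFE;        # U+xxFFFE and U+xxFFFF
    return 1;
}

# Count the BMP characters that match \d, skipping the excluded code points.
my $count = grep { is_matchable($_) && chr($_) =~ /\d/ } 0x00 .. 0xFFFF;
print "BMP characters matching \\d: $count\n";
```

Within 0x00..0xFFFF this filter leaves exactly 0x00 .. 0xD7FF, 0xE000 .. 0xFDCF and 0xFDF0 .. 0xFFFD, so by these definitions the hand-found ranges do not appear to miss anything.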