Guest
Hello utf8 wizards,
My series of utf8-related problem reports continues. Please have a look
at the output of the following script:
#!/usr/bin/perl
# Uncommenting either or both of the next two lines
# changes the output significantly!
# binmode(STDOUT,":utf8");
# use utf8;
use Text::Levenshtein qw(distance);
$lemma="Štein"; # the first letter is capital S with hacek, or U+0160
@candidates=("stein","Stein","steïn","Steïn","štein","šteïn");
for $candidate (@candidates) {
print "$lemma -> $candidate: ".
distance($lemma,$candidate)."\n";
}
Please note the edit distances; apparently the Text::Levenshtein module
works bytewise and not characterwise. To make things even more complicated,
the return values of 'distance' change with the settings of the first
lines. Again, perl is v.5.8.5 on a Linux box, everything utf8-enabled.
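To illustrate what I mean by bytewise versus characterwise, here is a minimal sketch. It assumes the core Encode module that ships with perl 5.8, and the variable names are made up for the illustration. It only shows how the same string has two different lengths; it does not claim to show what Text::Levenshtein does internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "Štein": U+0160 is encoded as the two bytes 0xC5 0xA0.
my $bytes = "\xC5\xA0tein";

# Decode the byte string into a Perl character string.
my $chars = decode('UTF-8', $bytes);

print "as bytes:      ", length($bytes), "\n";   # 6
print "as characters: ", length($chars), "\n";   # 5
```

A distance computed over the six bytes can differ from one computed over the five characters, which would be consistent with the numbers I am seeing.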
Best regards,
Oliver.