Aronaxis, the Sourceror
# ind_compare($string1,$string2 [,accuracy] )
# calculates string "similarity"
# the algorithm is adapted from some old Pascal code and rewritten to use
# perl RE-engine backtracking for speed.
use warnings;
use strict;
sub DEBUG () {1}
our ($cnt, $match);
sub ind_compare ($$;$) {
    my $max_len = $_[2];
    # numify for safety: $max_len is interpolated into the regex below.
    $max_len = "" unless $max_len += 0;
    # WHY NOT?!
    # my ($cnt,$match)=(0,0);
    ($cnt, $match) = (0,0);
    use re 'eval'; # needed because $max_len is interpolated into the
                   # regex below; it was sanitized above.
    # loop to compare $_[0] against $_[1], and then $_[1] against $_[0]
    for my $i (0,1) {
        $_[$i] =~ m{
            ( .{1,$max_len} )
            (?{
                $cnt++;
                $match++ if index( $_[1-$i], $1 ) != -1;
            })
            (?!) # always fails, which forces backtracking.
        }x;
    }
    print "$match/$cnt\n" if DEBUG;
    return 0 unless $cnt;
    $match/$cnt;
}
print ind_compare( "abcdefgh","abcdefgh" ), "\n\n";
print ind_compare( "abcdefg!","abcdefgh" ), "\n\n";
__END__
Results:
1) with $match and $cnt in &ind_compare declared as "our" (as in the code above):
72/72
1
64/72
0.888888888888889
2) with $match and $cnt in &ind_compare declared as "my":
72/72
1
0/0
0
.......................
I found that if I declare these vars as "my", the code inside (?{...})
always uses the _first_ instances of these vars, the ones created on the
_first_ sub invocation. (Strange: they are still cleared on sub exit, but
on the second invocation $cnt inside the regex and $cnt outside it are
not the same variable!)
So it works correctly only once!
Dammit, why?!
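The symptom matches a known limitation of perls before 5.18: a (?{...}) block behaves like a closure that is compiled once and keeps referring to the lexicals that existed at that moment. The sketch below reduces the problem to a few lines; the sub name count_positions is hypothetical and not part of the script above. On perl 5.18 and later, where code blocks were reworked, both calls should return the same count, so the stale-variable effect is only expected on older perls such as 5.8.0.

```perl
use strict;
use warnings;

# Hypothetical reduction of the problem (not from the original script).
sub count_positions {
    my ($str) = @_;
    my $n = 0;                        # fresh lexical on every call
    # The code block refers to the lexical $n; (?!) always fails, so the
    # engine retries at every start position and runs the block each time.
    $str =~ / . (?{ $n++ }) (?!) /x;
    return $n;
}

print count_positions("abc"), "\n";
print count_positions("abc"), "\n";
```

If the second call prints a different number from the first, the code block is still incrementing the $n from the first call.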
It looks like a scoping issue, but then why are there no problems with the
lexically scoped $i, or with @_ (which AFAIK is also lexical)?
Oh, yeah, this is the ActiveState port of perl 5.8.0 for Windows.
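As a cross-check, here is a sketch of the same counting done with plain loops and only lexical variables, so no (?{...}) block is involved. The name ind_compare_loops is hypothetical, and the sketch assumes the third argument is numeric or omitted and that the strings contain no newlines (the regex's "." would skip those). Counted by hand, the second test pair gives 28 matching substrings out of 36 in each direction, that is 56/72, not the 64/72 shown above. The 64/72 figure would be consistent with the code block also seeing a stale $i on the second loop pass (36 + 28), but that is a guess and has not been verified on 5.8.0.

```perl
use strict;
use warnings;

# Hypothetical loop-based equivalent of ind_compare (not the original code).
# Counts every substring of each string (up to $max_len characters long)
# and checks whether it occurs in the other string.
sub ind_compare_loops {
    my ($s1, $s2, $max_len) = @_;
    my ($cnt, $match) = (0, 0);
    for my $pair ([$s1, $s2], [$s2, $s1]) {
        my ($from, $in) = @$pair;
        my $len = length $from;
        for my $start (0 .. $len - 1) {
            my $longest = $len - $start;
            # A false $max_len (undef, 0 or "") means "no limit".
            $longest = $max_len if $max_len && $max_len < $longest;
            for my $n (1 .. $longest) {
                $cnt++;
                $match++ if index($in, substr($from, $start, $n)) != -1;
            }
        }
    }
    return $cnt ? $match / $cnt : 0;
}

print ind_compare_loops("abcdefgh", "abcdefgh"), "\n";
print ind_compare_loops("abcdefg!", "abcdefgh"), "\n";
```

This version is slower than letting the regex engine backtrack, but it does not depend on how code blocks capture variables.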