Matchtable

B

Bernard

Hi all

I already searched on Google, but unfortunately I don't know what it's
exactly called what I'm looking for.

So what I'd like to do is to create a (text-)table and have a script to
find the best match. (Sorry... my english...)

So here's an example:

Table:
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4


Cases:

Field 1: abc
Field 2: abc
Field 3: abc
Should Match Line1

Field 1: test
Field 2: abc
Field 3: abc
Should match Line2

Field 1: test
Field 2: xyz
Field 3: abc
Should match Line3

Field 1: test
Field 2: xyz
Field 3: xyz
Should match Line4

and so on...

So I'm looking for a way to find the most exact match for given Input.
Is there a module that does this. If not how would you solve it? I know
I should include a some code of what I've already tried, but all the
ways I thought of, are just to complicated and I'm sure there must be an
easier way.

So what I thought of is:
- Build a Hash and then sort by length of key (field 1)... but that
wouldn't work for the 4th example.
- Build 3 Hashes with field 1-3 as keys and sort again by length and
give some kind of rating to the matches (so if field 1 matches with a
String with a lenght of 3 Characters it will get 3 Points)

Thank you very much in advance
Bernard
 
C

ccc31807

So what I'd like to do is to create a (text-)table and have a script to
find the best match.

This is simply a logic problem. The easy part is building the data
structure. See script below. After you load your data into memory, you
can access it through the keys. The problem is even with small levels
of keys, the number of permutations becomes enormous. The logic you
use isn't tied to Perl, or any other language, but depends on your
data.

Here's how I would do it. It builds a three level data structure, and
then prints all the permutations. Your logic would replace the print
statement.

CC

--------script-----------
#! perl
use strict;
use warnings;
my %table;
while(<DATA>)
{
next if /^#/;
chomp;
my ($k1, $k2, $k3, $v) = split /\|/;
$table{$k1}{$k2}{$k3} = $v;
}

foreach my $k1 (keys %table)
{ foreach my $k2 (keys %{$table{$k1}})
{ foreach my $k3 (keys %{$table{$k1}{$k2}})
{ print "$k1 -- $k2 -- $k3 == $table{$k1}{$k2}{$k3}\n";
}
}
}
exit(0);

__DATA__
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4

-----------output-------------
D:\PerlLearn\learning>perl table.plx
t* -- * -- * == Line2
t* -- x* -- * == Line3
* -- * -- * == Line1
* -- x* -- x* == Line4
 
S

Steve C

Bernard said:
Hi all

I already searched on Google, but unfortunately I don't know what it's
exactly called what I'm looking for.

So what I'd like to do is to create a (text-)table and have a script to
find the best match. (Sorry... my english...)

So here's an example:

Table:
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4


Cases:

Field 1: abc
Field 2: abc
Field 3: abc
Should Match Line1

Field 1: test
Field 2: abc
Field 3: abc
Should match Line2

Field 1: test
Field 2: xyz
Field 3: abc
Should match Line3

Field 1: test
Field 2: xyz
Field 3: xyz
Should match Line4

and so on...

So I'm looking for a way to find the most exact match for given Input.
Is there a module that does this. If not how would you solve it? I know
I should include a some code of what I've already tried, but all the
ways I thought of, are just to complicated and I'm sure there must be an
easier way.

So what I thought of is:
- Build a Hash and then sort by length of key (field 1)... but that
wouldn't work for the 4th example.
- Build 3 Hashes with field 1-3 as keys and sort again by length and
give some kind of rating to the matches (so if field 1 matches with a
String with a lenght of 3 Characters it will get 3 Points)

You don't define "best" or "most exact", so how does this problem have
any solution?

When I've seen truth tables like this, the order matters. You put
the best match at the top, and the least specific later and stop on
the first match. The last line would be your catchall "* * *"
which matches anything.

The following table would implement this:

#field1 |field2 |field3 |Output
* |x* |x* |Line4
t* |x* |* |Line3
t* |* |* |Line2
* |* |* |Line1

I don't see an objective way to decide whether "test xyz xyz" should
output Line4 or Line3 unless you make it explicit in the table by
putting one of the two first.
 
S

sln

Hi all

I already searched on Google, but unfortunately I don't know what it's
exactly called what I'm looking for.

So what I'd like to do is to create a (text-)table and have a script to
find the best match. (Sorry... my english...)

So here's an example:

Table:
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4


Cases:

Field 1: abc
Field 2: abc
Field 3: abc
Should Match Line1

Field 1: test
Field 2: abc
Field 3: abc
Should match Line2

Field 1: test
Field 2: xyz
Field 3: abc
Should match Line3

Field 1: test
Field 2: xyz
Field 3: xyz
Should match Line4

and so on...

So I'm looking for a way to find the most exact match for given Input.
Is there a module that does this. If not how would you solve it? I know
I should include a some code of what I've already tried, but all the
ways I thought of, are just to complicated and I'm sure there must be an
easier way.

So what I thought of is:
- Build a Hash and then sort by length of key (field 1)... but that
wouldn't work for the 4th example.
- Build 3 Hashes with field 1-3 as keys and sort again by length and
give some kind of rating to the matches (so if field 1 matches with a
String with a lenght of 3 Characters it will get 3 Points)

Thank you very much in advance
Bernard

One approach would be to "weight" the rows, then "weight" the columns
in that order with whatever criteria fits your needs.
Should only need to do it once (or when adding a new record).
Once weighted store it in a file.

-sln

---------------

c:\temp>perl aa.pl
6 |* |x* |x* |Line4
3 |tt* |x* |* |Line3.1
3 |tr* |x* |* |Line3.03
3 |t* |x* |* |Line3.02
3 |att* |x* |* |Line3.0
3 |at* |x* |* |Line3.22
3 |aat* |x* |* |Line3.11
3 |aaab* |x* |* |Line3.04
3 |aaa* |xa* |* |Line3.211
3 |aaa* |x* |* |Line3.21
3 |aa* |x* |* |Line3.2
3 |a* |x* |* |Line3.01
1 |t* |* |* |Line2
0 |* |* |* |Line1

c:\temp>

use strict;
use warnings;

my @template = ();

while (<DATA>)
{
chomp;
next if /^\s*$/;
my ($val,$cnt) = (0,0);

## Weight cross fields
for my $fld (split /\t\|/, $_)
{
$val += 2**$cnt if ($fld =~ /[^*]/);
$cnt++;
last if $cnt>2;
}
push @template, "$val\t|".$_;
}

## Weight within fields
@template = sort {
my ($b0,$b1,$b2,$b3) = split /\t\|/, $b;
my ($a0,$a1,$a2,$a3) = split /\t\|/, $a;
$b0 <=> $a0 ||
$b1 cmp $a1 ||
$b2 cmp $a2 ||
$b3 cmp $a3
} @template;

for my $str (@template) {
print "$str\n";
}
__DATA__

at* |x* |* |Line3.22
aaab* |x* |* |Line3.04
* |* |* |Line1
t* |* |* |Line2
aaa* |x* |* |Line3.21
tt* |x* |* |Line3.1
* |x* |x* |Line4
att* |x* |* |Line3.0
a* |x* |* |Line3.01
t* |x* |* |Line3.02
tr* |x* |* |Line3.03
aat* |x* |* |Line3.11
aa* |x* |* |Line3.2
aaa* |xa* |* |Line3.211
 
B

Bernard

Hi all

Thank you very much for your responses.

I found a solution that works for me in the meantime.
Bernard

Table looks like this:
#Hostname| Subsystem | Probe | User Tag | Group
* | System* | nt* | * | NT_Group
asdf* | System* | nt* | * | NT_Group_APAC
us* | pr* | Misc* | * | US_Servicedesk
us-* | * | * | * | US_Network
* | * | * | * | Default_Group

Script:
#!perl -w
use strict;

my $table_file = './io/matchtst.lst';

my $table = read_table($table_file) or die;

my $group = find_match( $table,
'us-asdf',
'printer',
'Domino',
'') or die;

print "Group is $group\n";
exit 0;


#---- Subroutinen ----#

sub read_table {
my $table_file = shift;

my $table;

open(IN, $table_file) or die "cant read $table_file: $!\n";

while(<IN>) {
chomp;
next if (/^\#/ or ! $_);

s/\*/\.\*/g;

my ($field1, $field2, $field3, $field4, $output) = split(/\s+\|\s+/);
my $key = join('~~', ($field1, $field2, $field3, $field4));

$table->{$key}->{'field1'} = $field1;
$table->{$key}->{'field2'} = $field2;
$table->{$key}->{'field3'} = $field3;
$table->{$key}->{'field4'} = $field4;
$table->{$key}->{'output'} = $output;
}

close IN;

return $table;
}

sub find_match {
my %fields;
my %score;

my $table = shift;

my $field1 = shift;
my $field2 = shift;
my $field3 = shift;
my $field4 = shift;

foreach my $key(sort {length($b) <=> length($a)} keys %$table) {

if ($field1 =~ /$table->{$key}->{'field1'}/ and
$field2 =~ /$table->{$key}->{'field2'}/ and
$field3 =~ /$table->{$key}->{'field3'}/ and
$field4 =~ /$table->{$key}->{'field4'}/) {

if (! $score{'score'} or ($score{'score'} < length($key))) {
$score{'score'} = length($key);
$score{'pattern'} = $key;
}
}
}

return $table->{$score{'pattern'}}->{'output'};

}
 
S

sln

Hi all

Thank you very much for your responses.

I found a solution that works for me in the meantime.
Bernard

Table looks like this:
#Hostname| Subsystem | Probe | User Tag | Group
* | System* | nt* | * | NT_Group
asdf* | System* | nt* | * | NT_Group_APAC
us* | pr* | Misc* | * | US_Servicedesk
us-* | * | * | * | US_Network
* | * | * | * | Default_Group

Script:
#!perl -w
use strict;

my $table_file = './io/matchtst.lst';

my $table = read_table($table_file) or die;

my $group = find_match( $table,
'us-asdf',
'printer',
'Domino',
'') or die;

print "Group is $group\n";
exit 0;


#---- Subroutinen ----#

sub read_table {
my $table_file = shift;

my $table;
open(IN, $table_file) or die "cant read $table_file: $!\n";

open(IN, '<', $table_file) or ...
while(<IN>) {
chomp;
next if (/^\#/ or ! $_);

s/\*/\.\*/g;

my $line;
while( defined($line = <IN>) ) {
chomp $line;
next if ($line =~ /^\#/ || $line =~ /^\s*$);

$line =~ s/\*/\.\*/g; # .* precludes any other quantifiers following
# It basically means there can be only one '*' (.*)
# per line as it's length is given 'weight'
my ($field1, $field2, $field3, $field4, $output) = split(/\s+\|\s+/);
# validate fields are defined
if (!(defined($field1) &&
defined($field2) &&
defined($field3) &&
defined($field4) &&
defined($output)) ) {
print "Table record is undefined", $line, "\n";
next;
}
my $key = join('~~', ($field1, $field2, $field3, $field4));
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# validate it is not a duplicate key (leading to a possible different $output field)
if (!exists( $table->{$key} ) ) {
$table->{$key}->{'field1'} = $field1;
$table->{$key}->{'field2'} = $field2;
$table->{$key}->{'field3'} = $field3;
$table->{$key}->{'field4'} = $field4;
$table->{$key}->{'output'} = $output;
}
else {
print "Duplicate key found: ", $table->{$key}, "\n"
}
}

close IN;

return $table;
}

sub find_match {
my %fields;
my %score;

my $table = shift;

my $field1 = shift;
my $field2 = shift;
my $field3 = shift;
my $field4 = shift;

$score{'score'} = 0;
foreach my $key(sort {length($b) <=> length($a)} keys %$table) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here, if you are sorting by the length of the key in descending order,
there is no need to stay in the loop once you have found a match
if ($field1 =~ /$table->{$key}->{'field1'}/ and ^^ use && instead
$field2 =~ /$table->{$key}->{'field2'}/ and
$field3 =~ /$table->{$key}->{'field3'}/ and
$field4 =~ /$table->{$key}->{'field4'}/) {

if (! $score{'score'} or ($score{'score'} < length($key))) {
^^ use || instead
^^^^^^^^^^
no need to keep checking for this
if ( $score{'score'} < length($key) ) {
$score{'score'} = length($key);
$score{'pattern'} = $key;
}
}
}

return $table->{$score{'pattern'}}->{'output'};
^^^^^^^^^^^^^ I don't know if it was found
if (exists $score{'pattern'} ) {
return $table->{$score{'pattern'}}->{'output'};
}
return "Cannot find $score{'pattern'} !";

I wasn't sure if you were actually going to use this table
as regexp's for search. Now I see you are.
While it is ok to have a table of regexp's, you have
totally crippled it by weighting it on the length of the
joined fields regexp's ie: us*~~pr*~~Misc*~~*

The fact is, it is possible to have different lengths
where the shorter length is the most significant, thats
the only argument needed against that approach. It then
becomes non-systematic, un-programmable.

Its limited regex. You can only have one '*' and at the end only
and I think thats it.
This will force you into making multiple line items per group,
and eventually having to add attribute's for distinction.

IMO, its better to weight the fields based on significance (say, from
left to right) instead of depending upon the combined length of the
fields.

Good job.

-sln
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,073
Latest member
DarinCeden

Latest Threads

Top