Matchtable

Bernard · Apr 12, 2010

Hi all

I already searched on Google, but unfortunately I don't know what it's
exactly called what I'm looking for.

So what I'd like to do is to create a (text-)table and have a script to
find the best match. (Sorry... my english...)

So here's an example:

Table:
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4

Cases:

Field 1: abc
Field 2: abc
Field 3: abc
Should Match Line1

Field 1: test
Field 2: abc
Field 3: abc
Should match Line2

Field 1: test
Field 2: xyz
Field 3: abc
Should match Line3

Field 1: test
Field 2: xyz
Field 3: xyz
Should match Line4

and so on...

So I'm looking for a way to find the most exact match for given Input.
Is there a module that does this. If not how would you solve it? I know
I should include a some code of what I've already tried, but all the
ways I thought of, are just to complicated and I'm sure there must be an
easier way.

So what I thought of is:
- Build a Hash and then sort by length of key (field 1)... but that
wouldn't work for the 4th example.
- Build 3 Hashes with field 1-3 as keys and sort again by length and
give some kind of rating to the matches (so if field 1 matches with a
String with a lenght of 3 Characters it will get 3 Points)

Thank you very much in advance
Bernard

ccc31807 · Apr 12, 2010

So what I'd like to do is to create a (text-)table and have a script to
find the best match.

This is simply a logic problem. The easy part is building the data
structure. See script below. After you load your data into memory, you
can access it through the keys. The problem is even with small levels
of keys, the number of permutations becomes enormous. The logic you
use isn't tied to Perl, or any other language, but depends on your
data.

Here's how I would do it. It builds a three level data structure, and
then prints all the permutations. Your logic would replace the print
statement.

CC

--------script-----------
#! perl
use strict;
use warnings;
my %table;
while(<DATA>)
{
next if /^#/;
chomp;
my ($k1, $k2, $k3, $v) = split /\|/;
$table{$k1}{$k2}{$k3} = $v;
}

foreach my $k1 (keys %table)
{ foreach my $k2 (keys %{$table{$k1}})
{ foreach my $k3 (keys %{$table{$k1}{$k2}})
{ print "$k1 -- $k2 -- $k3 == $table{$k1}{$k2}{$k3}\n";
}
}
}
exit(0);

__DATA__
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4

-----------output-------------
D:\PerlLearn\learning>perl table.plx
t* -- * -- * == Line2
t* -- x* -- * == Line3
* -- * -- * == Line1
* -- x* -- x* == Line4

Steve C · Apr 12, 2010

Bernard said:
Hi all

I already searched on Google, but unfortunately I don't know what it's
exactly called what I'm looking for.

So what I'd like to do is to create a (text-)table and have a script to
find the best match. (Sorry... my english...)

So here's an example:

Table:
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4

Cases:

Field 1: abc
Field 2: abc
Field 3: abc
Should Match Line1

Field 1: test
Field 2: abc
Field 3: abc
Should match Line2

Field 1: test
Field 2: xyz
Field 3: abc
Should match Line3

Field 1: test
Field 2: xyz
Field 3: xyz
Should match Line4

and so on...

So I'm looking for a way to find the most exact match for given Input.
Is there a module that does this. If not how would you solve it? I know
I should include a some code of what I've already tried, but all the
ways I thought of, are just to complicated and I'm sure there must be an
easier way.

So what I thought of is:
- Build a Hash and then sort by length of key (field 1)... but that
wouldn't work for the 4th example.
- Build 3 Hashes with field 1-3 as keys and sort again by length and
give some kind of rating to the matches (so if field 1 matches with a
String with a lenght of 3 Characters it will get 3 Points)

You don't define "best" or "most exact", so how does this problem have
any solution?

When I've seen truth tables like this, the order matters. You put
the best match at the top, and the least specific later and stop on
the first match. The last line would be your catchall "* * *"
which matches anything.

The following table would implement this:

#field1 |field2 |field3 |Output
* |x* |x* |Line4
t* |x* |* |Line3
t* |* |* |Line2
* |* |* |Line1

I don't see an objective way to decide whether "test xyz xyz" should
output Line4 or Line3 unless you make it explicit in the table by
putting one of the two first.

sln · Apr 13, 2010

Hi all

I already searched on Google, but unfortunately I don't know what it's
exactly called what I'm looking for.

So what I'd like to do is to create a (text-)table and have a script to
find the best match. (Sorry... my english...)

So here's an example:

Table:
#field1 |field2 |field3 |Output
* |* |* |Line1
t* |* |* |Line2
t* |x* |* |Line3
* |x* |x* |Line4

Cases:

Field 1: abc
Field 2: abc
Field 3: abc
Should Match Line1

Field 1: test
Field 2: abc
Field 3: abc
Should match Line2

Field 1: test
Field 2: xyz
Field 3: abc
Should match Line3

Field 1: test
Field 2: xyz
Field 3: xyz
Should match Line4

and so on...

So I'm looking for a way to find the most exact match for given Input.
Is there a module that does this. If not how would you solve it? I know
I should include a some code of what I've already tried, but all the
ways I thought of, are just to complicated and I'm sure there must be an
easier way.

So what I thought of is:
- Build a Hash and then sort by length of key (field 1)... but that
wouldn't work for the 4th example.
- Build 3 Hashes with field 1-3 as keys and sort again by length and
give some kind of rating to the matches (so if field 1 matches with a
String with a lenght of 3 Characters it will get 3 Points)

Thank you very much in advance
Bernard

One approach would be to "weight" the rows, then "weight" the columns
in that order with whatever criteria fits your needs.
Should only need to do it once (or when adding a new record).
Once weighted store it in a file.

-sln

---------------

c:\temp>perl aa.pl
6 |* |x* |x* |Line4
3 |tt* |x* |* |Line3.1
3 |tr* |x* |* |Line3.03
3 |t* |x* |* |Line3.02
3 |att* |x* |* |Line3.0
3 |at* |x* |* |Line3.22
3 |aat* |x* |* |Line3.11
3 |aaab* |x* |* |Line3.04
3 |aaa* |xa* |* |Line3.211
3 |aaa* |x* |* |Line3.21
3 |aa* |x* |* |Line3.2
3 |a* |x* |* |Line3.01
1 |t* |* |* |Line2
0 |* |* |* |Line1

c:\temp>

use strict;
use warnings;

my @template = ();

while (<DATA>)
{
chomp;
next if /^\s*$/;
my ($val,$cnt) = (0,0);

## Weight cross fields
for my $fld (split /\t\|/, $_)
{
$val += 2**$cnt if ($fld =~ /[^*]/);
$cnt++;
last if $cnt>2;
}
push @template, "$val\t|".$_;
}

## Weight within fields
@template = sort {
my ($b0,$b1,$b2,$b3) = split /\t\|/, $b;
my ($a0,$a1,$a2,$a3) = split /\t\|/, $a;
$b0 <=> $a0 ||
$b1 cmp $a1 ||
$b2 cmp $a2 ||
$b3 cmp $a3
} @template;

for my $str (@template) {
print "$str\n";
}
__DATA__

at* |x* |* |Line3.22
aaab* |x* |* |Line3.04
* |* |* |Line1
t* |* |* |Line2
aaa* |x* |* |Line3.21
tt* |x* |* |Line3.1
* |x* |x* |Line4
att* |x* |* |Line3.0
a* |x* |* |Line3.01
t* |x* |* |Line3.02
tr* |x* |* |Line3.03
aat* |x* |* |Line3.11
aa* |x* |* |Line3.2
aaa* |xa* |* |Line3.211

Bernard · Apr 13, 2010

Hi all

Thank you very much for your responses.

I found a solution that works for me in the meantime.
Bernard

Table looks like this:
#Hostname| Subsystem | Probe | User Tag | Group
* | System* | nt* | * | NT_Group
asdf* | System* | nt* | * | NT_Group_APAC
us* | pr* | Misc* | * | US_Servicedesk
us-* | * | * | * | US_Network
* | * | * | * | Default_Group

Script:
#!perl -w
use strict;

my $table_file = './io/matchtst.lst';

my $table = read_table($table_file) or die;

my $group = find_match( $table,
'us-asdf',
'printer',
'Domino',
'') or die;

print "Group is $group\n";
exit 0;

#---- Subroutinen ----#

sub read_table {
my $table_file = shift;

my $table;

open(IN, $table_file) or die "cant read $table_file: $!\n";

while(<IN>) {
chomp;
next if (/^\#/ or ! $_);

s/\*/\.\*/g;

my ($field1, $field2, $field3, $field4, $output) = split(/\s+\|\s+/);
my $key = join('~~', ($field1, $field2, $field3, $field4));

$table->{$key}->{'field1'} = $field1;
$table->{$key}->{'field2'} = $field2;
$table->{$key}->{'field3'} = $field3;
$table->{$key}->{'field4'} = $field4;
$table->{$key}->{'output'} = $output;
}

close IN;

return $table;
}

sub find_match {
my %fields;
my %score;

my $table = shift;

my $field1 = shift;
my $field2 = shift;
my $field3 = shift;
my $field4 = shift;

foreach my $key(sort {length($b) <=> length($a)} keys %$table) {

if ($field1 =~ /$table->{$key}->{'field1'}/ and
$field2 =~ /$table->{$key}->{'field2'}/ and
$field3 =~ /$table->{$key}->{'field3'}/ and
$field4 =~ /$table->{$key}->{'field4'}/) {

if (! $score{'score'} or ($score{'score'} < length($key))) {
$score{'score'} = length($key);
$score{'pattern'} = $key;
}
}
}

return $table->{$score{'pattern'}}->{'output'};

}

sln · Apr 13, 2010

Hi all

Thank you very much for your responses.

I found a solution that works for me in the meantime.
Bernard

Table looks like this:
#Hostname| Subsystem | Probe | User Tag | Group
* | System* | nt* | * | NT_Group
asdf* | System* | nt* | * | NT_Group_APAC
us* | pr* | Misc* | * | US_Servicedesk
us-* | * | * | * | US_Network
* | * | * | * | Default_Group

Script:
#!perl -w
use strict;

my $table_file = './io/matchtst.lst';

my $table = read_table($table_file) or die;

my $group = find_match( $table,
'us-asdf',
'printer',
'Domino',
'') or die;

print "Group is $group\n";
exit 0;

#---- Subroutinen ----#

sub read_table {
my $table_file = shift;

my $table;
open(IN, $table_file) or die "cant read $table_file: $!\n";

open(IN, '<', $table_file) or ...

while(<IN>) {
chomp;
next if (/^\#/ or ! $_);

s/\*/\.\*/g;

my $line;
while( defined($line = <IN>) ) {
chomp $line;
next if ($line =~ /^\#/ || $line =~ /^\s*$);

$line =~ s/\*/\.\*/g; # .* precludes any other quantifiers following
# It basically means there can be only one '*' (.*)
# per line as it's length is given 'weight'

my ($field1, $field2, $field3, $field4, $output) = split(/\s+\|\s+/);

# validate fields are defined
if (!(defined($field1) &&
defined($field2) &&
defined($field3) &&
defined($field4) &&
defined($output)) ) {
print "Table record is undefined", $line, "\n";
next;
}

my $key = join('~~', ($field1, $field2, $field3, $field4));

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# validate it is not a duplicate key (leading to a possible different $output field)
if (!exists( $table->{$key} ) ) {

$table->{$key}->{'field1'} = $field1;
$table->{$key}->{'field2'} = $field2;
$table->{$key}->{'field3'} = $field3;
$table->{$key}->{'field4'} = $field4;
$table->{$key}->{'output'} = $output;

}
else {
print "Duplicate key found: ", $table->{$key}, "\n"
}

}

close IN;

return $table;
}

sub find_match {
my %fields;
my %score;

my $table = shift;

my $field1 = shift;
my $field2 = shift;
my $field3 = shift;
my $field4 = shift;

$score{'score'} = 0;

foreach my $key(sort {length($b) <=> length($a)} keys %$table) {

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here, if you are sorting by the length of the key in descending order,
there is no need to stay in the loop once you have found a match

if ($field1 =~ /$table->{$key}->{'field1'}/ and ^^ use && instead
$field2 =~ /$table->{$key}->{'field2'}/ and
$field3 =~ /$table->{$key}->{'field3'}/ and
$field4 =~ /$table->{$key}->{'field4'}/) {

if (! $score{'score'} or ($score{'score'} < length($key))) {

^^ use || instead
^^^^^^^^^^
no need to keep checking for this
if ( $score{'score'} < length($key) ) {

$score{'score'} = length($key);
$score{'pattern'} = $key;
}
}
}

return $table->{$score{'pattern'}}->{'output'};

^^^^^^^^^^^^^ I don't know if it was found
if (exists $score{'pattern'} ) {
return $table->{$score{'pattern'}}->{'output'};
}
return "Cannot find $score{'pattern'} !";

}

I wasn't sure if you were actually going to use this table
as regexp's for search. Now I see you are.
While it is ok to have a table of regexp's, you have
totally crippled it by weighting it on the length of the
joined fields regexp's ie: us*~~pr*~~Misc*~~*

The fact is, it is possible to have different lengths
where the shorter length is the most significant, thats
the only argument needed against that approach. It then
becomes non-systematic, un-programmable.

Its limited regex. You can only have one '*' and at the end only
and I think thats it.
This will force you into making multiple line items per group,
and eventually having to add attribute's for distinction.

IMO, its better to weight the fields based on significance (say, from
left to right) instead of depending upon the combined length of the
fields.

Good job.

-sln

Readline question	4	Mar 3, 2011
Python point location of intersect between two lines	0	Feb 28, 2018
How can I fix my pattern coding error in c++	0	Mar 19, 2023
how to compare two fields in python	6	Apr 30, 2013
Building a multiline string	5	Feb 4, 2010
How to go about building a crud app when you are a noob	1	Jan 2, 2023
Why is Python telling me variable is local not global?	3	Sep 2, 2023
Rock, Paper, Scissor game. Im getting TypeError, unsupported operand type(s) for -=: 'NoneType' and 'int'	2	Aug 29, 2023

Matchtable

Bernard

ccc31807

Steve C

sln

Bernard

sln

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads