Comparing values of multiple hash keys

Jason · Jul 26, 2006

In the past I wrote a simple search engine and have been using it
for a while, but now I'm trying to make it a little more intuitive.

Originally, the script was simple. Take a keyword entered by a form,
compare it to each value of an array (@data), and return any value
containing the entry:

$var = param('keyword');
foreach $key (@data) {
    if ($key =~ /$var/i) { push (@founddata, $key); }
}

But now I'm trying to allow for multi-word phrases, which is a bit more
complex. I couldn't find how others are doing it, so I'm winging it on
my own. I started by splitting $var on whitespace into an array,
counting the number of times each term appears in $key, then
appending the results to a hash value:

my (%founddata, @keywords);
my $var = param('keyword');

$var =~ s/(?:,|'|\.)//g; # Remove comma, apostrophe, or period
@keywords = split(/ /, $var);

foreach my $key (@data) {
    foreach my $term (@keywords) {
        my @matches = $key =~ /($term)/ig;   # match against the data item, not $var
        my $size = @matches;
        if ($size > 0) { $founddata{$term} .= "${size}::$key|"; }
    }
}

Let's say that @data = ("Men", "Women", "Children", "Pets",
"Monsters"), and someone searches for "m n" (without the quotes, of
course, because so far I'm not worrying about strict phrases). The
results would be:

$keywords[0] = m;
$keywords[1] = n;

$founddata{'m'} = 1::Men|1::Women|1::Monsters;
$founddata{'n'} = 1::Men|1::Women|1::Children|2::Monsters;

Now, how do I compare the two keys "m" and "n" to both remove any value
that's not in both (ie, Children), and then add the rest together to
create something like 2::Men|2::Women|3::Monsters?

In my mind, I would take the result of this, split it by | into an
array, sort it so that Monsters is first, then split it by :: to remove
the numbers (leaving the final values sorted by the number of
instances), unless you guys know an easier route.

TIA,

Jason
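
The post-processing described above (split on |, sort by count, strip the numbers) can be sketched as follows. This is only an illustration, not code from the thread: the $combined string is a hypothetical input in the "count::word" shape the post aims for, and @records and @sorted are made-up names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical combined string in the "count::word|count::word" shape.
my $combined = '2::Men|2::Women|3::Monsters';

# Split on a literal pipe, then break each record into [count, word].
my @records = map { [ split /::/, $_, 2 ] } split /\|/, $combined;

# Highest count first; ties fall back to alphabetical order.
# The final map drops the counts and keeps only the words.
my @sorted = map  { $_->[1] }
             sort { $b->[0] <=> $a->[0] or $a->[1] cmp $b->[1] }
             @records;

print join(', ', @sorted), "\n";   # prints: Monsters, Men, Women
```

Keeping the count and the word together in a small array reference avoids re-splitting the string inside the sort comparison.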

Jason · Jul 26, 2006

$founddata{'n'} = 1::Men|1::Women|1::Children|2::Monsters;

Whoops, I screwed up there. I was trying to think of a word with the
letter "n" in it twice; don't know why I thought that Monsters did
(other than the fact that I posted at 3am).

So obviously this would not be 2::Monsters, it would be 1::Monsters.
But for the sake of discussion, pretend that "Monsters" was a word with
2 n's in it ;-)

- J

usenet · Jul 26, 2006

Jason said:
$founddata{'m'} = 1::Men|1::Women|1::Monsters;
$founddata{'n'} = 1::Men|1::Women|1::Children|2::Monsters;

Now, how do I compare the two keys "m" and "n" to both remove any value
that's not in both (ie, Children), and then add the rest together to
create something like 2::Men|2::Women|3::Monsters?

In my mind, I would take the result of this, split it by | into an
array, sort it so that Monsters is first, then split it by :: to remove
the numbers (leaving the final values sorted by the number of
instances), unless you guys know an easier route.

I think you are making this harder than it needs to be (and less
efficient, too).

A common mistake, especially among less experienced programmers, is to
try to build up a bunch of stuff and then post-process it (this is
often manifested by slurping a file into an array and then looping over
the array, which is almost always silly). In your case, you are taking
multiple input values and looping them against your data list to build
up individual arrays, and then trying to post-process those arrays to
get meaningful data. This is rather inefficient, because you must loop
over your data list multiple times (which is backwards, because your
data list is presumably large and your input list is presumably small).

Instead, turn it inside-out. Don't construct outside loops over your
input lists. Instead, loop over your data list (once). For each item
in the data list, calculate the value of its match to your criteria
(and ignore that element if it fails to meet your criteria).

This way, you fully resolve (or ignore) the value of each item in @data
one element at a time. Thus, you never need to post-process anything,
because everything you want to know is known when you finish the
outside loop around @data.

Consider this code:

#!/usr/bin/perl
use strict; use warnings;

my @data = qw{ Men Women Children Pets Monnsters };
my %keywords = ( 0 => 'm', 1 => 'n');
my @results = ();

WORD:
foreach my $word (@data) {                          # outside loop around word database
    my $count = 0;
    foreach my $keyword ( values %keywords ) {      # inside loop
        next WORD unless $word =~ /$keyword/i;      # fails criteria: reject
        while ($word =~ /$keyword/ig) { $count++ }  # perldoc -q count
    }
    push @results, "${count}::$word";
}
print join("|", @results), "\n";

__END__

Dr.Ruud · Jul 26, 2006

(e-mail address removed) wrote:

push @results, "${count}::$word";

OR-Variant:

push(@results, "${count}::$word")
if $count ;

AND-Variant:

push(@results, "${count}::$word")
if $count == keys %keywords ;

Tad McClellan · Jul 26, 2006

Jason said:
Originally, the script was simple. Take a keyword entered by a form,
compare it to each value of an array (@data), and return any value
containing the entry:

$var = param('keyword');
foreach $key (@data) {
    if ($key =~ /$var/i) { push (@founddata, $key); }
}

But now I'm trying to allow for multi-word phrases, which is a bit more
complex. I couldn't find how others are doing it,

Do just what you are already doing (but a hash is more natural for
representing a set), but do it once for each term in the multi-word phrase.

You then have N sets for your N-term phrase.

Find the intersection of the N sets, and you have the answer.

perldoc -q intersection

How do I compute the difference of two arrays? How do I compute the
intersection of two arrays?
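
A minimal sketch of the set-intersection idea, assuming the sample @data and the "m n" search from the original post. The names %hits_for and %total are hypothetical, and summing the per-term counts is one reading of what the OP asked for rather than something the FAQ entry prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data     = qw( Men Women Children Pets Monsters );
my @keywords = qw( m n );

# One set (hash) per term: word => number of times the term occurs in it.
my %hits_for;
for my $term (@keywords) {
    for my $word (@data) {
        my $count = () = $word =~ /\Q$term\E/gi;   # count all matches
        $hits_for{$term}{$word} = $count if $count;
    }
}

# Intersection: keep a word only if every term's set contains it,
# summing the per-term counts along the way.
my %total;
WORD:
for my $word (@data) {
    my $sum = 0;
    for my $term (@keywords) {
        next WORD unless exists $hits_for{$term}{$word};
        $sum += $hits_for{$term}{$word};
    }
    $total{$word} = $sum;
}

my $line = join('|', map { "$total{$_}::$_" } sort keys %total);
print "$line\n";   # prints: 2::Men|2::Monsters|2::Women
```

"Children" drops out because it is missing from the set for "m".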

$var =~ s/(?:,|'|\.)//g; # Remove comma, apostrophe, or period

$var =~ tr/,'.//d; # Remove comma, apostrophe, or period
# only clearer and faster
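
A quick check that the two statements strip the same characters, using a made-up sample string (the variable names here are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $with_s  = "Men's, women's. kids";   # hypothetical search input
my $with_tr = $with_s;

$with_s =~ s/(?:,|'|\.)//g;             # regex substitution, as in the original post
my $deleted = ($with_tr =~ tr/,'.//d);  # transliteration; returns how many characters matched

print "$with_s\n";                      # prints: Mens womens kids
print "same result, $deleted characters deleted\n" if $with_s eq $with_tr;
```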

Ted Zlatanov · Jul 26, 2006

Do just what you are already doing (but a hash is more natural for
representing a set), but do it once for each term in the multi-word phrase.

You then have N sets for your N-term phrase.

Find the intersection of the N sets, and you have the answer.

perldoc -q intersection

How do I compute the difference of two arrays? How do I compute the
intersection of two arrays?

The OP may want the union of the sets, based on his imprecise
specifications

Ted
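
If the union is what is wanted, a sketch along these lines might do, again assuming the sample data from the original post (the name %total is hypothetical). A word qualifies as soon as any one term occurs in it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data     = qw( Men Women Children Pets Monsters );
my @keywords = qw( m n );

# Union: a word is kept if at least one term occurs in it;
# counts from all matching terms are added together.
my %total;
for my $word (@data) {
    for my $term (@keywords) {
        my $count = () = $word =~ /\Q$term\E/gi;
        $total{$word} += $count if $count;
    }
}

my $line = join('|', map { "$total{$_}::$_" } sort keys %total);
print "$line\n";   # prints: 1::Children|2::Men|2::Monsters|2::Women
```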

usenet · Jul 27, 2006

Dr.Ruud said:
(e-mail address removed) wrote:

OR-Variant:
push(@results, "${count}::$word")
if $count ;
AND-Variant:
push(@results, "${count}::$word")
if $count == keys %keywords ;

The push statement in my code can never be reached unless every
keyword was found at least once in the current data element (as per the
OP's specification), so no condition is needed on the push as I coded
it. (Interestingly, I sat in on a great presentation at OSCON that
Geoffrey Young gave just a few hours ago on Devel::Cover, which
discussed how that module can help identify redundant secondary checks.)

However, the AND variant might not produce correct results. The OP
stipulated that all occurrences of each search term be counted within
the data space, so $count could easily exceed keys(), and keys() might
be equal to $count even though the term was not found in all data terms
(because some terms matched more than once).
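
One way around that, sketched here as an assumption rather than as anyone's posted code, is to keep two tallies per word: the total number of occurrences and the number of distinct terms that matched. The names $matched, @any and @all are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data     = qw( Men Women Children Pets Monnsters );
my @keywords = qw( m n );

my (@any, @all);
for my $word (@data) {
    my ($count, $matched) = (0, 0);   # total occurrences, distinct terms found
    for my $term (@keywords) {
        my $n = () = $word =~ /\Q$term\E/gi;
        $count   += $n;
        $matched += 1 if $n;
    }
    push @any, "${count}::$word" if $matched;                # OR: some term hit
    push @all, "${count}::$word" if $matched == @keywords;   # AND: every term hit
}

print 'OR:  ', join('|', @any), "\n";   # OR:  2::Men|2::Women|1::Children|3::Monnsters
print 'AND: ', join('|', @all), "\n";   # AND: 2::Men|2::Women|3::Monnsters
```

Comparing $matched rather than $count against the number of keywords keeps "Monnsters" (three occurrences, two terms) in the AND list.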

anno4000 · Jul 27, 2006

Ted Zlatanov said:
The OP may want the union of the sets, based on his imprecise
specifications

Maybe. The answer is still sufficient. The union isn't in the title,
but the faq explains how to compute it.

Anno

Dr.Ruud · Jul 27, 2006

(e-mail address removed) wrote:

Dr.Ruud:

The push statement in my code can never be reached unless every
keyword was found at least once in the current data element (as per the
OP's specification), so no condition is needed on the push as I coded
it. (Interestingly, I sat in on a great presentation at OSCON that
Geoffrey Young gave just a few hours ago on Devel::Cover, which
discussed how that module can help identify redundant secondary checks.)

However, the AND variant might not produce correct results. The OP
stipulated that all occurrences of each search term be counted within
the data space, so $count could easily exceed keys(), and keys() might
be equal to $count even though the term was not found in all data
terms (because some terms matched more than once).

Oops yes, I overlooked the while.
