Comparing values of multiple hash keys

Discussion in 'Perl Misc' started by Jason, Jul 26, 2006.

  1. Jason

    Jason Guest

    In the past I've written a simple search engine and have been using it
    for a while, but now I'm trying to make it a little more intuitive.

    Originally, the script was simple. Take a keyword entered by a form,
    compare it to each value of an array (@data), and return any value
    containing the entry:

    $var = param('keyword');
    foreach $key (@data) {
        if ($key =~ /$var/i) { push (@founddata, $key); }
    }


    But now I'm trying to allow for multi-word phrases, which is a bit more
    complex. I couldn't find how others are doing it, so I'm winging it on
    my own. I started by splitting $var by the whitespace into an array,
    counting the number of instances that $var appeared in the $key, then
    adding the results to a hash value:

    my (%founddata, @keywords);
    my $var = param('keyword');

    $var =~ s/(?:,|'|\.)//g; # Remove comma, apostrophe, or period
    @keywords = split(/ /, $var);

    foreach my $key (@data) {
        foreach my $term (@keywords) {
            my @matches = $key =~ /($term)/ig; # count matches of the term in this data item
            my $size = @matches;
            if ($size > 0) { $founddata{$term} .= "${size}::$key|"; }
        }
    }


    Let's say that @data = ("Men", "Women", "Children", "Pets",
    "Monsters"), and someone searches for "m n" (without the quotes, of
    course, because so far I'm not worrying about strict phrases). The
    results would be:

    $keywords[0] = m;
    $keywords[1] = n;

    $founddata{'m'} = 1::Men|1::Women|1::Monsters;
    $founddata{'n'} = 1::Men|1::Women|1::Children|2::Monsters;


    Now, how do I compare the two keys "m" and "n" to both remove any value
    that's not in both (ie, Children), and then add the rest together to
    create something like 2::Men|2::Women|3::Monsters?

    In my mind, I would take the result of this, split it by | into an
    array, sort it so that Monsters is first, then split it by :: to remove
    the numbers (leaving the final values sorted by the number of
    instances), unless you guys know an easier route.

    TIA,

    Jason
     
    Jason, Jul 26, 2006
    #1

  2. Jason

    Jason Guest

    > $founddata{'n'} = 1::Men|1::Women|1::Children|2::Monsters;

    Whoops, I screwed up there. I was trying to think of a word with the
    letter "n" in it twice; don't know why I thought that Monsters did
    (other than the fact that I posted at 3am).

    So obviously this would not be 2::Monsters, it would be 1::Monsters.
    But for the sake of discussion, pretend that "Monsters" was a word with
    2 n's in it ;-)

    - J
     
    Jason, Jul 26, 2006
    #2

  3. Jason

    Guest

    Jason wrote:

    > $founddata{'m'} = 1::Men|1::Women|1::Monsters;
    > $founddata{'n'} = 1::Men|1::Women|1::Children|2::Monsters;
    >
    >
    > Now, how do I compare the two keys "m" and "n" to both remove any value
    > that's not in both (ie, Children), and then add the rest together to
    > create something like 2::Men|2::Women|3::Monsters?
    >
    > In my mind, I would take the result of this, split it by | into an
    > array, sort it so that Monsters is first, then split it by :: to remove
    > the numbers (leaving the final values sorted by the number of
    > instances), unless you guys know an easier route.


    I think you are making this harder than it needs to be (and less
    efficient, too).

    A common mistake, especially among less experienced programmers, is to
    try to build up a bunch of stuff and then post-process it (this is
    often manifested by slurping a file into an array and then looping over
    the array, which is almost always silly). In your case, you are taking
    multiple input values and looping them against your data list to build
    up individual arrays, and then trying to post-process those arrays to
    get meaningful data. This is rather inefficient, because you must loop
    over your data list multiple times (which is backwards, because your
    data list is presumably large and your input list is presumably small).

    Instead, turn it inside-out. Don't construct outside loops over your
    input lists. Instead, loop over your data list (once). For each item
    in the data list, calculate the value of its match to your criteria
    (and ignore that element if it fails to meet your criteria).

    This way, you fully resolve (or ignore) the value of each item in @data
    one element at a time. Thus, you never need to post-process anything,
    because everything you want to know is known when you finish the
    outside loop around @data.

    Consider this code:

    #!/usr/bin/perl
    use strict; use warnings;

    my @data = qw{ Men Women Children Pets Monnsters };
    my %keywords = ( 0 => 'm', 1 => 'n');
    my @results = ();

    WORD:
    foreach my $word (@data) {                           # outside loop around word database
        my $count = 0;
        foreach my $keyword ( values %keywords ) {       # inside loop
            next WORD unless $word =~ /$keyword/i;       # fail criteria: reject
            while ($word =~ /$keyword/ig) { $count++ }   # perldoc -q count
        }
        push @results, "${count}::$word";
    }
    print join("|", @results), "\n";

    __END__
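
    Jason's original plan ended with sorting by count and then stripping
    the numbers. A minimal sketch of that last step, assuming "count::word"
    strings like the ones the script above prints. The helper name
    rank_words is hypothetical, and ties are broken alphabetically only
    so the order is predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: takes "count::word" strings and returns the words
# ordered by descending count, then alphabetically for equal counts.
sub rank_words {
    my @pairs = map { [ split /::/, $_, 2 ] } @_;    # each pair is [count, word]
    return map  { $_->[1] }
           sort { $b->[0] <=> $a->[0] or $a->[1] cmp $b->[1] } @pairs;
}

my @ranked = rank_words('2::Men', '2::Women', '3::Monnsters');
print join('|', @ranked), "\n";    # Monnsters|Men|Women
```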


    --
    David Filmer (http://DavidFilmer.com)
     
    , Jul 26, 2006
    #3
  4. Jason

    Dr.Ruud Guest

    wrote:

    > push @results, "${count}::$word";


    OR-Variant:

    push(@results, "${count}::$word")
    if $count ;

    AND-Variant:

    push(@results, "${count}::$word")
    if $count == keys %keywords ;

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Jul 26, 2006
    #4
  5. Jason

    Tad McClellan Guest

    Jason <> wrote:

    > Originally, the script was simple. Take a keyword entered by a form,
    > compare it to each value of an array (@data), and return any value
    > containing the entry:
    >
    > $var = param('keyword');
    > foreach $key (@data) {
    > if ($key =~ /$var/i) { push (@founddata, $key); }
    > }
    >
    >
    > But now I'm trying to allow for multi-word phrases, which is a bit more
    > complex. I couldn't find how others are doing it,



    Do just what you are already doing (but a hash is more natural for
    representing a set), but do it once for each term in the multi-word phrase.

    You then have N sets for your N-term phrase.

    Find the intersection of the N sets, and you have the answer.

    perldoc -q intersection

    How do I compute the difference of two arrays? How do I compute the
    intersection of two arrays?
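
    A rough sketch of the set-intersection idea, loosely in the style of
    that FAQ answer: build one set per term, tally how many sets each word
    appears in, and keep the words whose tally equals the number of terms.
    The sub name words_matching_all and the sample data are assumptions
    made for illustration. \Q...\E is added so that the user's input is
    matched literally, which the code earlier in the thread does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the entries of @$data that contain every term.
sub words_matching_all {
    my ($data, $terms) = @_;
    my %seen_in;    # word => number of term sets that contain it
    for my $term (@$terms) {
        # One set per term: the words that contain it (case-insensitive).
        my %set = map { $_ => 1 } grep { /\Q$term\E/i } @$data;
        $seen_in{$_}++ for keys %set;
    }
    # A word is in the intersection if it appeared in all N sets.
    return grep { ($seen_in{$_} || 0) == @$terms } @$data;
}

my @data = qw{ Men Women Children Pets Monsters };
print join('|', words_matching_all(\@data, [qw{ m n }])), "\n";    # Men|Women|Monsters
```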



    > $var =~ s/(?:,|'|\.)//g; # Remove comma, apostrophe, or period



    $var =~ tr/,'.//d; # Remove comma, apostrophe, or period
    # only clearer and faster
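
    A quick check that the two forms strip the same characters, using a
    made-up input string:

```perl
use strict;
use warnings;

my $input = "Dr. O'Neil, Jr.";    # made-up sample input

# Copy the input, then strip the punctuation from each copy.
(my $via_regex = $input) =~ s/(?:,|'|\.)//g;
(my $via_tr    = $input) =~ tr/,'.//d;

print "$via_regex\n";             # Dr ONeil Jr
print "same result\n" if $via_regex eq $via_tr;
```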


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jul 26, 2006
    #5
  6. Jason

    Ted Zlatanov Guest

    On 26 Jul 2006, wrote:

    Jason <> wrote:
    >
    >> Originally, the script was simple. Take a keyword entered by a form,
    >> compare it to each value of an array (@data), and return any value
    >> containing the entry:
    >>
    >> $var = param('keyword');
    >> foreach $key (@data) {
    >> if ($key =~ /$var/i) { push (@founddata, $key); }
    >> }
    >>
    >>
    >> But now I'm trying to allow for multi-word phrases, which is a bit more
    >> complex. I couldn't find how others are doing it,

    >
    >
    > Do just what you are already doing (but a hash is more natural for
    > representing a set), but do it once for each term in the multi-word phrase.
    >
    > You then have N sets for your N-term phrase.
    >
    > Find the intersection of the N sets, and you have the answer.
    >
    > perldoc -q intersection
    >
    > How do I compute the difference of two arrays? How do I compute the
    > intersection of two arrays?


    The OP may want the union of the sets, based on his imprecise
    specifications :)

    Ted
     
    Ted Zlatanov, Jul 26, 2006
    #6
  7. Jason

    Guest

    Dr.Ruud wrote:
    > wrote:
    > > push @results, "${count}::$word";

    >
    > OR-Variant:
    > push(@results, "${count}::$word")
    > if $count ;
    > AND-Variant:
    > push(@results, "${count}::$word")
    > if $count == keys %keywords ;


    The push statement in my code can never be executed unless every
    keyword was found at least once in the current data element (as per the
    OP's specification), so no condition is needed on the push as I coded
    it. (Interestingly, I sat in on a great presentation at OSCON that
    Geoffrey Young gave just a few hours ago on Devel::Cover, which
    discussed how that module can help identify redundant secondary
    checks.)

    However, the AND variant might not produce correct results. The OP
    stipulated that all occurrences of each search term be counted within
    the data space, so $count could easily exceed keys(), and keys() might
    be equal to $count even though not every term was found in the data
    element (because one term matched more than once).
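
    To illustrate the distinction, here is a sketch that keeps two tallies
    per word: how many distinct keywords matched, and how many matches
    occurred in total. Comparing the first tally against the number of
    keywords gives a working AND test without the early "next". The sub
    name score_words is hypothetical, and \Q...\E is an addition that
    makes the keywords match literally.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "total::word" for each word in which
# every keyword matched at least once.
sub score_words {
    my ($data, $keywords) = @_;
    my @results;
    for my $word (@$data) {
        my ($matched, $total) = (0, 0);
        for my $keyword (@$keywords) {
            # Count all case-insensitive matches of this keyword in the word.
            my $hits = () = $word =~ /\Q$keyword\E/ig;
            $matched++ if $hits;    # distinct keywords found
            $total += $hits;        # total occurrences
        }
        push @results, "${total}::$word" if $matched == @$keywords;
    }
    return @results;
}

my @scored = score_words([qw{ Men Women Children Pets Monnsters }], [qw{ m n }]);
print join('|', @scored), "\n";    # 2::Men|2::Women|3::Monnsters
```

    In this sketch, comparing $total against the keyword count instead
    would wrongly reject "Monnsters" (3 matches, 2 keywords) and wrongly
    accept a word such as "Nun" (2 matches for n, none for m).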

    --
    David Filmer (http://DavidFilmer.com)
     
    , Jul 27, 2006
    #7
  8. Jason

    -berlin.de Guest

    Ted Zlatanov <> wrote in comp.lang.perl.misc:
    > On 26 Jul 2006, wrote:
    >
    > Jason <> wrote:
    > >
    > >> Originally, the script was simple. Take a keyword entered by a form,
    > >> compare it to each value of an array (@data), and return any value
    > >> containing the entry:
    > >>
    > >> $var = param('keyword');
    > >> foreach $key (@data) {
    > >> if ($key =~ /$var/i) { push (@founddata, $key); }
    > >> }
    > >>
    > >>
    > >> But now I'm trying to allow for multi-word phrases, which is a bit more
    > >> complex. I couldn't find how others are doing it,

    > >
    > >
    > > Do just what you are already doing (but a hash is more natural for
    > > representing a set), but do it once for each term in the multi-word phrase.
    > >
    > > You then have N sets for your N-term phrase.
    > >
    > > Find the intersection of the N sets, and you have the answer.
    > >
    > > perldoc -q intersection
    > >
    > > How do I compute the difference of two arrays? How do I compute the
    > > intersection of two arrays?

    >
    > The OP may want the union of the sets, based on his imprecise
    > specifications :)


    Maybe. The answer is still sufficient. The union isn't in the title,
    but the faq explains how to compute it.
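
    For the union reading, a minimal sketch: keep any word that matches at
    least one term. The sub name words_matching_any and the sample data
    are assumptions for illustration, and \Q...\E is added so the terms
    are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the entries of @$data that contain
# at least one of the terms (case-insensitive).
sub words_matching_any {
    my ($data, $terms) = @_;
    return grep {
        my $word = $_;
        # Number of terms found in this word; non-zero means keep it.
        scalar grep { $word =~ /\Q$_\E/i } @$terms;
    } @$data;
}

my @any = words_matching_any([qw{ Men Women Children Pets Monsters }], [qw{ m n }]);
print join('|', @any), "\n";    # Men|Women|Children|Monsters
```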

    Anno
     
    -berlin.de, Jul 27, 2006
    #8
  9. Jason

    Dr.Ruud Guest

    wrote:
    > Dr.Ruud:
    >> :


    >>> push @results, "${count}::$word";

    >>
    >> OR-Variant:
    >> push(@results, "${count}::$word")
    >> if $count ;
    >> AND-Variant:
    >> push(@results, "${count}::$word")
    >> if $count == keys %keywords ;

    >
    > The push statement in my code could never be executed unless the
    > keyword was found at least once in all data elements (as per the OP's
    > specification) so it would not be necessary to use conditional
    > expressions on the push as I coded it (interestingly, I sat in on a
    > great presentation at OSCON that Geoffrey Young gave just a few hours
    > ago on Devel::Cover which discussed how that module can help identify
    > redundant secondary checks).
    >
    > However, the AND variant might not produce correct results. The OP
    > stipulated that all occurrences of each search term be counted within
    > the data space, so $count could easily exceed keys(), and keys() might
    > be equal to $count even though the term was not found in all data
    > terms (because some terms matched more than once).


    Oops yes, I overlooked the while.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Jul 27, 2006
    #9