Fastest way to find a match?

Discussion in 'Perl Misc' started by bukzor, Mar 12, 2008.

  1. bukzor

    bukzor Guest

    Hi,

    I'm trying to find the fastest way in perl to see if a name contains
    another.

    I've a list of 2704 names (aka "A")

    I've another name (aka "B")

    I need to know if any of A is contained in B.

    A = foo foo1 foo2 foo3 foo45 ....
    B = INCASE_foo2_YOUWANT
    is a match

    B = INCASE_YOURDONOTWANT
    is not a match.

    what would be the fastest way to check the 2704 possible values of
    "A" ?

    Thanks,


    so far, I'm using

    foreach $t (keys %A) {
    $v = $B;
    $v = s/$t//;
    if ($v ne $B) {
    print "MATCH"
    }
    }
     
    bukzor, Mar 12, 2008
    #1
    1. Advertising

  2. bukzor <> writes:

    > Hi,
    >
    > I'm trying to find the fastest way in perl to see if a name contains
    > another.
    >
    > I've a list of 2704 names (aka "A")
    >
    > I've another name (aka "B")


    <snip>

    > foreach $t (keys %A) {
    > $v = $B;
    > $v = s/$t//;
    > if ($v ne $B) {
    > print "MATCH"
    > }
    > }


    Did you notice that doesn't work? $v = s/../../; assignes the number of
    replacements of $_ into $v.

    For an unordered list of strings of lengths >= 1, the fastest check
    would probably be to use the index() function, though it may not be that
    much more or even less fast than a regex match (NOT a replace!).

    See perldoc -f index and perldoc perlretut.

    --
    Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/
     
    Joost Diepenmaat, Mar 12, 2008
    #2
    1. Advertising

  3. bukzor

    Ben Morrow Guest

    Quoth Joost Diepenmaat <>:
    > bukzor <> writes:
    >
    > > I'm trying to find the fastest way in perl to see if a name contains
    > > another.
    > >
    > > I've a list of 2704 names (aka "A")
    > >
    > > I've another name (aka "B")

    >
    > <snip>
    >
    > > foreach $t (keys %A) {
    > > $v = $B;
    > > $v = s/$t//;
    > > if ($v ne $B) {
    > > print "MATCH"
    > > }
    > > }

    >
    > Did you notice that doesn't work? $v = s/../../; assignes the number of
    > replacements of $_ into $v.
    >
    > For an unordered list of strings of lengths >= 1, the fastest check
    > would probably be to use the index() function, though it may not be that
    > much more or even less fast than a regex match (NOT a replace!).


    For finding one name, index is best. For finding many, probably best is
    something like

    # this makes things considerably faster, but make sure your strings
    # are all in the same Unicode state first. If necessary, Encode them
    # all to UTF8.
    use bytes;

    for my $B (@B) {
    study $B;
    for my $A (@A) {
    $B =~ /$A/ and print "MATCH";
    }
    }

    as 'study' only remembers the last string studied. An alternative would
    be to build a big regex from all the search strings joined with '|', but
    that would probably end up slower due to having such a large compiled
    regex.

    Ben
     
    Ben Morrow, Mar 13, 2008
    #3
  4. bukzor

    John Bokma Guest

    bukzor <> wrote:

    > Hi,
    >
    > I'm trying to find the fastest way in perl to see if a name contains
    > another.
    >
    > I've a list of 2704 names (aka "A")
    >
    > I've another name (aka "B")
    >
    > I need to know if any of A is contained in B.
    >
    > A = foo foo1 foo2 foo3 foo45 ....
    > B = INCASE_foo2_YOUWANT


    As usual, it's hard to say without seeing some real examples of "A" and
    "B". Are the parts in B always separated by _ ?

    --
    John

    http://johnbokma.com/
     
    John Bokma, Mar 13, 2008
    #4
  5. bukzor <> wrote:
    >Hi,
    >
    >I'm trying to find the fastest way in perl to see if a name contains
    >another.
    >
    >I've a list of 2704 names (aka "A")
    >
    >I've another name (aka "B")
    >
    >I need to know if any of A is contained in B.
    >
    >A = foo foo1 foo2 foo3 foo45 ....
    >B = INCASE_foo2_YOUWANT
    >is a match
    >
    >B = INCASE_YOURDONOTWANT
    >is not a match.
    >
    >what would be the fastest way to check the 2704 possible values of
    >"A" ?
    >
    >Thanks,
    >
    >
    >so far, I'm using
    >
    >foreach $t (keys %A) {
    > $v = $B;
    > $v = s/$t//;
    > if ($v ne $B) {


    What does the string value of the number of matches have to do with the
    original text of $B? This condition will always succeed unless $t and $B are
    both '1'.

    Maybe you meant to test the result of s/// directly?
    if ($v =~ s/$t/) {

    However, why do a s/// and awkwardly restore $v for each iteration in the
    first place? A simple
    if ($B =~ m/$_/) {
    will do without all the temporary assignments, which cost time!


    Having said that I strongly believe your code isn't doing what you think
    it's doing in the first place. Initially you wrote
    "if any of A is contained in B"
    But your code is testing if B is a regular expression that matches any of A.
    That is something very different.

    I would imagine a simple index() is what you are looking for

    foreach (keys %A) {
    if (index($B, $_) > -1) {
    print "FOUND";
    }
    }

    This is probably also the fastest method, but you may want to run some
    benchmarks.

    jue
     
    Jürgen Exner, Mar 13, 2008
    #5
  6. On Wed, 12 Mar 2008 16:34:08 -0700 (PDT), bukzor
    <> wrote:

    >I'm trying to find the fastest way in perl to see if a name contains
    >another.

    [...]
    >so far, I'm using
    >
    >foreach $t (keys %A) {
    > $v = $B;
    > $v = s/$t//;
    > if ($v ne $B) {


    Funniest way I've seen thus far to check for a match!

    > print "MATCH"


    How 'bout

    use List::Util 'first';

    # ...

    my @rx=map qr/\Q$_\E/ => @A;
    say "match!" if first { $B ~~ $_ } @rx;

    ?

    L::U is XS and should be pretty fast.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Mar 13, 2008
    #6
  7. bukzor

    bukzor Guest

    Thanks everyone for your thoughtful posts. The OP was psuedocode and
    not to be taken literally.

    The piece to check is indeed always surrounded by underscores. Here is
    the solution from a coworker that I've implemented:


    What about using a hash table? You could put the 2704 "A" names in
    hash table. Then, break done "B" into all the possible parts that
    could match something in "A" and see if there is something in the hash
    table that matches.

    $hash{"foo"} = 1;
    $hash{"foo1"} = 1;
    ....

    For "INCASE_foo2_YOUWANT", there is only one possible matcher after
    you remove everything before the first underscore and after the last
    underscore, so just check the hash table for the existence of "foo2".

    For "INCASE_foo2_SOMETHING_YOUWANT", you could check both "foo2" and
    "SOMETHING".

    Wouldn't that be near instant?

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    before:
    51.670u 0.100s 0:57.76 89.6% 0+0k 0+0io 1033pf+0w
    after:
    0.560u 0.040s 0:00.58 103.4% 0+0k 0+0io 1033pf+0w
    I'm a happy camper...
     
    bukzor, Mar 13, 2008
    #7
  8. bukzor

    jm Guest

    Michele Dondi a écrit :
    > On Wed, 12 Mar 2008 16:34:08 -0700 (PDT), bukzor
    > <> wrote:
    >
    >> I'm trying to find the fastest way in perl to see if a name contains
    >> another.

    > [...]
    >> so far, I'm using
    >>
    >> foreach $t (keys %A) {
    >> $v = $B;
    >> $v = s/$t//;
    >> if ($v ne $B) {

    >
    > Funniest way I've seen thus far to check for a match!
    >
    >> print "MATCH"

    >
    > How 'bout
    >
    > use List::Util 'first';
    >
    > # ...
    >
    > my @rx=map qr/\Q$_\E/ => @A;
    > say "match!" if first { $B ~~ $_ } @rx;
    >
    > ?
    >
    > L::U is XS and should be pretty fast.
    >
    >
    > Michele


    I didnot find ~~ in man perlop.

    Is this a perl operator?
    or a L::U operator?
     
    jm, Mar 15, 2008
    #8
  9. bukzor

    Ben Morrow Guest

    Quoth jm <>:
    > Michele Dondi a écrit :
    > >
    > > use List::Util 'first';
    > >
    > > # ...
    > >
    > > my @rx=map qr/\Q$_\E/ => @A;
    > > say "match!" if first { $B ~~ $_ } @rx;
    > >
    > > ?
    > >
    > > L::U is XS and should be pretty fast.

    >
    > I didnot find ~~ in man perlop.
    >
    > Is this a perl operator?
    > or a L::U operator?


    It's the new 'smart match' operator in perl 5.10. In the case above,
    since $_ is always a regular expression, it is equivalent to =~; so

    print "match!\n" if first { $B =~ $_ } @rx;

    Ben
     
    Ben Morrow, Mar 15, 2008
    #9
  10. bukzor

    Dr.Ruud Guest

    bukzor schreef:

    > I'm trying to find the fastest way in perl to see if a name
    > contains another.
    >
    > I've a list of 2704 names (aka "A")
    >
    > I've another name (aka "B")
    >
    > I need to know if any of A is contained in B.


    Considered fgrep?

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Mar 16, 2008
    #10
  11. On Sat, 15 Mar 2008 19:10:57 +0100, jm <> wrote:

    >I didnot find ~~ in man perlop.
    >
    >Is this a perl operator?
    >or a L::U operator?


    A

    use 5.010;

    operator. I wanted to stay 5.10ish. Of course you can use =~ instead.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Mar 16, 2008
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. hiwa
    Replies:
    0
    Views:
    653
  2. Prateek
    Replies:
    11
    Views:
    1,162
    Prateek
    Apr 30, 2007
  3. Eugeny Myunster
    Replies:
    16
    Views:
    8,927
    user923005
    Apr 28, 2008
  4. XiaotingHua

    how to find the fastest cpan mirror site?

    XiaotingHua, Nov 2, 2004, in forum: Perl Misc
    Replies:
    1
    Views:
    185
    Anno Siegel
    Nov 2, 2004
  5. hofer
    Replies:
    5
    Views:
    125
    Ben Morrow
    Sep 12, 2008
Loading...

Share This Page