A hash or array of regexp's?

Discussion in 'Perl Misc' started by Tim Shoppa, Mar 28, 2005.

  1. Tim Shoppa

    Tim Shoppa Guest

    I often find myself with a list of things that I'm searching for. And
    for each of the things I'm searching for, there's an action I want to
    do.

    Sometimes the "search for" pattern is just the first four characters in
    the line, for example. Here things are easy: I build a hash with the
    key being the four-character pattern, and the value being the
    subroutine to execute. Works very nicely: get each line, use a
    substr() to extract the first four characters, look them up in the
    hash, and execute the correct subroutine. Very quick, very fast, very
    idiomatic.
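
    A minimal sketch of the prefix-keyed dispatch table described above. The
    prefixes, the handler bodies, and the name dispatch_line are made up for
    illustration; only the substr() plus hash-lookup shape comes from the
    description.

```perl
use strict;
use warnings;

# Dispatch table: four-character prefix => handler.
# Prefixes and handlers are hypothetical examples.
my %handler_for = (
    'HEAD' => sub { "header: $_[0]" },
    'DATA' => sub { "data: $_[0]" },
);

# Look up the handler by the first four characters of the line.
# Returns undef when no handler is registered for that prefix.
sub dispatch_line {
    my ($line) = @_;
    my $handler = $handler_for{ substr( $line, 0, 4 ) };
    return $handler ? $handler->($line) : undef;
}

my $out = dispatch_line('HEAD v1');
print "$out\n";
```

    Each line costs one substr() and one hash lookup, regardless of how many
    prefixes are registered.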

    But other times the patterns are not so easily handled. Often they are
    true regexp's, matching variable repeats/patterns. This of course can
    be handled with if matches and blocks to do the actions, but this
    screams out to me as something that I ought to be able to handle using
    a data structure which is something like a hash, using regexp's as
    keys.

    Pages 193/194 of the Camel book reveal how to loop over a bunch of
    precompiled regexp's, using qr// to precompile the regexp's, and this
    isn't bad. But it's not quite the same as a hash lookup. And it seems
    to me that there ought to be an idiom, maybe a CPAN module, that makes
    the whole operation look more like a hash lookup, because that's how I
    think of it in my head, even though I know that regexp's aren't really
    as quick or efficient as simple keys.

    So, is there a common perl idiom for dealing with this situation?
    Maybe a CPAN module?

    Tim.
    Tim Shoppa, Mar 28, 2005
    #1

  2. Tim Shoppa

    Guest

    "Tim Shoppa" <> wrote:
    > I often find myself with a list of things that I'm searching for. And
    > for each of the things I'm searching for, there's an action I want to
    > do.
    >
    > Sometimes the "search for" pattern is just the first four characters in
    > the line, for example. Here things are easy: I build a hash with the
    > key being the four-character pattern, and the value being the
    > subroutine to execute. Works very nicely: get each line, use a
    > substr() to extract the first four characters, look them up in the
    > hash, and execute the correct subroutine. Very quick, very fast, very
    > idiomatic.
    >
    > But other times the patterns are not so easily handled. Often they are
    > true regexp's, matching variable repeats/patterns. This of course can
    > be handled with if matches and blocks to do the actions, but this
    > screams out to me as something that I ought to be able to handle using
    > a data structure which is something like a hash, using regexp's as
    > keys.
    >
    > Pages 193/194 of the Camel book reveal how to loop over a bunch of
    > precompiled regexp's, using qr// to precompile the regexp's, and this
    > isn't bad. But it's not quite the same as a hash lookup. And it seems
    > to me that there ought to be an idiom, maybe a CPAN module, that makes
    > the whole operation look more like a hash lookup, because that's how I
    > think of it in my head, even though I know that regexp's aren't really
    > as quick or efficient as simple keys.


    Also, any given string can match many different regexes, while there is
    exactly one hash key it can match. Trying to munge such a situation into a
    hash-like idiom seems very misleading and just asking for trouble.

    I'd just use an array of arrays, with each inner array being of length 2,
    a regex/action pair.
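
    One way this could look, as a rough sketch. The patterns, the actions,
    and the names run_all_matches and run_first_match are invented for the
    example. It also shows the point above: one string can match several
    regexes, so the caller has to choose between "all matches" and "first
    match".

```perl
use strict;
use warnings;

# Ordered list of [regex, action] pairs; patterns and actions are illustrative.
my @rules = (
    [ qr/^\d+$/     => sub { "integer: $_[0]" } ],
    [ qr/^\d/       => sub { "starts with a digit: $_[0]" } ],
    [ qr/^[a-z]+$/i => sub { "word: $_[0]" } ],
);

# Run every action whose regex matches; a string may trigger more than one.
sub run_all_matches {
    my ($string) = @_;
    return map { $_->[1]->($string) } grep { $string =~ $_->[0] } @rules;
}

# Run only the first matching action, in list order.
sub run_first_match {
    my ($string) = @_;
    for my $rule (@rules) {
        my ( $re, $action ) = @$rule;
        return $action->($string) if $string =~ $re;
    }
    return;    # nothing matched
}

# '42' matches the first two rules, so two lines are printed.
print "$_\n" for run_all_matches('42');
```

    Because the pairs live in an array, the order of the rules is explicit,
    which matters as soon as patterns overlap.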

    Xho

    , Mar 28, 2005
    #2

  3. * Tim Shoppa wrote:

    > I often find myself with a list of things that I'm searching for. And
    > for each of the things I'm searching for, there's an action I want to
    > do.
    >
    > Sometimes the "search for" pattern is just the first four characters in
    > the line, for example. Here things are easy: I build a hash with the
    > key being the four-character pattern, and the value being the
    > subroutine to execute. Works very nicely: get each line, use a
    > substr() to extract the first four characters, look them up in the
    > hash, and execute the correct subroutine. Very quick, very fast, very
    > idiomatic.
    >
    > But other times the patterns are not so easily handled. Often they are
    > true regexp's, matching variable repeats/patterns. This of course can
    > be handled with if matches and blocks to do the actions, but this
    > screams out to me as something that I ought to be able to handle using
    > a data structure which is something like a hash, using regexp's as
    > keys.
    >
    > So, is there a common perl idiom for dealing with this situation?


    I would do this with a flat array that holds a regex at every even index
    and its callback at the odd index that follows, then iterate over the
    array while skipping the callback elements.

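    #!/usr/bin/perl
    use strict;
    use warnings;

    # Flat list of pairs: regex at even index, callback at the next odd index.
    my @array = (
        qr/(line\s(\d)\2)/ => sub { print "match: $1\n" },
        # ...
    );

    while ( <DATA> ) {
        for my $i ( 0 .. $#array ) {
            next if $i % 2;    # skip the callbacks at odd indices
            my ( $re, $sub ) = @array[ $i, $i + 1 ];
            $sub->() if $_ =~ $re;    # callback; $1 is still set from this match
        }
    }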
    __DATA__
    line 10
    line 11
    line 12


    >
    > Maybe a CPAN module?


    The module Tie::HashRef works around the problem of stringified hash
    keys. Perhaps it accepts a reference to a regex as a key; the
    documentation doesn't say, and I haven't tried it yet.

    regards,
    fabian
    Fabian Pilkowski, Mar 29, 2005
    #3
  4. Tim Shoppa

    Tim Shoppa Guest

    Fabian Pilkowski wrote:
    > The module Tie::HashRef works around the problem


    Thanks for the tip. It's not only a tied hash but also a useful
    object-oriented approach to looking for matches. It takes "qr//" forms
    directly as the key, with no need to stringify/destringify. And to answer
    the other reply, the approach taken ("first match") works fine for my
    purposes.

    I know it's not really a hash (with all the efficiencies that would be
    implied if it was) but I like to think in terms of a hash, and
    Tie::HashRef works wonderfully for this.
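
    For readers curious how "first match" lookup behind a hash interface can
    work, here is a small sketch built on Perl's tie mechanism. It is not the
    module named above and makes no claim about how that module is
    implemented; the package name FirstMatchHash, the patterns, and the
    actions are all hypothetical.

```perl
use strict;
use warnings;

package FirstMatchHash;    # hypothetical name, for illustration only

sub TIEHASH {
    my ($class) = @_;
    return bless { pairs => [] }, $class;
}

# Storing with a qr// key appends a [regex, value] pair; insertion order is kept.
sub STORE {
    my ( $self, $key, $value ) = @_;
    die "key must be a qr// object\n" unless ref($key) eq 'Regexp';
    push @{ $self->{pairs} }, [ $key, $value ];
    return;
}

# Fetching with a plain string returns the value of the first regex that matches.
sub FETCH {
    my ( $self, $string ) = @_;
    for my $pair ( @{ $self->{pairs} } ) {
        return $pair->[1] if $string =~ $pair->[0];
    }
    return;    # no regex matched
}

package main;

tie my %action_for, 'FirstMatchHash';
$action_for{ qr/^ERR\d+/ }   = sub { "error: $_[0]" };
$action_for{ qr/^WARN\s*:/ } = sub { "warning: $_[0]" };

for my $line ( 'ERR42 disk full', 'WARN: low battery', 'all fine' ) {
    my $action = $action_for{$line};
    my $result = $action ? $action->($line) : "no match: $line";
    print "$result\n";
}
```

    Every fetch is still a linear scan over the stored regexes, so this gives
    the look of a hash lookup without its constant-time cost.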

    Tim.
    Tim Shoppa, Mar 29, 2005
    #4
