A hash or array of regexp's?

Tim Shoppa · Mar 28, 2005

I often find myself with a list of things that I'm searching for. And
for each of the things I'm searching for, there's an action I want to
do.

Sometimes the "search for" pattern is just the first four characters in
the line, for example. Here things are easy: I build a hash with the
key being the four-character pattern, and the value being the
subroutine to execute. Works very nicely: get each line, use a
substr() to extract the first four characters, look them up in the
hash, and execute the correct subroutine. Very quick, very fast, very
idiomatic.
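
The fixed-prefix case described above might look like the following minimal sketch. The four-character tags (`HDR `, `DATA`) and the `handle_line` helper are hypothetical names chosen for illustration; they do not come from the post.

```perl
use strict;
use warnings;

# Hypothetical four-character record tags and handlers (illustration only).
my %dispatch = (
    'HDR ' => sub { return "header: $_[0]" },
    'DATA' => sub { return "data: $_[0]" },
);

# Look up the handler by the first four characters of the line.
sub handle_line {
    my ($line) = @_;
    my $key     = substr( $line, 0, 4 );
    my $handler = $dispatch{$key}
        or return;    # no handler registered for this prefix
    return $handler->($line);
}

print handle_line('DATA 1 2 3'), "\n";
```

Because the key is an exact string, each line maps to at most one handler, and the lookup cost does not grow with the number of handlers.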

But other times the patterns are not so easily handled. Often they are
true regexp's, matching variable repeats/patterns. This of course can
be handled with if matches and blocks to do the actions, but this
screams out to me as something that I ought to be able to handle using
a data structure which is something like a hash, using regexp's as
keys.

Pages 193/194 of the Camel book reveal how to loop over a bunch of
precompiled regexp's, using qr// to precompile the regexp's, and this
isn't bad. But it's not quite the same as a hash lookup. And it seems
to me that there ought to be an idiom, maybe a CPAN module, that makes
the whole operation look more like a hash lookup, because that's how I
think of it in my head, even though I know that regexp's aren't really
as quick or efficient as simple keys.

So, is there a common perl idiom for dealing with this situation?
Maybe a CPAN module?

Tim.

xhoster · Mar 28, 2005

Tim Shoppa said:
> I often find myself with a list of things that I'm searching for. And
> for each of the things I'm searching for, there's an action I want to
> do.
>
> Sometimes the "search for" pattern is just the first four characters in
> the line, for example. Here things are easy: I build a hash with the
> key being the four-character pattern, and the value being the
> subroutine to execute. Works very nicely: get each line, use a
> substr() to extract the first four characters, look them up in the
> hash, and execute the correct subroutine. Very quick, very fast, very
> idiomatic.
>
> But other times the patterns are not so easily handled. Often they are
> true regexp's, matching variable repeats/patterns. This of course can
> be handled with if matches and blocks to do the actions, but this
> screams out to me as something that I ought to be able to handle using
> a data structure which is something like a hash, using regexp's as
> keys.
>
> Pages 193/194 of the Camel book reveal how to loop over a bunch of
> precompiled regexp's, using qr// to precompile the regexp's, and this
> isn't bad. But it's not quite the same as a hash lookup. And it seems
> to me that there ought to be an idiom, maybe a CPAN module, that makes
> the whole operation look more like a hash lookup, because that's how I
> think of it in my head, even though I know that regexp's aren't really
> as quick or efficient as simple keys.

Also, any given string can match many different regexes, while there is
exactly one hash key it can match. Trying to munge such a situation into a
hash-like idiom seems very misleading and just asking for trouble.

I'd just use an array of arrays, with each inner array being of length 2,
a regex/action pair.
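
A minimal sketch of that array-of-arrays layout, assuming made-up patterns and actions (`@rules` and `apply_rules` are illustrative names, not from the thread). It runs every rule that matches, which also shows how one string can hit more than one regex:

```perl
use strict;
use warnings;

# Each inner array is a [ regex, action ] pair; contents are illustrative.
my @rules = (
    [ qr/^\s*#/      => sub { return 'comment' } ],
    [ qr/^(\w+)\s*=/ => sub { return "assignment to $_[0]" } ],
    [ qr/=/          => sub { return 'contains an equals sign' } ],
);

# Run every matching rule in order and collect the results.
sub apply_rules {
    my ($line) = @_;
    my @results;
    for my $rule (@rules) {
        my ( $re, $action ) = @$rule;
        # In list context a successful match returns its captures,
        # or (1) when the pattern has no capture groups.
        if ( my @captures = $line =~ $re ) {
            push @results, $action->(@captures);
        }
    }
    return @results;
}

print "$_\n" for apply_rules('name = value');
```

Adding `last` after the `push` would turn this into a first-match-wins dispatcher, with the array order deciding priority.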

Xho

Fabian Pilkowski · Mar 29, 2005

* Tim Shoppa said:
> I often find myself with a list of things that I'm searching for. And
> for each of the things I'm searching for, there's an action I want to
> do.
>
> Sometimes the "search for" pattern is just the first four characters in
> the line, for example. Here things are easy: I build a hash with the
> key being the four-character pattern, and the value being the
> subroutine to execute. Works very nicely: get each line, use a
> substr() to extract the first four characters, look them up in the
> hash, and execute the correct subroutine. Very quick, very fast, very
> idiomatic.
>
> But other times the patterns are not so easily handled. Often they are
> true regexp's, matching variable repeats/patterns. This of course can
> be handled with if matches and blocks to do the actions, but this
> screams out to me as something that I ought to be able to handle using
> a data structure which is something like a hash, using regexp's as
> keys.
>
> So, is there a common perl idiom for dealing with this situation?

I would do this with a flat array that holds a regex at every even index
and its callback at the following odd index, then iterate over the array
while skipping the callback elements.

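#!/usr/bin/perl
use strict;
use warnings;

# Flat list of regex => callback pairs; the array keeps them in order.
my @array = (
    qr/(line\s(\d)\2)/ => sub { print "match: $1\n" },
    # ...
);

while ( my $line = <DATA> ) {
    # Visit only the even indices; the callback sits at $i + 1.
    for ( my $i = 0; $i < @array; $i += 2 ) {
        my ( $re, $sub ) = @array[ $i, $i + 1 ];
        $sub->() if $line =~ $re;    # callback still sees $1
    }
}
__DATA__
line 10
line 11
line 12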

> Maybe a CPAN module?

The module Tie::HashRef works around the problem of stringified hash
keys. Perhaps it accepts a reference to a regex as a key; the
documentation doesn't say, and I haven't tried it yet.

regards,
fabian

Tim Shoppa · Mar 29, 2005

Fabian said:
> The module Tie::HashRef works around the problem

Thanks for the tip, it's not only a tied hash but also a useful
object-oriented approach to looking for matches. It takes "qr//" forms
directly as the key, no need to stringify/destringify. And to answer the
other reply, the approach taken ("first match") works fine for my
purposes.

I know it's not really a hash (with all the efficiencies that would be
implied if it was) but I like to think in terms of a hash, and
Tie::HashRef works wonderfully for this.
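
To illustrate the "first match" behaviour behind a hash-like interface, here is a minimal sketch of a tied hash built on Perl's tie mechanism. The package name `FirstMatchHash` is hypothetical, and this is not the implementation of the module mentioned above, only one way such a lookup could work.

```perl
use strict;
use warnings;

# Hypothetical package; a sketch, not the module discussed in the thread.
package FirstMatchHash;

sub TIEHASH { my ($class) = @_; return bless { rules => [] }, $class }

# Remember each pattern/value pair in insertion order.
sub STORE {
    my ( $self, $re, $value ) = @_;
    push @{ $self->{rules} }, [ $re, $value ];
}

# Scan the patterns in insertion order and return the first hit.
sub FETCH {
    my ( $self, $string ) = @_;
    for my $rule ( @{ $self->{rules} } ) {
        return $rule->[1] if $string =~ $rule->[0];
    }
    return;
}

sub EXISTS { my ( $self, $string ) = @_; return defined $self->FETCH($string) }

package main;

tie my %action, 'FirstMatchHash';
$action{ qr/^\d+$/ } = sub { return "number $_[0]" };
$action{ qr/^\w+$/ } = sub { return "word $_[0]" };

for my $token (qw(42 hello)) {
    my $handler = $action{$token} or next;
    print $handler->($token), "\n";
}
```

Every lookup is a linear scan over the stored patterns, so it reads like a hash but costs like a loop. Iterating with `keys` or `each` would additionally need `FIRSTKEY` and `NEXTKEY`, which this sketch leaves out.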

Tim.
