perl efficiency -- fastest grepping?


Bryan Krone

I have a stream of data coming off a serial port at 19200 baud. I am wondering
what is the most efficient way to grep through the data in real time? I have
20 or so different strings I need to find, all of which are ~15 characters
or less. Currently I'm using code that looks like this:

forever loop
{
sysread the serial buffer into $newdata

if( defined $newdata )
{
        $inString =~ s/^.*(.{32})$/$1/o;
        $inString .= $newdata;
}



if( $inString =~ /.*ResetPF.*/o || $inString =~ /.*[gG][oO].*/o || $inString
=~ /.*reset.*/o || $inString =~ /.*sysinit.*/o )
{
        set some flag;
}
}
Is there a more efficient way to grep for the strings to set some flag? This
works pretty well but this is only 4 strings. I would like to add a lot
more but the program slows down after 10 or more strings. Any ideas would
be greatly appreciated.

Thanks
 

Peter Wyzl

Bryan Krone said:
I have a stream of data coming off a serial port at 19200. I am wondering
what is the most efficient way to grep through the data in realtime? I have
20 or so different strings I need to find. All of which are ~15 characters
or less. Currently I'm using code that looks like this:

No doubt others more qualified than I will comment as well, but a couple of
things...
forever loop
{
sysread the serial buffer into $newdata

if( defined $newdata )
{
$inString =~ s/^.*(.{32})$/$1/o;

Why are you using the 'o' switch on the regex? You have no variable being
interpolated, so it does nothing here.
$inString .= $newdata;


Anyway, I believe you will find substr to be significantly faster for this
operation, simply discarding everything except the last 32 characters in a
string.

$inString = substr( $inString, -32) . $newdata;

Read about that in perlfunc
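
A minimal sketch of that trimming step, for anyone who wants to try it outside the serial loop. The helper name append_chunk and the hard-coded chunks are made up for illustration; the real loop would pass in whatever sysread returned. The length check is only there so a short buffer is left untouched.

```perl
use strict;
use warnings;

# Keep at most the last $keep characters of the buffer, then append new data.
# $keep = 32 mirrors the window size used in the original post.
sub append_chunk {
    my ($buffer, $chunk, $keep) = @_;
    # Only trim when the buffer is longer than the window.
    $buffer = substr($buffer, -$keep) if length($buffer) > $keep;
    return $buffer . $chunk;
}

my $inString = '';
# Stand-in for successive sysread results; note "sysinit" arrives split
# across two chunks and is still found because the tail is kept.
for my $newdata ('x' x 40, 'sysin', 'it now') {
    $inString = append_chunk($inString, $newdata, 32);
}
print length($inString), ": $inString\n";
```

The window has to be at least as long as the longest search string minus one, otherwise a string split across two reads could be cut before it is matched.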

}



if( $inString =~ /.*ResetPF.*/o || $inString =~ /.*[gG][oO].*/o ||
$inString
=~ /.*reset.*/o || $inString =~ /.*sysinit.*/o )

Ooo!! Your regexen will be VERY inefficient because the leading and trailing
.* cause a lot of needless backtracking (especially with one at each end).
Since you only need to know whether the string occurs, you can drop both sets
of .* for a BIG performance boost (particularly across multiple regexen).
Again, you have the unnecessary 'o' switches, and that second regex can be
written using the 'i' switch (case insensitive).

Yielding:

if( $inString =~ /ResetPF/ || $inString =~ /go/i || $inString =~ /reset/ ||
$inString =~ /sysinit/ ){

I think you need to read up a bit more on regexes, particularly switches and
how the regex engine works.
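
To check that dropping the .* does not change which inputs match, here is a small sketch that runs both forms over a few sample strings. The sample strings and the sub names padded_match and plain_match are invented for the test; actual timings would need something like the core Benchmark module run against real serial data.

```perl
use strict;
use warnings;

# The original patterns, with .* on both sides.
sub padded_match {
    my ($s) = @_;
    return ( $s =~ /.*ResetPF.*/ || $s =~ /.*[gG][oO].*/
          || $s =~ /.*reset.*/   || $s =~ /.*sysinit.*/ ) ? 1 : 0;
}

# The simplified patterns suggested above.
sub plain_match {
    my ($s) = @_;
    return ( $s =~ /ResetPF/ || $s =~ /go/i
          || $s =~ /reset/   || $s =~ /sysinit/ ) ? 1 : 0;
}

my @samples = ( "xxResetPFyy", "line one\nGO now", "nothing here", "cold reset\n" );
my $disagreements = 0;
for my $s (@samples) {
    my ($old, $new) = ( padded_match($s), plain_match($s) );
    $disagreements++ if $old != $new;
    print "old=$old new=$new\n";
}
print "disagreements: $disagreements\n";
```

Because neither form is anchored, both simply ask whether the literal occurs somewhere in the string, so they should agree on every input.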

HTH
 

James Willmore

On Tue, 16 Nov 2004 05:57:25 -0600, Bryan Krone wrote:

if( $inString =~ /.*ResetPF.*/o || $inString =~ /.*[gG][oO].*/o || $inString
=~ /.*reset.*/o || $inString =~ /.*sysinit.*/o ) {
        set some flag;
}
}
Is there a more efficient way to grep for the strings to set some flag?
This works pretty well but this is only 4 strings. I would like to add a
lot more but the program slows down after 10 or more strings. Any ideas
would be greatly appreciated.

First, if you can do without the regular expressions, do so. You can use
either 'unpack' or 'split' and place the results into an array. Then you
can use 'grep' to find what you need.

Second, I'm going to throw this out here and see what happens.

If you can't get away from using regular expressions ... and because there
are *specific* matches to be performed ... and with each match there might
be a specific flag to be set (or action to be performed based upon the
match), I'd (maybe) use a lookup table. This method may or may not be any
better than the way you're doing it now. I haven't benchmarked it and ...
my benchmarks would be useless against what you're trying to do.

For example:

#!/usr/bin/perl

use strict;
use warnings;

my $inString = 'reset the switch now please';

my %lookup = (
    qr{ResetPF} => \&do_resetpf,
    qr{go}i     => \&do_go,
    qr{reset}   => \&do_reset,
    qr{sysinit} => \&do_sysinit,
);

while( my($key,$value) = each %lookup ) {
    if( $inString =~ $key) {
        $value->();
    }
}

sub do_resetpf {
    print "ResetPF matched\n";
}

sub do_go {
    print "GgOo matched\n";
}

sub do_reset {
    print "reset matched\n";
}

sub do_sysinit {
    print "sysinit matched\n";
}
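
One thing to be aware of with the table above: Perl hash keys are always strings, so the qr{} objects are stringified when stored and the match is done against that string, and the order in which each returns them is not predictable. A possible variation, sketched here as an assumption rather than a tested improvement, keeps the compiled patterns in an array of [pattern, handler] pairs. The @fired array is only there to make the result easy to inspect.

```perl
use strict;
use warnings;

my @fired;    # records which handlers ran, in table order

# Array of [compiled regex, handler] pairs; the qr// objects stay compiled
# and are tried in the order listed.
my @lookup = (
    [ qr/ResetPF/ => sub { push @fired, 'ResetPF' } ],
    [ qr/go/i     => sub { push @fired, 'go'      } ],
    [ qr/reset/   => sub { push @fired, 'reset'   } ],
    [ qr/sysinit/ => sub { push @fired, 'sysinit' } ],
);

sub dispatch {
    my ($text) = @_;
    for my $entry (@lookup) {
        my ($re, $handler) = @$entry;
        $handler->() if $text =~ $re;
    }
    return;
}

dispatch('reset the switch now please');
print "matched: @fired\n";
```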

HTH

Jim
 

Matija Papec


Bryan Krone said:
if( $inString =~ /.*ResetPF.*/o || $inString =~ /.*[gG][oO].*/o || $inString
=~ /.*reset.*/o || $inString =~ /.*sysinit.*/o )
{
        set some flag;
}
}
Is there a more efficient way to grep for the strings to set some flag? This

If you're checking against plain strings (ResetPF, reset..) you can speed up
things with perldoc -f index,

if (1+index($inString, "ResetPF") or ..) {}
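
Spelled out for a whole list of strings, the index approach might look like the sketch below. The sub name has_command and the split into @exact and @anycase lists are assumptions made for the example; "go" is handled by lowercasing the buffer once, since index itself is case sensitive.

```perl
use strict;
use warnings;

my @exact   = qw(ResetPF reset sysinit);   # must match with this exact case
my @anycase = qw(go);                      # may appear in any mix of case

# Returns 1 if any of the search strings occurs in the buffer, else 0.
sub has_command {
    my ($buf) = @_;
    for my $s (@exact) {
        return 1 if index($buf, $s) >= 0;
    }
    my $lower = lc $buf;                   # lowercase once, reuse for all
    for my $s (@anycase) {
        return 1 if index($lower, lc $s) >= 0;
    }
    return 0;
}

print has_command("xx GO yy"),   "\n";
print has_command("cold reset"), "\n";
print has_command("RESET"),      "\n";
```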
 

Uri Guttman

DD> And while those may be replaceable with index, if you can't do so in the
DD> general case, moving all the matches into a single regex can be
DD> significantly faster...

DD> if( $inString =~ /ResetPF|(?i:go)|reset|sysinit/ )

and alternation of lots of strings in a regex can be very slow as well.
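
For anyone who wants to measure the single-regex form against the alternatives, here is a sketch that builds the alternation from a list instead of typing it by hand. The list names are assumptions; quotemeta is used so that a search string containing regex metacharacters is still matched literally. As far as I know, Perl 5.10 and later also optimise alternations of literal strings internally, so the cost may differ from what was true when this thread was written. Benchmark on real data before relying on it.

```perl
use strict;
use warnings;

my @exact   = qw(ResetPF reset sysinit);   # case-sensitive literals
my @anycase = qw(go);                      # case-insensitive literals

# Escape each literal, join with |, and compile once outside the read loop.
my $exact_alt   = join '|', map { quotemeta } @exact;
my $anycase_alt = join '|', map { quotemeta } @anycase;
my $any         = qr/$exact_alt|(?i:$anycase_alt)/;

for my $buf ("xx Go yy", "warm sysinit", "RESET") {
    my $hit = $buf =~ $any ? 'match' : 'no match';
    print "$buf: $hit\n";
}
```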

the OP didn't give a proper spec for the problem IMO. if the string in
question has a token in a known place, the fastest way to check for it is
to grab it with a simple regex and then look it up in a hash. so the
data read from the serial line needs to be properly specified with some
way to define where this match string is located. then extraction should
be easy and a hash can be made of the desired strings.
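
A sketch of the extract-then-look-up idea, under a loudly stated assumption: the serial data is line oriented and the token of interest is the first whitespace-delimited word on each line. Nothing in the thread confirms that layout, and the names %flag_for and flag_for_line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical table mapping each known token to the flag it should set.
my %flag_for = (
    ResetPF => 'resetpf',
    reset   => 'reset',
    sysinit => 'sysinit',
    go      => 'go',
);

# ASSUMPTION: the token is the first word on the line.
# Returns the flag name, or undef if the line does not start with a known token.
sub flag_for_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\S+)/;          # grab the token
    my $tok = $1;
    return $flag_for{$tok} if exists $flag_for{$tok};
    return $flag_for{go}   if lc($tok) eq 'go';  # "go" in any case
    return;
}

for my $line ("sysinit 0x10", "GO", "please reset") {
    my $flag = flag_for_line($line);
    print "$line => ", ( defined $flag ? $flag : 'none' ), "\n";
}
```

The cost per line is one simple match plus one hash lookup, regardless of how many strings are in the table, which is the point of the suggestion.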

uri
 
