Does Perl combine multiple REs into a single automaton?

C

Clint Olsen

Hi:

I posted earlier about how to speed up writing lexical analyzers in Perl.
With that effort in mind, I was curious to know if Perl combines multiple
patterns like:

if (/pat/) {
} elsif (/pat1/) {
....
} elsif (/pat2/) {
....
....
....
} else {
}

so that pat[\d]+ are in a sense combined via alternation with each branch
working like embedded action code?

The reason why I ask is that someone suggested I try to do this manually in
order to help speed up the pattern matching process (presumably using the
"(?{ code })" feature documented in perlre. Is it really faster to do it
this way?

When I'm in the debugger in Perl, I've noticed that the execution path gets
sort of muddied. I don't see Perl executing each match as a separate
statement. It's as if it jumps right to the code for the pattern match.
If that's the case, then there's not much of a compelling argument to embed
action code and cobmine REs.

Thanks,

-Clint
 
A

Anno Siegel

Clint Olsen said:
Hi:

I posted earlier about how to speed up writing lexical analyzers in Perl.
With that effort in mind, I was curious to know if Perl combines multiple
patterns like:

if (/pat/) {
} elsif (/pat1/) {
...
} elsif (/pat2/) {
...
...
...
} else {
}

so that pat[\d]+ are in a sense combined via alternation with each branch
working like embedded action code?

That sounds extremely unlikely.
The reason why I ask is that someone suggested I try to do this manually in
order to help speed up the pattern matching process (presumably using the
"(?{ code })" feature documented in perlre. Is it really faster to do it
this way?

Only benchmarks can answer that. Out of interest, I made up an example
along the lines you sketched up there. The straight if/elsif/else came
out 50% faster than a regex with embedded code.

Anno
 
C

Clint Olsen

No, that would be impossible. First of all, the different patterns may
contain different set of parens, so $1 and friends would need to be set
to different things. But more importantly, in the original code, if
'pat2' matches, but 'pat' and 'pat1' don't, $_ has been accessed three
times. And $_ could be tied.

Ok, that makes sense. I didn't think it would be possible to combine them.
It's just that Perl behind the scenes must be doing some sort of weird
execution for these if/else/branches since they are never 'visited' in the
debugger. It just immediately jumps to the code block.
It *may* be faster to write it as

if (/pat([12]?)/) {
if ($1 eq "") {...}
elsif ($1 eq "1") {...}
else {...}
}

You'd only use the regex engine was. However, you are using a more
complicated pattern, and that means the optimizer can do less. Which
might actually result in a slowdown. You'll have to benchmark to be sure.

I was wondering about that, too - Write a megapattern with capture buffers
and just test the capture buffers for which action to take...

FWIW, I just took the regular expression set for all the keywords of my
language and merged it with the reserved symbols into a larger pattern
separated by an alternation. Since the action code was identical, I
thought it would be a reasonable test. Unfortunately in my case, I didn't
notice any particular speed difference. As you said, this could be because
the pattern is slightly more complicated now or perhaps statistically
speaking the symbols just aren't seen often enough to make a difference...

Thanks,

-Clint
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,057
Latest member
KetoBeezACVGummies

Latest Threads

Top