Does Perl combine multiple REs into a single automaton?

Discussion in 'Perl Misc' started by Clint Olsen, Jun 28, 2004.

  1. Clint Olsen

    Clint Olsen Guest

    Hi:

    I posted earlier about how to speed up writing lexical analyzers in Perl.
    With that effort in mind, I was curious to know if Perl combines multiple
    patterns like:

    if (/pat/) {
    } elsif (/pat1/) {
    ....
    } elsif (/pat2/) {
    ....
    ....
    ....
    } else {
    }

    so that pat[\d]+ are in a sense combined via alternation with each branch
    working like embedded action code?

    The reason why I ask is that someone suggested I try to do this manually in
    order to help speed up the pattern matching process (presumably using the
    "(?{ code })" feature documented in perlre. Is it really faster to do it
    this way?

    When I'm in the debugger in Perl, I've noticed that the execution path gets
    sort of muddied. I don't see Perl executing each match as a separate
    statement. It's as if it jumps right to the code for the pattern match.
    If that's the case, then there's not much of a compelling argument to embed
    action code and cobmine REs.

    Thanks,

    -Clint
    Clint Olsen, Jun 28, 2004
    #1
    1. Advertising

  2. Clint Olsen

    Anno Siegel Guest

    Clint Olsen <> wrote in comp.lang.perl.misc:
    > Hi:
    >
    > I posted earlier about how to speed up writing lexical analyzers in Perl.
    > With that effort in mind, I was curious to know if Perl combines multiple
    > patterns like:
    >
    > if (/pat/) {
    > } elsif (/pat1/) {
    > ...
    > } elsif (/pat2/) {
    > ...
    > ...
    > ...
    > } else {
    > }
    >
    > so that pat[\d]+ are in a sense combined via alternation with each branch
    > working like embedded action code?


    That sounds extremely unlikely.

    > The reason why I ask is that someone suggested I try to do this manually in
    > order to help speed up the pattern matching process (presumably using the
    > "(?{ code })" feature documented in perlre. Is it really faster to do it
    > this way?


    Only benchmarks can answer that. Out of interest, I made up an example
    along the lines you sketched up there. The straight if/elsif/else came
    out 50% faster than a regex with embedded code.

    Anno
    Anno Siegel, Jun 28, 2004
    #2
    1. Advertising

  3. Clint Olsen

    Clint Olsen Guest

    On 2004-06-29, Abigail <> wrote:
    >
    > No, that would be impossible. First of all, the different patterns may
    > contain different set of parens, so $1 and friends would need to be set
    > to different things. But more importantly, in the original code, if
    > 'pat2' matches, but 'pat' and 'pat1' don't, $_ has been accessed three
    > times. And $_ could be tied.


    Ok, that makes sense. I didn't think it would be possible to combine them.
    It's just that Perl behind the scenes must be doing some sort of weird
    execution for these if/else/branches since they are never 'visited' in the
    debugger. It just immediately jumps to the code block.

    > It *may* be faster to write it as
    >
    > if (/pat([12]?)/) {
    > if ($1 eq "") {...}
    > elsif ($1 eq "1") {...}
    > else {...}
    > }
    >
    > You'd only use the regex engine was. However, you are using a more
    > complicated pattern, and that means the optimizer can do less. Which
    > might actually result in a slowdown. You'll have to benchmark to be sure.


    I was wondering about that, too - Write a megapattern with capture buffers
    and just test the capture buffers for which action to take...

    FWIW, I just took the regular expression set for all the keywords of my
    language and merged it with the reserved symbols into a larger pattern
    separated by an alternation. Since the action code was identical, I
    thought it would be a reasonable test. Unfortunately in my case, I didn't
    notice any particular speed difference. As you said, this could be because
    the pattern is slightly more complicated now or perhaps statistically
    speaking the symbols just aren't seen often enough to make a difference...

    Thanks,

    -Clint
    Clint Olsen, Jun 29, 2004
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. deejayfred
    Replies:
    0
    Views:
    525
    deejayfred
    Oct 2, 2003
  2. Roedy Green

    finite state automaton

    Roedy Green, Dec 23, 2005, in forum: Java
    Replies:
    6
    Views:
    523
    Roedy Green
    Jan 2, 2006
  3. kpp9c
    Replies:
    6
    Views:
    375
    duncan smith
    Sep 23, 2009
  4. Leiradella, Andre V Matos Da Cunha

    RES: RES: Bare-bones Ruby

    Leiradella, Andre V Matos Da Cunha, Dec 29, 2004, in forum: Ruby
    Replies:
    1
    Views:
    289
    Stefan Schmiedl
    Dec 29, 2004
  5. Leiradella, Andre V Matos Da Cunha

    RES: RES: RES: Bare-bones Ruby

    Leiradella, Andre V Matos Da Cunha, Dec 29, 2004, in forum: Ruby
    Replies:
    0
    Views:
    125
    Leiradella, Andre V Matos Da Cunha
    Dec 29, 2004
Loading...

Share This Page