regular expression module?

Discussion in 'Perl Misc' started by Walter Roberson, Apr 20, 2004.

  1. I remember reading on this newsgroup, maybe about a year ago, of
    a module that did a Finite State Machine implementation of "standard"
    regular expressions -- i.e., just what can be built out of
    concatenation, alternation, and repetition, (and perhaps
    some syntactic sugar such as charactre-classes), without the perl
    extensions that lift perl's regex's into a different parser class
    entirely.

    Unfortunately, when I tried to find the posting again, I could
    not, and I could not find anything like that in CPAN.

    The matter came to my mind again today when someone mentioned
    the Yapp module and parsing speeds -- one of the points that
    can really slow down perl's regexes is the backtracking that it
    does. If one happens to be working with input whose grammar is
    not overly complex, then one might be perfectly happy with
    faster but less flexible RE's and a perl equivilent of
    yacc and lex.

    Would anyone happen to recall the name and location of that
    finite-state RE module? Something like that could potentially speed
    up a number of my programs.
    --
    Is "meme" descriptive or perscriptive? Does the knowledge that
    memes exist not subtly encourage the creation of more memes?
    -- A Child's Garden Of Memes
    Walter Roberson, Apr 20, 2004
    #1
    1. Advertising

  2. Walter Roberson

    Anno Siegel Guest

    Walter Roberson <-cnrc.gc.ca> wrote in comp.lang.perl.misc:
    > I remember reading on this newsgroup, maybe about a year ago, of
    > a module that did a Finite State Machine implementation of "standard"
    > regular expressions -- i.e., just what can be built out of
    > concatenation, alternation, and repetition, (and perhaps
    > some syntactic sugar such as charactre-classes), without the perl
    > extensions that lift perl's regex's into a different parser class
    > entirely.
    >
    > Unfortunately, when I tried to find the posting again, I could
    > not, and I could not find anything like that in CPAN.
    >
    > The matter came to my mind again today when someone mentioned
    > the Yapp module and parsing speeds -- one of the points that
    > can really slow down perl's regexes is the backtracking that it
    > does.


    Backtracking is certainly not a Perl-specific addition. Any regular
    expression implementation must backtrack.

    > If one happens to be working with input whose grammar is
    > not overly complex, then one might be perfectly happy with
    > faster but less flexible RE's and a perl equivilent of
    > yacc and lex.
    >
    > Would anyone happen to recall the name and location of that
    > finite-state RE module? Something like that could potentially speed
    > up a number of my programs.


    I don't know the module you have in mind, but if it's a re-implementation
    of regular expressions in Perl, any speedup is unlikely.

    Anno
    Anno Siegel, Apr 20, 2004
    #2
    1. Advertising

  3. -berlin.de (Anno Siegel) writes:

    > Walter Roberson <-cnrc.gc.ca> wrote in comp.lang.perl.misc:
    > > I remember reading on this newsgroup, maybe about a year ago, of
    > > a module that did a Finite State Machine implementation of "standard"
    > > regular expressions -- i.e., just what can be built out of
    > > concatenation, alternation, and repetition, (and perhaps
    > > some syntactic sugar such as charactre-classes), without the perl
    > > extensions that lift perl's regex's into a different parser class
    > > entirely.
    > >
    > > Unfortunately, when I tried to find the posting again, I could
    > > not, and I could not find anything like that in CPAN.
    > >
    > > The matter came to my mind again today when someone mentioned
    > > the Yapp module and parsing speeds -- one of the points that
    > > can really slow down perl's regexes is the backtracking that it
    > > does.

    >
    > Backtracking is certainly not a Perl-specific addition. Any regular
    > expression implementation must backtrack.


    No, the OP is right - "standard" regular expressions can be done with
    a Finite State Machine.

    I don't, however, recall the thread.

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
    Brian McCauley, Apr 20, 2004
    #3
  4. Walter Roberson

    Anno Siegel Guest

    Brian McCauley <> wrote in comp.lang.perl.misc:
    > -berlin.de (Anno Siegel) writes:
    >
    > > Walter Roberson <-cnrc.gc.ca> wrote in comp.lang.perl.misc:
    > > > I remember reading on this newsgroup, maybe about a year ago, of
    > > > a module that did a Finite State Machine implementation of "standard"
    > > > regular expressions -- i.e., just what can be built out of
    > > > concatenation, alternation, and repetition, (and perhaps
    > > > some syntactic sugar such as charactre-classes), without the perl
    > > > extensions that lift perl's regex's into a different parser class
    > > > entirely.
    > > >
    > > > Unfortunately, when I tried to find the posting again, I could
    > > > not, and I could not find anything like that in CPAN.
    > > >
    > > > The matter came to my mind again today when someone mentioned
    > > > the Yapp module and parsing speeds -- one of the points that
    > > > can really slow down perl's regexes is the backtracking that it
    > > > does.

    > >
    > > Backtracking is certainly not a Perl-specific addition. Any regular
    > > expression implementation must backtrack.

    >
    > No, the OP is right - "standard" regular expressions can be done with
    > a Finite State Machine.


    I may be confused, but I don't see the contradiction. The finite-state-
    machine *does* the backtracking, no?

    Anno
    Anno Siegel, Apr 20, 2004
    #4
  5. In article <c632jt$m3p$-Berlin.DE>,
    Anno Siegel <-berlin.de> wrote:
    :> No, the OP is right - "standard" regular expressions can be done with
    :> a Finite State Machine.

    :I may be confused, but I don't see the contradiction. The finite-state-
    :machine *does* the backtracking, no?

    No, no backtracking is needed.

    There are two major representations of finite state machines, one "DFA"
    (Deterministic Finite Automata) and the other one "NDFA"
    (Non-Deterministic Finite Automata). NDFA are smaller to diagram out,
    and are possibly more popular. It is common for beginners to believe
    that NDFA can "do more" than DFA can, but they both have the same expressive
    power, and there is a mechanical procedure to convert any NDFA to an
    exactly equivilent DFA. Some implimentations of NDFA use
    backtracking in order to avoid the (non-trivial) overhead of doing
    the NDFA -> DFA conversion; those backtracking implimentations are
    often faster on simple expressions, but when one has a more complex
    expression, then it becomes better to do the conversion and run the
    equivilent-DFA.

    DFA's never need to backtrack. Once you -have- the DFA, it runs in
    time linear to the length of the input: you read the next input
    symbol, index that symbol in the transition table of the current
    state, and that tells you the new state number. Repeat and by the
    end of processing each symbol in the string exactly -once-,
    you have either "accepted" the input (in which case the pattern matched)
    or else you run out of input at a non-accepting state, in which case
    the pattern match failed. The difficulty, the reason this isn't
    used much more often, is that if you are starting with an NDFA, the
    conversion into a DFA takes exponential time. Going the
    regex -> NDFA -> DFA route is thus best suited for cases in which you
    expect to "compile once, run often". Too expensive of a process if
    you are farting around doing on-the-fly typing in of expressions to
    be processed against short files, but well worth doing if you have a
    big input file.... especially if you can separate out the compiled
    DFA and "save" it so that you don't have to recompile it from run
    to run or when other parts of the program change.

    'yacc' is a well known program that does all the work of building
    up compiled DFA, complete with some kind of state-table compression
    so that the result is both space and time efficient. But yacc produces
    C code. I seem to recall hearing of a yacc implimentation that
    had an option for spitting out perl. Or was that a 'lex' program?
    I do not remember clearly.
    --
    Studies show that the average reader ignores 106% of all statistics
    they see in .signatures.
    Walter Roberson, Apr 20, 2004
    #5
  6. Walter Roberson

    Uri Guttman Guest

    >>>>> "WR" == Walter Roberson <-cnrc.gc.ca> writes:

    WR> 'yacc' is a well known program that does all the work of building
    WR> up compiled DFA, complete with some kind of state-table compression
    WR> so that the result is both space and time efficient. But yacc produces
    WR> C code. I seem to recall hearing of a yacc implimentation that
    WR> had an option for spitting out perl. Or was that a 'lex' program?
    WR> I do not remember clearly.

    search google for perl-byacc

    uri
    Uri Guttman, Apr 20, 2004
    #6
  7. Walter Roberson

    pkent Guest

    In article <c63ivi$5il$>,
    -cnrc.gc.ca (Walter Roberson) wrote:

    <snip interesting stuff about regexes>

    > 'yacc' is a well known program that does all the work of building
    > up compiled DFA, complete with some kind of state-table compression
    > so that the result is both space and time efficient. But yacc produces
    > C code. I seem to recall hearing of a yacc implimentation that
    > had an option for spitting out perl. Or was that a 'lex' program?
    > I do not remember clearly.


    Would you be thinking of yapp? As in:

    http://search.cpan.org/~fdesar/Parse-Yapp-1.05/lib/Parse/Yapp.pm

    specifically it says "The script yapp is a front-end to the Parse::Yapp
    module and let you easily create a Perl OO parser from an input grammar
    file." and "Parse::Yapp should compile a clean yacc grammar without any
    modification..."

    P

    --
    pkent 77 at yahoo dot, er... what's the last bit, oh yes, com
    Remove the tea to reply
    pkent, Apr 20, 2004
    #7
  8. In article <>,
    pkent <> wrote:
    :In article <c63ivi$5il$>,
    : -cnrc.gc.ca (Walter Roberson) wrote:

    :<snip interesting stuff about regexes>

    :> 'yacc' is a well known program that does all the work of building
    :> up compiled DFA, complete with some kind of state-table compression
    :> so that the result is both space and time efficient. But yacc produces
    :> C code. I seem to recall hearing of a yacc implimentation that
    :> had an option for spitting out perl. Or was that a 'lex' program?
    :> I do not remember clearly.

    :Would you be thinking of yapp? As in:

    :http://search.cpan.org/~fdesar/Parse-Yapp-1.05/lib/Parse/Yapp.pm

    I was not, but thank you for the reference. I seem to recall seeing
    [years ago] a version of yacc with an explicit perl option. I might
    be able to find it again if I looked around.

    But getting back to my original question: although Yapp does appear to
    have a lot of functionality useful for my purposes, the module that
    I am remembering as having seen mentioned, was, as best I recall,
    just an RE (as opposed to regex) compiler and FSM driver, without the
    full LALR parsing facilities of yacc/yapp. Useful if, for example,
    one had a large number of simple RE's (or just plain strings) to match
    against. join('|', @wordlist) is not very efficient to match against
    in perl, even if one qr's it, as perl backtracks because it assumes
    later portions of the expression might require the full regexp power.

    Maybe I should be looking more closely at the regexp optimizer... but
    boy does it produce ugly expressions! ;-)
    --
    Can a statement be self-referential without knowing it?
    Walter Roberson, Apr 20, 2004
    #8
  9. Walter Roberson

    Jim Cochrane Guest

    In article <c640tf$nn9$>, Walter Roberson wrote:
    > In article <>,
    > pkent <> wrote:
    >:In article <c63ivi$5il$>,
    >: -cnrc.gc.ca (Walter Roberson) wrote:
    >
    >:<snip interesting stuff about regexes>
    >
    >:> 'yacc' is a well known program that does all the work of building
    >:> up compiled DFA, complete with some kind of state-table compression
    >:> so that the result is both space and time efficient. But yacc produces
    >:> C code. I seem to recall hearing of a yacc implimentation that
    >:> had an option for spitting out perl. Or was that a 'lex' program?
    >:> I do not remember clearly.
    >
    >:Would you be thinking of yapp? As in:
    >
    >:http://search.cpan.org/~fdesar/Parse-Yapp-1.05/lib/Parse/Yapp.pm
    >
    > I was not, but thank you for the reference. I seem to recall seeing
    > [years ago] a version of yacc with an explicit perl option. I might
    > be able to find it again if I looked around.
    >
    > But getting back to my original question: although Yapp does appear to
    > have a lot of functionality useful for my purposes, the module that
    > I am remembering as having seen mentioned, was, as best I recall,
    > just an RE (as opposed to regex) compiler and FSM driver, without the
    > full LALR parsing facilities of yacc/yapp. Useful if, for example,
    > one had a large number of simple RE's (or just plain strings) to match
    > against. join('|', @wordlist) is not very efficient to match against
    > in perl, even if one qr's it, as perl backtracks because it assumes
    > later portions of the expression might require the full regexp power.
    >
    > Maybe I should be looking more closely at the regexp optimizer... but
    > boy does it produce ugly expressions! ;-)


    Sorry to ask the obvious, but did you try some carefully-worded google
    searches?

    --
    Jim Cochrane;
    [When responding by email, include the term non-spam in the subject line to
    get through my spam filter.]
    Jim Cochrane, Apr 21, 2004
    #9
  10. In article <>,
    Jim Cochrane <> wrote:
    :In article <c640tf$nn9$>, Walter Roberson wrote:
    :> But getting back to my original question: although Yapp does appear to
    :> have a lot of functionality useful for my purposes, the module that
    :> I am remembering as having seen mentioned, was, as best I recall,
    :> just an RE (as opposed to regex) compiler and FSM driver, without the
    :> full LALR parsing facilities of yacc/yapp.

    :Sorry to ask the obvious, but did you try some carefully-worded google
    :searches?

    Yes, I spent a couple of hours looking for what I thought I remembered
    existing, including examining the thread topic of everything in the
    time range that I thought I had seen the particular message in.

    It is possible that I did not happen upon the right keywords when I
    did the search, but I did try a variety of approaches, and my ability
    to elicit useful information from google is usually quite good.
    --
    "[...] it's all part of one's right to be publicly stupid." -- Dave Smey
    Walter Roberson, Apr 21, 2004
    #10
  11. In article <>,
    Uri Guttman <> wrote:
    :search google for perl-byacc

    Thanks for the reference.

    When I look around, I see a debian RPM at about the 2.0.5 rev
    level, and I see (in a small number of places) the (1993) source for the
    1.8.2 rev level. When I look in cpan, I see a distribution which
    is named perl5-byacc-patches, but not perl-byacc itself.

    Would you happen to have bookmarked for a 2.0.x source that isn't
    an RPM?
    --
    Preposterous!! Where would all the calculators go?!
    Walter Roberson, Apr 22, 2004
    #11
  12. Walter Roberson

    Uri Guttman Guest

    >>>>> "WR" == Walter Roberson <-cnrc.gc.ca> writes:

    WR> In article <>,
    WR> Uri Guttman <> wrote:
    WR> :search google for perl-byacc

    WR> Thanks for the reference.

    WR> Would you happen to have bookmarked for a 2.0.x source that isn't
    WR> an RPM?

    i have never used it. i just remembered the name byacc and that it had a
    perl output option and googled.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Apr 22, 2004
    #12
  13. In article <>,
    Uri Guttman <> wrote:
    :i have never used it. i just remembered the name byacc and that it had a
    :perl output option and googled.

    Thanks, I was able to find enough pieces to put it together.
    I have not compared it to Parse::Yapp as yet, though.
    --
    Rome was built one paycheck at a time. -- Walter Roberson
    Walter Roberson, Apr 22, 2004
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. VSK
    Replies:
    2
    Views:
    2,281
  2. =?iso-8859-1?B?bW9vcJk=?=

    Matching abitrary expression in a regular expression

    =?iso-8859-1?B?bW9vcJk=?=, Dec 1, 2005, in forum: Java
    Replies:
    8
    Views:
    839
    Alan Moore
    Dec 2, 2005
  3. GIMME
    Replies:
    3
    Views:
    11,940
    vforvikash
    Dec 29, 2008
  4. Steve
    Replies:
    11
    Views:
    562
    Jim Segrave
    Jul 1, 2006
  5. Faheem Mitha

    regular expression question (re module)

    Faheem Mitha, Oct 11, 2008, in forum: Python
    Replies:
    4
    Views:
    276
Loading...

Share This Page