Building the finite state machine of a Ruby regexp

Discussion in 'Ruby' started by Gilles Lacroix, Aug 1, 2006.

  1. Does anyone know if there is a nice way to build the Finite State Machine
    (FSM) of a Ruby regexp ?

    Assuming the native ruby regexp parser used a (more or less explicit) FSM
    I first thought that there was some builtin way to access or re-build it
    but, after reading some reference docs (Regexp class) and posts I can't
    see any useful methods to do this. But I would be happy to be wrong about
    this...

    As an alternative solution, I wonder if someone knows about a module that
    could parse Ruby regexps (accepting exactly the same regexp grammar as
    Ruby) and let me read the associated FSM (or whatever graph-like structure
    representing the regexp).

    Any help will be greatly appreciated. Thanks sincerely.
    Gilles Lacroix, Aug 1, 2006
    #1
    1. Advertising

  2. On 8/1/06, Gilles Lacroix <> wrote:
    >
    > Does anyone know if there is a nice way to build the Finite State Machine
    > (FSM) of a Ruby regexp ?


    I don't know of anything specifically oriented to Ruby Regexps that
    does this, but surely such tools exist for (say) Perl...?

    > As an alternative solution, I wonder if someone knows about a module that
    > could parse Ruby regexps (accepting exactly the same regexp grammar as
    > Ruby) and let me read the associated FSM (or whatever graph-like structure
    > representing the regexp).


    Given how notoriously opaque they are, I've actually found regular
    expressions to be pretty easy to parse. I can send you a partial
    Regexp parser from one of my projects that just extracted the
    information I needed to know at the time.... but probably what you
    really want is Simon Strandgaard's regexp-engine. You can find it
    here:

    http://rubyforge.org/projects/aeditor/
    Caleb Clausen, Aug 2, 2006
    #2
    1. Advertising

  3. On Aug 2, 2006, at 22:44, Caleb Clausen wrote:

    > On 8/1/06, Gilles Lacroix <> wrote:
    >>
    >> Does anyone know if there is a nice way to build the Finite State
    >> Machine
    >> (FSM) of a Ruby regexp ?

    >
    > I don't know of anything specifically oriented to Ruby Regexps that
    > does this, but surely such tools exist for (say) Perl...?


    Strictly speaking, there isn't anything that does this, since modern
    'extended' regexes (a set which includes Ruby's regexes) aren't
    always representable as finite state automata (i.e., they can
    recognise more than the class of regular languages).

    >> As an alternative solution, I wonder if someone knows about a
    >> module that
    >> could parse Ruby regexps (accepting exactly the same regexp
    >> grammar as
    >> Ruby) and let me read the associated FSM (or whatever graph-like
    >> structure
    >> representing the regexp).


    Sorry, can't help you there either - I don't know and couldn't dig up
    anything that exposes Ruby's regex internals (or Oniguruma's) in the
    way you're asking for, which strikes me as the only way to use
    *exactly* the same grammar as Ruby does. Like Caleb Clausen
    suggested, though, there are probably partial solutions out there,
    depending on how flexible your criteria are and exactly what
    information you need - are you looking for an aid in building
    regexes? just visualising them? some other sort of analysis?

    matthew smillie.
    Matthew Smillie, Aug 3, 2006
    #3
  4. On Thu, 03 Aug 2006 09:22:14 +0900, Matthew Smillie wrote :

    > (...) Like Caleb Clausen
    > suggested, though, there are probably partial solutions out there,
    > depending on how flexible your criteria are and exactly what
    > information you need - are you looking for an aid in building
    > regexes? just visualising them? some other sort of analysis?


    What I'd like to do is, given any regexp :
    1. To test *quickly* if a particular string matches this regexp : that's
    why I'd prefer to use native Ruby regexes. They are the natural choice
    when programming in Ruby and offer very good performance (because the
    parser is compiled into the Ruby interpreter from a well-established C
    code base (GNU regexps) that should be reasonably well optimized after
    decades of public exposure).
    2. To generate "random" strings that match the same regexp : that's why I
    was thinking about using a Finite State Machine (or kind of it since, as
    you pointed out 'modern' regexp are not alway representable as FSM) :
    starting from the initial state and choosing randomly an outbound edge,
    I can add a (first) character to the string. Repeating the process from
    the node I arrived on, I can add a second character, and so on... (then
    I will also have to make sure that I finally reach the final state in a
    reasonable time but that's another problem).

    Gilles Lacroix.
    Gilles Lacroix, Aug 3, 2006
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. deejayfred
    Replies:
    0
    Views:
    531
    deejayfred
    Oct 2, 2003
  2. SomeDude
    Replies:
    3
    Views:
    3,131
    arant
    Aug 14, 2006
  3. Inderkal
    Replies:
    8
    Views:
    1,226
    rickman
    Dec 9, 2004
  4. Walter Roberson

    regexp finite state machine?

    Walter Roberson, Mar 8, 2005, in forum: Perl Misc
    Replies:
    0
    Views:
    139
    Walter Roberson
    Mar 8, 2005
  5. Georg Jaehnig

    RegExp as Finite State Machine

    Georg Jaehnig, Dec 30, 2008, in forum: Javascript
    Replies:
    21
    Views:
    333
    Thomas 'PointedEars' Lahn
    Jan 7, 2009
Loading...

Share This Page