Building the finite state machine of a Ruby regexp

G

Gilles Lacroix

Does anyone know if there is a nice way to build the Finite State Machine
(FSM) of a Ruby regexp ?

Assuming the native ruby regexp parser used a (more or less explicit) FSM
I first thought that there was some builtin way to access or re-build it
but, after reading some reference docs (Regexp class) and posts I can't
see any useful methods to do this. But I would be happy to be wrong about
this...

As an alternative solution, I wonder if someone knows about a module that
could parse Ruby regexps (accepting exactly the same regexp grammar as
Ruby) and let me read the associated FSM (or whatever graph-like structure
representing the regexp).

Any help will be greatly appreciated. Thanks sincerely.
 
C

Caleb Clausen

Does anyone know if there is a nice way to build the Finite State Machine
(FSM) of a Ruby regexp ?

I don't know of anything specifically oriented to Ruby Regexps that
does this, but surely such tools exist for (say) Perl...?
As an alternative solution, I wonder if someone knows about a module that
could parse Ruby regexps (accepting exactly the same regexp grammar as
Ruby) and let me read the associated FSM (or whatever graph-like structure
representing the regexp).

Given how notoriously opaque they are, I've actually found regular
expressions to be pretty easy to parse. I can send you a partial
Regexp parser from one of my projects that just extracted the
information I needed to know at the time.... but probably what you
really want is Simon Strandgaard's regexp-engine. You can find it
here:

http://rubyforge.org/projects/aeditor/
 
M

Matthew Smillie

I don't know of anything specifically oriented to Ruby Regexps that
does this, but surely such tools exist for (say) Perl...?

Strictly speaking, there isn't anything that does this, since modern
'extended' regexes (a set which includes Ruby's regexes) aren't
always representable as finite state automata (i.e., they can
recognise more than the class of regular languages).

Sorry, can't help you there either - I don't know and couldn't dig up
anything that exposes Ruby's regex internals (or Oniguruma's) in the
way you're asking for, which strikes me as the only way to use
*exactly* the same grammar as Ruby does. Like Caleb Clausen
suggested, though, there are probably partial solutions out there,
depending on how flexible your criteria are and exactly what
information you need - are you looking for an aid in building
regexes? just visualising them? some other sort of analysis?

matthew smillie.
 
G

Gilles Lacroix

(...) Like Caleb Clausen
suggested, though, there are probably partial solutions out there,
depending on how flexible your criteria are and exactly what
information you need - are you looking for an aid in building
regexes? just visualising them? some other sort of analysis?

What I'd like to do is, given any regexp :
1. To test *quickly* if a particular string matches this regexp : that's
why I'd prefer to use native Ruby regexes. They are the natural choice
when programming in Ruby and offer very good performance (because the
parser is compiled into the Ruby interpreter from a well-established C
code base (GNU regexps) that should be reasonably well optimized after
decades of public exposure).
2. To generate "random" strings that match the same regexp : that's why I
was thinking about using a Finite State Machine (or kind of it since, as
you pointed out 'modern' regexp are not alway representable as FSM) :
starting from the initial state and choosing randomly an outbound edge,
I can add a (first) character to the string. Repeating the process from
the node I arrived on, I can add a second character, and so on... (then
I will also have to make sure that I finally reach the final state in a
reasonable time but that's another problem).

Gilles Lacroix.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top