The inverse problem: generate all instances of a regexp

Dr.Ruud · Feb 17, 2006

Andrei Alexandrescu (See Website For Email) schreef:

I wonder how this could be done elegantly. Given a regular expression
$re, write a function Generate($) that returns a list of all possible
strings that could match $re.

http://search.cpan.org/~bowmanbs/Regexp-Genex/lib/Regexp/Genex.pm

Andrei Alexandrescu (See Website For Email) · Feb 17, 2006

Dr.Ruud said:
Andrei Alexandrescu (See Website For Email) schreef:

http://search.cpan.org/~bowmanbs/Regexp-Genex/lib/Regexp/Genex.pm

Wow, that's exactly what the doctor prescribed... thanks, Dr. Ruud

).

Andrie

Andrei Alexandrescu (See Website For Email) · Feb 19, 2006

Frederic said:
From a theoretical point of view : the easiest way to do what you want is to
turn your regular expression into a finite automaton (see, for example,
http://lambda.uta.edu/cse5317/spring01/notes/node9.html
and nodes around). Then, enumerating the words recognized by a f.a. or
telling if it matches an infinite set of words is easy. I give a simple
implementation of it at the end of this message. Here is a simple
example of use:

[snip]

Thanks a lot for your idea and code, Frederic. I'll use them; the
problem that remains is parsing the darn regular expression

). But, I
approach it on a need basis.

Andrei

Mons · Feb 19, 2006

Andrei said:
I am writing a "make" facility for perl. Yes, Unix "make". I find make
obtuse and limited in unexpected ways, and I'm generating many files
from within Perl itself, so I wrote a little engine that allows me to write:

use Make;

sub Compile { ... }
sub Link { ... }

Depends "aout", "main.o", \&Link;
Depends "main.o", ["main.c", "main.h"], \&Compile;

Make "aout";

Of course, the simplified example above doesn't show the power of the
engine, which shines through when it comes about automatically inferring
rules. For example, the last rule above could be inferred from:

DepPattern '(.*)\.o', ['$1.c', '$1.h'], \&Compile;

I'm quite pleased with having the power of full regexes when defining
makefile rules. Now, I wanted to do something more ambitious. I wanted
to be able to specify a pattern like:

DepPatternMulti 'debug|candidate|release',
'$multi/(.*)\.o', ['$1.c', '$1.h'], \&Compile;

The line above would be equivalent with:

DepPattern 'debug/(.*)\.o', ['$1.c', '$1.h'], \&Compile;
DepPattern 'candidate/(.*)\.o', ['$1.c', '$1.h'], \&Compile;
DepPattern 'release/(.*)\.o', ['$1.c', '$1.h'], \&Compile;

For that, I'd need to expand a possibly complex regexp into all of the
possible strings it could generate. Of course, more limited solutions
would be usable, but the "ars gratia artis" solution would involve
dealing with the full-fledged problem.

May be look at this from another point?
You write regex's as you want, even "infinite", but they should match
the target files.
When you need to do the tasks, you, first of all, traverse the
filesystem and build the tree or list of all files.
(For ex. not all, but from the current directory)
Then, when doing task, you iterate not over you variants ov regex, but
over filelist, applying the patterns to filenames.
Number of files on filesystem is always limited (at this time

, so
you may not care about "infinity" of regex

PS: Sorry for pure English

Ilya Zakharevich · Feb 20, 2006

[A complimentary Cc of this posting was sent to
Ingo Menger

No, its trivial.

Any regex containing {n,} describes an infinite set. Note that * is
short for {0,} and + is short for {1,}.

BS. Consider /a+^/.

The rest describe finite sets.
It is assumed, of course, that the regex is anchored on both sites,
i.e. it has to match the whole string (/^re$/)

"Literally" this is not true (re = a|b). One needs something like

/^(?:RE)$/

It is also assumed that the set of characters is finite.

BTW, this is not true with Perl. Currently the number of distinct characters
is limited by the width of integers (this differs utf8 from UTF-8,
which IS limited).

Hope this helps,
Ilya

Regexp slowdown	12	Sep 5, 2006
Generating all permutations from a regexp	9	Dec 22, 2006
Hard regexp problem	4	Oct 23, 2004
Building the finite state machine of a Ruby regexp	3	Aug 1, 2006
Can anyone write this recursion for simple regexp more beautifullyand clearly than the braggarts	157	Aug 29, 2009
generate all possible strings of given length given a set of characters	6	Oct 3, 2005
The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012
My Regexp XML Parser -> Structured Perl Data, Cut & Paste Version, No Module's (Vol I)	43	Dec 21, 2005

The inverse problem: generate all instances of a regexp

Dr.Ruud

Andrei Alexandrescu (See Website For Email)

Andrei Alexandrescu (See Website For Email)

Mons

Ilya Zakharevich

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads