The inverse problem: generate all instances of a regexp

  • Thread starter Andrei Alexandrescu (See Website For Email)
  • Start date
A

Andrei Alexandrescu (See Website For Email)

Frederic said:
From a theoretical point of view : the easiest way to do what you want is to
turn your regular expression into a finite automaton (see, for example,
http://lambda.uta.edu/cse5317/spring01/notes/node9.html
and nodes around). Then, enumerating the words recognized by a f.a. or
telling if it matches an infinite set of words is easy. I give a simple
implementation of it at the end of this message. Here is a simple
example of use:
[snip]

Thanks a lot for your idea and code, Frederic. I'll use them; the
problem that remains is parsing the darn regular expression :eek:). But, I
approach it on a need basis.


Andrei
 
M

Mons

Andrei said:
I am writing a "make" facility for perl. Yes, Unix "make". I find make
obtuse and limited in unexpected ways, and I'm generating many files
from within Perl itself, so I wrote a little engine that allows me to write:

use Make;

sub Compile { ... }
sub Link { ... }

Depends "aout", "main.o", \&Link;
Depends "main.o", ["main.c", "main.h"], \&Compile;

Make "aout";

Of course, the simplified example above doesn't show the power of the
engine, which shines through when it comes about automatically inferring
rules. For example, the last rule above could be inferred from:

DepPattern '(.*)\.o', ['$1.c', '$1.h'], \&Compile;

I'm quite pleased with having the power of full regexes when defining
makefile rules. Now, I wanted to do something more ambitious. I wanted
to be able to specify a pattern like:

DepPatternMulti 'debug|candidate|release',
'$multi/(.*)\.o', ['$1.c', '$1.h'], \&Compile;

The line above would be equivalent with:

DepPattern 'debug/(.*)\.o', ['$1.c', '$1.h'], \&Compile;
DepPattern 'candidate/(.*)\.o', ['$1.c', '$1.h'], \&Compile;
DepPattern 'release/(.*)\.o', ['$1.c', '$1.h'], \&Compile;

For that, I'd need to expand a possibly complex regexp into all of the
possible strings it could generate. Of course, more limited solutions
would be usable, but the "ars gratia artis" solution would involve
dealing with the full-fledged problem.
May be look at this from another point?
You write regex's as you want, even "infinite", but they should match
the target files.
When you need to do the tasks, you, first of all, traverse the
filesystem and build the tree or list of all files.
(For ex. not all, but from the current directory)
Then, when doing task, you iterate not over you variants ov regex, but
over filelist, applying the patterns to filenames.
Number of files on filesystem is always limited (at this time :), so
you may not care about "infinity" of regex :)

PS: Sorry for pure English :)
 
I

Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Ingo Menger
No, its trivial.
Any regex containing {n,} describes an infinite set. Note that * is
short for {0,} and + is short for {1,}.

BS. Consider /a+^/.
The rest describe finite sets.
It is assumed, of course, that the regex is anchored on both sites,
i.e. it has to match the whole string (/^re$/)

"Literally" this is not true (re = a|b). One needs something like

/^(?:RE)$/
It is also assumed that the set of characters is finite.

BTW, this is not true with Perl. Currently the number of distinct characters
is limited by the width of integers (this differs utf8 from UTF-8,
which IS limited).

Hope this helps,
Ilya
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,792
Messages
2,569,639
Members
45,348
Latest member
RoscoeNevi

Latest Threads

Top