matches -> regexp ?

A

ako...

Hello,

this is not exactly a ruby problem, i hope people will not consider
this spamming.

does anyone know of any reference or an algorithm to build a regular
expression (a finite automaton, in general) that would match a given
set of strings? i of course mean a regexp that would match just the
given strings, and not any other. so solutions like /^.*$/ are not
acceptable ;-)

thanks
konstantin
 
R

Robert Retzbach

ako... said:
Hello,

this is not exactly a ruby problem, i hope people will not consider
this spamming.

does anyone know of any reference or an algorithm to build a regular
expression (a finite automaton, in general) that would match a given
set of strings? i of course mean a regexp that would match just the
given strings, and not any other. so solutions like /^.*$/ are not
acceptable ;-)

thanks
konstantin
nice challenge, reminds me a bit of the 4th rubyquiz :>
 
E

Edward Faulkner

--o0ZfoUVt4BxPQnbU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

This is essentially an information compression problem. There's
always a trivial solution:

s = %w{find only these strings}
r = Regexp.new(s.map {|w| '\A' + w + '\Z'}.join('|'))

The _real_ problem is finding the shortest possible regexp. ;-)

--o0ZfoUVt4BxPQnbU
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD4DBQFDneKDnhUz11p9MSARAhZdAJikR9LxgvRKe+HQx+NM/iFPTQPDAKCg7yyF
k9ZvZV/M0aud/vCU9NxSLw==
=EgnI
-----END PGP SIGNATURE-----

--o0ZfoUVt4BxPQnbU--
 
A

ako...

i suspect that from your trivial solution we can arrive at the final
solution by just performing a DFA minimization. the same thing that lex
and yacc do when they generate code. so i guess i have answered my own
question. thanks for helping me. your trivial solution just made my
brain work.

konstantin
 
R

Robert Klemme

ako... said:
i suspect that from your trivial solution we can arrive at the final
solution by just performing a DFA minimization. the same thing that
lex and yacc do when they generate code. so i guess i have answered
my own question. thanks for helping me. your trivial solution just
made my brain work.

Another option is to build up a tree and create the RX from that. I once
wrote that:

14:12:23 [c]: ( echo foo; echo bar; echo band ) | create-rx.rb
(?:ba(?:nd|r)|foo)

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top