Regular Expression Generator

jeremyje · Jun 26, 2006

Is there a library or a way to generate an appropriate regular
expression for any given input string?
(remove quotes for examples)
For example: "1234567890abcdef is in hex9"
Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}

Or anything that does some sort of similar processing?

Josef Moellers · Jun 26, 2006

Is there a library or a way to generate an appropriate regular
expression for any given input string?
(remove quotes for examples)
For example: "1234567890abcdef is in hex9"
Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}

Or anything that does some sort of similar processing?

Hardly.
First of all, your example is incorrect: "[0-9|A-F]{16}" will not match
"1...abcdef" (the class contains no lower-case a-f, and the "|" inside
the brackets is a literal pipe, not an alternation).
Second, the following REs will also match:
"1234567890abcdef is in hex9" as will
"[0-9a-z]{16} [0-9a-z]{2} [0-9a-z]{2} [0-9a-z]{3}" as will
".{16} .{2} .{2} .{4}" as will
".*\s.*\s.*\s.*" as will
"\S+\s+\S+\s+\S+\s+\S+"

IOW there is no single "appropriate regular expression" but infinitely
many (or some number so close to infinity that choosing one is impractical).
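The point above can be checked with a short script. This is only a sketch: the names $sample, @patterns and %result are made up for illustration, and the fourth pattern is written as ".{16} .{2} .{2} .{4}" with balanced braces.

```perl
use strict;
use warnings;

# The sample string from the original question.
my $sample = '1234567890abcdef is in hex9';

# The OP's pattern first, then the alternatives listed in the reply.
my @patterns = (
    '[0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}',
    '1234567890abcdef is in hex9',
    '[0-9a-z]{16} [0-9a-z]{2} [0-9a-z]{2} [0-9a-z]{3}',
    '.{16} .{2} .{2} .{4}',
    '.*\s.*\s.*\s.*',
    '\S+\s+\S+\s+\S+\s+\S+',
);

# Record for each pattern whether it matches the sample.
my %result = map { $_ => ( $sample =~ /$_/ ? 1 : 0 ) } @patterns;

printf "%-50s %s\n", $_, ( $result{$_} ? 'matches' : 'no match' )
    for @patterns;
```

Only the first pattern fails; the other five all match the same sample, which is why a single sample cannot pick out one "right" expression.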

Reto · Jun 26, 2006

I have noticed there is software available:
http://www.regexbuddy.com/perl.html
I have not tried it yet.
I would suggest collecting your recipes and making a list of commonly used
regexes ;-)
BR,
Reto

Xicheng Jia · Jun 26, 2006

Reto said:
I have noticed there is software available:
http://www.regexbuddy.com/perl.html
I have not tried it yet.
I would suggest collecting your recipes and making a list of commonly used
regexes ;-)

check this:

http://regexlib.com/

and the "External links" section in the following link:

http://www.answers.com/regular+expression?gwp=11&ver=2.0.0.453&method=3

and for any questions, go to:

http://regexadvice.com/forums/default.aspx

Good luck,
Xicheng

Jürgen Exner · Jun 26, 2006

Is there a library or a way to generate an appropriate regular
expression for any given input string?
(remove quotes for examples)
For example: "1234567890abcdef is in hex9"
Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}

Or anything that does some sort of similar processing?

Well, yes, sure: actually the desired RE is a constant: .*
For a more advanced RE you can even quantify it with the length of the
string.

Seriously: it is impossible to derive a generic RE pattern from a single
text sample.

And you provided the case in point: why are you scanning for [a-f] in the
first part (I assume the upper case is a mistake, otherwise the RE wouldn't
match anyway) but for a-z in the second part? Shouldn't that be [is] or
maybe /is/? Without knowing the generic pattern it is impossible to know
what RE you may be looking for.

Jue

Dr.Ruud · Jun 26, 2006

(e-mail address removed) wrote:

Is there a library or a way to generate an appropriate regular
expression for any given input string?
(remove quotes for examples)
For example: "1234567890abcdef is in hex9"
Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}

Or anything that does some sort of similar processing?

I once created a Visual Basic-function that derived a mask from the
lines of a file. All the lines were supposed to have the same length,
and all characters were printable, so that made it a lot easier.

It would return a string of the same length. Special character values
were used for character sets, like 0x01 for [A-Z], 0x02 for [a-z], 0x03
for [A-Za-z], 0x04 for [0-9], 0x05 for [0-9A-Z], 0x07 for [0-9A-Za-z],
etc. It even recognized EBCDIC-numericals. It could also show a '@' for
alpha and a '#' for numeric.

A graphical character like ',' would mean that all lines in the file had
a ',' in that position. All in all it was very handy to get a quick idea
of what a fixed record file was about.
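The idea can be sketched in Perl. This is not the original Visual Basic function, only a simplified illustration under assumptions: derive_mask is a made-up name, '#' stands for a digits-only column, '@' for a letters-only column, '?' for anything else, and a column that holds the same character in every line is shown as that character.

```perl
use strict;
use warnings;

# Hypothetical helper: build a one-character-per-column mask from
# lines that all have the same length.
sub derive_mask {
    my @lines = @_;
    return '' unless @lines;

    my $len = length $lines[0];
    for my $line (@lines) {
        die "lines differ in length\n" unless length($line) == $len;
    }

    my $mask = '';
    for my $pos ( 0 .. $len - 1 ) {
        # Collect the distinct characters seen in this column.
        my %seen;
        $seen{ substr( $_, $pos, 1 ) } = 1 for @lines;
        my $column = join '', sort keys %seen;

        if    ( keys %seen == 1 )           { $mask .= $column }   # constant column
        elsif ( $column =~ /\A[0-9]+\z/ )    { $mask .= '#' }       # digits only
        elsif ( $column =~ /\A[A-Za-z]+\z/ ) { $mask .= '@' }       # letters only
        else                                 { $mask .= '?' }       # mixed
    }
    return $mask;
}

my $mask = derive_mask( 'AB,123,x', 'CD,456,y', 'EF,789,z' );
print "$mask\n";    # prints "@@,###,@"
```

With more lines the mask only gets more general, which is the difference from guessing a pattern out of a single sample.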

Ted Zlatanov · Jun 26, 2006

On 26 Jun 2006, (e-mail address removed) wrote:

Seriously: it is impossible to derive a generic RE pattern from a single
text sample.

I think this is incorrect, Jurgen. The OP was asking about an
appropriate, not a generic regex. Other than
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm
(which I mentioned in c.l.p.modules to answer his post, before I saw
his cross-post here), you can always just say

my $regex = '^(' . join('|', @strings) . ')$';

and that's a regex that will match any given non-empty strings.

Ted

Dr.Ruud · Jun 26, 2006

Ted Zlatanov wrote:

my $regex = '^(' . join('|', @strings) . ')$';

and that's a regex that will match any given non-empty strings.

'^(?:' . join( '|', map quotemeta, grep /./, @strings ) . ')$'
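A small comparison shows what the quotemeta and grep buy. The contents of @strings and the names $naive and $safe are invented for the illustration.

```perl
use strict;
use warnings;

# Sample strings containing regex metacharacters and one empty string.
my @strings = ( 'a.c', '1+1', '', 'foo|bar' );

# Plain join: metacharacters keep their regex meaning.
my $naive = '^(' . join( '|', @strings ) . ')$';

# Escaped join: every string is matched literally, empty strings are dropped.
my $safe = '^(?:' . join( '|', map quotemeta, grep /./, @strings ) . ')$';

for my $candidate ( 'a.c', 'abc', '1+1', 'foo', 'foo|bar', '' ) {
    printf "%-8s naive=%d safe=%d\n", "'$candidate'",
        ( $candidate =~ /$naive/ ? 1 : 0 ),
        ( $candidate =~ /$safe/  ? 1 : 0 );
}
```

The naive version accepts 'abc', 'foo' and the empty string, none of which are wanted, and rejects '1+1', which is in the list. The escaped version accepts exactly the three non-empty strings.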

Ala Qumsieh · Jun 27, 2006

Dr.Ruud said:
Ted Zlatanov wrote:

'^(?:' . join( '|', map quotemeta, grep /./, @strings ) . ')$'

This solution has a caveat. Regexps have a maximum length (65539 bytes I
believe). If you have enough strings in @strings (or if they are long
enough), then the compiled regexp can exceed this length, and error out. I
encountered this once, and the solution I resorted to was to construct an
anonymous sub on the fly:

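# Build the source of an anonymous sub; \Q...\E quotes each string
# while the here-doc is interpolated.
my $code = <<EOS;
sub {
local \$_ = shift;
return 1 if /\Q$strings[0]\E/;
return 1 if /\Q$strings[1]\E/;
....
}
EOS

my $matches = eval $code;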

Then use this anon sub to match:

if ($matches->($myString)) { ... }
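For comparison, a similar matcher can be built without a string eval by closing over a list of precompiled patterns. This is a different technique from the generated source above, shown only as a sketch; make_matcher and the sample strings are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return a sub that is true if its argument
# contains any of the given strings as a literal substring.
sub make_matcher {
    my @patterns = map { qr/\Q$_\E/ } @_;    # one small regex per string
    return sub {
        my $text = shift;
        for my $re (@patterns) {
            return 1 if $text =~ $re;
        }
        return 0;
    };
}

my $matches = make_matcher( 'a.c', '1+1' );

print $matches->('xx1+1yy') ? "match\n" : "no match\n";    # prints "match"
print $matches->('abc')     ? "match\n" : "no match\n";    # prints "no match"
```

Like the generated sub, this tests each string separately, so no single compiled pattern grows with the size of the list.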

--Ala

Dr.Ruud · Jun 27, 2006

Ala Qumsieh wrote:

Dr.Ruud:

This solution has a caveat. Regexps have a maximum length (65539
bytes I believe). If you have enough strings in @strings (or if they
are long enough), then the compiled regexp can exceed this length,
and error out. I encountered this once, and the solution I resorted
to was to construct an anonymous sub on the fly:

If so, it would have the same problem, because any of the strings can be
too long.

perl -Mwarnings -le '
$n = 1_000_000 ;
$_ = ".." x $n ;
$r = qr/^\Q$_\E$/ ;
print length($r), ":", /$r/ ;
'

prints 4000011:1

Jürgen Exner · Jun 27, 2006

Ted said:
On 26 Jun 2006, (e-mail address removed) wrote:

I think this is incorrect, Jurgen. The OP was asking about an
appropriate, not a generic regex. Other than
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm
(which I mentioned in c.l.p.modules to answer his post, before I saw
his cross-post here), you can always just say

my $regex = '^(' . join('|', @strings) . ')$';

and that's a regex that will match any given non-empty strings.

True. As will /.+/. And the other extreme is /\Q$string\E/.

Chances are the OP was looking for neither of those 'solutions' but for
something in between.
But where the right 'in between' lies is something you cannot
decide based on a single sample.

jue

Ted Zlatanov · Jun 27, 2006

This solution has a caveat. Regexps have a maximum length (65539 bytes I
believe). If you have enough strings in @strings (or if they are long
enough), then the compiled regexp can exceed this length, and error out. I
encountered this once, and the solution I resorted to was to construct an
anonymous sub on the fly:

You and Dr. Ruud make great points. My original code was written in
haste, sorry about that. If I had written it with some brainwaves active, it
would have been:

# untested
my %hash;
$hash{$_} = 1 foreach @strings;
sub matches { return exists $hash{shift()};}

No need for subroutines and eval(). Then you can use matches() in the
regex as a code escape.
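A quick check that the hash lookup gives the same answers as an anchored, escaped alternation. This is a sketch only: the sample data and the names $regex, @candidates and @agree are invented, and the code escape itself is left out.

```perl
use strict;
use warnings;

my @strings = ( 'foo', 'a.c', '1+1' );

# Exact-match lookup table, as in the snippet above.
my %hash;
$hash{$_} = 1 foreach @strings;
sub matches { return exists $hash{ shift() } }

# Equivalent anchored alternation with every string escaped.
my $regex = '^(?:' . join( '|', map quotemeta, @strings ) . ')\z';

# Keep the candidates on which both approaches give the same answer.
my @candidates = ( 'foo', 'abc', '1+1', 'food', '' );
my @agree = grep { ( matches($_) ? 1 : 0 ) == ( /$regex/ ? 1 : 0 ) } @candidates;

print scalar(@agree), " of ", scalar(@candidates), " candidates agree\n";
```

The lookup is a plain exact-string test, so it has no pattern length to worry about and metacharacters need no escaping.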

Isn't Perl great?

Ted
