Regular Expression Generator

Discussion in 'Perl Misc' started by jeremyje@gmail.com, Jun 26, 2006.

  1. Guest

    Is there a library or a way to generate an appropriate regular
    expression for any given input string?
    (remove quotes for examples)
    For example: "1234567890abcdef is in hex9"
    Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}

    Or anything that does some sort of similar processing?
     
    , Jun 26, 2006
    #1

  2. wrote:
    > Is there a library or a way to generate an appropriate regular
    > expression for any given input string?
    > (remove quotes for examples)
    > For example: "1234567890abcdef is in hex9"
    > Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}
    >
    > Or anything that does some sort of similar processing?


    Hardly.
    First of all, your example is incorrect: "[0-9|A-F]{16}" will not match
    "1...abcdef".
    Second, the following RE will also match:
    "1234567890abcdef is in hex9" as will
    "[0-9a-z]{16} [0-9a-z]{2} [0-9a-z]{2} [0-9a-z]{3}" as will
    ".{16} .{2} .{2} .{4}" as will
    ".*\s.*\s.*\s.*" as will
    "\S+\s+\S+\s+\S+\s+\S+"

    IOW, there is no single "appropriate regular expression" but infinitely
    many (or some number close to infinity), so picking one is impractical.
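A quick way to check these claims is to run every pattern against the sample. The Perl sketch below does that with unanchored matches, as in the post; `$sample` and `@patterns` are names chosen only for this illustration.

```perl
use strict;
use warnings;

my $sample = '1234567890abcdef is in hex9';

# The OP's proposed pattern first, then the alternatives listed above.
# Single quotes keep backslash sequences such as \s intact.
my @patterns = (
    '[0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}',
    '1234567890abcdef is in hex9',
    '[0-9a-z]{16} [0-9a-z]{2} [0-9a-z]{2} [0-9a-z]{3}',
    '.{16} .{2} .{2} .{4}',
    '.*\s.*\s.*\s.*',
    '\S+\s+\S+\s+\S+\s+\S+',
);

# Unanchored: a pattern "matches" if it matches anywhere in the sample.
for my $p (@patterns) {
    printf "%-50s %s\n", $p, ($sample =~ /$p/ ? 'matches' : 'no match');
}
```

Only the OP's pattern reports `no match`: inside a character class `|` is a literal character rather than alternation, and `[0-9|A-F]` contains no lowercase letters, so the sample has no run of 16 such characters.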

    --
    Josef Möllers (penguin keeper at FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
     
    Josef Moellers, Jun 26, 2006
    #2

  3. Reto Guest

    I have noticed there is software available:
    http://www.regexbuddy.com/perl.html
    I have not tried it yet.
    I would suggest collecting your recipes and making a list of commonly used
    regexes ;-)
    BR,
    Reto
    --

    wrote:
    > Is there a library or a way to generate an appropriate regular
    > expression for any given input string?
    > (remove quotes for examples)
    > For example: "1234567890abcdef is in hex9"
    > Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}
    >
    > Or anything that does some sort of similar processing?
     
    Reto, Jun 26, 2006
    #3
  4. Xicheng Jia Guest

    Xicheng Jia, Jun 26, 2006
    #4
  5. wrote:
    > Is there a library or a way to generate an appropriate regular
    > expression for any given input string?
    > (remove quotes for examples)
    > For example: "1234567890abcdef is in hex9"
    > Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}
    >
    > Or anything that does some sort of similar processing?


    Well, yes, sure: actually the desired RE is a constant: .*
    For a more advanced RE you can even quantify it with the length of the
    string.
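Both "constant" answers are easy to write down. This sketch (the sample string is just the OP's example) shows that neither says anything about what the characters actually are.

```perl
use strict;
use warnings;

my $sample = '1234567890abcdef is in hex9';

# The constant answer: matches every string, including the empty one.
my $anything = qr/.*/s;

# Quantified with the sample's length (27 here): matches any string of
# exactly that many characters, whatever they are.
my $n           = length $sample;
my $same_length = qr/\A.{$n}\z/s;
```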

    Seriously: it is impossible to derive a generic RE pattern from a single
    text sample.

    And you provided a case in point: why are you scanning for [a-f] in the
    first part (I assume the upper case is a mistake, otherwise the RE wouldn't
    match anyway) but for a-z in the second part? Shouldn't that be [is] or
    maybe /is/? Without knowing the generic pattern it is impossible to know
    what RE you may be looking for.

    Jue
     
    Jürgen Exner, Jun 26, 2006
    #5
  6. Dr.Ruud Guest

    wrote:

    > Is there a library or a way to generate an appropriate regular
    > expression for any given input string?
    > (remove quotes for examples)
    > For example: "1234567890abcdef is in hex9"
    > Regex Generator returns: [0-9|A-F]{16} [a-z]{2} [a-z]{2} [0-9|a-z]{3}
    >
    > Or anything that does some sort of similar processing?


    I once created a Visual Basic function that derived a mask from the
    lines of a file. All the lines were supposed to have the same length,
    and all characters were printable, so that made it a lot easier.

    It would return a string of the same length. Special character values
    were used for character sets, like 0x01 for [A-Z], 0x02 for [a-z], 0x03
    for [A-Za-z], 0x04 for [0-9], 0x05 for [0-9A-Z], 0x07 for [0-9A-Za-z],
    etc. It even recognized EBCDIC numerics. It could also show an '@' for
    alphabetic and a '#' for numeric characters.

    A graphical character like ',' would mean that all lines in the file had
    a ',' in that position. All in all it was very handy to get a quick idea
    of what a fixed record file was about.
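The same idea can be sketched in Perl, assuming fixed-length lines and using the bit values described above (0x01 = A-Z, 0x02 = a-z, 0x04 = 0-9, ORed together). The helper name `column_mask`, the extra flag 8 for "anything else", and the choice to emit a regex string rather than a mask string are assumptions for illustration, not a port of the original function.

```perl
use strict;
use warnings;

# Bit combinations mapped to character classes (0x01 A-Z, 0x02 a-z, 0x04 0-9).
my %class_for = (
    1 => '[A-Z]', 2 => '[a-z]', 3 => '[A-Za-z]',
    4 => '[0-9]', 5 => '[0-9A-Z]', 6 => '[0-9a-z]', 7 => '[0-9A-Za-z]',
);

# Hypothetical helper: derive an anchored per-column pattern from one or
# more lines of equal length.
sub column_mask {
    my @lines = @_;
    my $width = length $lines[0];
    die "lines must all have the same length\n"
        if grep { length($_) != $width } @lines;

    my $pattern = '';
    for my $col (0 .. $width - 1) {
        my %seen;
        my $bits = 0;
        for my $line (@lines) {
            my $ch = substr $line, $col, 1;
            $seen{$ch} = 1;
            if    ($ch =~ /[A-Z]/) { $bits |= 1 }
            elsif ($ch =~ /[a-z]/) { $bits |= 2 }
            elsif ($ch =~ /[0-9]/) { $bits |= 4 }
            else                   { $bits |= 8 }   # punctuation, space, etc.
        }
        if ($bits == 8 && scalar(keys %seen) == 1) {
            # Every line has the same non-alphanumeric character here.
            $pattern .= quotemeta((keys %seen)[0]);
        }
        elsif ($bits & 8) {
            $pattern .= '.';    # mixed content that includes non-alphanumerics
        }
        else {
            $pattern .= $class_for{$bits};
        }
    }
    return "^$pattern\$";
}

print column_mask('AB-12', 'Cd-34'), "\n";   # ^[A-Z][A-Za-z]\-[0-9][0-9]$
```

A column keeps its literal character only when it is the same punctuation in every line, which mirrors the ',' behaviour described above.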

    --
    Affijn, Ruud

    "Gewoon is een tijger." (Ordinary is a tiger.)
     
    Dr.Ruud, Jun 26, 2006
    #6
  7. Ted Zlatanov Guest

    On 26 Jun 2006, wrote:

    wrote:
    >> Is there a library or a way to generate an appropriate regular
    >> expression for any given input string?


    > Seriously: it is impossible to derive a generic RE pattern from a single
    > text sample.


    I think this is incorrect, Jurgen. The OP was asking about an
    appropriate, not a generic regex. Other than
    http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm
    (which I mentioned in c.l.p.modules to answer his post, before I saw
    his cross-post here), you can always just say

    my $regex = '^(' . join('|', @strings) . ')$';

    and that's a regex that will match any given non-empty strings.

    Ted
     
    Ted Zlatanov, Jun 26, 2006
    #7
  8. Dr.Ruud Guest

    Ted Zlatanov wrote:

    > my $regex = '^(' . join('|', @strings) . ')$';
    >
    > and that's a regex that will match any given non-empty strings.


    '^(?:' . join( '|', map quotemeta, grep /./, @strings ) . ')$'
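To see why the escaping and the filtering matter, this sketch builds both versions from a made-up `@strings` that contains regex metacharacters and an empty string.

```perl
use strict;
use warnings;

my @strings = ('a.b', '1+1', '', 'foo');

# Original version: metacharacters are interpreted by the regex engine,
# and the empty string becomes an empty alternative.
my $naive = '^(' . join('|', @strings) . ')$';

# Revised version: drop empty strings, escape each string with quotemeta,
# and use a non-capturing group.
my $safe = '^(?:' . join('|', map quotemeta, grep /./, @strings) . ')$';
```

With the naive pattern, 'axb' and the empty string both match while the literal string '1+1' does not; the revised pattern matches exactly the non-empty strings in the list.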

     
    Dr.Ruud, Jun 26, 2006
    #8
  9. Ala Qumsieh Guest

    Dr.Ruud wrote:

    > Ted Zlatanov wrote:
    >
    >> my $regex = '^(' . join('|', @strings) . ')$';
    >>
    >> and that's a regex that will match any given non-empty strings.

    >
    > '^(?:' . join( '|', map quotemeta, grep /./, @strings ) . ')$'


    This solution has a caveat. Regexps have a maximum length (65539 bytes I
    believe). If you have enough strings in @strings (or if they are long
    enough), then the compiled regexp can exceed this length, and error out. I
    encountered this once, and the solution I resorted to was to construct an
    anonymous sub on the fly:

    my $string = <<EOS;
    sub {
    local \$_ = shift;
    return 1 if /\Q$strings[0]\E/;
    return 1 if /\Q$strings[1]\E/;
    # ... one such line for each element of \@strings
    return 0;
    }
    EOS

    my $matches = eval $string or die $@;

    Then use this anon sub to match:

    if ($matches->($myString)) { ... }
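A runnable version of this approach might generate one test line per string in a loop. This is a sketch with a made-up `@strings`; like the code above, each generated test is an unanchored substring match, not the whole-string match of the alternation.

```perl
use strict;
use warnings;

my @strings = ('foo', 'a.b', 'x/y');

# Build the source of an anonymous sub with one test per string.
# \Q...\E runs while the source is being built, so each string is
# pasted into the generated code already escaped by quotemeta.
my $code = "sub {\n    local \$_ = shift;\n";
$code .= "    return 1 if /\Q$_\E/;\n" for @strings;
$code .= "    return 0;\n}\n";

my $matches = eval $code or die "generated code failed to compile: $@";

print $matches->('food') ? "match\n" : "no match\n";
```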

    --Ala
     
    Ala Qumsieh, Jun 27, 2006
    #9
  10. Dr.Ruud Guest

    Ala Qumsieh wrote:
    > Dr.Ruud:
    >> Ted Zlatanov:


    >>> my $regex = '^(' . join('|', @strings) . ')$';
    >>>
    >>> and that's a regex that will match any given non-empty strings.

    >>
    >> '^(?:' . join( '|', map quotemeta, grep /./, @strings ) . ')$'

    >
    > This solution has a caveat. Regexps have a maximum length (65539
    > bytes I believe). If you have enough strings in @strings (or if they
    > are long enough), then the compiled regexp can exceed this length,
    > and error out. I encountered this once, and the solution I resorted
    > to was to construct an anonymous sub on the fly:


    If so, it would have the same problem, because any of the strings can be
    too long.

    perl -Mwarnings -le '
    $n = 1_000_000 ;
    $_ = ".." x $n ;
    $r = qr/^\Q$_\E$/ ;
    print length($r), ":", /$r/ ;
    '

    prints 4000011:1

     
    Dr.Ruud, Jun 27, 2006
    #10
  11. Ted Zlatanov wrote:
    > On 26 Jun 2006, wrote:
    >
    > wrote:
    >>> Is there a library or a way to generate an appropriate regular
    >>> expression for any given input string?

    >
    >> Seriously: it is impossible to derive a generic RE pattern from a
    >> single text sample.

    >
    > I think this is incorrect, Jurgen. The OP was asking about an
    > appropriate, not a generic regex. Other than
    > http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm
    > (which I mentioned in c.l.p.modules to answer his post, before I saw
    > his cross-post here), you can always just say
    >
    > my $regex = '^(' . join('|', @strings) . ')$';
    >
    > and that's a regex that will match any given non-empty strings.



    True. As will /.+/. And the other extreme is /\Q$string\E/.

    Chances are the OP was looking for neither of those 'solutions' but for
    something in between.
    But where the right 'in between' lies is something you cannot decide
    based on a single sample.
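One possible "in between" is to replace each run of digits, lowercase or uppercase letters with a class and an exact count, keeping everything else literal. The helper `generalize` below is a hypothetical sketch of that one choice; it yields a different pattern than the OP's, which illustrates that the right level of generality is a decision, not something the sample determines.

```perl
use strict;
use warnings;

# Hypothetical helper: generalize a single sample by collapsing runs of
# [0-9], [a-z] or [A-Z] into class{count}; other characters are kept
# literally (escaped with quotemeta).
sub generalize {
    my ($sample) = @_;
    my $pattern = '';
    while ($sample =~ /\G(?:([0-9]+)|([a-z]+)|([A-Z]+)|(.))/gs) {
        if    (defined $1) { $pattern .= '[0-9]{' . length($1) . '}' }
        elsif (defined $2) { $pattern .= '[a-z]{' . length($2) . '}' }
        elsif (defined $3) { $pattern .= '[A-Z]{' . length($3) . '}' }
        else               { $pattern .= quotemeta $4 }
    }
    return "^$pattern\$";
}

print generalize('1234567890abcdef is in hex9'), "\n";
```

Note that the hex digits are split into a digit run and a letter run, so this pattern accepts '0987654321fedcba at on abc7' but rejects the same sample with upper-case hex digits.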

    jue
     
    Jürgen Exner, Jun 27, 2006
    #11
  12. Ted Zlatanov Guest

    On 26 Jun 2006, wrote:

    Dr.Ruud wrote:
    >
    >> Ted Zlatanov wrote:
    >>
    >>> my $regex = '^(' . join('|', @strings) . ')$';
    >>>
    >>> and that's a regex that will match any given non-empty strings.

    >>
    >> '^(?:' . join( '|', map quotemeta, grep /./, @strings ) . ')$'

    >
    > This solution has a caveat. Regexps have a maximum length (65539 bytes I
    > believe). If you have enough strings in @strings (or if they are long
    > enough), then the compiled regexp can exceed this length, and error out. I
    > encountered this once, and the solution I resorted to was to construct an
    > anonymous sub on the fly:


    You and Dr. Ruud make great points. My original code was written in
    haste, sorry about that. If I did it with some brainwaves active, it
    would have been:

    # untested
    my %hash;
    $hash{$_} = 1 foreach @strings;                # one key per wanted string
    sub matches { return exists $hash{shift()} }   # exact whole-string lookup

    No need for generated subroutines and string eval(). Then you can use
    matches() in the regex as a code escape :) Isn't Perl great?
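A sketch of how the hash lookup might be used, with a made-up `@strings`. Instead of an in-regex code escape, it pulls candidate tokens out with an ordinary regex and filters them through `matches()`; that substitution is a choice made for this example.

```perl
use strict;
use warnings;

my @strings = ('foo', 'a.b');
my %hash;
$hash{$_} = 1 foreach @strings;

sub matches { return exists $hash{ shift() } }

# Exact, whole-string membership: no pattern to compile, no escaping needed.
print matches('a.b') ? "yes\n" : "no\n";   # yes
print matches('axb') ? "yes\n" : "no\n";   # no

# Extract whitespace-separated tokens, then keep only the wanted ones.
my $text  = 'foo bar a.b food';
my @found = grep { matches($_) } $text =~ /(\S+)/g;
```

Unlike the generated substring tests, 'food' is not reported, because the hash only answers exact matches.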

    Ted
     
    Ted Zlatanov, Jun 27, 2006
    #12

Similar Threads
  1. VSK
    Replies:
    2
    Views:
    2,382
  2. Replies:
    9
    Views:
    606
  3. Ben Last
    Replies:
    0
    Views:
    70
    Ben Last
    Jul 15, 2013
  4. Joshua Landau
    Replies:
    0
    Views:
    101
    Joshua Landau
    Jul 16, 2013
  5. Anders J. Munch
    Replies:
    1
    Views:
    80
    Roy Smith
    Jul 17, 2013