about condensed regular expression syntax

Discussion in 'Perl Misc' started by raksha34@gmail.com, Jun 27, 2007.

  1. Guest

    Hi all,

    I have to match the following types of strings:

    PTY
    IN_B
    IN[3]
    ADD<2>
    SUM{25}
    MULT(9)

    Here's my attempt at condensing the regular expression:

    use strict;
    use warnings;

    my @Data = qw(
    PTY
    COUNT2
    IN_B
    IN[3]
    ADD<2>
    SUM{25}
    MULT(9)
    );

    my %h = qw(
    [ ]
    { }
    ( )
    < >
    );

    my $pin_re = q/\A[a-zA-Z]\w*(?:([<[({])\d+$h{\1})?\z/;

    for my $var (@Data) {
        if ($var =~ m/$pin_re/) {
            print "$var match\n";
        }
        else {
            print "$var NOmatch\n";
        }
    }
    **************************** END of CODE **************

    This is what I get:

    PTY match
    COUNT2 match
    IN_B match
    IN[3] NOmatch
    ADD<2> NOmatch
    SUM{25} NOmatch
    MULT(9) NOmatch
    ****************************** END of OUTPUT **********

    The reason for writing the regular expression in this format
    was to avoid having to use a lot of ORs.

    But it doesn't work.

    Can you suggest some way of fixing this?

    Thanks,
    Rakesh
    , Jun 27, 2007
    #1

  2. wrote:
    > i have to match the following types of strings:
    >
    > my @Data = qw(
    > PTY
    > COUNT2
    > IN_B
    > IN[3]
    > ADD<2>
    > SUM{25}
    > MULT(9)
    > );


    The RE
    /.+/
    will perfectly match those strings.

    It will also match a few other strings, quite a few actually, but as you
    didn't specify any criteria for what strings not to match that should be ok.

    jue
    Jürgen Exner, Jun 27, 2007
    #2

  3. Guest

    Ok, a valid string is of the following form:

    i) must start with an alphabet
    ii) then it can be any alphanumeric after that. it can end here, but
    if not then rule iii) applies
    iii) and finally it may or may not end in the following 4 forms:

    [num]
    <num>
    {num}
    (num)

    *** num means any nonnegative integer.


    thanks,

    Rakesh




    , Jun 27, 2007
    #3
  4. -berlin.de Guest

    <> wrote in comp.lang.perl.misc:
    > hi all,
    >
    > i have to match the following types of strings:
    >
    > PTY
    > IN_B
    > IN[3]
    > ADD<2>
    > SUM{25}
    > MULT(9)
    >
    > Here's my attempt at condensing the regular expression:
    >
    > use strict;
    > use warnings;
    >
    > my @Data = qw(
    > PTY
    > COUNT2
    > IN_B
    > IN[3]
    > ADD<2>
    > SUM{25}
    > MULT(9)
    > );
    >
    > my %h = qw(
    > [ ]
    > { }
    > ( )
    > < >
    > );
    >
    > my $pin_re = q/\A[a-zA-Z]\w*(?:([<[({])\d+$h{\1})?\z/;


    Uh, no, that won't work. I'm not sure how it even compiles, but
    that kind of match-time replacement only works on the replacement
    side of an s///, not in a regex.
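
    To illustrate the point with a minimal sketch (the variable name $pat is only for this example): q// is a single-quoted string, so nothing in it is interpolated. The pattern text still contains the literal characters $h{\1}, and the hash is never consulted.

```perl
use strict;
use warnings;

my %h = ( '[' => ']', '{' => '}', '(' => ')', '<' => '>' );

# q// is a single-quoted string: nothing inside it is interpolated.
my $pat = q/$h{\1}/;

# The string holds the six literal characters $ h { \ 1 }.
print length($pat), "\n";
print $pat eq '$h{\1}' ? "no hash lookup happened\n" : "interpolated\n";
```

    Even in an interpolating context the lookup could not work as intended, because \1 only has a value while the match is running, long after the pattern text has been built.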

    [...]

    > The reason for writing the regular expression in this format
    > was to avoid having to use a lot of ORs.
    >
    > But it doesn't work.
    >
    > Can you suggest some way of fixing this?


    Well, use the or's. You don't have to write them yourself. Using your
    table %h from above:

    my $paren_re = join '|' => map "\Q$_\E\\d+\Q$h{$_}\E" => keys %h;
    my $pin_re = qr/\A[a-zA-Z]\w*(?:$paren_re)?\z/;

    That should do what you want.
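
    Spelled out as a complete script, the generated-alternation idea might look like the sketch below. Some parts are assumptions made for illustration: the helper name is_pin and the extra test strings IN[3> and 9LIVES are not from the thread, and quotemeta is used in place of \Q...\E.

```perl
use strict;
use warnings;

# Opening bracket => closing bracket, as in the %h table above.
my %h = ( '[' => ']', '{' => '}', '(' => ')', '<' => '>' );

# One alternative per bracket pair, e.g. \[\d+\]; sorting keeps the
# generated pattern the same from run to run.
my $paren_re = join '|',
    map { quotemeta($_) . '\d+' . quotemeta( $h{$_} ) } sort keys %h;

my $pin_re = qr/\A[a-zA-Z]\w*(?:$paren_re)?\z/;

# Hypothetical helper: returns 1 if the string is a valid pin name, else 0.
sub is_pin { return $_[0] =~ $pin_re ? 1 : 0 }

for my $var ( 'PTY', 'COUNT2', 'IN_B', 'IN[3]', 'ADD<2>', 'SUM{25}',
              'MULT(9)', 'IN[3>', '9LIVES' ) {
    printf "%-8s %s\n", $var, is_pin($var) ? 'match' : 'NOmatch';
}
```

    Adding a new bracket pair then only means adding one entry to %h; the alternation is rebuilt from the table.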

    The alternative would be to use a (??{ code }) insertion to provide
    the closing counterpart, but ugh... I haven't tried this.

    Anno
    -berlin.de, Jun 27, 2007
    #4
  5. On Wed, 27 Jun 2007 06:09:15 -0700, wrote:

    >i) must start with an alphabet


    [a-zA-Z] or [a-z] with -i

    >ii) then it can be any alphanumeric after that. it can end here, but
    >if not then rule iii) applies


    "any" means zero or more? \w*

    >iii) and finally it may or may not end in the following 4 forms:
    >
    >[num]
    ><num>
    >{num}
    >(num)


    Simple enough IMHO to go with the "or":
    (?:\[\d+\]|<\d+>|\{\d+\}|\(\d+\))?. I must say that I've spent some
    time now trying to do the same thing with a hash approach, but it
    seems to me that every such attempt is more costly in terms of
    space. All in all I would go this way (/x added for clarity):

    /\A
     [a-z]
     \w*
     (?:
         \[\d+\]
       |
         <\d+>
       |
         \{\d+\}
       |
         \(\d+\)
     )?
     \z/ix


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, Jun 27, 2007
    #5
  6. Paul Lalli Guest

    On Jun 27, 9:09 am, wrote:
    > Ok, a valid string is of the following form:
    >
    > i) must start with an alphabet


    /^[a-zA-Z] <...>

    > ii) then it can be any alphanumeric after that. it can end here, but


    <...> [a-zA-Z0-9]+(?:<...>)?$/


    > if not then rule iii) applies
    > iii) and finally it may or may not end in the following 4 forms:
    >
    > [num]
    > <num>
    > {num}
    > (num)


    <...> (?:\[\d+\]|<\d+>|\{\d+\}|\(\d+\)) <...>


    Put it all together:
    /^ #beginning of string
    [a-zA-Z] #start with an alpha
    [a-zA-Z0-9]+ #continue with 1 or more alphanums
    (?:\[\d+\]|<\d+>|\{\d+\}|\(\d+\))? #optionally your digits
    $/x #end of string

    Paul Lalli

    P.S. I'm not entirely certain that all of ] } and ) need to be
    escaped, but they won't hurt.
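
    One way to settle the P.S. is to try compiling each closing character as a pattern on its own. This is only a sketch, and the helper name compiles is made up for the example. A lone ] or } is taken literally, while a lone ) is a compile error, so in the alternation above only \) strictly needs its backslash; the other two are harmless.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $pat compiles as a regex on its own, else 0.
sub compiles {
    my ($pat) = @_;
    return eval { my $re = qr/$pat/; 1 } ? 1 : 0;
}

for my $ch ( ']', '}', ')' ) {
    printf "%s alone: %s\n", $ch,
        compiles($ch) ? 'compiles, taken literally' : 'does not compile';
}
```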
    Paul Lalli, Jun 27, 2007
    #6
  7. Mirco Wahab Guest

    wrote:
    > Ok, a valid string is of the following form:
    >
    > i) must start with an alphabet
    > ii) then it can be any alphanumeric after that. it can end here, but
    > if not then rule iii) applies
    > iii) and finally it may or may not end in the following 4 forms:
    >
    > [num]
    > <num>
    > {num}
    > (num)
    >
    > *** num means any nonnegative integer.
    >
    >


    Your approach wasn't that bad in the first place.
    Please note that some of your replacement chars
    might be special in regex context ==> the ')'.

    The hash lookup needs to be wrapped in a
    code assertion, like

    ...
    my @Data = qw'
    PTY
    COUNT2
    IN_B
    IN[3]
    ADD<2>
    SUM{25}
    MULT(9) ';

    my %h = qw' [ ] { } ( \) < > ';

    my $pin_re = qr/^[A-Za-z]\w*
        (?:
            ( [<{[(] ) \d+
            (??{"$h{$1}"})
        )?
    $/x;

    for (@Data) {
        print "$_ " . (/$pin_re/ ? 'OK' : 'NO') . " match\n"
    }
    ...
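
    A self-contained variant of the same idea, as a sketch: the table name %close, the helper is_pin and the extra test strings are assumptions made for the example. It stores the plain closing characters and escapes them with quotemeta at match time, so the table itself needs no backslashes.

```perl
use strict;
use warnings;

# Opening bracket => closing bracket, stored unescaped.
my %close = ( '[' => ']', '{' => '}', '(' => ')', '<' => '>' );

my $pin_re = qr/
    \A [A-Za-z] \w*                     # a letter, then any word characters
    (?:
        ( [<{\[(] )                     # capture the opening bracket
        \d+                             # the number
        (??{ quotemeta $close{$1} })    # its closing partner, looked up at match time
    )?
    \z
/x;

# Hypothetical helper: returns 1 if the string is a valid pin name, else 0.
sub is_pin { return $_[0] =~ $pin_re ? 1 : 0 }

for my $var ( 'PTY', 'COUNT2', 'IN_B', 'IN[3]', 'ADD<2>', 'SUM{25}',
              'MULT(9)', 'A12', 'IN[3>', '9LIVES' ) {
    print "$var ", ( is_pin($var) ? 'OK' : 'NO' ), " match\n";
}
```

    Because the (??{ ... }) block is written literally inside qr//, no "use re 'eval'" is needed here; that pragma only comes into play when a pattern containing code blocks is assembled from an interpolated string.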

    Regards

    M.
    Mirco Wahab, Jun 27, 2007
    #7
  8. Guest

    Hello all,

    Thank you all for helping me out on this. I really appreciate
    everybody's help!

    Actually, what I wanted to do was what Mirco has given.
    The (??{....}) concept is really great.

    I wanted to avoid doing the ORing in the regular expressions as it
    hurts the scalability.

    Thanks once again everybody for the help.

    Regards,
    Rakesh.



    , Jun 28, 2007
    #8