Negated Perl Regexp

Discussion in 'Perl Misc' started by Ronny, May 30, 2006.

  1. Ronny

    Ronny Guest

    If I want to express that a variable $v does NOT match some regular
    expression RE, I usually write this as

    $v !~ /RE/ and print "string does not contain pattern\n"

    Is there an easy way to write this in a positive way, i.e. using
    $v =~ /.../ ?

    I thought about using one of the zero-width lookahead operators, such
    as

    $v =~ /(?!RE)/ # DOES NOT WORK

    but this does not work, of course, because in general there *will* be
    some position within $v where RE does not match, even if RE matches
    at some other position.
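
    The effect described above can be reproduced with a small sketch. The helper name naive_negation and the sample strings are made up for illustration; only the unanchored (?!...) construct comes from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the unanchored negative lookahead succeeds
# at ANY position of the string.
sub naive_negation {
    my ($string, $re) = @_;
    return $string =~ /(?!$re)/ ? 1 : 0;
}

my $re = qr/foo/;
for my $v ('foo', 'xfoo', 'bar') {
    # "real" is the result of the ordinary negated match, for comparison.
    printf "%-5s naive=%d  real=%d\n",
        $v, naive_negation($v, $re), ($v !~ $re ? 1 : 0);
}
```

    The naive test reports a match for all three strings, because the engine is free to pick a position where "foo" does not start, whereas !~ is true only for 'bar'.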


    Background of what this is needed for: I'm writing tiny utilities in
    Perl which act as filters for input text. Typically, the core of the
    "program" contains something like

    /$PATTERN/ && print(transform($_))

    i.e. read all lines from stdin, and if they match some pattern, print
    out a transformed version of the line. The pattern is supplied via
    ARGV. This works fine, but I would also like the user of this utility
    to be able to *revert* the sense (i.e. read all lines from stdin, and
    if they DO NOT match the pattern, etc.), like you have with grep
    (where the option -v reverts the test). The key point here is that in
    this particular application, I would prefer NOT to introduce an option
    such as grep's "-v" to my utility, but to encode the "negation of the
    pattern" into the pattern itself.

    Is this possible at all within the realm of Perl regular expressions,
    or do I have to invent my own workaround (which of course would be
    possible)?
     
    Ronny, May 30, 2006
    #1

  2. Ronny

    Mirco Wahab Guest

    Thus spoke Ronny (on 2006-05-30 09:51):

    > Typically, the core of the "program" contains something like
    >
    > /$PATTERN/ && print(transform($_))
    > ...
    > This works fine, but I also would like the user of this utility
    > to be able to *revert* the sense (i.e. read all lines from stdin,
    > and if they DO NOT match the pattern, etc.),


    you mean

    print(transform($_)) unless /$PATTERN/;

    or something?

    Regards

    Mirco
     
    Mirco Wahab, May 30, 2006
    #2

  3. Ronny

    Xicheng Jia Guest

    Ronny wrote:
    > If I want to express that a variable $v does NOT match some regular
    > expression RE,
    > I usually write this as
    >
    > $v !~ /RE/ and print "string does not contain pattern\n"


    you can use "or"

    $v =~ /RE/ or print "string does not contain pattern\n";

    For maintainability, it might be better to write it in the following
    form:

    if (not $v =~ /RE/) {
        print "string does not contain pattern\n";
    }

    Xicheng

    > Is there an easy way to write this in a positive way, i.e using $v =~
    > /.../ ?
    >
    > I thought about using some of the zero-width lookahead operators, such
    > as
    >
    > $v =~ /(?!RE)/ # DOES NOT WORK
    >
    > but this does not work of course, because in general, somewhere within
    > $v *will* be a position where RE would not match, even if RE would
    > match
    > at some other position.
    >
    >
    > Background of what this is needed for: I'm writing tiny utilities in
    > Perl, which
    > act as a filter for input text. Typically, the core of the "program"
    > contains
    > something like
    >
    > /$PATTERN/ && print(transform($_))
    >
    > i.e. read all lines from stdin, and if they match some pattern, print
    > out a transformed
    > version of the line. The pattern is supplied via ARGV. This works fine, but I
    > also would like
    > the user of this utility to be able to *revert* the sense (i.e. read
    > all lines from stdin,
    > and if they DO NOT match the pattern, etc.), like you have with grep
    > (where the
    > option -v reverts the test). The key point here is that in this
    > particular application,
    > I would prefer NOT to introduce an option such as grep's "-v" to my
    > utility, but encode
    > the "negation of the pattern" into the pattern itself.
    >
    > Is this possible at all within the realm of Perl regular expressions,
    > or do I have
    > to invent my own workaround (which of course would be possible)?
     
    Xicheng Jia, May 30, 2006
    #3
  4. Re: Negated Perl Regexp, Howabout qr in Modules?

    Ronny wrote:
    > $v !~ /RE/ and print "string does not contain pattern\n"
    > Is there an easy way to write this in a positive way, i.e using $v =~
    > /.../ ?

    I have a related question here. One case which the solutions posted so
    far don't address is the use of compiled regular expressions with the qr
    operator. Many modules can take qr regular expressions for filtering or
    homing in on some particular datum. However, in some cases I'd like to use
    a negated test for matching. I'm not really willing to extend the original
    module code if I can avoid it. So, can one easily negate a qr-regexp when
    the module code presumably uses =~ for testing?

    PS: The module in this case is:
    Win32::IE::Mechanize

    --
    With kind regards Veli-Pekka Tätilä ()
    Accessibility, game music, synthesizers and programming:
    http://www.student.oulu.fi/~vtatila/
     
    Veli-Pekka Tätilä, May 30, 2006
    #4
  5. Ronny

    Xicheng Jia Guest

    Re: Negated Perl Regexp, Howabout qr in Modules?

    Veli-Pekka Tätilä wrote:
    > Ronny wrote:
    > > $v !~ /RE/ and print "string does not contain pattern\n"
    > > Is there an easy way to write this in a positive way, i.e using $v =~
    > > /.../ ?

    > I have a related question here. One case which the so far posted solutions
    > don't address is the use of compiled regular expressions with the qr
    > operator. Many modules can take qr regular expressions for filtering or
    > homing in some particular datum. However, in some cases I'd like to use a
    > negated test for matching. I'm not really willing to extend the original
    > module code if I can avoid it. So, can one easily negate a qr-regexp when
    > the module code supposedly uses =~ for testing?


    You want to match anything except strings matching the qr//
    expression? Then you might want to try the following:

    my $RE = qr/something here/;

    if ($v =~ /^(?:(?!$RE).)*$/) {
        # any string $v that does not match $RE
    }

    (untested)
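
    A minimal sketch of how this pattern behaves, assuming a wrapper sub named lacks_pattern (a made-up name) and adding the /s modifier so that "." also consumes newlines; the sample strings are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern from the post above.
sub lacks_pattern {
    my ($string, $re) = @_;
    # At each position: assert that $re does not match starting here,
    # then consume one character. The closing $ forces the loop to
    # cover the whole string.
    return $string =~ /^(?:(?!$re).)*$/s ? 1 : 0;
}

my $re = qr/foo/;
for my $v ('a foo b', 'foo', 'bar', '') {
    # "negated" is the ordinary !~ result, for comparison.
    printf "%-9s lacks=%d  negated=%d\n",
        "'$v'", lacks_pattern($v, $re), ($v !~ $re ? 1 : 0);
}
```

    One edge to be aware of: the loop never tests the position at the very end of the string, so a pattern that can match the empty string there (for example qr/$/) slips through.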
    Xicheng

    > PS: The module in this case is:
    > Win32::IE::Mechanize
    >
    > --
    > With kind regards Veli-Pekka Tätilä ()
    > Accessibility, game music, synthesizers and programming:
    > http://www.student.oulu.fi/~vtatila/
     
    Xicheng Jia, May 30, 2006
    #5
  6. Ronny

    Ted Zlatanov Guest

    On 30 May 2006, wrote:

    > Background of what this is needed for: I'm writing tiny utilities in
    > Perl, which act as a filter for input text. Typically, the core of
    > the "program" contains something like
    >
    > /$PATTERN/ && print(transform($_))
    >
    > i.e. read all lines from stdin, and if they match some pattern,
    > print out a transformed version of the line. The pattern is supplied via
    > ARGV. This works fine, but I also would like the user of this
    > utility to be able to *revert* the sense (i.e. read all lines from
    > stdin, and if they DO NOT match the pattern, etc.), like you have
    > with grep (where the option -v reverts the test).


    > The key point here is that in this particular application, I would
    > prefer NOT to introduce an option such as grep's "-v" to my utility,
    > but encode the "negation of the pattern" into the pattern itself.


    You either ask the user to rewrite $PATTERN, or you give a -v option.
    I don't understand how you would know *when* to negate the pattern
    without a -v option.

    > Is this possible at all within the realm of Perl regular
    > expressions, or do I have to invent my own workaround (which of
    > course would be possible)?


    Usually yes (though, for example, it may not work nicely if you have
    code embedded inside the regex, and there are many cases that are
    possible but computationally very expensive), but it's much more
    complicated to invert a regex than to invert the test for that regex.

    I honestly don't see a reason why you shouldn't provide a -v option,
    or some way for the user to say "invert this pattern", and then act
    upon that to invert the test. Maybe you can explain...

    Ted
     
    Ted Zlatanov, May 30, 2006
    #6
  7. Ronny

    Ronny Guest

    Ted Zlatanov schrieb:

    > On 30 May 2006, wrote:
    >
    > > Background of what this is needed for: I'm writing tiny utilities in
    > > Perl, which act as a filter for input text. Typically, the core of
    > > the "program" contains something like
    > >
    > > /$PATTERN/ && print(transform($_))
    > >
    > > i.e. read all lines from stdin, and if they match some pattern,
    > > print out a transformed version of the line. The pattern is supplied via
    > > ARGV. This works fine, but I also would like the user of this
    > > utility to be able to *revert* the sense (i.e. read all lines from
    > > stdin, and if they DO NOT match the pattern, etc.), like you have
    > > with grep (where the option -v reverts the test).

    >
    > > The key point here is that in this particular application, I would
    > > prefer NOT to introduce an option such as grep's "-v" to my utility,
    > > but encode the "negation of the pattern" into the pattern itself.

    >
    > You either ask the user to rewrite $PATTERN, or you give a -v option.
    > I don't understand how you would know *when* to negate the pattern
    > without a -v option.


    You got the point exactly: I want the user to rewrite the pattern. The
    question is, how to write a *negated* pattern using Perl RE syntax?

    To the outside world (i.e. to the user), the interface always says
    something like "Supply a pattern and you get a list of lines matching
    the pattern" (actually, the lines returned are transformed, but that is
    not the point here). Given *this* user interface, is it possible for
    the user to specify a pattern with negated meaning - for example,
    return all lines which do NOT contain the string "foo"?

    A variation of this question could be: Return all the lines which
    contain both the strings "foo" and "bar", but ONLY if they do not
    contain "baz" somewhere between "foo" and "bar". I.e. the lines

    ...foo.......bar......baz... (OK, baz after bar)
    ...baz......foo......bar.... (OK, baz before bar)
    ...foo..................bar... (OK, no baz)

    should match, but the lines

    ...foo........baz......bar... (baz between foo and bar)
    ...foo........................... (bar missing)
    ...bar........................... (foo missing)

    should not match. Is it possible to express THIS using a Perl regexp,
    or does this exceed the power of Perl regular expressions? If there
    is a solution to this foo/bar/baz problem, then there is obviously
    one for my original problem as well.
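
    One way the foo/bar/baz condition might be written is with a lookahead that is re-checked before every character between "foo" and "bar". This is only a sketch: the sub name is hypothetical, and it reads the requirement as "some foo is followed by a bar with no baz in between".

```perl
use strict;
use warnings;

# Hypothetical helper; the literals foo, bar and baz come from the post.
sub foo_bar_without_baz {
    my ($line) = @_;
    # After "foo", consume characters one at a time, but only while
    # "baz" does not start at the current position; then require "bar".
    return $line =~ /foo(?:(?!baz).)*bar/ ? 1 : 0;
}

my @lines = (
    '...foo.......bar......baz...',
    '...baz......foo......bar....',
    '...foo..................bar...',
    '...foo........baz......bar...',
    '...foo...........................',
    '...bar...........................',
);
printf "%d  %s\n", foo_bar_without_baz($_), $_ for @lines;
```

    With these six sample lines, the first three are accepted and the last three are rejected.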

    > > Is this possible at all within the realm of Perl regular
    > > expressions, or do I have to invent my own workaround (which of
    > > course would be possible)?

    >
    > Yes usually (for example, it may not work nicely if you have code
    > embedded inside the regex, and there are many cases that are possible
    > but computationally very expensive), but it's much more complicated to
    > invert a regex than to invert the test for that regex.


    Of course, one hack for my original problem would be to "invent" a
    special character (say, an exclamation mark) which is allowed at the
    very start of the expression and just means "pattern has negated
    meaning". My Perl code would then be:

    if ($pattern =~ /^!(.*)$/) {
        # negated meaning
        $pattern = $1; # drop ! from pattern
        print transform($line) unless $line =~ $pattern;
    }
    else {
        print transform($line) if $line =~ $pattern;
    }
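
    The same idea could be packaged as a closure, so the "!" is examined once instead of on every line. A minimal sketch, assuming a helper called make_matcher (not part of the original code) and leaving transform() out:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-supplied pattern string into a predicate.
# A leading "!" means "select lines that do NOT match the rest".
sub make_matcher {
    my ($pattern) = @_;
    my $negate = $pattern =~ s/^!//;   # strip the marker and remember it
    my $re     = qr/$pattern/;         # dies here if the pattern is invalid
    return sub {
        my ($line) = @_;
        my $hit = $line =~ $re ? 1 : 0;
        return $negate ? !$hit : $hit;
    };
}

my $wanted = make_matcher('!foo');
for my $line ("a foo line\n", "a bar line\n") {
    print "kept: $line" if $wanted->($line);
}
```

    A pattern that really has to begin with a literal exclamation mark would then need to be written as \!.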

    This would do the job (and the exclamation mark here is just a "-v"
    switch in disguise), but I wondered whether the same effect could also
    be achieved by just changing the pattern in a suitable way.

    > I honestly don't see a reason why you shouldn't provide a -v option,


    The reason is that I simplified the problem considerably to make it
    easier to discuss here. The interesting point for me is now finding
    out whether the negation effect can be done solely within the
    pattern, or has to be "moved outside" to the distinction between
    =~ and !~, or to an if/unless construct.

    I have read the man pages about pattern "negation" (such as the
    "negative lookahead pattern"), but I could not see whether they
    could be applied to my case.

    Ronald
     
    Ronny, May 31, 2006
    #7
  8. Ronny

    Ronny Guest

    Mirco Wahab schrieb:

    > Thus spoke Ronny (on 2006-05-30 09:51):
    >
    > > Typically, the core of the "program" contains something like
    > >
    > > /$PATTERN/ && print(transform($_))
    > > ...
    > > This works fine, but I also would like the user of this utility
    > > to be able to *revert* the sense (i.e. read all lines from stdin,
    > > and if they DO NOT match the pattern, etc.),

    >
    > you mean
    >
    > print(transform($_)) unless /$PATTERN/;
    >
    > or something?


    No, the corresponding code would always be as stated. I think I did not
    explain my problem in a very understandable way. See my reply to Ted
    for a more elaborate explanation.

    Maybe a more mathematical formulation of the problem helps:

    Given an arbitrary Perl regexp P, is it possible to derive from it
    another regexp Q, with the property that for every string S the
    following equation holds:

    (S =~ P) == (S !~ Q)

    (S matches P if and only if S does not match Q).

    I.e. is there a general mechanism within the Perl regexp realm which
    allows me to find a negated pattern for a given pattern?

    Of course this is easy for specific patterns. For example, assume that
    P is the pattern

    [abc]

    which means "every line which contains at least one a, b or c
    somewhere". The negated pattern Q, "every line which contains none of
    a, b or c", is then

    ^[^abc]*$

    In this example, I have kind of "handcrafted" the negated pattern after
    having investigated the original pattern. For the [abc] case, it was
    easy to find the negated pattern, but in general this might be hard,
    so I wondered whether Perl provides a specific construct which just
    negates a pattern.
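
    For ordinary patterns, one candidate for Q is a negative lookahead anchored at the start of the string: Q succeeds only if P cannot be found after any prefix. The sketch below assumes a helper named negate_re (made up here) and checks the (S =~ P) == (S !~ Q) property on a few sample strings; patterns that rely on \G or embedded code are outside its scope.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a compiled pattern P in a pattern Q that
# matches exactly when P matches nowhere in the string.
sub negate_re {
    my ($p) = @_;
    # \A pins the test to position 0; (?s:.*?) lets the lookahead try P
    # at every later position, including after newlines.
    return qr/\A(?!(?s:.*?)$p)/;
}

my @patterns = (qr/[abc]/, qr/foo|bar/, qr/^\d+$/);
my @strings  = ('', 'xyz', 'cab', 'a foo', '42', "line\nfoo");

for my $p (@patterns) {
    my $q = negate_re($p);
    for my $s (@strings) {
        my $agree = (($s =~ $p) ? 1 : 0) == (($s !~ $q) ? 1 : 0);
        print "mismatch for '$s' against $p\n" unless $agree;
    }
}
```

    The loop prints a line only when the two tests disagree, so it stays silent for these samples. That is evidence for the samples, not a proof for every possible P.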

    Ronald
     
    Ronny, May 31, 2006
    #8
  9. Ronny

    Ronny Guest

    Re: Negated Perl Regexp, Howabout qr in Modules?

    Xicheng Jia schrieb:
    > my $RE = qr/something here/;
    >
    > if ($v =~ /^(?:(?!$RE).)*$/) {
    > # any string $v that does not match $RE
    > }


    Great! I think this is something I could use for *my* original problem
    too!

    Thank you for pointing this out!

    Ronny
     
    Ronny, May 31, 2006
    #9
  10. Ronny

    Mumia W. Guest

    Ronny wrote:
    > [...]
    > Maybe here a more mathematical formulation of the problem:
    >
    > Given an arbitrary Perl regexp P, is it then possible to derive from it
    > another
    > regexp Q, with the property that for every string S the following
    > equation holds:
    >
    > (S =~ P) == (S !~ Q)
    >
    > (S matches P if S does not match Q, and vice versa).
    >
    > I.e. is there a general mechanism within the Perl regexp realm which
    > allows
    > me to find a negated pattern for a given pattern?
    >


    I don't think so, and given the complexity of RE's, it's probably
    impossible. But all is not lost.

    You could do what (Debian) aptitude does: Let the user place a prefix
    code in the RE that specifies inversion, e.g.:

    aptitude search '~niso-8859!~nbase'

    This searches for all Debian packages that have the string iso-8859 in
    their names, but excludes any that have 'base' in their names.

    ~n introduces an RE to match package names.
    !~n introduces an RE to *not* match package names.

    > Of course this is easy for specific pattern. For example, assume that P
    > is
    > the pattern
    >
    > [abc]
    >
    > which means "every line which either contains at least one a, b or c
    > somewhere".
    > The negated pattern Q, "every line which contains neither a, b or c" is
    > then
    >
    > ^[^abc]*$
    >
    > In this example, I have kind of "handcrafted" the negated pattern after
    > having
    > investigated the original pattern. For the [abc] case, it was easy to
    > find the
    > negated pattern, but in general, this might be hard, [...]


    Depending on the pattern, it might be so hard that supercomputers would
    take an eternity to do it.
     
    Mumia W., May 31, 2006
    #10
  11. Ronny

    Mumia W. Guest

    Re: Negated Perl Regexp, Howabout qr in Modules?

    Xicheng Jia wrote:
    > [...]
    > you want to match anything except those matching the qr//
    > expression???? so you might want to try the following:
    >
    > my $RE = qr/something here/;
    >
    > if ($v =~ /^(?:(?!$RE).)*$/) {
    > # any string $v that does not match $RE
    > }
    >
    > (untested)
    > Xicheng
    >


    Well, I tested it, and it seems pretty darn good, and just like Ronny, I
    might end up using this in my programs if I can figure out how it works.
    Thanks Xicheng.
     
    Mumia W., May 31, 2006
    #11
  12. Ronny

    Ted Zlatanov Guest

    On 31 May 2006, wrote:

    > You exactly got the point: I want the user to rewrite the
    > Pattern. The question is, how to write a *negated* pattern using
    > Perl RE Syntax?


    You can do it for some cases, but because of limitations on memory and
    CPU cycles, most complex regexes can't be inverted in a reasonable
    amount of time. When there's code inside, it gets even worse.

    Look at the book "Higher-Order Perl" by Mark-Jason Dominus. It has a
    long section on finding all the strings that can match a given regular
    expression; if you read it carefully you'll see why inverting a
    regular expression is generally a hard problem, just like producing all
    the strings that match it.

    Note also that if security is a concern, giving users regexp access is
    equivalent to letting them run any code due to the code escapes
    possible in Perl's regex interpreter. It may be simpler to give the
    users a limited language with a NOT operator. Parse::RecDescent has
    some good examples of this kind of parser in the distribution. The
    users may also prefer this to the raw power of regexps, and it's what
    I would do for a production system.
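
    As a rough illustration of such a limited language, here is a sketch that needs no parser module at all. The syntax is invented for this example (whitespace-separated words, a leading "!" meaning NOT, all terms combined with AND), and every term is treated as literal text so that no regex syntax reaches the engine. It does not use Parse::RecDescent.

```perl
use strict;
use warnings;

# Hypothetical mini-language: "foo !baz" means "contains foo and does
# not contain baz". Terms are literal substrings, never regexes.
sub compile_query {
    my ($query) = @_;
    my @tests;
    for my $word (split ' ', $query) {
        my $term   = $word;
        my $negate = $term =~ s/^!//;      # leading "!" marks a NOT term
        next if $term eq '';               # ignore a bare "!"
        push @tests, [ qr/\Q$term\E/, $negate ? 1 : 0 ];
    }
    return sub {
        my ($line) = @_;
        for my $test (@tests) {
            my ($re, $negate) = @$test;
            my $hit = $line =~ $re ? 1 : 0;
            return 0 if $hit == $negate;   # a wanted term is missing, or a NOT term is present
        }
        return 1;
    };
}

my $match = compile_query('foo !baz');
for my $line ('foo bar', 'foo baz', 'bar only') {
    printf "%-8s => %d\n", $line, $match->($line);
}
```

    Because \Q...\E quotes every metacharacter, a term such as "a.b" only matches that exact text.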

    > Of course, one hack for my original problem would be to "invent" a
    > special character (say, exclamation mark) which is allowed to be at
    > the very start of the expression, and just has the meaning "pattern
    > has negated meaning".


    Yes :) That would be easiest.

    >> I honestly don't see a reason why you shouldn't provide a -v option,

    >
    > The reason is because I simplified the problem very much so to make
    > it better feasible to discuss here. The interesting point for me is
    > now finding out whether the negation effect can be done solely
    > within the pattern, or has to be "moved outside" to the distinction
    > between =~ and !~, or if/unless construct.


    It should be moved outside, so you can go on to finish the project :)

    Ted
     
    Ted Zlatanov, May 31, 2006
    #12
  13. Re: Negated Perl Regexp, Howabout qr in Modules?

    Xicheng Jia wrote:

    > my $RE = qr/something here/;
    >
    > if ($v =~ /^(?:(?!$RE).)*$/) {
    > # any string $v that does not match $RE
    > }
    >


    I've not benchmarked it but I'd suspect that's less efficient than the
    usual answer[1] the OP would have found if he'd been bothered to type
    "negate regex" into a Usenet search engine on this newsgroup.

    [1] The on ska gave.
     
    Brian McCauley, May 31, 2006
    #13
  14. Ronny

    Xicheng Jia Guest

    Re: Negated Perl Regexp, Howabout qr in Modules?

    Brian McCauley wrote:
    > Xicheng Jia wrote:
    >
    > > my $RE = qr/something here/;
    > >
    > > if ($v =~ /^(?:(?!$RE).)*$/) {
    > > # any string $v that does not match $RE
    > > }
    > >

    >
    > I've not benchmarked it but I'd suspect that's less efficient than the
    > usual answer[1] the OP would have found if he'd been bothered to type
    > "negate regex" into a Usenet search engine on this newsgroup.


    Here is an old post from Tom Christiansen which might best address this
    problem:

    http://groups.google.com/group/comp...075b5b?q=negate regex&rnum=3#7af7898218075b5b

    while the notion of (?:(?!$RE).)* to match anything except $RE is (as far
    as I know) from Jeffrey Friedl's book "Mastering Regular Expressions".

    HTH,
    Xicheng
     
    Xicheng Jia, May 31, 2006
    #14
  15. Ronny

    Ronny Guest

    Re: Negated Perl Regexp, Howabout qr in Modules?


    > I've not benchmarked it but I'd suspect that's less efficient than the
    > usual answer[1] the OP would have found if he'd been bothered to type
    > "negate regex" into a Usenet search engine on this newsgroup.


    Point taken!

    Ronald
     
    Ronny, Jun 1, 2006
    #15
  16. Ronny

    Ted Zlatanov Guest

    Re: Negated Perl Regexp, Howabout qr in Modules?

    On 31 May 2006, wrote:

    > Brian McCauley wrote:
    >> Xicheng Jia wrote:
    >>
    >>> my $RE = qr/something here/;
    >>>
    >>> if ($v =~ /^(?:(?!$RE).)*$/) {
    >>> # any string $v that does not match $RE
    >>> }
    >>>

    >>
    >> I've not benchmarked it but I'd suspect that's less efficient than the
    >> usual answer[1] the OP would have found if he'd been bothered to type
    >> "negate regex" into a Usenet search engine on this newsgroup.

    >
    > Here is an old post from Tom Christiansen which might best address this
    > problem:
    >
    > http://groups.google.com/group/comp...075b5b?q=negate regex&rnum=3#7af7898218075b5b
    >
    > while the notion of (?:(?!$RE).)* to match anything except $RE(as far
    > as I can know) is from Jeffrey Friedl's book "Mastering Regular Expressions".


    This post does not mention that negating some regexes is
    computationally prohibitive, and code escapes are a problem. Also,
    the "Higher-Order Perl" book I mentioned came out after that post
    (1999), and has some very interesting information in the chapter on
    generating all the possible strings a regex can match. There are also
    security considerations when you allow a user to provide you with a
    regex. None of those things is answered by a naive Usenet search.

    Furthermore, the real question was "why doesn't the OP want a -v flag?
    How can he simulate it instead?" and not "how to negate a regex."
    Usually that's the case when people ask for negating a regex, btw.

    Ted
     
    Ted Zlatanov, Jun 1, 2006
    #16