Negated Perl Regexp


Ronny

If I want to express that a variable $v does NOT match some regular expression RE, I usually write this as

$v !~ /RE/ and print "string does not contain pattern\n";

Is there an easy way to write this in a positive way, i.e. using $v =~ /.../?

I thought about using one of the zero-width lookahead operators, such as

$v =~ /(?!RE)/ # DOES NOT WORK

but this does not work, of course, because in general there *will* be some position within $v where RE does not match, even if RE matches at some other position.
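To see the problem concretely: an unanchored negative lookahead only asks whether RE fails to start at *some* position, and the engine is free to try every position in turn. A minimal sketch (the string and pattern are arbitrary examples):

```perl
use strict;
use warnings;

my $v  = "foo";
my $re = qr/foo/;

# $v plainly contains the pattern ...
my $contains = $v =~ $re ? 1 : 0;

# ... yet a bare, unanchored negative lookahead still succeeds:
# at offset 0 "foo" starts, so (?!foo) fails there, but the engine
# simply moves on and succeeds at offset 1.
my $offset;
if ($v =~ /(?!$re)/) {
    $offset = $-[0];    # start offset of the (zero-width) match
}

print "contains=$contains, lookahead matched at offset $offset\n";
```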


Background of what this is needed for: I'm writing tiny utilities in Perl which act as filters for input text. Typically, the core of the "program" contains something like

/$PATTERN/ && print(transform($_))

i.e. read all lines from stdin, and if they match some pattern, print out a transformed version of the line. The pattern is supplied via ARGV. This works fine, but I would also like the user of this utility to be able to *invert* the sense (i.e. read all lines from stdin, and if they DO NOT match the pattern, etc.), like you can with grep (where the option -v inverts the test). The key point here is that in this particular application, I would prefer NOT to introduce an option such as grep's "-v" to my utility, but to encode the "negation of the pattern" into the pattern itself.

Is this possible at all within the realm of Perl regular expressions, or do I have to invent my own workaround (which of course would be possible)?
 

Mirco Wahab

Thus spoke Ronny (on 2006-05-30 09:51):
Typically, the core of the "program" contains something like

/$PATTERN/ && print(transform($_))
...
This works fine, but I also would like the user of this utility
to be able to *revert* the sense (i.e. read all lines from stdin,
and if they DO NOT match the pattern, etc.),

you mean

print(transform($_)) unless /$PATTERN/;

or something?

Regards

Mirco
 

Xicheng Jia

Ronny said:
If I want to express that a variable $v does NOT match some regular
expression RE,
I usually write this as

$v !~ /RE/ and print "string does not contain pattern\n"

You can use "or":

$v =~ /RE/ or print "string does not contain pattern\n";

For maintainability, it might be better to write it in the following form:

if (not $v =~ /RE/) {
print "string does not contain pattern\n";
}

Xicheng
 

Veli-Pekka Tätilä

Ronny said:
$v !~ /RE/ and print "string does not contain pattern\n"
Is there an easy way to write this in a positive way, i.e using $v =~
/.../ ?
I have a related question here. One case the solutions posted so far don't address is the use of compiled regular expressions with the qr operator. Many modules can take qr regular expressions for filtering or homing in on some particular datum. However, in some cases I'd like to use a negated test for matching. I'd rather not extend the original module code if I can avoid it. So, can one easily negate a qr regexp when the module code presumably uses =~ for testing?

PS: The module in this case is:
Win32::IE::Mechanize
 

Xicheng Jia

Veli-Pekka Tätilä said:
I have a related question here. One case the solutions posted so far don't address is the use of compiled regular expressions with the qr operator. Many modules can take qr regular expressions for filtering or homing in on some particular datum. However, in some cases I'd like to use a negated test for matching. I'd rather not extend the original module code if I can avoid it. So, can one easily negate a qr regexp when the module code presumably uses =~ for testing?

You want to match anything except strings matching the qr// expression? Then you might want to try the following:

my $RE = qr/something here/;

if ($v =~ /^(?:(?!$RE).)*$/) {
# any string $v that does not match $RE
}

(untested)
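A small harness for trying the construct above, as a sketch: qr/foo/ and the sample strings are placeholders, and /s is added (an assumption, not in the original) so that "." can also step over embedded newlines. One caveat worth knowing: the lookahead is never tried at the very end of the string, so a pattern that can match an empty string there, such as qr/x?\z/, is not correctly negated by this form.

```perl
use strict;
use warnings;

my $RE = qr/foo/;    # placeholder pattern; any qr// can be used

# Xicheng's construct: walk from start to end, and at each position
# first check that $RE does not begin there before consuming a character.
my $NOT_RE = qr/^(?:(?!$RE).)*$/s;

my %verdict;
for my $v ("bar baz", "a foo here", "", "fo o") {
    $verdict{$v} = $v =~ $NOT_RE ? "does not match" : "matches";
    print "'$v' $verdict{$v} \$RE\n";
}
```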
Xicheng
 

Ted Zlatanov

Background of what this is needed for: I'm writing tiny utilities in
Perl, which act as a filter for input text. Typically, the core of
the "program" contains something like

/$PATTERN/ && print(transform($_))

i.e. read all lines from stdin, and if they match some pattern,
print out a transformed version of the line. The pattern is supplied via
ARGV. This works fine, but I would also like the user of this
utility to be able to *invert* the sense (i.e. read all lines from
stdin, and if they DO NOT match the pattern, etc.), like you have
with grep (where the option -v inverts the test).
The key point here is that in this particular application, I would
prefer NOT to introduce an option such as grep's "-v" to my utility,
but encode the "negation of the pattern" into the pattern itself.

You either ask the user to rewrite $PATTERN, or you give a -v option.
I don't understand how you would know *when* to negate the pattern
without a -v option.
Is this possible at all within the realm of Perl regular
expressions, or do I have to invent my own workaround (which of
course would be possible)?

Yes, usually (for example, it may not work nicely if you have code
embedded inside the regex, and there are many cases that are possible
but computationally very expensive), but it's much more complicated to
invert a regex than to invert the test for that regex.

I honestly don't see a reason why you shouldn't provide a -v option,
or some way for the user to say "invert this pattern", and then act
upon that to invert the test. Maybe you can explain...

Ted
 

Ronny

Ted said:
You either ask the user to rewrite $PATTERN, or you give a -v option.
I don't understand how you would know *when* to negate the pattern
without a -v option.

You got exactly the point: I want the user to rewrite the pattern. The question is: how do you write a *negated* pattern using Perl RE syntax?

To the outside world (i.e. to the user), the interface always says something like "Supply a pattern and you get a list of lines matching the pattern" (actually, the lines returned are transformed, but that is not the point here). Given *this* user interface, is it possible for the user to specify a pattern with negated meaning, for example, one that returns all lines which do NOT contain the string "foo"?

A variation of this question could be: return all the lines which contain the strings "foo" and "bar", but ONLY if they do not contain "baz" somewhere between "foo" and "bar". I.e. the lines

...foo.......bar......baz... (OK, baz after bar)
...baz......foo......bar.... (OK, baz before bar)
...foo..................bar... (OK, no baz)

should match, but the lines

...foo........baz......bar... (baz between foo and bar)
...foo........................... (bar missing)
...bar........................... (foo missing)

should not match. Is it possible to express THIS using a Perl regexp, or does this go beyond the power of Perl regular expressions? If there is a solution to this foo/bar/baz problem, then there is obviously one for my original problem as well.
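For the foo/bar/baz variation, the same "tempered dot" idea that comes up later in this thread, (?:(?!baz).)*, seems to be enough: between foo and bar, consume one character at a time but never at a position where baz begins. A sketch, checked only against the six example lines above (with the padding shortened); it accepts a line if *some* foo ... bar pair has no baz between them, which is one reading of the requirement.

```perl
use strict;
use warnings;

# "foo", then later "bar", with no "baz" starting anywhere in between.
my $foo_bar_no_baz = qr/foo(?:(?!baz).)*?bar/;

my @lines = (
    '...foo.......bar......baz...',    # baz after bar   -> match
    '...baz......foo......bar....',    # baz before foo  -> match
    '...foo..................bar...',  # no baz          -> match
    '...foo........baz......bar...',   # baz between     -> no match
    '...foo.................',         # bar missing     -> no match
    '...bar.................',         # foo missing     -> no match
);

my @hits = map { $_ =~ $foo_bar_no_baz ? 1 : 0 } @lines;
printf "%s  %s\n", $hits[$_] ? 'yes' : 'no ', $lines[$_] for 0 .. $#lines;
```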
Yes, usually (for example, it may not work nicely if you have code
embedded inside the regex, and there are many cases that are possible
but computationally very expensive), but it's much more complicated to
invert a regex than to invert the test for that regex.

Of course, one hack for my original problem would be to "invent" a special character (say, an exclamation mark) which is allowed at the very start of the expression and just means "pattern has negated meaning". My Perl code would then be:

if ($pattern =~ /^!(.*)$/)
{
    # negated meaning
    $pattern = $1;    # drop the ! from the pattern
    print transform($line) unless $line =~ $pattern;
}
else
{
    print transform($line) if $line =~ $pattern;
}

This would do the job (and the exclamation mark here is just a "-v" switch in disguise), but I wondered whether the same effect could also be achieved by just changing the pattern in a suitable way.
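A self-contained sketch of that "!" prefix convention, written as a function so it can be tried without stdin. The names transform and filter_lines are hypothetical stand-ins (transform here just upper-cases the line), and xor is used so that a single grep covers both the normal and the negated case.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the OP's transform(); it just upper-cases.
sub transform { return uc $_[0] }

# Hypothetical helper: a leading "!" in the user's pattern means
# "select the lines that do NOT match the rest of the pattern".
sub filter_lines {
    my ($spec, @lines) = @_;
    my $negate  = $spec =~ s/^!// ? 1 : 0;   # strip the marker, remember it
    my $pattern = qr/$spec/;
    # xor flips the test when $negate is set: keep matching lines
    # normally, non-matching lines when negated.
    return map { transform($_) } grep { $negate xor /$pattern/ } @lines;
}

my @input = ("foo 1", "bar 2", "foo 3");
print "$_\n" for filter_lines('foo',  @input);   # FOO 1 and FOO 3
print "$_\n" for filter_lines('!foo', @input);   # BAR 2
```

One consequence of this design: a pattern that genuinely has to start with a literal "!" must be written as \!..., which the regex engine treats as a plain "!".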
I honestly don't see a reason why you shouldn't provide a -v option,

The reason is that I simplified the problem a great deal to make it easier to discuss here. The interesting point for me is simply finding out whether the negation effect can be done solely within the pattern, or has to be "moved outside" to the distinction between =~ and !~, or to an if/unless construct.

I have read the man pages about pattern "negation" (as it occurs in the "negative lookahead" construct), but I could not see whether it could be applied to my case.

Ronald
 

Ronny

Mirco said:
Thus spoke Ronny (on 2006-05-30 09:51):


you mean

print(transform($_)) unless /$PATTERN/;

or something?

No, the corresponding code would always be as stated. I think I did not
explain my problem in a very understandable way. See my reply to Ted
for a more elaborate explanation.

Maybe here is a more mathematical formulation of the problem:

Given an arbitrary Perl regexp P, is it possible to derive from it another regexp Q with the property that for every string S the following equation holds:

(S =~ P) == (S !~ Q)

(S matches P if and only if S does not match Q.)

I.e. is there a general mechanism within the Perl regexp realm which allows me to find a negated pattern for a given pattern?
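Within Perl's own dialect, which unlike textbook regular expressions has lookahead, one construction that appears to satisfy this equation is to anchor at the start of the string and assert that P matches at no reachable position: Q = \A(?!.*?P) with /s. A sketch, checked only on the sample strings below and assuming P uses no \G and no code blocks; negate_pattern is a hypothetical name. It sidesteps rather than answers the harder question of producing an equivalent pattern without lookahead.

```perl
use strict;
use warnings;

# Hypothetical helper: build Q from P so that a string matches Q exactly
# when P matches nowhere in it. The lookahead is tried once, at the start;
# .*? (with /s, so newlines are included) lets it reach every position.
sub negate_pattern {
    my ($P) = @_;
    return qr/\A(?!.*?$P)/s;
}

my $P = qr/[abc]/;
my $Q = negate_pattern($P);

# Check (S =~ P) == (S !~ Q) on a few sample strings.
my $all_ok = 1;
for my $S ("xyz", "xbz", "", "two\nlines c") {
    my $lhs = $S =~ $P ? 1 : 0;
    my $rhs = $S !~ $Q ? 1 : 0;
    $all_ok = 0 if $lhs != $rhs;
}
print "equation ", ($all_ok ? "holds" : "fails"), " on all samples\n";
```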

Of course this is easy for specific patterns. For example, assume that P is the pattern

[abc]

which means "every line which contains at least one a, b, or c somewhere". The negated pattern Q, "every line which contains none of a, b, or c", is then

^[^abc]*$

In this example, I have kind of "handcrafted" the negated pattern after having investigated the original pattern. For the [abc] case, it was easy to find the negated pattern, but in general, this might be hard, so I wondered whether Perl provided a specific construct which just negates a pattern.
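A quick check of the handcrafted pair, as a sketch: for every sample line exactly one of P and Q should match. It uses * rather than + in Q, because an empty line contains no a, b, or c and so must be accepted by the negation.

```perl
use strict;
use warnings;

my $P = qr/[abc]/;        # contains at least one a, b or c
my $Q = qr/^[^abc]*$/;    # contains none of them (empty line included)

my @exactly_one;
for my $line ("xyz", "xaz", "", "ccc") {
    my $p = $line =~ $P ? 1 : 0;
    my $q = $line =~ $Q ? 1 : 0;
    push @exactly_one, ($p + $q == 1) ? 1 : 0;
    print "'$line': P=$p Q=$q\n";
}
```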

Ronald
 

Ronny

Xicheng said:
my $RE = qr/something here/;

if ($v =~ /^(?:(?!$RE).)*$/) {
# any string $v that doesnot match $RE
}

Great! I think this is something I could use for *my* original problem too!

Thank you for pointing this out!

Ronny
 

Mumia W.

Ronny said:
[...]
Maybe here a more mathematical formulation of the problem:

Given an arbitrary Perl regexp P, is it then possible to derive from it another regexp Q, with the property that for every string S the following equation holds:

(S =~ P) == (S !~ Q)

(S matches P if S does not match Q, and vice versa).

I.e. is there a general mechanism within the Perl regexp realm which allows me to find a negated pattern for a given pattern?

I don't think so, and given the complexity of REs, it's probably
impossible. But all is not lost.

You could do what (Debian) aptitude does: Let the user place a prefix
code in the RE that specifies inversion, e.g.:

aptitude search '~niso-8859!~nbase'

This searches for all Debian packages that have the string iso-8859 in
their names, but excludes any that have 'base' in their names.

~n introduces an RE to match package names.
!~n introduces an RE to *not* match package names.
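A rough Perl sketch of that prefix idea for a single field: this is not aptitude's parser or syntax, just a hypothetical mini-language in which ~n<re> means "name must match" and !~n<re> means "name must not match". The functions compile_terms and name_matches and the package names are made up for illustration; each term's text runs up to the next ~n or !~n, found with the same tempered-dot construct discussed in this thread.

```perl
use strict;
use warnings;

# Split a spec like '~niso-8859!~nbase' into terms. Each term is an
# optional "!", then "~n", then everything up to the next (!)~n.
sub compile_terms {
    my ($spec) = @_;
    my @terms;
    while ($spec =~ /\G(!?)~n((?:(?!!?~n).)*)/gc) {
        my ($bang, $re) = ($1, $2);
        push @terms, { negate => ($bang eq '!'), re => qr/$re/ };
    }
    my $consumed = pos($spec) // 0;
    die "cannot parse: ", substr($spec, $consumed), "\n"
        if $consumed != length $spec;
    return @terms;
}

# A name is selected if every positive term matches and no negated one does.
sub name_matches {
    my ($name, @terms) = @_;
    for my $t (@terms) {
        my $hit = $name =~ $t->{re};
        return 0 if $t->{negate} ? $hit : !$hit;
    }
    return 1;
}

my @terms = compile_terms('~niso-8859!~nbase');
my @names = grep { name_matches($_, @terms) }
            qw(xfonts-iso-8859-2 iso-8859-base-fonts vim);
print "$_\n" for @names;    # only xfonts-iso-8859-2 survives
```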
Of course this is easy for specific patterns. For example, assume that P is the pattern

[abc]

which means "every line which contains at least one a, b, or c somewhere". The negated pattern Q, "every line which contains none of a, b, or c", is then

^[^abc]*$

In this example, I have kind of "handcrafted" the negated pattern after having investigated the original pattern. For the [abc] case, it was easy to find the negated pattern, but in general, this might be hard, [...]

Depending on the pattern, it might be so hard that supercomputers would
take an eternity to do it.
 

Mumia W.

Xicheng said:
[...]
You want to match anything except strings matching the qr// expression? Then you might want to try the following:

my $RE = qr/something here/;

if ($v =~ /^(?:(?!$RE).)*$/) {
# any string $v that does not match $RE
}

(untested)
Xicheng

Well, I tested it, and it seems pretty darn good, and just like Ronny, I
might end up using this in my programs if I can figure out how it works.
Thanks Xicheng.
 

Ted Zlatanov

You exactly got the point: I want the user to rewrite the
Pattern. The question is, how to write a *negated* pattern using
Perl RE Syntax?

You can do it for some cases, but because of limitations on memory and
CPU cycles, most complex regexes can't be inverted in a reasonable
amount of time. When there's code inside, it gets even worse.

Look at the book "Higher-Order Perl" by Mark-Jason Dominus. It has a
long section on finding all the strings that can match a given regular
expression; if you read it carefully, you'll see why inverting a
regular expression is generally a hard problem, just as producing all
the strings that match it is.

Note also that if security is a concern, giving users regexp access is
equivalent to letting them run any code due to the code escapes
possible in Perl's regex interpreter. It may be simpler to give the
users a limited language with a NOT operator. Parse::RecDescent has
some good examples of this kind of parser in the distribution. The
users may also prefer this to the raw power of regexps, and it's what
I would do for a production system.
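On the code-escape point specifically, a small sketch to check on your own perl. As far as I know, perl refuses (?{ ... }) blocks that arrive through an interpolated string unless the program has said use re 'eval', so the arbitrary-code risk applies mainly when that pragma is enabled; a user-supplied pattern can still be very slow to run through heavy backtracking. The "untrusted" pattern below is made up.

```perl
use strict;
use warnings;

# Pretend this string arrived from an untrusted user.
my $user_pattern = '(?{ print "user code ran\n" })x';

# Without "use re 'eval'" in scope, compiling this at run time dies.
my $ok    = eval { "x" =~ /$user_pattern/; 1 };
my $error = $ok ? '' : $@;
print $ok ? "pattern accepted\n" : "pattern rejected: $error";
```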
Of course, one hack for my original problem would be to "invent" a
special character (say, an exclamation mark) which is allowed at
the very start of the expression, and just has the meaning "pattern
has negated meaning".

Yes :) That would be easiest.
The reason is that I simplified the problem a great deal to make
it easier to discuss here. The interesting point for me is simply
finding out whether the negation effect can be done solely
within the pattern, or has to be "moved outside" to the distinction
between =~ and !~, or to an if/unless construct.

It should be moved outside, so you can go on to finish the project :)

Ted
 

Brian McCauley

Xicheng said:
my $RE = qr/something here/;

if ($v =~ /^(?:(?!$RE).)*$/) {
# any string $v that does not match $RE
}

I've not benchmarked it but I'd suspect that's less efficient than the
usual answer[1] the OP would have found if he'd been bothered to type
"negate regex" into a Usenet search engine on this newsgroup.

[1] The one ska gave.
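Brian's efficiency suspicion is easy to check locally. A sketch using the core Benchmark module; it compares Xicheng's form with a start-anchored lookahead, \A(?!.*$RE), which is one commonly used way to write the negation (not necessarily the answer the footnote refers to). No results are claimed here; they will depend on the perl version, the pattern, and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $RE   = qr/something here/;
my $text = ('x' x 1000) . "\n";       # a long line that does not contain $RE

my $tempered  = qr/^(?:(?!$RE).)*$/;  # Xicheng's form
my $lookahead = qr/\A(?!.*$RE)/s;     # one anchored negative lookahead

# Make sure both forms agree on this input before timing them.
my $t = $text =~ $tempered  ? 1 : 0;
my $l = $text =~ $lookahead ? 1 : 0;
die "forms disagree\n" unless $t == $l;

cmpthese(5_000, {
    tempered  => sub { my $r = $text =~ $tempered },
    lookahead => sub { my $r = $text =~ $lookahead },
});
```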
 

Xicheng Jia

Brian said:
Xicheng said:
my $RE = qr/something here/;

if ($v =~ /^(?:(?!$RE).)*$/) {
# any string $v that does not match $RE
}

I've not benchmarked it but I'd suspect that's less efficient than the
usual answer[1] the OP would have found if he'd been bothered to type
"negate regex" into a Usenet search engine on this newsgroup.

Here is an old post from Tom Christiansen which might best address this
problem:

http://groups.google.com/group/comp...075b5b?q=negate+regex&rnum=3#7af7898218075b5b

The idea of using (?:(?!$RE).)* to match anything except $RE comes, as far
as I know, from Jeffrey Friedl's book "Mastering Regular Expressions".

HTH,
Xicheng
 

Ronny

I've not benchmarked it but I'd suspect that's less efficient than the
usual answer[1] the OP would have found if he'd been bothered to type
"negate regex" into a Usenet search engine on this newsgroup.

Point taken!

Ronald
 

Ted Zlatanov

On 31 May 2006, (e-mail address removed) wrote:

Brian McCauley wrote:
Xicheng Jia wrote:
my $RE = qr/something here/;

if ($v =~ /^(?:(?!$RE).)*$/) {
# any string $v that does not match $RE
}

I've not benchmarked it but I'd suspect that's less efficient than the
usual answer[1] the OP would have found if he'd been bothered to type
"negate regex" into a Usenet search engine on this newsgroup.

Here is an old post from Tom Christiansen which might best address this
problem:

http://groups.google.com/group/comp...075b5b?q=negate+regex&rnum=3#7af7898218075b5b

The idea of using (?:(?!$RE).)* to match anything except $RE comes, as far
as I know, from Jeffrey Friedl's book "Mastering Regular Expressions".

This post does not mention that negating some regexes is
computationally prohibitive, and code escapes are a problem. Also,
the "Higher-Order Perl" book I mentioned came out after that post
(1999), and has some very interesting information in the chapter on
generating all the possible strings a regex can match. There are
security considerations when you allow a user to provide you with a
regex. None of those things is answered by a naive Usenet search.

Furthermore, the real question was "why doesn't the OP want a -v flag?
How can he simulate it instead?" and not "how to negate a regex."
Usually that's the case when people ask for negating a regex, btw.

Ted
 
