Expressing AND, OR, and NOT in a Single Pattern

usaims · Mar 1, 2007

I'm having a little problem with this example in the Perl Cookbook.

True if pattern BAD does not match, but pattern GOOD does:
/(?=(?:(?!BAD).)*$)GOOD/s

My objective is to print only lines that have 'suspended' but not
'Data_services'. It is still printing lines with 'suspended' and
'Data_services' in the same line. So, ideally, this script should
not print any lines. Correct me if I am wrong.

##############################
#!/usr/bin/perl
use strict;
use diagnostics;
use warnings;

my @stuff = <DATA>;

foreach my $foo (@stuff) {
    if ($foo =~ /(?=(?:(?!Data_services).)*$)suspended/s) {
        print $foo;
    }
}
close(DATA);

__DATA__
<Query id='Data_services.LSSI_Weekly.42' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070227-140132'
associatedName='libW20070227-140132.so'/>
<Query id='Data_services.SSNMapKeys.14' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070105-115230'
associatedName='libW20070105-114650.so'/>
<Query id='Data_services.WatercraftKeys.5' suspended='1'
error='Loading Data Only - cannot run query' wuid='W20070123-114242'
associatedName='libW20070123-114242.so'/>

Scott Bryce · Mar 1, 2007

usaims said:
My objective is to print only lines that have 'suspended' but not
'Data_services'.

I prefer to use index for something like this.

It is still printing lines with 'suspended' and
'Data_services' in the same line. So, ideally, this script should
not print any lines. Correct me if I am wrong.

There are no lines in your given data that meet your criteria.

Here's my shot at it...

use strict;
use warnings;

while (<DATA>)
{
    next if index($_, 'Data_services') > -1;
    print $_ if index($_, 'suspended') > -1;
}

__DATA__
<Query id='Data_services.LSSI_Weekly.42' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070227-140132'
associatedName='libW20070227-140132.so'/>
<Query id='Data_services.SSNMapKeys.14' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070105-115230'
associatedName='libW20070105-114650.so'/>
<Query id='Data_services.WatercraftKeys.5' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070123-114242'
associatedName='libW20070123-114242.so'/>
<Query id='Other_services.SSNMapKeys.14' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070105-115230'
associatedName='libW20070105-114650.so'/>

xhoster · Mar 1, 2007

usaims said:
I'm having a little problem with this example in the Perl Cookbook.

True if pattern BAD does not match, but pattern GOOD does:
/(?=(?:(?!BAD).)*$)GOOD/s

Every character from the start of the match to the end of the string
has to not (be the start of a) match to BAD. However, if BAD occurs before
GOOD, the regex can still match, simply by not initiating the match until
after the B of BAD.
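
A minimal sketch of that failure mode, assuming BAD is Data_services and GOOD is suspended. The two sample strings are made up for illustration and are not taken from the original data; only the order of the two words differs between them.

```perl
use strict;
use warnings;

# The Cookbook pattern with BAD = Data_services and GOOD = suspended.
my $cookbook = qr/(?=(?:(?!Data_services).)*$)suspended/s;

# Hypothetical sample lines (assumptions, not from the thread's data).
my $bad_first = "id='Data_services.X' suspended='1'\n";
my $bad_last  = "suspended='1' id='Data_services.X'\n";

for my $line ($bad_first, $bad_last) {
    # The lookahead only inspects text from the match start onward,
    # so a BAD that sits before GOOD is never seen.
    my $verdict = $line =~ $cookbook ? 'matches' : 'does not match';
    print "$verdict: $line";
}
```

With BAD first, the match can begin at 'suspended' and the lookahead never looks back at 'Data_services'. With BAD last, the lookahead runs into it and the match fails.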

You want the forced exclusion to start at the beginning of the string
and run to the end:

/^(?=(?:(?!BAD).)*$).*GOOD/;

But I'd just use two different regex.

Xho

h3xx · Mar 1, 2007

I like doing things in one line:

print grep { /suspended/ && ! /Data_services/ } <DATA>;

gf · Mar 2, 2007

h3xx said:
I like doing things in one line:

print grep { /suspended/ && ! /Data_services/ } <DATA>;

I prefer this method too. For clarity and long-term maintenance it is
much better because the esoterica of regex can make the desired
results hard to figure out and the bugs in the pattern even harder to
find.

Also, speed-wise, this is a lot faster. The regex engine has to do a
lot of work that can be short-circuited by the booleans.

Sometimes it's better to break the search for matching patterns into
single lines too. It's kind of macho programmer-wise to string it all
together into one mondo regex pattern and have it work, but the logic
can get fragile.
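
A rough sketch of what splitting the logic into separate checks might look like. The sub name wanted_line and the sample lines are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Returns true for lines that mention 'suspended' but not 'Data_services'.
sub wanted_line {
    my ($line) = @_;
    return 0 if $line =~ /Data_services/;    # reject early; the second test is skipped
    return $line =~ /suspended/ ? 1 : 0;
}

# Hypothetical sample lines standing in for the <DATA> section.
my @lines = (
    "<Query id='Data_services.SSNMapKeys.14' suspended='1'/>\n",
    "<Query id='Other_services.SSNMapKeys.14' suspended='1'/>\n",
    "<Query id='Other_services.Idle.3'/>\n",
);

print grep { wanted_line($_) } @lines;
```

Each condition sits on its own line, so a later change to one test does not disturb the other.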

The only thing I'd do differently to these patterns is add an anchor
to the 'Data_services' pattern, like so...

/^<Query id='Data_services/

Anchors speed up regex an incredible amount. I did benchmarks of index
vs various ways of using regex, and an anchored qr// that was
initialized outside a loop was the fastest at finding patterns inside
long strings, when the pattern was at the end of the string. At the
beginning of a string it should be equal to index(). Index() was
faster when finding a fixed string somewhere in the middle of another
string.
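
The benchmark code itself was not posted, so the following is only a sketch of how such a comparison could be set up with the core Benchmark module. The test string, its length, and the candidate names are assumptions. Note that the anchored pattern answers a narrower question (does the line start with that prefix) than index() or the unanchored regex.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test line; the contents and padding length are assumptions.
my $line = "<Query id='Data_services.LSSI_Weekly.42' "
         . ('x' x 500)
         . " suspended='1'/>\n";

# Compile the anchored pattern once, outside the timed code.
my $anchored = qr/^<Query id='Data_services/;

my %candidates = (
    index_call     => sub { index($line, 'Data_services') > -1 },
    plain_regex    => sub { $line =~ /Data_services/ },
    anchored_regex => sub { $line =~ $anchored },
);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```

Results depend on the Perl version, the machine, and where the target text sits in the string, so it is worth trying several input shapes before drawing conclusions.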

gf · Mar 2, 2007

The Regexp::Assemble module on CPAN is way cool for building big
patterns with minimal fuss.

http://search.cpan.org/~dland/Regexp-Assemble-0.28/Assemble.pm

The resulting patterns are very efficient and pretty good when you
want to learn how to write complex regex.

Brian McCauley · Mar 4, 2007

Every character from the start of the match to the end of the string
has to not (be the start of a) match to BAD. However, if BAD occurs before
GOOD, the regex can still match, simply by not initiating the match until
after the B of BAD.

You want the forced exclusion to start at the beginning of the string
and run to the end:

/^(?=(?:(?!BAD).)*$).*GOOD/;

That's exponentially (er, factorially?) inefficient!

/^(?!.*BAD).*GOOD/;
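
As a quick sanity check, the two anchored patterns can be run side by side. This is a sketch with made-up inputs covering each ordering of GOOD and BAD; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $lookahead_per_char = qr/^(?=(?:(?!BAD).)*$).*GOOD/;
my $single_lookahead   = qr/^(?!.*BAD).*GOOD/;

# Hypothetical inputs, one per combination of GOOD and BAD.
my @cases = (
    "only GOOD here",
    "BAD then GOOD",
    "GOOD then BAD",
    "neither one",
);

for my $text (@cases) {
    my $first  = $text =~ $lookahead_per_char ? 1 : 0;
    my $second = $text =~ $single_lookahead   ? 1 : 0;
    print "$first $second  $text\n";
}
```

On these four inputs both patterns accept only the string that contains GOOD and no BAD.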

But I'd just use two different regex.

Yes, of course, that's still the best way.

xhoster · Mar 4, 2007

Brian McCauley said:
That's exponentially (er, factorially?) inefficient!

Under what conditions is it exponential? With the patterns I've tested,
it seems to be linear, not exponential. (But still quite a lot slower
than yours, for reasons I don't quite understand. It would make more sense
to me if it were exponentially slower, rather than constantly 30 times
slower.)
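
One rough way to probe the scaling question is to time both patterns on inputs of growing length. This is only a sketch: the input shape (filler with no BAD, followed by GOOD), the lengths, and the repetition count are assumptions, and time_matches is a hypothetical helper. Timings vary by machine and Perl version, so none are shown here.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $per_char = qr/^(?=(?:(?!BAD).)*$).*GOOD/;
my $single   = qr/^(?!.*BAD).*GOOD/;

# Match $text against $re $reps times; returns (elapsed seconds, match count).
sub time_matches {
    my ($re, $text, $reps) = @_;
    my $hits  = 0;
    my $start = time();
    for (1 .. $reps) { $hits++ if $text =~ $re }
    return (time() - $start, $hits);
}

# Hypothetical inputs: filler that doubles in length each round.
for my $len (1_000, 2_000, 4_000, 8_000) {
    my $text = ('x' x $len) . 'GOOD';
    my ($t_per_char) = time_matches($per_char, $text, 200);
    my ($t_single)   = time_matches($single,   $text, 200);
    printf "%6d chars  per-char: %.4fs  single: %.4fs\n",
        $len, $t_per_char, $t_single;
}
```

If the per-character version were exponential, doubling the length would blow up its time; if it is linear, the time should roughly double along with the input.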

Xho

Mirco Wahab · Mar 4, 2007

Brian said:
That's exponentially (er, factorially?) inefficient!

/^(?!.*BAD).*GOOD/;

Yes, of course, that's still the best way.

This

/^(?!.*BAD).*GOOD/

is, in my opinion, of "Maxwellian beauty".

I tried for some time to simplify the original
expression, but ended up throwing in the towel.

Thanks,

Mirco
