regex with nots in it

Ben Holness

Hi all,

I would like to know if it is possible to have nots in a single regular
expression and if so, how to do it?

For example if I want a single regular expression that says:

The phrase must contain the string "Perl", and that "Perl" must not be
followed by "PHP" anywhere later in the phrase, so that it would match:

"I like Perl"
"Perl is cool"

But not match

"I like Perl more than PHP"
"Although PHP is OK"

I haven't been able to work out how to do it, but if '!' were the not
operator, then I guess it would be something like

/Perl.*!PHP/

Searching hasn't been much help - the word "not" is way too common :)

Cheers,

Ben
 
Ben Holness

Then look for "look-ahead" in perlre.pod.

hmmm. Doesn't seem to work because look-ahead cannot deal with wildcards:

/Perl.*(?!PHP)/

doesn't do what I want :( perlre suggests that it's easier to have it as
two regular expressions, which is what I was trying to avoid.

Any other ideas?

Cheers anyway,

Ben
 
Anno Siegel

Ben Holness said:
hmmm. Doesn't seem to work because look-ahead cannot deal with wildcards:

/Perl.*(?!PHP)/

No, your "wild cards" make short shrift of the look-ahead. Even if
there is a "PHP" after "Perl", it is always possible for ".*" to match
enough of the following string to make any "PHP" disappear, so the
negative look-ahead succeeds (doesn't see PHP). Take the ".*" into
the lookahead:

/Perl(.*?!PHP)/

Anno
 
Anno Siegel

Bernard El-Hagin said:
Why? It's a perfectly valid suggestion.

The condition that "PHP" must come after "Perl" makes the two-regex
solution a little less attractive. Some trickery with pos() or
@+ is required, as in

/Perl/g && !/\G.*PHP/

which makes it slightly obscure.
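
A minimal sketch of that two-step check wrapped in a sub, in case it helps to see it run. The sub name perl_without_later_php is made up for illustration, and the fifth sample string is an extra case, not one from the original question. The string is copied into a lexical so that pos() starts out unset on every call:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains "Perl" and there is
# no "PHP" anywhere after the first "Perl".
sub perl_without_later_php {
    my ($s) = @_;    # fresh copy, so pos($s) starts out undefined

    # /g in scalar context leaves pos($s) just past the first "Perl";
    # \G then anchors the second match at exactly that position.
    my $ok = $s =~ /Perl/g && $s !~ /\G.*PHP/;
    return $ok ? 1 : 0;
}

for my $text (
    "I like Perl",
    "Perl is cool",
    "I like Perl more than PHP",
    "Although PHP is OK",
    "PHP is older than Perl",    # extra case: "PHP" only before "Perl"
) {
    my $verdict = perl_without_later_php($text) ? "match" : "no match";
    printf "%-30s %s\n", qq("$text"), $verdict;
}
```

The first two strings and the extra one come out as matches; the other two do not.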

Anno
 
Ben Holness

Bernard El-Hagin said:
Why? It's a perfectly valid suggestion.

The system I have built checks messages for particular content. The
content is defined in a database, so if I need more than one regex, I need
to implement some slightly more clever code than just getting the regex
from the db and matching :)

The suggestions from yourself and Anno are what I needed though;

/Perl(?!.*PHP)/ does exactly what I need :)
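
For what it's worth, a rough sketch of how a single stored pattern fits that setup. The %stored_patterns hash only stands in for the database table, and matching_rules and the rule name are made-up names for illustration, not the real code:

```perl
use strict;
use warnings;

# Stand-in for the database: rule name => pattern source kept as plain text.
my %stored_patterns = (
    perl_without_php => 'Perl(?!.*PHP)',
);

# Hypothetical helper: names of all stored rules whose pattern matches $message.
sub matching_rules {
    my ($message) = @_;
    my @hits;
    for my $name ( sort keys %stored_patterns ) {
        my $source = $stored_patterns{$name};
        my $re     = qr/$source/;    # compile the stored text into a regex
        push @hits, $name if $message =~ $re;
    }
    return @hits;
}

for my $message ( "I like Perl", "I like Perl more than PHP" ) {
    my @hits = matching_rules($message);
    printf "%-30s %s\n", qq("$message"), @hits ? "@hits" : "(no rule matched)";
}
```

With one regex per rule, the matching code stays a plain fetch-and-match loop.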

Thanks,

Ben
 
Randal L. Schwartz

Anno> No, your "wild cards" make short shrift of the look-ahead. Even if
Anno> there is a "PHP" after "Perl", it is always possible for ".*" to match
Anno> enough of the following string to make any "PHP" disappear, so the
Anno> negative look-ahead succeeds (doesn't see PHP). Take the ".*" into
Anno> the lookahead:

Anno> /Perl(.*?!PHP)/

Right, it's the difference between:

Can I find Perl, followed by some number of characters,
followed by something that isn't PHP?

versus

Can I find Perl, followed immediately by something that isn't
"some number of characters followed by PHP"?

Logic can be tough sometimes. Luckily, regexes are precise, and do
exactly what you tell them. :)
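
A small sketch of the two readings run against the sample strings from the original question, assuming the first reading is written as /Perl.*(?!PHP)/ and the second as /Perl(?!.*PHP)/. The sub names first_reading and second_reading are made up for illustration:

```perl
use strict;
use warnings;

# First reading: ".*" is free to stop anywhere that is not followed by "PHP".
sub first_reading {
    my ($s) = @_;
    return $s =~ /Perl.*(?!PHP)/ ? 1 : 0;
}

# Second reading: right after "Perl", nothing of the form ".*PHP" may follow.
sub second_reading {
    my ($s) = @_;
    return $s =~ /Perl(?!.*PHP)/ ? 1 : 0;
}

for my $text (
    "I like Perl",
    "Perl is cool",
    "I like Perl more than PHP",
    "Although PHP is OK",
) {
    my $first  = first_reading($text)  ? "match" : "no match";
    my $second = second_reading($text) ? "match" : "no match";
    printf "%-30s first: %-9s second: %s\n", qq("$text"), $first, $second;
}
```

The two readings disagree only on "I like Perl more than PHP": the first still matches it, because ".*" can run to the end of the string where no "PHP" follows, while the second rejects it.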

print "Just another Perl hacker,"
 
Jeff 'japhy' Pinyan

Anno> /Perl(.*?!PHP)/

Randal L. Schwartz said:
Right, it's the difference between:

Can I find Perl, followed by some number of characters,
followed by something that isn't PHP?

versus

Can I find Perl, followed immediately by something that isn't
"some number of characters followed by PHP"?

Uhhh, except that Anno misplaced the '?!' in that regex. It should be

/Perl(?!.*PHP)/
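
A quick sketch of why the placement matters. The variable names are made up, and the third test string is an extra case chosen to show what the regex as posted really looks for:

```perl
use strict;
use warnings;

# As posted: ".*?" is a non-greedy ".*", and "!PHP" is four literal characters.
my $as_posted = qr/Perl(.*?!PHP)/;

# Corrected: "(?!" opens a negative look-ahead that contains ".*PHP".
my $corrected = qr/Perl(?!.*PHP)/;

for my $text ( "I like Perl", "I like Perl more than PHP", "Perl, not !PHP" ) {
    my $posted_verdict    = $text =~ $as_posted ? "match" : "no match";
    my $corrected_verdict = $text =~ $corrected ? "match" : "no match";
    printf "%-30s as posted: %-9s corrected: %s\n",
        qq("$text"), $posted_verdict, $corrected_verdict;
}
```

As posted, the regex only matches when a literal "!PHP" follows "Perl", so it matches just the third string. The corrected one matches just the first.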
 
Malcolm Dew-Jones

Anno Siegel wrote:
: > >>> The phrase must have the string "Perl" and must not be followed by
: > >>> "PHP" in it, so that it would match:
: > >>>
: > >>> "I like Perl"
: > >>> "Perl is cool"
: > >>>
: > >>> But not match
: > >>>
: > >>> "I like Perl more than PHP"
: > >>> "Although PHP is OK"
: > >>
: > >> Then look for "look-ahead" in perlre.pod.
: > >
: > > hmmm. Doesn't seem to work because look-ahead cannot deal with wildcards:
: > >
: > > /Perl.*(?!PHP)/
: >
: >
: > Try
: >
: >
: > /Perl(?!.*PHP)/
: >
: >
: > > doesn't do what I want :( perlre suggests that it's easier to have it as
: > > two regular expressions, which is what I was trying to avoid.
: >
: >
: > Why? It's a perfectly valid suggestion.

: The condition that "PHP" must come after "Perl" makes the two-regex
: solution a little less attractive. Some trickery with pos() or
: @+ is required, as in

: /Perl/g && !/\G.*PHP/

: which makes it slightly obscure.

No, simply look for what you don't want and reject it:

$match = /Perl/          # needs to match this
      && ! /Perl.*PHP/;  # but mustn't match this
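
A sketch of that check next to the single-regex version, so the two can be compared. The sub names two_step and look_ahead are made up for illustration, and the last sample string is an extra case rather than one from the original question:

```perl
use strict;
use warnings;

# Two plain matches: must contain "Perl", must not contain "Perl ... PHP".
sub two_step {
    my ($s) = @_;
    my $ok = $s =~ /Perl/ && $s !~ /Perl.*PHP/;
    return $ok ? 1 : 0;
}

# Single regex with a negative look-ahead, as used elsewhere in the thread.
sub look_ahead {
    my ($s) = @_;
    return $s =~ /Perl(?!.*PHP)/ ? 1 : 0;
}

for my $text (
    "I like Perl",
    "Perl is cool",
    "I like Perl more than PHP",
    "Although PHP is OK",
    "Perl beats PHP, long live Perl",    # extra case: "Perl" again after "PHP"
) {
    printf "%-34s two_step: %d  look_ahead: %d\n",
        qq("$text"), two_step($text), look_ahead($text);
}
```

The two agree on the four sample strings. They differ only on the extra one: the two-step check rejects it because some "Perl" is followed by "PHP", while the look-ahead accepts it because the last "Perl" is not.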
 
