regex with nots in it

Discussion in 'Perl Misc' started by Ben Holness, Oct 6, 2003.

  1. Ben Holness

    Ben Holness Guest

    Hi all,

    I would like to know if it is possible to have nots in a single regular
    expression and if so, how to do it?

    For example if I want a single regular expression that says:

    The phrase must contain the string "Perl", and that "Perl" must not be
    followed by "PHP", so that it would match:

    "I like Perl"
    "Perl is cool"

    But not match

    "I like Perl more than PHP"
    "Although PHP is OK"

    I haven't been able to work out how to do it, but if '!' were the not
    operator, then I guess it would be something like

    /Perl.*!PHP/

    Searching hasn't been much help - the word "not" is way too common :)

    Cheers,

    Ben
     
    Ben Holness, Oct 6, 2003
    #1

  2. Ben Holness

    Ben Holness Guest


    >> The phrase must have the string "Perl" and must not be followed by
    >> "PHP" in it, so that it would match:
    >>
    >> "I like Perl"
    >> "Perl is cool"
    >>
    >> But not match
    >>
    >> "I like Perl more than PHP"
    >> "Although PHP is OK"

    >
    > Then look for "look-ahead" in perlre.pod.


    hmmm. Doesn't seem to work because look-ahead cannot deal with wildcards:

    /Perl.*(?!PHP)/

    doesn't do what I want :( perlre suggests that it's easier to have it as
    two regular expressions, which is what I was trying to avoid.

    Any other ideas?

    Cheers anyway,

    Ben
     
    Ben Holness, Oct 6, 2003
    #2

  3. Anno Siegel

    Anno Siegel Guest

    Ben Holness <> wrote in comp.lang.perl.misc:
    >
    > >> The phrase must have the string "Perl" and must not be followed by
    > >> "PHP" in it, so that it would match:
    > >>
    > >> "I like Perl"
    > >> "Perl is cool"
    > >>
    > >> But not match
    > >>
    > >> "I like Perl more than PHP"
    > >> "Although PHP is OK"

    > >
    > > Then look for "look-ahead" in perlre.pod.

    >
    > hmmm. Doesn't seem to work because look-ahead cannot deal with wildcards:
    >
    > /Perl.*(?!PHP)/


    No, your "wild cards" make short shrift of the look-ahead. Even if
    there is a "PHP" after "Perl", it is always possible for ".*" to match
    enough of the following string to make any "PHP" disappear, so the
    negative look-ahead succeeds (doesn't see PHP). Take the ".*" into
    the lookahead:

    /Perl(.*?!PHP)/

    Anno
     
    Anno Siegel, Oct 6, 2003
    #3
  4. Anno Siegel

    Anno Siegel Guest

    Bernard El-Hagin <> wrote in comp.lang.perl.misc:
    > "Ben Holness" <> wrote in
    > news:p:
    >
    > >
    > >>> The phrase must have the string "Perl" and must not be followed by
    > >>> "PHP" in it, so that it would match:
    > >>>
    > >>> "I like Perl"
    > >>> "Perl is cool"
    > >>>
    > >>> But not match
    > >>>
    > >>> "I like Perl more than PHP"
    > >>> "Although PHP is OK"
    > >>
    > >> Then look for "look-ahead" in perlre.pod.

    > >
    > > hmmm. Doesn't seem to work because look-ahead cannot deal with wildcards:
    > >
    > > /Perl.*(?!PHP)/

    >
    >
    > Try
    >
    >
    > /Perl(?!.*PHP)/
    >
    >
    > > doesn't do what I want :( perlre suggests that it's easier to have it as
    > > two regular expressions, which is what I was trying to avoid.

    >
    >
    > Why? It's a perfectly valid suggestion.


    The condition that "PHP" must come after "Perl" makes the two-regex
    solution a little less attractive. Some trickery with pos() or
    @+ is required, as in

    /Perl/g && !/\G.*PHP/

    which makes it slightly obscure.
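
A minimal sketch of how the pos()/\G variant behaves, wrapped in a sub so it can be tried on the example sentences. The sub name is made up for illustration; the two patterns are the ones shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains "Perl" and no "PHP"
# appears after the first "Perl" (on the same line).
sub perl_not_followed_by_php {
    my ($text) = @_;

    # In scalar context, a successful /g match leaves pos($text)
    # just past the matched "Perl".
    return 0 unless $text =~ /Perl/g;

    # \G anchors this second match at pos($text), so only the text
    # after that "Perl" is searched for "PHP".
    return $text =~ /\G.*PHP/ ? 0 : 1;
}

for my $text ("I like Perl", "Perl is cool",
              "I like Perl more than PHP", "Although PHP is OK") {
    printf "%-28s %s\n", $text,
        perl_not_followed_by_php($text) ? "match" : "no match";
}
```

Since `.` does not match a newline by default, this only looks for "PHP" on the same line as the "Perl" that was found.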

    Anno
     
    Anno Siegel, Oct 6, 2003
    #4
  5. Ben Holness

    Ben Holness Guest


    >> doesn't do what I want :( perlre suggests that it's easier to have it as
    >> two regular expressions, which is what I was trying to avoid.

    >
    >
    > Why? It's a perfectly valid suggestion.


    The system I have built checks messages for particular content. The
    content is defined in a database, so if I need more than one regex, I need
    to implement some slightly more clever code than just getting the regex
    from the db and matching :)

    The suggestions from yourself and Anno are what I needed though;

    /Perl(?!.*PHP)/ does exactly what I need :)
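
A quick check of that pattern against the four example sentences from the first post. This is only a sketch: the pattern is held in a plain string to stand in for a value read from the database, and the sub name is hypothetical.

```perl
use strict;
use warnings;

# Pattern kept as a plain string, as it might be when read from a database.
my $pattern = 'Perl(?!.*PHP)';
my $re      = qr/$pattern/;

# Hypothetical helper: true if the message satisfies the stored pattern.
sub message_matches {
    my ($message) = @_;
    return $message =~ $re ? 1 : 0;
}

for my $message ("I like Perl", "Perl is cool",
                 "I like Perl more than PHP", "Although PHP is OK") {
    printf "%-28s %s\n", $message,
        message_matches($message) ? "match" : "no match";
}
```

The first two sentences should report a match and the last two should not.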

    Thanks,

    Ben
     
    Ben Holness, Oct 6, 2003
    #5
  6. >>>>> "Anno" == Anno Siegel <-berlin.de> writes:

    >> /Perl.*(?!PHP)/


    Anno> No, your "wild cards" make short shrift with the look-ahead. Even if
    Anno> there is a "PHP" after "Perl", it is always possible for ".*" to match
    Anno> enough of the following string to make any "PHP" disappear, so the
    Anno> negative look-ahead succeeds (doesn't see PHP). Take the ".*" into
    Anno> the lookahead:

    Anno> /Perl(.*?!PHP)/

    Right, it's the difference between:

    Can I find Perl, followed by some number of characters,
    followed by something that isn't PHP?

    versus

    Can I find Perl, followed immediately by something that isn't
    "some number of characters followed by PHP"?

    Logic can be tough sometimes. Luckily, regexes are precise, and do
    exactly what you tell them. :)
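
The two readings can be compared directly on one of the sentences that is supposed to be rejected. A small sketch; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my $text = "I like Perl more than PHP";

# ".*" outside the look-ahead: it can swallow the rest of the string,
# so the look-ahead is tested at the very end, where no "PHP" follows.
my $outside = $text =~ /Perl.*(?!PHP)/ ? 1 : 0;

# ".*" inside the look-ahead: the test happens right after "Perl" and
# fails if "PHP" occurs anywhere later on the same line.
my $inside = $text =~ /Perl(?!.*PHP)/ ? 1 : 0;

print "outside: $outside, inside: $inside\n";   # outside: 1, inside: 0
```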

    print "Just another Perl hacker,"

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
     
    Randal L. Schwartz, Oct 6, 2003
    #6
  7. On Mon, 6 Oct 2003, Randal L. Schwartz wrote:

    >>>>>> "Anno" == Anno Siegel <-berlin.de> writes:

    >
    >>> /Perl.*(?!PHP)/

    >
    >Anno> /Perl(.*?!PHP)/
    >
    >Right, it's the difference between:
    >
    > Can I find Perl, followed by some number of characters,
    > followed by something that isn't PHP?
    >
    >versus
    >
    > Can I find Perl, followed immediately by something that isn't
    > "some number of characters followed by PHP"?


    Uhhh, except that Anno misplaced the '?!' in that regex. It should be

    /Perl(?!.*PHP)/

    --
    Jeff Pinyan RPI Acacia Brother #734 2003 Rush Chairman
    "And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
    years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
    Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
     
    Jeff 'japhy' Pinyan, Oct 6, 2003
    #7
  8. Roy Johnson

    Roy Johnson Guest

    -berlin.de (Anno Siegel) wrote in message news:<blrk1p$31f$-Berlin.DE>...

    > /Perl(.*?!PHP)/


    By which you mean

    /Perl(?!.*PHP)/
     
    Roy Johnson, Oct 6, 2003
    #8
  9. Anno Siegel (-berlin.de) wrote:
    : Bernard El-Hagin <> wrote in comp.lang.perl.misc:
    : > "Ben Holness" <> wrote in
    : > news:p:
    : >
    : > >
    : > >>> The phrase must have the string "Perl" and must not be followed by
    : > >>> "PHP" in it, so that it would match:
    : > >>>
    : > >>> "I like Perl"
    : > >>> "Perl is cool"
    : > >>>
    : > >>> But not match
    : > >>>
    : > >>> "I like Perl more than PHP"
    : > >>> "Although PHP is OK"
    : > >>
    : > >> Then look for "look-ahead" in perlre.pod.
    : > >
    : > > hmmm. Doesn't seem to work because look-ahead cannot deal with wildcards:
    : > >
    : > > /Perl.*(?!PHP)/
    : >
    : >
    : > Try
    : >
    : >
    : > /Perl(?!.*PHP)/
    : >
    : >
    : > > doesn't do what I want :( perlre suggests that it's easier to have it as
    : > > two regular expressions, which is what I was trying to avoid.
    : >
    : >
    : > Why? It's a perfectly valid suggestion.

    : The condition that "PHP" must come after "Perl" makes the two-regex
    : solution a little less attractive. Some trickery with pos() or
    : @+ is required, as in

    : /Perl/g && !/\G.*PHP/

    : which makes it slightly obscure.

    No, simply look for what you don't want and reject it

    $match = /Perl/             # needs to match this
          && ! /Perl.*PHP/;     # but mustn't match this
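
A sketch comparing this two-test form with the single look-ahead pattern from earlier in the thread (sub names are made up for illustration). The two agree on the original examples but can differ when "Perl" occurs again after a "PHP", so which one is right depends on what is meant by "followed by".

```perl
use strict;
use warnings;

# Two-test version: contains "Perl", and no "Perl" is followed by "PHP".
sub two_tests {
    my ($text) = @_;
    return ($text =~ /Perl/ && $text !~ /Perl.*PHP/) ? 1 : 0;
}

# Single-pattern version: some "Perl" has no "PHP" after it.
sub one_pattern {
    my ($text) = @_;
    return $text =~ /Perl(?!.*PHP)/ ? 1 : 0;
}

for my $text ("I like Perl", "I like Perl more than PHP",
              "Perl beats PHP, so use Perl") {
    printf "%-30s two_tests=%d one_pattern=%d\n",
        $text, two_tests($text), one_pattern($text);
}
```

On the third sentence the look-ahead pattern matches at the final "Perl", while the two-test version rejects the string because the first "Perl" is followed by "PHP".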
     
    Malcolm Dew-Jones, Oct 6, 2003
    #9