How make regex that means "contains regex#1 but NOT regex#2" ??

Discussion in 'Python' started by seberino@spawar.navy.mil, Jul 1, 2008.

  1. Guest

    I'm looking over the docs for the re module and can't find how to
    "NOT" an entire regex.

    For example.....

    How make regex that means "contains regex#1 but NOT regex#2" ?

    Chris
    Jul 1, 2008
    #1

  2. Paul McGuire Guest

    On Jul 1, 2:34 am, "A.T.Hofkamp" wrote:
    > On 2008-07-01, <> wrote:
    >
    > > I'm looking over the docs for the re module and can't find how to
    > > "NOT" an entire regex.

    >
    > (?! R)
    >
    > > How make regex that means "contains regex#1 but NOT regex#2" ?

    >
    > (\1|(?!\2))
    >
    > should do what you want.
    >
    > Albert


    I think the OP wants both A AND not B, not A OR not B. If the OP wants
    to do re.match(A and not B), then I think this can be done as
    ((?!\2)\1), but if he really wants CONTAINS A and not B, then I think
    this requires 2 calls to re.search. See test code below:

    import re

    def test(restr, instr):
        # re.match only tries to match at the start of instr
        print("%s match %s? %s" %
              (restr, instr, bool(re.match(restr, instr))))

    a = "AAA"
    b = "BBB"

    # Albert's suggestion: A OR (not B)
    aAndNotB = "(%s|(?!%s))" % (a, b)

    test(aAndNotB, "AAA")
    test(aAndNotB, "BBB")
    test(aAndNotB, "AAABBB")
    test(aAndNotB, "zAAA")
    test(aAndNotB, "CCC")

    # (not B) AND A, both checked at the start of the string
    aAndNotB = "((?!%s)%s)" % (b, a)

    test(aAndNotB, "AAA")
    test(aAndNotB, "BBB")
    test(aAndNotB, "AAABBB")
    test(aAndNotB, "zAAA")
    test(aAndNotB, "CCC")

    def test2(arestr, brestr, instr):
        # two searches: A must occur somewhere, B must occur nowhere
        print("%s contains %s but NOT %s? %s" %
              (instr, arestr, brestr,
               bool(re.search(arestr, instr) and
                    not re.search(brestr, instr))))

    test2(a, b, "AAA")
    test2(a, b, "BBB")
    test2(a, b, "AAABBB")
    test2(a, b, "zAAA")
    test2(a, b, "CCC")

    Prints:

    (AAA|(?!BBB)) match AAA? True
    (AAA|(?!BBB)) match BBB? False
    (AAA|(?!BBB)) match AAABBB? True
    (AAA|(?!BBB)) match zAAA? True
    (AAA|(?!BBB)) match CCC? True
    ((?!BBB)AAA) match AAA? True
    ((?!BBB)AAA) match BBB? False
    ((?!BBB)AAA) match AAABBB? True
    ((?!BBB)AAA) match zAAA? False
    ((?!BBB)AAA) match CCC? False
    AAA contains AAA but NOT BBB? True
    BBB contains AAA but NOT BBB? False
    AAABBB contains AAA but NOT BBB? False
    zAAA contains AAA but NOT BBB? True
    CCC contains AAA but NOT BBB? False


    As we've all seen before, posters are not always the most precise when
    describing whether they want match vs. search. Given that the OP used
    the word "contains", I read that to mean "search". I'm not an RE pro
    by any means, but I think the behavior that the OP wants is given in
    the last five tests (test2), and I don't know how to do that in a
    single RE.
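
    For what it's worth, the "contains A but NOT B" behaviour can probably
    also be had from one pattern by anchoring a negative lookahead at the
    start of the string. A minimal sketch, assuming Python 3 and assuming
    that neither sub-pattern carries inline flags or numbered
    backreferences; the helper name contains_a_not_b is made up for
    illustration, and the two-search version is arguably still easier to
    read:

```python
import re

def contains_a_not_b(a, b, text):
    """Return True if `a` matches somewhere in text and `b` matches nowhere."""
    # \A pins the lookahead to position 0, so (?!.*B) rejects the whole
    # string if B can match anywhere in it; the trailing .*A then requires
    # A to occur somewhere. (?s) lets '.' cross newlines.
    pattern = r"(?s)\A(?!.*(?:%s)).*(?:%s)" % (b, a)
    return re.search(pattern, text) is not None

for s in ("AAA", "BBB", "AAABBB", "zAAA", "CCC"):
    print("%s contains AAA but NOT BBB? %s"
          % (s, contains_a_not_b("AAA", "BBB", s)))
```

    On the five test strings this gives the same True/False answers as
    test2.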

    -- Paul
    Paul McGuire, Jul 1, 2008
    #2


  3. > -----Original Message-----
    > Sent: Tuesday, July 01, 2008 2:29 AM
    > Subject: How make regex that means "contains regex#1 but NOT regex#2" ??
    >
    > I'm looking over the docs for the re module and can't find how to
    > "NOT" an entire regex.
    >
    > For example.....
    >
    > How make regex that means "contains regex#1 but NOT regex#2" ?
    >


    Match 'foo.*bar', except when 'not' appears between foo and bar.


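    import re

    # (.(?!not))* consumes one character at a time, refusing any character
    # that is immediately followed by 'not'.
    s = 'fooAAABBBbar'
    print("Should match:", s)
    m = re.match(r'(foo(.(?!not))*bar)', s)
    if m:
        print(m.groups())  # group 2 holds only the last repetition

    print()

    s = 'fooAAAnotBBBbar'
    print("Should not match:", s)
    m = re.match(r'(foo(.(?!not))*bar)', s)
    if m:
        print(m.groups())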


    == Output ==
    Should match: fooAAABBBbar
    ('fooAAABBBbar', 'B')

    Should not match: fooAAAnotBBBbar



    Reedick, Andrew, Jul 1, 2008
    #3

  4. > -----Original Message-----
    > From: Reedick, Andrew
    > Sent: Tuesday, July 01, 2008 10:07 AM
    > Subject: RE: How make regex that means "contains regex#1 but NOT regex#2" ??
    >
    > Match 'foo.*bar', except when 'not' appears between foo and bar.
    >
    >
    > import re
    >
    > s = 'fooAAABBBbar'
    > print("Should match:", s)
    > m = re.match(r'(foo(.(?!not))*bar)', s)
    > if m:
    >     print(m.groups())
    >
    > print()
    >
    > s = 'fooAAAnotBBBbar'
    > print("Should not match:", s)
    > m = re.match(r'(foo(.(?!not))*bar)', s)
    > if m:
    >     print(m.groups())
    >
    >
    > == Output ==
    > Should match: fooAAABBBbar
    > ('fooAAABBBbar', 'B')
    >
    > Should not match: fooAAAnotBBBbar
    >



    Fixed a bug with 'foonotbar'. Conceptually it breaks down into:

    First_half_of_Regex#1(not Regex#2)(any_char_Not_followed_by_Regex#2)*Second_half_of_Regex#1

    However, if possible, I would make it a two-pass regex: match on
    Regex#1, then throw away any matches that also match Regex#2. Two
    passes are quicker and easier to code and understand, and easy to
    understand == less chance of a bug. If you're worried about performance,
    then a) a complicated regex may or may not be faster than two simple
    regexes, and b) if you're passing that much data through a regex, you're
    probably I/O bound anyway.
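
    The two-pass idea could look roughly like this. It is only a sketch,
    assuming Python 3; the helper name two_pass_match is invented here, and
    plain 'foo.*bar' and 'not' stand in for Regex#1 and Regex#2:

```python
import re

def two_pass_match(regex1, regex2, s):
    """Return the Regex#1 match text, or None if Regex#2 also matches it."""
    m = re.match(regex1, s)              # pass 1: does Regex#1 match?
    if m is None:
        return None
    if re.search(regex2, m.group(0)):    # pass 2: discard if Regex#2 hits
        return None
    return m.group(0)

ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
      'foonotBBBbar', 'foonotbar')

for s in ss:
    print(s, two_pass_match(r'foo.*bar', r'not', s))
```

    Only 'foobar' and 'fooAAABBBbar' come back as matches; the other four
    strings yield None.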


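    import re

    ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
          'foonotBBBbar', 'foonotbar')

    for s in ss:
        # (?!not) right after foo rejects 'foonot...'; the repeated group
        # rejects any later character that is followed by 'not'.
        m = re.match(r'(foo(?!not)(?:.(?!not))*bar)', s)
        if m:
            print(s, m.groups())
        else:
            print(s)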


    == output ==
    foobar ('foobar',)
    fooAAABBBbar ('fooAAABBBbar',)
    fooAAAnotBBBbar
    fooAAAnotbar
    foonotBBBbar
    foonotbar
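
    A variant that is sometimes used instead puts the lookahead in front of
    each character rather than behind it, so the separate (?!not) after foo
    is not needed. This is a sketch under the assumption of Python 3, not
    the pattern from the post above; on the six test strings it accepts the
    same two:

```python
import re

# Assumed alternative spelling: test (?!not) before consuming each character.
pattern = re.compile(r'foo(?:(?!not).)*bar')

ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
      'foonotBBBbar', 'foonotbar')

# Keep only the strings the pattern matches from the start.
matched = [s for s in ss if pattern.match(s)]
print(matched)
```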

    Reedick, Andrew, Jul 1, 2008
    #4