regular expression negate a word (not character)

Discussion in 'Python' started by Summercool, Jan 26, 2008.

  1. Summercool

    Summercool Guest

    somebody who is a regular expression guru... how do you negate a word
    and grep for everything that contains

    tire

    but not

    snow tire

    or

    snowtire

    so for example, it will grep for

    winter tire
    tire
    retire
    tired

    but will not grep for

    snow tire
    snow tire
    some snowtires

    need to do it in one regular expression
     
    Summercool, Jan 26, 2008
    #1

  2. Summercool

    Summercool Guest

    On Jan 25, 5:16 pm, Summercool <> wrote:
    > somebody who is a regular expression guru... how do you negate a word
    > and grep for all words that is
    >
    > tire
    >
    > but not
    >
    > snow tire
    >
    > or
    >
    > snowtire


    i could think of something like

    /[^s][^n][^o][^w]\s*tire/i

    but what if it is not snow but some 20-character word? do we then need
    to do it 20 times to negate it? any shorter way?
     
    Summercool, Jan 26, 2008
    #2

  3. Summercool

    Ben Morrow Guest

    [newsgroups line fixed, f'ups set to clpm]

    Quoth Summercool <>:
    > On Jan 25, 5:16 pm, Summercool <> wrote:
    > > somebody who is a regular expression guru... how do you negate a word
    > > and grep for all words that is
    > >
    > > tire
    > >
    > > but not
    > >
    > > snow tire
    > >
    > > or
    > >
    > > snowtire

    >
    > i could think of something like
    >
    > /[^s][^n][^o][^w]\s*tire/i
    >
    > but what if it is not snow but some 20 character-word, then do we need
    > to do it 20 times to negate it? any shorter way?


    This is no good, since 'snoo tire' fails to match even though you want
    it to. You need something more like

    / (?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire /ix

    but that gets *really* tedious for long strings, unless you generate it.
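Generating such a pattern takes only a few lines. What follows is a rough sketch in Python, not a tested library routine: the name `not_preceded_by` and its parameters are made up for illustration, and it assumes the forbidden word is non-empty and contains no whitespace. Unlike a plain `[^w]`, the last character class also excludes whitespace, since otherwise the space in 'snow tire' would itself satisfy it; the start-of-string alternatives cover text that is shorter than the forbidden word.

```python
import re

def not_preceded_by(word, target):
    """Build a lookbehind-free pattern: `target` not preceded by `word` plus optional whitespace.

    Hypothetical helper; assumes `word` is non-empty and has no whitespace in it.
    """
    n = len(word)
    alts = []
    for i in range(n - 1, -1, -1):
        # The character at position i differs, the rest of the word follows unchanged.
        # The class for the last character also rejects whitespace, so the gap
        # before the target cannot stand in for a "different" character.
        exclude = re.escape(word[i]) + (r"\s" if i == n - 1 else "")
        alts.append("[^%s]%s" % (exclude, re.escape(word[i + 1:])))
    for i in range(1, n + 1):
        # The text starts with a proper suffix of the word (or with nothing at all).
        alts.append("^" + re.escape(word[i:]))
    return "(?:%s)\\s*%s" % ("|".join(alts), re.escape(target))

pattern = re.compile(not_preceded_by("snow", "tire"), re.IGNORECASE)
for line in ["winter tire", "retire", "snoo tire", "ow tire", "snow tire", "some snowtires"]:
    print("%-16s %s" % (line, bool(pattern.search(line))))
```

The pattern grows with the length of the word, but nobody has to type it by hand.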

    Ben
     
    Ben Morrow, Jan 26, 2008
    #3
  4. Summercool

    Mark Tolonen Guest

    "Summercool" <> wrote in message
    news:...
    >
    > somebody who is a regular expression guru... how do you negate a word
    > and grep for all words that is
    >
    > tire
    >
    > but not
    >
    > snow tire
    >
    > or
    >
    > snowtire
    >
    > so for example, it will grep for
    >
    > winter tire
    > tire
    > retire
    > tired
    >
    > but will not grep for
    >
    > snow tire
    > snow tire
    > some snowtires
    >
    > need to do it in one regular expression
    >


    What you want is a negative lookbehind assertion:

    >>> re.search(r'(?<!snow)tire','snowtire') # no match
    >>> re.search(r'(?<!snow)tire','baldtire')

    <_sre.SRE_Match object at 0x00FCD608>

    Unfortunately you want variable whitespace:

    >>> re.search(r'(?<!snow\s*)tire','snow tire')

    Traceback (most recent call last):
    File "<interactive input>", line 1, in <module>
    File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
    File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
    error: look-behind requires fixed-width pattern
    >>>


    Python doesn't support lookbehind assertions that can vary in size. This
    doesn't work either:

    >>> re.search(r'(?<!snow)\s*tire','snow tire')

    <_sre.SRE_Match object at 0x00F93480>
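The reason that last attempt still matches: `\s*` is allowed to match nothing, so the engine simply starts the match at the `t`, where the four characters behind it are `now ` rather than `snow`. The snippet below checks that (Python 3 syntax). It also sketches one partial workaround, under the assumption that the gap is never longer than some known limit: stack one fixed-width lookbehind per allowed gap. `stacked_lookbehinds` and `max_gap` are names invented here, and longer runs of whitespace will slip through.

```python
import re

# \s* matches the empty string, so the match starts at the "t", not at the space.
m = re.search(r'(?<!snow)\s*tire', 'snow tire')
print(m.start(), repr(m.group()))

def stacked_lookbehinds(word, target, max_gap=3):
    """One fixed-width negative lookbehind for each gap of 0..max_gap whitespace characters.

    Hypothetical helper; gaps longer than max_gap are NOT rejected.
    """
    guards = "".join(
        r"(?<!%s%s)" % (re.escape(word), r"\s" * gap) for gap in range(max_gap + 1)
    )
    return guards + re.escape(target)

pattern = re.compile(stacked_lookbehinds("snow", "tire"))
for s in ["snow tire", "snow  tire", "snowtire", "baldtire", "winter tire"]:
    print(repr(s), bool(pattern.search(s)))
```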

    Here's some code (not heavily tested) that implements a variable lookbehind
    assertion, and a function to mark matches in a string to demonstrate it:

    ### BEGIN CODE ###

    import re

    def finditerexcept(pattern, notpattern, string):
        # Try the excluded pattern first, then drop every hit that it produced.
        for matchobj in re.finditer('(?:%s)|(?:%s)' % (notpattern, pattern), string):
            if not re.match(notpattern, matchobj.group()):
                yield matchobj

    def markexcept(pattern, notpattern, string):
        substrings = []
        current = 0

        # Copy the text between matches and wrap each accepted match in brackets.
        for matchobj in finditerexcept(pattern, notpattern, string):
            substrings.append(string[current:matchobj.start()])
            substrings.append('[' + matchobj.group() + ']')
            current = matchobj.end()

        substrings.append(string[current:])
        return ''.join(substrings)

    ### END CODE ###

    >>> sample='''winter tire
    ... tire
    ... retire
    ... tired
    ... snow tire
    ... snow tire
    ... some snowtires
    ... '''
    >>> print markexcept('tire','snow\s*tire',sample)

    winter [tire]
    [tire]
    re[tire]
    [tire]d
    snow tire
    snow tire
    some snowtires

    --Mark
     
    Mark Tolonen, Jan 26, 2008
    #4
  5. Summercool

    Summercool Guest

    to add to the test cases, the regular expression must be able to grep


    snowbird tire
    tired on a snow day
    snow tire and regular tire
     
    Summercool, Jan 26, 2008
    #5
  6. Summercool

    Guest

    Summercool:
    > to add to the test cases, the regular expression must be able to grep
    > snow tire and regular tire


    I presume that only the second "tire" there has to be found.

    This is my first try:

    text = """
    tire
    word tire word
    word retire word
    word tired word
    snowbird tire word
    tired on a snow day word
    snow tire and regular tire word
    word snow tire word
    word snow tire word
    word some snowtires word
    """

    import re

    def finder(text):
        # group(1) is the (possibly empty) word part before "tire", group(2) is "tire" itself
        patt = re.compile( r"\b (\w*) \s* (tire)", re.VERBOSE)
        for mo in patt.finditer(text):
            if not mo.group(1).endswith("snow"):
                yield mo.start(2)

    for end in finder(text):
        print(end)

    The (lazy) output is the starting point of each "tire" that matches:


    1
    11
    28
    43
    63
    73
    120
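If grep-style line filtering is wanted rather than offsets, the same idea (find every "tire", then look at what precedes it) can be wrapped into a predicate. This is only a sketch: `has_plain_tire` and `grep_tire` are names made up here, and it uses two small patterns plus a slice instead of the single regular expression the original question asked for.

```python
import re

SNOW_TAIL = re.compile(r"snow\s*\Z", re.IGNORECASE)  # "snow" plus optional whitespace at the very end
TIRE = re.compile(r"tire", re.IGNORECASE)

def has_plain_tire(line):
    """True if some 'tire' in the line is not preceded by 'snow' and optional whitespace."""
    return any(not SNOW_TAIL.search(line[:m.start()]) for m in TIRE.finditer(line))

def grep_tire(lines):
    """Keep only the lines that contain at least one acceptable 'tire'."""
    return [line for line in lines if has_plain_tire(line)]

lines = [
    "winter tire", "tire", "retire", "tired",
    "snowbird tire", "tired on a snow day", "snow tire and regular tire",
    "snow tire", "snow  tire", "some snowtires",
]
for line in grep_tire(lines):
    print(line)
```

Slicing the line for every hit is quadratic in the worst case, which should not matter for grep-sized input.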

    Bye,
    bearophile
     
    , Jan 26, 2008
    #6
  7. Summercool

    Paddy Guest

    On Jan 26, 1:16 am, Summercool <> wrote:
    > somebody who is a regular expression guru... how do you negate a word
    > and grep for all words that is
    >
    > tire
    >
    > but not
    >
    > snow tire
    >
    > or
    >
    > snowtire
    >
    > so for example, it will grep for
    >
    > winter tire
    > tire
    > retire
    > tired
    >
    > but will not grep for
    >
    > snow tire
    > snow tire
    > some snowtires
    >
    > need to do it in one regular expression


    Try the answer here:
    http://mail.python.org/pipermail/tutor/2003-August/024902.html
     
    Paddy, Jan 26, 2008
    #7
  8. Summercool

    Guest

    , Jan 26, 2008
    #8
  9. [A complimentary Cc of this posting was sent to
    Summercool
    <>], who wrote in article <>:
    > so for example, it will grep for
    >
    > winter tire
    > tire
    > retire
    > tired
    >
    > but will not grep for
    >
    > snow tire
    > snow tire
    > some snowtires


    This does not describe the problem completely. What about

    thisnow tire
    snow; tire

    etc? Anyway, one of the obvious modifications of

    (^ | \b(?!snow) \w+ ) \W* tire

    should work.
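For anyone who wants to poke at it from Python, here is a small harness around that pattern (a sketch; the sample list is an assumption pieced together from the cases mentioned in this thread). As written, the pattern inspects the start of the preceding word, so 'thisnow tire' is accepted while 'snow; tire' and 'snowbird tire' are not, which is presumably where the modifications come in.

```python
import re

# The pattern from the post, unchanged; re.VERBOSE lets the spaces stay in.
ilya = re.compile(r"( ^ | \b(?!snow) \w+ ) \W* tire", re.VERBOSE)

samples = [
    "winter tire", "tire", "retire", "tired",
    "snow tire", "some snowtires",
    "thisnow tire", "snow; tire", "snowbird tire",
]
for s in samples:
    print("%-16s %s" % (s, "match" if ilya.search(s) else "no match"))
```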

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Jan 26, 2008
    #9
  10. Summercool

    Greg Bacon Guest

    The code below at least passes your tests.

    Hope it helps,
    Greg

    #! /usr/bin/perl

    use warnings;
    use strict;

    use constant {
        MATCH => 1,
        NO_MATCH => 0,
    };

    my @tests = (
        [ "winter tire", => MATCH ],
        [ "tire", => MATCH ],
        [ "retire", => MATCH ],
        [ "tired", => MATCH ],
        [ "snowbird tire", => MATCH ],
        [ "tired on a snow day", => MATCH ],
        [ "snow tire and regular tire", => MATCH ],
        [ " tire" => MATCH ],
        [ "snow tire" => NO_MATCH ],
        [ "snow tire" => NO_MATCH ],
        [ "some snowtires" => NO_MATCH ],
    );

    my $not_snow_tire = qr/
        ^ \s* tire |
        ([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
    /xi;

    my $fail;
    for (@tests) {
        my($str,$want) = @$_;
        my $got = $str =~ /$not_snow_tire/;
        my $pass = !!$want == !!$got;

        print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

        ++$fail unless $pass;
    }

    print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

    __END__

    --
    ... all these cries of having 'abolished slavery,' of having 'preserved the
    union,' of establishing a 'government by consent,' and of 'maintaining the
    national honor' are all gross, shameless, transparent cheats -- so trans-
    parent that they ought to deceive no one. -- Lysander Spooner, "No Treason"
     
    Greg Bacon, Jan 28, 2008
    #10
  11. Summercool

    Dr.Ruud Guest

    Greg Bacon schreef:

    > #! /usr/bin/perl
    >
    > use warnings;
    > use strict;
    >
    > use constant {
    > MATCH => 1,
    > NO_MATCH => 0,
    > };
    >
    > my @tests = (
    > [ "winter tire", => MATCH ],
    > [ "tire", => MATCH ],
    > [ "retire", => MATCH ],
    > [ "tired", => MATCH ],
    > [ "snowbird tire", => MATCH ],
    > [ "tired on a snow day", => MATCH ],
    > [ "snow tire and regular tire", => MATCH ],
    > [ " tire" => MATCH ],
    > [ "snow tire" => NO_MATCH ],
    > [ "snow tire" => NO_MATCH ],
    > [ "some snowtires" => NO_MATCH ],
    > );
    > [...]


    I negated the test, to make the regex simpler:

    my $snow_tire = qr/
        snow [[:blank:]]* tire (?!.*tire)
    /x;

    my $fail;
    for (@tests) {
        my($str,$want) = @$_;
        my $got = $str !~ /$snow_tire/;
        my $pass = !!$want == !!$got;

        print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

        ++$fail unless $pass;
    }

    print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

    __END__
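For the Python readers of this cross-posted thread, roughly the same negated test could look like the sketch below. Python's `re` has no POSIX `[[:blank:]]` class, so `[ \t]` stands in for it; the names `SNOW_TIRE` and `wanted` are made up here. Note that the test only looks at whether the last tire on a line is a snow tire, so a line such as 'winter tire and snow tire' is rejected, and a line with no tire at all is accepted.

```python
import re

# A snow tire with no further "tire" after it on the same line.
SNOW_TIRE = re.compile(r"snow[ \t]*tire(?!.*tire)")

def wanted(line):
    """Negated test: the line is wanted when the pattern does NOT match."""
    return SNOW_TIRE.search(line) is None

tests = [
    ("winter tire", True), ("tire", True), ("retire", True), ("tired", True),
    ("snowbird tire", True), ("tired on a snow day", True),
    ("snow tire and regular tire", True), (" tire", True),
    ("snow tire", False), ("snow  tire", False), ("some snowtires", False),
]
for line, expected in tests:
    print("%s: %s" % (line, "PASS" if wanted(line) == expected else "FAIL"))
```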

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Jan 28, 2008
    #11
  12. Summercool

    Paul McGuire Guest

    On Jan 25, 7:16 pm, Summercool <> wrote:
    > somebody who is a regular expression guru... how do you negate a word
    > and grep for all words that is
    >
    >   tire
    >
    > but not
    >
    >   snow tire
    >
    > or
    >
    >   snowtire
    >


    Too bad pyparsing's not an option. Here's what it would look like:

    data = """
    Match:
    > winter tire
    > tire
    > retire
    > tired


    But not match:
    > snow tire
    > snow tire
    > some snowtires


    snowbird tire
    tired on a snow day
    snow tire and regular tire

    """

    from pyparsing import CaselessLiteral,Literal,line

    # caseless wasn't really necessary but you never know
    # when you'll run into a "Snow tire"
    snow = CaselessLiteral("snow")
    tire = Literal("tire")
    tire.ignore(snow + tire)

    for matchTokens,matchStart,matchEnd in tire.scanString(data):
        print line(matchStart, data)


    Prints:

    > winter tire
    > tire
    > retire
    > tired

    snowbird tire
    tired on a snow day
    snow tire and regular tire

    -- Paul
     
    Paul McGuire, Jan 28, 2008
    #12
  13. Summercool

    Greg Bacon Guest

    In article <>,
    Dr.Ruud <> wrote:

    : I negated the test, to make the regex simpler: [...]

    Yes, your approach is simpler. I assumed from the "need it all
    in one pattern" constraint that the OP is feeding the regular
    expression to some other program that is looking for matches.

    I dunno. Maybe it was the familiar compulsion with Perl to
    attempt to cram everything into a single pattern.

    Greg
    --
    What light is to the eyes -- what air is to the lungs -- what love is to
    the heart, liberty is to the soul of man.
    -- Robert Green Ingersoll
     
    Greg Bacon, Jan 29, 2008
    #13
  14. Summercool

    Dr.Ruud Guest

    Greg Bacon schreef:
    > Dr.Ruud:


    >> I negated the test, to make the regex simpler: [...]

    >
    > Yes, your approach is simpler. I assumed from the "need it all
    > in one pattern" constraint that the OP is feeding the regular
    > expression to some other program that is looking for matches.


    Yes, I assumed about the same, but thought it would be a nice
    alternative anyways.
    Happy Perling!

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Feb 1, 2008
    #14