regular expression negate a word (not character)

Summercool

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

need to do it in one regular expression
 
Summercool

i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20 character-word, then do we need
to do it 20 times to negate it? any shorter way?
 
Ben Morrow

[newsgroups line fixed, f'ups set to clpm]

Quoth Summercool:
somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20 character-word, then do we need
to do it 20 times to negate it? any shorter way?

This is no good, since 'snoo tire' fails to match even though you want
it to. You need something more like

/ (?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire /ix

but that gets *really* tedious for long strings, unless you generate it.

Ben
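
On generating the pattern rather than typing it out: a minimal Python sketch, assuming the unwanted word contains no whitespace. The function name not_preceded_by is made up for this illustration. It builds one alternative per possible "first wrong letter", and also allows start-of-string in place of the wrong letter, so the pattern grows linearly with the word length but never has to be written by hand.

```python
import re

def not_preceded_by(word, target):
    """Return pattern source that matches `target` unless `word`, followed
    by optional whitespace, comes directly before it.
    Assumption: `word` itself contains no whitespace."""
    n = len(word)
    # Case 0: the character before the whitespace is not the last letter
    # of `word`, or there is no such character (start of string).
    alts = [r"(?:^|[^%s\s])" % re.escape(word[-1])]
    # Case i: the last i letters of `word` are present, but the letter
    # that should come before them is different or missing.
    for i in range(1, n):
        wrong = re.escape(word[n - 1 - i])
        tail = re.escape(word[n - i:])
        alts.append(r"(?:^|[^%s])%s" % (wrong, tail))
    return r"(?:%s)\s*%s" % ("|".join(alts), re.escape(target))

pattern = re.compile(not_preceded_by("snow", "tire"), re.IGNORECASE)

for s in ["winter tire", "retire", "snoo tire", "snow tire", "snowtire"]:
    print("%-12s %s" % (s, bool(pattern.search(s))))
```

The same call works for a 20-letter word; only the generated pattern gets longer.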
 
Mark Tolonen

Summercool said:
somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

need to do it in one regular expression

What you want is a negative lookbehind assertion:

<_sre.SRE_Match object at 0x00FCD608>

Unfortunately you want variable whitespace:

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern

Python doesn't support lookbehind assertions that can vary in size. This
doesn't work either:

<_sre.SRE_Match object at 0x00F93480>
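
A small sketch of the three behaviours described above, for the standard re module. The patterns and the helper name compiles are assumptions chosen for illustration, not necessarily the calls from the original session.

```python
import re

def compiles(pattern):
    """Return True if the standard `re` module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A fixed-width negative lookbehind handles the no-space case.
no_space = re.search(r"(?<!snow)tire", "snowtire")
other_word = re.search(r"(?<!snow)tire", "retire")

# Putting \s* inside the lookbehind makes it variable-width,
# which `re` refuses to compile.
variable_ok = compiles(r"(?<!snow\s*)tire")

# Moving \s* outside compiles, but \s* may match nothing, so the
# lookbehind is then checked right before "tire" and sees "now ".
outside = re.search(r"(?<!snow)\s*tire", "snow tire")

print(no_space, other_word, variable_ok, outside)
```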

Here's some code (not heavily tested) that implements a variable lookbehind
assertion, and a function to mark matches in a string to demonstrate it:

### BEGIN CODE ###

import re

def finditerexcept(pattern,notpattern,string):
    for matchobj in re.finditer('(?:%s)|(?:%s)'%(notpattern,pattern),string):
        if not re.match(notpattern,matchobj.group()):
            yield matchobj

def markexcept(pattern,notpattern,string):
    substrings = []
    current = 0

    for matchobj in finditerexcept(pattern,notpattern,string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)

### END CODE ###
.... tire
.... retire
.... tired
.... snow tire
.... snow tire
.... some snowtires
.... '''
winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires

--Mark
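
The "match the unwanted form first, then skip it" idea can also be packed into one alternation with a capture group. This is only a sketch under that assumption, and the name mark_tires is invented here. The unwanted alternative comes first and captures nothing, so any match where group 1 is empty can be left untouched.

```python
import re

# The left alternative consumes every "snow tire"/"snowtire";
# group 1 is only set for a "tire" that was not consumed that way.
SKIP_OR_KEEP = re.compile(r"snow\s*tire|(tire)", re.IGNORECASE)

def mark_tires(text):
    """Bracket every 'tire' that is not part of 'snow tire'."""
    def repl(m):
        if m.group(1):
            return "[" + m.group(1) + "]"
        return m.group(0)  # unwanted form: leave it as it was
    return SKIP_OR_KEEP.sub(repl, text)

sample = "winter tire\nretire\nsnow tire and regular tire\nsome snowtires"
print(mark_tires(sample))
```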
 
Summercool

to add to the test cases, the regular expression must be able to grep


snowbird tire
tired on a snow day
snow tire and regular tire
 
bearophileHUGS

Summercool:
to add to the test cases, the regular expression must be able to grep
snow tire and regular tire

I presume there only the second tire has to be found.

This is my first try:

text = """
tire
word tire word
word retire word
word tired word
snowbird tire word
tired on a snow day word
snow tire and regular tire word
word snow tire word
word snow tire word
word some snowtires word
"""

import re

def finder(text):
    patt = re.compile(r"\b (\w*) \s* (tire)", re.VERBOSE)
    for mo in patt.finditer(text):
        if not mo.group(1).endswith("snow"):
            yield mo.start(2)

for start in finder(text):
    print(start)

The (lazy) output is the starting point of the "tire" that match:


1
11
28
43
63
73
120

Bye,
bearophile
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Summercool]
so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

This does not describe the problem completely. What about

thisnow tire
snow; tire

etc? Anyway, one of the obvious modifications of

(^ | \b(?!snow) \w+ ) \W* tire

should work.

Hope this helps,
Ilya
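
A quick Python check of the pattern above against the thread's test cases, as a sketch. AS_POSTED is the pattern exactly as written. MODIFIED is one possible modification and is an assumption of this example, not something given in the post: it rejects a word only when it is really "snow" followed by "tire". As posted, the pattern misses "snowbird tire", because the lookahead rejects any word that merely starts with "snow". Both variants accept "thisnow tire", which is the ambiguity pointed out above.

```python
import re

AS_POSTED = re.compile(r"(^ | \b(?!snow) \w+ ) \W* tire", re.VERBOSE)
# Assumed modification: look ahead for the whole unwanted phrase.
MODIFIED = re.compile(r"(^ | \b(?!snow\W*tire) \w+ ) \W* tire", re.VERBOSE)

cases = [
    "winter tire", "tire", "retire", "tired", "snowbird tire",
    "tired on a snow day", "snow tire and regular tire",
    "snow tire", "snowtire", "some snowtires", "thisnow tire",
]

for s in cases:
    posted = bool(AS_POSTED.search(s))
    modified = bool(MODIFIED.search(s))
    print("%-28s posted=%-5s modified=%s" % (s, posted, modified))
```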
 
Greg Bacon

The code below at least passes your tests.

Hope it helps,
Greg

#! /usr/bin/perl

use warnings;
use strict;

use constant {
    MATCH    => 1,
    NO_MATCH => 0,
};

my @tests = (
    [ "winter tire"                => MATCH ],
    [ "tire"                       => MATCH ],
    [ "retire"                     => MATCH ],
    [ "tired"                      => MATCH ],
    [ "snowbird tire"              => MATCH ],
    [ "tired on a snow day"        => MATCH ],
    [ "snow tire and regular tire" => MATCH ],
    [ " tire"                      => MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "some snowtires"             => NO_MATCH ],
);

my $not_snow_tire = qr/
    ^ \s* tire |
    ([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
/xi;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str =~ /$not_snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
 
Dr.Ruud

Greg Bacon schreef:
#! /usr/bin/perl

use warnings;
use strict;

use constant {
    MATCH    => 1,
    NO_MATCH => 0,
};

my @tests = (
    [ "winter tire"                => MATCH ],
    [ "tire"                       => MATCH ],
    [ "retire"                     => MATCH ],
    [ "tired"                      => MATCH ],
    [ "snowbird tire"              => MATCH ],
    [ "tired on a snow day"        => MATCH ],
    [ "snow tire and regular tire" => MATCH ],
    [ " tire"                      => MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "some snowtires"             => NO_MATCH ],
);
[...]

I negated the test, to make the regex simpler:

my $snow_tire = qr/
    snow [[:blank:]]* tire (?!.*tire)
/x;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str !~ /$snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
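
For readers following along in Python, a rough equivalent of the negated test, written as a sketch. The function name has_wanted_tire is made up here, and [ \t] stands in for [[:blank:]], which Python's re does not support. Two caveats appear to follow from the pattern itself: a string with a plain "tire" before a final "snow tire" is rejected, and a string with no "tire" at all is accepted.

```python
import re

# "snow tire" (or "snowtire") with no further "tire" later on the line.
SNOW_TIRE = re.compile(r"snow[ \t]*tire(?!.*tire)")

def has_wanted_tire(s):
    """Negated test: accept the string unless SNOW_TIRE is found."""
    return SNOW_TIRE.search(s) is None

for s in ["winter tire", "snow tire", "snow tire and regular tire",
          "regular tire and snow tire", "snow day"]:
    print("%-28s %s" % (s, has_wanted_tire(s)))
```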
 
Paul McGuire

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

  tire

but not

  snow tire

or

  snowtire

Too bad pyparsing's not an option. Here's what it would look like:

data = """
Match:
winter tire
tire
retire
tired

But not match:
snow tire
snow tire
some snowtires

snowbird tire
tired on a snow day
snow tire and regular tire

"""

from pyparsing import CaselessLiteral,Literal,line

# caseless wasn't really necessary but you never know
# when you'll run into a "Snow tire"
snow = CaselessLiteral("snow")
tire = Literal("tire")
tire.ignore(snow + tire)

for matchTokens,matchStart,matchEnd in tire.scanString(data):
    print(line(matchStart, data))


Prints:
winter tire
tire
retire
tired
snowbird tire
tired on a snow day
snow tire and regular tire

-- Paul
 
Greg Bacon

Dr.Ruud:
I negated the test, to make the regex simpler: [...]

Yes, your approach is simpler. I assumed from the "need it all
in one pattern" constraint that the OP is feeding the regular
expression to some other program that is looking for matches.

I dunno. Maybe it was the familiar compulsion with Perl to
attempt to cram everything into a single pattern.

Greg
 
Dr.Ruud

Greg Bacon schreef:
Dr.Ruud:
I negated the test, to make the regex simpler: [...]

Yes, your approach is simpler. I assumed from the "need it all
in one pattern" constraint that the OP is feeding the regular
expression to some other program that is looking for matches.

Yes, I assumed about the same, but thought it would be a nice
alternative anyways.
Happy Perling!
 
