Negation in regular expressions

George Sakkis

It has always struck me as odd that you can express the negation of a single
character in regexps, but not of any more complex expression. Is there a
general way around this shortcoming? Here's an example to illustrate a
use case:
>>> import re
>>> # split with '@' as delimiter
>>> [g.group() for g in re.finditer('[^@]+', 'This @ is a @ test ')]
['This ', ' is a ', ' test ']

Is it possible to use finditer to split the string if the delimiter is
more than one char long (say 'XYZ')? [Yes, I'm aware of re.split, but
that's not the point; this is just an example. Besides, re.split returns
a list, not an iterator.]

George
 
Paddy

If you're willing to use groups, then the following will split on a
multi-character delimiter:
>>> [g.group(1) for g in re.finditer(r'(.+?)(?:@#|$)', 'This @# is a @# test ')]
['This ', ' is a ', ' test ']

- Paddy.
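
For the 'XYZ' case from the original question, the same lazy-group idea could be wrapped in a generator. This is only a sketch: the name iter_fields is made up for illustration, and re.escape is used so the delimiter is treated as literal text. It assumes non-empty fields separated by single delimiters; consecutive, leading, or trailing delimiters need extra care.

```python
import re

def iter_fields(delimiter, text):
    """Lazily yield the pieces of text that lie between delimiters."""
    # re.escape makes the delimiter literal, so 'XYZ' and '@#' both work.
    pattern = re.compile(r'(.+?)(?:' + re.escape(delimiter) + r'|$)')
    for match in pattern.finditer(text):
        yield match.group(1)

fields = list(iter_fields('XYZ', 'This XYZ is a XYZ test '))
print(fields)  # ['This ', ' is a ', ' test ']
```

Because iter_fields is a generator, nothing is scanned until the caller starts iterating, which is the property re.split lacks.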
 
Paddy

Here is another wrapping of the same finditer call that just lets you
call .group() on the result:
>>> class G(object):
...     def __init__(self, x):
...         def grp(x=x):
...             return x
...         self.group = grp
...
>>> [g.group() for g in (G(g.group(1)) for g in re.finditer(r'(.+?)(?:@#|$)', 'This @# is a @# test '))]
['This ', ' is a ', ' test ']

- Paddy.
 
Steve Holden

I think you are looking for "negative lookahead assertions". See the docs.

regards
Steve
 
Ant

The whole point of regexes is that they define expressions to match
things. [^x] doesn't express the negation of x; it is shorthand for
[a-wy-z...]. But the intent is still to match something. What you seem
to want is a way of saying "Match anything that doesn't match the
string 'XYZ' (for example)". What do you expect to get back from this?
In the string "abcd XYZ hhh XYZ", for example, "XYZ h" doesn't match
"XYZ", nor does the empty string, nor does the entire string.

Steve said:
I think you are looking for "negative lookahead assertions". See the docs.

Negative lookahead and lookbehind are great for expressing that you
want to match X as long as it isn't followed by Y ( "X(?!Y)" ) but
won't help much in your finditer example.
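
To make the lookahead idea concrete, here is a small sketch with made-up sample strings. The first pattern is the X(?!Y) form; the second puts a negative lookahead before every character, which is one way to say "a run of characters that does not contain 'XYZ'". The second form is only dependable when the match starts at a known position, as with re.match here; used on its own with finditer, the scan may resume in the middle of a delimiter.

```python
import re

# X(?!Y): match 'foo' only where it is not immediately followed by 'bar'.
not_followed = re.compile(r'foo(?!bar)')
hits = [m.start() for m in not_followed.finditer('foobar foobaz foo')]
print(hits)  # [7, 14]

# A lookahead before each character: consume text up to, but not
# including, the first occurrence of 'XYZ'.
up_to_xyz = re.compile(r'(?:(?!XYZ).)*')
prefix = up_to_xyz.match('abcd XYZ hhh XYZ').group()
print(repr(prefix))  # 'abcd '
```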

Is there a particular reason you don't want to use split?
 
George Sakkis

Paddy said:
If you're willing to use groups, then the following will split on a
multi-character delimiter:
>>> [g.group(1) for g in re.finditer(r'(.+?)(?:@#|$)', 'This @# is a @# test ')]
['This ', ' is a ', ' test ']

Nice! This covers the most common case, that is, non-consecutive
delimiters in the middle of the string. There are three edge cases:
consecutive delimiters, delimiter(s) at the beginning, and delimiter(s)
at the end.

The regexp r'(.*?)(?:@#|$)' would match re.split's behavior if it
weren't for the last empty string it returns:
>>> s = '@# This @# is a @#@# test '
>>> re.split(r'@#', s)
['', ' This ', ' is a ', '', ' test ']
>>> [g.group(1) for g in re.finditer(r'(.*?)(?:@#|$)', s)]
['', ' This ', ' is a ', '', ' test ', '']

Any ideas?

George
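
One possible way to drop the extra empty string, offered as a sketch rather than a definitive answer: only let a field begin at the start of the string or directly after a delimiter. The final empty match at the end of the string then fails unless the string really does end with a delimiter. The lookbehind is an addition not taken from the posts above, and it relies on the delimiter having a fixed width, which Python's re module requires for lookbehinds. re.DOTALL and \Z are likewise assumptions, made so that newlines do not change the result.

```python
import re

s = '@# This @# is a @#@# test '

# A field may start only at the beginning of the string or right after '@#'.
field = re.compile(r'(?:^|(?<=@#))(.*?)(?:@#|\Z)', re.DOTALL)

pieces = [m.group(1) for m in field.finditer(s)]
print(pieces)  # ['', ' This ', ' is a ', '', ' test ']
print(pieces == re.split(r'@#', s))  # True
```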
 
Ant

George said:
>>> re.split(r'@#', s)
['', ' This ', ' is a ', '', ' test ']
>>> [g.group(1) for g in re.finditer(r'(.*?)(?:@#|$)', s)]
['', ' This ', ' is a ', '', ' test ', '']

If the goal is to duplicate the behaviour of split but return an iterator
instead, how about avoiding messy regexes and using something like the
following generator:

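def splititer(pattern, string):
    # pattern must be a compiled regex, e.g. re.compile(r'@#')
    posn = 0
    while True:
        m = pattern.search(string, posn)
        if not m:
            break
        yield string[posn:m.start()]
        posn = m.end()
    # yield the text after the last delimiter, as re.split does
    yield string[posn:]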
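
A self-contained variation on this generator, written as a sketch: the name lazy_split is hypothetical, and the pattern is assumed to be a compiled regex with no capturing groups that never matches the empty string. It also yields the text after the final delimiter, so the result lines up with re.split.

```python
import re

def lazy_split(pattern, string):
    """Yield the pieces of string between matches of a compiled pattern."""
    posn = 0
    for m in pattern.finditer(string):
        yield string[posn:m.start()]
        posn = m.end()
    # Whatever follows the last delimiter is the final piece.
    yield string[posn:]

s = '@# This @# is a @#@# test '
result = list(lazy_split(re.compile(r'@#'), s))
print(result)  # ['', ' This ', ' is a ', '', ' test ']
```

Driving the loop with finditer rather than repeated search calls keeps the position bookkeeping inside the re module.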
 
