REGEX Negation

Rusty Phillips

I know about negative lookahead and negated character classes,
but I can't find any good way to do actual negation.

One thing I'd like to use this for is to match quotes while
guaranteeing that I'm not matching backslashed quotes (that is, if I
find a backslash in the string, the quote in front of it should not
be matched).
This string:

String = q{She said, "He said, \\"Welcome to the party\\""}

should match

He said, \\"Welcome to the party\\"

as the part within the quotes, and not

He said, \\

There are many more places where I'd like to use a negation
technique - especially I'd like to match things of the form:
"match the largest string that doesn't contain the character sequence
'blah.'"

Are there any ways to do either of these types of negation?
 
Brian McCauley

Rusty Phillips said:
> I know about negative lookahead and negated character classes,
> but I can't find any good way to do actual negation.

There is in general no way to do negation in regex.

> One thing I'd like to use this for is to match quotes while
> guaranteeing that I'm not matching backslashed quotes (that is, if I
> find a backslash in the string, the quote in front of it should not
> be matched).

You are talking about negative lookbehind. This is documented not far
from where negative lookahead is documented.

However, one usually looks for an even number of backslashes followed
by a quote. (Note: zero is an even number.)

/(?<!\\)(?:\\\\)*"/
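
As a rough illustration of the even-number-of-backslashes idea, here is a small Perl sketch that collects the offset of every unescaped quote in a string. The helper name unescaped_quote_offsets and the sample text are made up for this example and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the offsets of quotes that are NOT escaped,
# i.e. quotes preceded by an even number (possibly zero) of backslashes.
sub unescaped_quote_offsets {
    my ($text) = @_;
    my @offsets;
    while ($text =~ /(?<!\\)(?:\\\\)*"/g) {
        # pos() is just past the quote, so the quote itself is one back.
        push @offsets, pos($text) - 1;
    }
    return @offsets;
}

# Single-quoted so that the backslashes stay literal.
my @found = unescaped_quote_offsets('say \"hi\" to "them"');
print "@found\n";   # prints: 14 19
```

Only the two quotes around "them" are reported; the two backslashed quotes around "hi" are skipped because each has exactly one backslash in front of it.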

Another approach not using lookbehind is given in the answer to the
FAQ "How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)"

Not, of course, that you could have been expected to guess that,
because yours is not really the same question but is in fact the _next_
question people usually ask after asking the one in the FAQ.
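
For the quoted-string case specifically, one lookbehind-free pattern in the spirit of that FAQ answer treats a backslash plus the character after it as a single unit. This is only a sketch, not the FAQ's exact code, and the helper name quoted_part is invented here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text inside the first double-quoted
# section, treating backslash + any character as one escaped unit.
# Returns undef when there is no complete quoted section.
sub quoted_part {
    my ($text) = @_;
    return $text =~ /"((?:[^"\\]|\\.)*)"/s ? $1 : undef;
}

# Single-quoted so that the backslashes stay literal.
print quoted_part('She said, "He said, \"Welcome to the party\""'), "\n";
# prints: He said, \"Welcome to the party\"
```

Because an escaped quote is consumed as part of a backslash pair, it can never be mistaken for the closing quote.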

> There are many more places where I'd like to use a negation
> technique -

Sorry, you have to refactor your question so that it does not involve
negation.

> especially I'd like to match things of the form:
> "match the largest string that doesn't contain the character sequence
> 'blah.'"

Regex can never find the longest match directly: it always finds the
first (or occasionally the last). Among matches starting at the same
position it can be made to favour long or short. So to get the globally
longest match you need to find all such strings and sort.
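
To make the "long or short" point concrete, here is a tiny sketch with a made-up input. Both matches start at the same position, and only the quantifier (greedy or non-greedy) decides where they end.

```perl
use strict;
use warnings;

my $text = 'aXbXc';

# Both matches start at offset 0; the quantifier decides where they end.
my ($greedy) = $text =~ /(.*)X/;    # as much as possible before an X
my ($lazy)   = $text =~ /(.*?)X/;   # as little as possible before an X

print "greedy: $greedy\n";   # prints: greedy: aXb
print "lazy:   $lazy\n";     # prints: lazy:   a
```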

These strings will be the same set as the set of shortest strings to
start at the beginning of the input or at the 'l' of 'blah' and to end
at the end of the input or at the 'a' of 'blah'.

my @substrings = /(?=((?:^|(?<=b)lah).*?(?:$|bla(?=h))))/g;

For example for $_='xxxxblablahwibbleblahfoo' this gives @substrings =
('xxxxblabla','lahwibblebla','lahfoo').

You can then find the longest with sort() or List::Util::reduce().
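
A minimal sketch of that last step, assuming the candidates are collected with the pattern above and the longest is picked with List::Util's reduce. The helper name longest_without_blah is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper: longest substring of $input that does not contain
# the sequence "blah". Ties go to the earliest candidate.
sub longest_without_blah {
    my ($input) = @_;

    # Same pattern as in the post, bound to $input instead of $_.
    my @substrings = $input =~ /(?=((?:^|(?<=b)lah).*?(?:$|bla(?=h))))/g;

    # Keep whichever of the two candidates is longer at each step.
    return reduce { length($b) > length($a) ? $b : $a } @substrings;
}

print longest_without_blah('xxxxblablahwibbleblahfoo'), "\n";
# prints: lahwibblebla
```

The sort() route would be something like (sort { length($b) <=> length($a) } @substrings)[0]; reduce avoids sorting the whole list when only the maximum is needed.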

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Rusty Phillips

That second technique is what I'm using for the one "negation" I'm
doing now (actually /(.*?)(?=blah|$)/), which I know will at least
find the first match not containing the lookahead string (assuming
that such a string is a token and should not be absorbed by the
regex). I'd just hoped there was a more natural way.
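
For reference, a small sketch of how that pattern behaves on single-line input. The helper name before_blah and the sample strings are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: text from the start of the string up to, but not
# including, the first "blah" (or the whole string if "blah" is absent).
sub before_blah {
    my ($text) = @_;
    my ($prefix) = $text =~ /(.*?)(?=blah|$)/;
    return $prefix;
}

print "[", before_blah('foo blah bar'), "]\n";   # prints: [foo ]
print "[", before_blah('foobar'), "]\n";         # prints: [foobar]
```

The lookahead leaves "blah" unconsumed, so a following match can still see it as a token.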

Didn't consider negative lookbehinds for doing quotes, though.
Thanks for the help.
 
