variable-width negative look-behind emulation

Xavier Noria

I was playing with variable-width negative look-behind emulation and
worked out this regexp that supports assertions about substrings of
the pre-match:

/^.*(??{ index($&,"foo") == -1 ? "" : "(?!)" })bar/

That matches "bar" if it is not preceded at any point by "foo". I know
that can be achieved by reversing the string and using a negative
look-ahead, but this should be considered an exercise (it comes from a
question on freenode#perl that required a single regexp to solve the
problem).

I tried a few tricks to extend that to patterns, such as

/^.*(??{ $& =~ m,fo+, ? "(?!)" : "" })bar/

or

/^.*(?{ local $m = $& })(??{ $m =~ m,fo+, ? "(?!)" : "" })bar/

and the like, but I got segmentation faults (maybe they hit some
corner of the documented experimental status of these features). It
looks like the problem comes from nesting m//, but that's a guess.

Any comments?

-- fxn
 
Jeff 'japhy' Pinyan

> /^.*(??{ index($&,"foo") == -1 ? "" : "(?!)" })bar/
>
> That matches "bar" if it is not preceded at any point by "foo".

You could use

/^(?:[^f]*|f+(?!f|oo))*bar/;
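
One way to sanity-check this pattern is to run it over a handful of strings. The sketch below assumes the `[^f]+` variant that Jeff recommends later in the thread; the variable names, inputs, and expected results are illustrative, not taken from the discussion.

```perl
use strict;
use warnings;

# Jeff's pattern, with [^f]* changed to [^f]+ as he suggests later on.
# Each pass of the group eats either a run of non-"f" characters or a
# whole run of "f"s that is not followed by "oo", so no "foo" can be
# consumed on the way to "bar".
my $re = qr/^(?:[^f]+|f+(?!f|oo))*bar/;

# Illustrative inputs: 1 means the pattern should match.
my %expect = (
    "xbar"    => 1,
    "fobar"   => 1,   # "fo" alone is allowed
    "ffbar"   => 1,
    "barfoo"  => 1,   # "foo" only appears after "bar"
    "foobar"  => 0,
    "ffoobar" => 0,   # f+ can stop neither before "f" nor before "oo"
    "foo bar" => 0,
);

for my $s (sort keys %expect) {
    my $got = $s =~ $re ? 1 : 0;
    print "$s: got $got, expected $expect{$s}\n";
}
```

The `(?!f|oo)` part is what forces `f+` to swallow an entire run of "f"s: it cannot stop early before another "f", and it cannot stop right before "oo".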
 
Xavier Noria

Jeff 'japhy' Pinyan said:
> You could use
>
> /^(?:[^f]*|f+(?!f|oo))*bar/;

Ah, better, thank you.

Maybe we could take advantage of atomic grouping here? Like this:

/^ (?> [^f] | f(?!oo) )*? bar/x

or, maybe even better, this way:

/^(?> (?!foo). )*? bar/x

which Iain Truskett has just suggested on freenode#regex.
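
Both variants can be checked side by side. In this sketch the variable names and inputs are illustrative assumptions. Neither pattern uses `/s`, so in the second one `.` will not step over a newline that precedes "bar"; the inputs here are all single-line.

```perl
use strict;
use warnings;

# First variant: one character per pass, either a non-"f" or an "f"
# that does not start "foo".
my $per_char = qr/^ (?> [^f] | f(?!oo) )*? bar/x;

# Second variant: one character per pass, as long as "foo" does not
# start at the current position.
my $tempered = qr/^(?> (?!foo). )*? bar/x;

for my $s ("xbar", "fobar", "barfoo", "foobar", "xfoobar") {
    my @got = map { $s =~ $_ ? 1 : 0 } $per_char, $tempered;
    print "$s: @got\n";
}
```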

-- fxn
 
Jeff 'japhy' Pinyan

> /^ (?> [^f] | f(?!oo) )*? bar/x

I'd rather see [^f]+. (That should have been [^f]+ in my regex.)

> /^(?> (?!foo). )*? bar/x

That crawls one character at a time, and the (?>) shouldn't be needed.
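
Jeff's remark about `(?>)` is easy to check: with exactly one character consumed per pass, the atomic and non-atomic groups should accept the same strings. A sketch comparing them, with inputs chosen for illustration:

```perl
use strict;
use warnings;

# Same pattern with and without the atomic group. Each pass matches
# exactly one character (or fails), so there is nothing inside the
# group for the engine to backtrack into.
my $atomic = qr/^(?>(?!foo).)*?bar/;
my $plain  = qr/^(?:(?!foo).)*?bar/;

my @inputs = ("xbar", "fobar", "ffbar", "barfoo", "foobar", "xfoobar", "fofoobar");
for my $s (@inputs) {
    my $with    = $s =~ $atomic ? 1 : 0;
    my $without = $s =~ $plain  ? 1 : 0;
    print "$s: atomic=$with plain=$without\n";
}
```

Whether the atomic version saves any measurable work is a separate question that would need benchmarking; the sketch only shows that the results agree.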
 
Xavier Noria

Jeff 'japhy' Pinyan said:
>> /^ (?> [^f] | f(?!oo) )*? bar/x
>
> I'd rather see [^f]+. (That should have been [^f]+ in my regex.)

The problem there is that we seem to lose the backtracking states of
[^f]+, which, in spite of the outer *?, eats too much and would need
to backtrack:

% perl -wle 'print 1 if "xbar" =~ /^(?>[^f]+|f(?!oo))*?bar/'
% perl -wle 'print 1 if "xbar" =~ /^(?>[^f]|f(?!oo))*?bar/'
1

But [^f]+ is fine if we don't use atomic grouping:

% perl -wle 'print 1 if "xbar" =~ /^(?:[^f]+|f(?!oo))*?bar/'
1
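
The failure of the atomic [^f]+ version can be seen in isolation. A small sketch, using the same "xbar" input as the one-liners above (variable names are illustrative), that shows what the atomic run grabs and why the literal "bar" then has nothing left to match:

```perl
use strict;
use warnings;

my $s = "xbar";

# An atomic [^f]+ takes the longest run of non-"f" characters,
# which here is the whole string, "bar" included.
my ($run) = $s =~ /^((?>[^f]+))/;
print "atomic run: '$run'\n";

# Nothing is left for the literal "bar", and the engine is not allowed
# to backtrack into the atomic group to shorten the run.
my $atomic = $s =~ /^(?>[^f]+)bar/ ? 1 : 0;

# A plain [^f]+ gives characters back until "bar" fits.
my $plain = $s =~ /^(?:[^f]+)bar/ ? 1 : 0;

print "atomic: $atomic, plain: $plain\n";
```

The outer `*?` in the one-liners cannot rescue this: it can change how many times the group runs, but not how much a single atomic pass of [^f]+ consumes.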

> That crawls one character at a time, and the (?>) shouldn't be needed.

Yeah, it's there just to tell the engine it can forget about those
states. We either match in one shot or fail.

-- fxn
 
