Why do look-ahead and look-behind have to be fixed-width patterns?

I

inhahe

Hi i'm a newbie at this and probably always will be, so don't be surprised
if I don't know what i'm talking about.

but I don't understand why regex look-behinds (and look-aheads) have to be
fixed-width patterns.

i'm getting the impression that it's supposed to make searching
exponentially slower otherwise

but i just don't see how.

say i have the expression (?<=.*?:.*?:).*
all the engine has to do is search for .*?:.*?:.*, and then in each result,
find .*?:.*?: and return the string starting at the point just after the
length of the match.
no exponential time there, and even that is probably more inefficient than
it has to be.
 
J

John Machin

inhahe said:
Hi i'm a newbie at this and probably always will be, so don't be surprised
if I don't know what i'm talking about.

but I don't understand why regex look-behinds (and look-aheads) have to be
fixed-width patterns.

i'm getting the impression that it's supposed to make searching
exponentially slower otherwise

but i just don't see how.

say i have the expression (?<=.*?:.*?:).*
all the engine has to do is search for .*?:.*?:.*, and then in each result,
find .*?:.*?: and return the string starting at the point just after the
length of the match.
no exponential time there, and even that is probably more inefficient than
it has to be.

But that's not what you are telling it to do. You are telling it to
firstly find each position which starts a match with .* -- i.e. every
position -- and then look backwards to check that the previous text
matches .*?:.*?:

To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:
.... m = q(s)
.... if m:
.... print m.group(1)
.... else:
.... print 'not found!'
....
 
S

Steven Bethard

John said:
To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:


... m = q(s)
... if m:
... print m.group(1)
... else:
... print 'not found!'
...

not found!

Or without any regular expressions:

py> def grab(s):
.... try:
.... first, second, rest = s.split(':', 2)
.... print rest
.... except ValueError:
.... print 'not found!'
....
py> grab('')
not found!
py> grab('a:b:yadda')
yadda
py> grab('a:b:c:d')
c:d
py> grab('a:b:')

py>

To the OP: what is it you're trying to do? Often there is a much
cleaner way to do it without regular expressions...

Steve
 
D

Diez B. Roggisch

but I don't understand why regex look-behinds (and look-aheads) have to be
fixed-width patterns.

i'm getting the impression that it's supposed to make searching
exponentially slower otherwise

That's because of the underlying theory of regular expressions. They are
modelled using so called finite state automata (FSM). These are very much
limited in the complexity of things they can do, and so are regular
expressions. Explaining that further would require to dig deep into the
details of FSM, grammars and languages - deeper than I'm currently willing
to do :) But I wanted to point out that there is a "real" technical reason
for that, not just a lack of feature or willingness to implement one.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top