Why do look-ahead and look-behind have to be fixed-width patterns?

Discussion in 'Python' started by inhahe, Jan 28, 2005.

  1. inhahe

    inhahe Guest

    Hi i'm a newbie at this and probably always will be, so don't be surprised
    if I don't know what i'm talking about.

    but I don't understand why regex look-behinds (and look-aheads) have to be
    fixed-width patterns.

    i'm getting the impression that it's supposed to make searching
    exponentially slower otherwise

    but i just don't see how.

    say i have the expression (?<=.*?:.*?:).*
    all the engine has to do is search for .*?:.*?:.*, and then in each result,
    find .*?:.*?: and return the string starting at the point just after the
    length of the match.
    no exponential time there, and even that is probably more inefficient than
    it has to be.
    inhahe, Jan 28, 2005
    #1
    1. Advertising

  2. inhahe

    John Machin Guest

    inhahe wrote:
    > Hi i'm a newbie at this and probably always will be, so don't be

    surprised
    > if I don't know what i'm talking about.
    >
    > but I don't understand why regex look-behinds (and look-aheads) have

    to be
    > fixed-width patterns.
    >
    > i'm getting the impression that it's supposed to make searching
    > exponentially slower otherwise
    >
    > but i just don't see how.
    >
    > say i have the expression (?<=.*?:.*?:).*
    > all the engine has to do is search for .*?:.*?:.*, and then in each

    result,
    > find .*?:.*?: and return the string starting at the point just after

    the
    > length of the match.
    > no exponential time there, and even that is probably more inefficient

    than
    > it has to be.


    But that's not what you are telling it to do. You are telling it to
    firstly find each position which starts a match with .* -- i.e. every
    position -- and then look backwards to check that the previous text
    matches .*?:.*?:

    To grab the text after the 2nd colon (if indeed there are two or more),
    it's much simpler to do this:

    >>> import re
    >>> q = re.compile(r'.*?:.*?:(.*)').search
    >>> def grab(s):

    .... m = q(s)
    .... if m:
    .... print m.group(1)
    .... else:
    .... print 'not found!'
    ....
    >>> grab('')

    not found!
    >>> grab('::::')

    ::
    >>> grab('a:b:yadda')

    yadda
    >>>>>> grab('a:b:c:d')

    c:d
    >>> grab('a:b:')


    >>>
    John Machin, Jan 28, 2005
    #2
    1. Advertising

  3. John Machin wrote:
    > To grab the text after the 2nd colon (if indeed there are two or more),
    > it's much simpler to do this:
    >
    >>>>import re
    >>>>q = re.compile(r'.*?:.*?:(.*)').search
    >>>>def grab(s):

    >
    > ... m = q(s)
    > ... if m:
    > ... print m.group(1)
    > ... else:
    > ... print 'not found!'
    > ...
    >
    >>>>grab('')

    > not found!
    >
    >>>>grab('::::')

    > ::
    >
    >>>>grab('a:b:yadda')

    > yadda
    >
    >>>>>>>grab('a:b:c:d')

    > c:d
    >
    >>>>grab('a:b:')

    >
    >


    Or without any regular expressions:

    py> def grab(s):
    .... try:
    .... first, second, rest = s.split(':', 2)
    .... print rest
    .... except ValueError:
    .... print 'not found!'
    ....
    py> grab('')
    not found!
    py> grab('a:b:yadda')
    yadda
    py> grab('a:b:c:d')
    c:d
    py> grab('a:b:')

    py>

    To the OP: what is it you're trying to do? Often there is a much
    cleaner way to do it without regular expressions...

    Steve
    Steven Bethard, Jan 28, 2005
    #3
  4. > but I don't understand why regex look-behinds (and look-aheads) have to be
    > fixed-width patterns.
    >
    > i'm getting the impression that it's supposed to make searching
    > exponentially slower otherwise


    That's because of the underlying theory of regular expressions. They are
    modelled using so called finite state automata (FSM). These are very much
    limited in the complexity of things they can do, and so are regular
    expressions. Explaining that further would require to dig deep into the
    details of FSM, grammars and languages - deeper than I'm currently willing
    to do :) But I wanted to point out that there is a "real" technical reason
    for that, not just a lack of feature or willingness to implement one.

    --

    Regards,

    Diez B. Roggisch
    Diez B. Roggisch, Jan 28, 2005
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. johnp
    Replies:
    4
    Views:
    3,654
    Toby Inkster
    May 23, 2005
  2. Mr. SweatyFinger
    Replies:
    2
    Views:
    1,747
    Smokey Grindel
    Dec 2, 2006
  3. ssk
    Replies:
    5
    Views:
    5,458
    Jukka K. Korpela
    Oct 30, 2006
  4. Peng Yu
    Replies:
    4
    Views:
    273
    Steven D'Aprano
    Oct 24, 2008
  5. Replies:
    4
    Views:
    160
Loading...

Share This Page