How to match where the search started?

Discussion in 'Python' started by Florian Kaufmann, Sep 28, 2010.

  1. From the documentation:

    7.2.4. Regular Expression Objects, search(string[, pos[, endpos]])
    .... the '^' pattern character matches at the real beginning of the
    string and at positions just after a newline, but not necessarily at
    the index where the search is to start....

    But I'd like to do just that. In Emacs regexps, I think the closest
    equivalent would be \=. Then I could do something like this and also
    find directly adjacent matches:

    reo = re.compile(r'(\=|...)...')
    while True:
        mo = reo.search(text, pos)
        if not mo:
            break
        ...

    Flo
    Florian Kaufmann, Sep 28, 2010
    #1

  2. The thing is that the (\=|...) group is not really part of the match.
    I think this gives a better idea of what I want:

    reo = re.compile(r'(\=|.)...')
    while True:
        mo = reo.search(text, pos)
        if not mo:
            break
        if text[mo.start()] == '\\':
            # a pseudo match: continue after the backslash
            pos = mo.start() + 1
        else:
            # a real match: continue after the match
            pos = mo.end()
    Florian Kaufmann, Sep 28, 2010
    #2

  4. On Tuesday 28 September 2010, it occurred to Florian Kaufmann to exclaim:
    > From the documentation:
    >
    > 7.2.4. Regular Expression Objects, search(string[, pos[, endpos]])
    > ... the '^' pattern character matches at the real beginning of the
    > string and at positions just after a newline, but not necessarily at
    > the index where the search is to start....
    >
    > But I'd like to do just that. In Emacs regexps, I think the closest
    > equivalent would be \=. Then I could do something like that, and also
    > find directly adjacent matches
    >
    > reo = re.compile( r'(\=|...)...' );
    > while True
    > mo = reo.search(text,pos)
    > if not mo: break
    > ...
    >
    > Flo


    You could prefix your regexp with r'(.*?)' to create a match of stuff that is
    between the start of search and the start of the first thing you're interested
    in. (untested...)
    Thomas Jollans, Sep 28, 2010
    #4
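The lazy-prefix idea above was posted as untested, so here is a minimal sketch of how it could look. The target pattern `\d+`, the sample text, and the helper name `find_from` are assumptions made for illustration, not something from the thread. An empty first group means the match begins exactly where the search started.

```python
import re

# Hypothetical target pattern (a run of digits), prefixed with a lazy group
# that captures whatever lies between the search start and the match.
# re.DOTALL lets that gap span newlines.
reo = re.compile(r'(.*?)(\d+)', re.DOTALL)

def find_from(text, pos):
    """Return (gap, matched_text, end) for the first match at or after pos, or None."""
    mo = reo.search(text, pos)
    if mo is None:
        return None
    return mo.group(1), mo.group(2), mo.end()

text = "12ab34"
first = find_from(text, 0)           # gap is '' because the match starts at pos
second = find_from(text, first[2])   # gap is 'ab', the text that was skipped
```

Note that the overall match now always starts at `pos`, so the position of the interesting part has to be read from the second group rather than from `mo.start()`.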
  5. On 28/09/2010 09:10, Florian Kaufmann wrote:
    > From the documentation:
    >
    > 7.2.4. Regular Expression Objects, search(string[, pos[, endpos]])
    > ... the '^' pattern character matches at the real beginning of the
    > string and at positions just after a newline, but not necessarily at
    > the index where the search is to start....
    >
    > But I'd like to do just that. In Emacs regexps, I think the closest
    > equivalent would be \=. Then I could do something like that, and also
    > find directly adjacent matches
    >
    > reo = re.compile( r'(\=|...)...' );
    > while True
    > mo = reo.search(text,pos)
    > if not mo: break
    > ...
    >

    If you want to anchor the regex at the start position 'pos' then use
    the 'match' method instead.
    MRAB, Sep 28, 2010
    #5
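To illustrate the difference between `match` and `search` when a start position is given, here is a small sketch. The pattern `foo`, the sample text, and the helper `adjacent_matches` are hypothetical and only serve the demonstration.

```python
import re

reo = re.compile(r'foo')  # hypothetical pattern

text = "xxfoofoo"

anchored = reo.match(text, 2)    # succeeds only if 'foo' begins exactly at index 2
searched = reo.search(text, 3)   # may skip ahead to a later occurrence
missed = reo.match(text, 3)      # None, because nothing starts at index 3

def adjacent_matches(text, pos=0):
    """Collect spans of matches that follow each other without any gap."""
    spans = []
    while True:
        mo = reo.match(text, pos)
        if mo is None:
            return spans
        spans.append(mo.span())
        pos = mo.end()  # the pattern never matches empty, so pos always advances
```

Calling `adjacent_matches(text, 2)` walks through both occurrences of `foo` and stops at the first position where the pattern no longer matches.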
  6. > If you want to anchor the regex at the start position 'pos' then use
    > the 'match' method instead.


    The tricky part is that matching at position 'pos' is not a
    requirement; it's an option. Look again at my 2nd example, the
    r'(\=|.)...' part, which (of course wrongly) assumes that \= means
    'match at the beginning of the search'. Before the match I am really
    interested in, there is either the start of the search or any
    character.
    Florian Kaufmann, Sep 28, 2010
    #6
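One way to approximate "start of the search OR a preceding character" with the standard `re` module is to try an anchored `match` first and fall back to a `search` that includes the prefix. This is only a sketch under assumptions: the core pattern `foo`, the prefix `[^\\]` (any character except a backslash), and the helper names are invented for illustration and do not come from the thread.

```python
import re

CORE = r'foo'                                       # hypothetical pattern of interest
at_start = re.compile(CORE)                         # tried only at the search position
after_char = re.compile(r'[^\\](' + CORE + r')')    # one non-backslash char, then CORE

def search_from(text, pos=0):
    """Return the (start, end) span of the next CORE match at or after pos, or None."""
    mo = at_start.match(text, pos)      # alternative 1: right at the search start
    if mo:
        return mo.span()
    mo = after_char.search(text, pos)   # alternative 2: preceded by an allowed character
    if mo:
        return mo.span(1)               # report only the core, not the prefix character
    return None

def find_all(text):
    """Repeat search_from, continuing after each match, and collect the spans."""
    spans, pos = [], 0
    while True:
        span = search_from(text, pos)
        if span is None:
            return spans
        spans.append(span)
        pos = span[1]                   # CORE is never empty, so pos always advances
```

For the text `r"foofoo\foo xfoo"` this finds the two adjacent matches at the start, skips the occurrence that follows the backslash, and then finds the last one. Because the anchored attempt comes first, the result is the same leftmost match that a pattern with a true search-start anchor would give.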
  7. On 28/09/2010 17:32, Florian Kaufmann wrote:
    >> If you want to anchor the regex at the start position 'pos' then use
    >> the 'match' method instead.

    >
    > The tricky part is that matching at position 'pos' is not a
    > requirement; it's an option. Look again at my 2nd example, the
    > r'(\=|.)...' part, which (of course wrongly) assumes that \= means
    > 'match at the beginning of the search'. Before the match I am really
    > interested in, there is either the start of the search or any
    > character.


    An alternative is to use the 'regex' module, available from PyPI:

    http://pypi.python.org/pypi/regex

    It has \G, which is the anchor for the start position.
    MRAB, Sep 28, 2010
    #7
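A short sketch of the `\G` anchor, assuming the third-party `regex` package is installed (it is not part of the standard library). The helper name `adjacent_pairs` and the sample pattern are illustrative only.

```python
try:
    import regex  # third-party package from PyPI
except ImportError:
    regex = None  # the sketch below cannot run without it

def adjacent_pairs(text):
    """Collect two-character word chunks that follow each other without a gap."""
    if regex is None:
        raise RuntimeError("the third-party 'regex' module is not installed")
    # \G matches only where the current search started or continued,
    # so scanning stops at the first gap instead of skipping over it.
    return regex.findall(r'\G\w{2}', text)
```

With the input `"abcd ef"`, the chunks `ab` and `cd` are found, while `ef` is not, since the space breaks the chain of adjacent matches.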