regex module, "or" doesn't work as expected

Discussion in 'Python' started by Fabian Holler, Jul 4, 2006.

  1. Howdy,


    I have the following regex "iface lo[\w\t\n\s]+(?=(iface)|$)"

    If "iface" doesn't follow after the regex "iface lo[\w\t\n\s]", the rest of
    the text should be selected.
    But ?=(iface) is ignored; the whole text is always selected.
    What is wrong?


    many thanks

    greetings

    Fabian
    Fabian Holler, Jul 4, 2006
    #1

  2. In <44aa670d$0$7872$>, Fabian Holler wrote:

    > Howdy,
    >
    >
    > I have the following regex "iface lo[\w\t\n\s]+(?=(iface)|$)"
    >
    > If "iface" doesn't follow after the regex "iface lo[\w\t\n\s]", the rest of
    > the text should be selected.
    > But ?=(iface) is ignored; the whole text is always selected.
    > What is wrong?


    The ``+`` after the character class means one or more of the characters
    in the class. If you have a text like:

    iface lox iface

    Then it matches the space and the word ``iface`` as well, because the space
    (``\s``) and word characters (``\w``) are part of the character class and
    ``+`` is "greedy". It consumes as many characters as possible, and the
    rest of the regex is only tried once the class cannot match any further.
    Here that happens at the end of the string, where the ``$`` alternative
    of the lookahead succeeds.

    If you want a non-greedy match, put a ``?`` after the ``+``::

    iface lo[\w\t\n\s]+?(?=(iface)|$)

    Now only "iface lox " is matched in the example above.
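
    A minimal sketch of the difference, assuming Python's standard ``re``
    module and the sample text above (the variable names are illustrative
    and not from the original post):

```python
import re

text = "iface lox iface"

# Greedy: the class swallows everything up to the end of the string,
# where the "$" alternative of the lookahead succeeds.
greedy = re.search(r"iface lo[\w\t\n\s]+(?=(iface)|$)", text)

# Non-greedy: the class grows one character at a time and stops as
# soon as the lookahead sees "iface".
lazy = re.search(r"iface lo[\w\t\n\s]+?(?=(iface)|$)", text)

print(repr(greedy.group(0)))  # 'iface lox iface'
print(repr(lazy.group(0)))    # 'iface lox '
```

    In both cases the lookahead itself consumes nothing; only the
    quantifier decides how much text ends up in the match.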

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Jul 4, 2006
    #2

  3. Hello Marc,

    thank you for your answer.

    Marc 'BlackJack' Rintsch wrote:
    > In <44aa670d$0$7872$>, Fabian Holler wrote:


    >> I have the following regex "iface lo[\w\t\n\s]+(?=(iface)|$)"
    >>
    >> If "iface" doesn't follow after the regex "iface lo[\w\t\n\s]", the rest of
    >> the text should be selected.
    >> But ?=(iface) is ignored; the whole text is always selected.
    >> What is wrong?

    >
    > The ``+`` after the character class means at least one of the characters
    > in the class or more. If you have a text like:


    Yes, that's right, but that isn't my problem.
    The problem is in the "(?=(iface)|$)" part.

    I have, for example, the text:

    "auto lo eth0
    <MATCH START>iface lo inet loopback
    bla
    blub

    <MATCH END>iface eth0 inet dhcp
    hostname debian"


    My regex should match the marked text.
    But it matches the whole text starting from iface.
    If there is only one iface entry, the whole text starting from iface
    should be matched.

    greetings

    Fabian
    Fabian Holler, Jul 4, 2006
    #3
  4. Fabian Holler wrote:

    > Yes, that's right, but that isn't my problem.
    > The problem is in the "(?=(iface)|$)" part.


    no, the problem is that you're thinking "procedural string matching from
    left to right", but that's not how regular expressions work.

    > I have, for example, the text:
    >
    > "auto lo eth0
    > <MATCH START>iface lo inet loopback
    > bla
    > blub
    >
    > <MATCH END>iface eth0 inet dhcp
    > hostname debian"
    >
    >
    > My regex should match the marked text.
    > But it matches the whole text starting from iface.


    which is perfectly valid, since a plain "+" is greedy, and you've asked
    for "iface lo" followed by some text followed by *either* end of string
    or another "iface". the rest of the string is a perfectly valid match
    for "some text", and it is followed by end of string.

    if you want a non-greedy match, use "+?" instead.
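
    a rough sketch of the "+?" variant applied to the quoted text, assuming
    the standard ``re`` module; the names ``config``, ``single`` and
    ``pattern`` are made up for this illustration:

```python
import re

# The sample text from the question, without the <MATCH ...> markers.
config = (
    "auto lo eth0\n"
    "iface lo inet loopback\n"
    "bla\n"
    "blub\n"
    "\n"
    "iface eth0 inet dhcp\n"
    "hostname debian"
)

# Same regex as in the question, with "+" changed to "+?".
pattern = re.compile(r"iface lo[\w\t\n\s]+?(?=(iface)|$)")

# Two entries: the match stops right before the second "iface".
print(repr(pattern.search(config).group(0)))

# One entry: no further "iface", so the match runs to end of string.
single = "auto lo\niface lo inet loopback\nbla"
print(repr(pattern.search(single).group(0)))
```

    note that the class only covers word characters and whitespace, so a
    block containing anything else (a dotted address, for instance) would
    keep this pattern from matching at all.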

    however, if you just want the text between two string literals, it's
    often more efficient to just split the string twice:

    text = text.split("iface lo", 1)[1].split("iface", 1)[0]
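
    as a hedged illustration of the split approach on the quoted text: the
    one-liner above raises IndexError when "iface lo" is missing, so this
    sketch wraps the same idea in a hypothetical helper, ``extract_lo_block``
    (not part of any library), that uses ``str.partition`` to handle that
    case. the result does not include the "iface lo" marker itself.

```python
def extract_lo_block(text):
    """Hypothetical helper: text after 'iface lo' up to the next 'iface'."""
    _before, marker, rest = text.partition("iface lo")
    if not marker:
        # "iface lo" does not occur in the text at all.
        return None
    # Cut at the next "iface"; if there is none, keep everything.
    return rest.split("iface", 1)[0]


config = (
    "auto lo eth0\n"
    "iface lo inet loopback\n"
    "bla\n"
    "blub\n"
    "\n"
    "iface eth0 inet dhcp\n"
    "hostname debian"
)

print(repr(extract_lo_block(config)))  # ' inet loopback\nbla\nblub\n\n'
```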

    </F>
    Fredrik Lundh, Jul 4, 2006
    #4