regex module, or don't work as expected

Fabian Holler

Howdy,


I have the following regex: "iface lo[\w\t\n\s]+(?=(iface)|$)"

If "iface" doesn't follow after the regex "iface lo[\w\t\n\s]", the rest of
the text should be selected.
But ?=(iface) is ignored; the whole text is always selected.
What is wrong?


many thanks

greetings

Fabian
 
Marc 'BlackJack' Rintsch

The ``+`` after the character class means one or more of the characters
in the class. If you have a text like:

    iface lox iface

Then it also matches the space and the second ``iface``, because the space
(``\s``) and word characters (``\w``) are part of the character class and
``+`` is "greedy". It consumes as many characters as possible, and the
rest of the regex is only tried once the class cannot match any further.
Here that happens at the end of the text, where ``$`` succeeds.

If you want to match non-greedy then put a ``?`` after the ``+``::

    iface lo[\w\t\n\s]+?(?=(iface)|$)

Now only "iface lox " is matched in the example above.
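
To see the difference, here is a minimal sketch that runs both patterns against the example text with Python's ``re`` module. The variable names are only for illustration.

```python
import re

text = "iface lox iface"

# Greedy: the class swallows "x iface", then $ succeeds at the end of the text.
greedy = re.search(r"iface lo[\w\t\n\s]+(?=(iface)|$)", text)

# Non-greedy: the class grows one character at a time and stops as soon as
# the lookahead sees "iface" (or the end of the text).
lazy = re.search(r"iface lo[\w\t\n\s]+?(?=(iface)|$)", text)

print(repr(greedy.group(0)))  # 'iface lox iface'
print(repr(lazy.group(0)))    # 'iface lox '
```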

Ciao,
Marc 'BlackJack' Rintsch
 
Fabian Holler

Hello Marc,

thank you for your answer.
Marc said:
The ``+`` after the character class means at least one of the characters
in the class or more.

Yes, that's right, but that isn't my problem.
The problem is in the "(?=(iface)|$)" part.

For example, I have the text:

"auto lo eth0
<MATCH START>iface lo inet loopback
bla
blub

<MATCH END>iface eth0 inet dhcp
hostname debian"


My regex should match the marked text.
But it matches the whole text starting from iface.
If there is only one iface entry, the whole text starting from iface
should be matched.

greetings

Fabian
 
Fredrik Lundh

Fabian said:
Yes, that's right, but that isn't my problem.
The problem is in the "(?=(iface)|$)" part.

no, the problem is that you're thinking "procedural string matching from
left to right", but that's not how regular expressions work.

Fabian said:
For example, I have the text:

"auto lo eth0
<MATCH START>iface lo inet loopback
bla
blub

<MATCH END>iface eth0 inet dhcp
hostname debian"


My regex should match the marked text.
But it matches the whole text starting from iface.

which is perfectly valid, since a plain "+" is greedy, and you've asked
for "iface lo" followed by some text followed by *either* end of string
or another "iface". the rest of the string, all the way to the end, is a
perfectly valid match for that.

if you want a non-greedy match, use "+?" instead.
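
as a rough sketch, this is the non-greedy pattern applied to the sample text from the question (with the <MATCH START>/<MATCH END> markers removed). note that the character class only covers word characters and whitespace, so this assumes the stanza contains nothing else, such as the dots in an IP address.

```python
import re

text = (
    "auto lo eth0\n"
    "iface lo inet loopback\n"
    "bla\n"
    "blub\n"
    "\n"
    "iface eth0 inet dhcp\n"
    "hostname debian"
)

# "+?" stops at the first position where the lookahead succeeds,
# i.e. right before the next "iface" or at the end of the text.
pattern = re.compile(r"iface lo[\w\t\n\s]+?(?=(iface)|$)")

match = pattern.search(text)
print(repr(match.group(0)))

# With a single iface entry, the match runs to the end of the text.
single = pattern.search("auto lo\niface lo inet loopback\nhostname debian")
print(repr(single.group(0)))
```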

however, if you just want the text between two string literals, it's
often more efficient to just split the string twice:

text = text.split("iface lo", 1)[1].split("iface", 1)[0]
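
a small sketch of the same split wrapped in a function, run against the sample text. the function name is made up for illustration. note that the result does not include the leading "iface lo" itself, and that the [1] index raises IndexError if "iface lo" never occurs in the input.

```python
def loopback_stanza_body(config):
    """Return the text after the first "iface lo", up to the next "iface"."""
    # Everything after the first "iface lo" (IndexError if it is missing).
    after_lo = config.split("iface lo", 1)[1]
    # Cut at the next "iface"; if there is none, the whole rest is kept.
    return after_lo.split("iface", 1)[0]


sample = (
    "auto lo eth0\n"
    "iface lo inet loopback\n"
    "bla\n"
    "blub\n"
    "\n"
    "iface eth0 inet dhcp\n"
    "hostname debian"
)

body = loopback_stanza_body(sample)
print(repr(body))  # ' inet loopback\nbla\nblub\n\n'
```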

</F>
 
