python regex "negative lookahead assertions" problems

Jelle Smet

Hi List,

I'm trying to match lines in Python using the re module.
The end goal is a regex that lets me skip lines which have ok and warning in them.
But for some reason I can't get negative lookaheads to work the way they are explained in "http://docs.python.org/library/re.html".

Consider this example:

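Python 2.6.4 (r264:75706, Nov 2 2009, 14:38:03)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> line='2009-11-22 12:15:441 lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
>>> re.match('.*(?!warning)',line)
<_sre.SRE_Match object at 0xb75b1598>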

I would expect that this would NOT match as it's a negative lookahead and warning is in the string.


Thanks,
 
Helmut Jarausch

Hi List,

I'm trying to match lines in Python using the re module.
The end goal is a regex that lets me skip lines which
have ok and warning in them.
But for some reason I can't get negative lookaheads to work the way
they are explained in "http://docs.python.org/library/re.html".

Consider this example:

Python 2.6.4 (r264:75706, Nov 2 2009, 14:38:03)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> line='2009-11-22 12:15:441 lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
>>> re.match('.*(?!warning)',line)
<_sre.SRE_Match object at 0xb75b1598>

I would expect that this would NOT match as it's a negative lookahead
and warning is in the string.

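'.*' is greedy and eats the whole line. A lookahead only checks the text
immediately after the current position, and at the end of the line there
is no 'warning' left to find, so (?!warning) succeeds and the pattern matches.
What are you trying to achieve?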
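
To see this in the interpreter, here is a minimal sketch. The sample line
is a shortened stand-in for yours, and the second pattern (lookahead placed
first, with '.*' moved inside it) is just one possible way to write the
check, not the only one.

```python
import re

# Shortened stand-in for the log line from the question (hypothetical sample data).
line = "2009-11-22 12:15:44 some text warning more text"

# Greedy '.*' runs to the end of the line; nothing follows there,
# so (?!warning) succeeds and the whole line is matched.
m = re.match(r".*(?!warning)", line)
consumed_whole_line = m is not None and m.end() == len(line)

# Put the lookahead first and let it scan the rest of the line itself.
no_warning = re.compile(r"(?!.*warning)")
rejected = no_warning.match(line) is None
accepted = no_warning.match("2009-11-22 12:15:45 all quiet") is not None

print(consumed_whole_line)
print(rejected)
print(accepted)
```

All three values come out True: the original pattern swallows the whole
line, while the lookahead-first pattern rejects the line containing
'warning' and accepts the one without it.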

If you just want to single out lines with 'ok' or 'warning' in them, why not
just
if re.search('(ok|warning)', line): call_skip

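You probably don't want words like 'joke' to match 'ok'.
So a better regex uses word boundaries. Note the raw string: without the
r prefix, '\b' is a backspace character, not a word boundary.

if re.search(r'\b(ok|warning)\b', line): SKIP_ME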
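
Applied to a list of lines, the word-boundary version could be used like
this. It is a sketch only: the sample lines and the names skip and kept are
made up for illustration.

```python
import re

# Word boundaries keep 'ok' from matching inside words such as 'joke'.
skip = re.compile(r"\b(ok|warning)\b")

# Hypothetical sample lines, for illustration only.
lines = [
    "disk check ok",
    "that was a joke",
    "warning: disk almost full",
    "error: disk failure",
]

# Keep only the lines that mention neither 'ok' nor 'warning' as a word.
kept = [entry for entry in lines if not skip.search(entry)]
print(kept)
```

Only the 'joke' and 'error' lines survive the filter.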

Helmut.



--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
 
