Regex - where do I make a mistake?

Johny · Feb 16, 2007

I have
string="""55.
<td>128
170
"""

where I need to replace
55.
170

by space.
So I tried

#############
import re
string="""<td>55.128170
"""
Newstring=re.sub(r'.*'," ",string)
###########

But it does NOT work.
Can anyone explain why?
Thank you
L.

Peter Otten · Feb 16, 2007

Johny said:
I have
string="""55.
<td>128
170
"""

where I need to replace
55.
170

by space.
So I tried

#############
import re
string="""<td>55.128170
"""
Newstring=re.sub(r'.*'," ",string)
###########

But it does NOT work.
Can anyone explain why?

"(?!123)" is a negative "lookahead assertion", i. e. it ensures that "test"
is not followed by "123", but /doesn't/ consume any characters. For your
regex to match, "test" must be /immediately/ followed by a '"'.

Regular expressions are too low-level to use on HTML directly. Go with
BeautifulSoup instead of trying to fix the above.
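
As a rough illustration of the "use a parser" advice, here is a minimal sketch. It uses the standard library's html.parser rather than BeautifulSoup, purely so that it runs without installing anything. The class name CellText and the sample HTML are made up for this example and are not taken from the original post.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text found inside <td> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # Start a new, empty entry whenever a table cell opens.
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self.in_cell:
            self.cells[-1] += data


# Made-up sample markup, only for illustration.
html = '<table><tr><td class="a">test456</td><td>test123</td></tr></table>'
parser = CellText()
parser.feed(html)
parser.close()
print(parser.cells)  # ['test456', 'test123']
```

Once the cell text is extracted this way, a plain string test or a small regex can be applied to each cell without having to worry about the surrounding markup.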

Peter

Johny · Feb 16, 2007

"(?!123)" is a negative "lookahead assertion", i. e. it ensures that "test"
is not followed by "123", but /doesn't/ consume any characters. For your
regex to match, "test" must be /immediately/ followed by a '"'.

Regular expressions are too low-level to use on HTML directly. Go with
BeautifulSoup instead of trying to fix the above.

Peter

Yes, I know "(?!123)" is a negative "lookahead assertion",
but I do not know exactly why it does not work. I thought that

(?!...)
Matches if ... doesn't match next. For example, Isaac (?!Asimov) will
match 'Isaac ' only if it's not followed by 'Asimov'.

Peter Otten · Feb 16, 2007

Johny said:
Yes, I know "(?!123)" is a negative "lookahead assertion",
but I do not know exactly why it does not work. I thought that

(?!...)
Matches if ... doesn't match next. For example, Isaac (?!Asimov) will
match 'Isaac ' only if it's not followed by 'Asimov'.

The problem is that your regex does not end with the lookahead assertion and
there is nothing to consume the '456' or '789'. To illustrate:

>>> for example in ["before123after", "before234after", "beforeafter"]:
...     re.findall("before(?!123)after", example)
...
[]
[]
['beforeafter']

>>> for example in ["before123after", "before234after", "beforeafter"]:
...     re.findall(r"before(?!123)\d\d\dafter", example)
...
[]
['before234after']
[]
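
The same idea can be applied to the "test" case. The sketch below is only a guess at the shape of the original regex: the thread tells us no more than that it contains "test", (?!123) and a closing '"', so both patterns and the sample strings are assumptions. It also uses \d* instead of \d\d\d so that any number of digits, including none, is consumed.

```python
import re

# Hypothetical patterns, reconstructed from the discussion, not from the post.
broken = re.compile(r'test(?!123)"')     # nothing consumes the digits
fixed = re.compile(r'test(?!123)\d*"')   # \d* consumes whatever digits follow

samples = ['test123"', 'test456"', 'test789"', 'test"']

broken_hits = [s for s in samples if broken.search(s)]
fixed_hits = [s for s in samples if fixed.search(s)]

print(broken_hits)  # ['test"']
print(fixed_hits)   # ['test456"', 'test789"', 'test"']
```

Note that the lookahead is only checked right after "test", so a string such as 'test1234"' is still rejected because it starts with 123.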

Peter

Carsten Haese · Feb 16, 2007

Yes, I know "(?!123)" is a negative "lookahead assertion",
but I do not know exactly why it does not work.

It *does* work, it just doesn't do what you think it does.

The lookahead assertion is a zero-width match that doesn't match any
actual characters from the subject. It matches an imaginary vertical
line between two consecutive characters of the subject.
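
One way to see the zero-width behaviour is to look at the span of a match. This is a small sketch with made-up subject strings; the point is only that the match ends where the lookahead starts, so the digits are never part of it.

```python
import re

subject = 'test456"'

# The lookahead succeeds after "test", but the match stops at index 4:
# the digits that were inspected are not consumed.
m = re.search(r'test(?!123)', subject)
print(m.group(), m.span())  # test (0, 4)

# A bare lookahead matches an empty string, i.e. a position between characters.
between = re.search(r'(?!123)', '456')
print(repr(between.group()), between.span())  # '' (0, 0)

# In '123' the first position is rejected, so the search moves on to the
# next position, where "23" follows instead of "123".
shifted = re.search(r'(?!123)', '123')
print(shifted.span())  # (1, 1)
```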

Nothing in your pattern matches the string of digits that follows
"test", hence the subject fails to match the pattern.

Also, please note Peter's advice that Regular Expressions are almost
always the wrong tool for working with HTML. It may work in very limited
cases, and maybe you have such a limited case, but you'd better make
sure that you'll never ever have to handle anything beyond this limited
case.

-Carsten
