Regex not matching a string

python.prog29

Hi All -


In the following code, I am trying to remove a multi-line comment that contains "This is a test comment". For some reason the regex is not matching. Can anyone explain why?

import os
import sys
import re
import fnmatch

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + r'.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                # put all the text into f and read and replace...
                f = open(fullname).read()
                result = find_and_remove(f, r"This is a test comment")
                print(result)
 
Steven D'Aprano

It works for me.

Some observations:

Perhaps you should consider using the glob module rather than manually
using fnmatch. That's what glob does.
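
As a rough illustration of the glob suggestion, here is a minimal sketch assuming Python 3.5 or later, where glob accepts recursive "**" patterns. The helper name find_sources is hypothetical and not from the original code; the pattern list is the one from the question.

```python
import glob
import os

def find_sources(root, patterns=("*.cpp", "*.c", "*.h", "*.txt")):
    """Return sorted file paths under root whose names match any pattern."""
    found = set()
    for pat in patterns:
        # With recursive=True, "**" matches zero or more nested directories.
        for path in glob.glob(os.path.join(root, "**", pat), recursive=True):
            if os.path.isfile(path):  # skip directories that happen to match
                found.add(path)
    return sorted(found)
```

Note that glob skips names starting with a dot by default, so hidden files and directories are not visited the way they are with os.walk.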

Also, you never actually write to the files, is that deliberate?
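
If writing the result back is the intent, one possible shape is sketched below. The helper rewrite_file is a hypothetical name introduced for illustration; it takes any text-to-text function, so it does not depend on which regex ends up being used.

```python
def rewrite_file(path, transform):
    """Apply transform (text -> text) to a file; write back only if it changed.

    Returns True when the file was modified.
    """
    with open(path) as f:
        original = f.read()
    updated = transform(original)
    if updated == original:
        return False  # nothing matched, leave the file untouched
    with open(path, "w") as f:
        f.write(updated)
    return True
```

With the function from the question this could be called as rewrite_file(fullname, lambda text: find_and_remove(text, "This is a test comment")). Trying it on a copy of the source tree first seems prudent, since the file is overwritten in place.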

Finally, perhaps your regex simply doesn't match what you think it
matches. Do you actually have any files containing the needle

"/* ... This is a test comment ... */"

(where the ... are any characters) exactly as shown?

Instead of giving us all the irrelevant code that has nothing to do with
matching a regex, you should come up with a simple piece of example code
that demonstrates your problem. Or, in this case, *fails* to demonstrate
the problem.

import re
haystack = "aaa\naaa /*xxxThis is a test comment \nxxx*/aaa\naaa\n"
needle = "This is a test comment"
pattern = re.compile(r'/\*.*?'+ needle + r'.*?\*/', re.DOTALL)
print(haystack)
print(re.sub(pattern, "", haystack))
 
Peter Otten

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

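If a comment does not contain the needle, "/\*.*?" extends past the end of
that comment and keeps going until it finds the needle in a later one, so a
match can start in an unrelated comment:
'/* yyy */ /* xxx'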
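
The overrun can be reproduced with a short script. The sample text and the needle "xxx" below are assumptions chosen to give the match shown above; re.escape is added here as a precaution and is not in the original code.

```python
import re

needle = "xxx"                    # assumed needle for the demonstration
text = "/* yyy */ /* xxx */"      # assumed input: only the second comment has it

# Opening part of the question's pattern: it starts at the first "/*" and
# expands lazily until the needle appears, even if that is in a later comment.
prefix = re.compile(r"/\*.*?" + re.escape(needle), re.DOTALL)
print(repr(prefix.search(text).group()))   # '/* yyy */ /* xxx'

# The full pattern therefore removes the unrelated first comment as well.
full = re.compile(r"/\*.*?" + re.escape(needle) + r".*?\*/", re.DOTALL)
print(repr(full.sub("", text)))            # ''
```

The lazy quantifier only makes the match as short as possible from its starting point; it does not stop the match from crossing a "*/".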


One solution may be a substitution function:
...     s = match.group()
...     if needle in s:
...         return ""
...     else:
...         return s
...
'/* yyy */ '
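
A self-contained version of the substitution-function idea might look like the sketch below. The names COMMENT and remove_comments_containing are hypothetical, and the sample input is an assumption chosen to reproduce the result shown above. The regex matches one comment at a time, and the callback decides whether to drop it.

```python
import re

# Matches a single C-style comment; DOTALL lets it span several lines.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def remove_comments_containing(haystack, needle):
    """Remove only those /* ... */ comments whose text contains needle."""
    def replace(match):
        s = match.group()
        if needle in s:
            return ""   # drop this comment
        else:
            return s    # keep every other comment unchanged
    return COMMENT.sub(replace, haystack)

print(repr(remove_comments_containing("/* yyy */ /* xxx */", "xxx")))
# '/* yyy */ '
```

Because the needle is tested with a plain "in" check rather than being spliced into the pattern, it needs no escaping. The sketch does not try to recognise comment-like text inside string literals.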
 
