Sorry for being too brief!
I was talking about a function that 'counts' the number
of occurrences using a plain string search versus a regexp.
I wrote the code for the regexp search as well as the string
search and tested it on a rather large file (800 KB) for
occurrences of a certain word. I find that the string search
is at least 2 times faster than the regexp search, excluding
the time for re.compile(). This is particularly
noticeable when the file is large and the word is
spread out.
I had thought the regexp would beat the string search hands down, and I
am surprised that the result is the other way around.
Here is the code. Note that I am using the 'count' methods that
count the number of occurrences rather than the 'find' methods.
# Test to find out whether string search in a data
# buffer is faster than regexp search.
# Results: String search is much faster when it comes
# to many occurrences of the sub string.
import re
import time

def strsearch1(s, substr):
    t1 = time.time()
    print 'Count 1 =>', s.count(substr)
    t2 = time.time()
    print 'Searching using string, Time taken => ', t2 - t1

def strsearch2(s, substr):
    # Note: IGNORECASE makes this case-insensitive, unlike
    # s.count() above, so the regexp is doing slightly more work.
    r = re.compile(substr, re.IGNORECASE)
    t1 = time.time()
    print 'Count 2 =>', len(r.findall(s))
    t2 = time.time()
    print 'Searching using regexp, Time taken => ', t2 - t1

data = open("test.html", "r").read()
strsearch1(data, "Miriam")
strsearch2(data, "Miriam")
# Output here...
D:\Programming\python>python strsearch.py
Count 1 => 45
Searching using string, Time taken => 0.0599999427795
Count 2 => 45
Searching using regexp, Time taken => 0.110000014305
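For anyone who wants to repeat the comparison, here is a sketch of the same benchmark in modern Python 3 using the timeit module, which repeats each call and smooths out one-shot timer noise. The data here is synthetic stand-in text (not the original test.html), so the absolute numbers will differ from mine:

```python
import re
import timeit

# Synthetic stand-in for the 800 KB test.html (hypothetical data).
data = ("Miriam went to the market. " * 50 + "Nothing here. " * 200) * 20
pattern = re.compile("Miriam")  # compiled once, outside the timed region

# Sanity check: both approaches must agree before timing them.
assert data.count("Miriam") == len(pattern.findall(data))

t_str = timeit.timeit(lambda: data.count("Miriam"), number=100)
t_re = timeit.timeit(lambda: len(pattern.findall(data)), number=100)

print("str.count :", t_str)
print("re.findall:", t_re)
```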
Test was done on a Windows 98 machine with Python 2.3,
248 MB RAM, and a 1.7 GHz Intel processor.
I was thinking of using regexp searches in my code, but this convinces
me to stick to the good old string search.
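One caveat before generalising from the timings above: my regexp is compiled with re.IGNORECASE, while s.count() is case-sensitive, so the two are not searching for exactly the same thing. A small Python 3 sketch with hypothetical sample text shows how the counts can diverge:

```python
import re

text = "Spam spam SPAM"  # hypothetical sample text

# Case-sensitive: matches only the exact lowercase "spam".
print(text.count("spam"))                            # 1
# Case-insensitive regexp: matches all three variants.
print(len(re.findall("spam", text, re.IGNORECASE)))  # 3
```

My test word happened to give the same count (45) both ways, so the comparison still holds for that file, but with other data the regexp is counting a larger set of matches.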
Thanks for the replies.
-Anand