Can I beat perl at grep-like processing speed?

js · Dec 29, 2006

Just out of curiosity:
Can Python beat Perl at grep-like processing speed?

$ wget http://www.gutenberg.org/files/7999/7999-h.zip
$ unzip 7999-h.zip
$ cd 7999-h
$ cat *.htm > bigfile
$ du -h bigfile
8.2M bigfile

---------- grep.pl ----------
#!/usr/local/bin/perl
open(F, 'bigfile') or die;

while(<F>) {
    s/[\n\r]+$//;
    print "$_\n" if m/destroy/oi;
}
---------- END ----------
---------- grep.py ----------
#!/usr/bin/env python
import re
r = re.compile(r'destroy', re.IGNORECASE)

for s in file('bigfile'):
    if r.search(s): print s.rstrip("\r\n")
---------- END ----------

$ time perl grep.pl > pl.out; time python grep.py > py.out
real 0m0.168s
user 0m0.149s
sys 0m0.015s

real 0m0.450s
user 0m0.374s
sys 0m0.068s
# I used Python 2.5 and Perl 5.8.6.
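For anyone trying this on Python 3, where `file()` and the print statement no longer exist, grep.py might be ported roughly as below. This is a sketch, not js's code: the `grep` function name, the command-line handling and the in-memory sample are illustrative assumptions, and the latin-1 encoding is assumed only because it can decode any byte, since the thread doesn't say what the HTML files use.

```python
import io
import re
import sys


def grep(lines, pattern=r"destroy"):
    """Yield each line matching pattern case-insensitively, with
    trailing CR/LF characters stripped, as grep.py does."""
    search = re.compile(pattern, re.IGNORECASE).search  # bind the method once
    for line in lines:
        if search(line):
            yield line.rstrip("\r\n")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Assumption: latin-1 decodes any byte sequence, so this never raises
        # UnicodeDecodeError, whatever encoding the files really use.
        with open(sys.argv[1], encoding="latin-1") as f:
            for match in grep(f):
                print(match)
    else:
        # Hypothetical sample input standing in for 'bigfile'.
        sample = io.StringIO("Nothing here\r\nThey will DESTROY it\r\nDestroyer\n")
        for match in grep(sample):
            print(match)
```

Saved under a hypothetical name such as grep3.py, it would be run as `python3 grep3.py bigfile > py.out`.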

Christophe Cavalaria · Dec 29, 2006

js said:
Just out of curiosity:
Can Python beat Perl at grep-like processing speed?

I'm thankful for the Python version; otherwise I'd never have guessed what
that code was supposed to do!

Try this:
---------- grep.py ----------
#!/usr/bin/env python
import re
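def main():
    # Inside a function, 'search' and 's' are fast local variables instead
    # of module globals, and the bound method is looked up only once.
    search = re.compile(r'destroy', re.IGNORECASE).search

    for s in file('bigfile'):
        if search(s): print s.rstrip("\r\n")

main()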
---------- END ----------
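Two things change in this version: in CPython, names inside a function are local variables fetched by index, while module-level names are looked up in a dictionary, and the bound method `r.search` is fetched once instead of on every line. The Python 3 sketch below isolates the second effect only (both loops run inside functions, so `s` is local in each); the synthetic lines and function names are hypothetical stand-ins, not measurements of bigfile.

```python
import re
import timeit

# Hypothetical data: 10,000 short lines, every 100th of which matches.
LINES = ["line %d destroy\n" % i if i % 100 == 0 else "line %d\n" % i
         for i in range(10000)]
r = re.compile(r"destroy", re.IGNORECASE)


def count_attr_lookup():
    """Looks up the global 'r' and its 'search' attribute on every line."""
    n = 0
    for s in LINES:
        if r.search(s):
            n += 1
    return n


def count_bound_local():
    """Binds the method to a local name once, as main() above does."""
    search = r.search
    n = 0
    for s in LINES:
        if search(s):
            n += 1
    return n


if __name__ == "__main__":
    for fn in (count_attr_lookup, count_bound_local):
        # Best of 5 repeats of 10 calls; absolute numbers depend on the machine.
        best = min(timeit.repeat(fn, number=10, repeat=5))
        print("%-18s %d matches, %.4f s" % (fn.__name__, fn(), best))
```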

Nick Craig-Wood · Dec 30, 2006

js said:
Just out of curiosity:
Can Python beat Perl at grep-like processing speed?

Playing for the other side for a moment: this Perl one-liner is nearly twice as fast...

$ time perl -lne 'print if m/destroy/oi' bigfile >pl.out
real 0m0.133s
user 0m0.120s
sys 0m0.012s

vs

$ time ./z.pl >pl.out.orig
real 0m0.223s
user 0m0.208s
sys 0m0.016s

Both give the same output, modulo a few \r characters.
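The stray \r characters presumably come from different line-ending handling. According to perlrun, -l makes -n chomp only the input record separator ($/, a newline by default), so a DOS-style line keeps its \r, whereas grep.pl's s/[\n\r]+$// strips every trailing \r and \n. A small Python 3 sketch of the two behaviours (both function names are hypothetical):

```python
def chomp(line):
    """Remove one trailing newline, as chomp does with the default $/."""
    return line[:-1] if line.endswith("\n") else line


def strip_eol(line):
    """Remove all trailing CR and LF characters, like grep.pl's substitution."""
    return line.rstrip("\r\n")


if __name__ == "__main__":
    for line in ["unix line\n", "dos line\r\n"]:
        print(repr(chomp(line)), repr(strip_eol(line)))
    # 'unix line' 'unix line'
    # 'dos line\r' 'dos line'
```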

Bruno Desthuilliers · Jan 2, 2007

js wrote:

Just out of curiosity:
Can Python beat Perl at grep-like processing speed?

Probably not.

Please note that you're also benchmarking I/O here, and Perl seems to
use its own highly optimized I/O library, which is much faster than the
system's. I once did a quick-and-dirty cat-like comparison of Perl, Python
and C on my Gentoo Linux box, and the Perl version was dramatically faster
than the C one.

Now the real question, IMHO, is: is the Python version fast enough?

My 2 cents.
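One way to check the I/O point on the Python side is to time a loop that only reads the lines, then the same loop with the regex search added: the first number is the cost of reading, decoding and splitting lines, which no regex tuning will remove, and the difference roughly approximates the matching cost. A Python 3 sketch, assuming a text file decoded as latin-1 (both function names are hypothetical); it does not reproduce the Perl/C comparison above.

```python
import re
import time


def time_read_only(path):
    """Read, decode and split the file into lines, doing nothing else with them."""
    start = time.perf_counter()
    with open(path, encoding="latin-1") as f:  # latin-1 is an assumption
        n = sum(1 for _ in f)
    return n, time.perf_counter() - start


def time_read_and_match(path, pattern=r"destroy"):
    """The same loop plus a case-insensitive regex search on every line."""
    search = re.compile(pattern, re.IGNORECASE).search
    start = time.perf_counter()
    with open(path, encoding="latin-1") as f:
        n = sum(1 for line in f if search(line))
    return n, time.perf_counter() - start
```

Each function returns a (count, seconds) pair, e.g. `time_read_only("bigfile")` versus `time_read_and_match("bigfile")`. Running each a couple of times is advisable, since the OS file cache makes later runs faster than the first.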

Fredrik Lundh · Jan 3, 2007

Footnote: if you're searching for literal strings with Python 2.5, using the "in" operator is a
lot faster than using re.search.

</F>
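For the case-insensitive search in this thread, "in" needs each line lowercased first, which builds an extra string per line, so the gain may be smaller than for an exact-case literal. A Python 3 sketch that checks both approaches agree and times them with timeit; the data and function names are hypothetical. For plain ASCII text like this the two match the same lines, though re.IGNORECASE and str.lower() can disagree on some non-ASCII characters.

```python
import re
import timeit

# Hypothetical data: 10,000 lines, every 100th containing "Destroy".
LINES = ["line %d, Destroy it\n" % i if i % 100 == 0 else "line %d\n" % i
         for i in range(10000)]


def grep_re(lines, word="destroy"):
    """Case-insensitive search with a compiled regex, as in grep.py."""
    search = re.compile(re.escape(word), re.IGNORECASE).search
    return [line for line in lines if search(line)]


def grep_in(lines, word="destroy"):
    """Case-insensitive literal search: lowercase each line, then use 'in'."""
    word = word.lower()
    return [line for line in lines if word in line.lower()]


if __name__ == "__main__":
    assert grep_re(LINES) == grep_in(LINES)
    for fn in (grep_re, grep_in):
        # Best of 5 repeats of 20 calls; results vary by machine and version.
        best = min(timeit.repeat(lambda: fn(LINES), number=20, repeat=5))
        print("%-8s %.4f s" % (fn.__name__, best))
```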

perl and sendmail speed problem	8	Feb 8, 2010
speed problems	15	Jun 3, 2004
toy list processing problem: collect similar terms	43	Sep 26, 2010
processing input from multiple files	2	Oct 14, 2010
Should I write my own Grep or Index function ?	3	Jul 7, 2005
size of block device by ftell()	2	Nov 20, 2007
Strange speed-increase by separating "if"s	4	Jul 21, 2005
Question regarding lists and regex	2	Nov 9, 2006
