Can I beat Perl at grep-like processing speed?


js

Just out of curiosity: can Python beat Perl at grep-like processing speed?


$ wget http://www.gutenberg.org/files/7999/7999-h.zip
$ unzip 7999-h.zip
$ cd 7999-h
$ cat *.htm > bigfile
$ du -h bigfile
8.2M bigfile

---------- grep.pl ----------
#!/usr/local/bin/perl
open(F, 'bigfile') or die;

while(<F>) {
    s/[\n\r]+$//;                   # strip trailing CR/LF characters
    print "$_\n" if m/destroy/oi;   # /i: case-insensitive match
}
---------- END ----------
---------- grep.py ----------
#!/usr/bin/env python
import re
r = re.compile(r'destroy', re.IGNORECASE)

for s in file('bigfile'):
    if r.search(s):
        print s.rstrip("\r\n")
---------- END ----------

$ time perl grep.pl > pl.out; time python grep.py > py.out
real 0m0.168s
user 0m0.149s
sys 0m0.015s

real 0m0.450s
user 0m0.374s
sys 0m0.068s
# I used python2.5 and perl 5.8.6
 
Christophe Cavalaria

I'm thankful for the Python version; otherwise I'd never have guessed what
that code was supposed to do!

Try this:
---------- grep.py ----------
#!/usr/bin/env python
import re
def main():
    # bind the method to a local name once, outside the loop
    search = re.compile(r'destroy', re.IGNORECASE).search

    for s in file('bigfile'):
        if search(s):
            print s.rstrip("\r\n")

main()
---------- END ----------
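Why this helps: in CPython, names inside a function are resolved as fast locals, while the module-level loop in the original looks up `r` and `s` as globals (dictionary lookups) on every line; binding `.search` once also saves an attribute lookup per line. The code above is Python 2 (`file()` and the `print` statement are gone in Python 3), so here is a minimal Python 3 sketch of the same idea. The function name `grep`, its parameters, and the latin-1 encoding are assumptions, not part of the original post.

```python
import re


def grep(path, pattern="destroy"):
    """Print the lines of `path` that match `pattern`, ignoring case."""
    # Bind the compiled pattern's search method to a local name once,
    # so the loop does a fast local lookup instead of global + attribute.
    search = re.compile(pattern, re.IGNORECASE).search
    # latin-1 is an assumption: it maps every byte to a character,
    # so arbitrary HTML input never raises UnicodeDecodeError.
    with open(path, encoding="latin-1") as f:
        for line in f:
            if search(line):
                print(line.rstrip("\r\n"))
```

Call it as `grep("bigfile")` from a script or the interactive interpreter.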
 
Nick Craig-Wood

Playing for the other side temporarily: this Perl one-liner is nearly twice as fast...

$ time perl -lne 'print if m/destroy/oi' bigfile >pl.out
real 0m0.133s
user 0m0.120s
sys 0m0.012s

vs

$ time ./z.pl >pl.out.orig
real 0m0.223s
user 0m0.208s
sys 0m0.016s

This gives the same output, modulo a few \r characters.
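Those `\r` differences come from what each version strips. With `-l`, Perl chomps only the input record separator (`\n` by default), so a CRLF line keeps its trailing `\r`, whereas grep.pl's `s/[\n\r]+$//` removes every trailing `\r` and `\n`. A minimal Python 3 sketch reproducing both behaviours; `grep_lines` and its `chars` parameter are hypothetical names, and latin-1 is an assumed encoding. Opening the file with `newline=""` keeps Python from translating `\r\n` to `\n` on its own, which would otherwise hide the difference.

```python
import re


def grep_lines(path, pattern, chars="\r\n"):
    """Return the lines of `path` matching `pattern`, ignoring case.

    Each matching line is right-stripped of the characters in `chars`.
    """
    search = re.compile(pattern, re.IGNORECASE).search
    # newline="" disables newline translation, so a CRLF line is read
    # back with its '\r' intact, as Perl sees it.
    with open(path, encoding="latin-1", newline="") as f:
        return [line.rstrip(chars) for line in f if search(line)]


# grep_lines("bigfile", "destroy")             strips like grep.pl
# grep_lines("bigfile", "destroy", chars="\n") strips like perl -l,
#                                              leaving a trailing '\r'
```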
 
Bruno Desthuilliers

js wrote:
Just out of curiosity: can Python beat Perl at grep-like processing speed?

Probably not.

Please note that you're also benchmarking I/O here, and Perl seems to
use its own highly optimized I/O library, which is much faster than the
system's. I once made a quick-and-dirty cat-like comparison of Perl,
Python and C on my Gentoo Linux box, and the Perl version was insanely
faster than the C one.
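One way to see how much of the Python timing is I/O is to read the file into memory first and time the search separately. A minimal Python 3 sketch using `time.perf_counter`; `time_phases` is a hypothetical helper name, latin-1 is an assumed encoding, and output cost (the `print` in the original scripts) is left out entirely.

```python
import re
import time


def time_phases(path, pattern="destroy"):
    """Return (read_seconds, search_seconds, match_count) for `path`."""
    search = re.compile(pattern, re.IGNORECASE).search

    t0 = time.perf_counter()
    with open(path, encoding="latin-1") as f:
        lines = f.readlines()  # phase 1: reading and decoding only
    t1 = time.perf_counter()
    count = sum(1 for line in lines if search(line))  # phase 2: regex only
    t2 = time.perf_counter()

    return t1 - t0, t2 - t1, count
```

Run it a few times so the OS file cache is warm; otherwise the first read also measures disk access.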

Now the real question, IMHO, is: is the Python version fast enough?

My 2 cents.
 
Fredrik Lundh

Footnote: if you're searching for literal strings with Python 2.5, using "in" is a
lot faster than using re.search.

</F>
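Applied to this benchmark, note that the original search is case-insensitive, so a plain `in` test needs the line lowercased first. The extra `lower()` call has its own cost, so whether this still beats `re.search` on a given file is worth measuring rather than assuming. A minimal Python 3 sketch; `grep_literal` is an illustrative name and latin-1 an assumed encoding.

```python
def grep_literal(path, needle="destroy"):
    """Return the lines of `path` containing `needle`, ignoring case.

    Uses the `in` operator instead of a regex. Both sides are lowercased,
    which matches re.IGNORECASE behaviour for ASCII text.
    """
    needle = needle.lower()
    with open(path, encoding="latin-1") as f:
        return [line.rstrip("\r\n") for line in f if needle in line.lower()]
```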