Huge performance gain compared to Perl while loading a text file in a list ...!?

Marc H.

Hello,

I recently converted one of my Perl scripts to Python. The script
simply searches a lot of big mail files (~40MB each) to retrieve
specific emails. I converted it line by line to Python, keeping the
algorithms and functions as they were in Perl (no optimization). The
purpose was mainly to learn Python and see how it differs from Perl.

Now, once the converted script was finished, I was amazed to find that
the Python version runs about 8 times faster. Needless to say, I was
very intrigued and wanted to know what causes such a performance gap
between the two versions. To keep my story short, after some research
and a few tests, I found that file IO is the main cause of the
performance difference.

I made two short test scripts, one in Perl and one in Python (see
below), and compared their run times. As the results show, the bigger
the file, the larger the difference in performance: roughly 3.5x on
the 10MB file and roughly 8x on the 40MB file.

I'm fairly new to Python and don't know much about its inner workings,
so I wonder if someone could explain to me why it is so much faster in
Python to open a file and load it into a list/array?

Thanks


-----
#!/usr/bin/python

# Load the whole file into a list of lines, 20 times over
for i in range(20):
    Data = open('data.test').readlines()

-----
#!/usr/bin/perl

for ($i = 0; $i < 20; $i++) {
    open(DATA, "data.test");
    @Data = <DATA>;
    close(DATA);
}

-----
Running tests (data.test = 10MB text file):

blop@moya blop $ time ./ftest.py
real 0m6.408s
user 0m4.552s
sys 0m1.826s

blop@moya blop $ time ./ftest.pl
real 0m22.855s
user 0m21.946s
sys 0m0.822s

-----
Running tests (data.test = 40MB text file):

blop@moya blop $ time ./ftest.py
real 0m26.235s
user 0m18.238s
sys 0m7.872s

blop@moya blop $ time ./ftest.pl
real 3m26.741s
user 3m22.168s
sys 0m3.764s
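
-----
For anyone who wants to repeat the Python side of the test without a
prepared data.test, here is a minimal sketch. It assumes Python 3, and
the helper names (time_readlines, make_sample_file) and the generated
sample file are made up for illustration; they are not part of the
original test. It measures only the readlines() loop, so its numbers
are not directly comparable to the shell's `time` output above, which
also includes interpreter startup.

```python
import os
import tempfile
import time


def time_readlines(path, repeats=20):
    """Return (line_count, elapsed_seconds) for loading `path` `repeats` times."""
    lines = []
    start = time.perf_counter()
    for _ in range(repeats):
        # Same operation as the test script, but the file is closed explicitly.
        with open(path) as handle:
            lines = handle.readlines()
    elapsed = time.perf_counter() - start
    return len(lines), elapsed


def make_sample_file(line_count=100_000):
    """Write a throwaway text file (hypothetical stand-in for data.test)."""
    fd, path = tempfile.mkstemp(suffix=".test")
    with os.fdopen(fd, "w") as handle:
        for n in range(line_count):
            handle.write(f"line {n}: some mail-like text\n")
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        count, seconds = time_readlines(sample)
        print(f"{count} lines, 20 loads in {seconds:.3f}s")
    finally:
        os.remove(sample)
```

Raising line_count gives a bigger file, which is one way to check
whether the load time grows in proportion to the file size.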
 
Paul Rubin

Marc H. said:
> I'm fairly new to Python and don't know much about its inner workings,
> so I wonder if someone could explain to me why it is so much faster in
> Python to open a file and load it into a list/array?

My guess is readlines() in Python is separating on newlines while Perl
is doing a regexp scan for the RS string (I forget what it's called in
Perl).
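
To show what "separating on newlines" means in practice, here is a
rough Python 3 sketch. The function split_on_newlines is a made-up
illustration, not how either interpreter is actually implemented; it
only shows that scanning for one fixed character is enough to
reproduce what readlines() returns for "\n"-terminated text.

```python
import os
import tempfile


def split_on_newlines(text):
    """Split text on a literal newline character, keeping each terminator."""
    lines = []
    start = 0
    while True:
        end = text.find("\n", start)  # plain fixed-character search, no regexp
        if end == -1:
            break
        lines.append(text[start:end + 1])
        start = end + 1
    if start < len(text):
        lines.append(text[start:])  # final line without a trailing newline
    return lines


if __name__ == "__main__":
    sample = "From: a\nSubject: b\n\nbody without newline"
    fd, path = tempfile.mkstemp()
    try:
        # newline="\n" turns off newline translation on write and read.
        with os.fdopen(fd, "w", newline="\n") as handle:
            handle.write(sample)
        with open(path, newline="\n") as handle:
            from_file = handle.readlines()
    finally:
        os.remove(path)
    # Compare the hand-rolled split with what readlines() produced.
    print(split_on_newlines(sample) == from_file)
```

Whether this accounts for the timings above is still only a guess;
profiling both scripts would be the way to confirm it.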
 
