Big speed boost in os.walk in Python 2.5


looping

Hi,
I noticed a big speed improvement in some of my scripts that use os.walk,
and I wrote a small script to check it:
import os
for path, dirs, files in os.walk('D:\\FILES\\'):
    pass

Results on Windows XP after a few runs to fill the disk cache (with
~59000 files and ~3500 folders):
Python 2.4.3 : 45s
Python 2.5 : 10s
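
For anyone who wants to repeat the comparison, here is a minimal timing sketch. It is not the script used for the numbers above; the function name time_walk is hypothetical, and the counters are only there to confirm that both interpreters walked the same tree.

```python
import os
import time


def time_walk(root):
    """Walk `root` once and return (elapsed_seconds, n_dirs, n_files)."""
    n_dirs = 0
    n_files = 0
    start = time.time()
    for path, dirs, files in os.walk(root):
        # Count entries so runs on different interpreters can be compared.
        n_dirs += len(dirs)
        n_files += len(files)
    elapsed = time.time() - start
    return elapsed, n_dirs, n_files

# Example use (the path is an assumption, substitute your own):
#   elapsed, n_dirs, n_files = time_walk('D:\\FILES\\')
#   print("%d dirs, %d files in %.2fs" % (n_dirs, n_files, elapsed))
```

Run it a few times before recording a figure, so the disk cache is warm as in the results above.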

Very nice, but somewhat strange...
Is os.walk in Python 2.4.3 buggy?
Are these results only valid on Windows, or do *nix systems show the same
difference?
The profiler shows that most of the time is spent in ntpath.isdir, and this
function is *a lot* faster in Python 2.5.
Maybe this improvement could be backported to the Python 2.4 branch for the
next release?


Python 2.4.3:
604295 function calls (587634 primitive calls) in 48.629 CPU seconds

Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     62554    0.264    0.000    0.264    0.000  :0(append)
         1    0.001    0.001   48.593   48.593  :0(execfile)
     66074    0.197    0.000    0.197    0.000  :0(len)
      3521    5.219    0.001    5.219    0.001  :0(listdir)
         1    0.036    0.036    0.036    0.036  :0(setprofile)
     62554   38.812    0.001   38.812    0.001  :0(stat)
         1    0.000    0.000   48.593   48.593  <string>:1(?)
     66074    0.218    0.000    0.218    0.000  ntpath.py:116(splitdrive)
      3520    0.009    0.000    0.009    0.000  ntpath.py:246(islink)
     62554    0.767    0.000   40.137    0.001  ntpath.py:268(isdir)
     66074    0.433    0.000    0.650    0.000  ntpath.py:51(isabs)
     66074    0.880    0.000    1.726    0.000  ntpath.py:59(join)
20183/3522    1.217    0.000   48.573    0.014  os.py:211(walk)
         1    0.000    0.000   48.629   48.629  profile:0(execfile('test.py'))
         0    0.000             0.000           profile:0(profiler)
     62554    0.174    0.000    0.174    0.000  stat.py:29(S_IFMT)
     62554    0.385    0.000    0.559    0.000  stat.py:45(S_ISDIR)
         1    0.019    0.019   48.592   48.592  test.py:1(?)


Python 2.5:
604295 function calls (587634 primitive calls) in 17.386 CPU seconds

Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     62554    0.247    0.000    0.247    0.000  :0(append)
         1    0.001    0.001   17.315   17.315  :0(execfile)
     66074    0.168    0.000    0.168    0.000  :0(len)
      3521    5.287    0.002    5.287    0.002  :0(listdir)
         1    0.071    0.071    0.071    0.071  :0(setprofile)
     62554    7.812    0.000    7.812    0.000  :0(stat)
         1    0.000    0.000   17.315   17.315  <string>:1(<module>)
     66074    0.186    0.000    0.186    0.000  ntpath.py:116(splitdrive)
      3520    0.009    0.000    0.009    0.000  ntpath.py:245(islink)
     62554    0.712    0.000    9.013    0.000  ntpath.py:267(isdir)
     66074    0.394    0.000    0.581    0.000  ntpath.py:51(isabs)
     66074    0.815    0.000    1.564    0.000  ntpath.py:59(join)
20183/3522    1.176    0.000   17.296    0.005  os.py:218(walk)
         1    0.000    0.000   17.386   17.386  profile:0(execfile('test.py'))
         0    0.000             0.000           profile:0(profiler)
     62554    0.159    0.000    0.159    0.000  stat.py:29(S_IFMT)
     62554    0.331    0.000    0.489    0.000  stat.py:45(S_ISDIR)
         1    0.018    0.018   17.314   17.314  test.py:1(<module>)
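
A profile like the two above can be collected along these lines. This is only a sketch: it uses cProfile (added to the standard library in Python 2.5) instead of the profile module shown in the output, it is written for Python 3, and the helper name profile_walk is hypothetical. On current Python versions os.walk is implemented differently, so the functions listed will not match the tables above.

```python
import cProfile
import io
import os
import pstats


def profile_walk(root, top=10):
    """Profile one os.walk pass over `root` and return the report as text."""

    def run():
        for path, dirs, files in os.walk(root):
            pass

    profiler = cProfile.Profile()
    profiler.runcall(run)

    # Collect the report in a string instead of printing it directly.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()

# Example use (the path is an assumption, substitute your own):
#   print(profile_walk('D:\\FILES\\'))
```

Sorting by cumulative time rather than by name makes a hot spot such as the isdir and stat rows stand out without scanning the whole table.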
 

Fredrik Lundh

looping said:
Results on Windows XP after a few runs to fill the disk cache (with
~59000 files and ~3500 folders):
Python 2.4.3 : 45s
Python 2.5 : 10s

Very nice, but somewhat strange...
Is os.walk in Python 2.4.3 buggy?

No. A few "os" functions are now implemented in terms of Windows APIs,
instead of using Microsoft C's POSIX compatibility layer. This includes
os.stat(), which is what isdir() uses to check if something is a
directory. The code was rewritten to work around problems with
timestamps, so the speedup is purely a side effect.
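
To make the connection concrete, here is a minimal sketch of an isdir() built on os.stat(). It is consistent with the call chain in the profiles (isdir calling stat, then S_ISDIR), but it is an illustration rather than the library source, and the name isdir_via_stat is hypothetical.

```python
import os
import stat


def isdir_via_stat(path):
    """Return True if `path` exists and is a directory, using os.stat()."""
    try:
        st = os.stat(path)
    except OSError:
        # Missing or inaccessible paths are treated as "not a directory".
        return False
    return stat.S_ISDIR(st.st_mode)
```

Since os.walk makes one such check per directory entry, the cost of os.stat() dominates the walk, and any change in how os.stat() is implemented shows up directly in the total time.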

looping said:
Are these results only valid on Windows, or do *nix systems show the same
difference?

On Unix systems, Python uses POSIX APIs, not Windows APIs.

looping said:
The profiler shows that most of the time is spent in ntpath.isdir, and this
function is *a lot* faster in Python 2.5.

Why are you asking if something's buggy when you've already figured out
what's been improved?

looping said:
Maybe this improvement could be backported to the Python 2.4 branch for the
next release?

It's not really broken, so that's not very likely.

</F>
 

looping

Fredrik said:
Why are you asking if something's buggy when you've already figured out
what's been improved?

You're right, buggy isn't the right word...

Anyway, thanks for the detailed information. I'm very pleased with
the performance improvement, even if it's only a side effect and only on
Windows.
 

Martin v. Löwis

looping said:
Maybe this improvement could be backported to the Python 2.4 branch for the
next release?

As Fredrik explains, this is probably the side-effect of a from-scratch
rewrite of the relevant functions. Another (undesirable) side-effect is
that the resulting binary won't work on Windows 95 anymore. So
backporting it as-is is out of the question.

However, even if the patch was improved to still work on W9x, and to not
introduce the other behavioral changes that came with the rewrite, it
still couldn't go into 2.4.x. Likely, 2.4.4 is the final 2.4 release,
and the release candidate for that was already produced.

Regards,
Martin
 
