File::Find is slower than using recursion!?

S

Steve Allan

One of my colleagues who is new to Perl wrote a script that used
recursion to do what File::Find is designed to do (he didn't know
about File::Find). He sent it to me, and I sent him back a new
version using File::Find. He later wrote back that my version was
SLOWER!

Skeptical, I reduced both scripts to versions that simply search for
files and put them into an array. I then benchmarked the two and he
was right. Here's what cmpthese() reported using 20 iterations:

s/iter File::Find Recursion
File::Find 5.19 -- -70%
Recursion 1.58 229% --

Is poor performance a known problem with File::Find, or am I using it
improperly? It's hard to believe it could be slower than recursion.
Any insights on why I'm seeing such a discrepancy?

I'm using ActiveState perl, version 5.8.0 on Window XP.

Below is the benchmark script I used. The variable $rootdir is set to
a directory on my machine that contains over 12000 files and numerous
subdirectories. You should only have to modify that one line to run
the script - if you're so inclined :^)

Thanks.

--
-- Steve

#!/usr/bin/perl -w
use strict;
use Benchmark qw:)all) ;

my $rootdir = 'd:/projects/vhi/griffin/dev';
chdir $rootdir or die "Can't change to $rootdir: $!";

cmpthese (20, {
'File::Find' => \&findfiles,
'Recursion' => \&recurse,
}
);

#findfiles();
#recurse();



#======================================================================
# New version uses File::Find
#======================================================================
sub findfiles {
use File::Find;
my @flist;
# search for files not in have list
find (sub { push @flist => $File::Find::name if -f }, '.');
print "In Findfiles: found ", scalar @flist, " files\n";
}

#======================================================================
# Old version uses recursion
#======================================================================
my @globallist;
sub recurse {
use Cwd;

@globallist = ();

# recurse through sub directories checking for known files
opendir CURRENTDIR, "." or die "Cannot open directory ", cwd(), ":! ";
my @files = grep !/^\.\.?$/, readdir CURRENTDIR;
closedir CURRENTDIR;
for (@files) {
checkFile($_, '.');
}
print "In Recurse: found ", scalar @globallist, " files\n";
}

sub checkFile {
my $fileToCheck = $_[0];
my $currentP4Dir = $_[1];

# first see if it is a directory
if ( chdir $fileToCheck == 1 ) {
$currentP4Dir = $currentP4Dir."/".$fileToCheck;
# go through the new directory
opendir CURRENTDIR, "." or die "Cannot open directory ".cwd();
my @files = grep !/^\.\.?$/, readdir CURRENTDIR;
closedir CURRENTDIR;
for (@files) {
checkFile($_, $currentP4Dir);
}
# after finishing, go up a directory
chdir "..";
$currentP4Dir =~ s/^(.*)\/.*$/$1/;
}
else {
#print "$fileToCheck\n";
push @globallist => $fileToCheck;


}
}
#=================================== EOF ======================================
 
S

Sam Holden

One of my colleagues who is new to Perl wrote a script that used
recursion to do what File::Find is designed to do (he didn't know
about File::Find). He sent it to me, and I sent him back a new
version using File::Find. He later wrote back that my version was
SLOWER!

Skeptical, I reduced both scripts to versions that simply search for
files and put them into an array. I then benchmarked the two and he
was right. Here's what cmpthese() reported using 20 iterations:

s/iter File::Find Recursion
File::Find 5.19 -- -70%
Recursion 1.58 229% --

Is poor performance a known problem with File::Find, or am I using it
improperly? It's hard to believe it could be slower than recursion.
Any insights on why I'm seeing such a discrepancy?

I'll take correct over fast any day.

The recursive version doesn't work in the presence of symlink loops.
Where doesn't work means infinitely recurses (well I guess it will run
out of stack or memory at some point).

I'll also take 1 line of code over 30 odd lines, and worry about
speed if it turns out to be not fast enough for a given task.

I've never used File::Find so I don't know if your usage is correct or
not...

[snip code]
 
S

Steve Allan

(e-mail address removed) (Sam Holden) writes:

I'll take correct over fast any day.

No argument there.
The recursive version doesn't work in the presence of symlink loops.
Where doesn't work means infinitely recurses (well I guess it will run
out of stack or memory at some point).

I'll also take 1 line of code over 30 odd lines, and worry about
speed if it turns out to be not fast enough for a given task.

I Agree here too. I wasn't considering the use of the home-grown
recursive solution over the File::Find module. I was more interested
in posting a reproducible example of what seems to be poor performance
to see if someone had any insight as to why File::Find is so much
slower than recursion. Your point about symlinks may be hint.
Perhaps File::Find is just flat out doing more and that accounts for
the extra time. It could also be that my sense that recursion is
inherently slow is outdated and just not true in Perl.

Anyone else have some thoughts on this?

Thanks.
 
B

Benjamin Goldberg

Steve said:
(e-mail address removed) (Sam Holden) writes:



No argument there.


I Agree here too. I wasn't considering the use of the home-grown
recursive solution over the File::Find module. I was more interested
in posting a reproducible example of what seems to be poor performance
to see if someone had any insight as to why File::Find is so much
slower than recursion. Your point about symlinks may be hint.
Perhaps File::Find is just flat out doing more and that accounts for
the extra time.

Yes, it's doing quite a bit more.
It could also be that my sense that recursion is
inherently slow is outdated and just not true in Perl.

Recursion is most certainly *not* inherently slow. I can't think of a
time when it was... at least, not in perl.
 
S

Steve Allan

It appears symlinks are the main issue here.

Sam correctly pointed out that the recursive version didn't address
symlinks. That turned out to be more significant than I initially
realized (being on a windows box). I tried benchmarking this version
from Ben:
Benjamin Goldberg <[email protected]> writes:
Steve Allan wrote:
[snip]

You could go even faster, if you used neither recursion nor File::Find.

sub get_files {
my (@dirs, @flist) = ".";
while( my $dir = pop @dirs ) {
opendir( my $dh, $dir ) or die "opendir($dir): $!";
for my $f (readdir $dh) {
next if $f eq "." or $f eq "..";
$f = "$dir/$f";
push @dirs, $f if !-l $f and -d _;
push @flist, $f if -f $f;
}
}
print "In GetFiles: found ", scalar @flist, " files\n";
}

and recursion was still much faster. Then I modified the recursive
function to check for symlinks, namely I changed

if ( chdir $fileToCheck ) { .. }

to

if ( not -l $fileToCheck and -d $fileToCheck ) {
chdir $fileToCheck or die "Can't cd to $fileToCheck: $! ";
...

and with that one change, the recursive version is the slowest. Here
are the benchmarks for all three:


s/iter Recursion File::Find Get_Files
Recursion 9.13 -- -23% -31%
File::Find 7.05 30% -- -10%
Get_Files 6.34 44% 11% --


It would seem that, on windows at least, just checking for a symlink
is expensive. Is this because a Perl script running on windows still
has to be able to recognize symlinks on samba-mounted file systems
that support symlinks?

Whatever the reason, the above numbers seem much more in line with my
expectations.

Thanks for all the feedback. This has been very educational for me.
 
C

Chris Mattern

David said:
From: "Benjamin Goldberg" <[email protected]>
Subject: Re: File::Find is slower than using recursion!?





<A lot of other stuff snipped>

Recursion was slow in Pascal as implemented on old Apple IIe's. I think the
issue was the overhead requried to push another function onto the stack.

And then, of course, there was the old IBM mainframe architecture--which
*didn't have a stack!* Lotsa fun, that was. Let me tell ya about SAVEAREA...

Chris Mattern
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top