*fastest* way to get a large directory listing in Perl

Seth Brundle

There are several methods of getting a large directory listing (3000+ files
in a single directory) in Perl, but all the methods I've tried (<*>,
readdir) are vastly slower in my usage than using readdir in C.

This doesn't seem to make sense, since I imagine Perl is just making the same
system call.

Opinions appreciated...
 
A. Sinan Unur

Seth Brundle said:
There are several methods of getting a large directory listing (3000+
files in a single directory) in Perl, but all the methods I've tried
(<*>, readdir) are vastly slower in my usage than using readdir in C.

This doesn't seem to make sense, since I imagine Perl is just making
the same system call.

Maybe you are using readdir incorrectly?

What is your notion of fast & slow?

How do you measure fast and slow?

D:\Home\asu1\UseNet\clpmisc\r> dir
....
09/21/2005 06:14 PM 0 file998
09/21/2005 06:14 PM 0 file999
09/21/2005 06:14 PM 241 myt.pl
09/21/2005 06:18 PM 266 test.pl
3002 File(s) 507 bytes

D:\Home\asu1\UseNet\clpmisc\r> cat test.pl
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;

sub ls {
    opendir my $dir, '.' or die "Cannot opendir '.': $!";
    my @files = readdir $dir;
    closedir $dir or die "Cannot closedir '.': $!";
}

timethese -1, { ls => \&ls };

__END__

D:\Home\asu1\UseNet\clpmisc\r> test
Benchmark: running ls for at least 1 CPU seconds...
ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s
(n=107)

This is on Windows XP SP2, AMD64 running at 1.8 GHz, 1 GB RAM, about 402 MB
allocated.

What is the equivalent C program you tested?

Sinan
 

xhoster

Seth Brundle said:
There are several methods of getting a large directory listing (3000+
files in a single directory) in Perl, but all the methods I've tried
(<*>, readdir) are vastly slower in my usage than using readdir in C.

This doesn't seem to make sense, since I imagine Perl is just making the
same system call.

Perl first has to determine if readdir is in a list or a scalar context and
has to unwrap the stack. Then it has to make the same system call as C
does. Then it has to copy the contents of the char* "foo.d_name" someplace
safe (unlike C's readdir), and package that up into a perl scalar, and push
that onto the return stack. And in a list context, it has to do that
repeatedly.
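
To make that extra per-entry work concrete, here is a minimal C sketch. It is not Perl's actual implementation, and count_entries, copy_entries and free_entries are hypothetical names chosen for illustration. The first function only calls readdir; the second also copies every d_name into a growing list, which is roughly the additional work a list-context readdir has to do.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Bare loop: one readdir() call per entry, nothing is kept.
   Returns the number of entries, or -1 if the directory cannot be opened. */
long count_entries(const char *path) {
    DIR *dir = opendir(path);
    long n = 0;

    if (!dir) {
        return -1;
    }
    while (readdir(dir) != NULL) {
        ++n;
    }
    closedir(dir);
    return n;
}

/* List-building loop: each name is copied out of the dirent, because the
   storage readdir() returns may be overwritten by a later call on the
   same stream. Returns a NULL-terminated array (NULL if the directory
   cannot be opened or the first allocation fails) and stores the number
   of names in *count. Stops early if a later allocation fails. */
char **copy_entries(const char *path, size_t *count) {
    size_t used = 0, cap = 64;
    struct dirent *ent;
    char **list;
    DIR *dir = opendir(path);

    *count = 0;
    if (!dir) {
        return NULL;
    }
    list = malloc(cap * sizeof *list);
    if (!list) {
        closedir(dir);
        return NULL;
    }
    while ((ent = readdir(dir)) != NULL) {
        size_t len = strlen(ent->d_name) + 1;
        char *copy;

        if (used + 1 == cap) {          /* keep one slot for the NULL */
            char **bigger = realloc(list, 2 * cap * sizeof *bigger);
            if (!bigger) {
                break;
            }
            list = bigger;
            cap *= 2;
        }
        copy = malloc(len);
        if (!copy) {
            break;
        }
        memcpy(copy, ent->d_name, len);
        list[used++] = copy;
    }
    closedir(dir);
    list[used] = NULL;
    *count = used;
    return list;
}

/* Releases a list returned by copy_entries(). */
void free_entries(char **list) {
    size_t i;

    if (!list) {
        return;
    }
    for (i = 0; list[i]; ++i) {
        free(list[i]);
    }
    free(list);
}
```

Timing count_entries(".") against copy_entries(".", &n) on the same directory gives a rough idea of how much of the cost is copying and allocation rather than the directory read itself.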

Seth Brundle said:
Opinions appreciated...

It is possible you are doing something silly, like calling the underlying
system call 9,000,000+ times. If you posted code (both C and Perl would be
nice, if you want us to do the comparison) (and actual time measurements,
rather than just "vastly slower") we could offer more informed opinions.

Xho
 
A. Sinan Unur

....
snip Perl code
....
D:\Home\asu1\UseNet\clpmisc\r> test
Benchmark: running ls for at least 1 CPU seconds...
ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s
(n=107)

This is on Windows XP SP2, AMD64 running at 1.8 GHz, 1 GB RAM, about 402
MB allocated.

So we get about 100 readdirs in list context per second.
What is the equivalent C program you tested?

The following C program is really not the equivalent of the Perl program
I posted, but it does copy the names, and creates a list of file names
etc.

I first ran a do-nothing version that called an empty ls() function 100
times to get a baseline timing. The Windows timethis utility reported an
average of 0.16 seconds.

Then I wrote the following:

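#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <sys/types.h>
#include <dirent.h>

/* Read up to num_files entries from '.', copying each name into a
   NULL-terminated list. */
void ls(size_t num_files) {
    struct dirent *ent;
    DIR *dir;
    size_t f;

    /* One extra slot for the terminating NULL pointer. */
    char **list = malloc((1 + num_files) * sizeof(*list));
    if( !list ) {
        fprintf(stderr, "Memory allocation error\n");
        exit(EXIT_FAILURE);
    }

    dir = opendir(".");
    if( !dir ) {
        perror("Cannot open '.'");
        exit(EXIT_FAILURE);
    }

    for(f = 0; f != num_files; ++f) {
        char *d_name;
        ent = readdir(dir);
        if( !ent ) {
            break;
        }
        /* Copy the name: readdir may reuse its buffer on the next call. */
        d_name = malloc(1 + strlen(ent->d_name));
        if( !d_name ) {
            break;
        }
        strcpy(d_name, ent->d_name);
        list[f] = d_name;
    }

    list[f] = NULL;

    if( closedir(dir) ) {
        perror("Cannot close '.'");
    }

    /* Release the copies so that repeated calls do not leak. */
    for(f = 0; list[f]; ++f) {
        free(list[f]);
    }
    free(list);
}

int main(void) {
    struct dirent *ent;
    size_t num_files = 0;

    /* First pass: count the entries so ls() can size its list. */
    DIR *dir = opendir(".");
    if( !dir ) {
        perror("Cannot open '.'");
        exit(EXIT_FAILURE);
    }

    while( (ent = readdir(dir)) != NULL ) {
        ++num_files;
    }

    if( closedir(dir) ) {
        perror("Cannot close '.'");
    }

    {
        int i;
        for(i = 0; i != 100; ++i) {
            ls(num_files);
        }
    }

    return 0;
}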

D:\Home\asu1\UseNet\clpmisc\r> gcc -Wall -O2 r.c -o r.exe

D:\Home\asu1\UseNet\clpmisc\r> timethis r.exe

TimeThis : Command Line : r.exe
TimeThis : Start Time : Wed Sep 21 20:20:45 2005
TimeThis : End Time : Wed Sep 21 20:20:47 2005
TimeThis : Elapsed Time : 00:00:01.640

So, again, we get about 100 readdirs per second in list context (so to
speak). Now, clearly, I am not a great C programmer, but I would be
interested to see the C program that generates the vastly superior
timings.
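
The timings above come from an external utility, so they include process startup, which is why the empty-function baseline matters. As an alternative, here is a minimal sketch of timing inside the process with the standard clock() function and subtracting a do-nothing baseline. The names cpu_seconds, do_nothing and net_cpu_seconds are hypothetical, and the timed function takes no arguments, so a call such as ls(num_files) would need a small wrapper.

```c
#include <time.h>

/* Returns the time, in seconds as reported by clock(), consumed by
   `reps` calls of fn, or -1.0 if clock() is unavailable. */
double cpu_seconds(void (*fn)(void), int reps) {
    clock_t start, end;
    int i;

    start = clock();
    if (start == (clock_t)-1) {
        return -1.0;
    }
    for (i = 0; i < reps; ++i) {
        fn();
    }
    end = clock();
    if (end == (clock_t)-1) {
        return -1.0;
    }
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Empty function used to measure the cost of the loop and the call. */
void do_nothing(void) {
}

/* Time for `reps` calls of fn minus the do-nothing baseline.
   Returns -1.0 if clock() is unavailable; small results can be
   dominated by timer resolution and noise. */
double net_cpu_seconds(void (*fn)(void), int reps) {
    double base = cpu_seconds(do_nothing, reps);
    double work = cpu_seconds(fn, reps);

    if (base < 0.0 || work < 0.0) {
        return -1.0;
    }
    return work - base;
}
```

One caveat: the C standard describes clock() as processor time, but Microsoft's C runtime returns elapsed wall-clock time instead, so results from the two kinds of platform are not directly comparable.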

Sinan
 
