*fastest* way to get a large directory listing in Perl

Discussion in 'Perl Misc' started by Seth Brundle, Sep 21, 2005.

  1. Seth Brundle

    Seth Brundle Guest

    There are several methods of getting a large directory listing (3000+ files
    in a single directory) in Perl, but all the methods I've tried (<*>,
    readdir) are vastly slower in my usage than using readdir in C.

    This doesn't seem to make sense, since I imagine Perl is just making the same
    system call.

    Opinions appreciated...
    Seth Brundle, Sep 21, 2005
    #1

  2. "Seth Brundle" <> wrote in
    news::

    > There are several methods of getting a large directory listing (3000+
    > files in a single directory) in Perl, but all the methods I've tried
    > (<*>, readdir) are vastly slower in my usage then using readdir in C.
    >
    > This doesnt seem to make sense, since I imagine perl is just making
    > the same system call.


    Maybe you are using readdir incorrectly?

    What is your notion of fast & slow?

    How do you measure fast and slow?

    D:\Home\asu1\UseNet\clpmisc\r> dir
    ....
    09/21/2005 06:14 PM 0 file998
    09/21/2005 06:14 PM 0 file999
    09/21/2005 06:14 PM 241 myt.pl
    09/21/2005 06:18 PM 266 test.pl
    3002 File(s) 507 bytes

    D:\Home\asu1\UseNet\clpmisc\r> cat test.pl
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark;

    sub ls {
        opendir my $dir, '.' or die "Cannot opendir '.': $!";
        my @files = readdir $dir;
        closedir $dir or die "Cannot closedir '.': $!";
    }

    timethese -1, { ls => \&ls };

    __END__

    D:\Home\asu1\UseNet\clpmisc\r> test
    Benchmark: running ls for at least 1 CPU seconds...
    ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s
    (n=107)

    This is on Windows XP SP2, AMD64 running at 1.8 GHz, 1 GB RAM, about 402 MB
    allocated.
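
    Since the original question mentions both <*> and readdir, the same
    Benchmark approach can be extended to compare the two. This is only a
    sketch: the sub names ls_readdir and ls_glob are made up for illustration,
    and the two listings are not identical, because glob skips dot files and
    sorts its result by default while readdir does neither.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw(cmpthese);

# List '.' with opendir/readdir; returns every entry, including '.' and '..'.
sub ls_readdir {
    opendir my $dir, '.' or die "Cannot opendir '.': $!";
    my @files = readdir $dir;
    closedir $dir or die "Cannot closedir '.': $!";
    return @files;
}

# List '.' with glob, the function behind the <*> operator.
# By default glob skips names starting with a dot and sorts the result.
sub ls_glob {
    my @files = glob '*';
    return @files;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    readdir => sub { my @f = ls_readdir() },
    glob    => sub { my @f = ls_glob() },
});
```

    Any difference it reports will depend on the platform and file system, so
    the numbers are only meaningful next to each other on the same machine.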

    What is the equivalent C program you tested?

    Sinan

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Sep 21, 2005
    #2

  3. Seth Brundle

    Guest

    "Seth Brundle" <> wrote:
    > There are several methods of getting a large directory listing (3000+
    > files in a single directory) in Perl, but all the methods I've tried
    > (<*>, readdir) are vastly slower in my usage then using readdir in C.
    >
    > This doesnt seem to make sense, since I imagine perl is just making the
    > same system call.


    Perl first has to determine whether readdir is in list or scalar context and
    has to unwrap the stack. Then it has to make the same system call as C
    does. Then it has to copy the contents of the char* "foo.d_name" someplace
    safe (unlike C's readdir), package that up into a Perl scalar, and push
    that onto the return stack. And in list context, it has to do that
    repeatedly.
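
    The two contexts described above look like this from the Perl side. It is
    a minimal sketch; the sub names are invented for the example, and it says
    nothing about which form is faster on a given system.

```perl
use strict;
use warnings;

# List context: one readdir call returns every remaining entry at once.
sub names_list_context {
    my ($path) = @_;
    opendir my $dir, $path or die "Cannot opendir '$path': $!";
    my @names = readdir $dir;
    closedir $dir or die "Cannot closedir '$path': $!";
    return @names;
}

# Scalar context: each readdir call returns one entry, undef at the end.
sub names_scalar_context {
    my ($path) = @_;
    opendir my $dir, $path or die "Cannot opendir '$path': $!";
    my @names;
    while (defined(my $name = readdir $dir)) {
        push @names, $name;
    }
    closedir $dir or die "Cannot closedir '$path': $!";
    return @names;
}

my @all_at_once = names_list_context('.');
my @one_by_one  = names_scalar_context('.');
printf "list context: %d entries, scalar context: %d entries\n",
    scalar @all_at_once, scalar @one_by_one;
```

    Either way, every name ends up as a separate Perl scalar, which is the
    per-entry cost described above.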

    > Opinions appreciated...


    It is possible you are doing something silly, like calling the underlying
    system call 9,000,000+ times. If you posted code (both C and Perl would be
    nice, if you want us to do the comparison) (and actual time measurements,
    rather than just "vastly slower") we could offer more informed opinions.
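
    One way to end up with that many calls, purely as a guess at what might
    be going on, is to re-read the whole directory once per file: 3000 files
    times 3000 entries is about 9,000,000 entries read. The sketch below
    contrasts that with reading the directory once into a hash. All sub names
    are hypothetical.

```perl
use strict;
use warnings;

# Read the directory once and index the names in a hash for fast lookups.
sub read_dir_once {
    my ($path) = @_;
    opendir my $dir, $path or die "Cannot opendir '$path': $!";
    my %seen = map { ($_ => 1) } readdir $dir;
    closedir $dir or die "Cannot closedir '$path': $!";
    return \%seen;
}

# Anti-pattern: re-read the whole directory for every name looked up.
# Returns how many names were found and how many entries were read in total.
sub count_present_slow {
    my ($path, @wanted) = @_;
    my ($found, $entries_read) = (0, 0);
    for my $want (@wanted) {
        opendir my $dir, $path or die "Cannot opendir '$path': $!";
        while (defined(my $name = readdir $dir)) {
            $entries_read++;
            $found++ if $name eq $want;
        }
        closedir $dir or die "Cannot closedir '$path': $!";
    }
    return ($found, $entries_read);
}

# Same answer, but the directory is read only once.
sub count_present_fast {
    my ($path, @wanted) = @_;
    my $seen  = read_dir_once($path);
    my $found = grep { $seen->{$_} } @wanted;
    return ($found, scalar keys %$seen);
}

my @wanted = keys %{ read_dir_once('.') };
my ($slow_found, $slow_read) = count_present_slow('.', @wanted);
my ($fast_found, $fast_read) = count_present_fast('.', @wanted);
print "slow: found $slow_found, read $slow_read entries\n";
print "fast: found $fast_found, read $fast_read entries\n";
```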

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Sep 21, 2005
    #3
  5. "A. Sinan Unur" <> wrote in
    news:Xns96D8BAD449E5Fasu1cornelledu@127.0.0.1:

    > "Seth Brundle" <> wrote in
    > news::
    >
    >> There are several methods of getting a large directory listing (3000+
    >> files in a single directory) in Perl, but all the methods I've tried
    >> (<*>, readdir) are vastly slower in my usage then using readdir in C.

    ....
    snip Perl code
    ....

    > D:\Home\asu1\UseNet\clpmisc\r> test
    > Benchmark: running ls for at least 1 CPU seconds...
    > ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s
    > (n=107)
    >
    > This is on Windows XPSP2, AMD64 running at 1.8Ghz, 1Gb RAM, about 402
    > MB allocated.


    So we get about 100 readdirs in list context per second.

    > What is the equivalent C program you tested?


    The following C program is really not the equivalent of the Perl program
    I posted, but it does copy the names, and creates a list of file names
    etc.

    I first ran a do-nothing version which called an empty ls() function 100
    times to get a baseline timing. The Windows timethis utility reported an
    average of 0.16 seconds.

    Then I wrote the following:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>


    #include <sys/types.h>
    #include <dirent.h>

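    void ls(size_t num_files) {
        struct dirent *ent;
        DIR *dir;
        size_t f;

        /* One extra slot for the terminating NULL pointer. */
        char **list = malloc((1 + num_files) * sizeof(*list));
        if( !list ) {
            fprintf(stderr, "Memory allocation error\n");
            exit(EXIT_FAILURE);
        }

        dir = opendir(".");
        if( !dir ) {
            perror("Cannot open '.'");
            exit(EXIT_FAILURE);
        }

        /* Copy each name, since readdir may reuse its buffer. */
        for(f = 0; f != num_files; ++f) {
            char *d_name;
            ent = readdir(dir);
            if( !ent ) {
                break;
            }
            d_name = malloc(1 + strlen(ent->d_name));
            if( !d_name ) {
                break;
            }
            strcpy(d_name, ent->d_name);
            list[f] = d_name;
        }

        list[f] = NULL;

        if( closedir(dir) ) {
            perror("Cannot close '.'");
        }

        /* Release the copies so repeated calls do not leak. */
        for(f = 0; list[f]; ++f) {
            free(list[f]);
        }
        free(list);
    }

    int main(void) {
        struct dirent *ent;
        size_t num_files = 0;
        int i;

        /* First pass: count the entries so ls() can size its list. */
        DIR *dir = opendir(".");
        if( !dir ) {
            perror("Cannot open '.'");
            return EXIT_FAILURE;
        }

        while( (ent = readdir(dir)) != NULL ) {
            ++num_files;
        }

        if( closedir(dir) ) {
            perror("Cannot close '.'");
        }

        for(i = 0; i != 100; ++i) {
            ls(num_files);
        }

        return 0;
    }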

    D:\Home\asu1\UseNet\clpmisc\r> gcc -Wall -O2 r.c -o r.exe

    D:\Home\asu1\UseNet\clpmisc\r> timethis r.exe

    TimeThis : Command Line : r.exe
    TimeThis : Start Time : Wed Sep 21 20:20:45 2005
    TimeThis : End Time : Wed Sep 21 20:20:47 2005
    TimeThis : Elapsed Time : 00:00:01.640

    So, again, we get about 100 readdirs per second in list context (so to
    speak). Now, clearly, I am not a great C programmer, but I would be
    interested to see the C program that generates the vastly superior
    timings.
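
    For anyone who wants to put the two measurements side by side, here is a
    small sketch of the arithmetic using only the figures quoted in this
    thread. Note the assumptions: the Perl figure is a rate based on CPU
    time, the C figure is elapsed time for 100 calls minus the 0.16 second
    baseline, so the comparison is rough. The sub names are made up.

```perl
use strict;
use warnings;

# Milliseconds per listing, given a rate in listings per second.
sub ms_from_rate {
    my ($per_second) = @_;
    return 1000 / $per_second;
}

# Milliseconds per listing, given total elapsed seconds for $runs calls
# and the elapsed seconds of a do-nothing baseline run.
sub ms_from_elapsed {
    my ($elapsed, $baseline, $runs) = @_;
    return 1000 * ($elapsed - $baseline) / $runs;
}

my $perl_ms = ms_from_rate(99.35);                # Benchmark: 99.35/s
my $c_ms    = ms_from_elapsed(1.640, 0.16, 100);  # timethis figures

printf "Perl: %.2f ms per listing\n", $perl_ms;
printf "C:    %.2f ms per listing (%.1f listings/s)\n", $c_ms, 1000 / $c_ms;
```

    On these numbers the two programs land in the same range, which is
    consistent with the point that neither is vastly faster than the other.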

    Sinan


    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Sep 22, 2005
    #5