finding subdirectories without parsing every file

Discussion in 'Perl Misc' started by Helen, Aug 14, 2003.

  1. Helen

    Helen Guest

    Hi

    Is there any way to get the subdirectories of a directory without
    having to sort through all the files in a directory?

    I'm actually building a little perl script that looks at the
    directories and then prints out a directory tree (as a webpage).

    I've been using File::Find to generate the directory tree, but it's too
    slow. I think the problem is that it looks at each file in the
    directory. I'm not interested in what's in the directory; I just want
    to know what the subdirectories are.

    It takes about 30 seconds to build the directory tree on some of the
    larger sites, and the directory searching seems to be where the
    bottleneck is. That's compared to around 5 seconds to just download
    the file.

    Thanks :)

    Helen
    Helen, Aug 14, 2003
    #1

  2. Tore Aursand

    Tore Aursand Guest

    On Thu, 14 Aug 2003 00:45:26 -0700, Helen wrote:
    > Is there any way to get the subdirectories of a directory without
    > having to sort through all the files in a directory?


    Why is it slow? Maybe you could share some of your code with us, so
    that we can see what you're actually talking about?

    I tend to use the following code when "filtering out" directories:

    #!/usr/bin/perl
    #
    use strict;
    use warnings;
    use File::Find;

    my $root = '/home/tore';
    my @dirs = ();
    find sub { push(@dirs, $File::Find::name) if -d }, $root;

    I guess there are faster ways to do it (as always), but this solution does
    it for me.
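    If File::Find itself turns out to be the overhead, a plain readdir()
    recursion is worth trying. Every entry still gets one -d stat, but there's
    no per-file callback. A rough sketch (the root path in the comment is just
    an example, and hidden entries are skipped):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Recursively collect subdirectories with plain readdir();
    # each entry costs one -d stat, but there is no per-file
    # callback overhead as with File::Find's wanted sub.
    sub subdirs {
        my ($dir) = @_;
        my @found;
        opendir(my $dh, $dir) or die "can't opendir $dir: $!";
        for my $entry (readdir $dh) {
            next if $entry =~ /^\./;       # skip '.', '..' and hidden entries
            my $path = "$dir/$entry";
            next unless -d $path;          # skip plain files
            push @found, $path, subdirs($path);
        }
        closedir $dh;
        return @found;
    }

    # e.g. print "$_\n" for subdirs('/home/tore');
    ```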


    --
    Tore Aursand <>
    Tore Aursand, Aug 14, 2003
    #2

  3. danglesocket

    danglesocket Guest

    >>> Helen<> 8/14/2003 3:45:26 AM >>>
    Hi

    Is there any way to get the subdirectories of a directory without
    having to sort through all the files in a directory?

    I'm actually building a little perl script that looks at the
    directories and then prints out a directory tree (as a webpage).

    I've been using file::find to generate the directory tree but it's too
    slow. I think the problem is that it looks at each file in the
    directory. I'm not interested in what's in the directory, I just want
    to know what the subdirectories are.

    This should do what you want - it's a bit like the 'tree' command on
    Windows, but *nix-like.
    #!/usr/bin/perl

    use strict;
    use warnings;

    #yeah, 20 minutes later
    #my $path = '/';
    my $path = '/cygdrive/h/';
    #my $path = '/home/dangle/';
    read_dir($path);

    # recurse into a directory and print every subdirectory found;
    # hidden (dot) entries are skipped
    sub read_dir {
        my $path = shift;
        opendir(my $dh, $path) or die "can't opendir $path: $!";
        my @dirs = grep { !/^\./ && -d "$path$_" } readdir($dh);
        closedir $dh;
        foreach my $dir (@dirs) {
            print "$path$dir/\n";
            read_dir("$path$dir/");
        }
    }






    __danglesocket__
    danglesocket, Aug 14, 2003
    #3
  4. > Purl Gurl
    > --
    >
    > #!perl
    >
    > print "Content-type: text/plain\n\n";
    >
    > $internal_path = "c:/apache/users/callgirl";


    works better if you do this:

    chdir($internal_path)
    or die "Can't chdir to $internal_path: $!\n";

    Just to show why:
    --unmodified--
    [jim@oplinux jim]$ perl news.pl
    Content-type: text/plain

    [jim@oplinux jim]$

    --with my 'die' added--
    [jim@oplinux jim]$ perl news.pl
    Content-type: text/plain

    Can't chdir to c:/apache/users/callgirl: No such file or directory
    [jim@oplinux jim]$

    But hey, you were busy and didn't have time to test
    it before posting ... right?

    Jim
    James Willmore, Aug 15, 2003
    #4
  5. > James Willmore wrote:
    >
    > > > Purl Gurl

    >
    > (snipped)
    >
    > > > $internal_path = "c:/apache/users/callgirl";

    >
    > > Can't chdir to c:/apache/users/callgirl: No such file or directory

    >
    > > But hey, you were busy and didn't have time to test
    > > it before posting ... right?

    >
    >
    > No, you are inventing a lame excuse to troll
    > evidenced by this idiocy of expecting sample
    > code to compensate for every possibility,
    > which is an insulting slap across the face
    > of a reader.

    <snip>

    No, I was pointing out an error you made and made light of it -
    because you always seem to point out the mistakes of others in a
    demeaning way. I figured that I would return the favor.
    (RE: "I am sure you boys can determine what is wrong.")

    More to the point: the results, if you had bothered to read them
    fully, showed that your method did NOT return an error from the
    'chdir' call, while the fix I added did.

    Now, what if the end user of your version changed the path, made a
    mistake in typing, and the script didn't work? Wouldn't it be more
    productive for the script to TELL them EXACTLY what happened? You
    ALWAYS complain about the code of others - why not just own the error
    (okay, poor coding) and move on, instead of going off on some
    rambling about God knows what? I just ignored it (rather than merely
    snipping it in the reply).

    If you want to continue this 'flame out', you have my email address.
    Don't take up bandwidth doing it here.
    James Willmore, Aug 15, 2003
    #5
  6. Helen

    Helen Guest

    (James Willmore) wrote in message news:<>...
    > > Is there any way to get the subdirectories of a directory without
    > > having to sort through all the files in a directory?
    > > <snip>
    > > I've been using file::find to generate the directory tree but it's too
    > > slow. I think the problem is that it looks at each file in the
    > > directory. I'm not interested in what's in the directory, I just want
    > > to know what the subdirectories are.

    >


    Thanks for the help of all who've answered my post. :)

    > Ah.... but how far down the parent directory do you wish to search?
    > File::Find has a 'finddepth' method and a multitude of options.


    I really need it to list all of the directories, no matter how deep it
    goes. I've designed the system so that it's simple to make sure that
    the directory tree doesn't go too deep, but I didn't want to enforce a
    depth limit because it makes the script less flexible.

    > Post your code and maybe we can lend more assistance.


    I'm using the method below to build a "tree" structure which
    represents the directories on our web server. The main complication is
    that sites can have subsites, but in this part of the code I'm only
    looking for the subdirectories of one site. If it finds another
    subsite it stops recursing. This works because I load all the subsites
    into the tree before I load all the subdirectories.

    The directories and sites are stored in a tree object that uses the
    directory and site path to add new sites/dirs to the tree. It's then
    quite easy to recurse the bits I want when I'm printing the tree.

    On the page where I'm doing the recursing it prints out only the
    subdirectories of the site that don't belong to another subsite. So
    it's really only looking at a small part of the tree. The problem is
    that "small" is a relative term. I'm testing it with a subsite that
    has 800 subdirectories (and over 9000 files) as a worst case scenario
    (which isn't the biggest site on the server). I'm not sure I'll be
    able to get the load time to anywhere near 10 seconds, but I like
    working with such a large site because the effects of changing parts
    of the script are exaggerated.

    The subsites are stored in a database, but the first thing I did was
    make sure that all the database accesses happened at the same time. So
    there are only two calls to the database (no matter how big the tree
    gets) and they both use the same database handle. The database stuff
    happens before I go looking for the subdirectories.

    my $nodePath = "$basePath/".$node->getDirectory();
    find(\&wanted, "$basePath/".$node->getDirectory());

    sub wanted {
        my $currentFile = $File::Find::name;
        if (-d $currentFile) {
            if ($currentFile ne $nodePath) {
                my $newDir = $currentFile;
                $newDir =~ s/$basePath\///;

                # if this directory is actually a site,
                # we only want to recurse it
                # if we're told to by the recurseSubSites parameter
                if (!$siteTree->isNodeSite($newDir)) {
                    # if this directory isn't a site,
                    # add the directory to the site tree
                    $siteTree->addDirectory($newDir);
                } elsif (!$recurseSubSites) {
                    # we don't want to recurse any of this directory's subdirs
                    $File::Find::prune = 1;
                }
            }
        }
    } # end wanted

    Since I posted here, I've done more comparisons of how fast it runs. A
    lot of the problem is in adding nodes to the site tree, and I'm going
    to try to reduce that by sorting within the nodes as I add them (and
    probably some other things too).

    However, it takes a good 10-15 seconds just to print the directories
    with the rest of the sub commented out. Perhaps I'm doing something in
    an inefficient way? Or is it that I'm going to have to live with this
    sort of speed if I'm using perl to recurse that many directories? I
    actually didn't realise that I had so many files in the directories, I
    thought it was only one or two thousand. I don't think I can rely on
    the sorting of the operating system because I'm on a unix system that
    seems to just return the files in alphabetical order.
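    One thing I spotted in the File::Find documentation but haven't properly
    tried yet is the preprocess hook, which filters each directory's listing
    before wanted() is ever called. Something like this (the path in the
    comment is made up):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    # Collect every subdirectory under $root; the preprocess hook
    # throws away plain files (and dot entries) before wanted() ever
    # sees them, so wanted() only runs for directories.
    sub collect_dirs {
        my ($root) = @_;
        my @dirs;
        find({
            # File::Find has already chdir'ed into the directory here,
            # so -d works on the bare entry names
            preprocess => sub { grep { !/^\./ && -d } @_ },
            wanted     => sub { push @dirs, $File::Find::name if -d },
        }, $root);
        return @dirs;   # note: includes $root itself
    }

    # e.g. print "$_\n" for collect_dirs('/webroot/subsite');
    ```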

    Anyway, any comments or suggestions about the code would be
    appreciated. I'm a bit of a newbie perl programmer so I'm just
    muddling along and don't really know if I'm doing things the best way.

    Thanks again for your help. It's given me a few more things to think
    about.

    Helen
    Helen, Aug 15, 2003
    #6
  7. (Helen) wrote in message news:<>...
    <snip>
    > On the page where I'm doing the recursing it prints out only the
    > subdirectories of the site that don't belong to another subsite. So
    > it's really only looking at a small part of the tree. The problem is
    > that "small" is a relative term. I'm testing it with a subsite that
    > has 800 subdirectories (and over 9000 files) as a worst case scenario
    > (which isn't the biggest site on the server). I'm not sure I'll be
    > able to get the load time to anywhere near 10 seconds, but I like
    > working with such a large site because the effects of changing parts
    > of the script are exaggerated.


    If you want to do benchmarking, you can use the Benchmark module.
    This should give you a snapshot of how the changes you make fare as
    far as time and CPU are concerned.
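    A minimal sketch of how that might look (the two subs here are just
    stand-ins for your old and new code paths):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Two placeholder implementations that compute the same sum:
    # a loop versus the closed-form formula.
    sub old_way { my $x = 0; $x += $_ for 1 .. 500; return $x }
    sub new_way { return 500 * 501 / 2 }

    # Run each sub 10_000 times and print a table of rates plus
    # how much faster one is relative to the other.
    cmpthese(10_000, {
        old_way => \&old_way,
        new_way => \&new_way,
    });
    ```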

    >
    > The subsites are stored in a database, but the first thing I did was
    > make sure that all the database accesses happened at the same time. So
    > there are only two calls to the database (no matter how big the tree
    > gets) and they both use the same database handle. The database stuff
    > happens before I go looking for the subdirectories.
    >
    > my $nodePath = "$basePath/".$node->getDirectory();
    > find(\&wanted, "$basePath/".$node->getDirectory());
    >
    > sub wanted {
    >     my $currentFile = $File::Find::name;
    >     if (-d $currentFile) {
    >         if ($currentFile ne $nodePath) {
    >             my $newDir = $currentFile;
    >             $newDir =~ s/$basePath\///;
    >
    >             # if this directory is actually a site,
    >             # we only want to recurse it
    >             # if we're told to by the recurseSubSites parameter
    >             if (!$siteTree->isNodeSite($newDir)) {
    >                 # if this directory isn't a site,
    >                 # add the directory to the site tree
    >                 $siteTree->addDirectory($newDir);
    >             } elsif (!$recurseSubSites) {
    >                 # we don't want to recurse any of this directory's subdirs
    >                 $File::Find::prune = 1;
    >             }
    >         }
    >     }
    > } # end wanted


    At first glance, it appears that you have everything in place to do
    what you want. Just a suggestion: given the number of files you are
    dealing with and what you want the end result to look like, have you
    considered writing the results out to a file, maybe in XML or CSV?
    This would free up memory and save the information you've already
    processed in the event your script is killed for some reason. You
    could then process the directories with one script and do something
    with the results in another. Again, it's just a suggestion and may
    lead to other issues.
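    For instance, something along these lines would stream each directory to
    a CSV file as it's found (the sub name, columns, and example paths are
    only illustrative):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    # Walk $root and append every directory to a CSV file as we go,
    # so partial results survive even if the script dies part-way.
    sub dump_dirs_csv {
        my ($root, $outfile) = @_;
        open(my $csv, '>', $outfile) or die "can't write $outfile: $!";
        print $csv "depth,path\n";
        find(sub {
            return unless -d;                          # directories only
            my $depth = ($File::Find::name =~ tr{/}{});  # crude depth count
            print $csv "$depth,$File::Find::name\n";
        }, $root);
        close $csv;
    }

    # e.g. dump_dirs_csv('/webroot/subsite', 'dirs.csv');
    ```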

    HTH

    Jim
    James Willmore, Aug 16, 2003
    #7
  8. Helen

    Helen Guest

    <snip>

    > If you want to do benchmarking, you can use the Benchmark module.
    > This should give you a snapshot of how the changes you make fare as far
    > as time and CPU are concerned.


    Just wanted to thank you for this suggestion. It's made my
    optimisation a *lot* easier. :)

    Helen
    Helen, Aug 19, 2003
    #8
