Dynamic directory handles?

Discussion in 'Perl Misc' started by IanW, Dec 22, 2005.

  1. IanW

    IanW Guest

    I have a chunk of code that counts files, dirs and size for a directory tree.
    It goes like this:

    ================
    #use strict;

    my $basedir = "j:/files/sw-test";
    my $fcount = 0;
    my $fsize = 0;
    my $dcount = 0;
    dircount();

    sub dircount {
    my($cdir) = shift;
    $cdir .= "/" if $cdir ne "";
    my $dh = "DH" . length($cdir);
    opendir($dh,"$basedir/$cdir");
    while(my $fl = readdir($dh)){
    next if $fl =~ /^\.{1,2}$/;
    if(-d "$basedir/$cdir$fl"){
    $dcount++;
    dircount("$cdir$fl");
    }
    else{
    $fcount++;
    $fsize += -s "$basedir/$cdir$fl";
    }
    }
    close($dh);
    }

    print "$fcount files and $dcount directories totalling $fsize bytes in
    size";
    ================

    If I "use strict" it says "Can't use string ("DH0") as a symbol ref while
    "strict refs" in use at D:\test.pl line 14". What's the best way to get
    round this, since I need a dynamic dir handle for the routine to work
    properly.

    Thanks
    Ian
     
    IanW, Dec 22, 2005
    #1

  2. "IanW" <> wrote in
    news:doe5kj$ms$:

    > I have a chunk of code that counts files.dirs


    First off, you are better off doing this using the File::Find module
    rather than using recursion. If this is not a learning exercise, then I
    would also urge you to look at File::Find::Rule to simplify matters when
    processing items one by one.

    > ================
    > #use strict;
    >
    > my $basedir = "j:/files/sw-test";


    The program should check @ARGV for an argument, and supply a reasonable
    default if one is not present.

    > my $fcount = 0;
    > my $fsize = 0;
    > my $dcount = 0;


    These are values that should be returned by your sub. You might want to
    check for calling context in the sub and supply an appropriate scalar
    value.
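
    A minimal sketch of the calling-context idea, assuming a hypothetical sub named tree_stats (not from the original code). It uses the built-in wantarray to return all three totals in list context and only the byte total in scalar context; picking the byte total as the scalar value is an assumption, and recursion is left out to keep it short.

```perl
use strict;
use warnings;

# Hypothetical helper: looks at one directory level (no recursion) and
# reports its totals according to the context it was called in.
sub tree_stats {
    my ($dir) = @_;
    my ($fcount, $dcount, $fsize) = (0, 0, 0);

    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        my $full = "$dir/$entry";
        if (-d $full) { $dcount++ }
        else          { $fcount++; $fsize += (-s $full) || 0 }
    }
    closedir($dh);

    # List context: all three totals. Scalar context: bytes only.
    return wantarray ? ($fcount, $dcount, $fsize) : $fsize;
}

my ($files, $dirs, $bytes) = tree_stats('.');   # list context
my $only_bytes             = tree_stats('.');   # scalar context
print "$files files, $dirs directories, $bytes bytes\n";
```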

    > dircount();
    >
    > sub dircount {
    > my($cdir) = shift;
    > $cdir .= "/" if $cdir ne "";
    > my $dh = "DH" . length($cdir);


    Using lexical dirhandles, this should not be necessary.

    > opendir($dh,"$basedir/$cdir");


    You should *always* check if calls to open/opendir succeeded.

    > while(my $fl = readdir($dh)){
    > next if $fl =~ /^\.{1,2}$/;
    > if(-d "$basedir/$cdir$fl"){
    > $dcount++;
    > dircount("$cdir$fl");
    > }


    I do prefer using File::Spec for path manipulation.
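
    For illustration, a short sketch of building paths with File::Spec::Functions instead of string interpolation. The directory and file names are made up; catdir and catfile are the module's own functions.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir catfile);

# Hypothetical path pieces, used only for illustration.
my $basedir = 'sw-test';
my $subdir  = catdir($basedir, 'docs');        # join directory names
my $file    = catfile($subdir, 'readme.txt');  # directory plus file name

# File::Spec picks the separator for the platform the script runs on,
# so no "/" is hard-coded in the script itself.
print "$subdir\n$file\n";
```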

    > If I "use strict" it says "Can't use string ("DH0") as a symbol ref
    > while "strict refs" in use at D:\test.pl line 14". What's the best way
    > to get round this, since I need a dynamic dir handle for the routine
    > to work properly.


    Here is a revised version of your code. I cannot vouch for its accuracy,
    since there seems to be a discrepancy between the output of this script
    and that of du on my system (probably because the space taken by zero-
    length files is not taken into account). As always, corrections welcome:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use File::Spec::Functions qw(canonpath catfile);

    my $basedir = canonpath($ARGV[0] || '.');
    my ($fcount, $dcount, $fsize) = dircount($basedir);


    printf("%d files and %d directories totalling %d bytes in size\n",
    $fcount, $dcount, $fsize);

    sub dircount {
        # $path is already complete: the top-level call passes $basedir and
        # each recursive call passes catfile($path, $fl), so it must not be
        # joined onto $basedir a second time.
        my ($path) = @_;
        my ($fcount, $dcount, $fsize) = (0, 0, 0);
        opendir my $dh, $path or die "Cannot open directory '$path': $!";
        while (my $fl = readdir($dh)){
            next if $fl eq '.' or $fl eq '..';
            if(-d (my $d = catfile($path, $fl))){
                $dcount++;
                my ($fc, $dc, $fs) = dircount($d);
                $fcount += $fc;
                $dcount += $dc;
                $fsize += $fs;
            } else {
                $fcount++;
                $fsize += -s $d;
            }
        }
        closedir($dh);  # a handle from opendir is closed with closedir, not close
        return ($fcount, $dcount, $fsize);
    }





    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Dec 22, 2005
    #2

  3. Anno Siegel

    Anno Siegel Guest

    IanW <> wrote in comp.lang.perl.misc:
    > I have a chunk of code that counts files.dirs and size for a directory tree.
    > It goes like this:
    >
    > ================
    > #use strict;
    >
    > my $basedir = "j:/files/sw-test";
    > my $fcount = 0;
    > my $fsize = 0;
    > my $dcount = 0;
    > dircount();
    >
    > sub dircount {
    > my($cdir) = shift;
    > $cdir .= "/" if $cdir ne "";
    > my $dh = "DH" . length($cdir);
    > opendir($dh,"$basedir/$cdir");
    > while(my $fl = readdir($dh)){
    > next if $fl =~ /^\.{1,2}$/;
    > if(-d "$basedir/$cdir$fl"){
    > $dcount++;
    > dircount("$cdir$fl");
    > }
    > else{
    > $fcount++;
    > $fsize += -s "$basedir/$cdir$fl";
    > }
    > }
    > close($dh);
    > }
    >
    > print "$fcount files and $dcount directories totalling $fsize bytes in
    > size";
    > ================
    >
    > If I "use strict" it says "Can't use string ("DH0") as a symbol ref while
    > "strict refs" in use at D:\test.pl line 14". What's the best way to get
    > round this, since I need a dynamic dir handle for the routine to work
    > properly.


    Just leave $dh undefined instead of setting it to a string value.
    opendir() will then create an anonymous directory handle. So change

    my $dh = "DH" . length($cdir);

    to

    my $dh;
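
    To see why the undefined lexical is enough, here is a small sketch (the sub name count_entries and the throwaway test tree are hypothetical). Each call declares its own $dh, so an inner opendir cannot clobber the handle an outer call is still reading from.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical recursive counter: every invocation declares a fresh
# lexical $dh, which opendir fills in with a new anonymous handle.
sub count_entries {
    my ($dir) = @_;
    my $count = 0;

    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        $count++;
        my $full = "$dir/$entry";
        $count += count_entries($full) if -d $full;
    }
    closedir($dh);   # directory handles are closed with closedir
    return $count;
}

# Build a small temporary tree: a/, a/b/, top.txt, a/b/deep.txt
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b");
for my $name ("$root/top.txt", "$root/a/b/deep.txt") {
    open my $fh, '>', $name or die "Cannot create '$name': $!";
    close $fh;
}

print count_entries($root), "\n";   # prints 4
```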

    Also, you call dircount() without an argument. Presumably you wanted
    to say

    dircount( $basedir);

    A better solution would be to use the standard module File::Find:

    use File::Find;

    sub dircount {
        my $cdir = shift;
        find sub {
            if ( -d ) {
                ++ $dcount;
            } else {
                ++ $fcount;
                $fsize += -s;
            }
        }, $cdir;
    }

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Dec 22, 2005
    #3
  4. IanW <> wrote:

    > I have a chunk of code that counts files.dirs and size for a directory tree.



    You have a whole bunch of problems, some big, some small.

    I'll mention the lesser ones in comments about your code below,
    but the 3 big ones are:

    1) You should always enable warnings (and strict) when
    developing Perl code.

    2) You get a dynamic dirhandle the same way you get a dynamic
    filehandle, so:

    perldoc -q filehandle

    How can I make a filehandle local to a subroutine? How do I pass file-
    handles between subroutines? How do I make an array of filehandles?

    3) There is an already-invented (and tested) wheel for doing
    recursive directory searching, the File::Find module.

    You can read the module's docs with:

    perldoc File::Find


    >================
    > #use strict;



    You lose all of the benefits of that statement when you comment it out!


    > my $basedir = "j:/files/sw-test";
    > my $fcount = 0;
    > my $fsize = 0;
    > my $dcount = 0;
    > dircount();
    >
    > sub dircount {
    > my($cdir) = shift;



    $cdir will be undef for the top-level call.


    > $cdir .= "/" if $cdir ne "";



    You have all of the path components separated already, so I'd
    paste the dir separator in myself on each usage instead of
    burying one inside of a variable's value.


    > my $dh = "DH" . length($cdir);
    > opendir($dh,"$basedir/$cdir");



    You get a dynamic dirhandle when the variable is undef.

    Your variable is not undef, so there is no dynamic dirhandle...


    You should always check the return value to see if you actually
    got what you asked for:

    opendir($dh,"$basedir/$cdir") or die "could not open '$basedir/$cdir' $!";


    > while(my $fl = readdir($dh)){
    > next if $fl =~ /^\.{1,2}$/;
    > if(-d "$basedir/$cdir$fl"){
    > $dcount++;
    > dircount("$cdir$fl");



    If $fl is a symlink to a "higher" directory, then your
    code will go into an infinite loop here.
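
    One way to guard against that, sketched under the assumption that symlinked directories can simply be skipped (count_dirs and the temporary tree are hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: counts subdirectories, refusing to descend into
# symbolic links so a link to a parent directory cannot loop forever.
sub count_dirs {
    my ($dir) = @_;
    my $count = 0;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        my $full = "$dir/$entry";
        next if -l $full;        # skip symlinks entirely
        next unless -d $full;    # only real directories are counted
        $count += 1 + count_dirs($full);
    }
    closedir($dh);
    return $count;
}

my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "Cannot create '$root/sub': $!";
# Create a link that points back at the top of the tree. symlink() is
# not available on every platform, hence the eval.
eval { symlink($root, "$root/sub/loop") };

print count_dirs($root), "\n";   # prints 1: the link is ignored
```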



    Applying the minimum changes to fix (IMO) your code, I get:

    ------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $basedir = '/home/tadmc/temp';
    my $fcount = 0;
    my $fsize = 0;
    my $dcount = 0;
    dircount();

    sub dircount {
        my($cdir) = shift || '';
        opendir(my $dh,"$basedir/$cdir") or die "could not open dir $!";
        while(my $fl = readdir($dh)){
            next if $fl =~ /^\.{1,2}$/;
            if(-d "$basedir/$cdir/$fl"){
                $dcount++;
                dircount("$cdir/$fl");
            }
            else{
                $fcount++;
                $fsize += -s "$basedir/$cdir/$fl";
            }
        }
        closedir($dh);  # dirhandles are closed with closedir(), not close()
    }

    print "$fcount files and $dcount directories totalling $fsize bytes in size\n";
    ------------------------



    Recasting it to use the tried-and-true module, I get:

    ------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;
    use File::Find;

    my $basedir = '/home/tadmc/temp';
    my $fcount = 0;
    my $fsize = 0;
    my $dcount = 0;
    find( \&dircount, $basedir );

    sub dircount {
        return if $_ eq '.' or $_ eq '..';
        $dcount++ if -d;
        return unless -f; # only care about plain files at this point
        $fcount++;
        $fsize += -s;
    }

    print "$fcount files and $dcount directories totalling $fsize bytes in size\n";
    ------------------------


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Dec 22, 2005
    #4
  5. IanW

    IanW Guest

    "A. Sinan Unur" <> wrote in message
    news:Xns97344FEAA98EDasu1cornelledu@127.0.0.1...

    > First off, you are better off doing this using the File::Find module
    > rather than using recursion. If this is not a learning exercise, then I
    > would also urge you to look at File::Find::Rule to simplify matters when
    > processing items one by one.
    >
    >> ================
    >> #use strict;
    >>
    >> my $basedir = "j:/files/sw-test";

    >
    > The program should check @ARGV for an argument, and supply a reasonable
    > default if one is not present.


    It will actually be a subroutine that forms part of a larger CGI script and
    the basedir will actually be passed from a form field.

    >> my $fcount = 0;
    >> my $fsize = 0;
    >> my $dcount = 0;

    >
    > These are values that should be returned by your sub. You might want to
    > check for calling context in the sub and supply an appropriate scalar
    > value.


    I see the way you've done it in the modified code below, however I didn't
    think there was anything wrong with a few global scope vars as long as you
    don't forget you've used them globally and then try and use the same names
    in another unrelated part of the script... but it's not a huge script and I
    can keep track of those things easily enough.

    >> dircount();
    >>
    >> sub dircount {
    >> my($cdir) = shift;
    >> $cdir .= "/" if $cdir ne "";
    >> my $dh = "DH" . length($cdir);

    >
    > Using lexical dirhandles, this should not be necessary.


    lexical is one of those words that I've never got my head round in
    programming terms, but I see in the example you've not given $dh a value,
    which ties in with what Anno says about opendir creating an anonymous
    handle.

    >> opendir($dh,"$basedir/$cdir");

    >
    > You should *always* check if calls to open/opendir succeeded.


    Must admit I get a bit lazy in CGI scripts with that, because to be
    user-friendly, it means more than just adding "die..." bit to the end of the
    open line. I've also never come across a directory or file that wouldn't
    open on any of my scripts..

    >> while(my $fl = readdir($dh)){
    >> next if $fl =~ /^\.{1,2}$/;
    >> if(-d "$basedir/$cdir$fl"){
    >> $dcount++;
    >> dircount("$cdir$fl");
    >> }

    >
    > I do prefer using File::Spec for path manipulation.
    >
    >> If I "use strict" it says "Can't use string ("DH0") as a symbol ref
    >> while "strict refs" in use at D:\test.pl line 14". What's the best way
    >> to get round this, since I need a dynamic dir handle for the routine
    >> to work properly.

    >
    > Here is a revised version of your code. I cannot vouch for its accuracy,
    > since there seems to be discrepancy between the output of this script
    > and that of du on my system (probably because the space taken by zero-
    > length files is not taken to account). As always, corrections welcome:
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > use File::Spec::Functions qw(canonpath catfile);
    >
    > my $basedir = canonpath($ARGV[0] || '.');
    > my ($fcount, $dcount, $fsize) = dircount($basedir);
    >
    >
    > printf("%d files and %d directories totalling %d bytes in size\n",
    > $fcount, $dcount, $fsize);
    >
    > sub dircount {
    > my ($cdir) = @_;
    > my ($fcount, $dcount, $fsize) = (0, 0, 0);


    ahh yes of course :) I was thinking of sth along those lines, though thought
    I might have to use qw//

    > my $path = catfile($basedir, $cdir);
    > opendir my $dh, $path or die "Cannot open directory '$path': $!";
    > while (my $fl = readdir($dh)){
    > next if $fl eq '.' or $fl eq '..';


    is there any reason for doing it that way over my original line using a
    regexp? is it a performance thing?

    > if(-d (my $d = catfile($path, $fl))){
    > $dcount++;
    > my ($fc, $dc, $fs) = dircount($d);
    > $fcount += $fc;
    > $dcount += $dc;
    > $fsize += $fs;


    Would the following work, as a shortened version of those 3 lines?

    ($fcount, $dcount, $fsize) += ($fc, $dc, $fs);

    > } else {
    > $fcount++;
    > $fsize += -s $d;
    > }
    > }
    > close($dh);
    > return ($fcount, $dcount, $fsize);
    > }


    thanks :)

    Ian
     
    IanW, Dec 22, 2005
    #5
  6. IanW

    IanW Guest

    > Just leave $dh undefined instead of setting it to a string value.
    > opendir() will then create an anonymous directory handle. So change
    >
    > my $dh = "DH" . length($cdir);
    >
    > to
    >
    > my $dh;


    I like that solution :)

    > Also, you call dircount() without an argument. Presumably you wanted
    > to say
    >
    > dircount( $basedir);


    well, $basedir is a global var and I put it in all the places it's needed
    anyway

    > A better solution would be to use the standard module File::Find:
    >
    > use File::Find;
    >
    > sub dircount {
    > my $cdir = shift;
    > find sub {
    > if ( -d ) {
    > ++ $dcount;
    > } else {
    > ++ $fcount;
    > $fsize += -s;
    > }
    > }, $cdir;
    > }


    that's very concise, thanks! I looked at the File::Find module docs before
    but the document made my eyes glaze over. I suppose it's one of those
    modules that's really useful once you've taken the time to plow through the
    docs and understand it properly.

    Regards
    Ian
     
    IanW, Dec 22, 2005
    #6
  7. IanW

    IanW Guest

    "Tad McClellan" <> wrote in message
    news:...
    > IanW <> wrote:
    >
    >> I have a chunk of code that counts files.dirs and size for a directory
    >> tree.

    >
    >
    > You have a whole bunch of problems, some big, some small.


    oh dear.. I was worried that might happen! ;)

    > I'll mention the lesser ones in comments about your code below,
    > but the 3 big ones are:
    >
    > 1) You should always enable warnings (and strict) when
    > developing Perl code.
    >
    > 2) You get a dynamic dirhandle the same way you get a dynamic
    > filehandle, so:
    >
    > perldoc -q filehandle
    >
    > How can I make a filehandle local to a subroutine? How do I pass
    > file-
    > handles between subroutines? How do I make an array of filehandles?
    >
    > 3) There is an already-invented (and tested) wheel for doing
    > recursive directory searching, the File::Find module.


    The only thing that sometimes puts me off using modules for relatively
    simple things like this, is that I wonder how much extra resources they use
    or whether they compromise performance in some way. That is, File::Find must
    be quite a sizable module with a stack of function/options, so couldn't that
    mean lots more memory to run, or is that an incorrect presumption?

    > You can read the module's docs with:
    >
    > perldoc File::Find
    >
    >
    >>================
    >> #use strict;

    >
    >
    > You lose all of the benefits of that statement when you comment it out!


    yes, I know - I had it commented out to double check that the script worked
    without use strict.

    >> my $basedir = "j:/files/sw-test";
    >> my $fcount = 0;
    >> my $fsize = 0;
    >> my $dcount = 0;
    >> dircount();
    >>
    >> sub dircount {
    >> my($cdir) = shift;

    >
    >
    > $cdir will be undef for the top-level call.
    >
    >
    >> $cdir .= "/" if $cdir ne "";

    >
    >
    > You have all of the path components separated already, so I'd
    > paste the dir separator in myself on each usage instead of
    > burying one inside of a variable's value.


    there was a reason I did that, but I can't recall what it was now (bear with
    me - it's nearly home-time and my brain is frazzled!)

    >> my $dh = "DH" . length($cdir);
    >> opendir($dh,"$basedir/$cdir");

    >
    >
    > You get a dynamic dirhandle when the variable is undef.
    >
    > Your variable is not undef, so there is no dynamic dirhandle...
    >
    >
    > You should always check the return value to see if you actually
    > got what you asked for:
    >
    > opendir($dh,"$basedir/$cdir") or die "could not open '$basedir/$cdir'
    > $!";
    >
    >
    >> while(my $fl = readdir($dh)){
    >> next if $fl =~ /^\.{1,2}$/;
    >> if(-d "$basedir/$cdir$fl"){
    >> $dcount++;
    >> dircount("$cdir$fl");

    >
    >
    > If $fl is a symlink to a "higher" directory, then your
    > code will go into an infinite loop here.


    it's a script that will only run on my Windows servers, so that wasn't an
    issue

    > Applying the minimum changes to fix (IMO) your code, I get:
    >
    > ------------------------
    > #!/usr/bin/perl
    > use warnings;
    > use strict;
    >
    > my $basedir = '/home/tadmc/temp';
    > my $fcount = 0;
    > my $fsize = 0;
    > my $dcount = 0;
    > dircount();
    >
    > sub dircount {
    > my($cdir) = shift || '';


    that's a neat way of avoiding getting a warning (yes, I did have use
    warnings in there for a while :).. is there any particular reason you use
    single quotes there instead of double quotes? I tend to use "" for pretty
    much everything. Also, I don't ever seem to use "||" - "or" would work as
    well in that scenario wouldn't it?

    > opendir(my $dh,"$basedir/$cdir") or die "could not open dir $!";
    > while(my $fl = readdir($dh)){
    > next if $fl =~ /^\.{1,2}$/;
    > if(-d "$basedir/$cdir/$fl"){
    > $dcount++;
    > dircount("$cdir/$fl");
    > }
    > else{
    > $fcount++;
    > $fsize += -s "$basedir/$cdir/$fl";
    > }
    > }
    > close($dh);
    > }
    >
    > print "$fcount files and $dcount directories totalling $fsize bytes in
    > size\n";
    > ------------------------
    >
    >
    >
    > Recasting it to use the tried-and-true module, I get:
    >
    > ------------------------
    > #!/usr/bin/perl
    > use warnings;
    > use strict;
    > use File::Find;
    >
    > my $basedir = '/home/tadmc/temp';
    > my $fcount = 0;
    > my $fsize = 0;
    > my $dcount = 0;
    > find( \&dircount, $basedir );
    >
    > sub dircount {
    > return if $_ eq '.' or $_ eq '..';
    > $dcount++ if -d;
    > return unless -f; # only care about plain files at this point
    > $fcount++;
    > $fsize += -s;
    > }
    >
    > print "$fcount files and $dcount directories totalling $fsize bytes in
    > size\n";
    > ------------------------


    thanks
    Ian
     
    IanW, Dec 22, 2005
    #7
  8. At 2005-12-22 12:32PM, IanW <> wrote:
    > "Tad McClellan" <> wrote in message
    > news:...
    >
    > >> #use strict;

    > >
    > > You lose all of the benefits of that statement when you comment it out!

    >
    > yes, I know - I had it commented out to double check that the script worked
    > without use strict.


    If your code runs with strict, it will certainly run without.

    [...]
    > > my($cdir) = shift || '';

    >
    > that's a neat way of avoiding getting a warning (yes, I did have use
    > warnings in there for a while :).. is there any particular reason you use
    > single quotes there instead of double quotes? I tend to use "" for pretty
    > much everything. Also, I don't ever seem to use "||" - "or" would work as
    > well in that scenario wouldn't it?


    I use single quotes to remind myself (and perl) that I have a literal
    string that needs no interpolation.

    '||' and 'or' have different operator precedences. Note also that '||'
    has higher precedence than '=' which is higher than 'or'. So,
    my($cdir) = shift || '';
    is the same as
    my($cdir) = (shift || '');

    Test:
    $x = undef || 'alternate';
    print '$x is ', (defined $x ? "'$x'" : 'undefined!'), "\n";

    Conversely,
    my($cdir) = shift or '';
    is the same as
    ( my($cdir) = shift ) or '';
    and thus $cdir may still be undefined.

    Test:
    $y = undef or 'alternate';
    print '$y is ', (defined $y ? "'$y'" : 'undefined!'), "\n";

    Another way of providing default values is the '||=' operator, as in:
    my $cdir = shift;
    $cdir ||= ''; # set cdir to the empty string if previously undefined (or otherwise false).
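
    One caveat with both '||' and '||=': they test for truth, not definedness, so a legitimate argument of 0 would also be replaced by the default. A hedged sketch, assuming Perl 5.10 or later, where the defined-or operators '//' and '//=' are available (the two sub names are made up for the comparison):

```perl
use strict;
use warnings;
use 5.010;   # the // and //= operators need Perl 5.10 or later

# Hypothetical subs for comparison only.
sub with_or      { my $arg = shift; $arg ||= 'default'; return $arg }
sub with_defined { my $arg = shift; $arg //= 'default'; return $arg }

for my $input (undef, 0, '') {
    my $shown = defined $input ? "'$input'" : 'undef';
    printf "%-6s ||= gives '%s', //= gives '%s'\n",
        $shown, with_or($input), with_defined($input);
}
```

    Only the undef case is replaced by '//=', while '||=' replaces all three inputs.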

    --
    Glenn Jackman
    Ulterior Designer
     
    Glenn Jackman, Dec 22, 2005
    #8
  9. Tintin

    Tintin Guest

    "IanW" <> wrote in message
    news:doenus$765$...
    > > 3) There is an already-invented (and tested) wheel for doing
    > > recursive directory searching, the File::Find module.

    >
    > The only thing that sometimes puts me off using modules for relatively
    > simple things like this, is that I wonder how much extra resources they use
    > or whether they compromise performance in some way. That is, File: Find must
    > be quite a sizable module with a stack of function/options, so couldn't that
    > mean lots more memory to run, or is that an incorrect presumption?


    The "overhead" of File::Find is absolutely minuscule. Wouldn't you rather
    write efficient code using tried and true methods than take on the
    "overhead" of hacking your own code?
     
    Tintin, Dec 22, 2005
    #9
  10. Paul Lalli

    Paul Lalli Guest

    IanW wrote:
    > "A. Sinan Unur" <> wrote in message
    > news:Xns97344FEAA98EDasu1cornelledu@127.0.0.1...
    > >> my $fcount = 0;
    > >> my $fsize = 0;
    > >> my $dcount = 0;

    > >
    > > These are values that should be returned by your sub. You might want to
    > > check for calling context in the sub and supply an appropriate scalar
    > > value.

    >
    > I see the way you've done it in the modified code below, however I didn't
    > think there was anything wrong with a few global scope vars as long as you
    > don't forget you've used them globally and then try and use the same names
    > in another unrelated part of the script... but it's not a huge script and I
    > can keep track of those things easily enough.


    You've just listed two conditionals that aren't especially guaranteed,
    and given the proviso that your reasoning is only valid if the script's
    size remains as it is now. This paragraph sounds a lot more like an
    argument *against* doing it the way you did rather than *for*.

    I don't quite get the reasoning behind using poor programming practices
    for "quick and dirty" scripts. Why not just do things the "right" way
    each time? Programming definitely involves developing habits. It's
    much better, in my opinion, to use short scripts to develop *good*
    programming habits.

    > >> dircount();
    > >>
    > >> sub dircount {
    > >> my($cdir) = shift;
    > >> $cdir .= "/" if $cdir ne "";
    > >> my $dh = "DH" . length($cdir);

    > >
    > > Using lexical dirhandles, this should not be necessary.

    >
    > lexical is one of those words that I've never got my head round in
    > programming terms,


    Lexical, at least as far as Perl is concerned at any rate, simply means
    "scope exists only in the physical block in which it was declared". If
    a variable is declared within a block, it is visible only in that
    block, regardless of any other subroutines or control paths called from
    within that block. (Contrast with dynamic scope (such as with local)
    in which the scope of the temporary value extends to any subroutines
    called from within the same block as the declaration).
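
    A small sketch of the difference (all names are made up for the demonstration): 'local' gives a package variable a temporary value that called subs can see, while 'my' creates a new variable that called subs cannot see.

```perl
use strict;
use warnings;

our $level = 'global';             # package variable, can be localized

sub show_level { return $level }   # reads whatever $level currently holds

sub dynamic_demo {
    local $level = 'localized';    # temporary value, visible to callees
    return show_level();
}

sub lexical_demo {
    my $level = 'lexical';         # new variable, visible in this block only
    return show_level();           # show_level still sees the package $level
}

print dynamic_demo(), "\n";   # localized
print lexical_demo(), "\n";   # global
print show_level(),   "\n";   # global (the local value has been undone)
```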

    > Must admit I get a bit lazy in CGI scripts with that, because to be
    > user-friendly, it means more than just adding "die..." bit to the end of the
    > open line.


    Er, does that mean it's more user friendly to let the program attempt
    to read from or write to a possibly-closed filehandle? ;-)

    > I've also never come across a directory or file that wouldn't
    > open on any of my scripts..


    Again, this goes back to developing the right kinds of habits. Just
    because you haven't encountered an error yet is no reason not to guard
    against that error in the future.
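
    Checking the return value does not have to mean dying in the user's face. A rough sketch, assuming a hypothetical helper list_dir that hands the error back to the caller, which can then print a friendly page and log the details:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (arrayref, undef) on success or
# (undef, message) on failure, instead of calling die.
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir)
        or return (undef, "Cannot open directory '$dir': $!");
    my @entries = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
    closedir($dh);
    return (\@entries, undef);
}

my ($entries, $error) = list_dir('/no/such/directory');
if (defined $error) {
    # In a CGI script this is where an error page would be printed.
    print "Sorry, that folder could not be read.\n";
    warn "$error\n";    # details go to STDERR (typically the server's error log)
}
```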

    > > while (my $fl = readdir($dh)){
    > > next if $fl eq '.' or $fl eq '..';

    >
    > is there any reason for doing it that way over my original line using a
    > regexp? is it a performance thing?


    Performance aside, I think this is more readable than the regexp
    equivalent. However. . .

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw/cmpthese/;

    sub re {
        my @files;
        opendir my $dh, '.' or die "Cannot open current directory: $!";
        while (my $file = readdir ($dh)){
            next if $file =~ /^\.{1,2}$/;
            push @files, $file;
        }
    }

    sub eqor {
        my @files;
        opendir my $dh, '.' or die "Cannot open current directory: $!";
        while (my $file = readdir ($dh)){
            next if $file eq '.' or $file eq '..';
            push @files, $file;
        }
    }

    cmpthese(10000, {Regexp=>\&re, Equality=>\&eqor} );
    __END__
    Benchmark: timing 10000 iterations of Equality, Regexp...
      Equality: 15 wallclock secs (12.72 usr + 2.44 sys = 15.16 CPU) @ 659.63/s (n=10000)
        Regexp: 16 wallclock secs (13.20 usr + 2.64 sys = 15.84 CPU) @ 631.31/s (n=10000)
               Rate   Regexp Equality
    Regexp    631/s       --      -4%
    Equality  660/s       4%       --

    Obviously, a rather minuscule benefit...

    >
    > > if(-d (my $d = catfile($path, $fl))){
    > > $dcount++;
    > > my ($fc, $dc, $fs) = dircount($d);
    > > $fcount += $fc;
    > > $dcount += $dc;
    > > $fsize += $fs;

    >
    > Would the following work, a a shortened version of those 3 lines?
    >
    > ($fcount, $dcount, $fsize) += ($fc, $dc, $fs);


    Why ask if something would work? Why not try it for yourself and see?

    (The answer is "no", however. += expects a scalar on each side. Read
    perldoc perlop to see what the comma operator does in scalar context,
    and see if you can use that to predict the results).

    For syntax similar to what you'd like that to do, check out the
    pairwise() function in the List::MoreUtils module from CPAN
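
    If installing a CPAN module is not an option, the same element-wise addition can be sketched in core Perl by keeping the totals in arrays. The values below are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical running totals and the values returned by one recursive call.
my @totals = (10, 2, 4096);   # files, directories, bytes
my @found  = (3, 1, 512);

# Add element by element; the index loop does by hand what a
# pairwise-style helper would do.
$totals[$_] += $found[$_] for 0 .. $#found;

print "@totals\n";   # prints: 13 3 4608
```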

    Paul Lalli
     
    Paul Lalli, Dec 22, 2005
    #10
  11. Paul Lalli

    Paul Lalli Guest

    IanW wrote:
    > > use File::Find;
    > >
    > > sub dircount {
    > > my $cdir = shift;
    > > find sub {
    > > if ( -d ) {
    > > ++ $dcount;
    > > } else {
    > > ++ $fcount;
    > > $fsize += -s;
    > > }
    > > }, $cdir;
    > > }

    >
    > that's very concise, thanks! I looked at the File:Find module docs before
    > but the document made my eyes glaze over. I suppose it's one of those
    > modules that really useful once you've taken teh time to plow through the
    > docs and understand it properly.


    File::Find is one of those modules that looks a lot more complicated
    than it is. You only really need to know 3 things to use it:
    (1) It exports one function, find(), which takes a subroutine and a
    list of directories to recurse.
    (2) find() will recurse each of the directories, calling that
    subroutine once for each and every file and directory found in the list
    of directories you provided
    (3) Within that subroutine, $_ is the name of current file it's looking
    at, $File::Find::name is the full path of that file, and
    $File::Find::dir is the directory containing that file.

    Once you've got those three facts set straight, you just have to write
    the subroutine that you want called for each file. The subroutine
    should do whatever manipulations or storing you want to happen.
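
    Those three facts can be seen in a short, self-contained sketch. The temporary tree and the decision not to count the starting directory are assumptions made for the example; find(), $_, $File::Find::name and $File::Find::dir are the module's own.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a tiny throwaway tree so the sketch is self-contained.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "Cannot create '$root/sub': $!";
open my $fh, '>', "$root/sub/note.txt" or die "Cannot create file: $!";
print {$fh} "hello";
close $fh;

my ($fcount, $dcount, $fsize) = (0, 0, 0);

find(sub {
    # $_ is the bare name, $File::Find::dir the containing directory,
    # $File::Find::name the two joined together.
    print "$File::Find::dir | $_ | $File::Find::name\n";
    return if $File::Find::name eq $root;   # don't count the starting directory
    if (-d) { $dcount++ }
    else    { $fcount++; $fsize += -s $_ }
}, $root);

print "$fcount files and $dcount directories totalling $fsize bytes\n";
```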

    (There is also a CPAN module, File::Find::Rule which supposedly makes
    File::Find easier to use and/or comprehend, but I can't say I
    personally have ever found it particularly necessary).

    Paul Lalli
     
    Paul Lalli, Dec 22, 2005
    #11
  12. Paul Lalli

    Paul Lalli Guest

    IanW wrote:
    > "Tad McClellan" <> wrote in message
    > news:...
    > > IanW <> wrote:
    > > 3) There is an already-invented (and tested) wheel for doing
    > > recursive directory searching, the File::Find module.

    >
    > The only thing that sometimes puts me off using modules for relatively
    > simple things like this, is that I wonder how much extra resources they use
    > or whether they compromise performance in some way. That is, File: Find must
    > be quite a sizable module with a stack of function/options, so couldn't that
    > mean lots more memory to run, or is that an incorrect presumption?


    Why presume anything? There exist tools to determine this sort of
    thing. Check out the Benchmark and Devel::DProf modules, and make the
    actual comparisons. My guess (because I haven't written the comparisons
    myself) is that the "overhead" of using the File::Find module will be
    minuscule in comparison to the overhead of writing new, possibly buggy
    code that you must maintain.

    > >>================
    > >> #use strict;

    > >
    > > You lose all of the benefits of that statement when you comment it out!

    >
    > yes, I know - I had it commented out to double check that the script worked
    > without use strict.


    Er. You have a misunderstanding about use strict. If code works with
    use strict, by definition it will work without it. (The inverse,
    however, is completely false).

    use strict; does three things:
    1) Prevents you from using a package variable without fully qualifying
    it (saying $main::foo rather than $foo), or pre-declaring it with our.
    Since your strict-compliant code is obviously not doing that, removing
    that restriction can't hurt you.
    2) Prevents you from using symbolic references ($x = 'foo'; $$x =
    'bar'; sets $foo equal to 'bar'). Again, removing the restriction
    against something you can't be doing can't have any effect.
    3) Prevents you from using barewords as a string ($x = Hello; will set
    $x to 'Hello' if no &Hello subroutine exists).

    So strictures are really just restrictions against relatively unsafe
    programming practices. By removing the restrictions, you lose the
    checks that you're not doing anything unsafe, but you don't change
    anything about the code you've already written.
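
    The second stricture is the one behind the "DH0" error in the original question. A minimal sketch (variable names are invented) showing the same statement refused under strict and allowed once that single restriction is lifted:

```perl
use strict;
use warnings;

our $foo = 'original';
my $name = 'foo';

# With strict refs in force, using a string as a variable name dies at
# run time; eval catches the error so the script can report it.
my $ok = eval { ${$name} = 'changed'; 1 };
print "under strict: ", ($ok ? "allowed\n" : "refused: $@");

# Lifting only that one stricture, in the smallest possible block,
# permits the symbolic reference.
{
    no strict 'refs';
    ${$name} = 'changed';
}
print "after 'no strict refs': \$foo is '$foo'\n";
```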


    > > sub dircount {
    > > my($cdir) = shift || '';

    >
    > that's a neat way of avoiding getting a warning (yes, I did have use
    > warnings in there for a while :).. is there any particular reason you use
    > single quotes there instead of double quotes? I tend to use "" for pretty
    > much everything.


    Single quotes tell the interpreter and the reader that nothing in the
    enclosed string needs to be interpreted. This creates (1) a
    minuscule performance boost, and (2) a non-trivial readability boost.
    Double quotes, conversely, serve as a visual clue to the reader that
    there is a variable or escape sequence within the enclosed string that
    should be taken note of.
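
    A small sketch of the difference, using a made-up variable $user. The
    same characters are typed between both pairs of quotes, but only the
    double-quoted string is interpreted:

```perl
use strict;
use warnings;

my $user = 'ian';

my $literal      = 'Hello $user\n';   # single quotes: kept exactly as typed
my $interpolated = "Hello $user\n";   # double quotes: $user and \n are expanded

print "literal:      [$literal]\n";
print "interpolated: [$interpolated]\n";
print "lengths: ", length($literal), " vs ", length($interpolated), "\n";
```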

    > Also, I don't ever seem to use "||" - "or" would work as
    > well in that scenario wouldn't it?


    No. || and or are *functionally* equivalent, but differ by precedence.
    Observe:
    $ perl -MO=Deparse,-p -e'my $foo = shift || default()'
    (my $foo = (shift(@ARGV) || default()));
    -e syntax OK
    $ perl -MO=Deparse,-p -e'my $foo = shift or default()'
    ((my $foo = shift(@ARGV)) or default());
    -e syntax OK

    As you can see, the first one assigns to $foo the value of the
    expression (shift(@ARGV) || default()). The second one assigns to $foo
    the value of shift(@ARGV) alone. If that value is false, default() is
    then evaluated, but its return value is *not* assigned to anything.
    $foo still has whatever shift returned.
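
    The same effect can be seen at run time. This is only a sketch: the sub
    names with_double_pipe and with_or are invented for the comparison.

```perl
use strict;
use warnings;

sub with_double_pipe {
    my $dir = shift || 'default';   # parsed as: my $dir = (shift || 'default')
    return $dir;
}

sub with_or {
    no warnings 'void';             # the discarded constant would otherwise warn
    my $dir = shift or 'default';   # parsed as: (my $dir = shift) or 'default'
    return $dir;
}

my $from_pipe = with_double_pipe();   # called with no arguments
my $from_or   = with_or();            # called with no arguments

print '|| gives: ', (defined $from_pipe ? "'$from_pipe'" : 'undef'), "\n";
print 'or gives: ', (defined $from_or   ? "'$from_or'"   : 'undef'), "\n";
```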

    Paul Lalli
     
    Paul Lalli, Dec 22, 2005
    #12
  13. IanW <> wrote:


    > Must admit I get a bit lazy in CGI scripts



    So CGI programming is a hobby for you rather than a profession?

    Being lazy at your job is Not Good. :)


    > I've also never come across a directory or file that wouldn't
    > open on any of my scripts..



    I've never been in a car accident, so I don't need seat belts. Right?


    > I was thinking of sth along those lines



    s/sth/something/;

    Please don't use "cutsie" spellings in Usenet posts.

    It is inconsiderate of folks whose first language is not English.


    >> next if $fl eq '.' or $fl eq '..';

    >
    > is there any reason for doing it that way over my original line using a
    > regexp?



    Yes, the same reason that you should be applying to all the code
    you write: it is easier to read and understand.

    Optimize for labor, optimize for labor, optimize for labor.


    > is it a performance thing?



    Yes, your maintenance programmer will perform better.

    (and it will execute faster, but that is almost never a
    valid consideration in this day and age.)


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Dec 23, 2005
    #13
  14. IanW <> wrote:
    > "Tad McClellan" <> wrote in message
    > news:...



    >> 3) There is an already-invented (and tested) wheel for doing
    >> recursive directory searching, the File::Find module.

    >
    > The only thing that sometimes puts me off using modules for relatively
    > simple things like this, is that I wonder how much extra resources they use
    > or whether they compromise performance in some way.



    Cost to spend extra CPU cycles: $0.000001

    Cost to develop code that saves those cycles: $1000.00

    Your program will have to execute an awfully large number of
    times for your approach to be economical.

    Having a room full of programmers working on shaving off a few
    cycles or bytes was commonplace in the '70s, but nowadays cycles
    are cheap, RAM is cheap, what is expensive is your salary.

    (though payday may make you argue that you are not expensive enough. :)


    > That is, File::Find must
    > be quite a sizable module with a stack of function/options, so couldn't that
    > mean lots more memory to run,



    Does your application have to run on a cell phone or some other
    place where RAM costs a premium?


    > or is that an incorrect presumption?



    It was correct 30 years ago, but it has changed due to Moore's Law.


    >> If $fl is a symlink to a "higher" directory, then your
    >> code will go into an infinite loop here.

    >
    > it's a script that will only run on my Windows servers, so that wasn't an
    > issue



    Windows does not have symbolic links?


    > is there any particular reason you use
    > single quotes there instead of double quotes?



    Yes, I use single quotes unless I require one of the two extra
    things (escapes and interpolation) that double quotes brings with it.


    > I tend to use "" for pretty
    > much everything.



    Some strings contain variables, some strings don't.

    During debugging, you are very often looking for variables.

    You have to examine _every_ string, looking for variables.

    I get to skip careful examination of many strings, because
    they have been marked "no variables here" by the single quotes.

    ie. it enables faster debugging.


    > Also, I don't ever seem to use "||" - "or" would work as
    > well in that scenario wouldn't it?



    What happened when you tried it?


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Dec 23, 2005
    #14
  15. IanW

    IanW Guest

    "Tad McClellan" <> wrote in message
    news:...
    > IanW <> wrote:
    >> I was thinking of sth along those lines

    >
    > s/sth/something/;
    >
    > Please don't use "cutsie" spellings in Usenet posts.
    >
    > It is inconsiderate of folks whose first language is not English.


    As is using the word "cutsie", which I always thought was spelt "cutesy",
    and since such folks may have to use a dictionary for a word of less
    frequent usage like that, it would help them if you spell it correctly ;-)

    >>> next if $fl eq '.' or $fl eq '..';

    >>
    >> is there any reason for doing it that way over my original line using a
    >> regexp?

    >
    >
    > Yes, the same reason that you should be applying to all the code
    > you write: it is easier to read and understand.


    I find a short regexp like that just as easy to understand as the non-regexp
    version

    > Optimize for labor, optimize for labor, optimize for labor.
    >
    >
    >> is it a performance thing?

    >
    >
    > Yes, your maintenance programmer will perform better.


    hehe, that's me in this case

    Ian
     
    IanW, Dec 23, 2005
    #15
  16. IanW

    IanW Guest

    "Paul Lalli" <> wrote in message
    news:...
    > IanW wrote:
    >> I see the way you've done it in the modified code below, however I didn't
    >> think there was anything wrong with a few global scope vars as long as
    >> you
    >> don't forget you've used them globally and then try and use the same
    >> names
    >> in another unrelated part of the script... but it's not a huge script and
    >> I
    >> can keep track of those things easily enough.

    >
    > You've just listed two conditionals that aren't especially guaranteed,
    > and given the proviso that your reasoning is only valid if the script's
    > size remains as it is now. This paragraph sounds a lot more like an
    > argument *against* doing it the way you did rather than *for*.
    >
    > I don't quite get the reasoning behind using poor programming practices
    > for "quick and dirty" scripts. Why not just do things the "right" way
    > each time? Programming definitely involves developing habits. It's
    > much better, in my opinion, to use short scripts to develop *good*
    > programming habits.


    OK fair enough, I'll work on that habit!

    >> > Using lexical dirhandles, this should not be necessary.

    >>
    >> lexical is one of those words that I've never got my head round in
    >> programming terms,

    >
    > Lexical, at least as far as Perl is concerned at any rate, simply means
    > "scope exists only in the physical block in which it was declared". If


    Oh, I see.. anything declared with "my" then...
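
    That includes directory handles. A minimal sketch of the recursive
    counter with a lexical handle, assuming the counts are returned rather
    than kept in globals. The name $full and the symlink check are
    additions for illustration, not taken from the code posted earlier in
    the thread.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# Returns (file count, directory count, total bytes) for the tree under $path.
sub dircount {
    my ($path) = @_;
    my ($fcount, $dcount, $fsize) = (0, 0, 0);

    # "my $dh" gives every call its own handle, so a recursive call
    # cannot clobber the handle its caller is still reading from.
    opendir(my $dh, $path) or die "Cannot open directory '$path': $!";
    while (defined(my $fl = readdir $dh)) {
        next if $fl eq '.' or $fl eq '..';
        my $full = catfile($path, $fl);
        if (-d $full && !-l $full) {       # do not descend into symlinks
            $dcount++;
            my ($fc, $dc, $fs) = dircount($full);
            $fcount += $fc;
            $dcount += $dc;
            $fsize  += $fs;
        }
        else {
            $fcount++;
            $fsize += (-s $full) || 0;     # -s yields undef for unreadable entries
        }
    }
    closedir $dh;
    return ($fcount, $dcount, $fsize);
}

# Usage: perl dircount.pl some/dir [another/dir ...]
for my $start (@ARGV) {
    my ($files, $dirs, $bytes) = dircount($start);
    print "$start: $files files and $dirs directories totalling $bytes bytes\n";
}
```

    No handle name has to be built from a string, so the "strict refs"
    error from the original question does not arise.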

    > cmpthese(10000, {Regexp=>\&re, Equality=>\&eqor} );
    > __END__
    > Benchmark: timing 10000 iterations of Equality, Regexp...
    > Equality: 15 wallclock secs (12.72 usr + 2.44 sys = 15.16 CPU) @
    > 659.63/s (n=10000)
    > Regexp: 16 wallclock secs (13.20 usr + 2.64 sys = 15.84 CPU) @
    > 631.31/s (n=10000)
    > Rate Regexp Equality
    > Regexp 631/s -- -4%
    > Equality 660/s 4% --
    >
    > Obviously, a rather minuscule benefit...


    thanks, negligible indeed. that cmpthese function looks useful :)
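
    For reference, a self-contained sketch of such a comparison. The bodies
    of the subs benchmarked in the quoted post are not shown in this part
    of the thread, so by_regexp, by_equality and the @names list below are
    stand-ins, and the timings will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @names = ('.', '..', 'readme.txt', '.profile', 'data');

# Count the entries that are not "." or "..", using the regexp test.
sub by_regexp {
    my $kept = 0;
    for my $fl (@names) {
        next if $fl =~ /^\.{1,2}$/;
        $kept++;
    }
    return $kept;
}

# Same count, using two string comparisons.
sub by_equality {
    my $kept = 0;
    for my $fl (@names) {
        next if $fl eq '.' or $fl eq '..';
        $kept++;
    }
    return $kept;
}

# A negative count asks Benchmark to run each sub for at least that
# many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, { Regexp => \&by_regexp, Equality => \&by_equality });
```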

    >> > if(-d (my $d = catfile($path, $fl))){
    >> > $dcount++;
    >> > my ($fc, $dc, $fs) = dircount($d);
    >> > $fcount += $fc;
    >> > $dcount += $dc;
    >> > $fsize += $fs;

    >>
    >> Would the following work, as a shortened version of those 3 lines?
    >>
    >> ($fcount, $dcount, $fsize) += ($fc, $dc, $fs);

    >
    > Why ask if something would work? Why not try it for yourself and see?
    >
    > (The answer is "no", however. += expects a scalar on each side. Read
    > perldoc perlop to see what the comma operator does in scalar context,
    > and see if you can use that to predict the results).
    >
    > For syntax similar to what you'd like that to do, check out the
    > pairwise() function in the List::MoreUtils module from CPAN


    I suppose there's always the obvious:

    ($fcount, $dcount, $fsize) = ($fcount+$fc, $dcount+$dc, $fsize+$fs);

    but it's more typing!
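
    Another option that needs only core Perl is to let a foreach loop alias
    each total in turn. A sketch with made-up numbers, where @sub stands
    for the list a recursive dircount() call would return:

```perl
use strict;
use warnings;

my ($fcount, $dcount, $fsize) = (10, 2, 4096);   # running totals (made-up values)
my @sub = (3, 1, 1024);                          # sub-totals from a recursive call

# foreach aliases $_ to each variable, so += changes the variable itself.
my $i = 0;
$_ += $sub[$i++] for ($fcount, $dcount, $fsize);

print "$fcount $dcount $fsize\n";
```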

    Ian
     
    IanW, Dec 23, 2005
    #16
  17. IanW

    IanW Guest

    "Paul Lalli" <> wrote in message
    news:...
    > IanW wrote:
    >> > use File::Find;
    >> >
    >> > sub dircount {
    >> > my $cdir = shift;
    >> > find sub {
    >> > if ( -d ) {
    >> > ++ $dcount;
    >> > } else {
    >> > ++ $fcount;
    >> > $fsize += -s;
    >> > }
    >> > }, $cdir;
    >> > }

    >>
    >> that's very concise, thanks! I looked at the File::Find module docs before
    >> but the document made my eyes glaze over. I suppose it's one of those
    >> modules that's really useful once you've taken the time to plow through the
    >> docs and understand it properly.

    >
    > File::Find is one of those modules that looks a lot more complicated
    > than it is. You only really need to know 3 things to use it:
    > (1) It exports one function, find(), which takes a subroutine and a
    > list of directories to recurse.
    > (2) find() will recurse each of the directories, calling that
    > subroutine once for each and every file and directory found in the list
    > of directories you provided
    > (3) Within that subroutine, $_ is the name of current file it's looking
    > at, $File::Find::name is the full path of that file, and
    > $File::Find::dir is the directory containing that file.
    >
    > Once you've got those three facts set straight, you just have to write
    > the subroutine that you want called for each file. The subroutine
    > should do whatever manipulations or storing you want to happen.


    A clear & concise summary like that in the documentation would be a benefit!
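
    Putting those three facts together, a minimal sketch of the counter
    built on find(). The sub name count_tree is invented here, and the
    totals are returned instead of being kept in globals:

```perl
use strict;
use warnings;
use File::Find;

# Returns (file count, directory count, total bytes) for the tree under $top.
sub count_tree {
    my ($top) = @_;
    my ($fcount, $dcount, $fsize) = (0, 0, 0);

    find(sub {
        # find() has chdir'ed into the containing directory, so $_ is a
        # bare name. The starting directory itself arrives as '.'.
        return if $_ eq '.';
        if (-d $_) {
            $dcount++;
        }
        else {
            $fcount++;
            $fsize += (-s $_) || 0;
        }
    }, $top);

    return ($fcount, $dcount, $fsize);
}

# Usage: perl count_tree.pl some/dir [another/dir ...]
for my $start (@ARGV) {
    my ($files, $dirs, $bytes) = count_tree($start);
    print "$start: $files files and $dirs directories totalling $bytes bytes\n";
}
```

    By default find() does not follow symbolic links, so a link pointing
    back up the tree is visited once rather than recursed into.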

    Thanks
    Ian
     
    IanW, Dec 23, 2005
    #17
  18. IanW

    IanW Guest

    "Glenn Jackman" <> wrote in message
    news:...
    > At 2005-12-22 12:32PM, IanW <> wrote:
    >> "Tad McClellan" <> wrote in message
    >> news:...
    >>
    >> >> #use strict;
    >> >
    >> > You lose all of the benefits of that statement when you comment it out!

    >>
    >> yes, I know - I had it commented out to double check that the script
    >> worked
    >> without use strict.

    >
    > If your code runs with strict, it will certainly run without.


    yes, when I originally wrote the script in a test file I forgot to put the
    use strict in it

    > [...]
    >> > my($cdir) = shift || '';

    >>
    >> that's a neat way of avoiding getting a warning (yes, I did have use
    >> warnings in there for a while :).. is there any particular reason you
    >> use
    >> single quotes there instead of double quotes? I tend to use "" for
    >> pretty
    >> much everything. Also, I don't ever seem to use "||" - "or" would work
    >> as
    >> well in that scenario wouldn't it?

    >
    > I use single quotes to remind myself (and perl) that I have a literal
    > string that needs no interpolation.


    that sounds like another good habit to adopt..

    > '||' and 'or' have different operator precedences. Note also that '||'
    > has higher precedence than '=' which is higher than 'or'. So,
    > my($cdir) = shift || '';
    > is the same as
    > my($cdir) = (shift || '');
    >
    > Test:
    > $x = undef || 'alternate';
    > print '$x is ', (defined $x ? "'$x'" : 'undefined!'), "\n";
    >
    > Conversely,
    > my($cdir) = shift or '';
    > is the same as
    > ( my($cdir) = shift ) or '';
    > and thus $cdir may still be undefined.
    >
    > Test:
    > $y = undef or 'alternate';
    > print '$y is ', (defined $y ? "'$y'" : 'undefined!'), "\n";
    >
    > Another way of providing default values is the '||=' operator, as in:
    > my $cdir = shift;
    > $cdir ||= ''; # set cdir to the empty string if it was undefined (or any other false value).


    OK, got that.
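
    One caveat worth sketching: '||=' replaces any false value, not only
    undef. Perl 5.10 and later also provide the defined-or operator '//='
    which replaces undef alone. The sub names and the $depth parameter
    below are invented for the illustration.

```perl
use strict;
use warnings;

sub with_or_assign {
    my $depth = shift;
    $depth ||= 5;      # replaces ANY false value: undef, 0, '0' and ''
    return $depth;
}

sub with_defined_or {
    my $depth = shift;
    $depth //= 5;      # replaces undef only (needs Perl 5.10 or later)
    return $depth;
}

print "||= with 0:       ", with_or_assign(0),  "\n";   # the caller's 0 is overwritten
print "//= with 0:       ", with_defined_or(0), "\n";   # the caller's 0 survives
print "//= with nothing: ", with_defined_or(),  "\n";   # the default is used
```

    For a directory name the difference rarely matters, but it does for
    numeric arguments where 0 is a legitimate value.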

    Thanks
    Ian
     
    IanW, Dec 23, 2005
    #18
  19. On 22 Dec 2005 11:21:24 -0800, in comp.lang.perl.misc , "Paul Lalli"
    <> in
    <> wrote:

    [snip]

    >I don't quite get the reasoning behind using poor programming practices
    >for "quick and dirty" scripts. Why not just do things the "right" way
    >each time? Programming definitely involves developing habits. It's
    >much better, in my opinion, to use short scripts to develop *good*
    >programming habits.


    I agree with you, mostly. But I always wonder about commenting. I want
    to comment when the code is fresh, but not when it is being tried
    since half that code gets tossed quickly. In a perfect world I would
    do my commenting at the end of each session/sub-session.

    [snip]


    --
    Matt Silberstein

    Do something today about the Darfur Genocide

    http://www.beawitness.org
    http://www.darfurgenocide.org
    http://www.savedarfur.org

    "Darfur: A Genocide We can Stop"
     
    Matt Silberstein, Dec 26, 2005
    #19