Dynamic directory handles?

IanW · Dec 22, 2005

I have a chunk of code that counts files, dirs and size for a directory tree.
It goes like this:

================
#use strict;

my $basedir = "j:/files/sw-test";
my $fcount = 0;
my $fsize = 0;
my $dcount = 0;
dircount();

sub dircount {
    my($cdir) = shift;
    $cdir .= "/" if $cdir ne "";
    my $dh = "DH" . length($cdir);
    opendir($dh,"$basedir/$cdir");
    while(my $fl = readdir($dh)){
        next if $fl =~ /^\.{1,2}$/;
        if(-d "$basedir/$cdir$fl"){
            $dcount++;
            dircount("$cdir$fl");
        }
        else{
            $fcount++;
            $fsize += -s "$basedir/$cdir$fl";
        }
    }
    close($dh);
}

print "$fcount files and $dcount directories totalling $fsize bytes in size";
================

If I "use strict" it says "Can't use string ("DH0") as a symbol ref while
"strict refs" in use at D:\test.pl line 14". What's the best way to get
round this, since I need a dynamic dir handle for the routine to work
properly.

Thanks
Ian

A. Sinan Unur · Dec 22, 2005

I have a chunk of code that counts files, dirs

First off, you are better off doing this using the File::Find module
rather than using recursion. If this is not a learning exercise, then I
would also urge you to look at File::Find::Rule to simplify matters when
processing items one by one.

================
#use strict;

my $basedir = "j:/files/sw-test";

The program should check @ARGV for an argument, and supply a reasonable
default if one is not present.

my $fcount = 0;
my $fsize = 0;
my $dcount = 0;

These are values that should be returned by your sub. You might want to
check for calling context in the sub and supply an appropriate scalar
value.

dircount();

sub dircount {
my($cdir) = shift;
$cdir .= "/" if $cdir ne "";
my $dh = "DH" . length($cdir);

Using lexical dirhandles, this should not be necessary.

opendir($dh,"$basedir/$cdir");

You should *always* check if calls to open/opendir succeeded.

while(my $fl = readdir($dh)){
next if $fl =~ /^\.{1,2}$/;
if(-d "$basedir/$cdir$fl"){
$dcount++;
dircount("$cdir$fl");
}

I do prefer using File::Spec for path manipulation.

If I "use strict" it says "Can't use string ("DH0") as a symbol ref
while "strict refs" in use at D:\test.pl line 14". What's the best way
to get round this, since I need a dynamic dir handle for the routine
to work properly.

Here is a revised version of your code. I cannot vouch for its accuracy,
since there seems to be a discrepancy between the output of this script
and that of du on my system (probably because the space taken by
zero-length files is not taken into account). As always, corrections welcome:

#!/usr/bin/perl

use strict;
use warnings;

use File::Spec::Functions qw(canonpath catfile);

my $basedir = canonpath($ARGV[0] || '.');
my ($fcount, $dcount, $fsize) = dircount($basedir);

printf("%d files and %d directories totalling %d bytes in size\n",
    $fcount, $dcount, $fsize);

sub dircount {
    my ($cdir) = @_;
    my ($fcount, $dcount, $fsize) = (0, 0, 0);
    my $path = $cdir;    # already a full path: the first call passes $basedir itself
    opendir my $dh, $path or die "Cannot open directory '$path': $!";
    while (my $fl = readdir($dh)){
        next if $fl eq '.' or $fl eq '..';
        if(-d (my $d = catfile($path, $fl))){
            $dcount++;
            my ($fc, $dc, $fs) = dircount($d);
            $fcount += $fc;
            $dcount += $dc;
            $fsize += $fs;
        } else {
            $fcount++;
            $fsize += -s $d;
        }
    }
    closedir($dh);    # directory handles are closed with closedir(), not close()
    return ($fcount, $dcount, $fsize);
}

Anno Siegel · Dec 22, 2005

IanW said:
I have a chunk of code that counts files, dirs and size for a directory tree.
It goes like this:

================
#use strict;

my $basedir = "j:/files/sw-test";
my $fcount = 0;
my $fsize = 0;
my $dcount = 0;
dircount();

sub dircount {
my($cdir) = shift;
$cdir .= "/" if $cdir ne "";
my $dh = "DH" . length($cdir);
opendir($dh,"$basedir/$cdir");
while(my $fl = readdir($dh)){
next if $fl =~ /^\.{1,2}$/;
if(-d "$basedir/$cdir$fl"){
$dcount++;
dircount("$cdir$fl");
}
else{
$fcount++;
$fsize += -s "$basedir/$cdir$fl";
}
}
close($dh);
}

print "$fcount files and $dcount directories totalling $fsize bytes in size";
================

If I "use strict" it says "Can't use string ("DH0") as a symbol ref while
"strict refs" in use at D:\test.pl line 14". What's the best way to get
round this, since I need a dynamic dir handle for the routine to work
properly.

Just leave $dh undefined instead of setting it to a string value.
opendir() will then create an anonymous directory handle. So change

my $dh = "DH" . length($cdir);

to

my $dh;
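A minimal sketch of the difference, assuming a throwaway script run from any readable directory (the variable names are only for illustration):

```perl
use strict;
use warnings;

# A string used as a handle name is a symbolic reference,
# which "strict refs" rejects at run time.
my $named = 'DH0';
my $ok    = eval { opendir($named, '.'); 1 };
my $error = $ok ? '' : $@;
print "string handle: ", ($ok ? "opened\n" : "refused: $error");

# An undefined lexical is filled in by opendir() with a fresh
# anonymous handle, so every recursive call gets its own.
my $dh;
opendir($dh, '.') or die "Cannot open '.': $!";
print "lexical handle is a ", ref($dh), " reference\n";
closedir($dh);
```

Because $dh is declared with my inside the sub, each level of recursion holds a separate handle and there is no name to invent.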

Also, you call dircount() without an argument. Presumably you wanted
to say

dircount( $basedir);

A better solution would be to use the standard module File::Find:

use File::Find;

sub dircount {
    my $cdir = shift;
    find sub {
        if ( -d ) {
            ++ $dcount;
        } else {
            ++ $fcount;
            $fsize += -s;
        }
    }, $cdir;
}
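For anyone who wants to try this in isolation, here is a self-contained sketch along the same lines. It is only an illustration: it returns the totals instead of updating the three file-scoped counters, skips the starting directory itself, and builds a small hypothetical test tree with File::Temp so it can be run anywhere.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Count files, directories and bytes below $top and return the totals.
sub dircount {
    my ($top) = @_;
    my ($fcount, $dcount, $fsize) = (0, 0, 0);
    find(sub {
        return if $_ eq '.';      # do not count the starting directory itself
        if (-d) {
            $dcount++;
        }
        else {
            $fcount++;
            $fsize += -s _;       # reuse the stat data from the -d test
        }
    }, $top);
    return ($fcount, $dcount, $fsize);
}

# Hypothetical test tree: one subdirectory and two small files.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir failed: $!";
for my $spec (["$top/a.txt", 'hello'], ["$top/sub/b.txt", 'hi']) {
    open my $fh, '>', $spec->[0] or die "Cannot write $spec->[0]: $!";
    print {$fh} $spec->[1];
    close $fh or die "Cannot close $spec->[0]: $!";
}

my ($files, $dirs, $bytes) = dircount($top);
print "$files files and $dirs directories totalling $bytes bytes\n";
```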

Anno

Tad McClellan · Dec 22, 2005

IanW said:
I have a chunk of code that counts files, dirs and size for a directory tree.

You have a whole bunch of problems, some big, some small.

I'll mention the lesser ones in comments about your code below,
but the 3 big ones are:

1) You should always enable warnings (and strict) when
developing Perl code.

2) You get a dynamic dirhandle the same way you get a dynamic
filehandle, so:

perldoc -q filehandle

How can I make a filehandle local to a subroutine? How do I pass
filehandles between subroutines? How do I make an array of filehandles?

3) There is an already-invented (and tested) wheel for doing
recursive directory searching, the File::Find module.

You can read the module's docs with:

perldoc File::Find

================
#use strict;

You lose all of the benefits of that statement when you comment it out!

my $basedir = "j:/files/sw-test";
my $fcount = 0;
my $fsize = 0;
my $dcount = 0;
dircount();

sub dircount {
my($cdir) = shift;

$cdir will be undef for the top-level call.

$cdir .= "/" if $cdir ne "";

You have all of the path components separated already, so I'd
paste the dir separator in myself on each usage instead of
burying one inside of a variable's value.

my $dh = "DH" . length($cdir);
opendir($dh,"$basedir/$cdir");

You get a dynamic dirhandle when the variable is undef.

Your variable is not undef, so there is no dynamic dirhandle...

You should always check the return value to see if you actually
got what you asked for:

opendir($dh,"$basedir/$cdir") or die "could not open '$basedir/$cdir' $!";

while(my $fl = readdir($dh)){
next if $fl =~ /^\.{1,2}$/;
if(-d "$basedir/$cdir$fl"){
$dcount++;
dircount("$cdir$fl");

If $fl is a symlink to a "higher" directory, then your
code will go into an infinite loop here.
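One possible guard, sketched here under the assumption that links should simply be skipped rather than followed, is to test each entry with -l before recursing. The sub name and the temporary tree are made up for the example, and on platforms without symlink() the link is just not created:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Recursive directory count that refuses to descend into symbolic links,
# so a link pointing back up the tree cannot cause endless recursion.
sub count_dirs {
    my ($path) = @_;
    my $count = 0;
    opendir(my $dh, $path) or die "could not open '$path': $!";
    while (defined(my $entry = readdir($dh))) {
        next if $entry eq '.' or $entry eq '..';
        my $full = "$path/$entry";
        next if -l $full;             # skip links rather than follow them
        next unless -d $full;
        $count += 1 + count_dirs($full);
    }
    closedir($dh);
    return $count;
}

# Hypothetical tree: top/real plus, where supported, top/real/loop -> top.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/real" or die "mkdir failed: $!";
my $linked = eval { symlink($top, "$top/real/loop") };   # may be unsupported
my $dirs = count_dirs($top);
print "directories found: $dirs\n";
```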

Applying the minimum changes to fix (IMO) your code, I get:

------------------------
#!/usr/bin/perl
use warnings;
use strict;

my $basedir = '/home/tadmc/temp';
my $fcount = 0;
my $fsize = 0;
my $dcount = 0;
dircount();

sub dircount {
    my($cdir) = shift || '';
    opendir(my $dh,"$basedir/$cdir") or die "could not open dir $!";
    while(my $fl = readdir($dh)){
        next if $fl =~ /^\.{1,2}$/;
        if(-d "$basedir/$cdir/$fl"){
            $dcount++;
            dircount("$cdir/$fl");
        }
        else{
            $fcount++;
            $fsize += -s "$basedir/$cdir/$fl";
        }
    }
    closedir($dh);    # closedir(), not close(), for a directory handle
}

print "$fcount files and $dcount directories totalling $fsize bytes in size\n";
------------------------

Recasting it to use the tried-and-true module, I get:

------------------------
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $basedir = '/home/tadmc/temp';
my $fcount = 0;
my $fsize = 0;
my $dcount = 0;
find( \&dircount, $basedir );

sub dircount {
    return if $_ eq '.' or $_ eq '..';
    $dcount++ if -d;
    return unless -f; # only care about plain files at this point
    $fcount++;
    $fsize += -s;
}

print "$fcount files and $dcount directories totalling $fsize bytes in size\n";
------------------------

IanW · Dec 22, 2005

First off, you are better off doing this using the File::Find module
rather than using recursion. If this is not a learning exercise, then I
would also urge you to look at File::Find::Rule to simplify matters when
processing items one by one.

The program should check @ARGV for an argument, and supply a reasonable
default if one is not present.

It will actually be a subroutine that forms part of a larger CGI script and
the basedir will actually be passed from a form field.

These are values that should be returned by your sub. You might want to
check for calling context in the sub and supply an appropriate scalar
value.

I see the way you've done it in the modified code below, however I didn't
think there was anything wrong with a few global scope vars as long as you
don't forget you've used them globally and then try and use the same names
in another unrelated part of the script... but it's not a huge script and I
can keep track of those things easily enough.

Using lexical dirhandles, this should not be necessary.

lexical is one of those words that I've never got my head round in
programming terms, but I see in the example you've not given $dh a value,
which ties in with what Anno says about opendir creating an anonymous
handle.

You should *always* check if calls to open/opendir succeeded.

Must admit I get a bit lazy in CGI scripts with that, because to be
user-friendly, it means more than just adding "die..." bit to the end of the
open line. I've also never come across a directory or file that wouldn't
open on any of my scripts..

while(my $fl = readdir($dh)){
next if $fl =~ /^\.{1,2}$/;
if(-d "$basedir/$cdir$fl"){
$dcount++;
dircount("$cdir$fl");
}


I do prefer using File::Spec for path manipulation.

If I "use strict" it says "Can't use string ("DH0") as a symbol ref
while "strict refs" in use at D:\test.pl line 14". What's the best way
to get round this, since I need a dynamic dir handle for the routine
to work properly.


Here is a revised version of your code. I cannot vouch for its accuracy,
since there seems to be a discrepancy between the output of this script
and that of du on my system (probably because the space taken by
zero-length files is not taken into account). As always, corrections welcome:

#!/usr/bin/perl

use strict;
use warnings;

use File::Spec::Functions qw(canonpath catfile);

my $basedir = canonpath($ARGV[0] || '.');
my ($fcount, $dcount, $fsize) = dircount($basedir);

printf("%d files and %d directories totalling %d bytes in size\n",
$fcount, $dcount, $fsize);

sub dircount {
my ($cdir) = @_;
my ($fcount, $dcount, $fsize) = (0, 0, 0);

ahh yes of course

I was thinking of sth along those lines, though thought
I might have to use qw//

my $path = $cdir;    # already a full path: the first call passes $basedir itself
opendir my $dh, $path or die "Cannot open directory '$path': $!";
while (my $fl = readdir($dh)){
next if $fl eq '.' or $fl eq '..';

is there any reason for doing it that way over my original line using a
regexp? is it a performance thing?

if(-d (my $d = catfile($path, $fl))){
$dcount++;
my ($fc, $dc, $fs) = dircount($d);
$fcount += $fc;
$dcount += $dc;
$fsize += $fs;

Would the following work, as a shortened version of those 3 lines?

($fcount, $dcount, $fsize) += ($fc, $dc, $fs);

} else {
$fcount++;
$fsize += -s $d;
}
}
closedir($dh);
return ($fcount, $dcount, $fsize);
}

thanks

Ian

IanW · Dec 22, 2005

Just leave $dh undefined instead of setting it to a string value.

opendir() will then create an anonymous directory handle. So change

my $dh = "DH" . length($cdir);

to

my $dh;

I like that solution

Also, you call dircount() without an argument. Presumably you wanted
to say

dircount( $basedir);

well, $basedir is a global var and I put it in all the places it's needed
anyway

A better solution would be to use the standard module File::Find:

use File::Find;

sub dircount {
my $cdir = shift;
find sub {
if ( -d ) {
++ $dcount;
} else {
++ $fcount;
$fsize += -s;
}
}, $cdir;
}

that's very concise, thanks! I looked at the File::Find module docs before
but the document made my eyes glaze over. I suppose it's one of those
modules that's really useful once you've taken the time to plow through the
docs and understand it properly.

Regards
Ian

IanW · Dec 22, 2005

Tad McClellan said:
You have a whole bunch of problems, some big, some small.

oh dear.. I was worried that might happen!

I'll mention the lesser ones in comments about your code below,
but the 3 big ones are:

1) You should always enable warnings (and strict) when
developing Perl code.

2) You get a dynamic dirhandle the same way you get a dynamic
filehandle, so:

perldoc -q filehandle

How can I make a filehandle local to a subroutine? How do I pass
filehandles between subroutines? How do I make an array of filehandles?

3) There is an already-invented (and tested) wheel for doing
recursive directory searching, the File::Find module.

The only thing that sometimes puts me off using modules for relatively
simple things like this, is that I wonder how much extra resources they use
or whether they compromise performance in some way. That is, File::Find must
be quite a sizable module with a stack of function/options, so couldn't that
mean lots more memory to run, or is that an incorrect presumption?

You can read the module's docs with:

perldoc File::Find

You lose all of the benefits of that statement when you comment it out!

yes, I know - I had it commented out to double check that the script worked
without use strict.

$cdir will be undef for the top-level call.

You have all of the path components separated already, so I'd
paste the dir separator in myself on each usage instead of
burying one inside of a variable's value.

there was a reason I did that, but I can't recall what it was now (bear with
me - it's nearly home-time and my brain is frazzled!)

You get a dynamic dirhandle when the variable is undef.

Your variable is not undef, so there is no dynamic dirhandle...

You should always check the return value to see if you actually
got what you asked for:

opendir($dh,"$basedir/$cdir") or die "could not open '$basedir/$cdir' $!";

If $fl is a symlink to a "higher" directory, then your
code will go into an infinite loop here.

it's a script that will only run on my Windows servers, so that wasn't an
issue

Applying the minimum changes to fix (IMO) your code, I get:

------------------------
#!/usr/bin/perl
use warnings;
use strict;

my $basedir = '/home/tadmc/temp';
my $fcount = 0;
my $fsize = 0;
my $dcount = 0;
dircount();

sub dircount {
my($cdir) = shift || '';

that's a neat way of avoiding getting a warning (yes, I did have use
warnings in there for a while) .. is there any particular reason you use
single quotes there instead of double quotes? I tend to use "" for pretty
much everything. Also, I don't ever seem to use "||" - "or" would work as
well in that scenario wouldn't it?

opendir(my $dh,"$basedir/$cdir") or die "could not open dir $!";
while(my $fl = readdir($dh)){
next if $fl =~ /^\.{1,2}$/;
if(-d "$basedir/$cdir/$fl"){
$dcount++;
dircount("$cdir/$fl");
}
else{
$fcount++;
$fsize += -s "$basedir/$cdir/$fl";
}
}
closedir($dh);
}

print "$fcount files and $dcount directories totalling $fsize bytes in size\n";
------------------------

Recasting it to use the tried-and-true module, I get:

------------------------
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $basedir = '/home/tadmc/temp';
my $fcount = 0;
my $fsize = 0;
my $dcount = 0;
find( \&dircount, $basedir );

sub dircount {
return if $_ eq '.' or $_ eq '..';
$dcount++ if -d;
return unless -f; # only care about plain files at this point
$fcount++;
$fsize += -s;
}

print "$fcount files and $dcount directories totalling $fsize bytes in size\n";
------------------------

thanks
Ian

Glenn Jackman · Dec 22, 2005

At 2005-12-22 12:32PM, IanW said:
yes, I know - I had it commented out to double check that the script worked
without use strict.

If your code runs with strict, it will certainly run without.

[...]

that's a neat way of avoiding getting a warning (yes, I did have use
warnings in there for a while) .. is there any particular reason you use
single quotes there instead of double quotes? I tend to use "" for pretty
much everything. Also, I don't ever seem to use "||" - "or" would work as
well in that scenario wouldn't it?

I use single quotes to remind myself (and perl) that I have a literal
string that needs no interpolation.

'||' and 'or' have different operator precedences. Note also that '||'
has higher precedence than '=' which is higher than 'or'. So,
my($cdir) = shift || '';
is the same as
my($cdir) = (shift || '');

Test:
$x = undef || 'alternate';
print '$x is ', (defined $x ? "'$x'" : 'undefined!'), "\n";

Conversely,
my($cdir) = shift or '';
is the same as
( my($cdir) = shift ) or '';
and thus $cdir may still be undefined.

Test:
$y = undef or 'alternate';
print '$y is ', (defined $y ? "'$y'" : 'undefined!'), "\n";

Another way of providing default values is the '||=' operator, as in:
my $cdir = shift;
$cdir ||= ''; # set cdir to the empty string if previously undefined (or otherwise false).
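One caveat with the '||' forms is that they also replace a defined-but-false argument such as '0'. A minimal sketch of the difference, assuming Perl 5.10 or later for the '//=' defined-or assignment (which is newer than this thread); the two sub names are made up for illustration:

```perl
use strict;
use warnings;

# '||=' falls back whenever the argument is false ('', 0, '0' or undef).
sub with_or {
    my ($cdir) = @_;
    $cdir ||= 'default';
    return $cdir;
}

# '//=' (Perl 5.10+) falls back only when the argument is undefined.
sub with_defined_or {
    my ($cdir) = @_;
    $cdir //= 'default';
    return $cdir;
}

for my $arg (undef, '', '0', 'sub') {
    my $shown = defined $arg ? "'$arg'" : 'undef';
    print "$shown: ||= gives '", with_or($arg),
          "', //= gives '", with_defined_or($arg), "'\n";
}
```

A directory literally named 0 is rare, but it is the kind of case where the two operators give different answers.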

Tintin · Dec 22, 2005

IanW said:
The only thing that sometimes puts me off using modules for relatively
simple things like this, is that I wonder how much extra resources they use
or whether they compromise performance in some way. That is, File::Find must
be quite a sizable module with a stack of function/options, so couldn't that
mean lots more memory to run, or is that an incorrect presumption?

The "overhead" of File::Find is absolutely minuscule. Wouldn't you rather
write efficient code using tried and true methods, rather than the
"overhead" of hacking your own code?

Paul Lalli · Dec 22, 2005

IanW said:
I see the way you've done it in the modified code below, however I didn't
think there was anything wrong with a few global scope vars as long as you
don't forget you've used them globally and then try and use the same names
in another unrelated part of the script... but it's not a huge script and I
can keep track of those things easily enough.

You've just listed two conditionals that aren't especially guaranteed,
and given the proviso that your reasoning is only valid if the script's
size remains as it is now. This paragraph sounds a lot more like an
argument *against* doing it the way you did rather than *for*.

I don't quite get the reasoning behind using poor programming practices
for "quick and dirty" scripts. Why not just do things the "right" way
each time? Programming definitely involves developing habits. It's
much better, in my opinion, to use short scripts to develop *good*
programming habits.

lexical is one of those words that I've never got my head round in
programming terms,

Lexical, at least as far as Perl is concerned at any rate, simply means
"scope exists only in the physical block in which it was declared". If
a variable is declared within a block, it is visible only in that
block, regardless of any other subroutines or control paths called from
within that block. (Contrast with dynamic scope (such as with local)
in which the scope of the temporary value extends to any subroutines
called from within the same block as the declaration).
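A small sketch of that contrast, using made-up names: show() reads a package variable, one caller overrides it with local and the other declares an unrelated lexical of the same name.

```perl
use strict;
use warnings;

our $level = 'global';          # package variable, eligible for local()

sub show { return "level=$level" }

sub with_local {
    local $level = 'temporary'; # dynamic scope: show() sees this value
    return show();
}

sub with_my {
    my $level = 'private';      # lexical: invisible to show()
    return show();
}

print with_local(), "\n";   # level=temporary
print with_my(), "\n";      # level=global
print show(), "\n";         # level=global (the local value was restored)
```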

Must admit I get a bit lazy in CGI scripts with that, because to be
user-friendly, it means more than just adding "die..." bit to the end of the
open line.

Er, does that mean it's more user friendly to let the program attempt
to read from or write to a possibly-closed filehandle? ;-)

I've also never come across a directory or file that wouldn't
open on any of my scripts..

Again, this goes back to developing the right kinds of habits. Just
because you haven't encountered an error yet is no reason not to guard
against that error in the future.
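Checking does not have to mean showing the raw die message to the user. A rough sketch of one way to keep both, with hypothetical sub names and wording: the low-level routine dies, and the caller traps the failure and substitutes a friendlier message.

```perl
use strict;
use warnings;

# Return the directory entries, or die with a message the caller can catch.
sub list_dir {
    my ($path) = @_;
    opendir(my $dh, $path) or die "Cannot open directory '$path': $!\n";
    my @entries = grep { $_ ne '.' and $_ ne '..' } readdir($dh);
    closedir($dh);
    return @entries;
}

# Hypothetical caller: trap the failure and turn it into a friendly message
# instead of reading from a handle that was never opened.
sub report {
    my ($path) = @_;
    my @entries = eval { list_dir($path) };
    return "Sorry, that folder could not be read.\n" if $@;
    return scalar(@entries) . " entries found.\n";
}

print report('.');
print report('no/such/directory/here');
```

In a CGI script the detailed text in $@ could go to the server log while the user sees only the short message.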

is there any reason for doing it that way over my original line using a
regexp? is it a performance thing?

Performance aside, I think this is more readable than the regexp
equivalent. However. . .

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

sub re {
    my @files;
    opendir my $dh, '.' or die "Cannot open current directory: $!";
    while (my $file = readdir ($dh)){
        next if $file =~ /^\.{1,2}$/;
        push @files, $file;
    }
}

sub eqor {
    my @files;
    opendir my $dh, '.' or die "Cannot open current directory: $!";
    while (my $file = readdir ($dh)){
        next if $file eq '.' or $file eq '..';
        push @files, $file;
    }
}

cmpthese(10000, {Regexp=>\&re, Equality=>\&eqor} );
__END__
Benchmark: timing 10000 iterations of Equality, Regexp...
  Equality: 15 wallclock secs (12.72 usr + 2.44 sys = 15.16 CPU) @ 659.63/s (n=10000)
    Regexp: 16 wallclock secs (13.20 usr + 2.64 sys = 15.84 CPU) @ 631.31/s (n=10000)
          Rate   Regexp Equality
Regexp   631/s       --      -4%
Equality 660/s       4%       --

Obviously, a rather minuscule benefit...

Would the following work, as a shortened version of those 3 lines?

($fcount, $dcount, $fsize) += ($fc, $dc, $fs);

Why ask if something would work? Why not try it for yourself and see?

(The answer is "no", however. += expects a scalar on each side. Read
perldoc perlop to see what the comma operator does in scalar context,
and see if you can use that to predict the results).

For syntax similar to what you'd like that to do, check out the
pairwise() function in the List::MoreUtils module from CPAN
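If installing a module is not an option, the same effect can be had with core Perl by keeping the totals in arrays and adding by index. A minimal sketch, with array names and numbers invented for the example:

```perl
use strict;
use warnings;

# Running totals and one subdirectory's result, in the same order:
# (files, directories, bytes). The numbers are made up for illustration.
my @total = (3, 1, 1200);
my @sub   = (2, 0,  300);

# Element-wise addition with core Perl only.
$total[$_] += $sub[$_] for 0 .. $#sub;

print "@total\n";   # 5 1 1500
```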

Paul Lalli

Paul Lalli · Dec 22, 2005

IanW said:
that's very concise, thanks! I looked at the File::Find module docs before
but the document made my eyes glaze over. I suppose it's one of those
modules that's really useful once you've taken the time to plow through the
docs and understand it properly.

File::Find is one of those modules that looks a lot more complicated
than it is. You only really need to know 3 things to use it:
(1) It exports one function, find(), which takes a subroutine and a
list of directories to recurse.
(2) find() will recurse each of the directories, calling that
subroutine once for each and every file and directory found in the list
of directories you provided
(3) Within that subroutine, $_ is the name of current file it's looking
at, $File::Find::name is the full path of that file, and
$File::Find::dir is the directory containing that file.

Once you've got those three facts set straight, you just have to write
the subroutine that you want called for each file. The subroutine
should do whatever manipulations or storing you want to happen.
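To see those three values side by side, here is a short sketch that builds a tiny hypothetical tree with File::Temp and prints what find() hands to the subroutine for each entry:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: <top>/sub/note.txt
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir failed: $!";
open my $fh, '>', "$top/sub/note.txt" or die "Cannot write note.txt: $!";
close $fh or die "Cannot close note.txt: $!";

my @seen;
find(sub {
    # Record the three values find() provides for each entry.
    push @seen, [ $_, $File::Find::name, $File::Find::dir ];
    print "\$_=$_  name=$File::Find::name  dir=$File::Find::dir\n";
}, $top);
```

The starting directory itself shows up with $_ set to '.', which is why callbacks often begin by skipping that entry.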

(There is also a CPAN module, File::Find::Rule which supposedly makes
File::Find easier to use and/or comprehend, but I can't say I
personally have ever found it particularly necessary).

Paul Lalli

Paul Lalli · Dec 22, 2005

IanW said:
The only thing that sometimes puts me off using modules for relatively
simple things like this, is that I wonder how much extra resources they use
or whether they compromise performance in some way. That is, File::Find must
be quite a sizable module with a stack of function/options, so couldn't that
mean lots more memory to run, or is that an incorrect presumption?

Why presume anything? There exist tools to determine this sort of
thing. Check out the Benchmark and Devel::DProf modules, and make the actual
comparisons. My guess (because I haven't written the comparisons
myself) is that the "overhead" of using the File::Find module will be
minuscule in comparison to the overhead of writing new, possibly buggy
code that you must maintain.

yes, I know - I had it commented out to double check that the script worked
without use strict.

Er. You have a misunderstanding about use strict. If code works with
use strict, by definition it will work without it. (The converse,
however, is completely false).

use strict; does three things:
1) Prevents you from using a package variable without fully qualifying
it (saying $main::foo rather than $foo), or pre-declaring it with our.
Since your strict-compliant code is obviously not doing that, removing
that restriction can't hurt you.
2) Prevents you from using symbolic references ($x = 'foo'; $$x =
'bar'; sets $foo equal to 'bar'). Again, removing the restriction
against something you can't be doing can't have any effect.
3) Prevents you from using barewords as a string ($x = Hello; will set
$x to 'Hello' if no &Hello subroutine exists).

So strictures are really just restrictions against relatively unsafe
programming practices. By removing the restrictions, you lose the
checks that you're not doing anything unsafe, but you don't change
anything about the code you've already written.
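A quick way to watch the three restrictions at work is to compile the same tiny snippets with and without strict. This is only a sketch: the snippet texts and hash names are invented, and string eval is used so that one violation does not stop the whole script from compiling.

```perl
use strict;
use warnings;

# One snippet per strict category.
my %snippet = (
    vars => q{ $undeclared = 1; 1 },                # undeclared package variable
    refs => q{ my $n = 'foo'; my $v = $$n; 1 },     # symbolic reference
    subs => q{ my $x = Hello; 1 },                  # bareword used as a string
);

my %verdict;
for my $kind (sort keys %snippet) {
    for my $mode ('use strict', 'no strict') {
        # Warnings are silenced inside the eval to keep the output short.
        my $ok = eval "$mode; no warnings; $snippet{$kind}";
        $verdict{"$kind, $mode"} = $ok ? 'runs' : 'rejected';
        print "$kind with '$mode': ", $verdict{"$kind, $mode"}, "\n";
    }
}
```

Each snippet is rejected under strict and runs without it, and none of them is something strict-clean code would contain in the first place.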

that's a neat way of avoiding getting a warning (yes, I did have use
warnings in there for a while) .. is there any particular reason you use
single quotes there instead of double quotes? I tend to use "" for pretty
much everything.

Single quotes tell the interpreter and the reader that nothing in the
enclosed string needs to be interpreted. This creates (1) a very
minuscule performance boost, and (2) a non-trivial readability boost.
Double quotes, conversely, serve as a visual clue to the reader that
there is a variable or escape sequence within the enclosed string that
should be taken note of.

Also, I don't ever seem to use "||" - "or" would work as
well in that scenario wouldn't it?

No. || and or are *functionally* equivalent, but differ by precedence.
Observe:
$ perl -MO=Deparse,-p -e'my $foo = shift || default()'
(my $foo = (shift(@ARGV) || default()));
-e syntax OK
$ perl -MO=Deparse,-p -e'my $foo = shift or default()'
((my $foo = shift(@ARGV)) or default());
-e syntax OK

As you can see, the first one assigns to $foo the return value of the
expression (shift @ARGV || default()). The second one assigns to $foo
the return value of (shift @ARGV). If that assignment produced a false
value, default() is then evaluated, but its return value is *not*
assigned to anything. $foo still has whatever shift returned.

Paul Lalli

Tad McClellan · Dec 23, 2005

Must admit I get a bit lazy in CGI scripts

So CGI programming is a hobby for you rather than a profession?

Being lazy at your job is Not Good.

I've also never come across a directory or file that wouldn't
open on any of my scripts..

I've never been in a car accident, so I don't need seat belts. Right?

I was thinking of sth along those lines

s/sth/something/;

Please don't use "cutsie" spellings in Usenet posts.

It is inconsiderate of folks whose first language is not English.

is there any reason for doing it that way over my original line using a
regexp?

Yes, the same reason that you should be applying to all the code
you write: it is easier to read and understand.

Optimize for labor, optimize for labor, optimize for labor.

is it a performance thing?

Yes, your maintenance programmer will perform better.

(and it will execute faster, but that is almost never a
valid consideration in this day and age.)

Tad McClellan · Dec 23, 2005

IanW said:
The only thing that sometimes puts me off using modules for relatively
simple things like this, is that I wonder how much extra resources they use
or whether they compromise performance in some way.

Cost to spend extra CPU cycles: $0.000001

Cost to develop code that saves those cycles: $1000.00

Your program will have to execute an awfully large number of
times for your approach to be economical.

Having a room full of programmers working on shaving off a few
cycles or bytes was commonplace in the '70s, but nowadays cycles
are cheap, RAM is cheap, what is expensive is your salary.

(though payday may make you argue that you are not expensive enough.)

That is, File::Find must
be quite a sizable module with a stack of function/options, so couldn't that
mean lots more memory to run,

Does your application have to run on a cell phone or some other
place where RAM costs a premium?

or is that an incorrect presumption?

It was correct 30 years ago, but it has changed due to Moore's Law.

it's a script that will only run on my Windows servers, so that wasn't an
issue

Windows does not have symbolic links?

is there any particular reason you use
single quotes there instead of double quotes?

Yes, I use single quotes unless I require one of the two extra
things (escapes and interpolation) that double quotes brings with it.

I tend to use "" for pretty
much everything.

Some strings contain variables, some strings don't.

During debugging, you are very often looking for variables.

You have to examine _every_ string, looking for variables.

I get to skip careful examination of many strings, because
they have been marked "no variables here" by the single quotes.

ie. it enables faster debugging.

Also, I don't ever seem to use "||" - "or" would work as
well in that scenario wouldn't it?

What happened when you tried it?

IanW · Dec 23, 2005

Tad McClellan said:
s/sth/something/;

Please don't use "cutsie" spellings in Usenet posts.

It is inconsiderate of folks whose first language is not English.

As is using the word "cutsie", which I always thought was spelt "cutesy",
and since such folks may have to use a dictionary for a word of less
frequent usage like that, it would help them if you spell it correctly ;-)

Yes, the same reason that you should be applying to all the code
you write: it is easier to read and understand.

I find a short regexp like that just as easy to understand as the non-regexp
version

Optimize for labor, optimize for labor, optimize for labor.

Yes, your maintenance programmer will perform better.

hehe, that's me in this case

Ian

IanW · Dec 23, 2005

Paul Lalli said:
You've just listed two conditionals that aren't especially guaranteed,
and given the proviso that your reasoning is only valid if the script's
size remains as it is now. This paragraph sounds a lot more like an
argument *against* doing it the way you did rather than *for*.

I don't quite get the reasoning behind using poor programming practices
for "quick and dirty" scripts. Why not just do things the "right" way
each time? Programming definitely involves developing habits. It's
much better, in my opinion, to use short scripts to develop *good*
programming habits.

OK fair enough, I'll work on that habit!

Lexical, at least as far as Perl is concerned at any rate, simply means
"scope exists only in the physical block in which it was declared". If

Oh, I see.. anything declared with "my" then...

cmpthese(10000, {Regexp=>\&re, Equality=>\&eqor} );
__END__
Benchmark: timing 10000 iterations of Equality, Regexp...
  Equality: 15 wallclock secs (12.72 usr + 2.44 sys = 15.16 CPU) @ 659.63/s (n=10000)
    Regexp: 16 wallclock secs (13.20 usr + 2.64 sys = 15.84 CPU) @ 631.31/s (n=10000)
          Rate   Regexp Equality
Regexp   631/s       --      -4%
Equality 660/s       4%       --

Obviously, a rather minuscule benefit...

thanks, negligible indeed. that cmpthese function looks useful

Why ask if something would work? Why not try it for yourself and see?

(The answer is "no", however. += expects a scalar on each side. Read
perldoc perlop to see what the comma operator does in scalar context,
and see if you can use that to predict the results).

For syntax similar to what you'd like that to do, check out the
pairwise() function in the List::MoreUtils module from CPAN

I suppose there's always the obvious:

($fcount, $dcount, $fsize) = ($fcount+$fc, $dcount+$dc, $fsize+$fs);

but it's more typing!

Ian

IanW · Dec 23, 2005

Paul Lalli said:
File::Find is one of those modules that looks a lot more complicated
than it is. You only really need to know 3 things to use it:
(1) It exports one function, find(), which takes a subroutine and a
list of directories to recurse.
(2) find() will recurse each of the directories, calling that
subroutine once for each and every file and directory found in the list
of directories you provided
(3) Within that subroutine, $_ is the name of current file it's looking
at, $File::Find::name is the full path of that file, and
$File::Find::dir is the directory containing that file.

Once you've got those three facts set straight, you just have to write
the subroutine that you want called for each file. The subroutine
should do whatever manipulations or storing you want to happen.

A clear & concise summary like that in the documentation would be a benefit!

Thanks
Ian

IanW · Dec 23, 2005

Glenn Jackman said:
If your code runs with strict, it will certainly run without.

yes, when I originally wrote the script in a test file I forgot to put the
use strict in it

[...]

that's a neat way of avoiding getting a warning (yes, I did have use
warnings in there for a while) .. is there any particular reason you use
single quotes there instead of double quotes? I tend to use "" for pretty
much everything. Also, I don't ever seem to use "||" - "or" would work as
well in that scenario wouldn't it?


I use single quotes to remind myself (and perl) that I have a literal
string that needs no interpolation.

that sounds like another good habit to adopt..

'||' and 'or' have different operator precedences. Note also that '||'
has higher precedence than '=' which is higher than 'or'. So,
my($cdir) = shift || '';
is the same as
my($cdir) = (shift || '');

Test:
$x = undef || 'alternate';
print '$x is ', (defined $x ? "'$x'" : 'undefined!'), "\n";

Conversely,
my($cdir) = shift or '';
is the same as
( my($cdir) = shift ) or '';
and thus $cdir may still be undefined.

Test:
$y = undef or 'alternate';
print '$y is ', (defined $y ? "'$y'" : 'undefined!'), "\n";

Another way of providing default values is the '||=' operator, as in:
my $cdir = shift;
$cdir ||= ''; # set cdir to the empty string if previously undefined.

OK, got that.

Thanks
Ian

Matt Silberstein · Dec 26, 2005

On 22 Dec 2005 11:21:24 -0800, in comp.lang.perl.misc, "Paul Lalli" wrote:

[snip]

I don't quite get the reasoning behind using poor programming practices
for "quick and dirty" scripts. Why not just do things the "right" way
each time? Programming definitely involves developing habits. It's
much better, in my opinion, to use short scripts to develop *good*
programming habits.

I agree with you, mostly. But I always wonder about commenting. I want
to comment when the code is fresh, but not when it is being tried
since half that code gets tossed quickly. In a perfect world I would
do my commenting at the end of each session/sub-session.

[snip]

--
Matt Silberstein

Do something today about the Darfur Genocide

http://www.beawitness.org
http://www.darfurgenocide.org
http://www.savedarfur.org

"Darfur: A Genocide We can Stop"
