The King of Pots and Pans
I wrote the following Perl script to walk my hard drive (starting from
the current working directory) and tell me how many files of each type
I have. It works for small directory hierarchies, but on a large one it
seems to freeze on some arbitrary file. I'm not sure why.
It runs the 'file' shell command once per file and is extremely slow!
How could this script accomplish the same goal faster?
Here is a sample of the output:
[steve@sol testcode]$ ./classify_files.pl
Classifying all files recursively...
Found 902 files.
436 - ASCII C program text
252 - C++ program text
85 - directory
38 - ASCII text
15 - ASCII C++ program text
14 - ASCII English text
13 - ASCII C program text, with CRLF line terminators
9 - a /usr/bin/perl -w script text executable
6 - ASCII text, with CRLF line terminators
4 - a /usr/bin/perl script text executable
4 - Bourne-Again shell script text executable
3 - ASCII make commands text
2 - ASCII C program text, with very long lines
2 - ISO-8859 C program text, with CRLF line terminators
2 - ASCII text, with very long lines, with CRLF line terminators
2 - ASCII C++ program text, with CRLF line terminators
2 - ASCII English text, with CRLF line terminators
2 - character Computer Graphics Metafile
2 - ASCII English text, with very long lines
1 - data
(etc ... the rest removed for brevity)
I am a Perl neophyte ... I don't write much of it but would like to
get better. There are certainly much better ways to do this, which is
why I am asking. Thanks. Here's the Perl script. How can it be
faster?
#!/usr/bin/perl -w
use strict;
use Cwd;

my %ftype;
my $total_files = 0;

sub classify_file
{
    my $output = `file $_`;

    # ignore these ones
    if($output =~ /broken symbolic link/ or
       $output =~ /symbolic link to/ or
       $output =~ /can\'t stat/)
    {
        return;
    }

    # remove filename
    $output =~ s/.+: //;

    # increment value for this key
    ++$ftype{$output};
    ++$total_files;
}

sub recurse_dir
{
    # only get cwd, faster that way
    my $cwd = getcwd();

    # read all files in current dir
    foreach(<*>)
    {
        # recurse into directories
        if(-d $_)
        {
            chdir("$cwd/$_");
            recurse_dir($_);
            chdir($cwd);
        }

        # perform the 'file' shell command
        classify_file("$cwd/$_");
    }
}

print "\nClassifying all files recursively...";
recurse_dir(getcwd());
print "\nFound $total_files files.";
print "\n";

# print in descending order by value
foreach my $key (sort { $ftype{$b} <=> $ftype{$a} } (keys %ftype))
{
    print "$ftype{$key} - $key";
}
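
For comparison, here is a minimal sketch of one way the same tally might be produced with far fewer process launches. It is an illustration under stated assumptions, not a tested replacement. It walks the tree with the core File::Find module and hands file names to 'file' in batches instead of one at a time. The helper names collect_paths and classify_paths and the batch size of 500 are made up for this sketch. It also assumes a 'file' implementation that accepts -b (brief output, no file name prefix), which is not required by POSIX. Because the list form of open bypasses the shell and every name is passed after "--", odd file names (spaces, shell metacharacters, a file literally named "-") cannot be misread as shell syntax, options, or stdin. That may or may not be what makes the original hang. Unlike the <*> glob above, File::Find also visits dot files, so the totals can differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return every path below $top except symlinks and $top itself.
# Directories are kept because the original tally counts them too.
sub collect_paths {
    my ($top) = @_;
    my @paths;
    find(
        {
            no_chdir => 1,    # stay in one directory; names are full paths
            wanted   => sub {
                return if $File::Find::name eq $top;
                return if -l $File::Find::name;    # skip symlinks
                push @paths, $File::Find::name;
            },
        },
        $top
    );
    return @paths;
}

# Run 'file' once per batch of names and count each description line.
# Assumption: 'file -b' prints exactly one description line per name.
sub classify_paths {
    my ($counts, @paths) = @_;
    while (my @batch = splice(@paths, 0, 500)) {    # 500 is an arbitrary batch size
        # List-form pipe open: the command is run directly, not via a shell.
        my $ok = open(my $fh, '-|', 'file', '-b', '--', @batch);
        unless ($ok) {
            warn "cannot run 'file': $!\n";
            return;
        }
        while (my $line = <$fh>) {
            chomp $line;
            ++$counts->{$line};
        }
        close $fh;
    }
    return;
}

sub main {
    my %ftype;
    my @paths = collect_paths('.');
    classify_paths(\%ftype, @paths);
    print "Found ", scalar(@paths), " files.\n";

    # descending by count, ties broken alphabetically
    for my $key (sort { $ftype{$b} <=> $ftype{$a} or $a cmp $b } keys %ftype) {
        print "$ftype{$key} - $key\n";
    }
}

main() unless caller;
```

Run from the directory to be scanned, this should print output of the same shape as the sample above. Most of the time in the original is probably spent starting a shell and a 'file' process for every single name, so batching is where the saving would come from. By default File::Find does not descend through symlinked directories, which also rules out symlink loops.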