G
George Mpouras
Ok let’s talk about a little more about creating directories (or any other
trie structure entries).
There are three basic methods :
1) The simple, just start create them from top to bottom. This is give us
N disk accesses always. Silly but do not underestimate it because it does
not consume cpu.
2) The smarter. Search from bottom to top for the first existing node and
start creating from this point and bellow; this is give us N only at the
worst case.
3) The most wise approach is to define an array from all the upper nodes.
Then perform a binary tree search at this array against a node existence
code. This is give as log(N) accesses at worst case.
So what is the best solution; Unfortunately there is not ….
If your directories “usually†exist, the 2nd method will be the faster
because it will finish at the very first iterations.
If your directories “usually†do not exist, the 3rd method will be the
faster because it will dig faster at the bottomless abyss.
Because the 2nd method is almost trivila I will write the code only for the
3rd binary tree method
mkdir_btree( 'd1/d2/d3/d4/d5' ) or die "$^E\n";
sub mkdir_btree
{
my @array = split /[\/\\]/, $_[0];
return 1 if -1 == $#array;
$array[0] eq '' && splice @array, 0, 2, "/$array[1]";
my ($begin, $end) = (0, scalar @array);
while ($begin < $end)
{
my $cut = int(($begin + $end)/2);
-d join('/', @array[0..$cut]) ? ($begin = $cut + 1) : ($end = $cut)
}
return 1 unless $begin < @array;
mkdir(join '/', @array[0 .. $_]) || return 0 for $begin .. $#array;1
}
Also here is small benchmark. If you want to test it , (its better to run it
multiple times)
#!/usr/bin/perl
# Dir creation benchmark
use strict;
use warnings;
use Time::HiRes;
use File:ath;
my $how_many = 1000;
my $how_deep = 40;
my $root_test_dir = '/tmp/test_this';
# create the dirs array;
print "Creating the test dirs array\n";
my @Test_dirs;
my $start_time=[];
for (0 .. $how_many) { my @R = $root_test_dir;
for (0 .. $how_deep) {
push @R, sprintf "%s%02d", ('a'..'z')[int(rand 24)], int(rand 100) }
push @Test_dirs, join('/',@R) }
foreach my $code (qw/
Mkdir_recursive
mkdir_btree
File_Path_module
/){
system("/bin/rm -rf $root_test_dir") if -d $root_test_dir;
$start_time = [ Time::HiRes::gettimeofday ];
print "Testing $code ... ";
foreach (@Test_dirs) { &{\&$code}($_) or die("$!") }
print "finished at ", Time::HiRes::tv_interval($start_time) ," sec\n"
}
sub File_Path_module
{
my $err;
File:ath::mkpath( $_[0], {'error' => \$err} );
@{$err} ? 0 : 1
}
sub Mkdir_recursive
{
return 1 if $_[0] eq '' || -d $_[0];
Mkdir_recursive( $_[0] =~/^(.*?)\/[^\/]+$/ ) || return undef;
mkdir $_[0] || return undef
}
sub mkdir_btree
{
my @array = split /\//, $_[0];
return 1 if -1 == $#array;
$array[0] eq '' && splice @array, 0, 2, "/$array[1]";
splice @array, 0, 2, "/$array[1]" if '' eq $array[0];
my ($begin, $end) = (0, scalar @array);
while ($begin < $end)
{
my $cut = int(($begin + $end)/2);
-d join('/', @array[0..$cut]) ? ($begin = $cut + 1) : ($end = $cut)
}
return 1 unless $begin < @array;
mkdir(join '/', @array[0 .. $_]) || return 0 for $begin .. $#array;1
}
END {
print "Clearing test dir: $root_test_dir\n";
system("/bin/rm -rf $root_test_dir") if -d $root_test_dir }
trie structure entries).
There are three basic methods :
1) The simple, just start create them from top to bottom. This is give us
N disk accesses always. Silly but do not underestimate it because it does
not consume cpu.
2) The smarter. Search from bottom to top for the first existing node and
start creating from this point and bellow; this is give us N only at the
worst case.
3) The most wise approach is to define an array from all the upper nodes.
Then perform a binary tree search at this array against a node existence
code. This is give as log(N) accesses at worst case.
So what is the best solution; Unfortunately there is not ….
If your directories “usually†exist, the 2nd method will be the faster
because it will finish at the very first iterations.
If your directories “usually†do not exist, the 3rd method will be the
faster because it will dig faster at the bottomless abyss.
Because the 2nd method is almost trivila I will write the code only for the
3rd binary tree method
mkdir_btree( 'd1/d2/d3/d4/d5' ) or die "$^E\n";
sub mkdir_btree
{
my @array = split /[\/\\]/, $_[0];
return 1 if -1 == $#array;
$array[0] eq '' && splice @array, 0, 2, "/$array[1]";
my ($begin, $end) = (0, scalar @array);
while ($begin < $end)
{
my $cut = int(($begin + $end)/2);
-d join('/', @array[0..$cut]) ? ($begin = $cut + 1) : ($end = $cut)
}
return 1 unless $begin < @array;
mkdir(join '/', @array[0 .. $_]) || return 0 for $begin .. $#array;1
}
Also here is small benchmark. If you want to test it , (its better to run it
multiple times)
#!/usr/bin/perl
# Dir creation benchmark
use strict;
use warnings;
use Time::HiRes;
use File:ath;
my $how_many = 1000;
my $how_deep = 40;
my $root_test_dir = '/tmp/test_this';
# create the dirs array;
print "Creating the test dirs array\n";
my @Test_dirs;
my $start_time=[];
for (0 .. $how_many) { my @R = $root_test_dir;
for (0 .. $how_deep) {
push @R, sprintf "%s%02d", ('a'..'z')[int(rand 24)], int(rand 100) }
push @Test_dirs, join('/',@R) }
foreach my $code (qw/
Mkdir_recursive
mkdir_btree
File_Path_module
/){
system("/bin/rm -rf $root_test_dir") if -d $root_test_dir;
$start_time = [ Time::HiRes::gettimeofday ];
print "Testing $code ... ";
foreach (@Test_dirs) { &{\&$code}($_) or die("$!") }
print "finished at ", Time::HiRes::tv_interval($start_time) ," sec\n"
}
sub File_Path_module
{
my $err;
File:ath::mkpath( $_[0], {'error' => \$err} );
@{$err} ? 0 : 1
}
sub Mkdir_recursive
{
return 1 if $_[0] eq '' || -d $_[0];
Mkdir_recursive( $_[0] =~/^(.*?)\/[^\/]+$/ ) || return undef;
mkdir $_[0] || return undef
}
sub mkdir_btree
{
my @array = split /\//, $_[0];
return 1 if -1 == $#array;
$array[0] eq '' && splice @array, 0, 2, "/$array[1]";
splice @array, 0, 2, "/$array[1]" if '' eq $array[0];
my ($begin, $end) = (0, scalar @array);
while ($begin < $end)
{
my $cut = int(($begin + $end)/2);
-d join('/', @array[0..$cut]) ? ($begin = $cut + 1) : ($end = $cut)
}
return 1 unless $begin < @array;
mkdir(join '/', @array[0 .. $_]) || return 0 for $begin .. $#array;1
}
END {
print "Clearing test dir: $root_test_dir\n";
system("/bin/rm -rf $root_test_dir") if -d $root_test_dir }