s1037989
I whipped up this quick and ugly script and I wanted to post it for
code review and others' benefit.
With an array such as:
qw(aaaa aaab aaac bbbb bccc bcdd bcee bcff cccc dddd)
The program returns:
# perl list2fs.pl 2
/a/aa/aaa/aaaa/aaaa
/a/aa/aaa/aaab/aaab
/a/aa/aaa/aaac/aaac
/b/bb/bbbb
/b/bc/bcc/bccc
/b/bc/bcd/bcdd
/b/bc/bce/bcee
/b/bc/bcf/bcff
/c/cccc
/d/dddd
As you can see, this program takes a list of filenames and "hashifies"
it, the way mailbox storage does, allowing no more than 2 (or whatever
$ARGV[0] is) filenames in a single directory. The limit applies to
filenames only; a directory may still hold more subdirectories, as
/b/bc does above. The point is that if you have 100000 filenames and
ext3 copes poorly with 100000 files in a single directory, you can use
this technique to break them down.
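To make the rule concrete, here is a rough sketch of the decision made at a single level: count how many names share each prefix and split any group that is over the limit. The names prefix_counts and $limit are made up for this illustration ($limit stands in for $ARGV[0]); the script further down does the same job differently.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many names share each lowercase
# prefix of the given length.
sub prefix_counts {
    my ($len, @names) = @_;
    my %count;
    $count{ lc(substr($_, 0, $len)) }++ for @names;
    return %count;
}

my @names = qw(aaaa aaab aaac bbbb bccc bcdd bcee bcff cccc dddd);
my $limit = 2;    # assumption: stands in for $ARGV[0]
my %count = prefix_counts(1, @names);

for my $prefix (sort keys %count) {
    # A group over the limit has to be split on one more character.
    my $action = $count{$prefix} > $limit ? 'split further' : 'store here';
    print "$prefix: $count{$prefix} name(s), $action\n";
}
```

With the sample list, a (3 names) and b (5 names) are over the limit and get split on a second character, while c and d hold one name each and are stored directly.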
I had to make it recursive because I'm dealing with unknown data. I
don't know that 3 levels or even 6 levels will suffice. Therefore, the
recursion figures it out.
Now, that said, this is NOT intended for "hashifying" mail storage
dirs. It IS intended to "hashify" a HUGE list of filenames.
Unfortunately this code is VERY inefficient (sub a re-greps its whole
input list once for every prefix, at every level). I'm posting it here
so people can see my idea, in case it helps, and so that someone can
maybe direct me to an existing CPAN module that accomplishes the same
thing. Or perhaps someone likes what I've started and wants to help
improve the code?
---------------------
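#!/usr/bin/perl -w
use strict;

# Maximum number of filenames allowed in one directory (defaults to 1).
$ARGV[0] ||= 1;

#my @list = map { chomp; $_ } <STDIN>;
my @list = qw(aaaa aaab aaac bbbb bccc bcdd bcee bcff cccc dddd);
my %hash = a(1, \@list);
b('', %hash);

# a(LENGTH, \@NAMES): group the names by their first LENGTH characters
# and return a nested hash; a group over the limit is split again on
# one more character.
sub a {
    my %list = map { lc(substr($_, 0, $_[0])) => 1 } @{$_[1]};
    my %hash = ();
    foreach my $hash ( keys %list ) {
        # Every name that starts with this prefix ($hash is already lowercase).
        my @hash = grep { $hash eq lc(substr($_, 0, length($hash))) } @{$_[1]};
        if ( $#hash >= $ARGV[0] ) {
            # Too many names share this prefix: recurse with a longer one.
            %{$hash{$hash}} = a($_[0]+1, \@hash);
        } elsif ( $#hash >= 0 ) {
            # Few enough names: store them as leaves of this directory.
            %{$hash{$hash}} = map { lc($_) => 1 } @hash;
        }
    }
    return %hash;
}

# b(ROOT, %TREE): walk the nested hash and print one path per filename.
sub b {
    my ($root, %hash) = @_;
    foreach ( sort keys %hash ) {
        if ( ref $hash{$_} eq 'HASH' ) {
            b($root.'/'.$_, %{$hash{$_}});
        } else {
            print "$root/$_\n";
        }
    }
}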
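
For comparison, here is a rough sketch of one way the repeated grep might be avoided: bucket all the names for a level in a single pass, then recurse only into the buckets that are over the limit. This is not a drop-in replacement for the script above. The name bucket_paths and its argument order are invented for the example, it returns a flat list of paths instead of a nested hash, and it adds a guard that the original does not have: names no longer than the current prefix are stored where they are, on the assumption that this is the wanted behaviour, so the recursion always ends.

```perl
use strict;
use warnings;

# Hypothetical helper: return one path per name in @$names, allowing at
# most $limit names per directory. $len is the current prefix length.
sub bucket_paths {
    my ($root, $len, $limit, $names) = @_;

    # One pass over the names builds every bucket for this level.
    my %bucket;
    push @{ $bucket{ lc(substr($_, 0, $len)) } }, $_ for @$names;

    my @paths;
    for my $prefix (sort keys %bucket) {
        my @group = @{ $bucket{$prefix} };
        my $dir   = "$root/$prefix";

        # Assumption: names no longer than the prefix cannot be split
        # any further, so they stay in this directory.
        my @done = grep { length($_) <= $len } @group;
        my @rest = grep { length($_) >  $len } @group;

        if (@group > $limit && @rest) {
            # Over the limit: keep the short names, split the others.
            push @paths, map { "$dir/" . lc($_) } @done;
            push @paths, bucket_paths($dir, $len + 1, $limit, \@rest);
        } else {
            # Within the limit (or nothing left to split on).
            push @paths, map { "$dir/" . lc($_) } @group;
        }
    }
    return @paths;
}

my @names = qw(aaaa aaab aaac bbbb bccc bcdd bcee bcff cccc dddd);
print "$_\n" for bucket_paths('', 1, 2, \@names);
```

For the sample list and a limit of 2 this prints the same ten paths as the listing at the top of this post. Each level touches every name once to build the buckets, rather than once per prefix.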