A
Arvin Portlock
I need to extract values from very large but numerically
sorted arrays. I can think of lots of ways to do it but
I'd like to try a binary search and prefer to use an exist-
ing module rather than reinvent the wheel.
Unfortunately I can't make sense of the documentation
in Search:Binary (perhaps there's another more appropriate
module?). Are there any scripts out there that use
Search::Binary which one can find on the web?
I have a set of tens of thousands of books and need to
take a random page sample from the set, i.e., examine
the 102,411th page of the set which might occur in the
971st book. My array is an ordered list of cumulative
page counts for each book. So if each book has a hundred
pages the first element in the array would be 1, the
second would be 101, the third 201, etc. If the random
sample required page 114, then the search would discover
it was page 14 of the second book.
Here's an inefficient way to do it. (Yes, I could
narrow each successive search by incrementing array
index start positions or shifting elements off the
array I know will never match as the random numbers
increase. But I'm not doing that here for simplicity
since I want to illustrate the problem and really
want to learn how to use Search::Binary anyway.)
# Some processing of raw data here. The result is a hash
# and an array. The hash has as keys the place within the
# larger sequence where the first page of the book falls.
my %books = (
1 => 'book 1',
116 => 'book 2',
433 => 'book 3',
762 => 'book 4'
);
# The array consists of the sorted keys of %books above.
# Created when raw data is read so no need for expensive
# sort keys.
my @pages = (1, 116, 433, 762);
# Random numbers are already sorted numerically
my @random_nos = (13, 18, 344, 390, 601);
RANDOM: foreach my $no (@random_nos) {
foreach my $page (@pages) {
if ($no >= $page) {
print "$no is in $books{$page}\n";
next RANDOM;
}
}
}
sorted arrays. I can think of lots of ways to do it but
I'd like to try a binary search and prefer to use an exist-
ing module rather than reinvent the wheel.
Unfortunately I can't make sense of the documentation
in Search:Binary (perhaps there's another more appropriate
module?). Are there any scripts out there that use
Search::Binary which one can find on the web?
I have a set of tens of thousands of books and need to
take a random page sample from the set, i.e., examine
the 102,411th page of the set which might occur in the
971st book. My array is an ordered list of cumulative
page counts for each book. So if each book has a hundred
pages the first element in the array would be 1, the
second would be 101, the third 201, etc. If the random
sample required page 114, then the search would discover
it was page 14 of the second book.
Here's an inefficient way to do it. (Yes, I could
narrow each successive search by incrementing array
index start positions or shifting elements off the
array I know will never match as the random numbers
increase. But I'm not doing that here for simplicity
since I want to illustrate the problem and really
want to learn how to use Search::Binary anyway.)
# Some processing of raw data here. The result is a hash
# and an array. The hash has as keys the place within the
# larger sequence where the first page of the book falls.
my %books = (
1 => 'book 1',
116 => 'book 2',
433 => 'book 3',
762 => 'book 4'
);
# The array consists of the sorted keys of %books above.
# Created when raw data is read so no need for expensive
# sort keys.
my @pages = (1, 116, 433, 762);
# Random numbers are already sorted numerically
my @random_nos = (13, 18, 344, 390, 601);
RANDOM: foreach my $no (@random_nos) {
foreach my $page (@pages) {
if ($no >= $page) {
print "$no is in $books{$page}\n";
next RANDOM;
}
}
}