I have a very large list of data (up to 20,000 lines), and I just want
to print out the largest 50 values. What is the quickest way to sort
this using perl?
Most people wouldn't consider 20,000 items to be a "very large list."
I'd suggest first just trying the brute-force approach of reading them
into an array, sorting them, and taking the last 50 results. Only if
this turns out to be unacceptably slow or memory-consuming for your
application should you consider something different. Remember that
perl's sort routines are implemented in optimized C code, so even if you
implement a lower-time-complexity algorithm yourself, the value of N for
which it becomes faster may be quite large.
If the brute-force approach turns out to be unacceptable (again, based on
actual measurement, not guesswork), here's a linear-time (O(N)) selection
algorithm:
1) Read the first 50 values into the top 50 list.
2) For each subsequent value, find the smallest value in the top 50 list
that's less than it. If there isn't one, do nothing. If there is, kick
it out of the list and add the new value to the list.
Step 2 will be simpler to program if you sort the list once after step 1.
You'll find shift() and splice() particularly helpful.
[UNTESTED]
my @top;
while (<>) {
push @top,$_ if $.<=50;
@top=sort @top if $.=50;
if ($.>50) {
for (my $i=49;$i>=0;--$i) {
if ($top[$i] lt $_) {
# this is where the new value belongs
splice @top,$i,0,$_; # insert the new value
shift @top; # remove the smallest value
last;
}
}
}
}