Top 10 list algorithm

John Bokma · Oct 8, 2004

Bubble sort does insertion by swapping elements. Which is a pretty
aweful way to do insertion. Especially when insertion is supported
with very low (even hardware-level) optimizations.

swapping doesn't mean insertion, you can reuse the allocated memory.

Anno Siegel · Oct 10, 2004

Abigail said:
Anno Siegel ([email protected]) wrote on MMMMLVI September
MCMXCIII in <URL:\\ > Fatted ([email protected]) wrote on MMMMLV September MCMXCIII in
\\ > <URL:\\ > !! Is there a better (faster) way of implementing my top_sort function?
\\ > !! (Creating a top 10 list of the highest numbers from a large set).
\\ > !! Or did I get lucky?
\\ >
\\ >
\\ > #!/usr/bin/perl
\\ >
\\ > use strict;
\\ > use warnings;
\\ > no warnings qw /syntax/;
\\ >
\\ > my @heap = map {int rand 1_000_000} 1 .. 100_000;
\\ >
\\ > sub heapify;
\\ > sub heapify {
\\ > my $index = shift;
\\ >
\\ > my ($c1, $c2) = (2 * $index + 1, 2 * $index + 2);
\\ > my $max = $index;
\\ > $max = $c1 if $c1 < @heap && $heap [$c1] > $heap [$max];
\\ > $max = $c2 if $c2 < @heap && $heap [$c2] > $heap [$max];
\\ >
\\ > return if $max == $index;
\\ >
\\ > @heap [$index, $max] = @heap [$max, $index];
\\ > heapify $max;
\\ > }
\\ >
\\ >
\\ > for (my $i = int (@heap / 2); $i >= 0; $i --) {heapify $i}
\\ >
\\ > if (@heap) {
\\ > for (1 .. 10) {
\\ > print $heap [0], "\n";
\\ > my $tmp = pop @heap;
\\ > last unless @heap;
\\ > $heap [0] = $tmp;
\\ > heapify 0;
\\ > }
\\ > }
\\
\\ Well, you're heapifying the entire list, which is still an n log n
\\ operation. Might as well sort

No! It's not. Look carefully, and do the math. It's an O (n) operation.

Well, I'll be damned, it is. It doesn't look like it ought to be

I didn't do the math, I looked it up in _Introduction to Algorithms_
(Cormen, Leiserson, Rivest) which, I suppose, you have also consulted.
So your simple solution, beautifully presented as always, is also
optimal, at least asymptotically.

I still think that in practice it would pay to keep the heap size down
to k (the number of elements to be extracted). You'd still have to
*inspect* all the input, but the number of "heap operations" (heap
insertions and original (non-recursive) calls to heapify()) would be
much smaller, and operate on a smaller heap.

Anno

Shawn Corey · Oct 10, 2004

Anno said:
I still think that in practice it would pay to keep the heap size down
to k (the number of elements to be extracted). You'd still have to
*inspect* all the input, but the number of "heap operations" (heap
insertions and original (non-recursive) calls to heapify()) would be
much smaller, and operate on a smaller heap.

Anno

Unfortunately, no. I wrote some benchmarks just to see what happens in
practice. The subroutine heap_each() is commented out since it takes so
long to complete.

--- Shawn

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;

# Un-comment for testing; do forget to un-comment
# the print statements in each suborutine.
# heap_once();
# heap_occasional();
# heap_each();
# sort_once();
# sort_occasional();
# sort_each();
# simple();
# exit;

timethese( 100,
{
simple => \&simple,
heap_once => \&heap_once,
heap_occasional => \&heap_occasional,
# heap_each => \&heap_each,
sort_once => \&sort_once,
sort_occasional => \&sort_occasional,
sort_each => \&sort_each,
},
);

sub heap_once {
my @top = ();

srand( 1 );
for my $i ( 1 .. 10_000 ){
my $num = rand();
push @top, $num;
my $i = $#top;
my $j = 0;
for(;

{
$j = int( $i / 2 );
if( $top[$j] < $top[$i] ){
( $top[$j] , $top[$i] ) = ( $top[$i] , $top[$j] );
}else{
last;
}
last if $j <= 0;
$i = $j;
}
}

my @list = ();
for my $i ( 0 .. 9 ){
push @list, $top[0];
my $j = 0;
while( $j*2+1 <= $#top ){
if( $j*2+2 <= $#top && $top[$j*2+1] < $top[$j*2+2] ){
$top[$j] = $top[$j*2+2];
$j = $j*2+2;
}else{
$top[$j] = $top[$j*2+1];
$j = $j*2+1;
}
}
$top[$j] = 0;
}

# print "@list\n";
}

sub heap_each {
my @top = ();

srand( 1 );
for my $i ( 1 .. 10_000 ){
my $num = rand();
push @top, $num;
my $i = $#top;
my $j = 0;
for(;

{
$j = int( $i / 2 );
if( $top[$j] < $top[$i] ){
( $top[$j] , $top[$i] ) = ( $top[$i] , $top[$j] );
}else{
last;
}
last if $j <= 0;
$i = $j;
}

my @list = ();
for my $i ( 0 .. 9 ){
push @list, $top[0];
my $j = 0;
while( $j*2+1 <= $#top ){
if( $j*2+2 <= $#top && $top[$j*2+1] < $top[$j*2+2] ){
$top[$j] = $top[$j*2+2];
$j = $j*2+2;
}else{
$top[$j] = $top[$j*2+1];
$j = $j*2+1;
}
}
$top[$j] = 0;
}
@top = @list;
}

# print "@top\n";
}

sub heap_occasional {
my @top = ();

srand( 1 );
for my $i ( 1 .. 10_000 ){
my $num = rand();
push @top, $num;
my $i = $#top;
my $j = 0;
for(;

{
$j = int( $i / 2 );
if( $top[$j] < $top[$i] ){
( $top[$j] , $top[$i] ) = ( $top[$i] , $top[$j] );
}else{
last;
}
last if $j <= 0;
$i = $j;
}

if( @top > 1_000 ){ # Change 1_000 to be as large as your memory
can handle
my @list = ();
for my $i ( 0 .. 9 ){
push @list, $top[0];
my $j = 0;
while( $j*2+1 <= $#top ){
if( $j*2+2 <= $#top && $top[$j*2+1] < $top[$j*2+2] ){
$top[$j] = $top[$j*2+2];
$j = $j*2+2;
}else{
$top[$j] = $top[$j*2+1];
$j = $j*2+1;
}
}
$top[$j] = 0;
}
@top = @list;
}
}

my @list = ();
for my $i ( 0 .. 9 ){
push @list, $top[0];
my $j = 0;
while( $j*2+1 <= $#top ){
if( $j*2+2 <= $#top && $top[$j*2+1] < $top[$j*2+2] ){
$top[$j] = $top[$j*2+2];
$j = $j*2+2;
}else{
$top[$j] = $top[$j*2+1];
$j = $j*2+1;
}
}
$top[$j] = 0;
}

# print "@list\n";
}

sub simple {
my @top = ( 0 ) x 10;

srand( 1 );
for my $i ( 1 .. 10_000 ){
my $num = rand();
my $i = -1;
for( $i = $#top; $i >= 0; $i -- ){
if( $top[$i] > $num ){
@top = ( @top[ 0 .. $i ], $num, @top[ $i + 1 .. $#top ] );
pop @top;
last;
}
}
if( $i < 0 ){
unshift @top, $num;
pop @top;
}
}

# print "@top\n";
}

sub sort_each {
my @top = ( 0 ) x 10;

srand( 1 );
for my $i ( 1 .. 10_000 ){
my $num = rand();
push @top, $num;
@top = sort { $b <=> $a } @top;
$#top = 9;
}

# print "@top\n";
}

sub sort_occasional {
my @top = ( 0 ) x 10;

srand( 1 );
for my $i ( 1 .. 10_000 ){
my $num = rand();
push @top, $num;
if( @top > 1_000 ){ # Change 1_000 to be as large as your memory
can handle
@top = sort { $b <=> $a } @top;
$#top = 9;
}
}
@top = sort { $b <=> $a } @top;
$#top = 9;

# print "@top\n";
}

sub sort_once {
my @top = ( 0 ) x 10;

srand( 1 );
for my $i ( 1 .. 10_000 ){
my $num = rand();
push @top, $num;
}
@top = sort { $b <=> $a } @top;
$#top = 9;

# print "@top\n";
}

__END__

Anno Siegel · Oct 11, 2004

Shawn Corey said:
Unfortunately, no. I wrote some benchmarks just to see what happens in
practice. The subroutine heap_each() is commented out since it takes so
long to complete.

Then there must be something wrong with the implementation. My
benchmark (see below) shows the "small-heap" solution I suggested
about 8 times faster than Abigail's "big-heap" solution.

Anno

The benchmark compares Abigail's top-ten extraction with a bottom-ten
extraction using a small heap. Otherwise, implementations of both
a maximum-heap and a minimum heap would have been needed. To make the
results comparable, I'm feeding Abigail's solution with the negatives
of the same numbers I'm using for mine.

#!/usr/local/bin/perl
use strict; use warnings; $| = 1;
use Vi::QuickFix;
use Benchmark;

use constant N => 100_000;
use constant SIZE => 10;
sub _heapify;
my @heap;
my $count;

our @raw = map int rand 1_000_000, 1 .. N;

bench: {
@heap = map -$_, @raw;
timethis 1, \ &abigail;
print "Heap operations: $count\n";
timethis 1, \ &anno;
print "Heap operations: $count\n";
}
exit;

sub anno {
@heap = ();
$count = 0;
for ( @raw ) {
if ( @heap < SIZE or $_ < $heap[ 0] ) {
insert( $_);
extract_max() if @heap > SIZE;
}
}
print extract_max(), "\n" while @heap;
}

sub abigail {
$count = 0;
for (my $i = int (@heap / 2); $i >= 0; $i --) {_heapify $i; $count ++}
print extract_max(), "\n" for 1 .. 10;
}

###############################################################

sub count { print "count: $count\n" }

sub extract_max {
return unless @heap;
my $max = shift @heap;
if ( @heap > 1 ) {
unshift @heap, pop @heap;
$count ++;
_heapify( 0);
}
$max;
}

sub insert {
my $x = shift;
my $i = @heap;
$count ++;
while ( $i > 0 ) {
my $parent = int( ($i - 1)/2);
last if $heap[ $parent] >= $x;
$heap[ $i] = $heap[ $parent];
$i = $parent;
}
$heap[ $i] = $x;
}

sub _heapify {
my $i = shift;
my ( $l, $r) = ( 2*$i + 1, 2*$i + 2);
my $max = $i;
$max = $l if $l < @heap and $heap[ $l] > $heap[ $max];
$max = $r if $r < @heap and $heap[ $r] > $heap[ $max];
if ( $max != $i ) {
@heap[ $i, $max] = @heap[ $max, $i];
_heapify( $max);
}
}

Shawn Corey · Oct 11, 2004

Anno said:
Then there must be something wrong with the implementation. My
benchmark (see below) shows the "small-heap" solution I suggested
about 8 times faster than Abigail's "big-heap" solution.

Anno

The benchmark compares Abigail's top-ten extraction with a bottom-ten
extraction using a small heap. Otherwise, implementations of both
a maximum-heap and a minimum heap would have been needed. To make the
results comparable, I'm feeding Abigail's solution with the negatives
of the same numbers I'm using for mine.

The reason why your routine is so much faster is that it does not do
heap operations for each item. heap_each() does (that's what each
means), and is therefore very slow. However your routine has inspired a
change in my simple routine that blows the socks off everything else.

Also I corrected a bug in it that is not a bug. At least it did what I
wanted and didn't complain about it. I leave it as an exercise to the
reader to find the change.

--- Shawn

use constant SIZE => 10;
my $seed = time;

sub simple {
my @top = ( 0 ) x SIZE;

srand( $seed );
for my $i ( 1 .. 10_000 ){
my $num = rand();
next if $num < $top[-1];
my $j = $#top;
for( $j = $#top; $j >= 0; $j -- ){
if( $top[$j] > $num ){
@top = ( @top[ 0 .. $j ], $num, @top[ $j + 1 .. $#top ] );
pop @top;
last;
}
}
if( $j < 0 ){
unshift @top, $num;
pop @top;
}
}

# print "@top\n";
}

John W. Krahn · Oct 14, 2004

Shawn said:
The reason why your routine is so much faster is that it does not do
heap operations for each item. heap_each() does (that's what each
means), and is therefore very slow. However your routine has inspired a
change in my simple routine that blows the socks off everything else.

Also I corrected a bug in it that is not a bug. At least it did what I
wanted and didn't complain about it. I leave it as an exercise to the
reader to find the change.

use constant SIZE => 10;
my $seed = time;

sub simple {
my @top = ( 0 ) x SIZE;

srand( $seed );
for my $i ( 1 .. 10_000 ){
my $num = rand();
next if $num < $top[-1];
my $j = $#top;
for( $j = $#top; $j >= 0; $j -- ){
if( $top[$j] > $num ){
@top = ( @top[ 0 .. $j ], $num, @top[ $j + 1 .. $#top ] );

Why not use splice?

splice @top, $j + 1, 0, $num;

pop @top;
last;
}
}
if( $j < 0 ){
unshift @top, $num;
pop @top;
}
}

# print "@top\n";
}

John

Shawn Corey · Oct 14, 2004

John said:
Why not use splice?

splice @top, $j + 1, 0, $num;

Not enough coffee? Too early in the morning? Failed to read the
documentation?

--- Shawn
(That's my story and I sticking to it!)

John W. Krahn · Oct 14, 2004

Shawn said:
Not enough coffee? Too early in the morning? Failed to read the
documentation?

Which documentation?

John

Shawn Corey · Oct 15, 2004

Which documentation?

perldoc -f splice

--- Shawn

John W. Krahn · Oct 16, 2004

Shawn said:
perldoc -f splice

What do you think I didn't read correctly?

John

Eugene Mikheyev · Oct 18, 2004

What do you think I didn't read correctly?
I think he confessed about him having read incorrectly

Algorithm	1	Dec 15, 2019
Top 10 players minheap sort - need help	1	Oct 3, 2022
Python List Comprehension Error: Unexpected Output	1	Aug 28, 2023
Anyone familiar with WP Bakery and/or Visual Composer?	4	Jan 27, 2023
Range / empty list issues??	1	Dec 11, 2023
Trying to get the average value of the elements, please help ! JavaScript	3	Dec 13, 2022
Cash register challenge	2	Mar 8, 2022
Java	1	Jan 18, 2022

Top 10 list algorithm

John Bokma

Anno Siegel

Shawn Corey

Anno Siegel

Shawn Corey

John W. Krahn

Shawn Corey

John W. Krahn

Shawn Corey

John W. Krahn

Eugene Mikheyev

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads