[PerlMonks] About List::Util's pure Perl shuffle()

M

Michele Dondi

This is a question of mine I asked in PerlMonks
(<http://perlmonks.org>) that turned into a very interesting
discussion, so interestin IMHO that I feel like reporting it here:

http://perlmonks.org/?node_id=625977

I would like to reproduce the whole thread, but I direct anyone who
could be interested there. For completeness I will adapt to pure text
the original post and paste it hereafter.

---------------------------------------------------------------------

In a recent discussion (<in ctt,
PerlTeX was mentioned (and I'm also mentioning it here because it is
not that widely known and IMHO it would deserve being) in relation to
shuffling, at which point in reply to a posted piece of code I wrote
about the

perldoc -q shuffle

FAQ entry, talking in particular about both Fisher-Yates and
List::Util's shuffle(). Now, I happened to check the source code for
the latter module, and since it has pure Perl code for the given
function, I noticed that in particular it is as follows:

sub shuffle (@) {
my @a=\(@_);
my $n;
my $i=@_;
map {
$n = rand($i--);
(${$a[$n]}, $a[$n] = $a[$i])[0];
} @_;
}

Now, at first sight it is a nice piece of Obfu, ain't it? Well, then
if you look at it, it's easy to see how it works and in particular
that it is still Fisher-Yates. But of course you have to think about
it for a while... and I wondered why it is like that...

To be fair, I think that taking references in the first place is to
avoid duplicating data in memory that could be heavy in memory
usage... but then I also thought that all that playing with references
would impose a performance penalty and -for once- I suppose that
performance does matter. So I tried the following Benchmark, comparing
a naive and fairly readable implementation of the algorithm with
List::Util's:

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw/:all :hireswallclock/;

sub naive (@) {
my @l=@_;
for (reverse 1..$#l) {
my $r=int rand($_+1);
@l[$_,$r]=@l[$r,$_];
}
@l;
}

sub listutil (@) {
my @a=\(@_);
my $n;
my $i=@_;
map {
$n = rand($i--);
(${$a[$n]}, $a[$n] = $a[$i])[0];
} @_;
}

cmpthese -60, { map { $_ => "$_ 1..1000" } qw/naive listutil/ };

__END__

The results are as follows:

C:\temp>perl lus.pl
Rate naive listutil
naive 588/s -- -14%
listutil 684/s 16% --

So that pretty much may answer my question as to why the sub is
implemented like that... but then this raises the question as to how
could one come up with such an idea, because I'm sure I wouldn't
have...

Any comments?
 
M

Michele Dondi

I would like to reproduce the whole thread, but I direct anyone who
could be interested there. For completeness I will adapt to pure text
the original post and paste it hereafter.

Oh, and at least the first reply
(<http://perlmonks.org/?node_id=626004>), by BrowserUk
(<http://perlmonks.org/?node=BrowserUk>).

---------------------------------------------------------------------

If you accept the principle that code in modules is destined to be
reused in lots of situations, including performance critical ones, and
should therefore do it's utmost to be as efficient as the author is
comfortable maintaining, then List::Util's use of aliasing to sqeeze a
little more performance seems entirely reasonable.

That said, they missed a trick. Using a block form map creates a
additional level of scope that they aren't using and slows
performance.

They can squeeze another 30%, and almost 50% over the 'naive'
implementation, by avoiding it:

sub buk (@) {
my @a = \( @_ );
my $n;
my $i = @_;
map+( $n=rand($i--), ${ $a[ $n ] }, $a[ $n ]=$a[ $i ] )[ 1 ],
@_;
}

cmpthese -1, { map { $_ => "$_ 1..1000" } qw/naive listutil buk/ };

__END__
C:\test>junk
Rate naive listutil buk
naive 520/s -- -13% -32%
listutil 597/s 15% -- -22%
buk 769/s 48% 29% --
 
P

Peter J. Holzer

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw/:all :hireswallclock/;

sub naive (@) {
my @l=@_;
for (reverse 1..$#l) {
my $r=int rand($_+1);
@l[$_,$r]=@l[$r,$_];
}
@l;
}

sub listutil (@) {
my @a=\(@_);
my $n;
my $i=@_;
map {
$n = rand($i--);
(${$a[$n]}, $a[$n] = $a[$i])[0];
} @_;
}

cmpthese -60, { map { $_ => "$_ 1..1000" } qw/naive listutil/ };

__END__

The results are as follows:

C:\temp>perl lus.pl
Rate naive listutil
naive 588/s -- -14%
listutil 684/s 16% --

The main performance improvement comes from avoiding hash slices. Just
replacing @l[$_,$r]=@l[$r,$_]; in your code with a conventional swap via
a temporary variable:

my $x = $l[$r];
$l[$r] = $l[$_];
$l[$_] = $x;

gives me a 10 % performance boost.

The rest of the improvement comes from not swapping in-place at all, but
building a new array from scratch:

sub naive2 (@) {
my @l=@_;
my @l2;
for (reverse 0..$#l) {
my $r=int rand($_+1);
push @l2, $l[$r];
$l[$r] = $l[$_];
}
@l2;
}

is just as fast as the map (and it does the same thing, really).

Fascinating. I would have expected both changes to make the performance
worse, not better. Just shows again that intuition is an unreliable
guide on perl performance.

hp
 
M

Michele Dondi

Fascinating. I would have expected both changes to make the performance
worse, not better. Just shows again that intuition is an unreliable
guide on perl performance.

For more (possibly) counter-intuitive stuff, please see the whole PM
thread... feel free to report here your own experiences and ideas
about what you see there!


Michele
 
U

Uri Guttman

PJH> sub naive2 (@) {
PJH> my @l=@_;
PJH> my @l2;
PJH> for (reverse 0..$#l) {
PJH> my $r=int rand($_+1);
PJH> push @l2, $l[$r];
PJH> $l[$r] = $l[$_];
PJH> }
PJH> @l2;
PJH> }

a minor optimization would be to drop the int() call as any number can
be used as an array index as it will do int for you. it may not be
faster or by much if any but it will be less perl code and that is
usually a speed win.

also if your data is large things, the all the extra copying will cost
you vs copying refs. and in the above code the copy of @_ isn't needed
since you can rearrange @_ safely.

counting down from $#l should be faster than a reverse on a list range
especially for longer arrays.

indexing into arrays is slow in perl. so maybe this change would be
faster too (factoring out the $l[$r]):

my $r= \$l[rand($_+1)] ;
push @l2, $$r;
$$r = $l[$_];

i don't have the time to play with all of these and benchmark them so i
leave them to the reader. :)

uri
 
M

Michele Dondi

i don't have the time to play with all of these and benchmark them so i
leave them to the reader. :)

I see you all are too lazy to read the whole thread @ PM. To put it
briefly probably the fastest subroutine that does a shuffle on an
input list and returns a list (there are faster alternatives accepting
an arrayref as input - still not shuffling in place) is

sub blokhead (@) {
my @a = (0 .. $#_);
my $i = @_;
my $n;
map+( $n=rand($i--), $_[$a[$n]], $a[$n]=$a[$i] )[ 1 ], @_;
}

The full code for the benchmarks run by BrowserUk is:

#!/usr/bin/perl -slw
use strict;
use List::Util qw[ shuffle ];
use Benchmark qw/:all/;

sub naive (@) {
my @l=@_;
for (reverse 1..$#l) {
my $r=int rand($_+1);
@l[$_,$r]=@l[$r,$_];
}
@l;
}

sub listutil (@) {
my @a=\(@_);
my $n;
my $i=@_;
map {
$n = rand($i--);
(${$a[$n]}, $a[$n] = $a[$i])[0];
} @_;
}

sub buk (@) {
my @a = \( @_ );
my $n;
my $i = @_;
map+( $n = rand($i--), ${ $a[ $n ] }, $a[ $n ] = $a[ $i ] )[ 1
],
+@_;
}

sub bukNew ($) {
my( $ref ) = @_;
my @x = 0 .. $#$ref;
@{ $ref }[ map splice( @x, rand @x, 1 ), @x ];
}

#my %stats;
#++$stats{ join' ', bukNew( ['A'..'D'] ) } for 1 .. 1e4;
#print "$_ => $stats{ $_ }" for sort keys %stats;

sub blokhead (@) {
my @a = (0 .. $#_);
my $i = @_;
my $n;
map+( $n=rand($i--), $_[$a[$n]], $a[$n]=$a[$i] )[ 1 ], @_;
}

sub blokhead_ref ($) {
my( $ref ) = @_;
my @a = (0 .. $#$ref);
my $i = @$ref;
my $n;
map+( $n=rand($i--), $ref->[$a[$n]], $a[$n]=$a[$i] )[ 1 ], @a;
}

#%stats = ();
#++$stats{ join' ', blokhead_ref( ['A'..'D'] ) } for 1 .. 1e4;
#print "$_ => $stats{ $_ }" for sort keys %stats;

for my $c ( map{ 10**$_ } 1..3 ) {
for my $l ( map{ 10**$_ } 0.1, 1, 3 ) {
print "\n$c strings length $l";
our @test = map $_ x $l, 1..$c;
cmpthese -3, {
bukNew => q[ bukNew( \@test ) ],
blokhead => q[ blokhead_ref( \@test ) ],
map {
$_ => "$_ \@test"
} qw/naive listutil blokhead buk/
};
}
}


Michele
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top