Also sprach Sam Holden:
Clearly it doesn't. Consider the simple case of a two element list (a,b).
There are two possible shuffle results: (a,b) and (b,a) each of which will
have equal probability of occuring in a fair shuffle.
Any sane implementation of sort will when asked to sort a two element list
do a single comparison, so for the above there are three cases:
(int(rand(3)) -1)==-1 -> (a,b)
(int(rand(3)) -1)==0 -> (a,b)
(int(rand(3)) -1)==1 -> (b,a)
Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
it might improve with a larger list, but it doesn't bode well when the
algorithm fails for such a simple case.
The naive solution to that would be disallowing '0' so that pairs are
always swapped.
However, this wont make the shuffling fair either. One problem (which
can be verified with some simple empirical tests), is that the fairness
depends on the algorithm used.
Taking this little program:
my @list = (1 .. 4);
my %perms;
shuffle() for 1 .. 10000;
while (my ($key, $val) = each %perms) {
print "$key => $val\n";
}
sub shuffle {
my @shuffled = sort { int(rand(2)) == 0 ? -1 : 1 } @list;
$perms{ join "-", @shuffled }++;
}
On perl5.8.4 with merge-sort, this yields a distribution ressembling:
2-1-4-3 => 620
1-2-4-3 => 606
3-2-1-4 => 313
1-4-2-3 => 331
2-4-3-1 => 300
1-3-4-2 => 330
2-1-3-4 => 583
4-1-2-3 => 320
1-2-3-4 => 678
3-4-2-1 => 619
2-3-4-1 => 313
3-1-4-2 => 301
4-3-1-2 => 646
2-3-1-4 => 325
4-3-2-1 => 627
1-3-2-4 => 317
4-2-1-3 => 296
1-4-3-2 => 323
2-4-1-3 => 289
3-2-4-1 => 303
3-4-1-2 => 609
4-1-3-2 => 332
4-2-3-1 => 303
3-1-2-4 => 316
You consistently get certain permutations that show up only around 300
times where others happen at least 600 times (2-1-4-3 is one of them).
Now running the same with 'use sort "_qsort"' gives a very different
picture. It's even more biased:
2-1-4-3 => 621
1-2-4-3 => 584
3-2-1-4 => 651
1-4-2-3 => 321
2-4-3-1 => 155
1-3-4-2 => 323
2-1-3-4 => 1242
4-1-2-3 => 299
1-2-3-4 => 1199
3-4-2-1 => 142
2-3-4-1 => 344
3-1-4-2 => 286
4-3-1-2 => 169
4-3-2-1 => 155
2-3-1-4 => 647
1-3-2-4 => 629
4-2-1-3 => 328
1-4-3-2 => 154
2-4-1-3 => 290
3-2-4-1 => 316
4-1-3-2 => 174
3-4-1-2 => 147
3-1-2-4 => 659
4-2-3-1 => 165
Lists 2-1-3-4 and 1-2-3-4 are always at least around 1200.
In a strict sense, this doesn't prove anything and the randomized sort
could still be fair. But really, it's not.
It might be possible to write a sort algorithm in a way that it could be
used for shuffling. Perl's implementations however weren't designed that
way. Especially perl's quicksort has been optimized to death and I
wouldn't be surprised if those optimizations contribute to this strong
bias towards certain permutations.
Having said that, I tried it with these two sort-routines:
sub bubble (&@) {
my ($cmp, @copy) = @_;
for my $i (reverse 1 .. $#copy) {
for my $j (1 .. $i) {
local ($a, $b) = @copy[$j-1, $j];
@copy[$j-1, $j] = @copy[$j, $j-1] if &$cmp == 1;
}
}
return @copy;
}
sub selection (&@) {
my ($cmp, @copy) = @_;
my $min;
for my $i (0 .. $#copy-1) {
$min = $i;
for my $j ($i+1 .. $#copy) {
local ($a, $b) = @copy[$j, $min];
$min = $j if &$cmp == -1;
}
@copy[$min, $i] = @copy[$i, $min];
}
return @copy;
}
in the hope that maybe those naive sort algorithms could be used (as
they tend to compare each element with all remaining ones). But I was
wrong. Selection-sort has a similar bias as quicksort has (this time
favoring 4-1-2-3 and 4-1-3-2). Bubble-sort is slightly better, but still
more biased than perl's built-in mergesort.
There are other interesting things to note. When allowing the compare
function to return -1, 1 and 0 (as opposed to only -1 and 1), selection
sort gets less biased, whereas bubble sort gets more biased (4-3-2-1
only shows up around ten times and 1-2-3-4 1800 times on average).
Without being a formal proof for anything, these observations show that
the shuffle-by-sort approach is probably very rotten and should best be
left alone.
Tassilo