How to randomize foreach

Josh

I have an array that I need to cycle through once, but in a random
order. For now I'm dumping the variables into a hash, pulling a random
number, retrieving that value, using it, deleting it, and repeating
the process until there are no more members in the hash. This
obviously won't be efficient with a large set. Seems there must be a
better way.

Here's what I'm currently doing:

%sections = (
    '0' => 'x',
    '1' => 't',
    '2' => 'm',
    '3' => 'v',
);
$count_sect = keys %sections; # counts the sections in the keys

do {
    $rand_sect = int(rand($count_sect)); # random integer < $count_sect
    print 'key:' . $rand_sect . "\n";

    if (exists $sections{$rand_sect}) {
        $sect = $sections{$rand_sect};
        print "SECTION:" . $sect . "\n"; # do things with the returned section
        delete $sections{$rand_sect};
    }
    else {
        print '-key non existent' . "\n";
    }
} until (!keys %sections);


Anyone have suggestions for improvement? Thanks.
 
Gunnar Hjalmarsson

Josh said:
I have an array that I need to cycle through once, but in a random
order. For now I'm dumping the variables into a hash, pulling a
random number, retrieving that value, using it, deleting it, and
repeating the process until there are no more members in the hash.
This obviously won't be efficient with a large set. Seems there
must be a better way.

This is at least significantly less code:

my @arr = qw/x t m v/;
print splice(@arr, rand $_, 1) for reverse 1 .. @arr;

As Brian suggested, you should also read the applicable FAQ entry,
especially if your array is large.
 
Anno Siegel

Gunnar Hjalmarsson said:
This is at least significantly less code:

my @arr = qw/x t m v/;
print splice(@arr, rand $_, 1) for reverse 1 .. @arr;

Or even

print splice(@arr, rand @arr, 1) while @arr;
As Brian suggested, you should also read the applicable FAQ entry,
especially if your array is large.

There is also List::Util::shuffle, ready to use.
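For reference, a minimal usage sketch of that function (my addition, not part of the original post):

```perl
use List::Util 'shuffle';

my @arr = qw/x t m v/;

# shuffle() returns the elements in a random order; each permutation
# is equally likely, and it runs in linear time.
my @random_order = shuffle @arr;
print "$_\n" for @random_order;
```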

Anno
 
Gunnar Hjalmarsson

Abigail said:
Let's not. Splicing out elements one-by-one of an array is very
inefficient.

Yeah, that's what the FAQ says, too. OTOH, I measured the speed with the
help of the Benchmark module, and code that created the array

my @arr = 1 .. 5000;

and then spliced it randomly using those inefficient methods ran
about 15 times per second. So, unless you have a really big array,
or are going to run it very frequently, is there anything wrong with
keeping it simple?
Shuffling can be done in linear time, while the splice
method takes time quadratic in the length of the array.

I measured with the List::Util::shuffle() function, too, and it ran
about 70 times per second. However, when I included loading of the
module in the benchmark, the overall speed was reduced to 12 times per
second, i.e. slightly slower than the splice() method...

use Benchmark 'cmpthese';
my %oldINC = %INC;

cmpthese -5, {
    'splice()' => sub {
        my @arr = 1 .. 5000;
        my @shuffled;
        push @shuffled, splice(@arr, rand @arr, 1) while @arr;
    },
    'shuffle()' => sub {
        my @arr = 1 .. 5000;
        %INC = %oldINC;
        require List::Util;
        my @shuffled = List::Util::shuffle(@arr);
    },
};

Output including module loading:

                Rate shuffle() splice()
shuffle()     12.0/s        --     -23%
splice()      15.5/s       30%       --
 
Matthew Braid

Gunnar said:
I measured with the List::Util::shuffle() function, too, and it ran
about 70 times per second. However, when I included loading of the
module in the benchmark, the overall speed was reduced to 12 times per
second, i.e. slightly slower than the splice() method...

use Benchmark 'cmpthese';
my %oldINC = %INC;

cmpthese -5, {
    'splice()' => sub {
        my @arr = 1 .. 5000;
        my @shuffled;
        push @shuffled, splice(@arr, rand @arr, 1) while @arr;
    },
    'shuffle()' => sub {
        my @arr = 1 .. 5000;
        %INC = %oldINC;
        require List::Util;
        my @shuffled = List::Util::shuffle(@arr);
    },
};

Output including module loading:

                Rate shuffle() splice()
shuffle()     12.0/s        --     -23%
splice()      15.5/s       30%       --

Odd. I just tried the same thing (well, I added a third option, sort
{int(rand(3)) - 1} @arr, to double check):

use Benchmark 'cmpthese';
my %oldINC = %INC;
cmpthese -5, {
    'splice()' => sub {
        my @arr = 1 .. 5000;
        my @shuffled;
        push @shuffled, splice(@arr, rand @arr, 1) while @arr;
    },
    'shuffle()' => sub {
        my @arr = 1 .. 5000;
        %INC = %oldINC;
        require List::Util;
        my @shuffled = List::Util::shuffle(@arr);
    },
    'rand()' => sub {
        my @arr = 1 .. 5000;
        my @shuffled = sort { int(rand(3)) - 1 } @arr;
    },
};

and I got:

              Rate    rand() splice() shuffle()
rand()      23.1/s        --     -47%      -88%
splice()    43.3/s       87%       --      -77%
shuffle()    189/s      719%     338%        --

Where shuffle was the clear winner by a mile... Normally I'd say YMMV, but this
seems like a slightly over-the-top difference. I'm using a slightly old perl
(5.8.1) - maybe something is different between our versions?

MB
 
jdog

Thanks for the feedback. Right now the array is fairly small so I'm
using the most convenient option (List::Util::shuffle). If the script
becomes slow when the array grows I'll do some benchmarking.

Josh
 
Gunnar Hjalmarsson

Matthew said:
Odd. I just tried the same thing (well, I added a third option, sort
{int(rand(3)) - 1} @arr, to double check):

and I got:

              Rate    rand() splice() shuffle()
rand()      23.1/s        --     -47%      -88%
splice()    43.3/s       87%       --      -77%
shuffle()    189/s      719%     338%        --

Where shuffle was the clear winner by a mile... Normally I'd say
YMMV, but this seems like a slightly over-the-top difference. I'm
using a slightly old perl (5.8.1) - maybe something is different
between our versions?

Well, I ran it using 5.8.0 on Windows 98. Just tried it with 5.8.1 on
Linux, and then the results were similar to yours. Can the explanation
possibly be that loading a compiled module is much slower on Windows?
 
Sam Holden

Matthew Braid wrote:
'' Odd. I just tried the same thing (well, I added a third option, sort
'' {int(rand(3)) - 1} @arr, to double check):

I've seen variations of the third option before, being presented as a
way to shuffle a list. But I've yet to see any proof that this gives a
fair shuffle (fair being that every possible permutation of the list has
the same chance of being produced, given a truly random 'rand()' function).

Clearly it doesn't. Consider the simple case of a two-element list (a,b).
There are two possible shuffle results, (a,b) and (b,a), each of which will
have equal probability of occurring in a fair shuffle.

Any sane implementation of sort will, when asked to sort a two-element list,
do a single comparison, so for the above there are three cases:

(int(rand(3)) - 1) == -1  ->  (a,b)
(int(rand(3)) - 1) ==  0  ->  (a,b)
(int(rand(3)) - 1) ==  1  ->  (b,a)

Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
it might improve with a larger list, but it doesn't bode well when the
algorithm fails for such a simple case.

As for the middle case, it could be argued that it might result in (b,a)
if the sort were not stable - but even so, if it did once it would do so
always, and hence (b,a) would be favoured and the problem would remain.
For a sane sort implementation, anyway.
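A quick tally (my addition, not part of Sam's post) makes the two-element bias visible:

```perl
# Count the outcomes of the sort-based "shuffle" on a two-element
# list. With int(rand(3)) - 1 as the comparator, (a,b) should show
# up roughly twice as often as (b,a), since two of the three
# equally likely comparator results leave the pair in order.
my %count;
for (1 .. 30_000) {
    my @pair = sort { int(rand(3)) - 1 } ('a', 'b');
    $count{ join '-', @pair }++;
}
printf "%s => %d\n", $_, $count{$_} for sort keys %count;
```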
 
Tassilo v. Parseval

Also sprach Sam Holden:
Clearly it doesn't. Consider the simple case of a two element list (a,b).
There are two possible shuffle results: (a,b) and (b,a) each of which will
have equal probability of occuring in a fair shuffle.

Any sane implementation of sort will when asked to sort a two element list
do a single comparison, so for the above there are three cases:

(int(rand(3)) -1)==-1 -> (a,b)
(int(rand(3)) -1)==0 -> (a,b)
(int(rand(3)) -1)==1 -> (b,a)

Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
it might improve with a larger list, but it doesn't bode well when the
algorithm fails for such a simple case.

The naive solution to that would be disallowing '0', so that pairs are
always swapped.

However, this won't make the shuffling fair either. One problem, which
can be verified with some simple empirical tests, is that the fairness
depends on the sort algorithm used.

Taking this little program:

my @list = (1 .. 4);
my %perms;

shuffle() for 1 .. 10000;

while (my ($key, $val) = each %perms) {
    print "$key => $val\n";
}

sub shuffle {
    my @shuffled = sort { int(rand(2)) == 0 ? -1 : 1 } @list;
    $perms{ join "-", @shuffled }++;
}

On perl5.8.4 with mergesort, this yields a distribution resembling:

2-1-4-3 => 620
1-2-4-3 => 606
3-2-1-4 => 313
1-4-2-3 => 331
2-4-3-1 => 300
1-3-4-2 => 330
2-1-3-4 => 583
4-1-2-3 => 320
1-2-3-4 => 678
3-4-2-1 => 619
2-3-4-1 => 313
3-1-4-2 => 301
4-3-1-2 => 646
2-3-1-4 => 325
4-3-2-1 => 627
1-3-2-4 => 317
4-2-1-3 => 296
1-4-3-2 => 323
2-4-1-3 => 289
3-2-4-1 => 303
3-4-1-2 => 609
4-1-3-2 => 332
4-2-3-1 => 303
3-1-2-4 => 316

You consistently get certain permutations that show up only around 300
times where others happen at least 600 times (2-1-4-3 is one of them).

Now running the same with 'use sort "_qsort"' gives a very different
picture. It's even more biased:

2-1-4-3 => 621
1-2-4-3 => 584
3-2-1-4 => 651
1-4-2-3 => 321
2-4-3-1 => 155
1-3-4-2 => 323
2-1-3-4 => 1242
4-1-2-3 => 299
1-2-3-4 => 1199
3-4-2-1 => 142
2-3-4-1 => 344
3-1-4-2 => 286
4-3-1-2 => 169
4-3-2-1 => 155
2-3-1-4 => 647
1-3-2-4 => 629
4-2-1-3 => 328
1-4-3-2 => 154
2-4-1-3 => 290
3-2-4-1 => 316
4-1-3-2 => 174
3-4-1-2 => 147
3-1-2-4 => 659
4-2-3-1 => 165

Permutations 2-1-3-4 and 1-2-3-4 always show up at least around 1200 times.

In a strict sense, this doesn't prove anything and the randomized sort
could still be fair. But really, it's not.

It might be possible to write a sort algorithm in a way that it could be
used for shuffling. Perl's implementations however weren't designed that
way. Especially perl's quicksort has been optimized to death and I
wouldn't be surprised if those optimizations contribute to this strong
bias towards certain permutations.

Having said that, I tried it with these two sort-routines:

sub bubble (&@) {
    my ($cmp, @copy) = @_;
    for my $i (reverse 1 .. $#copy) {
        for my $j (1 .. $i) {
            local ($a, $b) = @copy[$j-1, $j];
            @copy[$j-1, $j] = @copy[$j, $j-1] if &$cmp == 1;
        }
    }
    return @copy;
}

sub selection (&@) {
    my ($cmp, @copy) = @_;
    my $min;
    for my $i (0 .. $#copy-1) {
        $min = $i;
        for my $j ($i+1 .. $#copy) {
            local ($a, $b) = @copy[$j, $min];
            $min = $j if &$cmp == -1;
        }
        @copy[$min, $i] = @copy[$i, $min];
    }
    return @copy;
}

in the hope that maybe those naive sort algorithms could be used (as
they tend to compare each element with all the remaining ones). But I was
wrong. Selection sort has a bias similar to quicksort's (this time
favoring 4-1-2-3 and 4-1-3-2). Bubble sort is slightly better, but still
more biased than perl's built-in mergesort.

There are other interesting things to note. When allowing the compare
function to return -1, 1 and 0 (as opposed to only -1 and 1), selection
sort gets less biased, whereas bubble sort gets more biased (4-3-2-1
only shows up around ten times and 1-2-3-4 1800 times on average).

Without being a formal proof for anything, these observations show that
the shuffle-by-sort approach is probably very rotten and should best be
left alone.
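For contrast, the same kind of tally with a Fisher-Yates shuffle (my addition, not part of Tassilo's post) shows all 24 permutations of (1 .. 4) coming up at roughly equal counts:

```perl
my %perms;
for (1 .. 24_000) {
    my @copy = (1 .. 4);
    # Fisher-Yates: walk from the end, swapping each position with a
    # uniformly chosen position at or before it.
    for my $i (reverse 1 .. $#copy) {
        my $j = int rand($i + 1);
        @copy[$i, $j] = @copy[$j, $i];
    }
    $perms{ join '-', @copy }++;
}
# each of the 24 keys should land near 1000
print "$_ => $perms{$_}\n" for sort keys %perms;
```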

Tassilo
 
Anno Siegel

Abigail said:
Anno Siegel wrote:
%% > Josh wrote:
%% > > I have an array that I need to cycle through once, but in a random
%% > > order. For now I'm dumping the variables into a hash, pulling a
%% > > random number, retrieving that value, using it, deleting it, and
%% > > repeating the process until there are no more members in the hash.
%% > > This obviously won't be efficient with a large set. Seems there
%% > > must be a better way.
%% >
%% > This is at least significantly less code:
%% >
%% > my @arr = qw/x t m v/;
%% > print splice(@arr, rand $_, 1) for reverse 1 .. @arr;
%%
%% Or even
%%
%% print splice(@arr, rand @arr, 1) while @arr;

Let's not. Splicing out elements one-by-one of an array is very
inefficient. Shuffling can be done in linear time, while the splice
method takes time quadratic in the length of the array.

True, though the quadratic term has a small factor. One time-linear
shuffler swaps list elements instead of extracting them. That saves
the time splice needs to shift the array tight:

sub swapping {
    for ( reverse 1 .. $#_ ) {
        my $r = rand( 1 + $_ );
        @_[ $r, $_ ] = @_[ $_, $r ];
    }
    @_;
}

Benchmarking this against

sub splicing {
    map splice( @_, rand $_, 1 ), reverse 1 .. @_;
}

shows splicing() about twice as fast for arrays shorter than 1000.
The break-even point is with lists of length 12_000; from then on
swapping() wins out by an increasing margin. At 30_000, swapping
is twice as fast as splicing. On my machine.

That leaves quite a few applications (such as card shuffling) where
the splice solution is practical.
It's even worse than suggesting to use Bubblesort to sort an array.

Even Bubblesort has found respectable applications. It is fast for small
lists that are almost in order and has been used to speed up another
sort algorithm. I think the example is documented in the _Programming
Pearls_ series, though the example I find [1] uses insertion sort (another
n**2 sorting algorithm).

Anno

[1] Bentley, _Programming Pearls_, p. 112f
 
Gunnar Hjalmarsson

Anno said:
One time-linear shuffler swaps list elements instead of extracting
them. That saves the time splice needs to shift the array tight:

sub swapping {
    for ( reverse 1 .. $#_ ) {
        my $r = rand( 1 + $_ );
        @_[ $r, $_ ] = @_[ $_, $r ];
    }
    @_;
}

Benchmarking this against

sub splicing {
    map splice( @_, rand $_, 1 ), reverse 1 .. @_;
}

shows splicing() about twice as fast for arrays shorter than 1000.
The break-even point is with lists of length 12_000; from then on
swapping() wins out by an increasing margin. At 30_000, swapping is
twice as fast as splicing. On my machine.

The benchmark I posted showed that the execution time of
List::Util::shuffle() is much faster than that. Is the explanation,
then, that I was actually comparing apples and oranges, since
List::Util is a compiled module?
 
Anno Siegel

Gunnar Hjalmarsson said:
Anno said:
One time-linear shuffler swaps list elements instead of extracting
them. That saves the time splice needs to shift the array tight:

sub swapping {
    for ( reverse 1 .. $#_ ) {
        my $r = rand( 1 + $_ );
        @_[ $r, $_ ] = @_[ $_, $r ];
    }
    @_;
}

Benchmarking this against

sub splicing {
    map splice( @_, rand $_, 1 ), reverse 1 .. @_;
}

shows splicing() about twice as fast for arrays shorter than 1000.
The break-even point is with lists of length 12_000; from then on
swapping() wins out by an increasing margin. At 30_000, swapping is
twice as fast as splicing. On my machine.

The benchmark I posted showed that the execution time of
List::Util::shuffle() is much faster than that. Is the explanation,
then, that I was actually comparing apples and oranges, since
List::Util is a compiled module?

It is. It also contains a pure Perl implementation, but by default it
loads an XS version. That's why I wanted to add a benchmark against
a Perl implementation of the in-place shuffle.

Anno
 
Anno Siegel

Tassilo v. Parseval said:
Also sprach Sam Holden:


The naive solution to that would be disallowing '0', so that pairs are
always swapped.

However, this won't make the shuffling fair either. One problem, which
can be verified with some simple empirical tests, is that the fairness
depends on the sort algorithm used.

Taking this little program:

[interesting results snipped]

One problem with this approach is that it violates the assumptions of
the underlying algorithm. Sort procedures assume that comparisons are
consistent with a linear ordering of the list elements. In particular,
they assume that comparing the same elements twice will have the same
outcome, but also transitivity of the order relation and more. Who
is to say what a sort algorithm is going to do? It may not even halt,
for instance if the random decisions indicate a cyclic ordering.

Nor are any predictions on running time valid, so the expectation
that it runs in n*log n time may not be true. Since that is worse
than the linear time of an in-place shuffle, there isn't much to
recommend the "random sort" at all.

Anno
 
Scott W Gifford

Anno Siegel writes:

[...]
One problem with this approach is that it violates the assumptions of
the underlying algorithm. Sort procedures assume that comparisons are
consistent with a linear ordering of the list elements. In particular,
they assume that comparing the same elements twice will have the same
outcome, but also transitivity of the order relation and more. Who
is to say what a sort algorithm is going to do? It may not even halt,
for instance if the random decisions indicate a cyclic ordering.

Indeed, the documentation for Perl's sort function says:

    The comparison function is required to behave. If it returns
    inconsistent results (sometimes saying $x[1] is less than $x[2]
    and sometimes saying the opposite, for example) the results are
    not well-defined.

-----ScottG.
 
Anno Siegel

Abigail said:
Anno Siegel wrote:
|| > Anno Siegel wrote:
|| > > One time-linear shuffler swaps list elements instead of extracting
|| > > them. That saves the time splice needs to shift the array tight:
|| > >
|| > > sub swapping {
|| > > for ( reverse 1 .. $#_ ) {
|| > > my $r = rand( 1 + $_);
|| > > @_[ $r, $_] = @_[ $_, $r];
|| > > }
|| > > @_;
|| > > }
|| > >
|| > > Benchmarking this against
|| > >
|| > > sub splicing {
|| > > map splice( @_, rand $_, 1), reverse 1 .. @_;
|| > > }
|| > >
|| > > shows splicing() about twice as fast for arrays shorter than 1000.
|| > > The break-even point is with lists of length 12_000; from then on
|| > > swapping() wins out by an increasing margin. At 30_000, swapping is
|| > > twice as fast as splicing. On my machine.
|| >
|| > The benchmark I posted showed that the execution time of
|| > List::Util::shuffle() is much faster than that. Is the explanation,
|| > then, that I was actually comparing apples and oranges, since
|| > List::Util is a compiled module?
||
|| It is. It also contains a pure Perl implementation, but by default it
|| loads an XS version. That's why I wanted to add a benchmark against
|| a Perl implementation of the in-place shuffle.


So, it's fair to compare a splice which is doing its work in C with a
shuffle written in pure Perl?

I didn't want to imply any unfairness, just to add a data point.
I'd say comparing the XS shuffle with the C splice is more fair.

Replacing n calls to a C-level function by a single one is fair?

We know that the overhead swamps out a lot of what may be happening
low level, and these benchmarks prove it again. So setting one
swap operation against one call to splice seems fair enough from
that point of view.

Anno
 
Matthew Braid

Abigail said:
Matthew Braid wrote:
'' Odd. I just tried the same thing (well, I added a third option, sort
'' {int(rand(3)) - 1} @arr, to double check):

I've seen variations of the third option before, being presented as a
way to shuffle a list. But I've yet to see any proof that this gives a
fair shuffle (fair being that every possible permutation of the list has
the same chance of being produced, given a truly random 'rand()' function).


Abigail

I only included the rand() method out of curiosity (I knew it was gonna be
slow, I just put it there as a comparison). I'd never actually _use_ that
method in my code :)

MB
 
Bart Lateur

Matthew said:
Odd. I just tried the same thing (well, I added a third option, sort
{int(rand(3)) - 1} @arr, to double check):

I've always heard that using an inconsistent sort function in perl,
like this one, is a recipe for disaster.

You could just generate a random number per entry, and sort according to
that list. Like:

@sort = map rand, 0 .. $#item;
@shuffled = @item[sort { $sort[$a] <=> $sort[$b] } 0 .. $#item];
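A self-contained version of that snippet (the contents of @item are mine, for illustration):

```perl
my @item = qw/x t m v/;

# One random key per index; sorting the indices by those keys gives
# a random permutation (assuming no duplicate keys are drawn).
my @sort = map rand, 0 .. $#item;
my @shuffled = @item[ sort { $sort[$a] <=> $sort[$b] } 0 .. $#item ];

print "@shuffled\n";
```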
 
Anno Siegel

Bart Lateur said:
Matthew said:
Odd. I just tried the same thing (well, I added a third option, sort
{int(rand(3)) - 1} @arr, to double check):

I've always heard that using an inconsistent sort function in perl,
like this one, is a recipe for disaster.

You could just generate a random number per entry, and sort according to
that list. Like:

@sort = map rand, 0 .. $#item;
@shuffled = @item[sort { $sort[$a] <=> $sort[$b] } 0 .. $#item];

It may still be hard to prove that this generates an unbiased sequence.
There will usually be a few duplicates among the generated numbers.
For these, the sequence is determined by the sort algorithm, not by
the random generator.

Anno
 
Bart Lateur

Anno said:
There will usually be a few duplicates among the generated numbers.

Not *exact* duplicates, in the numerical sense (though perhaps it looks
that way when stringified). Otherwise, you'd have reached the end of the
random number generator's period, and started getting the exact same
sequence again.
 
