How to randomize foreach

Discussion in 'Perl Misc' started by Josh, Aug 22, 2004.

  1. Josh

    Josh Guest

    I have an array that I need to cycle through once, but in a random
    order. For now I'm dumping the variables into a hash, pulling a random
    number, retrieving that value, using it, deleting it, and repeating
    the process until there are no more members in the hash. This
    obviously won't be efficient with a large set. Seems there must be a
    better way.

    Here's what I'm currently doing:

    %sections = (
        '0' => 'x',
        '1' => 't',
        '2' => 'm',
        '3' => 'v',
    );
    $count_sect = (keys %sections); # counts the sections in the keys

    do {
        $rand_sect = int(rand($count_sect)); # generates a random integer < $count_sect
        print 'key:'.$rand_sect."\n";

        if (exists $sections{$rand_sect}) {
            $sect = $sections{$rand_sect};
            print "SECTION:".$sect."\n"; # do things with the returned section
            delete $sections{$rand_sect};
        }
        else {
            print '-key non existent'."\n";
        }

    } until (!keys %sections);


    Anyone have suggestions for improvement? Thanks.
     
    Josh, Aug 22, 2004
    #1

  2. On 22 Aug 2004 09:27:43 -0700, (Josh) wrote:

    >I have an array that I need to cycle through once, but in a random
    >order.


    perldoc -q shuffle
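
    For reference, what that FAQ entry boils down to is a Fisher-Yates
    shuffle. A minimal sketch (not necessarily the FAQ's exact code) might
    look like:

    # Fisher-Yates shuffle: walk the array backwards and swap each element
    # with a randomly chosen earlier (or same) position. Runs in linear
    # time and gives every permutation the same probability.
    sub fisher_yates_shuffle {
        my $array = shift;               # array reference, shuffled in place
        for ( my $i = $#$array; $i > 0; $i-- ) {
            my $j = int rand( $i + 1 );
            @$array[ $i, $j ] = @$array[ $j, $i ];
        }
    }

    my @deck = ( 1 .. 52 );
    fisher_yates_shuffle( \@deck );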
     
    Brian Greenfield, Aug 22, 2004
    #2

  3. Josh wrote:
    > I have an array that I need to cycle through once, but in a random
    > order. For now I'm dumping the variables into a hash, pulling a
    > random number, retrieving that value, using it, deleting it, and
    > repeating the process until there are no more members in the hash.
    > This obviously won't be efficient with a large set. Seems there
    > must be a better way.


    This is at least significantly less code:

    my @arr = qw/x t m v/;
    print splice(@arr, rand $_, 1) for reverse 1 .. @arr;

    As Brian suggested, you should also read the applicable FAQ entry,
    especially if your array is large.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 22, 2004
    #3
  4. Josh

    Anno Siegel Guest

    Gunnar Hjalmarsson <> wrote in comp.lang.perl.misc:
    > Josh wrote:
    > > I have an array that I need to cycle through once, but in a random
    > > order. For now I'm dumping the variables into a hash, pulling a
    > > random number, retrieving that value, using it, deleting it, and
    > > repeating the process until there are no more members in the hash.
    > > This obviously won't be efficient with a large set. Seems there
    > > must be a better way.

    >
    > This is at least significantly less code:
    >
    > my @arr = qw/x t m v/;
    > print splice(@arr, rand $_, 1) for reverse 1 .. @arr;


    Or even

    print splice(@arr, rand @arr, 1) while @arr;

    > As Brian suggested, you should also read the applicable FAQ entry,
    > especially if your array is large.


    There is also List::Util::shuffle, ready to use.
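
    A minimal usage sketch for the original problem:

    use List::Util 'shuffle';

    my @arr = qw/x t m v/;
    print "$_\n" for shuffle @arr;   # each element exactly once, in random order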

    Anno
     
    Anno Siegel, Aug 22, 2004
    #4
  5. Abigail wrote:
    > Anno Siegel wrote:
    >> Gunnar Hjalmarsson wrote:
    >>>
    >>> my @arr = qw/x t m v/;
    >>> print splice(@arr, rand $_, 1) for reverse 1 .. @arr;

    >>
    >> Or even
    >>
    >> print splice(@arr, rand @arr, 1) while @arr;

    >
    > Let's not. Splicing out elements one-by-one of an array is very
    > inefficient.


    Yeah, that's what the FAQ says, too. OTOH, I measured the speed with
    the help of the Benchmark module, and code that created the array

    my @arr = 1 .. 5000;

    and then spliced it randomly using those inefficient methods ran
    about 15 times per second. So, if you don't have a really big array
    and aren't about to run it very frequently, is there anything wrong
    with keeping it simple?

    > Shuffling can be done in linear time, while the splice
    > method takes time quadratic in the length of the array.


    I measured with the List::Util::shuffle() function, too, and it ran
    about 70 times per second. However, when I included loading of the
    module in the benchmark, the overall speed was reduced to 12 times per
    second, i.e. slightly slower than the splice() method...

    use Benchmark 'cmpthese';
    my %oldINC = %INC;

    cmpthese -5, {
        'splice()' => sub {
            my @arr = 1 .. 5000;
            my @shuffled;
            push @shuffled, splice(@arr, rand @arr, 1) while @arr;
        },
        'shuffle()' => sub {
            my @arr = 1 .. 5000;
            %INC = %oldINC;
            require List::Util;
            my @shuffled = List::Util::shuffle(@arr);
        },
    };

    Output including module loading:

                Rate shuffle() splice()
    shuffle() 12.0/s        --     -23%
    splice()  15.5/s       30%       --

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 23, 2004
    #5
  6. Gunnar Hjalmarsson wrote:
    > I measured with the List::Util::shuffle() function, too, and it ran
    > about 70 times per second. However, when I included loading of the
    > module in the benchmark, the overall speed was reduced to 12 times per
    > second, i.e. slightly slower than the splice() method...
    >
    > use Benchmark 'cmpthese';
    > my %oldINC = %INC;
    >
    > cmpthese -5, {
    > 'splice()' => sub {
    > my @arr = 1 .. 5000;
    > my @shuffled;
    > push @shuffled, splice(@arr, rand @arr, 1) while @arr;
    > },
    > 'shuffle()' => sub {
    > my @arr = 1 .. 5000;
    > %INC = %oldINC;
    > require List::Util;
    > my @shuffled = List::Util::shuffle(@arr);
    > },
    > };
    >
    > Output including module loading:
    >
    > Rate shuffle() splice()
    > shuffle() 12.0/s -- -23%
    > splice() 15.5/s 30% --
    >


    Odd. I just tried the same thing (well, I added a third option - sort
    {int(rand(3)) - 1} @arr - to double check):

    use Benchmark 'cmpthese';
    my %oldINC = %INC;
    cmpthese -5, {
        'splice()' => sub {
            my @arr = 1..5000;
            my @shuffled;
            push @shuffled, splice(@arr, rand @arr, 1) while @arr;
        },
        'shuffle()' => sub {
            my @arr = 1..5000;
            %INC = %oldINC;
            require List::Util;
            my @shuffled = List::Util::shuffle(@arr);
        },
        'rand()' => sub {
            my @arr = 1..5000;
            my @shuffled = sort {int(rand(3)) - 1} @arr;
        }
    };

    and I got:

                 Rate rand() splice() shuffle()
    rand()     23.1/s     --     -47%      -88%
    splice()   43.3/s    87%       --      -77%
    shuffle()   189/s   719%     338%        --

    shuffle() was the clear winner, by a mile. Normally I'd say YMMV, but
    this seems like a slightly over-the-top difference. I'm using a
    slightly old perl (5.8.1) - maybe something is different between our
    versions?

    MB
     
    Matthew Braid, Aug 23, 2004
    #6
  7. Josh

    jdog Guest

    Thanks for the feedback. Right now the array is fairly small so I'm
    using the most convenient option (List::Util::shuffle). If the script
    becomes slow when the array grows I'll do some benchmarking.
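
    For the record, applied to the sections from the first post that comes
    out as something like this (just a sketch):

    use List::Util 'shuffle';

    my %sections = ( 0 => 'x', 1 => 't', 2 => 'm', 3 => 'v' );

    # visit every section exactly once, in random order
    for my $sect ( shuffle values %sections ) {
        print "SECTION: $sect\n";    # do things with the section
    }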

    Josh
     
    jdog, Aug 23, 2004
    #7
  8. Matthew Braid wrote:
    > Gunnar Hjalmarsson wrote:
    >> I measured with the List::Util::shuffle() function, too, and it
    >> ran about 70 times per second. However, when I included loading
    >> of the module in the benchmark, the overall speed was reduced to
    >> 12 times per second, i.e. slightly slower than the splice()
    >> method...


    <snip>

    >> Output including module loading:
    >>
    >> Rate shuffle() splice()
    >> shuffle() 12.0/s -- -23%
    >> splice() 15.5/s 30% --

    >
    > Odd. I just tried the same thing (well, I added a third option -
    > sort {int(rand(3)) - 1} @arr - to double check:


    <snip>

    > and I got:
    >
    > Rate rand() splice() shuffle()
    > rand() 23.1/s -- -47% -88%
    > splice() 43.3/s 87% -- -77%
    > shuffle() 189/s 719% 338% --
    >
    > shuffle() was the clear winner, by a mile. Normally I'd say YMMV,
    > but this seems like a slightly over-the-top difference. I'm using a
    > slightly old perl (5.8.1) - maybe something is different between
    > our versions?


    Well, I ran it using 5.8.0 on Windows 98. Just tried it with 5.8.1 on
    Linux, and then the results were similar to yours. Can the explanation
    possibly be that loading a compiled module is much slower on Windows?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 23, 2004
    #8
  9. Josh

    Sam Holden Guest

    On 23 Aug 2004 07:27:36 GMT, Abigail <> wrote:
    > Matthew Braid () wrote on MMMMX September MCMXCIII in
    ><URL:news:cgbd5q$j40$>:
    > ''
    > '' Odd. I just tried the same thing (well, I added a third option - sort
    > '' {int(rand(3)) - 1} @arr - to double check:
    >
    > I've seen variations of the third option before, being presented as a
    > way to shuffle a list. But I've yet to see any proof that this gives a
    > fair shuffle (fair being every possible permutation of the list has the
    > same chance of being produced, given a truly random 'rand()' function).


    Clearly it doesn't. Consider the simple case of a two element list (a,b).
    There are two possible shuffle results: (a,b) and (b,a) each of which will
    have equal probability of occurring in a fair shuffle.

    Any sane implementation of sort will, when asked to sort a two-element
    list, do a single comparison, so for the above there are three cases:

    (int(rand(3)) -1)==-1 -> (a,b)
    (int(rand(3)) -1)==0 -> (a,b)
    (int(rand(3)) -1)==1 -> (b,a)

    Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
    it might improve with a larger list, but it doesn't bode well when the
    algorithm fails for such a simple case.

    It could be argued that the middle case might result in (b,a) if the
    sort were not stable - but even then it would do so consistently, so
    (b,a) would be favoured and the problem would remain. Well, for a sane
    sort implementation, anyway.
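
    A quick empirical check of the two-element case (a rough sketch, not a
    proof; the exact counts will vary from run to run):

    # count how often the randomized comparison returns each ordering
    my %count;
    for ( 1 .. 30_000 ) {
        my @shuffled = sort { int( rand 3 ) - 1 } qw(a b);
        $count{"@shuffled"}++;
    }
    print "$_ => $count{$_}\n" for sort keys %count;
    # expect "a b" roughly twice as often as "b a"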

    --
    Sam Holden
     
    Sam Holden, Aug 23, 2004
    #9
  10. Also sprach Sam Holden:

    > On 23 Aug 2004 07:27:36 GMT, Abigail <> wrote:
    >> Matthew Braid () wrote on MMMMX September MCMXCIII in
    >><URL:news:cgbd5q$j40$>:
    >> ''
    >> '' Odd. I just tried the same thing (well, I added a third option - sort
    >> '' {int(rand(3)) - 1} @arr - to double check:
    >>
    >> I've seen variations of the third option before, being presented as a
    >> way to shuffle a list. But I've yet to see any proof that this gives a
    >> fair shuffle (fair being every possible permutation of the list has the
    >> same chance of being produced, given a truly random 'rand()' function).

    >
    > Clearly it doesn't. Consider the simple case of a two element list (a,b).
    > There are two possible shuffle results: (a,b) and (b,a) each of which will
    > have equal probability of occurring in a fair shuffle.
    >
    > Any sane implementation of sort will when asked to sort a two element list
    > do a single comparison, so for the above there are three cases:
    >
    > (int(rand(3)) -1)==-1 -> (a,b)
    > (int(rand(3)) -1)==0 -> (a,b)
    > (int(rand(3)) -1)==1 -> (b,a)
    >
    > Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
    > it might improve with a larger list, but it doesn't bode well when the
    > algorithm fails for such a simple case.


    The naive solution to that would be disallowing '0' so that pairs are
    always swapped.

    However, this won't make the shuffling fair either. One problem (which
    can be verified with some simple empirical tests) is that the fairness
    depends on the algorithm used.

    Taking this little program:

    my @list = (1 .. 4);
    my %perms;

    shuffle() for 1 .. 10000;

    while (my ($key, $val) = each %perms) {
        print "$key => $val\n";
    }

    sub shuffle {
        my @shuffled = sort { int(rand(2)) == 0 ? -1 : 1 } @list;
        $perms{ join "-", @shuffled }++;
    }

    On perl 5.8.4 with mergesort, this yields a distribution resembling:

    2-1-4-3 => 620
    1-2-4-3 => 606
    3-2-1-4 => 313
    1-4-2-3 => 331
    2-4-3-1 => 300
    1-3-4-2 => 330
    2-1-3-4 => 583
    4-1-2-3 => 320
    1-2-3-4 => 678
    3-4-2-1 => 619
    2-3-4-1 => 313
    3-1-4-2 => 301
    4-3-1-2 => 646
    2-3-1-4 => 325
    4-3-2-1 => 627
    1-3-2-4 => 317
    4-2-1-3 => 296
    1-4-3-2 => 323
    2-4-1-3 => 289
    3-2-4-1 => 303
    3-4-1-2 => 609
    4-1-3-2 => 332
    4-2-3-1 => 303
    3-1-2-4 => 316

    You consistently get certain permutations that show up only around 300
    times while others happen at least 600 times (2-1-4-3 is one of the
    latter).

    Now running the same with 'use sort "_qsort"' gives a very different
    picture. It's even more biased:

    2-1-4-3 => 621
    1-2-4-3 => 584
    3-2-1-4 => 651
    1-4-2-3 => 321
    2-4-3-1 => 155
    1-3-4-2 => 323
    2-1-3-4 => 1242
    4-1-2-3 => 299
    1-2-3-4 => 1199
    3-4-2-1 => 142
    2-3-4-1 => 344
    3-1-4-2 => 286
    4-3-1-2 => 169
    4-3-2-1 => 155
    2-3-1-4 => 647
    1-3-2-4 => 629
    4-2-1-3 => 328
    1-4-3-2 => 154
    2-4-1-3 => 290
    3-2-4-1 => 316
    4-1-3-2 => 174
    3-4-1-2 => 147
    3-1-2-4 => 659
    4-2-3-1 => 165

    The permutations 2-1-3-4 and 1-2-3-4 always come out at around 1200 or more.

    In a strict sense, this doesn't prove anything and the randomized sort
    could still be fair. But really, it's not.

    It might be possible to write a sort algorithm in such a way that it
    could be used for shuffling. Perl's implementations, however, weren't
    designed that way. Perl's quicksort in particular has been optimized to
    death, and I wouldn't be surprised if those optimizations contribute to
    this strong bias towards certain permutations.

    Having said that, I tried it with these two sort-routines:

    sub bubble (&@) {
        my ($cmp, @copy) = @_;
        for my $i (reverse 1 .. $#copy) {
            for my $j (1 .. $i) {
                local ($a, $b) = @copy[$j-1, $j];
                @copy[$j-1, $j] = @copy[$j, $j-1] if &$cmp == 1;
            }
        }
        return @copy;
    }

    sub selection (&@) {
        my ($cmp, @copy) = @_;
        my $min;
        for my $i (0 .. $#copy-1) {
            $min = $i;
            for my $j ($i+1 .. $#copy) {
                local ($a, $b) = @copy[$j, $min];
                $min = $j if &$cmp == -1;
            }
            @copy[$min, $i] = @copy[$i, $min];
        }
        return @copy;
    }

    in the hope that maybe those naive sort algorithms could be used (as
    they tend to compare each element with all remaining ones). But I was
    wrong. Selection-sort has a bias similar to quicksort's (this time
    favoring 4-1-2-3 and 4-1-3-2). Bubble-sort is slightly better, but still
    more biased than perl's built-in mergesort.

    There are other interesting things to note. When allowing the compare
    function to return -1, 1 and 0 (as opposed to only -1 and 1), selection
    sort gets less biased, whereas bubble sort gets more biased (4-3-2-1
    shows up only around ten times and 1-2-3-4 around 1800 times on average).

    Without being a formal proof of anything, these observations show that
    the shuffle-by-sort approach is probably very rotten and is best left
    alone.
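
    For contrast, pushing a real shuffle through the same kind of counting
    harness gives a roughly flat distribution (a sketch using
    List::Util::shuffle; any Fisher-Yates implementation behaves the same):

    use List::Util ();

    my @list = ( 1 .. 4 );
    my %perms;

    # count which permutation comes out, 10_000 times
    $perms{ join "-", List::Util::shuffle(@list) }++ for 1 .. 10_000;

    # with a fair shuffle all 24 permutations land near 10_000 / 24,
    # i.e. roughly 415, instead of the 300-vs-600 split seen above
    print "$_ => $perms{$_}\n" for sort keys %perms;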

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Aug 23, 2004
    #10
  11. Josh

    Anno Siegel Guest

    Abigail <> wrote in comp.lang.perl.misc:
    > Anno Siegel (-berlin.de) wrote on MMMMIX September
    > MCMXCIII in <URL:news:cgapo7$82q$-Berlin.DE>:
    > %% Gunnar Hjalmarsson <> wrote in comp.lang.perl.misc:
    > %% > Josh wrote:
    > %% > > I have an array that I need to cycle through once, but in a random
    > %% > > order. For now I'm dumping the variables into a hash, pulling a
    > %% > > random number, retrieving that value, using it, deleting it, and
    > %% > > repeating the process until there are no more members in the hash.
    > %% > > This obviously won't be efficient with a large set. Seems there
    > %% > > must be a better way.
    > %% >
    > %% > This is at least significantly less code:
    > %% >
    > %% > my @arr = qw/x t m v/;
    > %% > print splice(@arr, rand $_, 1) for reverse 1 .. @arr;
    > %%
    > %% Or even
    > %%
    > %% print splice(@arr, rand @arr, 1) while @arr;
    >
    > Let's not. Splicing out elements one-by-one of an array is very
    > inefficient. Shuffling can be done in linear time, while the splice
    > method takes time quadratic in the length of the array.


    True, though the quadratic term has a small constant factor. A
    linear-time shuffler swaps list elements instead of extracting them.
    That saves the time splice needs to close up the array after each
    extraction:

    sub swapping {
        for ( reverse 1 .. $#_ ) {
            my $r = rand( 1 + $_);
            @_[ $r, $_] = @_[ $_, $r];
        }
        @_;
    }

    Benchmarking this against

    sub splicing {
        map splice( @_, rand $_, 1), reverse 1 .. @_;
    }

    shows splicing() about twice as fast for arrays shorter than 1000.
    The break-even point is at a list length of around 12_000; from then
    on swapping() wins out by an increasing margin. At 30_000, swapping
    is twice as fast as splicing. On my machine.
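
    A sketch of how such a comparison can be set up (not the exact harness
    I used; the crossover point will depend on the machine):

    use Benchmark 'cmpthese';

    # the two shufflers from above, unchanged
    sub swapping {
        for ( reverse 1 .. $#_ ) {
            my $r = rand( 1 + $_ );
            @_[ $r, $_ ] = @_[ $_, $r ];
        }
        @_;
    }

    sub splicing {
        map splice( @_, rand $_, 1 ), reverse 1 .. @_;
    }

    my $n = 12_000;    # vary this to look for the break-even point

    cmpthese( -5, {
        swapping => sub { my @a = 1 .. $n; my @s = swapping(@a) },
        splicing => sub { my @a = 1 .. $n; my @s = splicing(@a) },
    } );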

    That leaves quite a few applications (such as card shuffling) where
    the splice solution is practical.

    > It's even worse than suggesting to use Bubblesort to sort an array.


    Even Bubblesort has found respectable applications. It is fast for small
    lists that are almost in order and has been used to speed up another
    sort algorithm. I think the example is documented in the _Programming
    Pearls_ series, though the example I find [1] uses insertion sort (another
    n**2 sorting algorithm).

    Anno

    [1] Bentley, _Programming Pearls_, p. 112f
     
    Anno Siegel, Aug 23, 2004
    #11
  12. Anno Siegel wrote:
    > One time-linear shuffler swaps list elements instead of extracting
    > them. That saves the time splice needs to shift the array tight:
    >
    > sub swapping {
    > for ( reverse 1 .. $#_ ) {
    > my $r = rand( 1 + $_);
    > @_[ $r, $_] = @_[ $_, $r];
    > }
    > @_;
    > }
    >
    > Benchmarking this against
    >
    > sub splicing {
    > map splice( @_, rand $_, 1), reverse 1 .. @_;
    > }
    >
    > shows splicing() about twice as fast for arrays shorter than 1000.
    > The break-even point is with lists of length 12_000; from then on
    > swapping() wins out by an increasing margin. At 30_000, swapping is
    > twice as fast as splicing. On my machine.


    The benchmark I posted showed that List::Util::shuffle() executes much
    faster than that. Is the explanation, then, that I was actually
    comparing apples and oranges, since List::Util is a compiled module?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 23, 2004
    #12
  13. Josh

    Anno Siegel Guest

    Gunnar Hjalmarsson <> wrote in comp.lang.perl.misc:
    > Anno Siegel wrote:
    > > One time-linear shuffler swaps list elements instead of extracting
    > > them. That saves the time splice needs to shift the array tight:
    > >
    > > sub swapping {
    > > for ( reverse 1 .. $#_ ) {
    > > my $r = rand( 1 + $_);
    > > @_[ $r, $_] = @_[ $_, $r];
    > > }
    > > @_;
    > > }
    > >
    > > Benchmarking this against
    > >
    > > sub splicing {
    > > map splice( @_, rand $_, 1), reverse 1 .. @_;
    > > }
    > >
    > > shows splicing() about twice as fast for arrays shorter than 1000.
    > > The break-even point is with lists of length 12_000; from then on
    > > swapping() wins out by an increasing margin. At 30_000, swapping is
    > > twice as fast as splicing. On my machine.

    >
    > The benchmark I posted showed that List::Util::shuffle() executes
    > much faster than that. Is the explanation, then, that I was actually
    > comparing apples and oranges, since List::Util is a compiled module?


    It is. It also contains a pure Perl implementation, but by default it
    loads an XS version. That's why I wanted to add a benchmark against
    a Perl implementation of the in-place shuffle.
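
    Something along these lines, for instance (only a sketch; the result
    depends on whether List::Util really loaded its XS code):

    use Benchmark 'cmpthese';
    use List::Util ();

    # pure-Perl in-place Fisher-Yates, standing in for swapping()
    sub perl_shuffle {
        my @copy = @_;
        for my $i ( reverse 1 .. $#copy ) {
            my $j = int rand( $i + 1 );
            @copy[ $i, $j ] = @copy[ $j, $i ];
        }
        @copy;
    }

    cmpthese( -5, {
        'XS shuffle'   => sub { my @s = List::Util::shuffle( 1 .. 5000 ) },
        'Perl shuffle' => sub { my @s = perl_shuffle( 1 .. 5000 ) },
    } );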

    Anno
     
    Anno Siegel, Aug 23, 2004
    #13
  14. Josh

    Anno Siegel Guest

    Tassilo v. Parseval <-aachen.de> wrote in comp.lang.perl.misc:
    > Also sprach Sam Holden:
    >
    > > On 23 Aug 2004 07:27:36 GMT, Abigail <> wrote:
    > >> Matthew Braid () wrote on MMMMX September MCMXCIII in
    > >><URL:news:cgbd5q$j40$>:
    > >> ''
    > >> '' Odd. I just tried the same thing (well, I added a third option - sort
    > >> '' {int(rand(3)) - 1} @arr - to double check:
    > >>
    > >> I've seen variations of the third option before, being presented as a
    > >> way to shuffle a list. But I've yet to see any proof that this gives a
    > >> fair shuffle (fair being every possible permutation of the list has the
    > >> same chance of being produced, given a truly random 'rand()' function).

    > >
    > > Clearly it doesn't. Consider the simple case of a two element list (a,b).
    > > There are two possible shuffle results: (a,b) and (b,a) each of which will
    > > have equal probability of occurring in a fair shuffle.
    > >
    > > Any sane implementation of sort will when asked to sort a two element list
    > > do a single comparison, so for the above there are three cases:
    > >
    > > (int(rand(3)) -1)==-1 -> (a,b)
    > > (int(rand(3)) -1)==0 -> (a,b)
    > > (int(rand(3)) -1)==1 -> (b,a)
    > >
    > > Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
    > > it might improve with a larger list, but it doesn't bode well when the
    > > algorithm fails for such a simple case.

    >
    > The naive solution to that would be disallowing '0' so that pairs are
    > always swapped.
    >
    > However, this won't make the shuffling fair either. One problem (which
    > can be verified with some simple empirical tests) is that the fairness
    > depends on the algorithm used.
    >
    > Taking this little program:


    [interesting results snipped]

    One problem with this approach is that it violates the assumptions of
    the underlying algorithm. Sort procedures assume that comparisons are
    consistent with a linear ordering of the list elements. In particular,
    they assume that comparing the same elements twice will have the same
    outcome, but also transitivity of the order relation and more. Who
    is to say what a sort algorithm is going to do? It may not even halt,
    for instance if the random decisions indicate a cyclic ordering.

    Nor are any predictions on running time valid, so the expectation
    that it runs in n*log n time may not be true. Since that is worse
    than the linear time of an in-place shuffle, there isn't much to
    recommend the "random sort" at all.

    Anno
     
    Anno Siegel, Aug 23, 2004
    #14
  15. -berlin.de (Anno Siegel) writes:

    [...]

    > One problem with this approach is that it violates the assumptions of
    > the underlying algorithm. Sort procedures assume that comparisons are
    > consistent with a linear ordering of the list elements. In particular,
    > they assume that comparing the same elements twice will have the same
    > outcome, but also transitivity of the order relation and more. Who
    > is to say what a sort algorithm is going to do? It may not even halt,
    > for instance if the random decisions indicate a cyclic ordering.


    Indeed, the documentation for Perl's sort function says:

    The comparison function is required to behave. If it returns
    inconsistent results (sometimes saying $x[1] is less than $x[2]
    and sometimes saying the opposite, for example) the results are
    not well-defined.

    -----ScottG.
     
    Scott W Gifford, Aug 23, 2004
    #15
  16. Josh

    Anno Siegel Guest

    Abigail <> wrote in comp.lang.perl.misc:
    > Anno Siegel (-berlin.de) wrote on MMMMX September
    > MCMXCIII in <URL:news:cgd5et$o5f$-Berlin.DE>:
    > || Gunnar Hjalmarsson <> wrote in comp.lang.perl.misc:
    > || > Anno Siegel wrote:
    > || > > One time-linear shuffler swaps list elements instead of extracting
    > || > > them. That saves the time splice needs to shift the array tight:
    > || > >
    > || > > sub swapping {
    > || > > for ( reverse 1 .. $#_ ) {
    > || > > my $r = rand( 1 + $_);
    > || > > @_[ $r, $_] = @_[ $_, $r];
    > || > > }
    > || > > @_;
    > || > > }
    > || > >
    > || > > Benchmarking this against
    > || > >
    > || > > sub splicing {
    > || > > map splice( @_, rand $_, 1), reverse 1 .. @_;
    > || > > }
    > || > >
    > || > > shows splicing() about twice as fast for arrays shorter than 1000.
    > || > > The break-even point is with lists of length 12_000; from then on
    > || > > swapping() wins out by an increasing margin. At 30_000, swapping is
    > || > > twice as fast as splicing. On my machine.
    > || >
    > || > The benchmark I posted showed that the execution time of
    > || > List::Util::shuffle() is much faster than that. Is the explanation,
    > || > then, that I was actually comparing apples and oranges, since
    > || > List::Util is a compiled module?
    > ||
    > || It is. It also contains a pure Perl implementation, but by default it
    > || loads an XS version. That's why I wanted to add a benchmark against
    > || a Perl implementation of the in-place shuffle.
    >
    >
    > So, it's fair to compare a splice that is doing its work in C with a
    > shuffle written in pure Perl?


    I didn't want to imply any unfairness, just to add a data point.

    > I'd say comparing the XS shuffle with the C splice is more fair.


    Replacing n calls to a C-level function by a single one is fair?

    We know that Perl's own overhead swamps a lot of what may be happening
    at the C level, and these benchmarks show it again. So setting one
    swap operation against one call to splice seems fair enough from
    that point of view.

    Anno
     
    Anno Siegel, Aug 24, 2004
    #16
  17. Abigail wrote:

    > Matthew Braid () wrote on MMMMX September MCMXCIII in
    > <URL:news:cgbd5q$j40$>:
    > ''
    > '' Odd. I just tried the same thing (well, I added a third option - sort
    > '' {int(rand(3)) - 1} @arr - to double check:
    >
    > I've seen variations of the third option before, being presented as a
    > way to shuffle a list. But I've yet to see any proof that this gives a
    > fair shuffle (fair being every possible permutation of the list has the
    > same chance of being produced, given a truely random 'rand()' function).
    >
    >
    > Abigail


    I only included the rand() method out of curiosity (I knew it was gonna be
    slow, I just put it there as a comparison). I'd never actually _use_ that method
    in my code :)

    MB
     
    Matthew Braid, Aug 24, 2004
    #17
  18. Josh

    Bart Lateur Guest

    Matthew Braid wrote:

    >Odd. I just tried the same thing (well, I added a third option - sort
    >{int(rand(3)) - 1} @arr - to double check:


    I've always heard that using an inconsistent sort function in perl,
    like this one, is a recipe for disaster.

    You could just generate a random number per entry, and sort according to
    that list. Like:

    @sort = map rand, 0 .. $#item;
    @shuffled = @item[sort { $sort[$a] <=> $sort[$b] } 0 .. $#item];
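
    The same idea is often written as a single map/sort/map chain, which
    does away with the separate @sort array (a sketch, assuming @item
    holds the things to shuffle):

    # pair each item with a random key, sort on the key, drop the key
    my @shuffled = map  { $_->[1] }
                   sort { $a->[0] <=> $b->[0] }
                   map  { [ rand, $_ ] } @item;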

    --
    Bart.
     
    Bart Lateur, Aug 27, 2004
    #18
  19. Josh

    Anno Siegel Guest

    Bart Lateur <> wrote in comp.lang.perl.misc:
    > Matthew Braid wrote:
    >
    > >Odd. I just tried the same thing (well, I added a third option - sort
    > >{int(rand(3)) - 1} @arr - to double check:

    >
    > I've always heard that using an inconsistent sort function in perl,
    > like this one, is a recipe for disaster.
    >
    > You could just generate a random number per entry, and sort according to
    > that list. Like:
    >
    > @sort = map rand, 0 .. $#item;
    > @shuffled = @item[sort { $sort[$a] <=> $sort[$b] } 0 .. $#item];


    It may still be hard to prove that this generates an unbiased sequence.
    There will usually be a few duplicates among the generated numbers.
    For these, the sequence is determined by the sort algorithm, not by
    the random generator.
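
    How often that actually happens can be checked empirically (a sketch;
    the answer depends on how many bits of randomness the platform's
    rand() provides):

    # count numerically identical values among 10_000 calls to rand()
    my @r = sort { $a <=> $b } map rand, 1 .. 10_000;
    my $dups = grep { $r[$_] == $r[ $_ - 1 ] } 1 .. $#r;
    print "duplicate rand() values: $dups\n";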

    Anno
     
    Anno Siegel, Aug 27, 2004
    #19
  20. Josh

    Bart Lateur Guest

    Anno Siegel wrote:

    >There will usually be a few duplicates among the generated numbers.


    Not *exact* duplicates, in the numerical sense (though perhaps it looks
    that way when stringified). Otherwise, you'd have reached the end of
    the random number generator's cycle and would start getting the exact
    same sequence again.

    --
    Bart.
     
    Bart Lateur, Aug 29, 2004
    #20
