need some Ruby magic

Discussion in 'Ruby' started by Hammed Malik, Dec 1, 2005.

  1. Hammed Malik

    Hammed Malik Guest

    ------=_Part_9539_13698596.1133466156105
    Content-Type: text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: quoted-printable
    Content-Disposition: inline

    I'd like to sort collections randomly. This is what I tried first:

    my_array.sort { |a,b| rand(2) }
    >


    but the results weren't very random. I then tried the following:

    class Array
    > def random_copy
    > b =3D Array.new
    > while b.length < self.length
    > random_value =3D self[rand self.length]
    > b.push random_value unless b.include? random_value
    > end
    > b
    > end
    > end




    This works but I'm sure there's some Ruby magic for doing this better.

    thanks

    ------=_Part_9539_13698596.1133466156105--
     
    Hammed Malik, Dec 1, 2005
    #1
    1. Advertising

  2. Hi --

    On Fri, 2 Dec 2005, Hammed Malik wrote:

    > I'd like to sort collections randomly. This is what I tried first:
    >
    > my_array.sort { |a,b| rand(2) }
    >>

    >
    > but the results weren't very random. I then tried the following:
    >
    > class Array
    >> def random_copy
    >> b = Array.new
    >> while b.length < self.length
    >> random_value = self[rand self.length]
    >> b.push random_value unless b.include? random_value
    >> end
    >> b
    >> end
    >> end

    >
    >
    >
    > This works but I'm sure there's some Ruby magic for doing this better.


    I think the most common idiom is:

    array.sort_by { rand }


    David

    --
    David A. Black


    "Ruby for Rails", forthcoming from Manning Publications, April 2006!
     
    David A. Black, Dec 1, 2005
    #2
    1. Advertising

  3. Hammed Malik

    Jim Weirich Guest

    hammed wrote:
    > I'd like to sort collections randomly. This is what I tried first:
    >
    > my_array.sort { |a,b| rand(2) }


    my_array.sort_by { rand }

    --
    Posted via http://www.ruby-forum.com/.
     
    Jim Weirich, Dec 1, 2005
    #3
  4. Hammed Malik

    Hammed Malik Guest

    ------=_Part_9843_32078302.1133467615836
    Content-Type: text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: quoted-printable
    Content-Disposition: inline

    excellent! thanks.

    On 01/12/05, Jim Weirich <> wrote=
    :
    >
    > hammed wrote:
    > > I'd like to sort collections randomly. This is what I tried first:
    > >
    > > my_array.sort { |a,b| rand(2) }

    >
    > my_array.sort_by { rand }
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >


    ------=_Part_9843_32078302.1133467615836--
     
    Hammed Malik, Dec 1, 2005
    #4
  5. Maybe there should be Array#randomize...
     
    Vincent Foley, Dec 1, 2005
    #5
  6. Hammed Malik

    Jeff Wood Guest

    ------=_Part_13036_22600137.1133473011020
    Content-Type: text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: quoted-printable
    Content-Disposition: inline

    Take a look @ the Facets project. There are extension methods for array to
    have a number of randomized behaviors.

    j.

    On 12/1/05, Vincent Foley <> wrote:
    >
    > Maybe there should be Array#randomize...
    >
    >
    >



    --
    "Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

    Jeff Wood

    ------=_Part_13036_22600137.1133473011020--
     
    Jeff Wood, Dec 1, 2005
    #6
  7. In article <>,
    Jim Weirich <> wrote:

    > hammed wrote:
    > > I'd like to sort collections randomly. This is what I tried first:
    > >
    > > my_array.sort { |a,b| rand(2) }

    >
    > my_array.sort_by { rand }


    A quick test seems to indicate that this works with the ruby 1.8.2
    implementation on my Mac. However, there is no guarantee that this call
    will select each of the n! permutations with equal probability.

    Whether it does will depend on the exact algorithm used by 'sort'. For
    instance, if sort uses the (inefficient for large arrays) bubble sort,
    the last key will end up staying there in half the cases.

    Reinder
     
    Reinder Verlinde, Dec 3, 2005
    #7
  8. On Dec 3, 2005, at 9:32 AM, Reinder Verlinde wrote:

    > However, there is no guarantee that this call
    > will select each of the n! permutations with equal probability.


    How about just doing this?

    class Array
    def shuffle
    newarr = self
    self.length.times do |x|
    y = rand(self.length)
    newarr[x], newarr[y] = newarr[y], newarr[x]
    end
    newarr
    end
    end

    shuffled_array = my_array.shuffle

    --Steve
     
    Stephen Waits, Dec 3, 2005
    #8
  9. Hammed Malik

    Sam Gentle Guest

    On 12/4/05, Reinder Verlinde <> wrote:
    > In article <>,
    > Jim Weirich <> wrote:
    >
    > > hammed wrote:
    > > > I'd like to sort collections randomly. This is what I tried first:
    > > >
    > > > my_array.sort { |a,b| rand(2) }

    > >
    > > my_array.sort_by { rand }

    >
    > A quick test seems to indicate that this works with the ruby 1.8.2
    > implementation on my Mac. However, there is no guarantee that this call
    > will select each of the n! permutations with equal probability.
    >
    > Whether it does will depend on the exact algorithm used by 'sort'. For
    > instance, if sort uses the (inefficient for large arrays) bubble sort,
    > the last key will end up staying there in half the cases.


    Actually, sort_by caches every value passed and uses those for
    comparison (aka a Schwartzian Transform), so it would essentially
    return the same results as sorting a list of random numbers,
    regardless of algorithm.

    Sam
     
    Sam Gentle, Dec 3, 2005
    #9
  10. Hammed Malik

    Jim Weirich Guest

    reinder wrote:
    >> my_array.sort_by { rand }

    >
    > A quick test seems to indicate that this works with the ruby 1.8.2
    > implementation on my Mac. However, there is no guarantee that this call
    > will select each of the n! permutations with equal probability.


    Actually, it will. sort_by uses a Schwartzian transform to do the
    sorting. That means that rand is only called once for each element of
    the array. As long as rand gives a decent distribution of random
    numbers, the permutation will be random too.

    Here's a little experiment:

    -------------------------------------------------
    N = (ARGV.shift || 1000).to_i
    A = %w(a b c)
    Perms = %w(abc acb bac bca cab cba)
    Score = Hash.new { |h, k| h[k] = 0 }
    N.times do
    sorted = A.dup.sort_by { rand }
    Score[sorted.join("")] += 1
    end

    Score.keys.sort.each do |key|
    puts "#{key}: #{Score[key]}"
    end
    -------------------------------------------------

    Gives:

    $ ruby sortdist.rb 100000
    abc: 16693
    acb: 16688
    bac: 16752
    bca: 16590
    cab: 16475
    cba: 16802

    Thats a pretty good distribution for the number of iterations.

    -- Jim Weirich


    --
    Posted via http://www.ruby-forum.com/.
     
    Jim Weirich, Dec 3, 2005
    #10
  11. On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:
    > reinder wrote:
    > >> my_array.sort_by { rand }

    > >
    > > A quick test seems to indicate that this works with the ruby 1.8.2
    > > implementation on my Mac. However, there is no guarantee that this call
    > > will select each of the n! permutations with equal probability.

    >
    > Actually, it will. sort_by uses a Schwartzian transform to do the
    > sorting. That means that rand is only called once for each element of
    > the array. As long as rand gives a decent distribution of random
    > numbers, the permutation will be random too.


    sort_by{ rand } is actually biased, since sort_by will preserve the
    relative order of the elements for which rand() returned the same value.

    %w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
    i = 0
    %w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]

    This means that permutations preserving the relative order of one (or
    more) pair of elements of the original array are a bit more probable.

    In praxis, the bias this introduces is often so small that the mere
    succinctness of sort_by{ rand } more than makes up for it.

    --
    Mauricio Fernandez
     
    Mauricio Fernández, Dec 4, 2005
    #11
  12. Hammed Malik

    Guest

    On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fern=E1ndez wrote:

    > On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:
    >> reinder wrote:
    >>>> my_array.sort_by { rand }
    >>>
    >>> A quick test seems to indicate that this works with the ruby 1.8.2
    >>> implementation on my Mac. However, there is no guarantee that this call
    >>> will select each of the n! permutations with equal probability.

    >>
    >> Actually, it will. sort_by uses a Schwartzian transform to do the
    >> sorting. That means that rand is only called once for each element of
    >> the array. As long as rand gives a decent distribution of random
    >> numbers, the permutation will be random too.

    >
    > sort_by{ rand } is actually biased, since sort_by will preserve the
    > relative order of the elements for which rand() returned the same value.
    >
    > %w[a b c d].sort_by{ 0 } # =3D> ["a", "b", "c",=

    "d"]
    > i =3D 0
    > %w[a b c d].sort_by{ 10 - (i +=3D 1) / 2 } # =3D> ["d", "b", "c=

    ", "a"]
    >
    > This means that permutations preserving the relative order of one (or
    > more) pair of elements of the original array are a bit more probable.
    >
    > In praxis, the bias this introduces is often so small that the mere
    > succinctness of sort_by{ rand } more than makes up for it.


    does

    %w( a b c d ).sort{ rand <=3D> rand }

    help?

    -a
    --=20
    =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
    =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
    =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
    =3D=3D=3D=3D
    | ara [dot] t [dot] howard [at] noaa [dot] gov
    | all happiness comes from the desire for others to be happy. all misery
    | comes from the desire for oneself to be happy.
    | -- bodhicaryavatara
    =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
    =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
    =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
    =3D=3D=3D=3D
     
    , Dec 4, 2005
    #12
  13. > sort_by{ rand } is actually biased, since sort_by will preserve the
    > relative order of the elements for which rand() returned the same value.


    How frequently would rand return a same value? (In theory it would be
    never but does anyone know the reality?)
     
    Brian Buckley, Dec 4, 2005
    #13
  14. On Dec 4, 2005, at 7:59 AM, Brian Buckley wrote:

    > How frequently would rand return a same value? (In theory it would be
    > never but does anyone know the reality?)


    According to the Pickaxe, Kernel.rand is an implementation of the
    [Mersenne Twister][1] algorithm. If that is still the case, then in
    reality it will begin repeating exactly the same numbers after
    2^19937-1 (approx. 4e6001) pseudo-random numbers, as that's the
    period of the MT. The period is the point at which the exact same
    series of PRNs will begin again. For example, if it started out as
    1, 2, 3, ..., then after 2^19937-1, it'd start again at 1, 2, 3.

    But, that's not what you really care about.

    Numbers will certainly repeat within the period, since the period is
    so much larger than our integer representation. For example, a 32
    bit integer can only represent about 4e9. So, the likelihood of
    getting any specific integer, in the 32 bit case, is 1/(2^32). This
    is an extreme example:

    5.times { p rand(2) }
    5.times { p rand(1000) }

    The bottom line is that the PRNs should be approximately uniformly
    distributed. Apply that to your specific range, and you have your
    answer.

    --Steve

    [1]: http://en.wikipedia.org/wiki/Mersenne_twister
     
    Stephen Waits, Dec 4, 2005
    #14
  15. On Dec 4, 2005, at 8:54 AM, Stephen Waits wrote:

    > Numbers will certainly repeat within the period, since the period
    > is so much larger than our integer representation. For example, a
    > 32 bit integer can only represent about 4e9. So, the likelihood of
    > getting any specific integer, in the 32 bit case, is 1/(2^32).


    I should add to this that you can approximate when you'll get
    repeats, and it's likely to be much sooner than you'd guess. For
    more information, see the [Birthday Problem][1].

    --Steve

    [1]: http://mathworld.wolfram.com/BirthdayProblem.html
     
    Stephen Waits, Dec 4, 2005
    #15
  16. Hammed Malik

    Kero Guest

    > How about just doing this?
    >
    > class Array
    > def shuffle
    > newarr = self
    > self.length.times do |x|
    > y = rand(self.length)
    > newarr[x], newarr[y] = newarr[y], newarr[x]
    > end
    > newarr
    > end
    > end
    >
    > shuffled_array = my_array.shuffle


    "Just" doing that
    1) changes the receiver (use new_arr = self.dup instead)
    2) definitely has a bias. Look e.g. at an array of length three. You will
    first swap the first element with 3 possibilities, then the second and so
    on; you're going through 3**3 possibilities. There are 3! possible
    permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 == 27,
    thus you have a bias.
     
    Kero, Dec 4, 2005
    #16
  17. On Dec 4, 2005, at 12:32 PM, Kero wrote:

    > 1) changes the receiver (use new_arr = self.dup instead)
    > 2) definitely has a bias. Look e.g. at an array of length three.
    > You will
    > first swap the first element with 3 possibilities, then the
    > second and so
    > on; you're going through 3**3 possibilities. There are 3! possible
    > permutations, but alas, 3! == 6 is not a perfect divisor of 3**3
    > == 27,
    > thus you have a bias.


    Good points, thanks. How about this?

    class Array
    def shuffle
    newarr = self.dup
    2.times do
    self.length.times do |x|
    y = rand(self.length)
    newarr[x], newarr[y] = newarr[y], newarr[x]
    end
    end
    newarr
    end
    end

    # test code lifted from Jim Weirich's earlier post
    N = (ARGV.shift || 1000).to_i
    A = %w(a b c)
    Perms = %w(abc acb bac bca cab cba)
    Score = Hash.new { |h, k| h[k] = 0 }
    N.times do
    sorted = A.shuffle
    Score[sorted.join("")] += 1
    end

    Score.keys.sort.each do |key|
    puts "#{key}: #{Score[key]}"
    end


    [~/Code/private/code/snippets] 382% ./shuffle.rb 100000
    abc: 16596
    acb: 16394
    bac: 16700
    bca: 16684
    cab: 16841
    cba: 16785

    0:07.87 seconds, 96.4% CPU


    This is really just meant to be a simple hack. I think a better
    algorithm could assign a random index to each element, then sort on
    those indices.

    --Steve
     
    Stephen Waits, Dec 4, 2005
    #17
  18. Brian Buckley, Dec 5, 2005
    #18
  19. Shuffling an array, sort_by{rand}'s bias (was Re: need some Ruby magic)

    On Sun, Dec 04, 2005 at 10:21:02AM +0900, wrote:
    > On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fern=E1ndez wrote:
    > >sort_by{ rand } is actually biased, since sort_by will preserve the
    > >relative order of the elements for which rand() returned the same valu=

    e.

    I did some numbers and found the following probability estimates:

    array size P(#rand() collision)
    1000 5.54558e-11
    10000 5.55056e-09
    1000000 5.55096e-05
    10000000 5.53574e-03
    =20
    A collision implies that the associated pair of elements has got the same
    ordering in the output array, instead of 50-50 chances of it being revers=
    ed.

    More info, including the Ruby code I used, available at
    http://eigenclass.org/hiki.rb?sort_by rand is biased

    > does
    >=20
    > %w( a b c d ).sort{ rand <=3D> rand }
    >
    > help?


    I'm not totally sure it's really unbiased, but at first sight it looks go=
    od.

    When called with no arguments, rand will use

    static double
    genrand_real(void)
    {=20
    unsigned long a=3Dgenrand_int32()>>5, b=3Dgenrand_int32()>>6;=20
    return(a*67108864.0+b)*(1.0/9007199254740992.0);=20
    }

    so it would seem that two consecutive calls to rand() cannot return the
    same value, but I still have to think about the effect of qsort.

    It is however somewhat slower than sort_by{ rand }.

    [100, 1000, 10000, 100000].each do |n|
    GC.start
    t0 =3D Time.new
    (1..n).sort_by{rand}
    a =3D (t1 =3D Time.new) - t0 # =3D> 0.000425, 0.003002, 0.=
    054392, 0.939692
    (1..n).sort{rand <=3D> rand}=20
    b =3D Time.new - t1 # =3D> 0.001161, 0.026275, 0.28=
    9807, 3.791833
    "%4.2f" % [b / a] # =3D> "2.73", "8.75", "5.33", "4=
    04"
    end
    RUBY_VERSION # =3D> "1.8.3"
    RUBY_RELEASE_DATE # =3D> "2005-09-24"


    --=20
    Mauricio Fernandez
     
    Mauricio Fernández, Dec 5, 2005
    #19
  20. Re: Shuffling an array, sort_by{rand}'s bias (was Re: need some Rubymagic)

    Mauricio Fern=E1ndez wrote:
    >> does
    >>
    >> %w( a b c d ).sort{ rand <=3D> rand }
    >>
    >> help?

    >=20
    > It is however somewhat slower than sort_by{ rand }.


    It's considerably slower. Changing it to sort { rand(10000) <=3D> 5000 }=
    =20
    shows a 30% speedup on 10k element Arrays on my system. Don't know why=20
    I chose those numbers, I suppose bigger numbers reduce the "0"=20
    comparison, considering { rand(3) <=3D> 1 }.

    Anyway, I put together a small benchmark to compare my (revised)=20
    shuffle, to sort, and sort_by:

    #!/usr/bin/env ruby

    class Array
    def shuffle
    newarr =3D self.dup
    2.times do
    self.length.times do |x|
    y =3D rand(self.length)
    newarr[x], newarr[y] =3D newarr[y], newarr[x]
    end
    end
    newarr
    end
    end

    printf(" size shuffle sort sort_by\n")

    [10, 100, 1000, 10000].each do |n|

    # populate an Array size n
    a =3D Array.new(n) { |i| i }

    # my method
    start =3D Time.new
    100.times do
    b =3D a.shuffle
    end
    a_time =3D Time.new - start

    # Array's sort
    start =3D Time.new
    100.times do
    b =3D a.sort { rand(10000) <=3D> 5000 }
    end
    b_time =3D Time.new - start

    # Enumerable's sort_by
    start =3D Time.new
    100.times do
    b =3D a.sort_by { rand }
    end
    c_time =3D Time.new - start

    # results
    printf("%6d %5.2f %5.2f %5.2f\n", n, a_time, b_time, c_time)
    =09
    end


    Which, on my P4/3.0GHz (Win32) emits this:

    size shuffle sort sort_by
    10 0.00 0.02 0.00
    100 0.16 0.13 0.03
    1000 1.61 2.03 0.44
    10000 16.31 28.14 4.78

    --Steve
     
    Stephen Waits, Dec 5, 2005
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Tom Willis

    Ruby black magic? Meta Programming

    Tom Willis, Mar 12, 2005, in forum: Ruby
    Replies:
    4
    Views:
    374
    Mathieu Bouchard
    Mar 13, 2005
  2. J2M
    Replies:
    12
    Views:
    199
  3. Sunny Hirai
    Replies:
    1
    Views:
    152
    Alex Young
    Jun 6, 2007
  4. Giles Bowkett
    Replies:
    9
    Views:
    429
    Giles Bowkett
    Dec 17, 2007
  5. DMG

    ruby eval magic

    DMG, Dec 14, 2009, in forum: Ruby
    Replies:
    3
    Views:
    136
Loading...

Share This Page