need some Ruby magic

H

Hammed Malik

------=_Part_9539_13698596.1133466156105
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

I'd like to sort collections randomly. This is what I tried first:

my_array.sort { |a,b| rand(2) }
but the results weren't very random. I then tried the following:

class Array
def random_copy
b =3D Array.new
while b.length < self.length
random_value =3D self[rand self.length]
b.push random_value unless b.include? random_value
end
b
end
end



This works but I'm sure there's some Ruby magic for doing this better.

thanks

------=_Part_9539_13698596.1133466156105--
 
D

David A. Black

Hi --

I'd like to sort collections randomly. This is what I tried first:

my_array.sort { |a,b| rand(2) }
but the results weren't very random. I then tried the following:

class Array
def random_copy
b = Array.new
while b.length < self.length
random_value = self[rand self.length]
b.push random_value unless b.include? random_value
end
b
end
end



This works but I'm sure there's some Ruby magic for doing this better.

I think the most common idiom is:

array.sort_by { rand }


David
 
H

Hammed Malik

------=_Part_9843_32078302.1133467615836
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

excellent! thanks.

my_array.sort_by { rand }

------=_Part_9843_32078302.1133467615836--
 
J

Jeff Wood

------=_Part_13036_22600137.1133473011020
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Take a look @ the Facets project. There are extension methods for array to
have a number of randomized behaviors.

j.

Maybe there should be Array#randomize...


--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff Wood

------=_Part_13036_22600137.1133473011020--
 
R

Reinder Verlinde

Jim Weirich said:
my_array.sort_by { rand }

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Whether it does will depend on the exact algorithm used by 'sort'. For
instance, if sort uses the (inefficient for large arrays) bubble sort,
the last key will end up staying there in half the cases.

Reinder
 
S

Stephen Waits

However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

How about just doing this?

class Array
def shuffle
newarr = self
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
newarr
end
end

shuffled_array = my_array.shuffle

--Steve
 
S

Sam Gentle

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Whether it does will depend on the exact algorithm used by 'sort'. For
instance, if sort uses the (inefficient for large arrays) bubble sort,
the last key will end up staying there in half the cases.

Actually, sort_by caches every value passed and uses those for
comparison (aka a Schwartzian Transform), so it would essentially
return the same results as sorting a list of random numbers,
regardless of algorithm.

Sam
 
J

Jim Weirich

reinder said:
A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

Here's a little experiment:

-------------------------------------------------
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
sorted = A.dup.sort_by { rand }
Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
puts "#{key}: #{Score[key]}"
end
-------------------------------------------------

Gives:

$ ruby sortdist.rb 100000
abc: 16693
acb: 16688
bac: 16752
bca: 16590
cab: 16475
cba: 16802

Thats a pretty good distribution for the number of iterations.

-- Jim Weirich
 
M

Mauricio Fernández

Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

%w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
i = 0
%w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]

This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.

In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.
 
A

ara.t.howard

Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

%w[a b c d].sort_by{ 0 } # =3D> ["a", "b", "c",= "d"]
i =3D 0
%w[a b c d].sort_by{ 10 - (i +=3D 1) / 2 } # =3D> ["d", "b", "c= ", "a"]

This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.

In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.

does

%w( a b c d ).sort{ rand <=3D> rand }

help?

-a
--=20
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D
| ara [dot] t [dot] howard [at] noaa [dot] gov
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D
 
B

Brian Buckley

sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

How frequently would rand return a same value? (In theory it would be
never but does anyone know the reality?)
 
S

Stephen Waits

How frequently would rand return a same value? (In theory it would be
never but does anyone know the reality?)

According to the Pickaxe, Kernel.rand is an implementation of the
[Mersenne Twister][1] algorithm. If that is still the case, then in
reality it will begin repeating exactly the same numbers after
2^19937-1 (approx. 4e6001) pseudo-random numbers, as that's the
period of the MT. The period is the point at which the exact same
series of PRNs will begin again. For example, if it started out as
1, 2, 3, ..., then after 2^19937-1, it'd start again at 1, 2, 3.

But, that's not what you really care about.

Numbers will certainly repeat within the period, since the period is
so much larger than our integer representation. For example, a 32
bit integer can only represent about 4e9. So, the likelihood of
getting any specific integer, in the 32 bit case, is 1/(2^32). This
is an extreme example:

5.times { p rand(2) }
5.times { p rand(1000) }

The bottom line is that the PRNs should be approximately uniformly
distributed. Apply that to your specific range, and you have your
answer.

--Steve

[1]: http://en.wikipedia.org/wiki/Mersenne_twister
 
S

Stephen Waits

Numbers will certainly repeat within the period, since the period
is so much larger than our integer representation. For example, a
32 bit integer can only represent about 4e9. So, the likelihood of
getting any specific integer, in the 32 bit case, is 1/(2^32).

I should add to this that you can approximate when you'll get
repeats, and it's likely to be much sooner than you'd guess. For
more information, see the [Birthday Problem][1].

--Steve

[1]: http://mathworld.wolfram.com/BirthdayProblem.html
 
K

Kero

How about just doing this?
class Array
def shuffle
newarr = self
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
newarr
end
end

shuffled_array = my_array.shuffle

"Just" doing that
1) changes the receiver (use new_arr = self.dup instead)
2) definitely has a bias. Look e.g. at an array of length three. You will
first swap the first element with 3 possibilities, then the second and so
on; you're going through 3**3 possibilities. There are 3! possible
permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 == 27,
thus you have a bias.
 
S

Stephen Waits

1) changes the receiver (use new_arr = self.dup instead)
2) definitely has a bias. Look e.g. at an array of length three.
You will
first swap the first element with 3 possibilities, then the
second and so
on; you're going through 3**3 possibilities. There are 3! possible
permutations, but alas, 3! == 6 is not a perfect divisor of 3**3
== 27,
thus you have a bias.

Good points, thanks. How about this?

class Array
def shuffle
newarr = self.dup
2.times do
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
end
newarr
end
end

# test code lifted from Jim Weirich's earlier post
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
sorted = A.shuffle
Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
puts "#{key}: #{Score[key]}"
end


[~/Code/private/code/snippets] 382% ./shuffle.rb 100000
abc: 16596
acb: 16394
bac: 16700
bca: 16684
cab: 16841
cba: 16785

0:07.87 seconds, 96.4% CPU


This is really just meant to be a simple hack. I think a better
algorithm could assign a random index to each element, then sort on
those indices.

--Steve
 
M

Mauricio Fernández

e.

I did some numbers and found the following probability estimates:

array size P(#rand() collision)
1000 5.54558e-11
10000 5.55056e-09
1000000 5.55096e-05
10000000 5.53574e-03
=20
A collision implies that the associated pair of elements has got the same
ordering in the output array, instead of 50-50 chances of it being revers=
ed.

More info, including the Ruby code I used, available at
http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
does
=20
%w( a b c d ).sort{ rand <=3D> rand }

help?

I'm not totally sure it's really unbiased, but at first sight it looks go=
od.

When called with no arguments, rand will use

static double
genrand_real(void)
{=20
unsigned long a=3Dgenrand_int32()>>5, b=3Dgenrand_int32()>>6;=20
return(a*67108864.0+b)*(1.0/9007199254740992.0);=20
}

so it would seem that two consecutive calls to rand() cannot return the
same value, but I still have to think about the effect of qsort.

It is however somewhat slower than sort_by{ rand }.

[100, 1000, 10000, 100000].each do |n|
GC.start
t0 =3D Time.new
(1..n).sort_by{rand}
a =3D (t1 =3D Time.new) - t0 # =3D> 0.000425, 0.003002, 0.=
054392, 0.939692
(1..n).sort{rand <=3D> rand}=20
b =3D Time.new - t1 # =3D> 0.001161, 0.026275, 0.28=
9807, 3.791833
"%4.2f" % [b / a] # =3D> "2.73", "8.75", "5.33", "4=
04"
end
RUBY_VERSION # =3D> "1.8.3"
RUBY_RELEASE_DATE # =3D> "2005-09-24"


--=20
Mauricio Fernandez
 
S

Stephen Waits

Mauricio said:
=20
It is however somewhat slower than sort_by{ rand }.

It's considerably slower. Changing it to sort { rand(10000) <=3D> 5000 }=
=20
shows a 30% speedup on 10k element Arrays on my system. Don't know why=20
I chose those numbers, I suppose bigger numbers reduce the "0"=20
comparison, considering { rand(3) <=3D> 1 }.

Anyway, I put together a small benchmark to compare my (revised)=20
shuffle, to sort, and sort_by:

#!/usr/bin/env ruby

class Array
def shuffle
newarr =3D self.dup
2.times do
self.length.times do |x|
y =3D rand(self.length)
newarr[x], newarr[y] =3D newarr[y], newarr[x]
end
end
newarr
end
end

printf(" size shuffle sort sort_by\n")

[10, 100, 1000, 10000].each do |n|

# populate an Array size n
a =3D Array.new(n) { |i| i }

# my method
start =3D Time.new
100.times do
b =3D a.shuffle
end
a_time =3D Time.new - start

# Array's sort
start =3D Time.new
100.times do
b =3D a.sort { rand(10000) <=3D> 5000 }
end
b_time =3D Time.new - start

# Enumerable's sort_by
start =3D Time.new
100.times do
b =3D a.sort_by { rand }
end
c_time =3D Time.new - start

# results
printf("%6d %5.2f %5.2f %5.2f\n", n, a_time, b_time, c_time)
=09
end


Which, on my P4/3.0GHz (Win32) emits this:

size shuffle sort sort_by
10 0.00 0.02 0.00
100 0.16 0.13 0.03
1000 1.61 2.03 0.44
10000 16.31 28.14 4.78

--Steve
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,900
Latest member
Nell636132

Latest Threads

Top