[ANN] rand.rb 0.9: Random access methods for Enumerables

Ilmari Heikkinen · Dec 15, 2004

Hello all, here's a little convenience library we whipped up a couple
weeks back with Christian Neukirchen. It's very simple, but we hope
someone might find it useful

---

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

Overview
========
rand.rb adds new methods to Enumerable, Array, and Hash to:

* return a random element (pick, pick_index, pick_key, pick_value
and their destructive versions suffixed with !).
* arrange elements in new, random order (shuffle,
shuffle_hash_pairs, shuffle_hash).
* use above methods in convenient ways (each_random, map_random).

It also provides these new facilities to String:

* shuffle_chars, to arrange the characters of the string in new
order.
* pick_byte, pick_char and pick_index, to return random bytes,
characters or elements.

Examples:

arr = (1..10).to_a
arr.pick # => 3
arr.pick # => 7
arr.shuffle # => [3,5,2,10,8,7,6,1,9,4]

str = "string strung strong rang"
str.pick_char # => 'n'
str.pick_byte # => 97
str.shuffle_chars # => "srsntnignrgrgttsuan r og"

hash = {'a'=>2, 'b'=>3, 'c'=>4}
hash.pick! #=> ['c', 4]
hash #=> {'a'=>2, 'b'=>3}
hash.pick_key #=> 'b'
hash.pick_value #=> 2

File.open("foo"){|f|
f.each_random{|line| puts line } # about equal to
readlines.shuffle.each{...}
}

History
=======
* November 26, 2004: Initial version done as IRC collaboration
project between kig and chris2.
* November 29, 2004: First public release 0.9.

Getting it
==========
rand.rb is packaged in RPA.
rpa install rand

You can download rand.rb at:
http://kronavita.de/chris/data/rand-0.9.0.tar.gz

Upstream source:
You can get latest development releases using darcs by executing:

darcs get http://kronavita.de/chris/data/rand

darcs send is the preferred way to send patches, but mailing diffs is
fine too.

Authors
=======
* Ilmari Heikkinen: Code and unit tests
* Christian Neukirchen: Documentation and housekeeping.

Please mail bugs, feature requests or patches to the mail addresses
above or use IRC to contact the developers.

Copyright
=========
Copyright 2004 Ilmari Heikkinen < kig misfiring net >
Parts Copyright 2004 Christian Neukirchen < chneukirchen gmail com >

P.S. I guess it's bad mojo to start versions from 0.9, but it _is_ very
simple

Ralf Müller · Dec 15, 2004

Am Mittwoch, 15. Dezember 2004 01:24 schrieb Ilmari Heikkinen:

Hello all, here's a little convenience library we whipped up a couple
weeks back with Christian Neukirchen. It's very simple, but we hope
someone might find it useful
---

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

Overview
========
rand.rb adds new methods to Enumerable, Array, and Hash to:

* return a random element (pick, pick_index, pick_key, pick_value
and their destructive versions suffixed with !).
* arrange elements in new, random order (shuffle,
shuffle_hash_pairs, shuffle_hash).
* use above methods in convenient ways (each_random, map_random).

It also provides these new facilities to String:

* shuffle_chars, to arrange the characters of the string in new
order.
* pick_byte, pick_char and pick_index, to return random bytes,
characters or elements.

Examples:

arr = (1..10).to_a
arr.pick # => 3
arr.pick # => 7
arr.shuffle # => [3,5,2,10,8,7,6,1,9,4]

str = "string strung strong rang"
str.pick_char # => 'n'
str.pick_byte # => 97
str.shuffle_chars # => "srsntnignrgrgttsuan r og"

hash = {'a'=>2, 'b'=>3, 'c'=>4}
hash.pick! #=> ['c', 4]
hash #=> {'a'=>2, 'b'=>3}
hash.pick_key #=> 'b'
hash.pick_value #=> 2

File.open("foo"){|f|
f.each_random{|line| puts line } # about equal to
readlines.shuffle.each{...}
}

History
=======
* November 26, 2004: Initial version done as IRC collaboration
project between kig and chris2.
* November 29, 2004: First public release 0.9.

Getting it
==========
rand.rb is packaged in RPA.
rpa install rand

You can download rand.rb at:
http://kronavita.de/chris/data/rand-0.9.0.tar.gz

Upstream source:
You can get latest development releases using darcs by executing:

darcs get http://kronavita.de/chris/data/rand

darcs send is the preferred way to send patches, but mailing diffs is
fine too.

Authors
=======
* Ilmari Heikkinen: Code and unit tests
* Christian Neukirchen: Documentation and housekeeping.

Please mail bugs, feature requests or patches to the mail addresses
above or use IRC to contact the developers.

Copyright
=========
Copyright 2004 Ilmari Heikkinen < kig misfiring net >
Parts Copyright 2004 Christian Neukirchen < chneukirchen gmail com >

P.S. I guess it's bad mojo to start versions from 0.9, but it _is_ very
simple

Hi Ilmari,
its a bit OT, but I'd like to know, how you test the methods above.
How could you know, there is as much randomness as you want?

regards
ralf

Neil Spring · Dec 15, 2004

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

umm, could you maybe replace:

def shuffle!
sort!{rand <=> 0.5}
end

with:

def shuffle!
each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i],
self[j]}
self
end

I would not rely on sort to do the right thing, and it is less
efficient than necessary. I use the second shuffle code extensively in
my code which deals with shuffling lots and lots of elements.
Presumably the second is O(n) where the sort based scheme is at best
O(n log n). A little googling suggests this is the "Fisher-Yates
shuffle."

-neil

Ilmari Heikkinen · Dec 15, 2004

Hi,

Hi Ilmari,
its a bit OT, but I'd like to know, how you test the methods above.
How could you know, there is as much randomness as you want?

Short answer: I don't. Bad, yes.

Long answer:
The only probabilistic test is for #shuffle, which tries 10 times to
get a shuffled array that != original array. The rest trust that
Kernel#rand provides random numbers and compare the picked values to
the original array to check that they all are from the original array.

I guess one way to test the randomness would be to collect an array
from the picked values that's much larger than the original. Then check
the distribution by checking that the entry counts normalized to array
length (amt of item / array.length) are close enough to the original
array's normalized counts. And check the randomness of the order by
gzipping a long randomized string and checking that it compresses
badly.

-Ilmari

Brian Schröder · Dec 15, 2004

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

Click to expand...

umm, could you maybe replace:

def shuffle!
sort!{rand <=> 0.5}
end

with:

def shuffle!
each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i],
self[j]}
self
end

I would not rely on sort to do the right thing, and it is less
efficient than necessary. I use the second shuffle code extensively in
my code which deals with shuffling lots and lots of elements.
Presumably the second is O(n) where the sort based scheme is at best
O(n log n). A little googling suggests this is the "Fisher-Yates
shuffle."

I don't know if the first proposed solution is valid, because it depends on how
sort is implemented and if the sort algorithm makes each comparision only once.
Otherwise I imagine weird things happening, if elements change their order
while sorting.

So the correct way to do this is

sort_by{rand}

Fisher Yates is a nice, unbiased O(n) shuffler. I personally more often use the
sort shuffler, because it is so amazingly wordy and I can remember it.

Regards,

Brian

Ilmari Heikkinen · Dec 15, 2004

def shuffle!
each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i],
self[j]}
self
end

I would not rely on sort to do the right thing, and it is less
efficient than necessary. I use the second shuffle code extensively
in my code which deals with shuffling lots and lots of elements.
Presumably the second is O(n) where the sort based scheme is at best
O(n log n). A little googling suggests this is the "Fisher-Yates
shuffle."

Thank you, changing #shuffle! to that and #shuffle to dup.shuffle!

-Ilmari

trans. (T. Onoma) · Dec 15, 2004

I wonder. Do others see #pick in relation to #pop and #push as I do? Might
#pick be destructive instead of #pick! and using something like Galvin's
#rand for non-destructive?

Thoughts?
T.

On Tuesday 14 December 2004 07:24 pm, Ilmari Heikkinen wrote:
| Hello all, here's a little convenience library we whipped up a couple
| weeks back with Christian Neukirchen. It's very simple, but we hope
| someone might find it useful

| ---
|
| rand.rb is a library for picking random elements and shuffling.
| This work is licensed under the same terms as Ruby itself.
|
| Overview
| ========
| rand.rb adds new methods to Enumerable, Array, and Hash to:
|
| * return a random element (pick, pick_index, pick_key, pick_value
| and their destructive versions suffixed with !).
| * arrange elements in new, random order (shuffle,
| shuffle_hash_pairs, shuffle_hash).
| * use above methods in convenient ways (each_random, map_random).
|
| It also provides these new facilities to String:
|
| * shuffle_chars, to arrange the characters of the string in new
| order.
| * pick_byte, pick_char and pick_index, to return random bytes,
| characters or elements.
|
| Examples:
|
| arr = (1..10).to_a
| arr.pick # => 3
| arr.pick # => 7
| arr.shuffle # => [3,5,2,10,8,7,6,1,9,4]
|
| str = "string strung strong rang"
| str.pick_char # => 'n'
| str.pick_byte # => 97
| str.shuffle_chars # => "srsntnignrgrgttsuan r og"
|
| hash = {'a'=>2, 'b'=>3, 'c'=>4}
| hash.pick! #=> ['c', 4]
| hash #=> {'a'=>2, 'b'=>3}
| hash.pick_key #=> 'b'
| hash.pick_value #=> 2
|
| File.open("foo"){|f|
| f.each_random{|line| puts line } # about equal to
| readlines.shuffle.each{...}
| }
|
| History
| =======
| * November 26, 2004: Initial version done as IRC collaboration
| project between kig and chris2.
| * November 29, 2004: First public release 0.9.
|
| Getting it
| ==========
| rand.rb is packaged in RPA.
| rpa install rand
|
| You can download rand.rb at:
| http://kronavita.de/chris/data/rand-0.9.0.tar.gz
|
| Upstream source:
| You can get latest development releases using darcs by executing:
|
| darcs get http://kronavita.de/chris/data/rand
|
| darcs send is the preferred way to send patches, but mailing diffs is
| fine too.
|
| Authors
| =======
| * Ilmari Heikkinen: Code and unit tests
| * Christian Neukirchen: Documentation and housekeeping.
|
| Please mail bugs, feature requests or patches to the mail addresses
| above or use IRC to contact the developers.
|
| Copyright
| =========
| Copyright 2004 Ilmari Heikkinen < kig misfiring net >
| Parts Copyright 2004 Christian Neukirchen < chneukirchen gmail com >
|
| P.S. I guess it's bad mojo to start versions from 0.9, but it _is_ very
| simple

--
( o _ ã‚«ãƒ©ãƒ
// trans.
/ \ (e-mail address removed)

ruby -rdrb -e
'DRb.start_service;duck=DRbObject.new(nil,"druby://whytheluckystiff.net:6503");puts
duck.toms'

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

[8,16,20,29,78,65,2,14,26,12,12,28,71,114,12,13,12,82,72,21,17,4,10,2,95].
each_with_index{|x,i| $><<(x^'Begin landing your troops').chr}
-Tadayoshi Funaba

trans. (T. Onoma) · Dec 16, 2004

| > On Dec 14, 2004, at 7:24 PM, Ilmari Heikkinen wrote:
| >> rand.rb is a library for picking random elements and shuffling.
| >> This work is licensed under the same terms as Ruby itself.
| >
| > umm, could you maybe replace:
| >
| > def shuffle!
| > sort!{rand <=> 0.5}
| > end
| >
| > with:
| >
| > def shuffle!
| > each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i],
| > self[j]}
| > self
| > end
|
| I made this a function a few days ago and it seems to be a lot faster.
| Not sure if it's completely correct though.
|
| def shuffle!
| a = 0
| while(a<length)
| b = (length*Kernel.rand).to_i
| tmp = self[a]
| self[a] = self
| self = tmp
| a+=1
| end
| self
| end

Well, lets see... a couple of equivalent transformations and:

def shuffle!
ln = length
ln.times do |a|
b = Kernel.rand(ln)
self[a], self = self, self[a]
end
self
end

If you look closely, it is almost the very same thing. The real difference
lies in the the subtraction on the index being fed into #rand. I think this
is to improve the randomness a bit.

How much faster does it run?

T.

nobu.nokada · Dec 16, 2004

Hi,

At Thu, 16 Dec 2004 07:12:15 +0900,
Maarten Boonen wrote in [ruby-talk:123730]:

I made this a function a few days ago and it seems to be a lot faster.
Not sure if it's completely correct though.

def shuffle!
a = 0
while(a<length)
b = (length*Kernel.rand).to_i
tmp = self[a]
self[a] = self
self = tmp
a+=1
end
self
end

It wouldn't be uniform. An element which was close to the head
will tend to be placed close to the tail.

trans. (T. Onoma) · Dec 16, 2004

On Thursday 16 December 2004 03:27 am, Maarten Boonen wrote:
|
| Just a very small test: (code below)
|
| -> third is your equivalent code, self[a], self = self, self[a]
| seems to slow it down quite a lot.
|
| >ruby test3.rb
|
| 1.272
| 0.571
| 1.042
|
| >Exit code: 0
| >ruby test3.rb
|
| 1.262
| 0.601
| 1.042
|
| >Exit code: 0
| >ruby test3.rb
|
| 1.302
| 0.551
| 1.041
|
| >Exit code: 0

Hmm.. Interesting. Do you mind changing version 3 to use a tmp variable, like
yours, and see how they compare again?

T.

Ilmari Heikkinen · Dec 17, 2004

How could you know, there is as much randomness as you want?

Here's a method for testing bias in array shuffle. I'd be interested in
hearing about other approaches to probabilistic testing.. anyone?

module RandTests
# RandTests.check_bias{|arr| arr.shuffle!}
# => [true, {[2, 3, 1]=>8407, [2, 1, 3]=>8323, [3, 2, 1]=>8382,
# [1, 2, 3]=>8442, [3, 1, 2]=>8255, [1, 3, 2]=>8191}]
#
# Caveat!
# Only works for (very) small arrays.
# E.g. array of size 10 has 10! permutations = 3628800,
# which means that you'd have to do around 10^9 to 10^10 tests
# for the bias check to work properly and even then the permissible
bias
# would have to be large.
#
def self.check_bias(
array_size=3,
test_count=50000,
permissible_bias=0.02,
&block
)
total_permutation_count = (1..array_size).inject{|s, i| s*i} #
factorial
array = (1..array_size).to_a
avg_permutation_count = test_count.to_f / total_permutation_count

permutation_counts = {}
(1..test_count).each{
perm = block.call(array.dup)
permutation_counts[perm] ||= 0
permutation_counts[perm] += 1
}
test_result = permutation_counts.all?{|perm, count|
count.between?(
avg_permutation_count - permissible_bias*avg_permutation_count,
avg_permutation_count + permissible_bias*avg_permutation_count)
}
[test_result, permutation_counts]
end
end

-Ilmari

Glenn Parker · Dec 17, 2004

Ilmari said:
Here's a method for testing bias in array shuffle. I'd be interested in
hearing about other approaches to probabilistic testing.. anyone?

How about examining the distributions of the displacement of each item
from its original location for a large number of shuffles? The
distribution should be relatively flat across all possible displacements.

Glenn Parker · Dec 17, 2004

--------------020408010501070609070009
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Ilmari said:
Here's a method for testing bias in array shuffle. I'd be interested in
hearing about other approaches to probabilistic testing.. anyone?

Here's an alternative that measures shuffle offsets and visibly shows
the difference between two shuffle methods.

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

--------------020408010501070609070009
Content-Type: text/plain;
name="shuffle_test.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="shuffle_test.rb"

#!ruby

class ShuffleTester
def reset(size)
@size = size
@counts = [].fill(0, 0, @size)
@stored = 0
end
def analyze(array)
array.each_with_index do |value, index|
# This is the key metric. The index is the shuffled position
# and the value is the starting position. The difference modulo
# the array size is the shuffle offset.
offset = (index - value) % @size
@counts[offset] += 1
end
@stored += 1
end
def test(array_size, test_count, shuffle)
@shuffle = shuffle
reset(array_size)
test_count.times do
analyze( (1..array_size).to_a.send(shuffle) )
end
end
def summary
puts "summary for #{@shuffle}"
min_count = @stored
max_count = 0
@counts.each_with_index do |count, index|
bargraph = "#" * (50 * count / @stored)
printf "%4d: %6d %s\n", index, count, bargraph
min_count = count if (min_count > count)
max_count = count if (max_count < count)
end
printf "%4s: %6d\n", "min", min_count
printf "%4s: %6d\n\n", "max", max_count
end
end

module Enumerable
def shuffle1!
each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i], self[j]}
self
end

def shuffle2!
sort!{rand <=> 0.5}
end
end

ARRAYSIZE = 20
TESTCOUNT = 10000

tester = ShuffleTester.new
tester.test(ARRAYSIZE, TESTCOUNT, :shuffle1!)
tester.summary
tester.test(ARRAYSIZE, TESTCOUNT, :shuffle2!)
tester.summary

--------------020408010501070609070009--

[ANN] Help 0.1: access ri from irb (using the ri library)	5	Dec 22, 2004
[ANN] rs 0.1	3	Oct 12, 2006
ANN: PyTables 0.9 released	0	Nov 5, 2004
Ruby Weekly News 13th - 19th December 2004	0	Dec 22, 2004
[ANN] IHelp 0.3.0	5	Feb 25, 2005
[ANN] rs 0.1.2	0	Oct 19, 2006
[ANN] Ruwiki 0.9.3	2	Dec 22, 2004
Ruby Weekly News 6th - 12th December 2004	6	Dec 14, 2004

[ANN] rand.rb 0.9: Random access methods for Enumerables

Ilmari Heikkinen

Ralf Müller

Neil Spring

Ilmari Heikkinen

Brian Schröder

Ilmari Heikkinen

trans. (T. Onoma)

trans. (T. Onoma)

nobu.nokada

trans. (T. Onoma)

Ilmari Heikkinen

Glenn Parker

Glenn Parker

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads