# fastest method to choose a random element

Discussion in 'Python' started by caca@mailinator.com, Jan 4, 2008.

1. ### Guest

Hello,
This is a question for the best method (in terms of performance
only) to choose a random element from a list among those that satisfy
a certain property.

This is the setting: I need to pick from a list a random element
that satisfies a given property. All or none of the elements may have
the property. Most of the time, many of the elements will satisfy the
property, and the property is a bit expensive to evaluate. Chance of
having the property are uniform among elements.

A simple approach is:

import random
def random_pick(a_list,property):
'''Returns a random element from a list that has the property

Returns None if no element has the property
'''
random.shuffle(a_list)
for i in a_list:
if property(i): return i

but that requires to shuffle the list every time.

A second approach, that works if we know that at least one element of
the list has the property, is:

import random
def random_pick(a_list,property):
'''Returns a random element from a list that has the property

Loops forever if no element has the property
'''
while 1:
i=random.choice(a_list)
if property(i): return i

which is more efficient (on average) if many elements of the list have
the property and less efficient if only few elements of the list has
the property (and goes crazy if no element has the property)

Yet another one:

import random
def random_pick(a_list,property):
'''Returns a random element from a list that has the property
'''
b_list=[x for x in a_list if property(x)]
try:
return random.choice(b_list)
finally: return None

but this one checks the property on all the elements, which is no
good.

I don't need strong random numbers, so a simple solution like:
import random
globalRNG=random.Random()

def random_pick(a_list,property):
'''Returns a random element from a list that has the property

Works only if len(a_list)+1 is prime: uses Fermat's little theorem
'''
a=globalRNG(1,len(a_list))
ind=a
for i in xrange(len(a_list)):
x=a_list[a-1]
if property(x):return x
ind*=a

but this works only if len(a_list)+1 is prime!!! Now this one could be
saved if there were an efficient method to find a prime number given
than a number n but not very much greater...

Any other ideas? Thanks everybody

, Jan 4, 2008

On Jan 4, 7:55 pm, wrote:
> Hello,
> This is a question for the best method (in terms of performance
> only) to choose a random element from a list among those that satisfy
> a certain property.
>
> This is the setting: I need to pick from a list a random element
> that satisfies a given property. All or none of the elements may have
> the property. Most of the time, many of the elements will satisfy the
> property, and the property is a bit expensive to evaluate. Chance of
> having the property are uniform among elements.
>
> A simple approach is:
>
> import random
> def random_pick(a_list,property):
> '''Returns a random element from a list that has the property
>
> Returns None if no element has the property
> '''
> random.shuffle(a_list)
> for i in a_list:
> if property(i): return i
>
> but that requires to shuffle the list every time.
>
> A second approach, that works if we know that at least one element of
> the list has the property, is:
>
> import random
> def random_pick(a_list,property):
> '''Returns a random element from a list that has the property
>
> Loops forever if no element has the property
> '''
> while 1:
> i=random.choice(a_list)
> if property(i): return i
>
> which is more efficient (on average) if many elements of the list have
> the property and less efficient if only few elements of the list has
> the property (and goes crazy if no element has the property)
>
> Yet another one:
>
> import random
> def random_pick(a_list,property):
> '''Returns a random element from a list that has the property
> '''
> b_list=[x for x in a_list if property(x)]
> try:
> return random.choice(b_list)
> finally: return None
>
> but this one checks the property on all the elements, which is no
> good.
>
> I don't need strong random numbers, so a simple solution like:
> import random
> globalRNG=random.Random()
>
> def random_pick(a_list,property):
> '''Returns a random element from a list that has the property
>
> Works only if len(a_list)+1 is prime: uses Fermat's little theorem
> '''
> a=globalRNG(1,len(a_list))
> ind=a
> for i in xrange(len(a_list)):
> x=a_list[a-1]
> if property(x):return x
> ind*=a
>
> but this works only if len(a_list)+1 is prime!!! Now this one could be
> saved if there were an efficient method to find a prime number given
> than a number n but not very much greater...
>
> Any other ideas? Thanks everybody

Caching might help.

If random_pick is called several times with the same list(s) then
cache the result of
[property(i) for i in a_list] against a_list.

If random_pick is called several times with list(s) whith multiple
instances of list items then cache
property(i) against i for i in a_list .

You could do both.

You might investigate wether property(i) could be implemented using a
faster algorithm, maybe table lookup, or interpolation from initial
table lookup.

3. ### Guest

> Caching might help.
>
> If random_pick is called several times with the same list(s) then
> cache the result of
> [property(i) for i in a_list] against a_list.
>
> If random_pick is called several times with list(s) with multiple
> instances of list items then cache
> property(i) against i for i in a_list .
>
> You could do both.
>
> You might investigate wether property(i) could be implemented using a
> faster algorithm, maybe table lookup, or interpolation from initial
> table lookup.
>

problem.
By the way, I noticed two things:

I forgot to write randint in the third approach:

> > a=globalRNG.randint(1,len(a_list))

The warning "The group you are posting to is a Usenet group. Messages
the Internet." means a person, but not a bot, may see my email
address, so it is safe to use my real address next time...

, Jan 5, 2008
4. ### Paul HankinGuest

On Jan 4, 7:55 pm, wrote:
>   Hello,
>   This is a question for the best method (in terms of performance
> only) to choose a random element from a list among those that satisfy
> a certain property.
>
>   This is the setting: I need to pick from a list a random element
> that satisfies a given property. All or none of the elements may have
> the property. Most of the time, many of the elements will satisfy the
> property, and the property is a bit expensive to evaluate. Chance of
> having the property are uniform among elements.

Here's some code that uses a cached random sorting of the list. It
assumes you'll want more than one random element. It never calls
'prop' on the same element twice, and it's O(n) even when the elements
that pass 'prop' are sparse. I hope this is useful to you!

import random

class RandomPicker(object):
def __init__(self, seq, prop=lambda x:True):
seq = list(seq)
random.shuffle(seq)
# Store with the item whether we've computed prop on it
self.random_list = [(x, False) for x in seq]
self.prop = prop
def pick(self):
for i, (x, p) in enumerate(self.random_list):
if p or self.prop(x):
if not p:
# Record the fact that self.prop passed.
self.random_list = (x, True)
# Chop off the items that prop failed on.
self.random_list = self.random_list[i:]
r = self.random_list
# Instead of shuffling the whole list again, just
insert
# x back in the list randomly. Since the remaining
elements
# are randomly sorted already, this is ok.
n = random.randint(0, len(r) - 1)
r[0], r[n] = r[n], r[0]
return x
# Nothing in the list passes the 'prop' test.
return None

# Example: pick 100 odd integers from 0 to 1000.
a = RandomPicker(xrange(1000), lambda x: x % 2 == 1)
print [a.pick() for i in xrange(100)]

--
Paul Hankin

Paul Hankin, Jan 5, 2008
5. ### Arnaud DelobelleGuest

On Jan 4, 7:55 pm, wrote:
>   Hello,
>   This is a question for the best method (in terms of performance
> only) to choose a random element from a list among those that satisfy
> a certain property.
>
>   This is the setting: I need to pick from a list a random element
> that satisfies a given property. All or none of the elements may have
> the property. Most of the time, many of the elements will satisfy the
> property, and the property is a bit expensive to evaluate. Chance of
> having the property are uniform among elements.

The generator function below yields an infinite sequence of randomly
picked elements from the list who satisfy the property, or nothing if
the list contains no element satisfying the property. It guarantees
that each time, prop() will either not be called or will be called
just enough to find one single item that satisfies it. The catch is
that you need to have an estimate for the number of items that satisfy
the property in the list.

import random
from itertools import islice, ifilter

def picker(lst, prop, np):
# lst: list of items to pick elements from
# prop: property that picked elements must fulfil
# np: (good estimate of) number of items that
# satisfy the property
random.shuffle(lst)
plst = [] # items we know fulfil prop
for item in ifilter(prop, lst):
# The next random item may be one already yielded
while True:
i = random.randrange(np)
if i >= len(plst): break
yield plst
# Or it may be a new one
plst.append(item)
if len(plst) > np: np = len(plst)
yield item
# Now we know all items fulfilling prop
if not plst: return
while True:
yield plst[random.randrange(len(plst))]

def test(picker, n=1000):
res = {}
for val in islice(picker, n):
res[val] = res.get(val, 0) + 1
return res

Example:

>>> p = picker(range(20), lambda x: x % 2, 10)
>>> test(p)

{1: 113, 3: 96, 5: 87, 7: 91, 9: 109, 11: 91, 13: 106, 15: 101, 17:
109, 19: 97}

>>> p = picker(range(20), lambda x: False, 10)
>>> p.next()

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration

I don't know if there is a good idea in there, I'll let you be the
judge

--
Arnaud

Arnaud Delobelle, Jan 5, 2008
6. ### Guest

Hello, Paul and Arnaud.
avoid shuffle?
It may take unnecessary long on a long list most of whose elements
have the property.

, Jan 5, 2008
7. ### Guest

On Jan 5, 5:07 pm, wrote:
> Hello, Paul and Arnaud.
> avoid shuffle?
> It may take unnecessary long on a long list most of whose elements
> have the property.

Umm...
You provide nice answers in the case many elements are picked from the
same list.
Any ideas for the case when the picker is called many times on a
program, but never twice with the same list?

, Jan 5, 2008
8. ### ajaksuGuest

Hi there

On Jan 5, 2:14 pm, wrote:
> On Jan 5, 5:07 pm, wrote:
>
> > Hello, Paul and Arnaud.
> > avoid shuffle?
> > It may take unnecessary long on a long list most of whose elements
> > have the property.

solution: you don't need to shuffle the list if you can construct a
random list of indexes to loop through, but my timings show that I
can't do that faster than "shuffle(range(x))" for worst cases (i.e.,
iterate thru the whole list)

The rather good news is that shuffling a list of arbitrary objects is
pretty fast (as compared to a list of integers).

OTOH, if you do know that the chances are high enough, you can try
choosing items randomly without substitution (adapted from random.py's
sample):

from random import random

def randpickuntil(lst, prop=bool):
selected = set()
for i in xrange(len(lst)):
j = int(random() * n)
while j in selected:
j = int(random() * n)
if prop(lst[j]):
return lst[j]

> Umm...
> You provide nice answers in the case many elements are picked from the
> same list.
> Any ideas for the case when the picker is called many times on a
> program, but never twice with the same list?

If the lists share items, a cache could help a lot. Otherwise, you'd
benefit more from finding a good way to test the property (could you
give a hint on what it is?). How do objects acquire that property, is
it a sum of independent factors?

HTH,
Daniel

ajaksu, Jan 5, 2008
9. ### Martin v. LöwisGuest

> Any other ideas?

def random_pick(list, property):
L = len(list)
pos = start = random.randrange(L)
while 1:
x = list[pos]
if property(x): return x
pos = (pos + 1) % L
if pos == start:
raise ValueError, "no such item"

Regards,
Martin

Martin v. Löwis, Jan 5, 2008
10. ### Paul HankinGuest

On Jan 5, 4:14 pm, wrote:
> On Jan 5, 5:07 pm, wrote:
>
> > Hello, Paul and Arnaud.
> > avoid shuffle?
> > It may take unnecessary long on a long list most of whose elements
> > have the property.

You could generate a random shuffle of range(len(seq)) lazily, and use
that to iterate over your sequence.

import random
import itertools

def randxrange(n):
"Generate the numbers 0 to n-1 in a random order"
x = range(n)
for i in xrange(n):
r = random.randrange(n - i)
yield x[r]
x[r] = x[n - i - 1]

def shuffled(seq):
"Generate the elements of seq in a random order"
return (seq for i in randxrange(len(seq)))

def pick_random(seq, prop):
return itertools.ifilter(prop, shuffled(seq)).next()

--
Paul Hankin

Paul Hankin, Jan 5, 2008
11. ### Paul HankinGuest

On Jan 5, 5:12 pm, "Martin v. Löwis" <> wrote:
> > Any other ideas?

>
>
> def random_pick(list, property):
>     L = len(list)
>     pos = start = random.randrange(L)
>     while 1:
>         x = list[pos]
>         if property(x): return x
>         pos = (pos + 1) % L
>         if pos == start:
>            raise ValueError, "no such item"

This might be acceptable for the OP's use, but it's strongly biased
towards values which follow a long stream of things that fail
property.

print [random_pick(range(100), lambda x:x > 90)
for i in range(999)].count(91)

929

If it were uniform, it should be around 111.

--
Paul Hankin

Paul Hankin, Jan 5, 2008
12. ### ajaksuGuest

> OTOH, if you do know that the chances are high enough, you can try
> choosing items randomly without substitution (adapted from random.py's
> sample):

Sorry, this works:

def randpickuntil(lst, prop=bool):
selected = set()
n = len(lst)
for i in xrange(n):
j = int(random() * n)
while j in selected:
j = int(random() * n)
if prop(lst[j]):
return lst[j]

Gotta say it's much faster on average than shuffle for moderately to
highly probable tests, but might become pathological (i.e. run for
minutes) for very large lists in which prop(x) is False for all.

ajaksu, Jan 5, 2008
13. ### Paul HankinGuest

On Jan 5, 4:14 pm, wrote:
> On Jan 5, 5:07 pm, wrote:
>
> > Hello, Paul and Arnaud.
> > avoid shuffle?
> > It may take unnecessary long on a long list most of whose elements
> > have the property.

>
> Umm...
> You provide nice answers in the case many elements are picked from the
> same list.
> Any ideas for the case when the picker is called many times on a
> program, but never twice with the same list?

Here's a pragmatic optimisation for any algorithm: first test some
elements at random in case you get lucky or most of the elements are
good.

Eg, that optimisation applied to the naive shuffle algorithm.

import random
import itertools

def pick_random_fast(seq, prop):
L = len(seq)
stabs = 5
for i in xrange(stabs):
r = random.randrange(L)
if prop(seq[r]): return seq[r]
random.shuffle(seq)
return itertools.ifilter(prop, seq).next()

I've used 5 'stabs' here. Perhaps it should be a function of L, but I
suppose you can tune it for your data.

--
Paul Hankin

Paul Hankin, Jan 5, 2008
14. ### Paul HankinGuest

On Jan 5, 5:12 pm, Paul Hankin <> wrote:
> On Jan 5, 4:14 pm, wrote:
>
> > On Jan 5, 5:07 pm, wrote:

>
> > > Hello, Paul and Arnaud.
> > > While I think about your answers: do you think there is any way to
> > > avoid shuffle?
> > > It may take unnecessary long on a long list most of whose elements
> > > have the property.

>
> You could generate a random shuffle of range(len(seq)) lazily, and use
> that to iterate over your sequence.
>
> import random
> import itertools
>
> def randxrange(n):
>     "Generate the numbers 0 to n-1 in a random order"
>     x = range(n)
>     for i in xrange(n):
>         r = random.randrange(n - i)
>         yield x[r]
>         x[r] = x[n - i - 1]
>
> def shuffled(seq):
>     "Generate the elements of seq in a random order"
>     return (seq for i in randxrange(len(seq)))
>
> def pick_random(seq, prop):
>     return itertools.ifilter(prop, shuffled(seq)).next()

At the risk of filling this thread with my posts, here's a much
simplified and faster version of this code that uses no extra storage.

import random

def pick_random(seq, prop):
L = len(seq)
for i in xrange(L):
r = random.randrange(L - i)
if prop(seq[r]):
return seq[r]
seq[r] = seq[L - i - 1]

--
Paul Hankin

Paul Hankin, Jan 5, 2008
15. ### Carl BanksGuest

On Sat, 05 Jan 2008 08:14:46 -0800, caca wrote:

> On Jan 5, 5:07 pm, wrote:
>> Hello, Paul and Arnaud.
>> avoid shuffle?
>> It may take unnecessary long on a long list most of whose elements have
>> the property.

>
> Umm...
> You provide nice answers in the case many elements are picked from the
> same list.
> Any ideas for the case when the picker is called many times on a
> program, but never twice with the same list?

ISTM the best thing would be to reimplement the shuffle algorithm, but to
stop shuffling as soon as you get a hit. The drawback is that it's a
destructive operation, but that doesn't sound like it's an issue for you.

Here's something for starters:

def random_element_with_property(x,test_property_func):
for i in xrange(len(x)-1):
j = random.randrange(i+1,len(x))
tmp = x[j]
if test_property_func(tmp):
return tmp
x[j] = x
x = tmp
return None

Then, for example, use it like this:

def odd(i): return i&1
e = random_element_with_property(range(20),odd)

Carl Banks

Carl Banks, Jan 5, 2008
16. ### Arnaud DelobelleGuest

On Jan 5, 8:50 pm, Paul Hankin <> wrote:
> On Jan 5, 5:12 pm, Paul Hankin <> wrote:
>
>
>
> > On Jan 5, 4:14 pm, wrote:

>
> > > On Jan 5, 5:07 pm, wrote:

>
> > > > Hello, Paul and Arnaud.
> > > > While I think about your answers: do you think there is any way to
> > > > avoid shuffle?
> > > > It may take unnecessary long on a long list most of whose elements
> > > > have the property.

>
> > You could generate a random shuffle of range(len(seq)) lazily, and use
> > that to iterate over your sequence.

>
> > import random
> > import itertools

>
> > def randxrange(n):
> >     "Generate the numbers 0 to n-1 in a random order"
> >     x = range(n)
> >     for i in xrange(n):
> >         r = random.randrange(n - i)
> >         yield x[r]
> >         x[r] = x[n - i - 1]

>
> > def shuffled(seq):
> >     "Generate the elements of seq in a random order"
> >     return (seq for i in randxrange(len(seq)))

>
> > def pick_random(seq, prop):
> >     return itertools.ifilter(prop, shuffled(seq)).next()

>
> At the risk of filling this thread with my posts, here's a much
> simplified and faster version of this code that uses no extra storage.
>
> import random
>
> def pick_random(seq, prop):
>     L = len(seq)
>     for i in xrange(L):
>         r = random.randrange(L - i)
>         if prop(seq[r]):
>             return seq[r]
>         seq[r] = seq[L - i - 1]
>
> --
> Paul Hankin

Great! This is so simple it must be the best way to do it. Thanks
for that: I could feel there must be an elegant way, but I couldn't
put my finger on it. The only downside, of course, is that it mutates
seq. Moreover, it can be slightly modified to be a generator (in case
one wants to draw many elements from the sequence), improving vastly
on the generator function I proposed earlier.

def pick_random(seq, prop):
for i in xrange(len(seq), 0, -1):
r = random.randrange(i)
if prop(seq[r]):
return seq[r]
seq[r] = seq[i - 1]

# Generator version.
def random_picker(seq, prop):
for i in xrange(len(seq), 0, -1):
while True:
r = random.randrange(i)
if not prop(seq[r]): break
yield seq[r]
seq[r] = seq[i - 1]

--
Arnaud

Arnaud Delobelle, Jan 5, 2008
17. ### bukzorGuest

On Jan 5, 8:14 am, wrote:
> On Jan 5, 5:07 pm, wrote:
>
> > Hello, Paul and Arnaud.
> > avoid shuffle?
> > It may take unnecessary long on a long list most of whose elements
> > have the property.

>
> Umm...
> You provide nice answers in the case many elements are picked from the
> same list.
> Any ideas for the case when the picker is called many times on a
> program, but never twice with the same list?

Here's my stab:

from random import randint, seed
from time import time
from sys import stdout
seed(time())

iterations = 0#DEBUG
def pick_random(seq, prop=bool):
temp = list(seq)
global iterations#DEBUG
while temp:
iterations += 1#DEBUG
i = randint(0, len(temp) - 1)
if prop(temp): return temp
else: del temp

def generate_list(length):
l = list()
for x in xrange(length): l.append(randint(0,1) * randint(1,1000))
return l

count = 0
for x in xrange(1000):
count += 1
print pick_random(generate_list(1000)),

print
print "AVERAGE ITERATIONS:", float(iterations) / count

The average number of iterations is 1/p where p is the chance of your
property being true. It's independent of list size! Just remove the
DEBUG lines and it's ready for use.

--Buck

bukzor, Jan 5, 2008
18. ### bukzorGuest

On Jan 5, 9:12 am, "Martin v. Löwis" <> wrote:
> > Any other ideas?

>
>
> def random_pick(list, property):
> L = len(list)
> pos = start = random.randrange(L)
> while 1:
> x = list[pos]
> if property(x): return x
> pos = (pos + 1) % L
> if pos == start:
> raise ValueError, "no such item"
>
> Regards,
> Martin

bool) the 1 will be returned four times as often as the 2. Maybe
that's ok...

bukzor, Jan 5, 2008
19. ### Guest

On Jan 5, 9:50 pm, Paul Hankin <> wrote:
> On Jan 5, 5:12 pm, Paul Hankin <> wrote:
>
>
>
> > On Jan 5, 4:14 pm, wrote:

>
> > > On Jan 5, 5:07 pm, wrote:

>
> > > > Hello, Paul and Arnaud.
> > > > While I think about your answers: do you think there is any way to
> > > > avoid shuffle?
> > > > It may take unnecessary long on a long list most of whose elements
> > > > have the property.

>
> > You could generate a random shuffle of range(len(seq)) lazily, and use
> > that to iterate over your sequence.

>
> > import random
> > import itertools

>
> > def randxrange(n):
> > "Generate the numbers 0 to n-1 in a random order"
> > x = range(n)
> > for i in xrange(n):
> > r = random.randrange(n - i)
> > yield x[r]
> > x[r] = x[n - i - 1]

>
> > def shuffled(seq):
> > "Generate the elements of seq in a random order"
> > return (seq for i in randxrange(len(seq)))

>
> > def pick_random(seq, prop):
> > return itertools.ifilter(prop, shuffled(seq)).next()

>
> At the risk of filling this thread with my posts, here's a much
> simplified and faster version of this code that uses no extra storage.
>
> import random
>
> def pick_random(seq, prop):
> L = len(seq)
> for i in xrange(L):
> r = random.randrange(L - i)
> if prop(seq[r]):
> return seq[r]
> seq[r] = seq[L - i - 1]
>
> --
> Paul Hankin

This one is good. Someone commented that you destroy the list, but
that can be fixed:

def pick_random(seq, prop):
L = len(seq)
for i in xrange(L):
r = random.randrange(L - i)
if prop(seq[r]):
return seq[r]
seq[r], seq[L - i - 1]= seq[L - i - 1],seq[r]

just pushing elements without the property to the end of the list
(list is mutated, true, but shuffle will do that too). In each
iteration of the for loop, there is only one random element, one check
of the property, and rebinding of elements without altering the lenght
of the list. This is optimal and has all the functionality.

for buzcor: deleting an element in the middle of a list is costly
for martin: that is certainly enough for me

Thanks everybody!

, Jan 6, 2008
20. ### DustanGuest

On Jan 5, 4:16 am, wrote:
> The warning "The group you are posting to is a Usenet group. Messages
> the Internet." means a person, but not a bot, may see my email
> address, so it is safe to use my real address next time...

Wrong. A bot may be able to read and scan all messages sent through
Usenet.

Dustan, Jan 6, 2008