fastest method to choose a random element

A

ajaksu

This one is good. Someone commented that you destroy the list, but
that can be fixed:

def pick_random(seq, prop):
L = len(seq)
for i in xrange(L):
r = random.randrange(L - i)
if prop(seq[r]):
return seq[r]
seq[r], seq[L - i - 1]= seq[L - i - 1],seq[r]

just pushing elements without the property to the end of the list
(list is mutated, true, but shuffle will do that too). In each
iteration of the for loop, there is only one random element, one check
of the property, and rebinding of elements without altering the lenght
of the list. This is optimal and has all the functionality.

Two more comments:
for buzcor: deleting an element in the middle of a list is costly
for martin: that is certainly enough for me

Thanks everybody!

How about some benchmarks as a token of gratitude? ;) Or at least a
nice hint on your expected number of lists (and their sizes) :)
 
B

bukzor

On Jan 5, 4:14 pm, (e-mail address removed) wrote:
On Jan 5, 5:07 pm, (e-mail address removed) wrote:
Hello, Paul and Arnaud.
While I think about your answers: do you think there is any way to
avoid shuffle?
It may take unnecessary long on a long list most of whose elements
have the property.
You could generate a random shuffle of range(len(seq)) lazily, and use
that to iterate over your sequence.
import random
import itertools
def randxrange(n):
"Generate the numbers 0 to n-1 in a random order"
x = range(n)
for i in xrange(n):
r = random.randrange(n - i)
yield x[r]
x[r] = x[n - i - 1]
def shuffled(seq):
"Generate the elements of seq in a random order"
return (seq for i in randxrange(len(seq)))
def pick_random(seq, prop):
return itertools.ifilter(prop, shuffled(seq)).next()

At the risk of filling this thread with my posts, here's a much
simplified and faster version of this code that uses no extra storage.
import random
def pick_random(seq, prop):
L = len(seq)
for i in xrange(L):
r = random.randrange(L - i)
if prop(seq[r]):
return seq[r]
seq[r] = seq[L - i - 1]

This one is good. Someone commented that you destroy the list, but
that can be fixed:

def pick_random(seq, prop):
L = len(seq)
for i in xrange(L):
r = random.randrange(L - i)
if prop(seq[r]):
return seq[r]
seq[r], seq[L - i - 1]= seq[L - i - 1],seq[r]

just pushing elements without the property to the end of the list
(list is mutated, true, but shuffle will do that too). In each
iteration of the for loop, there is only one random element, one check
of the property, and rebinding of elements without altering the lenght
of the list. This is optimal and has all the functionality.

Two more comments:
for buzcor: deleting an element in the middle of a list is costly
for martin: that is certainly enough for me

Thanks everybody!


Just for fun, I profiled my answer versus the final answer over 300
seconds. I wrapped some parts of the functions in trivial functions so
they would show up in the profiling. I found that my answer was 50%
slower, but not because of the delete, but mostly the len() inside the
loop, and the bool(seq) at the start of each loop. I was able to
rewrite mine to only be 14 seconds slower (the time to make the copy
at the beginning), but don't believe that's useful enough to post.


ncalls tottime percall cumtime percall
filename:lineno(function)
1516221 48.545 0.000 158.850 0.000 picked.py:
14(pick_random_bukzor)
3066421 19.359 0.000 53.815 0.000 picked.py:8(randrange2)
3066421 16.852 0.000 24.751 0.000 picked.py:9(len2)
1516221 14.712 0.000 14.712 0.000 picked.py:12(make_copy)
3066421 9.767 0.000 9.767 0.000 picked.py:10(notempty)
1550200 7.260 0.000 7.260 0.000 picked.py:7(delete)

1516221 31.574 0.000 104.384 0.000 picked.py:
27(pick_random)
3063737 19.556 0.000 54.617 0.000 picked.py:25(randrange3)
1516221 8.485 0.000 12.582 0.000 picked.py:26(len3)
1547516 5.611 0.000 5.611 0.000 picked.py:
24(do_transform)

def delete(list, index): del list[index]
def randrange2(X): return randrange(X)
def len2(seq): return len(seq)
def notempty(seq): if seq: return True
def make_copy(l): return list(l)
def pick_random_bukzor(seq, prop=bool):
temp = make_copy(seq)
while notempty(seq):
i = randrange2(len2(temp))
if prop(temp): return temp
else: delete(temp, i)

def do_transform(seq, L, r, i): seq[r], seq[L - i - 1]= seq[L - i -
1],seq[r]
def randrange3(X): return randrange(X)
def len3(seq): return len(seq)
def pick_random(seq, prop=bool):
L = len3(seq)
for i in xrange(L):
r = randrange3(L - i)
if prop(seq[r]): return seq[r]
do_transform(seq, L, r, i)
 
P

Paddy

Hello,
This is a question for the best method (in terms of performance
only) to choose a random element from a list among those that satisfy
a certain property.
This is the setting: I need to pick from a list a random element
that satisfies a given property. All or none of the elements may have
the property. Most of the time, many of the elements will satisfy the
property, and the property is a bit expensive to evaluate. Chance of
having the property are uniform among elements.
A simple approach is:
import random
def random_pick(a_list,property):
'''Returns a random element from a list that has the property
Returns None if no element has the property
'''
random.shuffle(a_list)
for i in a_list:
if property(i): return i
but that requires to shuffle the list every time.
A second approach, that works if we know that at least one element of
the list has the property, is:
import random
def random_pick(a_list,property):
'''Returns a random element from a list that has the property
Loops forever if no element has the property
'''
while 1:
i=random.choice(a_list)
if property(i): return i
which is more efficient (on average) if many elements of the list have
the property and less efficient if only few elements of the list has
the property (and goes crazy if no element has the property)
Yet another one:
import random
def random_pick(a_list,property):
'''Returns a random element from a list that has the property
'''
b_list=[x for x in a_list if property(x)]
try:
return random.choice(b_list)
finally: return None
but this one checks the property on all the elements, which is no
good.
I don't need strong random numbers, so a simple solution like:
import random
globalRNG=random.Random()
def random_pick(a_list,property):
'''Returns a random element from a list that has the property
Works only if len(a_list)+1 is prime: uses Fermat's little theorem
'''
a=globalRNG(1,len(a_list))
ind=a
for i in xrange(len(a_list)):
x=a_list[a-1]
if property(x):return x
ind*=a
but this works only if len(a_list)+1 is prime!!! Now this one could be
saved if there were an efficient method to find a prime number given
than a number n but not very much greater...
Any other ideas? Thanks everybody

Caching might help.

If random_pick is called several times with the same list(s) then
cache the result of
[property(i) for i in a_list] against a_list.

If random_pick is called several times with list(s) whith multiple
instances of list items then cache
property(i) against i for i in a_list .

You could do both.

You might investigate wether property(i) could be implemented using a
faster algorithm, maybe table lookup, or interpolation from initial
table lookup.

- Paddy.

Here's a caching decorator that you could apply to function property:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498245

- Paddy.
 
C

caca

Just for fun, I profiled my answer versus the final answer...

This mailing list is awesome!


PS:ajaksu, I have to leave now, I hope bukzor's answer was enough to
you (at least for the moment)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,906
Latest member
SkinfixSkintag

Latest Threads

Top