Simple algorithm question - how to reorder a sequence economically

P

Peter Brooks

What is the easiest way to reorder a sequence pseudo-randomly?

That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
2,1,4,3) that is different each time.

I'm writing a simulation and would like to visit all the nodes in a
different order at each iteration of the simulation to remove the risk
of a fixed order introducing spurious evidence of correlation.
 
C

Chris Angelico

What is the easiest way to reorder a sequence pseudo-randomly?

That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
2,1,4,3) that is different each time.

I'm writing a simulation and would like to visit all the nodes in a
different order at each iteration of the simulation to remove the risk
of a fixed order introducing spurious evidence of correlation.

Permuting a sequence iteratively to cover every possibility? Good fun.

Here's one way of looking at it. Imagine the indices of all elements
are in some special "base" like so:

[a, b, c, d] --> a*4+b*3+c*2+d*1

Then iterate up to the highest possible value (ie 4*3*2*1), picking
indices for each accordingly. I don't know how efficient this will be,
but here's a simple piece of code to do it:
lst=lst[:] # Take a copy
ret=[None]*len(lst)
for i in range(len(lst)):
pos,idx=divmod(pos,len(lst))
ret=lst[idx]
del lst[idx]
return ret
permute([10,20,30,40],i)


[10, 20, 30, 40]
[20, 10, 30, 40]
[30, 10, 20, 40]
[40, 10, 20, 30]
[10, 30, 20, 40]
[20, 30, 10, 40]
[30, 20, 10, 40]
[40, 20, 10, 30]
[10, 40, 20, 30]
[20, 40, 10, 30]
[30, 40, 10, 20]
[40, 30, 10, 20]
[10, 20, 40, 30]
[20, 10, 40, 30]
[30, 10, 40, 20]
[40, 10, 30, 20]
[10, 30, 40, 20]
[20, 30, 40, 10]
[30, 20, 40, 10]
[40, 20, 30, 10]
[10, 40, 30, 20]
[20, 40, 30, 10]
[30, 40, 20, 10]
[40, 30, 20, 10]

It works, it produces a unique list for any given index provided, but
it's not the cleanest or most efficient. But I know someone'll improve
on it... or tell me I'm an idiot for not taking a more obvious
approach :)

ChrisA
 
F

Fábio Santos

....
It works, it produces a unique list for any given index provided, but
it's not the cleanest or most efficient. But I know someone'll improve
on it... or tell me I'm an idiot for not taking a more obvious
approach :)

ChrisA

I think that is pretty much itertools.permutations from the standard
library. The OP should check it out.
 
C

Chris Angelico

I think that is pretty much itertools.permutations from the standard
library. The OP should check it out.

That works if all the permutations are wanted at once. Is there a way,
short of iterating over it N times, to request permutation #N? Or
maybe I'm misreading the OP and that's not a requirement.

ChrisA
 
T

Terry Jan Reedy

What is the easiest way to reorder a sequence pseudo-randomly?

That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
2,1,4,3) that is different each time.

I'm writing a simulation and would like to visit all the nodes in a
different order at each iteration of the simulation to remove the risk
of a fixed order introducing spurious evidence of correlation.

random module has a shuffle function I believe
 
S

Steven D'Aprano

What is the easiest way to reorder a sequence pseudo-randomly?

import random
random.shuffle(sequence)


The sequence is modified in place, so it must be mutable. Lists are okay,
tuples are not.

That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
2,1,4,3) that is different each time.

You can't *guarantee* that it will be different each time. With a four-
item list, there are only 4! = 24 combinations, so on average you'll get
the original order one time in 24. For a ten-item list, that is once
every 3628800 times, and for a twenty-item list, once in
2432902008176640000 times. But of course these are *random*, and there's
always a chance of this:

http://dilbert.com/strips/comic/2001-10-25
 
N

Ned Batchelder

import random
random.shuffle(sequence)


The sequence is modified in place, so it must be mutable. Lists are okay,
tuples are not.


You can't *guarantee* that it will be different each time. With a four-
item list, there are only 4! = 24 combinations, so on average you'll get
the original order one time in 24. For a ten-item list, that is once
every 3628800 times, and for a twenty-item list, once in
2432902008176640000 times. But of course these are *random*, and there's
always a chance of this:

http://dilbert.com/strips/comic/2001-10-25

Also, heed the note in the docs: "Note that for even rather small
len(x), the total number of permutations of /x/ is larger than the
period of most random number generators; this implies that most
permutations of a long sequence can never be generated." The default
random number generator has a period of 2**19937-1, which means that
lists longer than 2080 have more permutations than the period of the
generator (because factorial(2081) > 2**19937). Most shuffles involve
lists far shorter than 2080, but it's still good to keep it in mind.

--Ned.
 
P

Peter Brooks

Thank you all for those most helpful suggestions! random.shuffle does
precisely the job that I need quickly. Thank you for introducing me to
itertools, though, I should have remembered APL did this in a symbol
or two and I'm sure that itertools will come in handy in future.

Thanks for the warnings about random numbers too - I hope my lists
will be short enough for the permutations of the function to be
irrelevant. I don't need every single sequence to be unique, only that
the same sequence only occurs occasionally. My lists might get up to
the ~2k length one day, and I'll keep in mind the need, at that stage,
to use a different pseudo-random generator. Actually, thinking about
it, there is probably a source of non-algorithmically-derived 'random'
numbers somewhere on the net that would do the job nicely.
 
S

Steven D'Aprano

Thanks for the warnings about random numbers too - I hope my lists will
be short enough for the permutations of the function to be irrelevant. I
don't need every single sequence to be unique, only that the same
sequence only occurs occasionally. My lists might get up to the ~2k
length one day, and I'll keep in mind the need, at that stage, to use a
different pseudo-random generator. Actually, thinking about it, there is
probably a source of non-algorithmically-derived 'random' numbers
somewhere on the net that would do the job nicely.

That's massive overkill.

Take a closer look at what Ned wrote:

"The default random number generator has a period of 2**19937-1"

and consider the numbers.

For a list with 3000 items, there are 3000! possible permutations. That
is approximately 10**21024. That is, a number with 21024 digits, or
somewhere around a trillion trillion trillion ... trillion trillion,
where there are 1752 "trillions".

If you could generate a million permutations a second, it would take you
on average 10**210988 centuries before getting the original permutation
again. That's the expected result you would get with a source of true
randomness.

Instead, with Python's default pseudo-random number generator, you cannot
get the full 3000! permutations, but only 2**19937-1 of them. Using the
same calculation as above, that means that you will have to generate a
million permutations per second for "only" 10**13783 centuries before
getting the original permutation again.

There are uses where being able to generate any possible permutation is
important, and the default PRNG is not sufficient. But merely shuffling
your data around to avoid spurious correlations is not one of them. Save
yourself a lot of development effort, and debugging, and just use
random.shuffle.
 
D

duncan smith

That works if all the permutations are wanted at once. Is there a way,
short of iterating over it N times, to request permutation #N? Or
maybe I'm misreading the OP and that's not a requirement.

ChrisA

A long time ago I wrote some code to do that.


import gmpy

def LexPermFromIndex(items, index):
n = len(items)
inds = range(n)
perm = []
for i in range(1, n+1):
r, index = divmod(index, gmpy.fac(n-i))
r = int(r)
perm.append(inds[r])
inds = inds[:r] + inds[r+1:]

return [items for i in perm]

LexPermFromIndex([1,2,3,4], 0) [1, 2, 3, 4]
LexPermFromIndex([1,2,3,4], 1) [1, 2, 4, 3]
LexPermFromIndex([1,2,3,4], 10) [2, 4, 1, 3]


I can't remember exactly why I wrote it. But I also have something for
generating a permutation's index and similar functions for combinations.

Duncan
 
C

Carlos Nepomuceno

----------------------------------------
Date: Fri, 24 May 2013 01:14:45 -0700
Subject: Simple algorithm question - how to reorder a sequence economically
From: (e-mail address removed)
To: (e-mail address removed)

What is the easiest way to reorder a sequence pseudo-randomly?

That is, for a sequence 1,2,3,4 to produce an arbitrary ordering (eg
2,1,4,3) that is different each time.

I'm writing a simulation and would like to visit all the nodes in a
different order at each iteration of the simulation to remove the risk
of a fixed order introducing spurious evidence of correlation.

I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?

Here's a snippet for creating a random shuffle by Fisher–Yates algorithm:

def FY_shuffle(l):
    from random import randint
    for i in range(len(l)-1,0,-1):
        j = randint(0,i)
        l[j],l = l,l[j]

It looks just like random.shuffle() mentioned before, but you can change it as you see fit.

If you can afford to test all permutations you can iterate over it by doing:
from itertools import permutations
l=range(4)
l [0, 1, 2, 3]
for i in permutations(l): print i
....
(0, 1, 2, 3)
(0, 1, 3, 2)
(0, 2, 1, 3)
[...]
(3, 1, 2, 0)
(3, 2, 0, 1)
(3, 2, 1, 0)
Note that 'i' is a tuple.

If you need a list of all permutations to make a selection:
l=range(4)
l [0, 1, 2, 3]
[list(i) for i in permutations(l)]
[[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3], [0, 2, 3, 1], [0, 3, 1, 2], [0, 3, 2, 1], [1, 0, 2, 3], [1, 0, 3, 2], [1, 2, 0, 3], [1, 2, 3, 0], [1, 3, 0, 2], [1, 3, 2, 0], [2, 0, 1, 3], [2, 0, 3, 1], [2, 1, 0, 3], [2, 1, 3, 0], [2, 3, 0, 1], [2, 3, 1, 0], [3, 0, 1, 2], [3, 0, 2, 1], [3,1, 0, 2], [3, 1, 2, 0], [3, 2, 0, 1], [3, 2, 1, 0]]

This will produce big lists:
-for 10 elements (l=range(10)) the size of the list is about 30MB (on Windows).
-for 11 elements (l=range(11)) the size of the list is about 335MB (on Windows). It took more than 7GB for CPython 2.7.5 to create that list.

Didn't try after that.
 
J

John Ladasky

You can't *guarantee* that it will be different each time.

Well, within limits, you can guarantee a LONG TIME between repeats. And that may be good enough for Peter's purpose.

When the number of elements in your sequence is short enough (the right value of "short enough" is at least partially a matter of opinion, and the amount of RAM available, but I would guess n < 10 myself), use itertools.permutations to generate all the possibilities at once -- call this list p. Store p in memory. Then shuffle p, and use its elements one at a time. If you get to the end of p and need to keep working, it's up to you whether to shuffle p a second time. If you don't reshuffle p, it guarantees the maximum interval between reusing the same permutation.

Use random.shuffle on your sequence directly, when your sequence has a larger number of elements. This doesn't guarantee that two successive permutations will not be identical, but the probability is of course very low.
 
P

Peter Brooks

I don't know what "spurious evidence of correlation" is. Can you give a mathematical definition?
If I run the simulation with the same sequence, then, because event E1
always comes before event E2, somebody might believe that there is a
causative connection between them in the world that's being simulated,
when, in fact, they only correlate in this way because the sequence is
not being shuffled. That's what it means.

Actually it'll be a bit more subtle than that, because each iteration
of the simulation updates all nodes in one time interval, the events
will not usually show the order of iteration - but, where there are
any secondary effects, that are related to the order in which the
nodes are updated, these will always happen the same way, which is my
concern.
 
C

Carlos Nepomuceno

----------------------------------------
Date: Fri, 24 May 2013 12:01:35 -0700
Subject: Re: Simple algorithm question - how to reorder a sequence economically
From: (e-mail address removed)
To: (e-mail address removed)


If I run the simulation with the same sequence, then, because event E1
always comes before event E2, somebody might believe that there is a
causative connection between them in the world that's being simulated,
when, in fact, they only correlate in this way because the sequence is
not being shuffled. That's what it means.

Correlation does not imply causation. If "somebody" is an expert system andyou want to avoid it's recognition and/or triggering of some kind, and you can't or don't want to change it's behavior, you may take the random way because it's cheaper.
Actually it'll be a bit more subtle than that, because each iteration
of the simulation updates all nodes in one time interval, the events
will not usually show the order of iteration - but, where there are
any secondary effects, that are related to the order in which the
nodes are updated, these will always happen the same way, which is my
concern.

You should have a more precise understanding of the dependence of the variables you taking in consideration before randomizing the series of events your are using for tests.

I suggest you start using PPMCC. If it's close to zero or negative you wouldn't have to mind about it! ;)
 
P

Peter Brooks

----------------------------------------












Correlation does not imply causation. If "somebody" is an expert system and you want to avoid it's recognition and/or triggering of some kind, and you can't or don't want to change it's behavior, you may take the random waybecause it's cheaper.


You should have a more precise understanding of the dependence of the variables you taking in consideration before randomizing the series of events your are using for tests.
If the scenario could be modelled mathematically, then there'd be no
point in writing the simulation.
 
J

John Ladasky

If you don't reshuffle p, it guarantees the maximum interval between reusing
the same permutation.

Of course, that comes at a certain price. Given two permutations p[x] and p[x+1], they will ALWAYS be adjacent, in every repetition of the list, unless you reshuffle. So the "spurious correlation" problem that Peter is worrying about would still exist, albeit at a meta-level.

You could play clever games, such as splitting p in half once you get to the end of it, shuffling each half independently, and then concatenating the halves. This algorithm scrambles the p[x]'s and p[x+1]'s pretty well, at the cost of cutting the average interval between repeats of a given p[x] from len(p) to something closer to len(p)/2.

Because someone's got to say it... "The generation of random numbers is tooimportant to be left to chance." — Robert R. Coveyou
 
R

Roy Smith

John Ladasky said:
Because someone's got to say it... "The generation of random numbers is too
important to be left to chance." ‹ Robert R. Coveyou

Absolutely. I know just enough about random number generation to
understand that I don't really know anything about it :)

That being said, people who really care about random numbers, tend to
rely on some sort of physical process instead of computer algorithms. A
classic example would be /dev/random. A somewhat more fun example is
. Something radioactive and a
geiger counter are a good source of randomness (time intervals between
decay events).
 
R

Robert Kern

True entropy is usually provided by a source such as /dev/random (on
Unix systems). It's sometimes referred to as "cryptographic"
randomness, due to its necessity in secure encryption work. There are
various ways to get this in a cross-platform way.

os.random() and os.urandom(), particularly.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top