# Using rand(), how do I avoid repeats?

Discussion in 'Perl Misc' started by Scott Stark, Jul 21, 2004.

1. ### Scott Stark (Guest)

I'm using rand() to generate a series of random numbers. Is there any
way to prevent the same number from being used more than once? All I
can think of is keeping a list of used numbers, and having some kind
of "if used, next" statement, which seems lame. (Theoretically that
could keep going forever, if it randomly kept selecting the same
numbers.) Any thoughts?

thanks
Scott

Scott Stark, Jul 21, 2004

2. ### thundergnat (Guest)

Scott Stark wrote:

> I'm using rand() to generate a series of random numbers. Is there any
> way to prevent the same number from being used more than once? All I
> can think of is keeping a list of used numbers, and having some kind
> of "if used, next" statement, which seems lame. (Theoretically that
> could keep going forever, if it randomly kept selecting the same
> numbers.) Any thoughts?
>
> thanks
> Scott

Sounds like you want some variation of a shuffle. Take a look at
Algorithm::Numerical::Shuffle on cpan.

Generate an array of numbers, shuffle it, then shift off how many items
you need.
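The three steps above can be sketched with List::Util's shuffle (a core module in modern Perls, so the CPAN module may not even be needed):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Build the whole range, shuffle it, then take as many as you need.
my @pool   = shuffle(1 .. 100);
my @wanted = splice(@pool, 0, 10);   # ten unique random numbers

print "@wanted\n";
```

Whatever is left in @pool can keep being shifted off later without ever repeating a number already taken.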

thundergnat, Jul 21, 2004

3. ### Randal L. Schwartz (Guest)

>>>>> "Scott" == Scott Stark <> writes:

Scott> I'm using rand() to generate a series of random numbers. Is
Scott> there any way to prevent the same number from being used more
Scott> than once? All I can think of is keeping a list of used
Scott> numbers, and having some kind of "if used, next" statement,
Scott> which seems lame. (Theoretically that could keep going forever,
Scott> if it randomly kept selecting the same numbers.) Any thoughts?

You probably need to deal from a deck then.

use List::Util qw(shuffle); # might need this from CPAN
my @shuffled;

while (time passes) {
    @shuffled = shuffle(1..100) unless @shuffled;
    my $number = shift @shuffled;

    etc
}

print "Just another Perl hacker,"; # the original
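Filled out into something runnable (the draw count and deck size here are arbitrary choices of mine, not from the post):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);   # core module in modern Perls

my @shuffled;
my @dealt;

for (1 .. 250) {
    # Reshuffle a fresh "deck" whenever the current one runs out;
    # repeats can only occur across decks, never within one.
    @shuffled = shuffle(1 .. 100) unless @shuffled;
    push @dealt, shift @shuffled;
}

print "first ten dealt: @dealt[0 .. 9]\n";
```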

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Randal L. Schwartz, Jul 21, 2004
4. ### NY Tennessean (Guest)

(Scott Stark) wrote in message news:<>...
> I'm using rand() to generate a series of random numbers. Is there any
> way to prevent the same number from being used more than once? All I
> can think of is keeping a list of used numbers, and having some kind
> of "if used, next" statement, which seems lame. (Theoretically that
> could keep going forever, if it randomly kept selecting the same
> numbers.) Any thoughts?

Keeping a hash of 'used' numbers could indeed be lame in that
performance would degrade as the percentage of numbers used became
higher and higher.

A better approach might be to start with an array (hash?) with an
entry for every number in the acceptable range of random numbers,
where the key and the value are identical to begin with. Here is an
example representation after this array has been created:

$random[1] = 1;
$random[2] = 2;
$random[3] = 3;
# etcetera
$random[16] = 16; # range max.

The first time, you choose from a range of 1 to 16; let's say 3 is the
random number. Then you splice together a couple of array slices, so
that the 'higher' half moves down to fill the spot where you just
eliminated one of your possibilities:

$random[1] = 1;
$random[2] = 2;

# $random[3] = 3;

$random[3] = 4;
$random[4] = 5;
# etcetera
$random[15] = 16;

The next time you will choose a random number from 1 to 15: the new
maximum. If it just happens to return 15, for instance, you would
then get the value specified by that array index: 16.

There are a few gotchas, such as being sure not to try to create more
random numbers than are in the original range, and managing the array
minimum and maximum values as you continue to slice and splice.
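A minimal sketch of the scheme described above (the variable names are mine): keep an array of remaining candidates and splice out each pick, so later draws cannot repeat it.

```perl
use strict;
use warnings;

my @remaining = (1 .. 16);   # the acceptable range of random numbers
my @picked;

for (1 .. 5) {
    my $idx = int rand @remaining;               # 0 .. current maximum
    push @picked, splice(@remaining, $idx, 1);   # remove the pick; higher
                                                 # entries shift down by one
}

print "@picked\n";
```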

Happy coding!

NY Tennessean, Jul 22, 2004
5. ### NY Tennessean (Guest)

(Scott Stark) wrote in message news:<>...
> I'm using rand() to generate a series of random numbers. Is there any
> way to prevent the same number from being used more than once? All I
> can think of is keeping a list of used numbers, and having some kind
> of "if used, next" statement, which seems lame. (Theoretically that
> could keep going forever, if it randomly kept selecting the same
> numbers.) Any thoughts?

You've got a couple of answers suggesting you use modules that allow
you to "shuffle" an array, and I can't disagree: modules are usually
the way to go as they have been used and tested more than any home
grown solution.

But I still wanted to share with you the home-grown code I've rolled
since my original suggestion. Beware, it's a little clunky. It could
stand some perfecting, but it "works" for me, at least for the dozen or
so times I tested it:

sub uniq_random_int_in ($$$) {
    # based on random_int_in in perlfaq4, data manipulation
    my ($min, $max, $qty) = @_;
    # Assumes that the three arguments are integers themselves!
    # Assumes $min < $max
    # Assumes $qty <= $max - $min + 1
    my @set, @uniq;
    for (my $i=0; $i <= $max - $min; $i++) {
        $set[$i] = $i;
    }
    for (my $i=0; $i < $qty; $i++) {
        my $rand = int rand($max - $min - $i + 1);
        $uniq[$i] = $set[$rand] + $min;
        @set = (@set[0..$rand-1], @set[$rand+1..scalar(@set)-1]);
        print $uniq[$i], "\n"; # display only, can comment out
    }
    return @uniq;
}

@uniq_rands = uniq_random_int_in(11,20,5); # for testing

NY Tennessean, Jul 22, 2004
6. ### (Guest)

(Scott Stark) wrote:
> I'm using rand() to generate a series of random numbers. Is there any
> way to prevent the same number from being used more than once?

A lot of ways.

> All I can think of is keeping a list of used numbers, and having some
> kind of "if used, next" statement, which seems lame.

Well, I'd keep a hash of used numbers. And "redo" would likely make more
sense than "next". If your numbers are floats, you may have some problem with
the numeric and string representations being nonidentical, but I highly
doubt that will be a problem. It is not lame at all, unless you are
sampling very densely among all possible values. (If you are sampling very
densely, i.e. you will before you are done need a high fraction of all
possible values, then I would just shuffle the list of all of those
possible values.)
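Sketched out (the range and draw count here are illustrative choices of mine): sampling 10 numbers from a million is so sparse that collisions, and therefore redos, are vanishingly rare.

```perl
use strict;
use warnings;

my %used;
my @picks;

for (1 .. 10) {
    my $n = int rand 1_000_000;   # sparse sampling: retries will be rare
    redo if $used{$n}++;          # seen before: retry this pass of the loop
    push @picks, $n;
}

print "@picks\n";
```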

> (Theoretically that
> could keep going forever, if it randomly kept selecting the same
> numbers.) Any thoughts?

If your random number generator degenerates thus, then in that case you are
pretty much screwed regardless.

Xho


, Jul 22, 2004
7. ### Juha Laiho (Guest)

said:
> (Scott Stark) wrote:
>> All I can think of is keeping a list of used numbers, and having some
>> kind of "if used, next" statement, which seems lame.

....
>> (Theoretically that could keep going forever, if it randomly kept
>> selecting the same numbers.) Any thoughts?

>
>If your random number generator degenerates thus, then in that case you
>are pretty much screwed regardless.

Not so -- think of a situation where your number space is, f.ex.,
1000 numbers, and you don't want repetitions. You'll end up with a
significant and increasing possibility of retries for each new number.
For the last number, there's only a 1/1000 chance that the random
algorithm comes up with the desired number. I made a small script
to test this; even with a range of 3 (create a randomised list of the
numbers 1 to 3) I got, in short testing, 12 "false guesses" in generating
the list. With larger ranges the number of false guesses got significantly
worse -- with just a range of 1000, the worst results I got had over
3,000,000 false guesses. Results with fewer than 1,000,000 false guesses
seemed to be rare.

So, for a list of random numbers without repetitions, the shuffled list
algorithms are definitely better.
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)

Juha Laiho, Jul 22, 2004
8. ### Anno Siegel (Guest)

NY Tennessean <> wrote in comp.lang.perl.misc:
> (Scott Stark) wrote in message
> news:<>...
> > I'm using rand() to generate a series of random numbers. Is there any
> > way to prevent the same number from being used more than once? All I
> > can think of is keeping a list of used numbers, and having some kind
> > of "if used, next" statement, which seems lame. (Theoretically that
> > could keep going forever, if it randomly kept selecting the same
> > numbers.) Any thoughts?

>
> You've got a couple of answers suggesting you use modules that allow
> you to "shuffle" an array, and I can't disagree: modules are usually
> the way to go as they have been used and tested more than any home
> grown solution.
>
> But still I wanted to share with you the home-grown code I've rolled
> since my original suggestion. Beware, its a little clunky. It could
> stand some perfecting but "works" for me, at least for the dozen or so
> times I tested it:

Yes, it does what it should, and yes, it *is* a little clunky.

> sub uniq_random_int_in ($$$) {

^^^^^
Don't give a function a prototype unless there is a distinct advantage.
Normally, perl subs are not prototyped.

> # based on random_int_in in perlfaq4, data manipulation
> my ($min, $max, $qty) = @_;
> # Assumes that the three arguments are integers themselves!
> # Assumes $min < $max
> # Assumes $qty <= $max - $min + 1
> my @set, @uniq;

This doesn't declare @uniq. "my (@set, @uniq)" is what you need. You
didn't run this under strict, did you?

> for (my $i=0; $i <= $max - $min; $i++) {
> $set[$i] = $i;
> }

That could be more simply written

@set = 0 .. $max - $min;

But what you really want in @set is the set of possible results, so
you can make that

@set = $min .. $max;

> for (my $i=0; $i < $qty; $i++) {
> my $rand = int rand($max - $min - $i + 1);

The number "$max - $min - $i + 1" is exactly the number of elements
left in @set in the $i-th step. You start with $max - $min + 1 elements
in the 0-th step, and you lose one element each time $i increases. So

my $rand = int rand @set;

does the same thing. The selection process is now even clearer:
Select a random number between 0 and the number of elements left
(exclusive). Pick that element.

> $uniq[$i] = $set[$rand] + $min;

When @set contains the numbers $min .. $max directly, you don't need
the addition. $set[$rand] is the wanted random number.

> @set = (@set[0..$rand-1], @set[$rand+1..scalar(@set)-1]);

^^^^^^
"@set" is already in scalar context, no need for "scalar".

But Perl's splice() function is what you really want here:

splice( @set, $rand, 1);

This returns the value(s) that are spliced out, so you can find
the random index and move the indicated element from @set to @uniq
in one step:

push @uniq, splice( @set, rand @set, 1);

> print $uniq[$i], "\n"; # display only, can comment out
> }
> return @uniq;
> }
>
> @uniq_rands = uniq_random_int_in(11,20,5); # for testing

Putting it all together, this is how I would write it:

sub uniq_random_int_in {
    my ($min, $max, $qty) = @_;
    my @set = ($min .. $max);
    map splice( @set, rand @set, 1), 1 .. $qty;
}
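For completeness, here is a self-contained check of that final version, using the same test call as before (splice returns the removed element, so map collects $qty unique picks):

```perl
use strict;
use warnings;

sub uniq_random_int_in {
    my ($min, $max, $qty) = @_;
    my @set = ($min .. $max);
    # splice removes one random element per pass and returns it;
    # map gathers the $qty removed elements into the result list
    map splice(@set, rand @set, 1), 1 .. $qty;
}

my @uniq_rands = uniq_random_int_in(11, 20, 5);
print "@uniq_rands\n";
```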

Anno

Anno Siegel, Jul 22, 2004
9. ### NY Tennessean (Guest)

wrote in message news:<20040721233346.458$>...
> (Scott Stark) wrote:
> > I'm using rand() to generate a series of random numbers. Is there any
> > way to prevent the same number from being used more than once?

>
> A lot of ways.
>
> > All I can think of is keeping a list of used numbers, and having some
> > kind of "if used, next" statement, which seems lame.

>
> Well, I'd keep a hash of used numbers.

There may be more than one way to do it, but this is definitely worse
than:

1) the above-stated ideas of using a module to shuffle an array
2) The "idea" behind my home rolled solution (the implementation could
definitely be improved)

As the hash of used numbers grows, the chances increase that, to
generate each random number, rand() will needlessly be called more than
once, maybe even several times. I've experienced this. I wrote a
similar program at my first programming job, for a bingo supply company.
Random bingo grids, and as I recall the rows and/or columns had to add
up to the same number?! Well, I kept an array of used combinations, and
the darn thing took longer to run than it did to write! Never again
would I do that. Maybe that's why I took such an interest in this
problem.

NY Tennessean, Jul 22, 2004
10. ### Ilmari Karonen (Guest)

On 2004-07-21, Scott Stark <> wrote:
> I'm using rand() to generate a series of random numbers. Is there any
> way to prevent the same number from being used more than once? All I
> can think of is keeping a list of used numbers, and having some kind
> of "if used, next" statement, which seems lame. (Theoretically that
> could keep going forever, if it randomly kept selecting the same
> numbers.) Any thoughts?

Several solutions have been presented already. Here's one I haven't
seen posted in this thread yet: a "lazy sparse shuffle".

my %int_pool;
my $int_count = 0;
use constant MAX_UNIQUE_INT => 2**32-1;

sub get_unique_int () {
    # Track the call count in its own variable: assigning to an
    # already-existing key would not grow the hash, so "keys %int_pool"
    # can undercount and allow a repeat.
    my $n = $int_count++;
    my $r = $n + int rand(MAX_UNIQUE_INT + 1 - $n);
    my $o = exists $int_pool{$r} ? $int_pool{$r} : $r;
    $int_pool{$r} = exists $int_pool{$n} ? $int_pool{$n} : $n;
    return $o;
}

This method works like the hash solutions already posted, except that
it has a guaranteed upper bound on execution time, even if the number
space does get densely sampled. On the other hand, it only works for
sample spaces that can be easily enumerated.
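Scaling the constant down makes the behaviour easy to verify: drawing as many times as there are values must yield every value exactly once. (This sketch is my adaptation; it tracks the call count in a separate variable rather than via keys %int_pool, since assigning to an already-seen slot would not grow the hash.)

```perl
use strict;
use warnings;

my %int_pool;
my $int_count = 0;
use constant MAX_UNIQUE_INT => 9;   # tiny space, for demonstration

sub get_unique_int {
    my $n = $int_count++;                              # slots 0..$n-1 are consumed
    my $r = $n + int rand(MAX_UNIQUE_INT + 1 - $n);    # random unconsumed slot
    my $o = exists $int_pool{$r} ? $int_pool{$r} : $r; # value currently at slot $r
    # move the front slot's value into slot $r (virtual Fisher-Yates swap)
    $int_pool{$r} = exists $int_pool{$n} ? $int_pool{$n} : $n;
    return $o;
}

my @drawn = map { get_unique_int() } 1 .. 10;
print "@drawn\n";   # 0 .. 9 in some random order
```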

--
Ilmari Karonen
If replying by e-mail, please replace ".invalid" with ".net" in address.

Ilmari Karonen, Jul 22, 2004
11. ### (Guest)

Juha Laiho <> wrote:
> said:
> > (Scott Stark) wrote:
> >> All I can think of is keeping a list of used numbers, and having some
> >> kind of "if used, next" statement, which seems lame.

> ...
> >> (Theoretically that could keep going forever, if it randomly kept
> >> selecting the same numbers.) Any thoughts?

> >
> >If your random number generator degenerates thus, then in that case you
> >are pretty much screwed regardless.

>
> Not so -- think of a situation where your number space is f.ex.
> 1000 numbers, and you don't want repetitions. You'll end up with a
> significant and increasing possibility of retries for each new number.

Increasing, but for the most part trivial.

> For the last number, there's only a 1/1000 chance that the random
> algorithm comes up with the desired number. I made a small script
> to test this; even with range of 3 (create randomised list of numbers
> of 1 to 3) I got with short testing 12 "false guesses" in generating
> the list. With larger ranges the number of false guesses got
> significantly worse -- with just range of 1000, the worst results I got
> had over 3000000 false guesses. Results with less than 1000000 false
> guesses seemed to be rare.

You clearly did something wrong. It rarely takes over 15,000 tries to get
all numbers from 1 to 1,000. The odds against needing 1,000,000 tries are
beyond astronomical.

If your random number generator is so defective that you can't get the
numbers you need from it using the hash method, then neither can you trust
its use in your shuffling procedure.
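Xho's figure squares with the classic coupon-collector estimate: collecting all n values takes about n * (ln n + 0.577) draws on average, roughly 7,500 for n = 1,000. A quick simulation (my own sketch, not from the thread) bears this out:

```perl
use strict;
use warnings;

my $n = 1000;
my %seen;
my $draws = 0;

# Draw until every value 0 .. $n-1 has appeared at least once.
until (keys %seen == $n) {
    $seen{ int rand $n } = 1;
    $draws++;
}

printf "%d draws to collect all %d values (expected ~%.0f)\n",
    $draws, $n, $n * (log($n) + 0.577);
```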

>
> So, for a list of random numbers without repetitions, the shuffled list
> algorithms are definitely better.

That is true as long as it is possible to enumerate all possible numbers
within the domain and you want to sample a large fraction of that domain
(as I said myself in the paragraph you snipped).

If we are talking about general methods, I'd rather have a method that
works in general and is slightly suboptimal in a special case, than a
method that fails in general but works in one special case.

Xho


, Jul 24, 2004
12. ### Juha Laiho (Guest)

said:
>Juha Laiho <> wrote:
>> with just range of 1000, the worst results I got had over 3000000
>> false guesses. Results with less than 1000000 false guesses seemed to
>> be rare.

>
>You clearly did something wrong. It rarely takes over 15,000 tries to get
>all numbers from 1 to 1,000. The odds against needing 1,000,000 tries are
>beyond astronomical.

Thanks for the correction; my previous test snippet apparently did have
some dire problem -- now I seem to get repetitions in the range of 5000
when generating numbers 1..1000, so it looks like I can agree with your
results.

Apologies for criticizing your method on false grounds.
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)

Juha Laiho, Jul 25, 2004
13. ### Joe Smith (Guest)

Juha Laiho wrote:

> Not so -- think of a situation where your number space is f.ex.
> 1000 numbers, and you don't want repetitions. You'll end up with a
> significant and inreasing possibility of retries for each new number.
> For the last number, there's only a 1/1000 chance that the random
> algorithm comes up with the desired number.

If you have 1000 numbers available, and have used 999 of them,
then the last number is *NOT* random! Using rand() to return
the one and only possible number is not the way to do things.
Use a shuffle algorithm instead.
-Joe

Joe Smith, Jul 25, 2004