How to generate random number without replacement?

David Combs · Jun 26, 2010

UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }

XJ> I'd be very surprised if Bit::Vector had faster performance, at least
XJ> until the other method started swapping.

It's C internally so performance is pretty good. But you're probably
right anyhow, I wasn't thinking.

UG> he said he wanted 1k random numbers out of a large range so a hash would
UG> be fine for that.

You're right, I wasn't paying enough attention.

Ted

What, you're saying a div and a mod (let's assume 64 bits, if integet) to
find the desired bit, versus dealing each time with a hash bucket to choose
an ensuing chain to hunt down through?

So you make the bucket-array longer, so the chains are short -- get it
long enough so only very, very few chains are longer than one (most
being zero) in length, and you're almost emulating the bit-vector
scheme, but taking up FAR FAR more space.

?

David

Peter J. Holzer · Jun 26, 2010

What, you're saying a div and a mod (let's assume 64 bits, if integet) to
find the desired bit, versus dealing each time with a hash bucket to choose
an ensuing chain to hunt down through?

So you make the bucket-array longer, so the chains are short -- get it
long enough so only very, very few chains are longer than one (most
being zero) in length, and you're almost emulating the bit-vector
scheme, but taking up FAR FAR more space.

No, for the use case given (1k random numbers out of a large range) a
hash takes *less* space.

The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes
approximately 12.5 MB. 1000 hash entries take (depending on pointer
size, malloc implementation, etc.) something like 60 to 150 kB. So
that's about 1/100th of the space. I'd still think a bit vector should
be faster.

hp

John Kelly · Jun 26, 2010

It's exactly this Ogrish behavior of a few regulars that makes this
place such a pain in the ass. :-(.

Luckily for the newbies there are sites like stackoverflow.com where at
least they can vote down such behaviour.

I hate web forum interfaces.

Again: Dear regular, you /don't have to post/.

Without grumpy old men, what would Usenet be?

Seriously, though, some of us wear many hats and don't have time to
become expert in a tool like Perl. So we wander in and out of various
newsgroups as the need or interest arises.

As for me, I was working on a local problem and found a good solution
using UNIX datagram sockets. It was something I had never used before,
so it took me a while to figure things out and put the pieces together
into a working solution. In the process, I noticed it was hard to find
good *working* examples of UNIX datagram socket code. Since I coded my
solution in Perl I thought why not post it. Maybe no one else needs it,
but if only one person does, it could save them time.

So along trot I, an unknown, into c.l.p.m, and post my news. And what
is the first reply I see? WHO THE HELL ARE YOU AND WHAT ARE YOU DOING
HERE? That's what it felt like anyway.

Some newsgroups suffer from the presence of territorial control freaks.
In one, a particular individual has been obsessed with some of my posts,
to the point of stalking. So if I overreacted upon my arrival here, I
apologize to anyone offended. Except for the stalker. :-(

Ted Zlatanov · Jun 29, 2010

UG> he said he wanted 1k random numbers out of a large range so a hash would
UG> be fine for that.
PJH> No, for the use case given (1k random numbers out of a large range) a
PJH> hash takes *less* space.

PJH> The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes
PJH> approximately 12.5 MB. 1000 hash entries take (depending on pointer
PJH> size, malloc implementation, etc.) something like 60 to 150 kB. So
PJH> that's about 1/100th of the space. I'd still think a bit vector should
PJH> be faster.

Right. I somehow missed the 1K sample requirement and thought this was
for any set of random numbers. Certainly a bit vector has better O(n)
performance but the requirement is very far from hitting any O(n)
bounds. Also as Peter pointed out memory usage is really high.
Finally, Perl's hashes are very well optimized so I'd rely on them over
Bit::Vector for performance with small sets.

If I expected *clusters* of numbers in a large space, like say Unicode
codepoints, I would use inversion lists
(http://search.cpan.org/~teodor/Algorithm-InversionList-0.03/). If the
domain was 32-bit or smaller ints and I expected at least 10% to be
randomly picked, I would use Bit::Vector. For almost all other
reasonable use cases, though, hashes would work better.

Ted

Random number fun	0	Jul 29, 2025
Replacement Problem	1	Apr 28, 2022
Large random integers	4	Nov 2, 2025
If I need to do regex replacement 27000 times, how to shorten it?	1	Dec 24, 2024
How to keep count of right answer and wrong answers in C++?	0	Nov 3, 2021
Range / empty list issues??	1	Dec 10, 2023
Boomer trying to learn coding in C and C++	6	Dec 16, 2022
Even basic math is at risk? Why is a simple math and logic solution being ignored?	2	Jul 3, 2025

How to generate random number without replacement?

David Combs

Peter J. Holzer

John Kelly

Ted Zlatanov

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads