How to generate random number without replacement?

D

David Combs

UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }

XJ> I'd be very surprised if Bit::Vector had faster performance, at least
XJ> until the other method started swapping.

It's C internally so performance is pretty good. But you're probably
right anyhow, I wasn't thinking.


UG> he said he wanted 1k random numbers out of a large range so a hash would
UG> be fine for that.

You're right, I wasn't paying enough attention.

Ted

What, you're saying a div and a mod (let's assume 64 bits, if integet) to
find the desired bit, versus dealing each time with a hash bucket to choose
an ensuing chain to hunt down through?

So you make the bucket-array longer, so the chains are short -- get it
long enough so only very, very few chains are longer than one (most
being zero) in length, and you're almost emulating the bit-vector
scheme, but taking up FAR FAR more space.

?

David
 
P

Peter J. Holzer

What, you're saying a div and a mod (let's assume 64 bits, if integet) to
find the desired bit, versus dealing each time with a hash bucket to choose
an ensuing chain to hunt down through?

So you make the bucket-array longer, so the chains are short -- get it
long enough so only very, very few chains are longer than one (most
being zero) in length, and you're almost emulating the bit-vector
scheme, but taking up FAR FAR more space.

No, for the use case given (1k random numbers out of a large range) a
hash takes *less* space.

The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes
approximately 12.5 MB. 1000 hash entries take (depending on pointer
size, malloc implementation, etc.) something like 60 to 150 kB. So
that's about 1/100th of the space. I'd still think a bit vector should
be faster.

hp
 
J

John Kelly

It's exactly this Ogrish behavior of a few regulars that makes this
place such a pain in the ass. :-(.
Luckily for the newbies there are sites like stackoverflow.com where at
least they can vote down such behaviour.

I hate web forum interfaces.

Again: Dear regular, you /don't have to post/.

Without grumpy old men, what would Usenet be?

Seriously, though, some of us wear many hats and don't have time to
become expert in a tool like Perl. So we wander in and out of various
newsgroups as the need or interest arises.

As for me, I was working on a local problem and found a good solution
using UNIX datagram sockets. It was something I had never used before,
so it took me a while to figure things out and put the pieces together
into a working solution. In the process, I noticed it was hard to find
good *working* examples of UNIX datagram socket code. Since I coded my
solution in Perl I thought why not post it. Maybe no one else needs it,
but if only one person does, it could save them time.

So along trot I, an unknown, into c.l.p.m, and post my news. And what
is the first reply I see? WHO THE HELL ARE YOU AND WHAT ARE YOU DOING
HERE? That's what it felt like anyway.

Some newsgroups suffer from the presence of territorial control freaks.
In one, a particular individual has been obsessed with some of my posts,
to the point of stalking. So if I overreacted upon my arrival here, I
apologize to anyone offended. Except for the stalker. :-(
 
T

Ted Zlatanov

UG> he said he wanted 1k random numbers out of a large range so a hash would
UG> be fine for that.
PJH> No, for the use case given (1k random numbers out of a large range) a
PJH> hash takes *less* space.

PJH> The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes
PJH> approximately 12.5 MB. 1000 hash entries take (depending on pointer
PJH> size, malloc implementation, etc.) something like 60 to 150 kB. So
PJH> that's about 1/100th of the space. I'd still think a bit vector should
PJH> be faster.

Right. I somehow missed the 1K sample requirement and thought this was
for any set of random numbers. Certainly a bit vector has better O(n)
performance but the requirement is very far from hitting any O(n)
bounds. Also as Peter pointed out memory usage is really high.
Finally, Perl's hashes are very well optimized so I'd rely on them over
Bit::Vector for performance with small sets.

If I expected *clusters* of numbers in a large space, like say Unicode
codepoints, I would use inversion lists
(http://search.cpan.org/~teodor/Algorithm-InversionList-0.03/). If the
domain was 32-bit or smaller ints and I expected at least 10% to be
randomly picked, I would use Bit::Vector. For almost all other
reasonable use cases, though, hashes would work better.

Ted
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,020
Latest member
GenesisGai

Latest Threads

Top