How to generate random number without replacement?

Discussion in 'Perl Misc' started by Peng Yu, Jun 1, 2010.

  1. Peng Yu

    Peng Yu Guest

    It seems that the int(rand(10)) generate random with replacement. I'm
    wondering how to generate random number without replacement in perl.
    Is there a function for it?
     
    Peng Yu, Jun 1, 2010
    #1
    1. Advertisements

  2. Peng> It seems that the int(rand(10)) generate random with replacement. I'm
    Peng> wondering how to generate random number without replacement in perl.
    Peng> Is there a function for it?

    $ perldoc -q shuffle
    Found in /usr/local/lib/perl5/5.10.1/pod/perlfaq4.pod
    How do I shuffle an array randomly?
    If you either have Perl 5.8.0 or later installed, or if you have
    Scalar-List-Utils 1.03 or later installed, you can say:

    use List::Util 'shuffle';

    @shuffled = shuffle(@list);

    If not, you can use a Fisher-Yates shuffle.

    sub fisher_yates_shuffle {
    my $deck = shift; # $deck is a reference to an array
    return unless @$deck; # must not be empty!

    my $i = @$deck;
    while (--$i) {
    my $j = int rand ($i+1);
    @$deck[$i,$j] = @$deck[$j,$i];
    }
    }

    # shuffle my mpeg collection
    #
    my @mpeg = <audio/*/*.mp3>;
    fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
    print @mpeg;

    Note that the above implementation shuffles an array in place, unlike
    the "List::Util::shuffle()" which takes a list and returns a new
    shuffled list.

    You've probably seen shuffling algorithms that work using splice,
    randomly picking another element to swap the current element with

    srand;
    @new = ();
    @old = 1 .. 10; # just a demo
    while (@old) {
    push(@new, splice(@old, rand @old, 1));
    }

    This is bad because splice is already O(N), and since you do it N times,
    you just invented a quadratic algorithm; that is, O(N**2). This does not
    scale, although Perl is so efficient that you probably won't notice this
    until you have rather largish arrays.

    print "Just another Perl hacker,"; # the original
     
    Randal L. Schwartz, Jun 1, 2010
    #2
    1. Advertisements

  3. Peng Yu

    Peng Yu Guest

    I don't really need shuffle. For example, if I want to sample 1000
    number (without replacement) from the numbers between 1 and
    1000,000,000, shuffle will take more runtime than necessary.
     
    Peng Yu, Jun 1, 2010
    #3
  4. Could you please explain what you mean by "with/without replacement"?
    A number is a number, it doesn't replace anything....

    jue
     
    Jürgen Exner, Jun 1, 2010
    #4
  5. Peng Yu

    Uri Guttman Guest

    Peng> It seems that the int(rand(10)) generate random with replacement. I'm
    Peng> wondering how to generate random number without replacement in perl.
    Peng> Is there a function for it?
    PY> I don't really need shuffle. For example, if I want to sample 1000
    PY> number (without replacement) from the numbers between 1 and
    PY> 1000,000,000, shuffle will take more runtime than necessary.

    then just call rand but cache your hits with a hash. if found in the
    hash, then try again. this will work if your sample size of 1k is much
    lower than the large number. it will still work but be slow if the
    sampling size is close to the total size.

    uri
     
    Uri Guttman, Jun 1, 2010
    #5
  6. Peng Yu

    Peng Yu Guest

    These are standard concepts in statistics. Please see the following
    webpage for the explanations on sampling 'with/without replacement'.

    http://www.ma.utexas.edu/users/parker/sampling/repl.htm
     
    Peng Yu, Jun 1, 2010
    #6
  7. Ok. For those like me not familiar with this term: he means random
    numbers with and without repetition.

    jue
     
    Jürgen Exner, Jun 1, 2010
    #7
  8. Peng Yu

    Uri Guttman Guest

    JE> Ok. For those like me not familiar with this term: he means random
    JE> numbers with and without repetition.

    and i told him how to do it. i won't tell him again. it is a simple
    problem and hashes solve it.

    uri
     
    Uri Guttman, Jun 1, 2010
    #8
  9. Peng Yu

    Peng Yu Guest

    What do you mean? I didn't ask you to tell me again.

    But I feel sorry that perl doesn't provide such a function out of the
    box.
     
    Peng Yu, Jun 1, 2010
    #9
  10. Peng Yu

    Uri Guttman Guest

    PY> What do you mean? I didn't ask you to tell me again.

    i told you how to do it. either you didn't read it or you didn't get the
    solution.

    PY> But I feel sorry that perl doesn't provide such a function out of the
    PY> box.

    i feel sorry for you that you can't code this up in 5 minutes. it is
    trivial to do as i outlined. name another lang that has this built in.
    it isn't needed as it is easily made from a hash and a loop and calls to
    rand. this is about 2 lines of code and possibly 1 line. here, i will
    code it up on the fly and possibly even get it right. i leave making
    into a sub as your exercise.

    my %seen ;
    while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
    $seen{$x} = 1; print $x }

    oops, it wrapped into 3 lines.

    was that too complex? it will print numbers until it runs out of
    them. no duplicates. make it a for loop to control the number of
    prints. can you handle that? do you feel sorry for perl now? show me
    another lang that could do that as easily.

    uri
     
    Uri Guttman, Jun 1, 2010
    #10
  11. Uri, nobody asked you to tell him again. Why do insist on telling us
    three times that you won't do it? Nobody asked me to write a haiku, but
    I don't feel the slightest urge to tell you all that I won't write a
    haiku,

    hp
     
    Peter J. Holzer, Jun 1, 2010
    #11
  12. I feel neither sorry nor that Perl should provide such a function. You
    are confusing generating random numbers with sampling a given data set.

    Now, you could argue that beside equal distribution Perl should also
    provide additional distributions of random numbers like e.g. Gaussian
    distributions or random number without repetition or any other
    distribution.
    However IMNSHO Perl is a general purpose programming language and
    functions like that are WAAAAYYYY to subject matter specific. If you
    want them, then put them in a module. And of course you very welcome to
    submit such a module to CPAN.

    jue
     
    Jürgen Exner, Jun 1, 2010
    #12
  13. Peng Yu

    John Bokma Guest

    It's exactly this Ogrish behavior of a few regulars that makes this
    place such a pain in the ass. :-(.

    Luckily for the newbies there are sites like stackoverflow.com where at
    least they can vote down such behaviour.

    Again: Dear regular, you /don't have to post/. If a newbie pisses you
    off, just move on to the next message in this group. You will make this
    group a way more friendlier place, and hey, maybe it gets a bit more
    traffic that way.

    Ah, well, one can only hope...
     
    John Bokma, Jun 1, 2010
    #13
  14. Peng Yu

    Ted Zlatanov Guest

    UG> my %seen ;
    UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
    UG> $seen{$x} = 1; print $x }

    This will grow pretty quickly with a hash. Bit::Vector already has
    Bit_On($index) and bit_test($index) so memory usage and probably
    performance will be a bit (heh) better.

    Ted
     
    Ted Zlatanov, Jun 1, 2010
    #14
  15. Peng Yu

    Uri Guttman Guest

    UG> my %seen ;
    UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
    UG> $seen{$x} = 1; print $x }

    TZ> This will grow pretty quickly with a hash. Bit::Vector already has
    TZ> Bit_On($index) and bit_test($index) so memory usage and probably
    TZ> performance will be a bit (heh) better.

    he said he wanted 1k random numbers out of a large range so a hash would
    be fine for that.

    uri
     
    Uri Guttman, Jun 1, 2010
    #15
  16. I'd be very surprised if Bit::Vector had faster performance, at least
    until the other method started swapping.

    Xho
     
    Xho Jingleheimerschmidt, Jun 2, 2010
    #16
  17. There are many ways to implement this, and which is best depends on how
    large of a set you are sampling and how densely you are sampling it, as
    well as perhaps other things (do you know how many you need in advance,
    or will you know when you get there? Are you happy with standard rand()
    or do you want something better, etc.). I'm entirely unsurprised Perl
    doesn't provide this out of the box, as it is so context dependent, plus
    not all that common. It is trivial to implement for yourself, and throw
    your implementation into a toolkit module you use for all your code.

    Xho
     
    Xho Jingleheimerschmidt, Jun 2, 2010
    #17
  18. Peng Yu

    Ted Zlatanov Guest

    UG> my %seen ;
    UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
    UG> $seen{$x} = 1; print $x }
    XJ> I'd be very surprised if Bit::Vector had faster performance, at least
    XJ> until the other method started swapping.

    It's C internally so performance is pretty good. But you're probably
    right anyhow, I wasn't thinking.


    UG> he said he wanted 1k random numbers out of a large range so a hash would
    UG> be fine for that.

    You're right, I wasn't paying enough attention.

    Ted
     
    Ted Zlatanov, Jun 2, 2010
    #18
  19. Peng Yu

    David Combs Guest

    I see it differently.

    He's got every right to get pissed off from time to time.

    How many hours a week does he sit in front of the terminal
    answering questions for us (when he could be making money,
    presumably, a nice sacrifice for us all!) and after a while
    of being eaten by "I have to get back to WORK sometime --
    so I can pay some BILLS" or whatever else might be eating
    him now and then, that almost it doesn't matter WHAT the
    next question is, the least perceived-foolishness/stupidity
    in it will elicit a ROAR and accompanying dragon-mouth FLAME
    BLAST.

    That done, he bypasses a few questions he might have liked
    answering, and he's back to his nice helpful self, again
    answering hairy questions.

    So thank you, Uri, for being there!

    David
     
    David Combs, Jun 26, 2010
    #19
  20. Peng Yu

    David Combs Guest

    WERE this to be a cpan module, maybe a mix would be nice. Bit vector for
    the first N of them, N of course requiring N/8 bytes preallocated probably,
    and for > N, the hash scheme?

    David
     
    David Combs, Jun 26, 2010
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.