Randomly choose some unique elements of an array

Discussion in 'Perl Misc' started by Martin Kissner, Jan 19, 2006.

  1. Hello everyone,

    I want to choose a number of files from a directory randomly.

    This is what I have so far (after reading perldoc -q "How do I shuffle
    an array randomly?").


    use warnings;
    use strict;

    my $dir = "/path/to/folder";
    opendir DH, $dir || die "can not open $dir: $!";
    my @files = grep !/^\.\.?$|/ ,readdir DH;
    closedir DH;

    my @shuffled;
    while (@files) {
        # the FAQ says this is bad
        push(@shuffled, splice(@files, rand @files, 1));
    }

    for (1..3) {
        print pop @shuffled,"\n";
    }

    I am pretty sure that there is a better and simpler solution.
    Any suggestions will be appreciated.

    Martin Kissner, Jan 19, 2006

  2. No need to worry unless the array is large.
    It appears as if you don't need to shuffle the whole array.

    for (1..3) { print splice(@files, rand @files, 1), "\n" }
    Gunnar Hjalmarsson, Jan 19, 2006
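The one-liner above removes the chosen elements from @files. As a minimal sketch, assuming the caller wants to keep the original array intact, the same splice idea can be wrapped in a helper that works on a copy. The name pick_some and the sample file names are hypothetical, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return $k distinct elements
# chosen at random, leaving the caller's array untouched.
sub pick_some {
    my ($k, @pool) = @_;          # @pool is a private copy of the list
    $k = @pool if $k > @pool;     # never ask for more than is available
    return map splice(@pool, rand @pool, 1), 1 .. $k;
}

my @files  = qw(a.jpg b.jpg c.jpg d.jpg e.jpg);
my @chosen = pick_some(3, @files);
print "$_\n" for @chosen;
```

Because each pick is spliced out of the copy, no element can be returned twice, and @files still holds all five names afterwards.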

  3. Gunnar Hjalmarsson wrote :
    Thank you very much.
    I knew that there must be something simple like this ;).

    Best Regards
    Martin Kissner, Jan 19, 2006
  4. Is "a number" always going to be 3, or at least small?
    Apparently you have read the FAQ. So why did you implement the method the
    FAQ said was bad, rather than one of the methods that it said were good?
    Reading the FAQ so that you can do exactly what they tell you *not* to do
    seems like an odd programming strategy.
    Yes, the FAQ pointed you to two of them. Now, the fact that you only take
    the first 3 elements allows you to engage in certain optimizations, but
    they are almost surely not worthwhile and will make things more complex
    rather than simpler.

    xhoster, Jan 19, 2006
  5. Usually correct, of course. But please check out this thread, where this
    issue was discussed in considerable detail:
    No, it didn't. The FAQ discusses shuffling of a whole array, while the
    OP wanted to pick a few elements.
    Xho, did you read the rest of the thread before posting your follow-up?
    Gunnar Hjalmarsson, Jan 19, 2006
  6. Er, what I got out of that thread was that when the list is small, the FAQ
    way is better but who cares as they are generally fast anyway, and when the
    list is large, the FAQ way is much better.

    I'm all for questioning/challenging the FAQs, I do it occasionally myself.
    But if you are doing so, you might want to come out and explicitly state
    your question or challenge.
    As I said, the optimizations that this allows are almost certainly not
    worthwhile. In my book, that doesn't make them better solutions. Keeping
    all of his code as it is but replacing the splice-based shuffle with the
    FAQ shuffle is better if the list of files is very large.
    There is no such thing as "the rest of the thread". Usenet posts show up
    when they show up. That will be a different time and order for different

    If you are referring to your post where you propose switching from the
    original n*n+k algorithm to a k*n algorithm, I don't recall whether I read
    it before or after I suggested going with a n+k algorithm which I consider
    to be simpler as well as more scalable. So even if I did read your post
    first, I still thought my opinion was worth stating.

    xhoster, Jan 19, 2006
  7. For the purpose I need the script now, it's always 3.
    I had a working solution. I mainly posted to expand my understanding of
    I needed the script for a CGI based website (for showing random images
    of the Alps to be precise ;) )
    Implemented the method which I understood quickly enough to go on.
    Since my array is not very large I was satisfied for the moment.

    Thank you for your feedback.
    Martin Kissner, Jan 19, 2006
  8. This is how I read that thread: One of the FAQ ways is clearly slower
    for small arrays, and the other FAQ way may or may not be slower
    depending on whether you take into account the time for loading the
    module and which platform you are on.
    I fail to see why the shuffle entry in the FAQ would be an appropriate
    starting point in the first place for the problem described by the OP. I
    also fail to see how you can be so certain that the OP's list of files
    would be large enough to make the FAQ's shuffle approaches relevant from
    an efficiency point of view.
    Of course there is.
    That's true, but unless your newsserver delayed my post more than 1.5h,
    it's not relevant, is it?
    That did surprise me. This was the solution I had suggested:

    for (1..3) { print splice(@files, rand @files, 1), "\n" }
    I didn't mean to say anything else. My apologies if I gave you another
    Gunnar Hjalmarsson, Jan 20, 2006
  9. You are going to a lot of trouble there, my friend.

    use List::Utils;
    my @shuffled_list = shuffle( @list );

    or if you don't really need to save a shuffled copy of the array, but
    you just want to iterate over a shuffled version of the original:

    foreach my $random_item ( shuffle @list ) {

    So very simple!
    usenet, Jan 20, 2006
  10. Urp, sorry... code above was buggy and incomplete - disregard...

    Here's an easy (and tested, dammit) way to print three random items
    from an array:

    use strict; use warnings;
    use List::Util qw(shuffle);

    my @b = qw{a b c d e f g};

    foreach my $random_item ( (shuffle @b)[0-2] ) {
        print "$random_item\n";
    }

    usenet, Jan 20, 2006
  11. Arrgh - make that [0..2]
    usenet, Jan 20, 2006
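The difference between the two subscripts in the correction above is easy to miss, so here is a small sketch of it. It assumes the core module List::Util, which exports shuffle on request. The variable names are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @b = qw(a b c d e f g);

# 0-2 is plain arithmetic and evaluates to -2, so this slice holds ONE
# element: the second from the end.
my @minus = @b[0-2];      # ('f')

# 0..2 is the range operator: indices 0, 1 and 2.
my @range = @b[0..2];     # ('a', 'b', 'c')

# Applied to a shuffled list, the range slice yields three distinct
# random items.
my @three = (shuffle @b)[0..2];
print "$_\n" for @three;
```

With [0-2] the loop body would therefore run once rather than three times.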
  12. Seems as if you are the friend who is going through a lot of trouble. ;-)

    Why do you think it makes sense to shuffle the whole array in order to
    pick three elements?

    Did you read the rest of the thread before posting your follow-up?
    Gunnar Hjalmarsson, Jan 20, 2006
  13. No, sorry - I'm only banging on three cylinders today, I guess...

    I'll shut up now...
    usenet, Jan 20, 2006
  14. Probably for the same reason that you think it makes sense to splice the
    whole array (i.e. do a length-changing splice) in order to pick three
    elements.

    Both shuffle and length-changing splice are linear in the size of the
    array, although admittedly the multiplier for splice is lower than for
    shuffle. On the other hand, the shuffle only needs to be done once, while
    splice is done once for each element chosen.

    If one is really concerned about efficiency, then the below would be a good
    way to go. It uses a length-preserving splice, which is very fast compared
    to a length-changing splice:

    for (1..$k) {
        push @selected, splice(@files, rand @files, 1, $files[-1]);
        pop @files;
    }

    If one isn't really concerned about efficiency, then who cares that we are
    shuffling the entire array just to pick 3 elements?

    xhoster, Jan 20, 2006
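A self-contained sketch of the length-preserving splice described above, wrapped in a sub so it can be tried in isolation. The name pick_fast is hypothetical, and the sub works on a copy of the list, which is itself a linear-time step; applying the loop directly to the array, as in the post, avoids that copy.

```perl
use strict;
use warnings;

# Hypothetical helper: pick $k distinct elements with the overwrite-and-pop
# technique, on a copy so the caller's array survives.
sub pick_fast {
    my ($k, @pool) = @_;
    $k = @pool if $k > @pool;
    my @selected;
    for (1 .. $k) {
        # Overwrite the chosen slot with the last element (length unchanged),
        # keep the element that was there, then drop the duplicated tail.
        push @selected, splice(@pool, rand @pool, 1, $pool[-1]);
        pop @pool;
    }
    return @selected;
}

my @files = qw(a.jpg b.jpg c.jpg d.jpg e.jpg);
print "$_\n" for pick_fast(3, @files);
```

When the random index happens to be the last one, the splice replaces the last element with itself and the pop then removes it, so that case needs no special handling.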
  15. The precedence of '||' is too high for that to work correctly (it will never
    die.) You have to either use parentheses:

    opendir( DH, $dir ) || die "can not open $dir: $!";

    Or use the lower precedence 'or' operator:

    opendir DH, $dir or die "can not open $dir: $!";

    Your regular expression says to match the pattern '^\.\.?$' or alternatively
    to match nothing and since the expression is negated that means that @files
    will receive nothing!

    my @files = grep !/^\.\.?$/, readdir DH;

    my @shuffled = map splice( @files, rand @files, 1 ), 1 .. 3;

    John W. Krahn, Jan 21, 2006
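Putting the corrections above together, a minimal sketch might look like the following. It assumes a lexical directory handle instead of the bareword DH, and it uses File::Temp to build a throwaway directory so the example runs anywhere. The helper name list_dir and the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: list a directory's entries, skipping . and ..
sub list_dir {
    my ($dir) = @_;
    # 'or' binds more loosely than the list operator opendir, so the die
    # fires when opendir fails.
    opendir my $dh, $dir or die "cannot open $dir: $!";
    my @files = grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    return @files;
}

# Demo on a temporary directory so the sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(one.jpg two.jpg three.jpg four.jpg)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

my @files  = list_dir($dir);
my @picked = map splice(@files, rand @files, 1), 1 .. 3;
print "$_\n" for @picked;
```

Calling list_dir on a path that does not exist now dies with the system error message instead of silently continuing with an empty list.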
  16. John W. Krahn wrote :
    Thanks for pointing that out. I once knew this, but I forgot.
    Actually it's grep "!/^\.\.?$|\.DS_Store/" to match the .DS_Store files
    used by Mac OS X. I deleted this in the post for the sake of simplicity
    - not carefully enough as I must see now ;).

    Thanks for your feedback
    Martin Kissner, Jan 22, 2006
  17. Not sure where this follow-up will follow but..
    I want to remind the OP that opendir will actually "change directory" on the OS
    level so that a subsequent glob will read from that directory.
    The "opendir" is only meant to facilitate not having to parse a directory from
    the filename. I strongly suggest to the OP to NOT use opendir at all, ever,
    and to stay in the current directory and call glob with a path, then parse out
    the filename.

    I consider "opendir" and company one of Perl's worst allowances, without allowances
    for an immediate "pop" to the last directory when "glob"'ing. It's a "fopa", period!

    A simple regex can separate the filename. How can you call another function after
    changing dirs, and expect them to know where you are? And don't make an overriding
    class function out of it, Jesus!

    "Opendir" is Perl's fopa....
    robic0, Jan 22, 2006
  18. It will be followed by anyone who hasn't plonked you pointing out that
    you're simply completely wrong.
    You're simply completely wrong.

    $ ls
    in_main_dir.txt subdir/
    $ ls subdir
    $ perl -le'opendir my $dh, q{subdir} or die $!; print for <*>'

    Where on earth did you get the idea that opendir() changes the current
    directory? Are you confusing opendir() with chdir()?
    Wow. That's almost complete idiocy.
    What on earth is a "fopa"? If you're going to use your own personal
    slang, perhaps you should attempt to explain what you mean by it.

    And frankly, I can't imagine who could possibly care what you consider
    about Perl's built-ins, given your demonstrated complete lack of any
    knowledge about the topic.
    Who is them? Who's changing directories? What in the hell are you
    talking about?

    More senseless drivel. Clearly, your goal is to get *everyone* to
    plonk you. I assure you, you're well on your way.

    Paul Lalli
    Paul Lalli, Jan 22, 2006
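The point that opendir() does not change the working directory, while chdir() does, can also be checked directly. This is a small sketch that assumes the core modules Cwd and File::Temp. The temporary directory stands in for "subdir" from the shell session above.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $subdir = tempdir(CLEANUP => 1);   # stand-in for "subdir"

my $before = getcwd();
opendir my $dh, $subdir or die "cannot open $subdir: $!";
my $after_opendir = getcwd();
closedir $dh;

print "opendir left the working directory alone\n"
    if $before eq $after_opendir;

# chdir, by contrast, really does move the process.
chdir $subdir or die "cannot chdir to $subdir: $!";
my $after_chdir = getcwd();
chdir $before or die "cannot chdir back to $before: $!";

print "chdir changed the working directory\n"
    if $after_chdir ne $before;
```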
  19. True, arrays must be huge to make a difference.

    However, your loop works just as well with a swapping technique, without

    for ( 1 .. $k ) {
        @files[ $_, -1] = @files[ -1, $_] for rand @files;
        push @selected, pop @files;
    }

    It generates a different sequence, even with a seeded random generator,
    but the sequence is just as random.

    Anno Siegel, Jan 23, 2006
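The remark about seeding can be illustrated with a short sketch: calling srand with the same value before each run makes the swap-based selection repeatable. The helper name pick_by_swap and the seed 42 are arbitrary choices for this example, and the helper works on a copy of the list rather than on the array itself.

```perl
use strict;
use warnings;

# Hypothetical helper: swap a random element to the end, then pop it.
sub pick_by_swap {
    my ($k, @pool) = @_;
    $k = @pool if $k > @pool;
    my @selected;
    for (1 .. $k) {
        my $i = int rand @pool;
        @pool[$i, -1] = @pool[-1, $i];   # exchange slot $i with the last slot
        push @selected, pop @pool;
    }
    return @selected;
}

my @files = ('a' .. 'j');

srand(42);                               # arbitrary seed
my @first  = pick_by_swap(3, @files);
srand(42);                               # same seed again
my @second = pick_by_swap(3, @files);

print "first:  @first\n";
print "second: @second\n";               # same picks as the first run
```

Without the explicit srand calls, Perl seeds the generator itself and the two runs would normally differ.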
