Randomly choose some unique elements of an array

Martin Kissner

hello together,

I want to choose a number of files from a directory randomly.

This is what I have so far (after reading perldoc -q "How do I shuffle
an array randomly?").

-------
#!/usr/bin/perl

use warnings;
use strict;

my $dir = "/path/to/folder";
opendir DH, $dir || die "can not open $dir: $!";
my @files = grep !/^\.\.?$|/ ,readdir DH;
closedir DH;

my @shuffled;
while (@files) {
# the FAQ says this is bad
push(@shuffled, splice(@files, rand @files, 1));
}

for (1..3) {
print pop @shuffled,"\n";
}
-------

I am pretty sure that there is a better and simpler solution.
Any suggestions will be appreciated.

Regards
Martin
 
Gunnar Hjalmarsson

Martin said:
I want to choose a number of files from a directory randomly.

This is what I have so far (after reading perldoc -q "How do I shuffle
an array randomly?").

my @shuffled;
while (@files) {
# the FAQ says this is bad

No need to worry unless the array is large.
push(@shuffled, splice(@files, rand @files, 1));
}

for (1..3) {
print pop @shuffled,"\n";
}

It appears as if you don't need to shuffle the whole array.

for (1..3) { print splice(@files, rand @files, 1), "\n" }
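A self-contained sketch of the splice approach above, wrapped in a helper. The name pick_random, its argument order and the sample file names are illustrative assumptions. It works on a copy of the list and simply returns fewer elements if the list is shorter than the requested count.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $k distinct elements of @pool, chosen at
# random by splicing one randomly indexed element out of a private copy per pick.
sub pick_random {
    my ($k, @pool) = @_;    # @pool is a copy, so the caller's array is untouched
    my @picked;
    while ($k-- > 0 && @pool) {
        # rand @pool gives a float in [0, number of elements); splice truncates it
        push @picked, splice(@pool, rand @pool, 1);
    }
    return @picked;
}

my @files = map { "img$_.jpg" } 1 .. 10;
print "$_\n" for pick_random(3, @files);
```

Each splice that removes an element from the middle of the copy has to shift the remaining elements, so the cost grows with both the list size and the number of picks.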
 
Martin Kissner

Gunnar Hjalmarsson wrote:
It appears as if you don't need to shuffle the whole array.

for (1..3) { print splice(@files, rand @files, 1), "\n" }

Thank you very much.
I knew that there must be something simple like this ;).

Best Regards
Martin
 
xhoster

Martin Kissner said:
hello together,

I want to choose a number of files from a directory randomly.

Is "a number" always going to be 3, or at least small?
This is what I have so far (after reading perldoc -q "How do I shuffle
an array randomly?").

-------
#!/usr/bin/perl

use warnings;
use strict;

my $dir = "/path/to/folder";
opendir DH, $dir || die "can not open $dir: $!";
my @files = grep !/^\.\.?$|/ ,readdir DH;
closedir DH;

my @shuffled;
while (@files) {
# the FAQ says this is bad
push(@shuffled, splice(@files, rand @files, 1));
}

Apparently you have read the FAQ. So why did you implement the method the
FAQ said was bad, rather than one of the methods that it said were good?
Reading the FAQ so that you can do exactly what they tell you *not* to do
seems like an odd programming strategy.
for (1..3) {
print pop @shuffled,"\n";
}

I am pretty sure that there is a better and simpler solution.

Yes, the FAQ pointed you to two of them. Now, the fact that you only take
the first 3 elements allows you to engage in certain optimizations, but
they are almost surely not worthwhile and will make things more complex
rather than simpler.
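For comparison, a sketch of the "shuffle everything, then take the first few" route: an in-place Fisher-Yates shuffle written along the lines of the perlfaq4 answer (not copied from it), followed by a slice. The helper name and the sample data are illustrative; List::Util's shuffle is the other ready-made option.

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle of the array referenced by $deck.
# Walks from the last index down to 1, swapping each element with a randomly
# chosen element at or below it, so every permutation is equally likely.
sub fisher_yates_shuffle {
    my ($deck) = @_;
    my $i = @$deck;
    while ($i-- > 1) {
        my $j = int rand($i + 1);
        @$deck[$i, $j] = @$deck[$j, $i];
    }
    return;
}

my @files = map { "img$_.jpg" } 1 .. 10;
fisher_yates_shuffle(\@files);

# After a full shuffle, any prefix is a random sample without repeats.
my @chosen = @files[0 .. ($#files < 2 ? $#files : 2)];
print "$_\n" for @chosen;
```

The whole array is shuffled once (linear in its size), and picking the first few elements afterwards costs nothing extra.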

Xho
 
Gunnar Hjalmarsson

Apparently you have read the FAQ. So why did you implement the method the
FAQ said was bad, rather than one of the methods that it said were good?
Reading the FAQ so that you can do exactly what they tell you *not* to do
seems like an odd programming strategy.

Usually correct, of course. But please check out this thread, where this
issue was discussed in considerable detail:
http://groups.google.com/group/comp.lang.perl.misc/browse_frm/thread/da705489201879ba
Yes, the FAQ pointed you to two of them.

No, it didn't. The FAQ discusses shuffling of a whole array, while the
OP wanted to pick a few elements.
Now, the fact that you only take
the first 3 elements allows you to engage in certain optimizations, but
they are almost surely not worthwhile and will make things more complex
rather than simpler.

Xho, did you read the rest of the thread before posting your follow-up?
http://groups.google.com/group/comp.lang.perl.misc/browse_frm/thread/b9b4eeb7aabaf998
 
xhoster

Gunnar Hjalmarsson said:
Usually correct, of course. But please check out this thread, where this
issue was discussed in considerable detail:
http://groups.google.com/group/comp.lang.perl.misc/browse_frm/thread/da705489201879ba

Er, what I got out of that thread was that when the list is small, the FAQ
way is better but who cares as they are generally fast anyway, and when the
list is large, the FAQ way is much better.

I'm all for questioning/challenging the FAQs; I do it occasionally myself.
But if you are doing so, you might want to come out and explicitly state
your question or challenge.
No, it didn't. The FAQ discusses shuffling of a whole array, while the
OP wanted to pick a few elements.

As I said, the optimizations that this allows are almost certainly not
worthwhile. In my book, that doesn't make them better solutions. Keeping
all of his code as it is but replacing the splice-based shuffle with the
FAQ shuffle is better if the list of files is very large.
Xho, did you read the rest of the thread before posting your follow-up?

There is no such thing as "the rest of the thread". Usenet posts show up
when they show up. That will be a different time and order for different
people.

If you are referring to your post where you propose switching from the
original n*n+k algorithm to a k*n algorithm, I don't recall whether I read
it before or after I suggested going with an n+k algorithm which I consider
to be simpler as well as more scalable. So even if I did read your post
first, I still thought my opinion was worth stating.
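Whether the difference matters for a given list size is easy to measure with the core Benchmark module. A sketch comparing the three approaches discussed so far; the list size of 2,000, the labels and the file names are arbitrary choices, and timings will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(shuffle);

# Illustrative data: 2,000 fake file names, of which 3 are picked.
my @files = map { "file$_.jpg" } 1 .. 2_000;
my $k     = 3;

my %pickers = (
    # Original post: splice out every element (roughly n*n), keep $k of them.
    splice_all => sub {
        my @pool = @files;
        my @shuffled;
        push @shuffled, splice(@pool, rand @pool, 1) while @pool;
        return @shuffled[0 .. $k - 1];
    },
    # Splice out only $k elements (roughly k*n).
    splice_k => sub {
        my @pool = @files;
        return map { splice(@pool, rand @pool, 1) } 1 .. $k;
    },
    # Shuffle once with List::Util, then slice (roughly n).
    shuffle_slice => sub {
        return (shuffle @files)[0 .. $k - 1];
    },
);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%pickers);
```

Changing the size of @files shows where, if anywhere, the approaches start to diverge for a particular workload.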

Xho
 
Martin Kissner

Is "a number" always going to be 3, or at least small?

For my current purpose, it's always 3.
I had a working solution. I mainly posted to expand my understanding of
Perl.
Apparently you have read the FAQ. So why did you implement the method the
FAQ said was bad, rather than one of the methods that it said were good?
Reading the FAQ so that you can do exactly what they tell you *not* to do
seems like an odd programming strategy.

I needed the script for a CGI based website (for showing random images
of the alps to be precise ;) )
I implemented the method which I understood quickly enough to move on.
Since my array is not very large, I was satisfied for the moment.

Thank you for your feedback.
Martin
 
Gunnar Hjalmarsson

Er, what I got out of that thread was that when the list is small, the FAQ
way is better

This is how I read that thread: One of the FAQ ways is clearly slower
for small arrays, and the other FAQ way may or may not be slower
depending on whether you take into account the time for loading the
module and which platform you are on.
As I said, the optimizations that this allows are almost certainly not
worthwhile.

I fail to see why the shuffle entry in the FAQ would be an appropriate
starting point in the first place for the problem described by the OP. I
also fail to see how you can be so certain that the OP's list of files
would be large enough to make the FAQ's shuffle approaches relevant from
an efficiency point of view.
There is no such thing as "the rest of the thread".

Of course there is.
Usenet posts show up when they show up. That will be a different time
and order for different people.

That's true, but unless your news server delayed my post by more than 1.5 hours,
it's not relevant, is it?
If you are referring to your post where you propose switching from the
original n*n+k algorithm to a k*n algorithm, I don't recall whether I read
it before or after I suggested going with an n+k algorithm which I consider
to be simpler as well as more scalable.

That did surprise me. This was the solution I had suggested:

for (1..3) { print splice(@files, rand @files, 1), "\n" }
So even if I did read your post
first, I still thought my opinion was worth stating.

I didn't mean to say anything else. My apologies if I gave you another
impression.
 
usenet

Martin said:
I want to choose a number of files from a directory randomly.

You are going to a lot of trouble there, my friend.

use List::Utils;
my @shuffled_list = shuffle( @list );

or if you don't really need to save a shuffled copy of the array, but
you just want to iterate over a shuffled version of the original:

foreach my $random_item ( shuffle @list ) {
...
}


So very simple!
 
usenet

CRAP CODE

Urp, sorry... code above was buggy and incomplete - disregard...

Here's an easy (and tested, dammit) way to print three random items
from an array:

#!/usr/bin/perl
use strict; use warnings;
use List::Util qw(shuffle);

my @b = qw{a b c d e f g};

foreach my $random_item ( (shuffle @b)[0-2] ) {
print "$random_item\n";
}

__END__
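A corrected, self-contained version of the snippet above, as a sketch. The module is List::Util, which exports shuffle only when asked. Because 0-2 evaluates to the single index -2, the slice needs 0 .. 2. The loop body prints the loop variable rather than $_. Clamping the upper bound is an added assumption for lists shorter than three.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);   # List::Util exports shuffle only on request

my @b = qw{a b c d e f g};

# Shuffle once, then slice the first three; the upper bound is clamped so a
# list shorter than three does not produce undef entries.
my $last = $#b < 2 ? $#b : 2;
foreach my $random_item ( (shuffle @b)[0 .. $last] ) {
    print "$random_item\n";    # print the loop variable, not $_
}
```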
 
Gunnar Hjalmarsson

Martin Kissner wrote:

I want to choose a number of files from a directory randomly.

You are going to a lot of trouble there, my friend.

use List::Utils;
my @shuffled_list = shuffle( @list );

or if you don't really need to save a shuffled copy of the array, but
you just want to iterate over a shuffled version of the original:

foreach my $random_item ( shuffle @list ) {
...
}

So very simple!

Urp, sorry... code above was buggy and incomplete - disregard...

Here's an easy (and tested, dammit) way to print three random items
from an array:

#!/usr/bin/perl
use strict; use warnings;
use List::Util qw(shuffle);

my @b = qw{a b c d e f g};

foreach my $random_item ( (shuffle @b)[0-2] ) {
print "$random_item\n";
}

Arrgh - make that [0..2]

Seems as if you are the friend who is going through a lot of trouble. ;-)

Why do you think it makes sense to shuffle the whole array in order to
pick three elements?

Did you read the rest of the thread before posting your follow-up?
 
usenet

Gunnar said:
Did you read the rest of the thread before posting your follow-up?

No, sorry - I'm only banging on three cylinders today, I guess...

I'll shut up now...
 
xhoster

Gunnar Hjalmarsson said:
(e-mail address removed) wrote:

Why do you think it makes sense to shuffle the whole array in order to
pick three elements?

Probably for the same reason that you think it makes sense to splice the
whole array (i.e. do a length-changing splice) in order to pick three
elements.

Both shuffle and length-changing splice are linear in the size of the
array, although admittedly the multiplier for splice is lower than for
shuffle. On the other hand, the shuffle only needs to be done once, while
splice is done once for each element chosen.

If one is really concerned about efficiency, then the below would be a good
way to go. It uses a length-preserving splice, which is very fast compared
to a length-changing splice:

for (1..$k) {
push @selected, splice(@files, rand @files, 1, $files[-1]);
pop @files;
}
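Xho's loop as a self-contained helper. The name pick_by_overwrite and the sample data are illustrative. Note that it consumes the array it is given: each pick shortens it by one.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the loop above: pick $k distinct random
# elements from the array referenced by $files, shortening it by $k.
sub pick_by_overwrite {
    my ($files, $k) = @_;
    my @selected;
    for (1 .. $k) {
        last unless @$files;
        # Replace the chosen element with a copy of the last one (the length is
        # unchanged) and collect the element that was replaced ...
        push @selected, splice(@$files, rand @$files, 1, $files->[-1]);
        # ... then drop the now-duplicated last element.
        pop @$files;
    }
    return @selected;
}

my @files = map { "img$_.jpg" } 1 .. 10;
my @selected = pick_by_overwrite(\@files, 3);
print "$_\n" for @selected;
```

If the random index happens to be the last one, the element is replaced by itself and then popped, so no element is ever picked twice.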

If one isn't really concerned about efficiency, then who cares that we are
shuffling the entire array just to pick 3 elements?

Xho
 
John W. Krahn

Martin said:
I want to choose a number of files from a directory randomly.

This is what I have so far (after reading perldoc -q "How do I shuffle
an array randomly?").

-------
#!/usr/bin/perl

use warnings;
use strict;

my $dir = "/path/to/folder";
opendir DH, $dir || die "can not open $dir: $!";

The precedence of '||' is too high for that to work correctly: it binds to $dir
rather than to the result of opendir, so as long as $dir is true it will never
die. You have to either use parentheses:

opendir( DH, $dir ) || die "can not open $dir: $!";

Or use the lower precedence 'or' operator:

opendir DH, $dir or die "can not open $dir: $!";

my @files = grep !/^\.\.?$|/ ,readdir DH;
Your regular expression (note the trailing '|') says to match the pattern
'^\.\.?$' or, alternatively, to match nothing. The empty alternative matches
every string, and since the expression is negated, @files will receive nothing!
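A quick way to see the effect of the stray '|' (the file names are made up):

```perl
use strict;
use warnings;

my @names = ('.', '..', 'alps1.jpg', 'alps2.jpg');

# The trailing '|' adds an empty alternative that matches every string,
# so the negated match is false for every name and nothing survives.
my @broken = grep !/^\.\.?$|/, @names;

# Without the stray '|', only '.' and '..' are filtered out.
my @fixed = grep !/^\.\.?$/, @names;

print scalar(@broken), " vs ", scalar(@fixed), "\n";    # prints "0 vs 2"
```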

my @files = grep !/^\.\.?$/, readdir DH;

closedir DH;

my @shuffled;
while (@files) {
# the FAQ says this is bad
push(@shuffled, splice(@files, rand @files, 1));
}

for (1..3) {
print pop @shuffled,"\n";
}

my @shuffled = map splice( @files, rand @files, 1 ), 1 .. 3;
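Putting John's fixes together, one possible version of the whole script. It is a sketch: the helper name random_files and the clamping of $k are assumptions, and it uses a lexical directory handle instead of the bareword DH.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $k random entries of $dir,
# skipping '.' and '..'.
sub random_files {
    my ($dir, $k) = @_;
    # 'or' has lower precedence than the argument list, so die really
    # runs when opendir fails.
    opendir my $dh, $dir or die "can not open $dir: $!";
    my @files = grep !/^\.\.?$/, readdir $dh;
    closedir $dh;
    $k = @files if $k > @files;    # cannot pick more entries than exist
    return map splice(@files, rand @files, 1), 1 .. $k;
}

print "$_\n" for random_files('.', 3);
```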



John
 
Martin Kissner

John W. Krahn wrote:
Martin Kissner wrote:

The precedence of '||' is too high for that to work correctly: it binds to $dir
rather than to the result of opendir, so as long as $dir is true it will never
die. You have to either use parentheses:

opendir( DH, $dir ) || die "can not open $dir: $!";

Or use the lower precedence 'or' operator:

opendir DH, $dir or die "can not open $dir: $!";

Thanks for pointing that out. I once knew this, but I forgot.
Your regular expression (note the trailing '|') says to match the pattern
'^\.\.?$' or, alternatively, to match nothing. The empty alternative matches
every string, and since the expression is negated, @files will receive nothing!
Actually it's grep !/^\.\.?$|\.DS_Store/ to filter out the .DS_Store files
used by Mac OS X. I deleted this in the post for the sake of simplicity
- not carefully enough, as I now see ;).
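Martin's pattern works, because the ^...$ anchors belong only to the first alternative while \.DS_Store can match anywhere in a name. If only the exact name .DS_Store should be skipped, one option is to anchor the whole alternation. A sketch with a hypothetical helper name and made-up file names:

```perl
use strict;
use warnings;

# Skip '.', '..' and '.DS_Store'. Anchoring the whole alternation (^...$)
# means only exact matches are skipped, whereas an unanchored \.DS_Store
# would also skip a name such as 'notes.DS_Store.txt'.
sub visible_entries {
    return grep { !/^(?:\.\.?|\.DS_Store)$/ } @_;
}

my @names = ('.', '..', '.DS_Store', 'alps.jpg', 'notes.DS_Store.txt');
print join(', ', visible_entries(@names)), "\n";   # prints "alps.jpg, notes.DS_Store.txt"
```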

Thanks for your feedback
Regards
Martin
 
robic0

John W. Krahn wrote:

Thanks for pointing that out. I once knew this, but I forgot.

Actually it's grep !/^\.\.?$|\.DS_Store/ to filter out the .DS_Store files
used by Mac OS X. I deleted this in the post for the sake of simplicity
- not carefully enough, as I now see ;).

Thanks for your feedback
Regards
Martin
Not sure where this follow will follow but..
I want to remind the OP that opendir will actually "change directory" on the OS
level so that a subsequent glob will read from that directory.
The "opendir" is only meant to facilitate not having to parse a directory from
the filename. I strongly suggest to the OP to NOT use opendir at all, ever,
and to stay in the current directory and call glob with a path, then parse out
the filename.

I consider "opendir" and company one of Perl's worst allowances, without allowances
for an immediate "pop" to the last directory when "glob"'ing. It's a "fopa", period!

A simple regex can separate the filename. How can you call another function after
changing dirs, and expect them to know where you are? And don't make an overriding
class function out of it, Jesus!

"Opendir" is Perl's fopa....
 
Paul Lalli

robic0 said:
Not sure where this follow will follow but..

It will be followed by anyone who hasn't plonked you pointing out that
you're simply completely wrong.
I want to remind the OP that opendir will actually "change directory" on the OS
level so that a subsequent glob will read from that directory.

You're simply completely wrong.

$ ls
in_main_dir.txt subdir/
$ ls subdir
in_sub_dir.txt
$ perl -le'opendir my $dh, q{subdir} or die $!; print for <*>'
in_main_dir.txt
subdir

Where on earth did you get the idea that opendir() changes the current
directory? Are you confusing opendir() with chdir()?
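The same point can be checked from Perl. The sketch below (entries_with_paths is a hypothetical name) reads a directory without changing the working directory. Because readdir returns bare names, it joins each one onto the directory with File::Spec->catfile, which the original script would also need if it went on to open the chosen files.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the entries of $dir (minus '.' and '..') as
# paths. opendir/readdir leave the current working directory alone, and
# readdir yields bare names, so each name is joined back onto $dir.
sub entries_with_paths {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can not open $dir: $!";
    my @names = grep !/^\.\.?$/, readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } @names;
}

print "$_\n" for entries_with_paths('.');
```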
The "opendir" is only meant to facilitate not having to parse a directory from
the filename. I strongly suggest to the OP to NOT use opendir at all, ever,
and to stay in the current directory and call glob with a path, then parse out
the filename.

Wow. That's almost complete idiocy.
I consider "opendir" and company one of Perl's worst allowances, without allowances
for an immediate "pop" to the last directory when "glob"'ing. It's a "fopa", period!

What on earth is a "fopa"? If you're going to use your own personal
slang, perhaps you should attempt to explain what you mean by it.

And frankly, I can't imagine who could possibly care what you consider
about Perl's built-ins, given your demonstrated complete lack of any
knowledge about the topic.
A simple regex can separate the filename. How can you call another function after
changing dirs, and expect them to know where you are?

Who is them? Who's changing directories? What in the hell are you
talking about?

More senseless drivel. Clearly, your goal is to get *everyone* to
plonk you. I assure you, you're well on your way.

Paul Lalli
 
Anno Siegel

Gunnar Hjalmarsson said:
(e-mail address removed) wrote:

Why do you think it makes sense to shuffle the whole array in order to
pick three elements?

Probably for the same reason that you think it makes sense to splice the
whole array (i.e. do a length-changing splice) in order to pick three
elements.

Both shuffle and length-changing splice are linear in the size of the
array, although admittedly the multiplier for splice is lower than for
shuffle. On the other hand, the shuffle only needs to be done once, while
splice is done once for each element chosen.

If one is really concerned about efficiency, then the below would be a good
way to go. It uses a length-preserving splice, which is very fast compared
to a length-changing splice:

for (1..$k) {
push @selected, splice(@files, rand @files, 1, $files[-1]);
pop @files;
}

If one isn't really concerned about efficiency, then who cares that we are
shuffling the entire array just to pick 3 elements?

True, arrays must be huge to make a difference.

However, your loop works just as well with a swapping technique, without
splice:

for ( 1 .. $k ) {
@files[ $_, -1] = @files[ -1, $_] for rand @files;
push @selected, pop @files;
}

It generates a different sequence, even with a seeded random generator,
but the sequence is just as random.
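Anno's swap-and-pop loop as a self-contained helper, written with an explicit integer index instead of aliasing $_ to the value of rand. The helper name and sample data are illustrative. This is essentially a partial Fisher-Yates shuffle that stops after $k steps.

```perl
use strict;
use warnings;

# Hypothetical helper: move a randomly chosen element into the last slot,
# then pop it off. No splice is involved, so each pick takes constant time;
# the array referenced by $files is shortened by one per pick.
sub pick_by_swap {
    my ($files, $k) = @_;
    my @selected;
    for (1 .. $k) {
        last unless @$files;
        my $i = int rand @$files;
        @$files[$i, -1] = @$files[-1, $i];
        push @selected, pop @$files;
    }
    return @selected;
}

my @files = map { "img$_.jpg" } 1 .. 10;
print "$_\n" for pick_by_swap(\@files, 3);
```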

Anno
 
