efficient way to select a random value from a hash

viki

sub RandomElement { # select random element from the hash
    my ($hashref) = @_;
    my @keys = keys(%{$hashref});
    my $random = $hashref->{ $keys[ int(rand( @keys )) ]};
}

When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call to
RandomElement.

What would be the fastest version of RandomElement?

Thanks
Viki
 
xhoster

viki said:
sub RandomElement { # select random element from the hash
my ($hashref) = @_;
my @keys = keys(%{$hashref});
my $random = $hashref->{ $keys[ int(rand( @keys )) ]};
}

When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call to
RandomElement.

Use a different data structure.
What would be the fastest version of RandomElement ?

That depends on what is done with $hashref between calls to RandomElement.

And on how random the random element must be.
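
For illustration only (this sketch is mine, not Xho's, and the helper
name is made up): if keys are only ever added, never deleted or
overwritten, one such data structure is the hash plus a parallel array
of its values, which makes every pick O(1):

my (%hash, @values);

sub add_element {
    my ($key, $value) = @_;
    die "duplicate key: $key" if exists $hash{$key};
    $hash{$key} = $value;
    push @values, $value;            # keep the value list in sync with the hash
}

sub RandomElement {
    return $values[ rand @values ];  # array subscripts truncate the float
}

Deletions or overwrites would need extra bookkeeping (or a periodic
rebuild of @values), which is why it depends on what happens to the
hash between calls.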

Xho

 
Jürgen Exner

viki said:
sub RandomElement { # select random element from the hash
my ($hashref) = @_;
my @keys = keys(%{$hashref});
my $random = $hashref->{ $keys[ int(rand( @keys )) ]};
}

When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call to
RandomElement.

If you flatten the hash into a list then the keys will be every other
element of that list which you could easily pick by multiplying the
random number by 2 (after adjusting the range accordingly).
I don't know for sure, but treating $hashref as a reference to an array
might do the trick already.
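
For instance (a rough sketch of that flattening idea, mine rather than
jue's code): in the flattened list the keys sit at even indices and the
values at odd ones, so a random value would be

sub RandomElement {
    my ($hashref) = @_;
    my @flat = %$hashref;                          # (key1, val1, key2, val2, ...)
    return $flat[ 2 * int(rand(@flat / 2)) + 1 ];  # odd index = a value
}

Note this still flattens the hash on every call, so by itself it does
not remove the O(N) cost.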

jue
 
John W. Krahn

viki said:
sub RandomElement { # select random element from the hash
my ($hashref) = @_;
my @keys = keys(%{$hashref});
my $random = $hashref->{ $keys[ int(rand( @keys )) ]};
}

When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call to
RandomElement.

What would be the fastest version of RandomElement ?

This appears to be a bit faster:

sub RandomElement { # select random element from the hash
    my ( $hashref ) = @_;
    my $count;
    my $random = ( grep 1 > rand ++$count, values %$hashref )[ -1 ];
}
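
(In case the one-liner is opaque: each value is tested with
1 > rand ++$count, i.e. kept with probability 1/count, grep collects
every value that passed its own test, and the [-1] slice takes the last
of them -- the standard "reservoir" pick, which ends up uniform over
the whole hash.)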



John
 
John W. Krahn

Ben said:
Quoth "John W. Krahn said:
viki said:
sub RandomElement { # select random element from the hash
my ($hashref) = @_;
my @keys = keys(%{$hashref});
my $random = $hashref->{ $keys[ int(rand( @keys )) ]};
}

When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call to
RandomElement.

What would be the fastest version of RandomElement ?
This appears to be a bit faster:

sub RandomElement { # select random element from the hash
my ( $hashref ) = @_;
my $count;
my $random = ( grep 1 > rand ++$count, values %$hashref )[ -1 ];
}

Presumably

sub RandomElement {
    my ($hashref) = @_;
    my ($count, $got);

    scalar keys %$hashref; # reset iterator
    while (1) {
        my ($k, $v) = each %$hashref;
        $got = $v if 1 > rand ++$count;
    }
    return $got;
}

would be more efficient for a large hash, since it doesn't build a list
of the values. Since none of these are using the hash for lookup, it
would be better to keep the values in an array and just use
$ary[rand @ary] instead.

Your while (1) loop doesn't end so it would probably be less efficient.

ITYM:

while ( my ( $k, $v ) = each %$hashref ) {
    $got = $v if 1 > rand ++$count;
}

But under my Benchmark tests that is less efficient than the OP's version.



John
 
smallpond

sub RandomElement { # select random element from the hash
    my ($hashref) = @_;
    my @keys = keys(%{$hashref});
    my $random = $hashref->{ $keys[ int(rand( @keys )) ]};

}

When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call to
RandomElement.

What would be the fastest version of RandomElement ?

Thanks
Viki


sub ran3($) {
    my ($hashref) = @_;
    my @vals = values(%{$hashref});
    $vals[ int(rand( @vals )) ];
}

10,000 calls to this took 2.341 s

10,000 calls to yours took 3.947 s

I also tried flattening the whole hash into
key/value pairs and choosing an odd element.
That took 6.161 s.
 
John W. Krahn

Ben said:
Quoth "John W. Krahn said:
Ben said:
Presumably

sub RandomElement {
my ($hashref) = @_;
my ($count, $got);

scalar keys %$hashref; # reset iterator
while (1) {
my ($k, $v) = each %$hashref;
$got = $v if 1 > rand ++$count;
}
return $got;
}

would be more efficient for a large hash, since it doesn't build a list
of the values. Since none of these are using the hash for lookup, it
would be better to keep the values in an array and just use
$ary[rand @ary] instead.
Your while (1) loop doesn't end so it would probably be less efficient.

:)

I had initially forgotten that the algorithm needs to iterate over the
whole hash, and was planning something like

last if 1 > rand...;

to exit early.
ITYM:

while ( my ( $k, $v ) = each %$hashref ) {
$got = $v if 1 > rand ++$count;
}

But under my Benchmark tests that is less efficient than the OP's version.

Interesting. Presumably it's because there are more ops involved, and a
new BLOCK every time around the while loop.

If you exit early then it is not truly random. You have to iterate over
the complete list to get a random element.

Actually, this appears to be a lot more efficient than the others:

sub RandomElement3 {
    my ( $hashref ) = @_;
    my $index  = rand values %$hashref;
    my $random = ( values %$hashref )[ $index ];
}
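
(Presumably because rand is a named unary operator, so the first
values call is in scalar context and just returns the element count;
the second call still builds the value list, but the list slice avoids
copying it into a named array.)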



John
 
A. Sinan Unur

sub RandomElement { # select random element from the hash
    my ($hashref) = @_;
    my @keys = keys(%{$hashref});
    my $random = $hashref->{ $keys[ int(rand( @keys )) ]};

}

When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call
to RandomElement.

What would be the fastest version of RandomElement ?
....

sub ran3($) {
my ($hashref) = @_;
my @vals = values(%{$hashref});
$vals[ int(rand( @vals )) ];
}

10,000 calls to this took 2.341 s

Interesting because here is what I get:

C:\Temp> cat t.pl

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw( cmpthese );

my $h;

for (1 .. 10_000) {
    $h->{"key$_"} = rand;
}

cmpthese -5, {
    smallpond => sub { ran3($h) },
    sinan     => sub { ran42($h) },
};

sub ran3 {
    my ($h) = @_;
    my @v = values %$h;
    $v[rand @v];
}

sub ran42 {
    my ( $h ) = @_;
    (values %$h)[rand values %$h ];
}

__END__

C:\Temp> t
Rate smallpond sinan
smallpond 288/s -- -82%
sinan 1569/s 446% --


--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
John W. Krahn

darkon said:
Are you sure? The Perl FAQ has a code sample for selecting a random
line from a file without reading the entire file, and references Knuth
for a proof:

perldoc -q "random line"

perldoc -q "random line"

How do I select a random line from a file?

Here’s an algorithm from the Camel Book:

srand;
rand($.) < 1 && ($line = $_) while <>;

This has a significant advantage in space over reading the whole
file in. You can find a proof of this method in The Art of Computer
Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.


Please read the code carefully. At no point does it exit the while loop.

The text says "reading the whole file in" which the code does not do.
But it does read the whole file one line at a time.
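
(The reason the method is fair: line k replaces the current pick with
probability 1/k. If each earlier line held the pick with probability
1/(k-1), it keeps it with probability (1/(k-1)) * (1 - 1/k) = 1/k, so
after n lines every line has probability 1/n -- which is exactly why
the loop has to run over the entire input.)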




John
 
Eric Pozharski

On 2009-04-16, Jürgen Exner said:
I don't know for sure, but treating $hashref as a reference to an array
might do the trick already.

No, it doesn't

{2565:11} [0:0]$ perl -Mstrict -wle 'my $x= { a => "b" }; print scalar @$x'
Not an ARRAY reference at -e line 1.

However,

{2565:11} [0:0]$ perl -Mstrict -wle 'my $x= { a => "b" }; print scalar(() = %$x)'
2

And one more point of concern: serializing the hash into an array is
subject to Perl's own key reordering. Maybe just do it once?
 
Jürgen Exner

Eric Pozharski said:
On 2009-04-16, Jürgen Exner said:
I don't know for sure, but treating $hashref as a reference to an array
might do the trick already.

No, it doesn't

{2565:11} [0:0]$ perl -Mstrict -wle 'my $x= { a => "b" }; print scalar @$x'
Not an ARRAY reference at -e line 1.

However,

{2565:11} [0:0]$ perl -Mstrict -wle 'my $x= { a => "b" }; print scalar(() = %$x)'
2
Thanks!

And one more point of concern: serializing the hash into an array is
subject to Perl's own key reordering. Maybe just do it once?

Hmmm, when would perl reorder the keys? I would assume only when the
hash itself changes, i.e. when elements are added/removed. And in that
case probably you want to restart the random sequence anyway.

But the key question still remains unanswered: is
serialization/flattening of the hash faster than a keys()?

jue
 
Eric Pozharski

Hmmm, when would perl reorder the keys? I would assume only when the
hash itself changes, i.e. when elements are added/removed. And in that
case probably you want to restart the random sequence anyway.

My point was: when picking a random item from an already randomized
sequence, does the picked item become more or less random? I don't know.

But the key question still remains unanswered: is
serialization/flattening of the hash faster than a keys()?

That's easy

perl -MBenchmark=cmpthese,timethese -wle '
my %h = qw| a b c d e f g h i j k l m n |;
my @ph = %h;
my $z;
cmpthese timethese -5, {
keys => sub { $z = (keys %h)[2] },
flat => sub { $z = (() = %h)[4] },
pflt => sub { $z = $ph[4] },
};
'
Benchmark: running flat, keys, pflt for at least 5 CPU seconds...

flat: 5 wallclock secs ( 5.07 usr + 0.02 sys = 5.09 CPU) @ 203814.73/s (n=1037417)
keys: 5 wallclock secs ( 5.05 usr + 0.01 sys = 5.06 CPU) @ 187490.71/s (n=948703)
pflt: 6 wallclock secs ( 5.74 usr + 0.00 sys = 5.74 CPU) @ 1735072.47/s (n=9959316)

Rate keys flat pflt
keys 187491/s -- -8% -89%
flat 203815/s 9% -- -88%
pflt 1735072/s 825% 751% --

And, the sequence is the same:

perl -we '
my %h = qw| a b c d e f g h i j k l m n |;
my @ph = %h;
print "$_ " foreach keys %h;
print "\n";
print "$ph[$_ * 2] " for 0 .. $#ph / 2;
print "\n";
'
e c k a g m i
e c k a g m i
 
Xho Jingleheimerschmidt

Eric said:
My point was: when picking a random item from an already randomized
sequence, does the picked item become more or less random? I don't know.

The order in a hash is not random, it is arbitrary. (In some
implementations of perl, in some situations, at some times, it may be
close to randomized.)


Xho
 
Eric Pozharski

Xho Jingleheimerschmidt said:
Eric Pozharski wrote: *SKIP*

The order in a hash is not random, it is arbitrary. (In some
implementations of perl, in some situations, at some times, it may be
close to randomized.)

Oh, my god! It's even worse -- the order is arbitrary (good point,
BTW). In such circumstances, the only thing that could help (me?) is
B<sort>. Luckily, I'm not the OP.
 
smallpond

A. Sinan Unur said:
smallpond said:
sub RandomElement { # select random element from the hash
my ($hashref) = @_;
my @keys = keys(%{$hashref});
my $random = $hashref->{ $keys[ int(rand( @keys )) ]};
}
When the hash is large, the creation of @keys for every
invocation is inefficient. It's wasted work that's O(N) for every call
to RandomElement.
What would be the fastest version of RandomElement ?
...

sub ran3($) {
my ($hashref) = @_;
my @vals = values(%{$hashref});
$vals[ int(rand( @vals )) ];
}
10,000 calls to this took 2.341 s

Interesting because here is what I get:

C:\Temp> cat t.pl

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw( cmpthese );

my $h;

for (1 .. 10_000) {
$h->{"key$_"} = rand;

}

cmpthese -5, {
smallpond => sub { ran3($h) },
sinan => sub { ran42($h) },

};

sub ran3 {
my ($h) = @_;
my @v = values %$h;
$v[rand @v];

}

sub ran42 {
my ( $h ) = @_;
(values %$h)[rand values %$h ];

}

__END__

C:\Temp> t
Rate smallpond sinan
smallpond 288/s -- -82%
sinan 1569/s 446% --


Your code is clearly better since it avoids creating the
unneeded array. What I was trying to compare was
extracting values vs keys vs flattening the entire hash
and picking an odd element (a value). Using your
method, extracting the list of values still wins,
although maybe there is a better way to do ranf.


cmpthese -5, {
    flatten => sub { ranf($h) },
    values  => sub { ranv($h) },
    keys    => sub { rank($h) },
};

sub ranf {
    my ($h) = @_;
    # int() before the arithmetic keeps the index odd (a value, not a key)
    (%$h)[ 2 * int(rand keys %$h) + 1 ];
}

sub ranv {
    my ($h) = @_;
    (values %$h)[rand values %$h];
}

sub rank {
    my ($h) = @_;
    $h->{(keys %$h)[rand keys %$h]};
}


Rate flatten keys values
flatten 71.3/s -- -3% -87%
keys 73.5/s 3% -- -86%
values 542/s 659% 637% --
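
(That ordering is roughly what you'd expect: flattening builds a
2N-element list, and the keys version has to create fresh copies of
every key string and then do a hash lookup, while values only builds
the N-element list that is actually needed.)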
 
