Maximum length of hash key


Robert Manea

Hello,

I'm searching for different code snippets (Golfers welcome) to solve the
following problem:

#v+

%category = {
'Mozilla' => 20,
'Forte Agent' => 12,
'slrn' => 45
};
#v-

Find the longest key (most characters) and return
'$longest_key_length + 1;'.

Since the hash can get pretty big, efficiency should be an issue.

Currently I'm using this:
#v+

my @arr; for ( keys %category ) {$arr[length $_] = defined;};
my $longest_key_length = $#arr+1;

#v-

in favor over the more golfish one liner:

#v+

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];

#v-


Any other suggestions are highly appreciated.


Thx & Greets, Rob
 

Craig Dunn

Find the longest key (most characters) and return
'$longest_key_length + 1;'.
[snip]


in favor over the more golfish one liner:

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];



Or another way:

my $l = (sort{$b<=>$a} map{length($_)} keys(%category))[0]+1;

Craig
 

Peter Corlett

Robert Manea said:
I'm searching for different code snippets (Golfers welcome) to solve the
following problem:
%category = {
'Mozilla' => 20,
'Forte Agent' => 12,
'slrn' => 45
};

I don't think that means what you think it means.
Find the longest key (most characters) and return
'$longest_key_length + 1;'.

If we're playing golf, I'm going to offer:

my $l;
$l|=$_ foreach keys %category; $l=1+length $l;
 

Paul Lalli

Robert Manea said:
Hello,

I'm searching for different code snippets (Golfers welcome) to solve the
following problem:

#v+

%category = {
'Mozilla' => 20,
'Forte Agent' => 12,
'slrn' => 45
};
#v-

Do you know you just created a hash with one key - a hash ref - that
contains no value?
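(For anyone following along, a minimal sketch of the difference -- braces build an anonymous hash reference, parentheses build the intended key/value list; the variable names here are illustrative, not from the thread:)

```perl
use strict;

# Braces create an anonymous hash *reference*; assigning one to a hash
# yields a single key (the stringified ref) whose value is undef.
my %wrong = ( { 'Mozilla' => 20, 'slrn' => 45 } );

# Parentheses supply the key/value list that was actually intended.
my %right = ( 'Mozilla' => 20, 'slrn' => 45 );

print scalar keys %wrong, "\n";   # 1
print scalar keys %right, "\n";   # 2
```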
Find the longest key (most characters) and return
'$longest_key_length + 1;'.

Since the hash can get pretty big, efficiency should be an issue.

Then sorting is definitely not something you want to be doing.
Currently I'm using this:
my @arr; for ( keys %category ) {$arr[length $_] = defined;};

Why 'defined'? Why bother checking the return value of defined($_),
instead of just using 1?
my $longest_key_length = $#arr+1;


in favor over the more golfish one liner:

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];

Again, sorting will be a costly operation if the point is just to find
the maximal value. An iterative solution would seem to be preferred:
Any other suggestions are highly appreciated.

%hash = ('foo'=>1, 'foobar'=>2, 'a'=>3, 'asdfasd'=>4, 'ab'=>5);
length($_) > $len and $len = length($_) for keys %hash;
print "Max Length plus 1 = " . ($len + 1) . "\n";


Paul Lalli
 

Uri Guttman

CD> [snip]
in favor over the more golfish one liner:

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];

CD> Or another way:

CD> my $l = (sort{$b<=>$a} map{length($_)} keys(%category))[0]+1;

ewww! O(N log N) for an O(N) problem? use List::Util::max

uri
 

Craig Dunn

Paul Lalli said:
Again, sorting will be a costly operation if the point is just to find
the maximal value. An iterative solution would seem to be preferred:


%hash = ('foo'=>1, 'foobar'=>2, 'a'=>3, 'asdfasd'=>4, 'ab'=>5);
length($_) > $len and $len = length($_) for keys %hash;
print "Max Length plus 1 = " . ($len + 1) . "\n";

Or back to the golf...

$len = length>$len?length:$len for keys %hash;$len++;
 

Robert Manea

Segfault in module "Peter Corlett" - dump details are as follows:
I don't think that means what you think it means.

You're pretty much right, of course. Obviously I meant '(' and ')'.
If we're playing golf, I'm going to offer:
my $l;
$l|=$_ foreach keys %category; $l=1+length $l;

That's a real nice one. Thanks for it.


Greets, Rob
 

Uri Guttman

RM> Find the longest key (most characters) and return
RM> '$longest_key_length + 1;'.

RM> Since the hash can get pretty big, efficiency should be an issue.

RM> my @arr; for ( keys %category ) {$arr[length $_] = defined;};
                                                       ^^^^^^^

what is 'defined' in this case? it may actually work since keys are
always defined and $_ is aliased to each key and defined works on $_ by
default. but that is a FUGLY way to get a simple true value. just use 1.

RM> my $longest_key_length = $#arr+1;

@arr is better.

use List::Util qw( max ) ;

my $max_len = max( map length, keys %category ) + 1 ;

uri
 

Uri Guttman

RM> That's a real nice one. Thanks for it.

should be very slow as it has to copy and OR each of the keys, and you
said this is a large hash. the solution i posted aliases each key (so
no copying), and length and max are fast. for your education, benchmark
them (on a large hash) and post the results.

uri
 

Big and Blue

Craig said:
Or back to the golf...

$len = length>$len?length:$len for keys %hash;$len++;

Shouldn't that be ... for $len keys ....

And there is this:

$k^=$_ for keys%hash;length($k)+1


However, this needs to read every character of every key - I would
expect the original idea (set an array with length(key) and then check the
length of the array) to be much quicker.
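(Why the bitwise-string tricks report the right length at all: Perl's string `|` and `^` pad the shorter operand with NUL bytes, so the accumulated result is always as long as the longest key seen; a small sketch with made-up data:)

```perl
use strict;
use warnings;

# String ^ pads the shorter operand with NUL bytes, so folding every key
# into $k leaves a string exactly as long as the longest key, regardless
# of what its bytes end up being.
my %hash = ( 'foo' => 1, 'foobar' => 2, 'a' => 3 );
my $k = '';
$k ^= $_ for keys %hash;

print length($k) + 1, "\n";   # 7 ('foobar' has 6 characters)
```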
 

Robert Manea

Segfault in module "Uri Guttman" - dump details are as follows:

[...]
for your education benchmark them (on a large hash) and post the
results.

Well, here it goes...

The test code:
#v+

#!/usr/bin/perl
#

use strict;
use warnings;
use Benchmark qw(:all);

our %category;
for( 1 .. 5000 ) {
my $rand_key = 'x' x (int(rand(30)) + 1);
$category{$rand_key} = 1;
}

timethese(100000, {
'Array_Indices' => sub {
my @arr;
for ( keys %category ) {$arr[length $_] = 1;}
my $longest_key_length = $#arr+1;
},
'Sorting' => sub {
my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];
},
'ORing' => sub {
my $longest_key_len;
$longest_key_len |=$_ foreach keys %category;
$longest_key_len = 1 + length $longest_key_len;
},
'XORing' => sub {
my $longest_key_len;
$longest_key_len ^= $_ for keys %category;
$longest_key_len = length($longest_key_len)+1;
},
'Length_compare' => sub {
my $len=0;
$len = length>$len?length:$len for keys %category;
$len++;
}
});

#v-

And the results on my machine:

Benchmark: timing 100000 iterations of Array_Indices, Length_compare, ORing, Sorting, XORing...
Array_Indices: 3 wallclock secs ( 4.38 usr + 0.01 sys = 4.39 CPU) @ 22779.04/s (n=100000)
Length_compare: 4 wallclock secs ( 3.57 usr + 0.02 sys = 3.59 CPU) @ 27855.15/s (n=100000)
ORing: 4 wallclock secs ( 3.70 usr + 0.01 sys = 3.71 CPU) @ 26954.18/s (n=100000)
Sorting: 6 wallclock secs ( 7.01 usr + 0.03 sys = 7.04 CPU) @ 14204.55/s (n=100000)
XORing: 4 wallclock secs ( 3.68 usr + 0.03 sys = 3.71 CPU) @ 26954.18/s (n=100000)


Same snippets as above and 'cmpthese':

                   Rate Sorting Array_Indices ORing XORing Length_compare
Sorting         14306/s      --          -37%  -47%   -47%           -49%
Array_Indices   22883/s     60%            --  -15%   -16%           -18%
ORing           26954/s     88%           18%    --    -1%            -4%
XORing          27174/s     90%           19%    1%     --            -3%
Length_compare  27933/s     95%           22%    4%     3%             --


Quite surprisingly for me, the 'Array_Indices' method runs slower than the
'Length_compare' one, which is obviously the fastest.

Not that surprising is the 'Sorting' case.

And again, OR/XOR are quite fast despite the need to scan each string char by char.
(But isn't that needed to compute a string's length with 'length()' anyway?)

Well, all in all, not quite the results I expected.



Greets, Rob
 

Robert Manea

Segfault in module "Robert Manea" - dump details are as follows:
Segfault in module "Uri Guttman" - dump details are as follows:
[...]
for your education benchmark them (on a large hash) and post the
results.
Well, here it goes...
The test code:

use strict;
use warnings;
use Benchmark qw(:all);
our %category;
for( 1 .. 5000 ) {
my $rand_key = 'x' x (int(rand(30)) + 1);
$category{$rand_key} = 1;
}

General brain overload. Please use this instead of the (senseless) code
above:

our %category;
my $cnt;
for( 0 .. 1000 ) {
my $rand_key = $cnt++ . 'x' x (int(rand(30)) + 1);
$category{$rand_key} = 1;
}

[...]
And the results on my machine:
[ Now only 5000 iterations have been made ]

Benchmark: timing 5000 iterations of Array_Indices, Length_compare, ORing, Sorting, XORing...
Array_Indices: 7 wallclock secs ( 6.25 usr + 0.05 sys = 6.30 CPU) @ 793.65/s (n=5000)
Length_compare: 6 wallclock secs ( 5.33 usr + 0.03 sys = 5.36 CPU) @ 932.84/s (n=5000)
ORing: 7 wallclock secs ( 5.80 usr + 0.04 sys = 5.84 CPU) @ 856.16/s (n=5000)
Sorting: 24 wallclock secs (22.79 usr + 0.14 sys = 22.93 CPU) @ 218.05/s (n=5000)
XORing: 6 wallclock secs ( 5.73 usr + 0.03 sys = 5.76 CPU) @ 868.06/s (n=5000)
Same snippets as above and 'cmpthese':

                  Rate Sorting XORing Array_Indices ORing Length_compare
Sorting          219/s      --   -75%          -75%  -75%           -76%
XORing           864/s    294%     --           -1%   -2%            -7%
Array_Indices    873/s    298%     1%            --   -1%            -6%
ORing            877/s    301%     2%            1%    --            -6%
Length_compare   931/s    325%     8%            7%    6%             --

Greets, Rob



 

Peter Corlett

Uri Guttman said:
should be very slow as it has to copy and or each of the keys and
you said this is large hash. the solution i posted aliases to each
key (so no copying) and length and max are fast.

It turns out that copying in Perl isn't always as expensive as you
might think. What was that saying about premature optimisation? ;)
for your education benchmark them (on a large hash) and post the
results.

#!/usr/bin/perl -w
use strict;

use Benchmark ':all';

foreach my $keys (2, 10, 100, 1000, 1e4) {
srand 1;
my %hash;
for (1..$keys) {
my $key=join '',map { chr(32+rand 95) } (0..rand 40);
$hash{$key}++;
}
print "\nFor $keys keys\n\n";
cmpthese(1e6/$keys, {
or_keys => sub {
my $l;
$l|=$_ foreach keys %hash; $l=1+length $l;
},
or_each => sub {
my($l, $k);
$l|=$k while($k=each %hash); $l=1+length $l;
},
uri_keys => sub {
my @arr; for ( keys %hash ) {$arr[length $_] = defined;};
my $l = $#arr+1;
}
});
}
__END__

And its output (on a 400MHz PII):

For 2 keys

             Rate uri_keys or_keys or_each
uri_keys  27778/s       --    -26%    -55%
or_keys   37707/s      36%      --    -39%
or_each   61728/s     122%     64%      --

For 10 keys

             Rate uri_keys or_keys or_each
uri_keys  11669/s       --    -37%    -42%
or_keys   18450/s      58%      --     -8%
or_each   20000/s      71%      8%      --

For 100 keys

            Rate uri_keys or_each or_keys
uri_keys  2004/s       --    -15%    -25%
or_each   2370/s      18%      --    -12%
or_keys   2681/s      34%     13%      --

For 1000 keys

           Rate or_each uri_keys or_keys
or_each   241/s      --      -7%    -12%
uri_keys  260/s      8%       --     -5%
or_keys   275/s     14%       6%      --

For 10000 keys

           Rate or_keys uri_keys or_each
or_keys  21.8/s      --      -2%     -2%
uri_keys 22.3/s      2%       --     -0%
or_each  22.3/s      2%       0%      --


It appears that my cute golfing exercise/hack is actually faster than
your array-based approach for small hashes, but you win on larger
hashes. I'm also somewhat surprised that an each-based approach is
slower on a relatively large 1000-key hash.
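(Combining the thread's two observations -- avoid copying the keys and avoid sorting -- an `each`-based loop with a running length compare never materialises the full key list, which may matter for a genuinely huge hash; a sketch over generated keys, not benchmarked above:)

```perl
use strict;
use warnings;

# Iterate with each() so the complete key list is never built in memory;
# keep a running maximum of the key lengths as we go.
my %category = map { ( 'x' x $_ ) => 1 } 1 .. 30;   # longest key: 30 chars

my $len = 0;
while ( my ($key) = each %category ) {
    my $l = length $key;
    $len = $l if $l > $len;
}

print $len + 1, "\n";   # 31
```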
 

Tad McClellan

Peter Corlett said:
If we're playing golf, I'm going to offer:

my $l;
$l|=$_ foreach keys %category; $l=1+length $l;


If you're playing golf you never use "foreach",
it costs 4 strokes for no gain.
 
