Maximum length of hash key


Robert Manea

Hello,

I'm searching for different code snippets (Golfers welcome) to solve the
following problem:

#v+

%category = {
'Mozilla' => 20,
'Forte Agent' => 12,
'slrn' => 45
};
#v-

Find the longest key (most characters) and return
'$longest_key_length + 1;'.

Since the hash can get pretty big, efficiency should be an issue.

Currently I'm using this:
#v+

my @arr; for ( keys %category ) {$arr[length $_] = defined;};
my $longest_key_length = $#arr+1;

#v-

in favor over the more golfish one liner:

#v+

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];

#v-


Any other suggestions are highly appreciated.


Thx & Greets, Rob
 

Craig Dunn

Find the longest key (most characters) and return
'$longest_key_length + 1;'.
[snip]


in favor over the more golfish one liner:

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];



Or another way:

my $l = (sort{$b<=>$a} map{length($_)} keys(%category))[0]+1;

Craig
 

Peter Corlett

Robert Manea said:
I'm searching for different code snippets (Golfers welcome) to solve the
following problem:
%category = {
'Mozilla' => 20,
'Forte Agent' => 12,
'slrn' => 45
};

I don't think that means what you think it means.
Find the longest key (most characters) and return
'$longest_key_length + 1;'.

If we're playing golf, I'm going to offer:

my $l;
$l|=$_ foreach keys %category; $l=1+length $l;
 

Paul Lalli

Robert Manea said:
Hello,

I'm searching for different code snippets (Golfers welcome) to solve the
following problem:

#v+

%category = {
'Mozilla' => 20,
'Forte Agent' => 12,
'slrn' => 45
};
#v-

Do you know you just created a hash with one key - a hash ref - that
contains no value?
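(For anyone following along, a minimal sketch of the difference -- braces build an anonymous hash reference, parentheses build the intended key/value list; the variable names here are illustrative, not from the thread:)

```perl
use strict;

# Braces create an anonymous hash *reference*; assigning one to a hash
# yields a single key (the stringified ref) whose value is undef.
my %wrong = ( { 'Mozilla' => 20, 'slrn' => 45 } );

# Parentheses supply the key/value list that was actually intended.
my %right = ( 'Mozilla' => 20, 'slrn' => 45 );

print scalar keys %wrong, "\n";   # 1
print scalar keys %right, "\n";   # 2
```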
Find the longest key (most characters) and return
'$longest_key_length + 1;'.

Since the hash can get pretty big, efficiency should be an issue.

Then sorting is definitely not something you want to be doing.
Currently I'm using this:
my @arr; for ( keys %category ) {$arr[length $_] = defined;};

Why 'defined'? Why bother checking the return value of defined($_),
instead of just using 1?
my $longest_key_length = $#arr+1;


in favor over the more golfish one liner:

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];

Again, sorting will be a costly operation if the point is just to find
the maximal value. An iterative solution would seem to be preferred:
Any other suggestions are highly appreciated.

%hash = ('foo'=>1, 'foobar'=>2, 'a'=>3, 'asdfasd'=>4, 'ab'=>5);
length($_) > $len and $len = length($_) for keys %hash;
print "Max Length plus 1 = " . ($len + 1) . "\n";


Paul Lalli
 

Uri Guttman

CD> [snip]
in favor over the more golfish one liner:

my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];

CD> Or another way:

CD> my $l = (sort{$b<=>$a} map{length($_)} keys(%category))[0]+1;

ewww! O(N log N) for an O(N) problem? use List::Util::max

uri
 

Craig Dunn

Paul Lalli said:
Again, sorting will be a costly operation if the point is just to find
the maximal value. An iterative solution would seem to be preferred:


%hash = ('foo'=>1, 'foobar'=>2, 'a'=>3, 'asdfasd'=>4, 'ab'=>5);
length($_) > $len and $len = length($_) for keys %hash;
print "Max Length plus 1 = " . ($len + 1) . "\n";

Or back to the golf...

$len = length>$len?length:$len for keys %hash;$len++;
 

Robert Manea

Segfault in module "Peter Corlett" - dump details are as follows:
I don't think that means what you think it means.

You're pretty much right, of course. Obviously I meant '(' and ')'.
If we're playing golf, I'm going to offer:
my $l;
$l|=$_ foreach keys %category; $l=1+length $l;

That's a real nice one. Thanks for it.


Greets, Rob
 

Uri Guttman

RM> Find the longest key (most characters) and return
RM> '$longest_key_length + 1;'.

RM> Since the hash can get pretty big, efficiency should be an issue.

RM> my @arr; for ( keys %category ) {$arr[length $_] = defined;};
                                                       ^^^^^^^

what is 'defined' in this case? it may actually work since keys are
always defined and $_ is aliased to each key and defined works on $_ by
default. but that is a FUGLY way to get a simple true value. just use 1.

RM> my $longest_key_length = $#arr+1;

@arr is better.

use List::Util qw( max ) ;

my $max_len = max( map length, keys %category ) + 1 ;

uri
 

Uri Guttman

RM> That's a real nice one. Thanks for it.

should be very slow as it has to copy and OR each of the keys, and you
said this is a large hash. the solution i posted aliases each key (so
no copying), and length and max are fast. for your education, benchmark
them (on a large hash) and post the results.

uri
 

Big and Blue

Craig said:
Or back to the golf...

$len = length>$len?length:$len for keys %hash;$len++;

Shouldn't that be ... for $len keys ....

And there is this:

$k^=$_ for keys%hash;length($k)+1


However, this needs to read every character of every key - I would
expect the original idea (set an array with length(key) and then check the
length of the array) to be much quicker.
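(Why the bitwise-string tricks report the right length at all: Perl's string `|` and `^` pad the shorter operand with NUL bytes, so the accumulated result is always as long as the longest key seen; a small sketch with made-up data:)

```perl
use strict;
use warnings;

# String ^ pads the shorter operand with NUL bytes, so folding every key
# into $k leaves a string exactly as long as the longest key, regardless
# of what its bytes end up being.
my %hash = ( 'foo' => 1, 'foobar' => 2, 'a' => 3 );
my $k = '';
$k ^= $_ for keys %hash;

print length($k) + 1, "\n";   # 7 ('foobar' has 6 characters)
```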
 

Robert Manea

Segfault in module "Uri Guttman" - dump details are as follows:

[...]
for your education benchmark them (on a large hash) and post the
results.

Well, here it goes...

The test code:
#v+

#!/usr/bin/perl
#

use strict;
use warnings;
use Benchmark qw(:all);

our %category;
for( 1 .. 5000 ) {
my $rand_key = 'x' x (int(rand(30)) + 1);
$category{$rand_key} = 1;
}

timethese(100000, {
'Array_Indices' => sub {
my @arr;
for ( keys %category ) {$arr[length $_] = 1;}
my $longest_key_length = $#arr+1;
},
'Sorting' => sub {
my $longest_key_length =
1 + length +(sort { length $b <=> length $a } keys %category)[0];
},
'ORing' => sub {
my $longest_key_len;
$longest_key_len |=$_ foreach keys %category;
$longest_key_len = 1 + length $longest_key_len;
},
'XORing' => sub {
my $longest_key_len;
$longest_key_len ^= $_ for keys %category;
$longest_key_len = length($longest_key_len)+1;
},
'Length_compare' => sub {
my $len=0;
$len = length>$len?length:$len for keys %category;
$len++;
}
});

#v-

And the results on my machine:

Benchmark: timing 100000 iterations of Array_Indices, Length_compare, ORing, Sorting, XORing...
Array_Indices: 3 wallclock secs ( 4.38 usr + 0.01 sys = 4.39 CPU) @ 22779.04/s (n=100000)
Length_compare: 4 wallclock secs ( 3.57 usr + 0.02 sys = 3.59 CPU) @ 27855.15/s (n=100000)
ORing: 4 wallclock secs ( 3.70 usr + 0.01 sys = 3.71 CPU) @ 26954.18/s (n=100000)
Sorting: 6 wallclock secs ( 7.01 usr + 0.03 sys = 7.04 CPU) @ 14204.55/s (n=100000)
XORing: 4 wallclock secs ( 3.68 usr + 0.03 sys = 3.71 CPU) @ 26954.18/s (n=100000)


Same snippets as above and 'cmpthese':

                   Rate Sorting Array_Indices ORing XORing Length_compare
Sorting         14306/s      --          -37%  -47%   -47%           -49%
Array_Indices   22883/s     60%            --  -15%   -16%           -18%
ORing           26954/s     88%           18%    --    -1%            -4%
XORing          27174/s     90%           19%    1%     --            -3%
Length_compare  27933/s     95%           22%    4%     3%             --


Quite surprisingly for me, the 'Array_Indices' method runs slower than the
'Length_compare' one, which is obviously the fastest.

Not that surprising is the 'Sorting' case.

And again, OR/XOR are quite fast despite the need to scan each string char by char.
(But isn't that needed to compute a string's length with 'length()' anyway?)

Well, all in all, not quite the results I expected.



Greets, Rob
 

Robert Manea

Segfault in module "Robert Manea" - dump details are as follows:
Segfault in module "Uri Guttman" - dump details are as follows:
[...]
for your education benchmark them (on a large hash) and post the
results.
Well, here it goes...
The test code:

use strict;
use warnings;
use Benchmark qw(:all);
our %category;
for( 1 .. 5000 ) {
my $rand_key = 'x' x (int(rand(30)) + 1);
$category{$rand_key} = 1;
}

General brain overload. Please use this instead of the (senseless) code
above:

our %category;
my $cnt;
for( 0 .. 1000 ) {
my $rand_key = $cnt++ . 'x' x (int(rand(30)) + 1);
$category{$rand_key} = 1;
}

[...]
And the results on my machine:
[ Now only 5000 iterations have been made ]

Benchmark: timing 5000 iterations of Array_Indices, Length_compare, ORing, Sorting, XORing...
Array_Indices: 7 wallclock secs ( 6.25 usr + 0.05 sys = 6.30 CPU) @ 793.65/s (n=5000)
Length_compare: 6 wallclock secs ( 5.33 usr + 0.03 sys = 5.36 CPU) @ 932.84/s (n=5000)
ORing: 7 wallclock secs ( 5.80 usr + 0.04 sys = 5.84 CPU) @ 856.16/s (n=5000)
Sorting: 24 wallclock secs (22.79 usr + 0.14 sys = 22.93 CPU) @ 218.05/s (n=5000)
XORing: 6 wallclock secs ( 5.73 usr + 0.03 sys = 5.76 CPU) @ 868.06/s (n=5000)
Same snippets as above and 'cmpthese':

                  Rate Sorting XORing Array_Indices ORing Length_compare
Sorting          219/s      --   -75%          -75%  -75%           -76%
XORing           864/s    294%     --           -1%   -2%            -7%
Array_Indices    873/s    298%     1%            --   -1%            -6%
ORing            877/s    301%     2%            1%    --            -6%
Length_compare   931/s    325%     8%            7%    6%             --

Greets, Rob



 

Peter Corlett

Uri Guttman said:
should be very slow as it has to copy and or each of the keys and
you said this is large hash. the solution i posted aliases to each
key (so no copying) and length and max are fast.

It turns out that copying in Perl isn't always as expensive as you
might think. What was that saying about premature optimisation? ;)
for your education benchmark them (on a large hash) and post the
results.

#!/usr/bin/perl -w
use strict;

use Benchmark ':all';

foreach my $keys (2, 10, 100, 1000, 1e4) {
srand 1;
my %hash;
for (1..$keys) {
my $key=join '',map { chr(32+rand 95) } (0..rand 40);
$hash{$key}++;
}
print "\nFor $keys keys\n\n";
cmpthese(1e6/$keys, {
or_keys => sub {
my $l;
$l|=$_ foreach keys %hash; $l=1+length $l;
},
or_each => sub {
my($l, $k);
$l|=$k while($k=each %hash); $l=1+length $l;
},
uri_keys => sub {
my @arr; for ( keys %hash ) {$arr[length $_] = defined;};
my $l = $#arr+1;
}
});
}
__END__

And its output (on a 400MHz PII):

For 2 keys

             Rate uri_keys or_keys or_each
uri_keys  27778/s       --    -26%    -55%
or_keys   37707/s      36%      --    -39%
or_each   61728/s     122%     64%      --

For 10 keys

             Rate uri_keys or_keys or_each
uri_keys  11669/s       --    -37%    -42%
or_keys   18450/s      58%      --     -8%
or_each   20000/s      71%      8%      --

For 100 keys

            Rate uri_keys or_each or_keys
uri_keys  2004/s       --    -15%    -25%
or_each   2370/s      18%      --    -12%
or_keys   2681/s      34%     13%      --

For 1000 keys

           Rate or_each uri_keys or_keys
or_each   241/s      --      -7%    -12%
uri_keys  260/s      8%       --     -5%
or_keys   275/s     14%       6%      --

For 10000 keys

           Rate or_keys uri_keys or_each
or_keys  21.8/s      --      -2%     -2%
uri_keys 22.3/s      2%       --     -0%
or_each  22.3/s      2%       0%      --


It appears that my cute golfing exercise/hack is actually faster than
your array-based approach for small hashes, but you win on larger
hashes. I'm also somewhat surprised that an each-based approach is
slower on a relatively large 1000-key hash.
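(Combining the thread's two observations -- avoid copying the keys and avoid sorting -- an `each`-based loop with a running length compare never materialises the full key list, which may matter for a genuinely huge hash; a sketch over generated keys, not benchmarked above:)

```perl
use strict;
use warnings;

# Iterate with each() so the complete key list is never built in memory;
# keep a running maximum of the key lengths as we go.
my %category = map { ( 'x' x $_ ) => 1 } 1 .. 30;   # longest key: 30 chars

my $len = 0;
while ( my ($key) = each %category ) {
    my $l = length $key;
    $len = $l if $l > $len;
}

print $len + 1, "\n";   # 31
```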
 

Tad McClellan

Peter Corlett said:
If we're playing golf, I'm going to offer:

my $l;
$l|=$_ foreach keys %category; $l=1+length $l;


If you're playing golf you never use "foreach",
it costs 4 strokes for no gain.
 
