J

#### John

HI,

I have this.

@arr = qw(ab a b ab ba ba ab ba);

and I would like to classify the index with the values:

@output = ([0, 3, 6],
[1],
[2],
[4, 5, 7]
);
or in hash

%output = (
0 => [0, 3, 6],
1 => [1],
2 => [2],
4 => [4, 5, 7]
);

on this situation I don't know which approach is the best the array
one or the hash, in order to classify then and also in performance.

What will be you approach?

J

S

#### Sam Holden

HI,

I have this.

@arr = qw(ab a b ab ba ba ab ba);

and I would like to classify the index with the values:

@output = ([0, 3, 6],
[1],
[2],
[4, 5, 7]
);
or in hash

%output = (
0 => [0, 3, 6],
1 => [1],
2 => [2],
4 => [4, 5, 7]
);

on this situation I don't know which approach is the best the array
one or the hash, in order to classify then and also in performance.

It depends on what you want to do with it, and how big the data
will be, and numerous other factors.

For example, if you want to know what the nth unique element is
(where, for example, the 4th unique element in that example is 'ba'),
then the array will allow you to do so in constant time:

\$arr[\$output[\$i][0]]

While the hash will require O(NlogN) time (you'd have to sort the keys).

Then again, if you know the index of the first occurance of an item you
wish to find the list of occurances for (or want to know if that index
is the first occurance of the item), then the hash approach will
allow you to do so in constant time, while the array approach will
require O(logN).

So performance depends on the operations you need (as always).

T

John said:
and I would like to classify the index with the values:
What will be you approach?

My approach would depend on how I plan to _use_ the data structures,
which you have not shared with us...

If performance really matters (it probably doesn't), then try it
both ways and benchmark them.

------------------------
#!/usr/bin/perl
use strict;
use warnings;
use Data:umper;

my @arr = qw(ab a b ab ba ba ab ba);

my @array = by_array(@arr);
print Dumper \@array;

my %hash = by_hash(@arr);
print Dumper \%hash;

sub by_array {
my @array;

my %seen;
foreach my \$i ( 0 .. \$#_ ) {
\$seen{\$_[\$i]} = \$i unless exists \$seen{\$_[\$i]};
push @{ \$array[\$seen{\$_[\$i]}] }, \$i;
}

return @array;
}

sub by_hash {
my %hash;

my %seen;
foreach my \$i ( 0 .. \$#_ ) {
\$seen{\$_[\$i]} = \$i unless exists \$seen{\$_[\$i]};
push @{ \$hash{\$seen{\$_[\$i]}} }, \$i;
}

return %hash;
}