[please don't quote .sigs]
Ok so I looked at the perldoc's and I see this (which looks the best
option for me, I think)
"undef %saw;
This is not necessary. What is necessary, though, is to declare the
variable:
my %saw;
@saw{@in}=();
@out=sort keys %saw;"
If you don't mind could you explain to me how this works?
my %saw;
creates a new (empty) hash called %saw.
@saw{@in} = ();
creates entries in %saw for everything in @in, with undef values. How this
works is rather subtle, a combination of several effects. The first is
hash slices:
my %h = (a => 'b', c => 'd', e => 'f');
print @h{'a', 'c'};
This prints 'bd'. The @ sigil says that the result is a list. The {...}
says that the variable is a hash. The output is the list of values
corresponding to the given keys.
The second is autovivification[0]: this is a fancy way of saying 'if you
pretend a hash key/array index exists, Perl will create it for you'. So
you can say
my %h;
$h{a} = 'b';
and this creates the $h{a} key for you. Combining this with hash slices
gives
my %h;
@h{'a', 'b'} = ('c', 'd');
which is the same as
my %h = (a => 'c', b => 'd');
Finally, you are assigning the empty list, so the keys are just created
with undef for the values.
@out = sort keys %saw;
retrieves the list of keys, sorts it and puts it into @out (which also
should have been declared).
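Putting the pieces together, a complete version of the idiom might look
like this (a minimal sketch; the contents of @in are made up for
illustration):

```perl
use strict;
use warnings;

# Sample input, invented for this example.
my @in = qw(pear apple pear fig apple);

my %saw;
@saw{@in} = ();               # one key per distinct element, every value undef
my @out = sort keys %saw;     # keys are unique, so duplicates have gone

print "@out\n";               # apple fig pear
```

Because the values are all undef, test for membership with exists
$saw{pear} rather than with the value itself.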
[0] OK, this isn't quite what is meant by autovivification; that is more like
'if you pretend a hash key exists and is a reference, Perl will make it
so'. It's closely related, though.
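For comparison, here is a small sketch of autovivification in the
stricter sense described in the footnote. The hash %h and its keys are
invented for the example:

```perl
use strict;
use warnings;

my %h;

# $h{fruits} does not exist yet, but using it as an array reference
# makes Perl create the key with a fresh anonymous array behind it.
push @{ $h{fruits} }, 'apple';
push @{ $h{fruits} }, 'pear';

# The same happens with nested hashes: $h{count} springs into
# existence as a hash reference.
$h{count}{apple}++;

print ref($h{fruits}), "\n";      # ARRAY
print "@{ $h{fruits} }\n";        # apple pear
print $h{count}{apple}, "\n";     # 1
```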
and if possible, and if you have time how I can use this with
a structure like this
"$HOA_protein[$pro_name]=[@pep];"
[FWIW, it's rather confusing to quote Perl code with "". I generally
indent it; another thing some people do is quote it with C<> like in
POD.]
[Given that @HOA_protein is an array, therefore $pro_name is a number,
it would be better called $pro_num or some such. Or did you mean
$HOA_protein{$pro_name} = [@pep];
? If so you'll need to adapt the below; replacing @HOA_protein with
values %HOA_protein throughout should do.]
[All code below is untested. Apologies for any silly errors.]
You need to explain a bit more what you are trying to do here. Assuming
the above statement is in some kind of loop, you will end up with a
different arrayref every time ([] always creates a new arrayref) so the
method above will not find any duplicates. Guessing that you want to
find the set of unique lists of values, so that out of
(a, b, c)
(d, e, f)
(a, b, c)
(f, e, d)
you want to keep
(a, b, c)
(d, e, f)
(f, e, d)
there are three ways to proceed. The most simple-minded is to build a
list of unique values by hand:
my @uniqs;
# checks if two arrays are equal (to one level)
# call with two array refs
sub eq_array {
    my ($l, $r) = @_;
    return unless @$l == @$r;
    for (0..$#$l) {
        return unless $l->[$_] eq $r->[$_];
    }
    return 1;
}
PROTEIN: for my $new (@HOA_protein) {
    for my $got (@uniqs) {
        next PROTEIN if eq_array($new, $got);
    }
    push @uniqs, $new;
}
There are more compact ways of writing this, with grep or
List::Util::first, but this is probably the most comprehensible for a
Perl newbie. You could also use is_deeply from Test::More. Make sure you
really understand what's going on with references there: until you do,
you'll be pretty stuck with multi-level data structures in Perl.
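As an illustration of the more compact style, here is a sketch using
first from List::Util. It repeats eq_array so that it runs on its own,
and the sample data in @HOA_protein is made up from the lists above:

```perl
use strict;
use warnings;
use List::Util qw(first);

# True if the two referenced arrays hold the same strings in the same order.
sub eq_array {
    my ($l, $r) = @_;
    return unless @$l == @$r;
    for (0 .. $#$l) {
        return unless $l->[$_] eq $r->[$_];
    }
    return 1;
}

# Sample data, invented for this example.
my @HOA_protein = ([qw(a b c)], [qw(d e f)], [qw(a b c)], [qw(f e d)]);

my @uniqs;
for my $new (@HOA_protein) {
    # first returns the first element of @uniqs equal to $new, or undef
    push @uniqs, $new unless first { eq_array($new, $_) } @uniqs;
}

print join(' ', map { "(@$_)" } @uniqs), "\n";   # (a b c) (d e f) (f e d)
```

It still compares each new list against everything kept so far, so it
is no faster than the explicit loops; it is just shorter.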
This is clean and neat, but it gives up using a hash to find uniques,
which means it's probably slow. To use a hash, you need to find a way of
converting your arrays into strings, because hash keys are always
strings. One way is to find a character you *know* won't occur in any of
the values, and join the array with that character. Say you choose \034
(a pretty uncommon character); then you can do it like this
my %uniqs;
for my $prot (@HOA_protein) {
    my $string = join "\034", @$prot;
    $uniqs{$string} = 1;
}
my @uniqs = keys %uniqs;
or, more compactly and also probably faster
my %uniqs;
@uniqs{ map { join "\034", @$_ } @HOA_protein } = ();
my @uniqs = keys %uniqs;
Notice how this looks quite like the extract you started with: the
difference is we need to convert (map) the values in the array to an
appropriate form.
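Note that the keys of %uniqs are the joined strings, not the original
arrays. If you want the array refs themselves back, in their original
order, one common variation is to filter with grep and a %seen hash
instead. A sketch, assuming made-up sample data and that "\034" never
occurs in it:

```perl
use strict;
use warnings;

# Sample data, invented for this example.
my @HOA_protein = ([qw(a b c)], [qw(d e f)], [qw(a b c)], [qw(f e d)]);

my %seen;
# The post-increment returns 0 the first time a joined string is seen,
# so only the first array ref for each distinct list passes the grep.
my @uniqs = grep { !$seen{ join("\034", @$_) }++ } @HOA_protein;

print join(' ', map { "(@$_)" } @uniqs), "\n";   # (a b c) (d e f) (f e d)
```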
Both these solutions suffer from depending on details of your data
structure. Both assume that what you have is an array of arrays of
strings; the second assumes that none of these strings contain "\034".
The Storable module gives you a way to convert a data structure of
arbitrary complexity into a string, fast, in such a way as to preserve
the whole structure. You can use it like this
use Storable qw/freeze/;
$Storable::canonical = 1; # needed when comparing frozen hashes
my %uniqs;
@uniqs{ map freeze($_), @HOA_protein } = ();
my @uniqs = keys %uniqs;
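Here the keys of %uniqs are frozen strings, so to get data structures
back you have to thaw them again. A self-contained sketch (thaw is also
exported by Storable on request; the sample data is invented):

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

$Storable::canonical = 1;    # needed when comparing frozen hashes

# Sample data, invented for this example.
my @HOA_protein = ([qw(a b c)], [qw(d e f)], [qw(a b c)], [qw(f e d)]);

my %uniqs;
@uniqs{ map { freeze($_) } @HOA_protein } = ();

# thaw turns each frozen string back into a (new) array ref.
my @uniqs = map { thaw($_) } keys %uniqs;

# Hash order is not predictable, so sort before printing.
my @shown = sort map { "(@$_)" } @uniqs;
print "@shown\n";            # (a b c) (d e f) (f e d)
```

The thawed arrays are copies, not the original refs, and they come back
in no particular order.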
I'm sorry it's just HOA's and references are still a bit new to me
That's fine. They are quite a complicated topic: once you understand all
the code in this article, you'll be well on your way to being a decent
Perl programmer. Feel free to ask about anything you can't follow
(although be warned that some of it will take quite a lot of thinking
about, for which there is no substitute).
Ben