HOA redundancy in array.

PB0711

Hello,
So I have a hash of arrays (HOA) and I would like to remove all of the
redundancy from the array.
So I did this
"for my $i (0 .. $#{ $HOA_protein{$pro_name} }) {
    if ($HOA_protein{$pro_name}[$i] eq
        $HOA_protein{$pro_name}[$i+1]) {
        $HOA_protein{$pro_name}[$i] = undef;
    }
}"
Please excuse any grammar errors or misspellings; I retyped this on
Windows from Linux, network problems today :(
I was wondering if anyone knows of a CPAN module that I can wave a
magic wand with to get rid of it.
It could be that the pieces in the array are not actually the same and
only look the same by eye. But the HOA was generated by REs, so.....
Thanks for any help.
PB
 
Ben Morrow

Quoth "PB0711":
> Hello,
> So I have a hash of arrays (HOA) and I would like to remove all of the
> redundancy from the array.
> So I did this
> "for my $i (0 .. $#{ $HOA_protein{$pro_name} }) {
>     if ($HOA_protein{$pro_name}[$i] eq
>         $HOA_protein{$pro_name}[$i+1]) {
>         $HOA_protein{$pro_name}[$i] = undef;
>     }
> }"
> Please excuse any grammar errors or misspellings; I retyped this on
> Windows from Linux, network problems today :(
> I was wondering if anyone knows of a CPAN module that I can wave a
> magic wand with to get rid of it.

List::MoreUtils::uniq
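
A minimal sketch of how that might be applied to every array in the hash, using made-up sample data. It assumes List::Util 1.45 or later (bundled with Perl 5.26 and up), which exports a uniq of its own; List::MoreUtils::uniq is called the same way. Unlike the loop quoted above, uniq drops every repeat, not only adjacent ones, and keeps the first occurrence of each value in its original position.

```perl
use strict;
use warnings;
use List::Util qw(uniq);   # assumption: List::Util 1.45+; List::MoreUtils::uniq is called the same way

# hypothetical sample data standing in for the real %HOA_protein
my %HOA_protein = (
    protA => [qw(PEPK PEPK LLSR PEPK)],
    protB => [qw(AAK GGR)],
);

# replace each array with a copy that has the repeats removed
for my $pro_name (keys %HOA_protein) {
    $HOA_protein{$pro_name} = [ uniq @{ $HOA_protein{$pro_name} } ];
}

for my $pro_name (sort keys %HOA_protein) {
    printf "%s: %s\n", $pro_name, join ' ', @{ $HOA_protein{$pro_name} };
}
# prints:
#   protA: PEPK LLSR
#   protB: AAK GGR
```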

Ben
 
John Bokma

PB0711 said:
> Hello,
> So I have a hash of arrays (HOA) and I would like to remove all of the
> redundancy from the array.
> So I did this
> "for my $i (0 .. $#{ $HOA_protein{$pro_name} }) {
>     if ($HOA_protein{$pro_name}[$i] eq
>         $HOA_protein{$pro_name}[$i+1]) {
>         $HOA_protein{$pro_name}[$i] = undef;
>     }
> }"


If your problem is: how to remove duplicates from an array, see:

perldoc -q duplicate
 
John Bokma

PB0711 said:
> Thank you, sorry didn't look at manpages :|

Most of the time it's the first place you should look :-D That's also
because the information there is probably the most accurate you can find,
and experienced people here are usually going to refer to it anyway (give
someone a fish vs. teach him/her how to fish :-D ).
 
PB0711

John said:
> Most of the time it's the first place you should look :-D That's also
> because the information there is probably the most accurate you can find,
> and experienced people here are usually going to refer to it anyway (give
> someone a fish vs. teach him/her how to fish :-D ).

OK, so I looked at the perldocs and I see this (which looks like the best
option for me, I think):
"undef %saw;
@saw{@in}=();
@out=sort keys %saw;"

If you don't mind, could you explain to me how this works?
And if possible, and if you have time, how can I use this with
a structure like this?
"$HOA_protein[$pro_name]=[@pep];"
I'm sorry, it's just that HOAs and references are still a bit new to me :)
 
Ben Morrow

[please don't quote .sigs]
> OK, so I looked at the perldocs and I see this (which looks like the best
> option for me, I think):
> "undef %saw;

This is not necessary. What is necessary, though, is to declare the
variable:

    my %saw;

> @saw{@in}=();
> @out=sort keys %saw;"
>
> If you don't mind, could you explain to me how this works?

    my %saw;

creates a new (empty) hash called %saw.

    @saw{@in} = ();

creates an entry in %saw for every element of @in, each with an undefined
value. How this works is rather subtle, a combination of several effects.
The first is hash slices:

    my %h = (a => 'b', c => 'd', e => 'f');
    print @h{'a', 'c'};

This prints 'bd'. The @ sigil says that the result is a list. The {...}
says that the variable is a hash. The output is the list of values
corresponding to the given keys.

The second is autovivification[0]: this is a fancy way of saying 'if you
pretend a hash key/array index exists, Perl will create it for you'. So
you can say

    my %h;
    $h{a} = 'b';

and this creates the $h{a} key for you. Combining this with hash slices
gives

    my %h;
    @h{'a', 'b'} = ('c', 'd');

which is the same as

    my %h = (a => 'c', b => 'd');

Finally, you are assigning the empty list, so the keys are just created
with undef for the values. Since a hash can hold each key only once, any
repeated element of @in collapses into a single key.

    @out = sort keys %saw;

retrieves the list of keys, sorts it and puts it into @out (which also
should have been declared).

[0] OK, this isn't quite what is meant by autoviv.. that is more like
'if you pretend a hash key exists and is a reference, Perl will make it
so'. It's closely related, though.
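
Putting those pieces together for the hash of arrays from the original post might look like the sketch below. The sample data is made up, and it assumes the structure really is %HOA_protein keyed by protein name. Note that sorting the keys discards the original order of the elements.

```perl
use strict;
use warnings;

# hypothetical sample data standing in for the real %HOA_protein
my %HOA_protein = (
    protA => [qw(PEPK LLSR PEPK AAK)],
);

for my $pro_name (keys %HOA_protein) {
    my %saw;                                      # fresh, empty hash for each array
    @saw{ @{ $HOA_protein{$pro_name} } } = ();    # hash slice: one key per distinct value
    $HOA_protein{$pro_name} = [ sort keys %saw ]; # store the unique values back
}

print "@{ $HOA_protein{protA} }\n";   # prints: AAK LLSR PEPK
```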

> And if possible, and if you have time, how can I use this with
> a structure like this?
> "$HOA_protein[$pro_name]=[@pep];"

[FWIW, it's rather confusing to quote Perl code with "". I generally
indent it; another thing some people do is quote it with C<> like in
POD.]

[Given that @HOA_protein is an array, therefore $pro_name is a number,
it would be better called $pro_num or some such. Or did you mean

    $HOA_protein{$pro_name} = [@pep];

? If so you'll need to adapt the below; replacing @HOA_protein with
values %HOA_protein throughout should do.]

[All code below is untested. Apologies for any silly errors :)]

You need to explain a bit more what you are trying to do here. Assuming
the above statement is in some kind of loop, you will end up with a
different arrayref every time ([] always creates a new arrayref) so the
method above will not find any duplicates. Guessing that you want to
find the set of unique lists of values, so that out of

    (a, b, c)
    (d, e, f)
    (a, b, c)
    (f, e, d)

you want to keep

    (a, b, c)
    (d, e, f)
    (f, e, d)

there are three ways to proceed. The most simple-minded is to build a
list of unique values by hand:

    my @uniqs;

    # checks if two arrays are equal (to one level)
    # call with two array refs

    sub eq_array {
        my ($l, $r) = @_;
        return unless @$l == @$r;
        for (0..$#$l) {
            return unless $l->[$_] eq $r->[$_];
        }
        return 1;
    }

    PROTEIN: for my $new (@HOA_protein) {
        for my $got (@uniqs) {
            next PROTEIN if eq_array($new, $got);
        }
        push @uniqs, $new;
    }

There are more compact ways of writing this, with grep or
List::Util::first, but this is probably the most comprehensible for a
Perl newbie. You could also use is_deeply from Test::More. Make sure you
really understand what's going on with references there: until you do,
you'll be pretty stuck with multi-level data structures in Perl.
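
For comparison, one possible shape of the more compact version with List::Util::first is sketched here. It reuses the same one-level comparison and the four example lists from earlier; the data is illustrative only. first returns the first element for which the block is true, or undef if there is none, so the new list is pushed only when no equal list has been kept yet.

```perl
use strict;
use warnings;
use List::Util qw(first);

# true if two array refs hold equal strings in the same order (one level only)
sub eq_array {
    my ($l, $r) = @_;
    return unless @$l == @$r;
    for (0 .. $#$l) {
        return unless $l->[$_] eq $r->[$_];
    }
    return 1;
}

# illustrative data: the four lists used as the example earlier
my @HOA_protein = ([qw(a b c)], [qw(d e f)], [qw(a b c)], [qw(f e d)]);

my @uniqs;
for my $new (@HOA_protein) {
    # keep $new only if nothing already in @uniqs compares equal to it
    push @uniqs, $new unless first { eq_array($new, $_) } @uniqs;
}

print "(@$_)\n" for @uniqs;
# prints:
#   (a b c)
#   (d e f)
#   (f e d)
```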

This is clean and neat, but it gives up using a hash to find uniques,
which means it's probably slow. To use a hash, you need to find a way of
converting your arrays into strings, because hash keys are always
strings. One way is to find a character you *know* won't occur in any of
the values, and join the array with that character. Say you choose \034
(a pretty uncommon character); then you can do it like this

    my %uniqs;

    for my $prot (@HOA_protein) {
        my $string = join "\034", @$prot;
        $uniqs{$string} = 1;
    }

    my @uniqs = keys %uniqs;

or, more compactly and also probably faster

    my %uniqs;
    @uniqs{ map { join "\034", @$_ } @HOA_protein } = ();
    my @uniqs = keys %uniqs;

Notice how this looks quite like the extract you started with: the
difference is we need to convert (map) the values in the array to an
appropriate form.
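
One thing to watch: keys %uniqs returns the joined strings, not array refs. A hedged sketch of two ways to get arrays back, again with illustrative data and assuming "\034" never occurs in the values: either filter the original list with a "seen" hash, which also keeps the original order, or split each key apart again.

```perl
use strict;
use warnings;

# illustrative data: the four lists used as the example earlier
my @HOA_protein = ([qw(a b c)], [qw(d e f)], [qw(a b c)], [qw(f e d)]);

my $sep = "\034";   # assumed never to occur in any of the values

# keep an array ref only the first time its joined form is seen
my %seen;
my @uniqs = grep { !$seen{ join($sep, @$_) }++ } @HOA_protein;

# alternatively, rebuild arrays from the hash keys (hash order, so unordered);
# the -1 limit stops split from dropping trailing empty fields
my @rebuilt = map { [ split /\Q$sep\E/, $_, -1 ] } keys %seen;

print "(@$_)\n" for @uniqs;
# prints:
#   (a b c)
#   (d e f)
#   (f e d)
```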

Both these solutions suffer from depending on details of your data
structure. Both assume that what you have is an array of arrays of
strings; the second assumes that none of these strings contain "\034".
The Storable module gives you a way to convert a data structure of
arbitrary complexity into a string, fast, in such a way as to preserve
the whole structure. You can use it like this

    use Storable qw/freeze/;

    $Storable::canonical = 1; # needed when comparing frozen hashes
    my %uniqs;
    @uniqs{ map freeze($_), @HOA_protein } = ();
    my @uniqs = keys %uniqs;
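
As with the join version, the keys here are frozen strings rather than data structures. A small sketch of turning them back with Storable's thaw, using illustrative data; the order of the result is whatever order the hash returns its keys in.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

$Storable::canonical = 1;   # stable key order when freezing hashes

# illustrative data: the four lists used as the example earlier
my @HOA_protein = ([qw(a b c)], [qw(d e f)], [qw(a b c)], [qw(f e d)]);

my %uniqs;
@uniqs{ map { freeze($_) } @HOA_protein } = ();

# each key is a serialised copy of one list; thaw rebuilds a new array ref from it
my @uniqs = map { thaw($_) } keys %uniqs;

print scalar(@uniqs), " unique lists\n";   # prints: 3 unique lists
```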

> I'm sorry, it's just that HOAs and references are still a bit new to me :)

That's fine. They are quite a complicated topic: once you understand all
the code in this article, you'll be well on your way to being a decent
Perl programmer. Feel free to ask about anything you can't follow
(although be warned that some of it will take quite a lot of thinking
about, for which there is no substitute).

Ben
 
