Efficiently de-duping an array

Dan Otterburn · Aug 24, 2007

I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array, keeping the item with the lowest index.

my @fruits = qw(
apple
apple
pear
banana
pear
apple
banana
plum
plum
apple
plum
peach
kiwi
pear
plum
banana
cherry
);

The "apple" I want is $fruits[0], the "pear" $fruits[2] etc...

My current solution is:

my @fruits_deduped;
while (my $fruit = pop @fruits) {
    next if grep { $_ eq $fruit } @fruits;
    push @fruits_deduped, $fruit;
}
@fruits = reverse @fruits_deduped;

This seems to be a lot of work. Is there a better way to do this?

Gunnar Hjalmarsson · Aug 24, 2007

Dan said:
I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array, keeping the item with the lowest index.

My current solution is:

my @fruits_deduped;
while (my $fruit = pop @fruits) {
    next if grep { $_ eq $fruit } @fruits;
    push @fruits_deduped, $fruit;
}
@fruits = reverse @fruits_deduped;

This seems to be a lot of work. Is there a better way to do this?

Use a hash.

my ( @fruits_deduped, %seen );
while ( my $fruit = shift @fruits ) {
    push @fruits_deduped, $fruit unless $seen{$fruit}++;
}

See also "perldoc -q duplicate".
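A minimal sketch of the same idea that leaves the input array intact, assuming a plain foreach loop is acceptable in place of the shift loop above. The while/shift form empties @fruits as it goes and stops early if an element is false, such as "0" or the empty string. The sample list here, including the "0", is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the "0" shows that a false value survives.
my @items = qw(apple apple pear 0 banana pear 0 plum);

my ( @deduped, %seen );
for my $item (@items) {
    # $seen{$item}++ returns the old count: 0 (false) the first time,
    # 1 or more afterwards, so only the first occurrence is pushed.
    push @deduped, $item unless $seen{$item}++;
}

print "@deduped\n";    # apple pear 0 banana plum
```

Iterating in index order is what keeps the lowest-index copy of each item.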

Dan Otterburn · Aug 24, 2007

Use a hash.

my ( @fruits_deduped, %seen );
while ( my $fruit = shift @fruits ) {
    push @fruits_deduped, $fruit unless $seen{$fruit}++;
}

Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
each different $fruit - isn't auto-vivified until *after* "unless" has
tested? i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
        push @fruits_deduped, $fruit;
        $seen{$fruit} += 1;
    }
}

See also "perldoc -q duplicate".

Apologies, I should have been able to find this without posting.

Tad McClellan · Aug 24, 2007

Dan Otterburn said:
I have an array of a number of items, some of which are duplicates. I
need to "de-dupe" the array,

Your Question is Asked Frequently:

perldoc -q duplicate

How can I remove duplicate elements from a list or array?

Gunnar Hjalmarsson · Aug 24, 2007

Dan said:
Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
each different $fruit - isn't auto-vivified until *after* "unless" has
tested?

Well, it's rather about what $seen{$fruit}++ _returns_; please read
about auto-increment in "perldoc perlop".
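A small sketch of that point, using a made-up key: the post-increment hands back the value the element had before the increment, and that returned value is what "unless" tests. perlop notes that an undef value is treated as 0 here, so the first call returns 0.

```perl
use strict;
use warnings;

my %seen;

# Post-increment returns the value from *before* the increment.
my $first  = $seen{apple}++;    # 0 (false); $seen{apple} is now 1
my $second = $seen{apple}++;    # 1 (true);  $seen{apple} is now 2

print "first: $first, second: $second, stored: $seen{apple}\n";
# first: 0, second: 1, stored: 2
```

So the element is already incremented by the time the condition is checked; the test just sees the old value.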

i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
        push @fruits_deduped, $fruit;
        $seen{$fruit} += 1;
    }
}

Yes, almost. (Unlike my code, your code doesn't keep incrementing the
hash values.)
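To make the difference concrete, here is a sketch that runs both forms over the same made-up list (with foreach instead of shift, so the list can be reused). The de-duped output is the same; only the counts left behind in the hash differ.

```perl
use strict;
use warnings;

my @items = qw(apple pear apple apple);    # hypothetical sample

# Form 1: statement modifier with post-increment; counts every occurrence.
my ( @out1, %seen1 );
for my $item (@items) {
    push @out1, $item unless $seen1{$item}++;
}

# Form 2: explicit if block; increments only the first time an item is seen.
my ( @out2, %seen2 );
for my $item (@items) {
    if ( !$seen2{$item} ) {
        push @out2, $item;
        $seen2{$item} += 1;
    }
}

print "@out1 | @out2\n";                     # apple pear | apple pear
print "$seen1{apple} vs $seen2{apple}\n";    # 3 vs 1
```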

Apologies, I should have been able to find this without posting.

Yes. ;-) Apology accepted.

Tad McClellan · Aug 24, 2007

Dan Otterburn said:
Many thanks. Just to clarify my understanding: this works because
"unless" binds tighter than "++"
         ^^^^^^^^^^^^^

"unless" is not an operator, so talking about its precedence makes
no sense.

each different $fruit - isn't auto-vivified until *after* "unless" has
tested?

That part is accurate though.

i.e. it is short-hand for:

while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
        push @fruits_deduped, $fruit;
        $seen{$fruit} += 1;
    }
}

....then do it with a grep():

my %seen;
@fruits = grep !$seen{$_}++, @fruits;

And it even reads kind of Englishy: "grep not seen fruits".
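If the idiom is needed in more than one place, one option is to wrap it in a small subroutine so that %seen starts empty on every call. This is only a sketch; the name uniq_first is made up for illustration. (Newer releases of List::Util, 1.45 and later as far as I know, export a uniq function with the same keep-the-first behaviour, but that postdates this thread.)

```perl
use strict;
use warnings;

# Hypothetical helper: drops later duplicates and keeps the first
# (lowest-index) occurrence of each value, compared as strings.
sub uniq_first {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @fruits = qw(apple apple pear banana pear apple banana plum);
@fruits = uniq_first(@fruits);

print "@fruits\n";    # apple pear banana plum
```

Because grep walks the list in order, the survivors come out in their original relative order.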

Dan Otterburn · Aug 24, 2007

Your Question is Asked Frequently:

Thanks to both of you for being gentle and taking the time to answer
(and explain the answer to) a question that should never have been
asked.

We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!

Dr.Ruud · Aug 24, 2007

Dan Otterburn wrote:

We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!

don't_be_too_embarrassed() if $seen{$mistake}++;

Tad McClellan · Aug 25, 2007

Dan Otterburn said:
Thanks to both of you for being gentle and taking the time to answer
(and explain the answer to) a question that should never have been
asked.

We learn by our mistakes - and I have made plenty here - so, if it is
any consolation, I have learnt more than I would have done had I found
the FAQ in the first place. I will endeavour not to make the same
mistakes twice!

Making mistakes in a public forum is a very good way to "internalize"
a lesson. You're not likely to forget what has been learned.

I've "internalized" a bunch of stuff myself.
