Efficiently de-duping an array

Discussion in 'Perl Misc' started by Dan Otterburn, Aug 24, 2007.

  1. I have an array of a number of items, some of which are duplicates. I
    need to "de-dupe" the array, keeping the item with the lowest index.

    my @fruits = qw(
    apple
    apple
    pear
    banana
    pear
    apple
    banana
    plum
    plum
    apple
    plum
    peach
    kiwi
    pear
    plum
    banana
    cherry
    );

    The "apple" I want is $fruits[0], the "pear" $fruits[2] etc...

    My current solution is:

    my @fruits_deduped;
    while (my $fruit = pop @fruits) {
    next if grep { $_ eq $fruit } @fruits;
    push @fruits_deduped, $fruit;
    }
    @fruits = reverse @fruits_deduped;

    This seems to be a lot of work, is there a better way to do this?

    --
    Dan Otterburn <>
     
    Dan Otterburn, Aug 24, 2007
    #1
    1. Advertising

  2. Dan Otterburn wrote:
    > I have an array of a number of items, some of which are duplicates. I
    > need to "de-dupe" the array, keeping the item with the lowest index.


    <snip>

    > My current solution is:
    >
    > my @fruits_deduped;
    > while (my $fruit = pop @fruits) {
    > next if grep { $_ eq $fruit } @fruits;
    > push @fruits_deduped, $fruit;
    > }
    > @fruits = reverse @fruits_deduped;
    >
    > This seems to be a lot of work, is there a better way to do this?


    Use a hash.

    my ( @fruits_deduped, %seen );
    while ( my $fruit = shift @fruits ) {
    push @fruits_deduped, $fruit unless $seen{$fruit}++;
    }

    See also "perldoc -q duplicate".

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 24, 2007
    #2
    1. Advertising

  3. On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:

    > Use a hash.
    >
    > my ( @fruits_deduped, %seen );
    > while ( my $fruit = shift @fruits ) {
    > push @fruits_deduped, $fruit unless $seen{$fruit}++;
    > }


    Many thanks. Just to clarify my understanding: this works because
    "unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
    each different $fruit - isn't auto-vivified until *after* "unless" has
    tested? i.e. it is short-hand for:

    while ( my $fruit = shift @fruits ) {
    if ( !$seen{$fruit} ) {
    push @fruits_deduped, $fruit;
    $seen{$fruit} += 1;
    }
    }

    > See also "perldoc -q duplicate".


    Apologies, I should have been able to find this without posting.

    --
    Dan Otterburn <>
     
    Dan Otterburn, Aug 24, 2007
    #3
  4. Dan Otterburn <> wrote:

    > I have an array of a number of items, some of which are duplicates. I
    > need to "de-dupe" the array,



    Your Question is Asked Frequently:

    perldoc -q duplicate

    How can I remove duplicate elements from a list or array?


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad McClellan, Aug 24, 2007
    #4
  5. Dan Otterburn wrote:
    > On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:
    >> Use a hash.
    >>
    >> my ( @fruits_deduped, %seen );
    >> while ( my $fruit = shift @fruits ) {
    >> push @fruits_deduped, $fruit unless $seen{$fruit}++;
    >> }

    >
    > Many thanks. Just to clarify my understanding: this works because
    > "unless" binds tighter than "++" so $seen{$fruit} - on the first pass for
    > each different $fruit - isn't auto-vivified until *after* "unless" has
    > tested?


    Well, it's rather about what $seen{$fruit}++ _returns_; please read
    about auto-increment in "perldoc perlop".

    > i.e. it is short-hand for:
    >
    > while ( my $fruit = shift @fruits ) {
    > if ( !$seen{$fruit} ) {
    > push @fruits_deduped, $fruit;
    > $seen{$fruit} += 1;
    > }
    > }


    Yes, almost. (Unlike my code, your code doesn't keep incrementing the
    hash values.)

    >> See also "perldoc -q duplicate".

    >
    > Apologies, I should have been able to find this without posting.


    Yes. ;-) Apology accepted.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 24, 2007
    #5
  6. Dan Otterburn <> wrote:
    > On Fri, 24 Aug 2007 02:42:03 +0200, Gunnar Hjalmarsson wrote:
    >
    >> Use a hash.
    >>
    >> my ( @fruits_deduped, %seen );
    >> while ( my $fruit = shift @fruits ) {
    >> push @fruits_deduped, $fruit unless $seen{$fruit}++;
    >> }

    >
    > Many thanks. Just to clarify my understanding: this works because
    > "unless" binds tighter than "++"

    ^^^^^^^^^^^^^

    "unless" is not an operator, so talking about its precedence makes
    no sense.


    > each different $fruit - isn't auto-vivified until *after* "unless" has
    > tested?



    That part is accurate though.


    > i.e. it is short-hand for:
    >
    > while ( my $fruit = shift @fruits ) {
    > if ( !$seen{$fruit} ) {
    > push @fruits_deduped, $fruit;
    > $seen{$fruit} += 1;
    > }
    > }
    >
    >> See also "perldoc -q duplicate".



    ....then do it with a grep():

    my %seen;
    @fruits = grep !$seen{$_}++, @fruits;

    And it even reads kind of Englishy "grep not seen fruits" :)


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad McClellan, Aug 24, 2007
    #6
  7. On 24 Aug, 02:31, Tad McClellan <> wrote:

    > Your Question is Asked Frequently:


    Thanks to both of you for being gentle and taking the time to answer
    (and explain the answer to) a question that should never have been
    asked.

    We learn by our mistakes - and I have made plenty here - so, if it is
    any consolation, I have learnt more than I would have done had I found
    the FAQ in the first place. I will endeavour not to make the same
    mistakes twice!
     
    Dan Otterburn, Aug 24, 2007
    #7
  8. Dan Otterburn

    Dr.Ruud Guest

    Dan Otterburn schreef:

    > We learn by our mistakes - and I have made plenty here - so, if it is
    > any consolation, I have learnt more than I would have done had I found
    > the FAQ in the first place. I will endeavour not to make the same
    > mistakes twice!


    don't_be_too_embarassed() if $seen($mistake}++;

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Aug 24, 2007
    #8
  9. Dan Otterburn <> wrote:
    > On 24 Aug, 02:31, Tad McClellan <> wrote:
    >
    >> Your Question is Asked Frequently:

    >
    > Thanks to both of you for being gentle and taking the time to answer
    > (and explain the answer to) a question that should never have been
    > asked.
    >
    > We learn by our mistakes - and I have made plenty here - so, if it is
    > any consolation, I have learnt more than I would have done had I found
    > the FAQ in the first place. I will endeavour not to make the same
    > mistakes twice!



    Making mistakes in a public forum is a very good way to "internalize"
    a lesson. You're not likely to forget what has been learned.

    I've "internalized" a bunch of stuff myself. :)


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad McClellan, Aug 25, 2007
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Adam Hartshorne
    Replies:
    7
    Views:
    439
    Ivan Vecerina
    Feb 21, 2005
  2. b83503104
    Replies:
    3
    Views:
    3,956
    Al Bowers
    May 21, 2004
  3. Frederick Gotham
    Replies:
    1
    Views:
    314
    Pete Becker
    Nov 15, 2006
  4. Thomas Jollans
    Replies:
    5
    Views:
    1,721
    Thomas Jollans
    Jun 14, 2010
  5. Trans

    Duping a class causes error

    Trans, Jan 27, 2005, in forum: Ruby
    Replies:
    21
    Views:
    265
    Trans
    Jan 28, 2005
Loading...

Share This Page