deleting duplicates in array using references

Discussion in 'Perl Misc' started by billb, Jul 30, 2007.

  1. billb

    billb Guest

    i have a multidimensional array, but i want to delete duplicate
    entries based on the first element of each 'row'.

    my array is:

    my @array = (
        [UK9004411, A140, B, 0.040] ,
        [UK0030239, H7140, H, 0.030] ,
        [UK0030239, S1393, M1, 0.030] ,
        [UK0012821, H4030, H, 0.010] ,
        [UK0012821, H4060, H, 0.010]
    );

    and I want to end up with

    (
        [UK9004411, A140, B, 0.040] ,
        [UK0030239, H7140, H, 0.030] ,
        [UK0012821, H4030, H, 0.010]
    )

    (no real preference in which row is dropped...just on a first come
    first served basis.)

    i.e. take out the rows with duplicate codes, based on the first
    element of each row, $array[$row]->[0].

    I looked into the splice() function, removing rows by index, but I'm
    not sure whether this is the best way or what the syntax should be.
    Something like this?

    splice (@array , $row, 1);

    thanks.
    billb, Jul 30, 2007
    #1

  2. billb

    Paul Lalli Guest

    On Jul 30, 2:37 pm, billb <> wrote:
    > i have a multidimensional array, but i want to delete duplicate
    > entries based on the first element of each 'row'.


    $ perldoc -q duplicate
    Found in /opt2/Perl5_8_4/lib/perl5/5.8.4/pod/perlfaq4.pod
    How can I remove duplicate elements from a list or array?

    > my array is:
    >
    > my @array = ( [UK9004411, A140, B, 0.040] , [UK0030239, H7140, H,
    > 0.030] , [UK0030239, S1393, M1, 0.030] , [UK0012821, H4030, H,
    > 0.010] , [UK0012821, H4060, H, 0.010] );
    >
    > and I want to end up with
    > ( [UK9004411, A140, B, 0.040] , [UK0030239, H7140, H, 0.030] ,
    > [UK0012821, H4030, H, 0.010] )
    >
    > (no real preference in which row is dropped...just on a first come
    > first served basis.)
    >
    > i.e. take out the duplicate codes based on the first element of each
    > row $array[$row] -> [0]


    $ perl -MData::Dumper -e'
    my @array = (
        [UK9004411, A140, B, 0.040] ,
        [UK0030239, H7140, H, 0.030] ,
        [UK0030239, S1393, M1, 0.030] ,
        [UK0012821, H4030, H, 0.010] ,
        [UK0012821, H4060, H, 0.010] ,
    );
    my %seen;
    my @nodups = grep { !$seen{$_->[0]}++ } @array;
    print Dumper(\@nodups);
    '
    $VAR1 = [
              [
                'UK9004411',
                'A140',
                'B',
                '0.04'
              ],
              [
                'UK0030239',
                'H7140',
                'H',
                '0.03'
              ],
              [
                'UK0012821',
                'H4030',
                'H',
                '0.01'
              ]
            ];
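
    The one-liner above relies on barewords (UK9004411, A140, ...), which
    only compile when "use strict" is off. As a rough sketch of the same
    idea in a strict-safe script, assuming it is acceptable to store every
    column as a string via qw():

```perl
use strict;
use warnings;
use Data::Dumper;

# Same rows as in the one-liner, quoted with qw() so they compile under
# "use strict". Every column is a string here, so '0.040' keeps its
# trailing zero instead of being printed as '0.04'.
my @array = (
    [qw(UK9004411 A140  B  0.040)],
    [qw(UK0030239 H7140 H  0.030)],
    [qw(UK0030239 S1393 M1 0.030)],
    [qw(UK0012821 H4030 H  0.010)],
    [qw(UK0012821 H4060 H  0.010)],
);

my %seen;    # first-column values encountered so far
my @nodups = grep { !$seen{ $_->[0] }++ } @array;

print Dumper(\@nodups);
```

    The grep line is unchanged from the one-liner; only the data
    declaration differs.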

    > i looked into splice() function based on the index but not sure this
    > is the best way or the syntax for this?
    >
    > splice (@array , $row, 1); ?


    splice() is fine for removing the elements once you know which ones
    you want to remove, but it's useless for actually finding which
    elements to remove.
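
    To illustrate that division of labour, here is a hedged sketch of an
    in-place variant: a hash finds the duplicate indices, and splice()
    then removes them. The name @dup_idx is made up for this example, and
    the rows are quoted with qw() so the script runs under "use strict".

```perl
use strict;
use warnings;

my @array = (
    [qw(UK9004411 A140  B  0.040)],
    [qw(UK0030239 H7140 H  0.030)],
    [qw(UK0030239 S1393 M1 0.030)],
    [qw(UK0012821 H4030 H  0.010)],
    [qw(UK0012821 H4060 H  0.010)],
);

# Pass 1: collect the index of every row whose first element has
# already been seen earlier in the array.
my %seen;
my @dup_idx = grep { $seen{ $array[$_][0] }++ } 0 .. $#array;

# Pass 2: splice from the highest index down, so that removing one row
# does not shift the positions of rows still waiting to be removed.
splice( @array, $_, 1 ) for reverse @dup_idx;

print join( ' ', @$_ ), "\n" for @array;
```

    Unless @array really has to be modified in place, building a new
    array with grep is the shorter of the two approaches.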

    Paul Lalli
    Paul Lalli, Jul 30, 2007
    #2

  3. billb

    billb Guest

    On 30 Jul, 19:46, Paul Lalli <> wrote:
    > my %seen;
    > my @nodups = grep { !$seen{$_->[0]}++ } @array;


    ah, very simple and very fast as well! I'll have to understand how
    this is working. It uses a hash I see. Many thanks.
    billb, Jul 30, 2007
    #3
  4. billb

    Paul Lalli Guest

    On Jul 30, 5:18 pm, billb <> wrote:
    > On 30 Jul, 19:46, Paul Lalli <> wrote:
    > > On Jul 30, 2:37 pm, billb <> wrote:
    >
    > > > i have a multidimensional array, but i want to delete duplicate
    > > > entries based on the first element of each 'row'.


    > > > my @array = ( [UK9004411, A140, B, 0.040] , [UK0030239, H7140, H,
    > > > 0.030] , [UK0030239, S1393, M1, 0.030] , [UK0012821, H4030, H,
    > > > 0.010] , [UK0012821, H4060, H, 0.010] );

    >
    > > > and I want to end up with
    > > > ( [UK9004411, A140, B, 0.040] , [UK0030239, H7140, H, 0.030] ,
    > > > [UK0012821, H4030, H, 0.010] )


    > > my %seen;
    > > my @nodups = grep { !$seen{$_->[0]}++ } @array;


    > ah, very simple and very fast as well! I'll have to understand how
    > this is working. It uses a hash I see.


    It helps if you expand it out to remove all the "shortcuts":

    my %seen;
    my @nodups;
    foreach my $elem (@array) {
        if (! $seen{$elem->[0]}) {
            push @nodups, $elem;
        }
        $seen{$elem->[0]}++;
    }

    So we're looping through the 2d array, and we check to see if the
    first element of the current array reference has been "seen" yet. If
    not, we add this array reference to our list of no duplicates. Then
    we increment the number of times we've "seen" this element, so that if
    the same element is seen again, we won't add it next time.

    The shortcuts:
    * A foreach-if-push combination is equivalent to grep(). grep selects
      only those elements from a list for which the if condition holds.
    * In the grep, $_ is used to represent the current element of the
      array (rather than $elem as in the above expansion).
    * The ++ operator is applied to the same expression as when we're
      checking the current value of $seen{$_->[0]}, because a post-fix ++
      increments the value *after* returning that value. That is:
          $x = $foo++;
      is equivalent to:
          $x = $foo;
          $foo++;

      In contrast,
          $x = ++$foo;
      is equivalent to
          $foo++;
          $x = $foo;
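
    The difference between the two forms of ++ can be seen directly in
    the grep idiom. The following is only an illustrative sketch; the
    flat list @codes and the hash names %post and %pre are invented for
    the example.

```perl
use strict;
use warnings;

my @codes = qw(UK9004411 UK0030239 UK0030239 UK0012821 UK0012821);

# Post-increment: the test sees the old count (undef, which is false,
# the first time), so the first occurrence of each code passes.
my %post;
my @first_only = grep { !$post{$_}++ } @codes;

# Pre-increment: the count is already at least 1 when it is tested,
# so the negation is always false and nothing passes.
my %pre;
my @none = grep { !++$pre{$_} } @codes;

print "post-increment keeps: @first_only\n";
print "pre-increment keeps:  ", scalar(@none), " elements\n";
```

    With post-increment the first line lists the three distinct codes;
    with pre-increment the result is empty.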

    > Many thanks.


    You're welcome

    Paul Lalli
    Paul Lalli, Jul 31, 2007
    #4
  5. billb

    -berlin.de Guest

    billb <> wrote in comp.lang.perl.misc:
    > On 30 Jul, 19:46, Paul Lalli <> wrote:
    > > On Jul 30, 2:37 pm, billb <> wrote:
    > >
    > > > i have a multidimensional array, but i want to delete duplicate
                                                                ^^^^^^^^^
    [Paul's solution snipped]

    > ah, very simple and very fast as well! I'll have to understand how
    > this is working. It uses a hash I see. Many thanks.


    On hearing the word "duplicate", like a Pavlovian dog a Perl programmer
    goes "Hash, hash, hash...". The word "unique" has the same effect.

    Anno
    -berlin.de, Jul 31, 2007
    #5