Merge multiple rows and remove duplicates based on the first value

Discussion in 'Perl Misc' started by Susan, Jan 26, 2006.

  1. Susan

    There must be a simple solution, but I am stuck with this.

    I have a file --

    Thomas Jacob Emily Madison
    Corner Joshua Emma Isabella
    Thomas Ethan Emily Samantha
    Williams Mathew John Lina
    Corner Christopher Emma Daniel
    Corner Joshua Matthew Hannah
    ..
    ..
    ...

    How do I merge these into one based on the first column?

    Based on the first-column name (e.g. "Thomas"), I would like to merge the
    remaining three columns without duplicates and get

    Thomas Jacob Emily Madison Ethan Samantha
    Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
    Williams Mathew John Lina

    Can someone help?

    Thanks.
     
    Susan, Jan 26, 2006
    #1

  2. Ch Lamprecht

    Re: Merge multiple rows and remove duplicates --based on the first value

    Hi,
    Susan wrote:

    > I have a file --

    To make it runnable, I will use an array instead of the file:


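    use warnings;
    use strict;
    use Data::Dumper;

    my @data = (
        [qw/Thomas Jacob Emily Madison/],
        [qw/Corner Joshua Emma Isabella/],
        [qw/Thomas Ethan Emily Samantha/],
        [qw/Williams Mathew John Lina/],
        [qw/Corner Christopher Emma Daniel/],
        [qw/Corner Joshua Matthew Hannah/],
    );
    print Dumper \@data;

    # Hash of hashes: first column => { other name => 'true' }.
    # Each name is a hash key, so duplicates are stored only once.
    my %result;
    foreach my $row (@data) {
        my ($key, @names) = @$row;    # copy the row instead of shift(), leaving @data intact
        foreach my $name (@names) {
            $result{$key}{$name} = 'true';
        }
    }
    print Dumper \%result;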



    Use the function keys if you prefer a hash of array refs as output.
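A minimal sketch of that conversion, assuming a hash of hashes shaped like `%result` above (rebuilt here from two rows so the snippet runs on its own; `%as_arrays` is an illustrative name). Sorting the names is an added assumption, only there because Perl returns hash keys in no particular order.

```perl
use strict;
use warnings;

# Hash of hashes in the shape built above: first column => { name => 'true' }.
my %result = (
    Thomas   => { Jacob => 'true', Emily => 'true', Madison => 'true',
                  Ethan => 'true', Samantha => 'true' },
    Williams => { Mathew => 'true', John => 'true', Lina => 'true' },
);

# keys() returns the inner hash's keys (the unique names) in no fixed
# order, so sort them to get stable output.
my %as_arrays;
for my $key (keys %result) {
    $as_arrays{$key} = [ sort keys %{ $result{$key} } ];
}

for my $key (sort keys %as_arrays) {
    print "$key @{ $as_arrays{$key} }\n";
}
# prints:
# Thomas Emily Ethan Jacob Madison Samantha
# Williams John Lina Mathew
```

Note that this loses the original order of the names; if order matters, the names have to be collected into an array as they are read.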

    HTH
    Christoph
     
    Ch Lamprecht, Jan 26, 2006
    #2

  3. Xicheng

    Susan wrote:
    > There must be a simple solution, but I am stuck with this.
    > How do I merge these into one based on the first column?
    > Based on the name "Thomas" I would like to merge the rest of the 3 columns
    > and get

    Use a hash:
    ==================
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;
    my %h = ();
    while (<DATA>) {
        chomp;
        # Split into the first word and the rest of the line (at most 2 fields).
        my @tmp = split ' ', $_, 2;
        # Append the rest of the line to the string stored under the first word.
        $h{$tmp[0]} .= "$tmp[1] ";
    }
    print Dumper \%h;
    __DATA__
    Thomas Jacob Emily Madison
    Corner Joshua Emma Isabella
    Thomas Ethan Emily Samantha
    Williams Mathew John Lina
    Corner Christopher Emma Daniel
    Corner Joshua Matthew Hannah
    =========
    Xicheng

    > Thomas Jacob Emily Madison Ethan Samantha
    > Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
    > Williams Mathew John Lina
    >
    > Can someone help?
    >
    > Thanks.
     
    Xicheng, Jan 27, 2006
    #3
  4. Axel

    Susan wrote:
    > There must be a simple solution, but I am stuck with this.

    > I have a file --

    > Thomas Jacob Emily Madison
    > Corner Joshua Emma Isabella
    > Thomas Ethan Emily Samantha
    > Williams Mathew John Lina
    > Corner Christopher Emma Daniel
    > Corner Joshua Matthew Hannah
    > .
    > .
    > ..

    > How do I merge these into one based on the first column?

    > Based on the name "Thomas" I would like to merge the rest of the 3 columns
    > and get

    > Thomas Jacob Emily Madison Ethan Samantha
    > Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
    > Williams Mathew John Lina

    One way would be to create a hash whose keys are the entries in the
    first column. The value of each entry in this hash would be a
    reference to another hash whose keys are the entries in the other
    columns (the values are immaterial as long as they are defined,
    e.g. just use '1'). Because a hash key can occur only once,
    duplicate names are dropped automatically.
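A minimal sketch of the hash-of-hashes approach just described, using `1` as the inner value. The rows from the question are held in an array in place of the file, and `%names_for` is an illustrative name, not anything from the thread.

```perl
use strict;
use warnings;

# Sample rows from the question (in place of reading the file).
my @rows = (
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);

my %names_for;    # first column => { other name => 1 }
for my $row (@rows) {
    my ($first, @others) = split ' ', $row;
    next unless defined $first;    # skip blank lines
    # Assigning to an existing key just overwrites it, so repeated
    # names collapse into a single entry.
    $names_for{$first}{$_} = 1 for @others;
}

# Hash keys come back in no particular order; sort for stable output.
for my $first (sort keys %names_for) {
    my @unique = sort keys %{ $names_for{$first} };
    print "$first: @unique\n";
}

# Membership test without a loop.
print "Thomas has Emily\n" if exists $names_for{Thomas}{Emily};
```

A side benefit of this layout is that checking whether a name belongs to a given first-column entry is a single `exists` lookup.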

    Axel
     
    Axel, Jan 27, 2006
    #4
  5. David Filmer

    Susan wrote:
    > How do I merge these into one based on the first column?


    This solution is simple but not terribly efficient (I wouldn't use it
    on a huge input list). Exception handling is left as an exercise to the
    reader:

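    #!/usr/bin/perl
    use strict; use warnings;
    use Data::Dumper;
    use List::MoreUtils qw{uniq};    # from CPAN, not a core module

    my %name;
    while (<DATA>) {
        # split with no arguments splits $_ on whitespace
        my ($col1_name, @other_names) = split;
        # Merge the names stored so far with this line's names;
        # uniq keeps the first occurrence of each name, in order.
        @{$name{$col1_name}} = uniq( @{$name{$col1_name}},
                                     @other_names );
    }
    print Dumper \%name;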

    __DATA__
    Thomas Jacob Emily Madison
    Corner Joshua Emma Isabella
    Thomas Ethan Emily Samantha
    Williams Mathew John Lina
    Corner Christopher Emma Daniel
    Corner Joshua Matthew Hannah
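The inefficiency mentioned above comes from re-running `uniq` over the whole accumulated list for every input line. A minimal linear-time alternative (a sketch, not the poster's code; `@order`, `%names` and `%seen` are illustrative names) keeps a per-key hash of names already recorded and remembers the order in which first-column values appear, which also reproduces the output layout asked for in the question.

```perl
use strict;
use warnings;

my @rows = (
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);

my (@order, %names, %seen);
for my $row (@rows) {
    my ($first, @others) = split ' ', $row;
    next unless defined $first;    # skip blank lines
    # Remember each first-column value the first time it appears.
    push @order, $first unless exists $names{$first};
    $names{$first} ||= [];
    # Append only names not yet recorded for this key: one hash
    # lookup per name instead of re-scanning the whole list.
    for my $name (@others) {
        push @{ $names{$first} }, $name unless $seen{$first}{$name}++;
    }
}

my @lines = map { join ' ', $_, @{ $names{$_} } } @order;
print "$_\n" for @lines;
# prints:
# Thomas Jacob Emily Madison Ethan Samantha
# Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
# Williams Mathew John Lina
```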



    --
    http://DavidFilmer.com
     
    David Filmer, Jan 27, 2006
    #5
  6. Xicheng

    Xicheng wrote:
    > Susan wrote:
    > > There must be a simple solution, but I am stuck with this.
    > > How do I merge these into one based on the first column?
    > > Based on the name "Thomas" I would like to merge the rest of the 3 columns
    > > and get

    > use hash..
    > ==================
    > #!/usr/bin/perl -w
    > use strict;
    > use Data::Dumper;
    > my %h=();
    > while(<DATA>) {
    > chomp;
    > my @tmp=split' ',$_,2;
    > $h{$tmp[0]} .= "$tmp[1] ";
    > }

    # Add the following lines to tidy up the output:
    # collapse runs of whitespace and drop the trailing space.
    for my $k (keys %h) {
        $h{$k} =~ s/\s+/ /g;
        $h{$k} =~ s/\s+$//;
        print "$k => $h{$k}\n";
    }

    > print Dumper \%h;
    > __DATA__
    > Thomas Jacob Emily Madison
    > Corner Joshua Emma Isabella
    > Thomas Ethan Emily Samantha
    > Williams Mathew John Lina
    > Corner Christopher Emma Daniel
    > Corner Joshua Matthew Hannah
    > =========
    > Xicheng
    >
    > > Thomas Jacob Emily Madison Ethan Samantha
    > > Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
    > > Williams Mathew John Lina
    > >
    > > Can someone help?
    > >
    > > Thanks.
     
    Xicheng, Jan 27, 2006
    #6
  7. Xicheng

    Jim Gibson wrote:
    > Xicheng wrote:
    > > use hash..
    > > ==================
    > > #!/usr/bin/perl -w
    > > use strict;
    > > use Data::Dumper;
    > > my %h=();
    > > while(<DATA>) {
    > > chomp;
    > > my @tmp=split' ',$_,2;
    > > $h{$tmp[0]} .= "$tmp[1] ";
    > > }
    > > print Dumper \%h;
    > > __DATA__
    > > Thomas Jacob Emily Madison
    > > Corner Joshua Emma Isabella
    > > Thomas Ethan Emily Samantha
    > > Williams Mathew John Lina
    > > Corner Christopher Emma Daniel
    > > Corner Joshua Matthew Hannah

    >
    > The OP doesn't want duplicate entries in the output. Your program does
    > not fulfill that requirement. For example, it includes 'Emily' twice in
    > the entry for 'Thomas'.

    Yup, use a hash again; you can fix it in a minute:
    # If you just want to print:
    for my $k (keys %h) {
        my %t = ();    # names already seen for this key
        # !$t{$_}++ is true only the first time a name appears
        print "$k => @{[grep { !$t{$_}++ } split ' ', $h{$k}]}\n";
    }

    # Or put the list into a scalar:
    for my $k (keys %h) {
        my %t = ();
        my $z = join ' ', grep { !$t{$_}++ } split ' ' => $h{$k};
        print "$k => $z\n";
    }
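The `grep { !$seen{$_}++ }` idiom used above can be wrapped in a small reusable sub. This is a sketch; `uniq_in_order` is a hypothetical name. It works because the post-increment returns the old count, which is 0 (false) only the first time an element is seen. Newer Perls also ship a `uniq` function in the core module List::Util (version 1.45 and later).

```perl
use strict;
use warnings;

# Return the list with duplicates removed, keeping the first
# occurrence of each element in its original position.
sub uniq_in_order {
    my %seen;    # element => number of times seen so far
    return grep { !$seen{$_}++ } @_;
}

my @merged = uniq_in_order(
    qw(Joshua Emma Isabella Christopher Emma Daniel Joshua Matthew Hannah)
);
print "@merged\n";    # prints "Joshua Emma Isabella Christopher Daniel Matthew Hannah"
```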

    Xicheng
     
    Xicheng, Jan 27, 2006
    #7

Similar Threads
  1. Steven Bethard (replies: 11, views: 1,069; Alex Martelli, Feb 7, 2005)
  2. Jesper Mortensen (replies: 1, views: 504)
  3. bob, "remove duplicates?" (Sep 5, 2011, in forum: Java; replies: 4, views: 426; Roedy Green, Sep 5, 2011)
  4. senthil (replies: 10, views: 418)
  5. JimJx, "Sort and remove duplicates" (Sep 28, 2007, in forum: Perl Misc; replies: 5, views: 151; Martijn Lievaart, Sep 28, 2007)