Merge multiple rows and remove duplicates --based on the first value

Discussion in 'Perl Misc' started by Susan, Jan 26, 2006.

  1. Susan

    Susan Guest

    There must be a simple solution, but I am stuck on this.

    I have a file --

    Thomas Jacob Emily Madison
    Corner Joshua Emma Isabella
    Thomas Ethan Emily Samantha
    Williams Mathew John Lina
    Corner Christopher Emma Daniel
    Corner Joshua Matthew Hannah
    ..
    ..
    ...

    How do I merge these into one based on the first column?

    Based on the name "Thomas", I would like to merge the remaining 3
    columns and get

    Thomas Jacob Emily Madison Ethan Samantha
    Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
    Williams Mathew John Lina

    Can someone help?

    Thanks.
     
    Susan, Jan 26, 2006
    #1

  2. Susan

    Ch Lamprecht Guest

    Re: Merge multiple rows and remove duplicates --based on the first value

    Hi,
    Susan wrote:

    > I have a file --

    To make it runnable, I will use an array in place of the file:


    use warnings;
    use strict;
    use Data::Dumper;

    my @data =(
    [qw/Thomas Jacob Emily Madison/],
    [qw/ Corner Joshua Emma Isabella/],
    [qw/Thomas Ethan Emily Samantha/],
    [qw/Williams Mathew John Lina/],
    [qw/Corner Christopher Emma Daniel/],
    [qw/Corner Joshua Matthew Hannah/]
    );
    print Dumper \@data;
    my %result;
    foreach (@data){
        my $key = shift (@$_);          # first column becomes the key
        foreach (@$_){
            $result{$key}{$_}='true';   # inner hash keys drop duplicate names
        }
    }
    print Dumper \%result;



    Use the 'keys' function on the inner hashes if you prefer a hash of
    arrayrefs as output.
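
    A minimal sketch of that conversion, assuming a hash of hashes shaped
    like %result above. The helper name hoh_to_hoa is hypothetical, and the
    sort is only there to make the order repeatable, since hash keys come
    back in no particular order:

```perl
use strict;
use warnings;

# Hypothetical helper: turn { key => { name => 'true', ... } }
# into { key => [ name, ... ] } by calling keys on each inner hash.
sub hoh_to_hoa {
    my ($hoh) = @_;
    my %hoa;
    for my $key (keys %$hoh) {
        # sort is optional; it only makes the order predictable
        $hoa{$key} = [ sort keys %{ $hoh->{$key} } ];
    }
    return \%hoa;
}

# Small sample shaped like %result (not the full data set)
my %result = (
    Thomas   => { Jacob  => 'true', Emily => 'true', Madison => 'true' },
    Williams => { Mathew => 'true', John  => 'true', Lina    => 'true' },
);

my $merged = hoh_to_hoa(\%result);
print "$_ @{ $merged->{$_} }\n" for sort keys %$merged;
```

    Note that the original column order is lost this way; the names come
    out sorted rather than in the order they appeared in the file.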

    HTH
    Christoph
    --
    please reply to

    perl -e "print scalar reverse q//"
     
    Ch Lamprecht, Jan 26, 2006
    #2

  3. Susan

    Xicheng Guest

    Susan wrote:
    > There must be a simple solution, but I am struck with this.
    > How do I merge these into one based on the first column?
    > Based the name "Thomas" I would like to merge the rest of the 3 columns
    > and get

    use hash..
    ==================
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;
    my %h=();
    while(<DATA>) {
        chomp;
        my @tmp=split' ',$_,2;       # first column, then the rest of the line
        $h{$tmp[0]} .= "$tmp[1] ";   # append the rest; duplicates are kept
    }
    print Dumper \%h;
    __DATA__
    Thomas Jacob Emily Madison
    Corner Joshua Emma Isabella
    Thomas Ethan Emily Samantha
    Williams Mathew John Lina
    Corner Christopher Emma Daniel
    Corner Joshua Matthew Hannah
    =========
    Xicheng

    > Thomas Jacob Emily Madison Ethan Samantha
    > Corner Joshua Emma Isabella Christopher Daniel
    > Matthew Hannah
    > Williams Mathew John Lina
    >
    > Can someone help?
    >
    > Thanks.
     
    Xicheng, Jan 27, 2006
    #3
  4. Susan

    Guest

    Susan wrote:
    > There must be a simple solution, but I am struck with this.


    > I have a file --


    > Thomas Jacob Emily Madison
    > Corner Joshua Emma Isabella
    > Thomas Ethan Emily Samantha
    > Williams Mathew John Lina
    > Corner Christopher Emma Daniel
    > Corner Joshua Matthew Hannah
    > .
    > .
    > ..


    > How do I merge these into one based on the first column?


    > Based the name "Thomas" I would like to merge the rest of the 3 columns
    > and get


    > Thomas Jacob Emily Madison Ethan Samantha
    > Corner Joshua Emma Isabella Christopher Daniel
    > Matthew Hannah
    > Williams Mathew John Lina



    One way would be to create a hash whose keys are the entries in the
    first column. The value of each entry in this hash would be
    a reference to another hash whose keys are the entries in the
    other columns (the values being immaterial as long as they are
    defined, e.g. just use '1').
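
    The description above could be sketched roughly as follows. The sub
    name merge_rows and the inline sample lines are assumptions made for
    illustration; a real script would read the lines from the file:

```perl
use strict;
use warnings;

# Hypothetical helper: build the hash of hashes described above
# from a list of whitespace-separated lines.
sub merge_rows {
    my @lines = @_;
    my %merged;
    for my $line (@lines) {
        my ($first, @rest) = split ' ', $line;
        next unless defined $first;          # skip blank lines
        $merged{$first} ||= {};              # keep keys that have no other columns
        $merged{$first}{$_} = 1 for @rest;   # a repeated name just overwrites itself
    }
    return \%merged;
}

my $merged = merge_rows(
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
);

# Hashes are unordered, so sort to get repeatable output
for my $first (sort keys %$merged) {
    print join(' ', $first, sort keys %{ $merged->{$first} }), "\n";
}
```

    Because both levels are plain hashes, neither the row order nor the
    column order of the input survives; only the set of names does.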

    Axel
     
    , Jan 27, 2006
    #4
  5. Susan

    Guest

    Susan wrote:
    > How do I merge these into one based on the first column?


    This solution is simple but not terribly efficient (I wouldn't use it
    on a huge input list). Exception handling is left as an exercise to the
    reader:

    #!/usr/bin/perl
    use strict; use warnings;
    use Data::Dumper;
    use List::MoreUtils qw{uniq};

    my %name;
    while (<DATA>) {
        my ($col1_name, @other_names) = split;
        @{$name{$col1_name}} = uniq( @{$name{$col1_name}}, @other_names );
    }
    print Dumper \%name;

    __DATA__
    Thomas Jacob Emily Madison
    Corner Joshua Emma Isabella
    Thomas Ethan Emily Samantha
    Williams Mathew John Lina
    Corner Christopher Emma Daniel
    Corner Joshua Matthew Hannah
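
    For larger inputs, one possible variation (a sketch, not the code
    posted above) keeps a "seen" hash per first-column name, so each name
    is checked once instead of re-running uniq over the growing list on
    every line. It also records the order in which first-column names
    first appear, which a plain hash does not. The sub name merge_in_order
    is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: merge rows by first column, dropping repeated
# names and keeping first-seen order for both rows and names.
sub merge_in_order {
    my @lines = @_;
    my (@order, %names, %seen);
    for my $line (@lines) {
        my ($first, @rest) = split ' ', $line;
        next unless defined $first;                       # skip blank lines
        push @order, $first unless exists $names{$first}; # remember row order
        $names{$first} ||= [];
        # keep only names not yet seen under this first-column value
        push @{ $names{$first} }, grep { !$seen{$first}{$_}++ } @rest;
    }
    return map { join(' ', $_, @{ $names{$_} }) } @order;
}

print "$_\n" for merge_in_order(
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);
```

    With the six sample lines this prints one line each for Thomas, Corner
    and Williams, in that order, with repeated names such as Emily and Emma
    listed once.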



    --
    http://DavidFilmer.com
     
    , Jan 27, 2006
    #5
  6. Susan

    Xicheng Guest

    Xicheng wrote:
    > Susan wrote:
    > > There must be a simple solution, but I am struck with this.
    > > How do I merge these into one based on the first column?
    > > Based the name "Thomas" I would like to merge the rest of the 3 columns
    > > and get

    > use hash..
    > ==================
    > #!/usr/bin/perl -w
    > use strict;
    > use Data::Dumper;
    > my %h=();
    > while(<DATA>) {
    > chomp;
    > my @tmp=split' ',$_,2;
    > $h{$tmp[0]} .= "$tmp[1] ";
    > }

    # add the following lines to tidy up the output.
    for my $k(keys %h){
        $h{$k} =~ s/\s+/ /g;
        print "$k => $h{$k}\n";
    }

    > print Dumper \%h;
    > __DATA__
    > Thomas Jacob Emily Madison
    > Corner Joshua Emma Isabella
    > Thomas Ethan Emily Samantha
    > Williams Mathew John Lina
    > Corner Christopher Emma Daniel
    > Corner Joshua Matthew Hannah
    > =========
    > Xicheng
    >
    > > Thomas Jacob Emily Madison Ethan Samantha
    > > Corner Joshua Emma Isabella Christopher Daniel
    > > Matthew Hannah
    > > Williams Mathew John Lina
    > >
    > > Can someone help?
    > >
    > > Thanks.
     
    Xicheng, Jan 27, 2006
    #6
  7. Susan

    Xicheng Guest

    Jim Gibson wrote:
    > In article <>,
    > Xicheng <> wrote:
    > > use hash..
    > > ==================
    > > #!/usr/bin/perl -w
    > > use strict;
    > > use Data::Dumper;
    > > my %h=();
    > > while(<DATA>) {
    > > chomp;
    > > my @tmp=split' ',$_,2;
    > > $h{$tmp[0]} .= "$tmp[1] ";
    > > }
    > > print Dumper \%h;
    > > __DATA__
    > > Thomas Jacob Emily Madison
    > > Corner Joshua Emma Isabella
    > > Thomas Ethan Emily Samantha
    > > Williams Mathew John Lina
    > > Corner Christopher Emma Daniel
    > > Corner Joshua Matthew Hannah

    >
    > The OP doesn't want duplicate entries in the output. Your program does
    > not fulfill that requirement. For example, it includes 'Emily' twice in
    > the entry for 'Thomas'.

    Yup, use a hash again; you can fix it in a minute:
    # if you just want to print:
    for my $k(keys %h){
        my %t=();
        print "$k => @{[grep{!$t{$_}++}split' ',$h{$k}]}\n";
    }

    # or put the list into a scalar:
    for my $k(keys %h){
        my %t=();
        my $z=join' ',grep{!$t{$_}++}split' '=>$h{$k};
        print "$k=>$z\n";
    }

    Xicheng
     
    Xicheng, Jan 27, 2006
    #7
