Merge multiple rows and remove duplicates, based on the first value

Susan

There must be a simple solution, but I am stuck on this.

I have a file --

Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
..
..
...

How do I merge these into one based on the first column?

Based on the name "Thomas", I would like to merge the other 3 columns
and get

Thomas Jacob Emily Madison Ethan Samantha
Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
Williams Mathew John Lina

Can someone help?

Thanks.
 
Ch Lamprecht

Hi,
Susan said:
I have a file --
To make it run, I will use an array:


use warnings;
use strict;
use Data::Dumper;

my @data = (
    [qw/Thomas Jacob Emily Madison/],
    [qw/Corner Joshua Emma Isabella/],
    [qw/Thomas Ethan Emily Samantha/],
    [qw/Williams Mathew John Lina/],
    [qw/Corner Christopher Emma Daniel/],
    [qw/Corner Joshua Matthew Hannah/],
);
print Dumper \@data;

my %result;
foreach my $row (@data) {
    # Copy the row so that @data itself is left unmodified.
    my ($key, @names) = @$row;
    # Using the names as hash keys removes duplicates automatically.
    $result{$key}{$_} = 'true' foreach @names;
}
print Dumper \%result;



Use the 'keys' function on each inner hash if you prefer a hash of arrayrefs as output.
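A minimal sketch of that conversion follows. The helper name to_arrayrefs and the hard-coded sample hash are assumptions made for illustration only; the names are sorted because hash keys come back in no particular order.

```perl
use strict;
use warnings;

# Sample data in the same shape as %result: first column => { name => 'true' }.
my %result = (
    Thomas   => { Jacob => 'true', Emily => 'true', Madison => 'true',
                  Ethan => 'true', Samantha => 'true' },
    Williams => { Mathew => 'true', John => 'true', Lina => 'true' },
);

# to_arrayrefs is a hypothetical helper name, not part of the thread.
# It turns a hash of hashes into a hash of (sorted) array references.
sub to_arrayrefs {
    my ($hoh) = @_;
    my %out;
    $out{$_} = [ sort keys %{ $hoh->{$_} } ] for keys %$hoh;
    return \%out;
}

my $lists = to_arrayrefs(\%result);
print "$_: @{ $lists->{$_} }\n" for sort keys %$lists;
```

Note that the original column order is lost this way; if it matters, collect the names in an array while reading instead.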

HTH
Christoph
 
Xicheng

Susan said:
There must be a simple solution, but I am stuck on this.
How do I merge these into one based on the first column?
Based on the name "Thomas", I would like to merge the other 3 columns
and get

Use a hash:
==================
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %h = ();
while (<DATA>) {
    chomp;
    # Split into the first column and the rest of the line.
    my @tmp = split ' ', $_, 2;
    $h{$tmp[0]} .= "$tmp[1] ";
}
print Dumper \%h;
__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
=========
Xicheng
 
axel

Susan said:
There must be a simple solution, but I am stuck on this.
I have a file --
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
.
.
..
How do I merge these into one based on the first column?
Based on the name "Thomas", I would like to merge the other 3 columns
and get
Thomas Jacob Emily Madison Ethan Samantha
Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
Williams Mathew John Lina


One way would be to create a hash whose keys are the entries in the
first column. The values of each entry in this hash would be
a reference to another hash whose keys are the entries in the
other columns (the values being immaterial as long as they are
defined, e.g. just use '1').
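The description above can be sketched roughly as follows. The helper name merge_rows and the in-memory list of lines are assumptions for illustration; in practice the rows would be read from the file. Output is sorted because a hash does not remember insertion order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows copied from the question.
my @lines = (
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);

# merge_rows is a hypothetical helper name, not part of the thread.
# Outer keys: first column. Inner keys: every other name seen for that key.
sub merge_rows {
    my @rows = @_;
    my %seen;
    for my $row (@rows) {
        my ($key, @names) = split ' ', $row;
        next unless defined $key;          # skip blank lines
        $seen{$key}{$_} = 1 for @names;    # the value 1 is immaterial
    }
    return \%seen;
}

my $merged = merge_rows(@lines);
for my $key (sort keys %$merged) {
    print join(' ', $key, sort keys %{ $merged->{$key} }), "\n";
}
```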

Axel
 
usenet

Susan said:
How do I merge these into one based on the first column?

This solution is simple but not terribly efficient (I wouldn't use it
on a huge input list). Exception handling is left as an exercise to the
reader:

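#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils qw{uniq};

my %name;
while (<DATA>) {
    my ($col1_name, @other_names) = split;
    next unless defined $col1_name;    # skip blank lines
    # Re-deduplicate the whole accumulated list on every row (simple, not fast).
    @{ $name{$col1_name} } = uniq( @{ $name{$col1_name} }, @other_names );
}
print Dumper \%name;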
print Dumper \%name;

__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
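
For larger inputs, one possible alternative is a single pass that remembers which names have already been seen per key, so the accumulated list is never rescanned. This is only a sketch: the helper name merge_ordered and the in-memory rows are assumptions for illustration, and it uses no modules outside core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# merge_ordered is a hypothetical helper name, not part of the thread.
# Returns (\@key_order, \%names_by_key), both in first-seen order.
sub merge_ordered {
    my @rows = @_;
    my (@key_order, %names, %seen);
    for my $row (@rows) {
        my ($key, @others) = split ' ', $row;
        next unless defined $key;                  # skip blank lines
        unless (exists $names{$key}) {
            push @key_order, $key;                 # remember key order
            $names{$key} = [];
        }
        for my $n (@others) {
            # Append a name only the first time it appears for this key.
            push @{ $names{$key} }, $n unless $seen{$key}{$n}++;
        }
    }
    return (\@key_order, \%names);
}

my ($order, $names) = merge_ordered(
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);
print join(' ', $_, @{ $names->{$_} }), "\n" for @$order;
```

With the sample rows this prints the three merged lines in the order requested in the question (Thomas, Corner, Williams).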
 
Xicheng

Xicheng said:
Susan said:
There must be a simple solution, but I am stuck on this.
How do I merge these into one based on the first column?
Based on the name "Thomas", I would like to merge the other 3 columns
and get

Use a hash:
==================
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %h = ();
while (<DATA>) {
    chomp;
    my @tmp = split ' ', $_, 2;
    $h{$tmp[0]} .= "$tmp[1] ";
}

# Add the following lines to tidy up the output.
for my $k (keys %h) {
    $h{$k} =~ s/\s+/ /g;
    print "$k => $h{$k}\n";
}
 
Xicheng

Jim said:
use hash..
==================
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %h = ();
while (<DATA>) {
    chomp;
    my @tmp = split ' ', $_, 2;
    $h{$tmp[0]} .= "$tmp[1] ";
}
print Dumper \%h;
__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah

The OP doesn't want duplicate entries in the output. Your program does
not fulfill that requirement. For example, it includes 'Emily' twice in
the entry for 'Thomas'.

Yup, use a hash again; you can fix it in a minute:

# If you just want to print:
for my $k (keys %h) {
    my %t = ();
    print "$k => @{[ grep { !$t{$_}++ } split ' ', $h{$k} ]}\n";
}

# Or put the list into a scalar:
for my $k (keys %h) {
    my %t = ();
    my $z = join ' ', grep { !$t{$_}++ } split ' ' => $h{$k};
    print "$k=>$z\n";
}
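
For readers unfamiliar with the grep idiom used in both loops, here is a small standalone sketch of it. The helper name uniq_in_order is an assumption for illustration, not something from the programs above.

```perl
use strict;
use warnings;

# uniq_in_order is a hypothetical helper name, not part of the thread.
sub uniq_in_order {
    my %seen;
    # $seen{$_}++ yields the count *before* incrementing, so the test is
    # true only the first time a value appears; later repeats are dropped.
    return grep { !$seen{$_}++ } @_;
}

my @names = uniq_in_order(split ' ', 'Jacob Emily Madison Ethan Emily Samantha');
print "Thomas => @names\n";    # prints "Thomas => Jacob Emily Madison Ethan Samantha"
```

The per-key %t in the loops plays the role of %seen here: it has to be reset for every key so that a name used under one key is not suppressed under another.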

Xicheng
 