Merge multiple rows and remove duplicates --based on the first value

Susan · Jan 26, 2006

There must be a simple solution, but I am stuck on this.

I have a file --

Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
..
..
...

How do I merge these into one based on the first column?

Based on the name "Thomas", I would like to merge the remaining three columns
and get

Thomas Jacob Emily Madison Ethan Samantha
Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
Williams Mathew John Lina

Can someone help?

Thanks.

Ch Lamprecht · Jan 26, 2006

Hi,

Susan said:
I have a file --

to make it run I will use an array:

use warnings;
use strict;
use Data::Dumper;

my @data =(
[qw/Thomas Jacob Emily Madison/],
[qw/ Corner Joshua Emma Isabella/],
[qw/Thomas Ethan Emily Samantha/],
[qw/Williams Mathew John Lina/],
[qw/Corner Christopher Emma Daniel/],
[qw/Corner Joshua Matthew Hannah/]
);
print Dumper \@data;
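my %result;
foreach my $row (@data) {
    my $key = shift @$row;    # first column is the merge key
    foreach my $name (@$row) {
        # hash keys are unique, so repeated names collapse into one entry
        $result{$key}{$name} = 'true';
    }
}
print Dumper \%result;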

Use the 'keys' function on each inner hash if you prefer a hash of arrayrefs as output.
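
A minimal sketch of that conversion, assuming a %result shaped like the one built above (the sample data here is shortened, and %merged is just an illustrative name). The inner keys are sorted because a hash does not keep insertion order:

```perl
use strict;
use warnings;

# Hash of hashes shaped like %result above (shortened sample data).
my %result = (
    Thomas   => { Jacob  => 'true', Emily => 'true', Madison => 'true' },
    Williams => { Mathew => 'true', John  => 'true', Lina    => 'true' },
);

# Replace each inner hash with a reference to a sorted list of its keys.
my %merged;
for my $key (keys %result) {
    $merged{$key} = [ sort keys %{ $result{$key} } ];
}

# One line per first-column value, followed by its unique names.
print "$_ @{ $merged{$_} }\n" for sort keys %merged;
```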

HTH
Christoph

Xicheng · Jan 27, 2006

Susan said:
There must be a simple solution, but I am stuck on this.
How do I merge these into one based on the first column?
Based on the name "Thomas", I would like to merge the remaining three columns
and get

use hash..
==================
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %h = ();
while (<DATA>) {
    chomp;
    my @tmp = split ' ', $_, 2;    # key, then the rest of the line
    $h{$tmp[0]} .= "$tmp[1] ";     # append the remaining columns to this key
}
print Dumper \%h;
__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
=========
Xicheng

axel · Jan 27, 2006

Susan said:
There must be a simple solution, but I am stuck on this.

I have a file --

Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
.
.
..

How do I merge these into one based on the first column?

Based on the name "Thomas", I would like to merge the remaining three columns
and get

Thomas Jacob Emily Madison Ethan Samantha
Corner Joshua Emma Isabella Christopher Daniel Matthew Hannah
Williams Mathew John Lina

One way would be to create a hash whose keys are the entries in the
first column. The values of each entry in this hash would be
a reference to another hash whose keys are the entries in the
other columns (the values being immaterial as long as they are
defined, e.g. just use '1').
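
A rough sketch of that idea, assuming the rows are already in an array (the names @lines and %seen are only illustrative). Since hashes do not keep insertion order, the output is sorted here, so it will not match the original column order:

```perl
use strict;
use warnings;

# Sample rows from the question; in practice these would be read from the file.
my @lines = (
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);

# First column => { other name => 1 }; repeated names overwrite themselves.
my %seen;
for my $line (@lines) {
    my ($key, @names) = split ' ', $line;
    $seen{$key}{$_} = 1 for @names;
}

# Print each key followed by its unique names, both in sorted order.
for my $key (sort keys %seen) {
    print join(' ', $key, sort keys %{ $seen{$key} }), "\n";
}
```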

Axel

usenet · Jan 27, 2006

Susan said:
How do I merge these into one based on the first column?

This solution is simple but not terribly efficient (I wouldn't use it
on a huge input list). Exception handling is left as an exercise to the
reader:

#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
use List::MoreUtils qw{uniq};

my %name;
while (<DATA>) {
    my ($col1_name, @other_names) = split;
    # Re-run uniq over the accumulated names plus this row's names.
    @{$name{$col1_name}} = uniq( @{$name{$col1_name}}, @other_names );
}
print Dumper \%name;

__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah
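
If re-running uniq on every row is a concern, one possible variant (a sketch only, not part of the post above) tracks names already seen per key, so each name is checked once. The names @lines, @order, %merged and %seen are illustrative, and the rows are held in an array here rather than read from DATA:

```perl
use strict;
use warnings;

# Sample rows from the question; in practice these would be read from the file.
my @lines = (
    'Thomas Jacob Emily Madison',
    'Corner Joshua Emma Isabella',
    'Thomas Ethan Emily Samantha',
    'Williams Mathew John Lina',
    'Corner Christopher Emma Daniel',
    'Corner Joshua Matthew Hannah',
);

my @order;     # first-column values in order of first appearance
my %merged;    # first column => arrayref of unique names, in input order
my %seen;      # first column => { name => times seen }

for my $line (@lines) {
    my ($key, @names) = split ' ', $line;
    next unless defined $key;    # skip blank lines

    if (!exists $merged{$key}) {
        push @order, $key;
        $merged{$key} = [];
    }

    # Keep a name only the first time it appears for this key.
    push @{ $merged{$key} }, grep { !$seen{$key}{$_}++ } @names;
}

print "$_ @{ $merged{$_} }\n" for @order;
```

With the sample rows, this keeps both the row order and the column order of the input, which is the layout asked for in the original question.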

Xicheng · Jan 27, 2006

Xicheng said:
Susan said:

There must be a simple solution, but I am stuck on this.
How do I merge these into one based on the first column?
Based on the name "Thomas", I would like to merge the remaining three columns
and get

use hash..
==================
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %h=();
while(<DATA>) {
chomp;
my @tmp=split' ',$_,2;
$h{$tmp[0]} .= "$tmp[1] ";
}

# Add the following lines to collapse repeated whitespace and tidy the output.
for my $k(keys %h){
$h{$k} =~ s/\s+/ /g;
print "$k => $h{$k}\n";
}

Xicheng · Jan 27, 2006

Jim said:
use hash..
==================
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %h=();
while(<DATA>) {
chomp;
my @tmp=split' ',$_,2;
$h{$tmp[0]} .= "$tmp[1] ";
}
print Dumper \%h;
__DATA__
Thomas Jacob Emily Madison
Corner Joshua Emma Isabella
Thomas Ethan Emily Samantha
Williams Mathew John Lina
Corner Christopher Emma Daniel
Corner Joshua Matthew Hannah

The OP doesn't want duplicate entries in the output. Your program does
not fulfill that requirement. For example, it includes 'Emily' twice in
the entry for 'Thomas'.

Yup, use a hash again; you can fix it in a minute:
#if just print:
for my $k(keys %h){
my %t=();
print "$k => @{[grep{!$t{$_}++}split' ',$h{$k}]}\n";
}

#or put the list to a scalar:
for my $k(keys %h){
my %t=();
my $z=join' ',grep{!$t{$_}++}split' '=>$h{$k};
print "$k=>$z\n";
}

Xicheng
