Removing rows based on duplicate fields


heylow

Gurus:

I have merged the /etc/passwd files from many systems and am trying to
build a master passwd file. I concatenated them and sorted uniquely.
The resulting file looks like this:

smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
don:*:102:102:B25:/home/don:/bin/fakesh
don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
ele:*:255:255:A45:/home/ele:/bin/ksh
rod:*:300:300:B456:/home/rod:/bin/ksh


I want to delete the duplicates; that is, I want to keep only one row
for every uid.

I have tried with ksh, but in vain. Can you shed some light on how it
can be done in Perl?

Thanks, Pedro
 

usenet

I want to delete the duplicates; that is, I want to keep only one row
for every uid.

If you always want to assume the first instance wins, something like
this should work:

#!/usr/local/bin/perl
use strict; use warnings;

open (my $in, '<', 'passwd.merged');
open (my $out, '>', 'passwd.cleaned');

my %seen;
while (<$in>) {
    /^(.*?):/;
    print $out $_ unless $seen{$1};
    $seen{$1}++;
}

__END__
 

Tad McClellan

I want to delete the duplicates;
Can you shed some light on how it can be done in Perl?


It is done the way outlined in the Frequently Asked Questions.

perldoc -q duplicate

How can I remove duplicate elements from a list or array?


Modify it for your particular case:

---------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my @unique = ();
my %seen = ();

while ( <DATA> ) {
    my $elem = (split /:/)[2];
    next if $seen{ $elem }++;
    push @unique, $_;
}

print for @unique;

__DATA__
smith:*:100:100:8A-74(office):/home/smith:/bin/ksh
smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh <-- duplicate
rob:*:101:101:8A-75(office):/home/smith:/bin/ksh
don:*:102:102:B25:/home/don:/bin/fakesh
don:*:102:102:B25:/home/don:/bin/fakesh <-- duplicate
ele:*:255:255:A45:/home/ele:/bin/ksh
rod:*:300:300:B456:/home/rod:/bin/ksh
 

Tad McClellan

open (my $in, '<', 'passwd.merged');


You should always, yes *always*, check the return value from open().

/^(.*?):/;
print $out $_ unless $seen{$1};


You should never use the dollar-digit variables unless you have
first ensured that the match _succeeded_.

The OP said he wanted duplicate uids removed. The first field
is not the uid; the uid is the third.
 

Martijn Lievaart

I want to delete the duplicates; that is, I want to keep only one row
for every uid.
I would not use Perl for this but GNU sort:

# sort -t: -k3,3 -u passwd
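To see the effect on a few sample lines (GNU sort assumed): -t: makes ':' the field separator, -k3,3 restricts the sort key to the third field (the uid), and -u keeps one line per distinct key. Note the key comparison is lexical; append n (-k3,3n) to compare uids numerically.

```shell
# Sample lines standing in for the merged passwd file.
printf '%s\n' \
  'smith:*:100:100:8A-74(office):/home/smith:/bin/ksh' \
  'smith:*:100:100:8A-74(office):/home/smith:/etc/fakesh' \
  'don:*:102:102:B25:/home/don:/bin/fakesh' > passwd.sample

# One line per uid survives; among duplicates the first input line
# is kept, since -u disables the last-resort comparison.
sort -t: -k3,3 -u passwd.sample
```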

 

Ben Morrow

Quoth Joe Smith:
Why that, instead of the simple

print @unique;

It's a habit one gets into when $\ is set to something useful (such as
when using -l).

Ben
 

Tad McClellan

Joe Smith said:
Why that, instead of the simple

print @unique;


No good reason in this particular case, since they all have newlines.

I originally had

print "$_\n" for @unique;

then removed print()'s argument when it came out double-spaced. :)
 
