S
Shawn Milo
I'm new to Python and fairly experienced in Perl, although that
experience is limited to the things I use daily.
I wrote the same script in both Perl and Python, and the output is
identical. The run speed is similar (very fast) and the line count is
similar.
Now that they're both working, I was looking at the code and wondering
what Perl-specific and Python-specific improvements to the code would
look like, as judged by others more knowledgeable in the individual
languages.
I am not looking for the smallest number of lines, or anything else
that would make the code more difficult to read in six months. Just
any instances where I'm doing something inefficiently or in a "bad"
way.
I'm attaching both the Perl and Python versions, and I'm open to
comments on either. The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.
Thanks for taking the time to look at these.
Shawn
##########################################################################
Perl code:
##########################################################################
#! /usr/bin/env perl
use warnings;
use strict;
my $piid;
my $row;
my %input;
my $best;
my $curr;
foreach $row (<>){
chomp($row);
$piid = (split(/\t/, $row))[0];
push ( @{$input{$piid}}, $row );
}
for $piid (keys(%input)){
$best = "";
for $curr (@{$input{$piid}}){
if ($best eq ""){
$best = $curr;
}else{
#If the current record is the correct state
if ((split(/\t/, $curr))[1] eq (split(/\t/, $curr))[6]){
#If existing record is the correct state
if ((split(/\t/, $best))[1] eq (split(/\t/, $curr))[6]){
if ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5]){
$best = $curr;
}
}else{
$best = $curr;
}
}else{
#if the existing record does not have the correct state
#and the new one has a newer expiration date
if (((split(/\t/, $best))[1] ne (split(/\t/, $curr))[6]) and
((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5])){
$best = $curr;
}
}
}
}
print "$best\n";
}
##########################################################################
End Perl code
##########################################################################
##########################################################################
Python code
##########################################################################
#! /usr/bin/env python
import sys
input = sys.stdin
recs = {}
for row in input:
row = row.rstrip('\n')
piid = row.split('\t')[0]
if recs.has_key(piid) is False:
recs[piid] = []
recs[piid].append(row)
for piid in recs.keys():
best = ""
for current in recs[piid]:
if best == "":
best = current;
else:
#If the current record is the correct state
if current.split("\t")[1] == current.split("\t")[6]:
#If the existing record is the correct state
if best.split("\t")[1] == best.split("\t")[6]:
#If the new record has a newer exp. date
if current.split("\t")[5] > best.split("\t")[5]:
best = current
else:
best = current
else:
#If the existing record does not have the correct state
#and the new record has a newer exp. date
if best.split("\t")[1] != best.split("\t")[6] and
current.split("\t")[5] > best.split("\t")[5]:
best = current
print best
##########################################################################
End Python code
##########################################################################
experience is limited to the things I use daily.
I wrote the same script in both Perl and Python, and the output is
identical. The run speed is similar (very fast) and the line count is
similar.
Now that they're both working, I was looking at the code and wondering
what Perl-specific and Python-specific improvements to the code would
look like, as judged by others more knowledgeable in the individual
languages.
I am not looking for the smallest number of lines, or anything else
that would make the code more difficult to read in six months. Just
any instances where I'm doing something inefficiently or in a "bad"
way.
I'm attaching both the Perl and Python versions, and I'm open to
comments on either. The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.
Thanks for taking the time to look at these.
Shawn
##########################################################################
Perl code:
##########################################################################
#! /usr/bin/env perl
use warnings;
use strict;
my $piid;
my $row;
my %input;
my $best;
my $curr;
foreach $row (<>){
chomp($row);
$piid = (split(/\t/, $row))[0];
push ( @{$input{$piid}}, $row );
}
for $piid (keys(%input)){
$best = "";
for $curr (@{$input{$piid}}){
if ($best eq ""){
$best = $curr;
}else{
#If the current record is the correct state
if ((split(/\t/, $curr))[1] eq (split(/\t/, $curr))[6]){
#If existing record is the correct state
if ((split(/\t/, $best))[1] eq (split(/\t/, $curr))[6]){
if ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5]){
$best = $curr;
}
}else{
$best = $curr;
}
}else{
#if the existing record does not have the correct state
#and the new one has a newer expiration date
if (((split(/\t/, $best))[1] ne (split(/\t/, $curr))[6]) and
((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5])){
$best = $curr;
}
}
}
}
print "$best\n";
}
##########################################################################
End Perl code
##########################################################################
##########################################################################
Python code
##########################################################################
#! /usr/bin/env python
import sys
input = sys.stdin
recs = {}
for row in input:
row = row.rstrip('\n')
piid = row.split('\t')[0]
if recs.has_key(piid) is False:
recs[piid] = []
recs[piid].append(row)
for piid in recs.keys():
best = ""
for current in recs[piid]:
if best == "":
best = current;
else:
#If the current record is the correct state
if current.split("\t")[1] == current.split("\t")[6]:
#If the existing record is the correct state
if best.split("\t")[1] == best.split("\t")[6]:
#If the new record has a newer exp. date
if current.split("\t")[5] > best.split("\t")[5]:
best = current
else:
best = current
else:
#If the existing record does not have the correct state
#and the new record has a newer exp. date
if best.split("\t")[1] != best.split("\t")[6] and
current.split("\t")[5] > best.split("\t")[5]:
best = current
print best
##########################################################################
End Python code
##########################################################################