parse hash by array element seems to run slow

J

JRoot

Below I posted my program in it's entirity and I hope that is within
the guidelines ...
The program works perectly but it seems to run slower than I had
expected based on other perl programs I've authored with much more
data involved.

I've been through the faq's and every web site that focuses on perl
and to be honest, I'm overwhelmed by the number of ways it's possible
to use perl so I'm asking if anyone could just look over this program
and see if it is obvious where the overhead lies so I can remove the
baggage and speed it up a bit. It runs in about 3-5 seconds on a 1 ghz
machine but my gut tells me it could be faster or more efficient if
you will. I'm not asking anyone to do my work for me ... just for a
pointer from anyone that deals with arrays and hashes a lot.

The roadmap is ...

I create an array from a file (one word per line)

I create a hash from a 2nd file, combining duplicate key words and
assign the duplicate quantity as the value. A key-val pair would look
like ... net_name 5

I iterate the array and search the hash keys to find a match and then
report the key and val to different reports depending on the val
(quantity) which is either 0 (not found) 1 (single occurance) or
multiple which need to be verified.

I suspect I've arrayed and hashed one too many times when it could
have been done in a more simple manner.

Any help is appreciated -- thank you

snip<------ cut here -------->snip

#!/app/util/perl/bin/perl -w

BEGIN { push(@INC, "/app/util/perl5.6.1/lib/5.6.1") }

use strict;

my @nets = ();
my %count = ();
my %list = ();
my ($net, $x, $y, $netName, $side, $junk, $line, $checkHashVal);
my ($sortedNet, $tpList, $key, $val, $list, $checkHashKey, $found);
my ($oneArr, $verifyArr, $zeroArr, $line2Push);
my $lineCnt = 0;
my @tpList = ();
my @sortedNets = ();
my @zeroArr = ();
my @oneArr = ();
my @verifyArr = ();
my $cnt = 0;

print "\n\n... Working .... \n\n";

open (TL, "tp_list") || die "Can't open tp_list for reading: $!";

while(<TL>){
chomp($_);
push(@tpList, $_);
}
close(TL);

open (TC, "testpoint_chart") || die "Can't open testpoint_chart for
reading: $!";

while(<TC>){
$line = $_;
++$lineCnt;
if($lineCnt > 9){
($x, $y, $netName, $side, $junk) = split(' ', $line, 5);
if($side eq "Top" || $side eq "Bottom"){
push(@nets, $netName);
}
}
}

close(TC);

@sortedNets = @nets;

foreach $sortedNet(@sortedNets) {
$count{$sortedNet} = ++$count{$sortedNet};
}

foreach $sortedNet (keys %count) {
($key, $val) = ($sortedNet, $count{$sortedNet});
$list{$key} = $val;
}

foreach $tpList (@tpList) {
$found = "no";
++$cnt;
foreach $key (keys %list) {
if ($key eq $tpList) {
$found = "yes";
$line2Push = "ARR: $key -- HASHKEY: $tpList VAL:
$list{$tpList}\n";
if($list{$tpList} eq "1"){
push(@oneArr, $line2Push);
}
elsif($list{$tpList} ne "1"){
push(@verifyArr, $line2Push);
}
last;
}
}
if($found eq "no"){
push(@zeroArr, $tpList);
}
}
print "\n===== ONES =====\n";
print "------------------\n";
foreach $oneArr(@oneArr){print "ONES: $oneArr";}
print "\n===== VERIFY =====\n";
print "--------------------\n";
foreach $verifyArr(@verifyArr){print "VERIFY: $verifyArr";}
print "\n===== ZERO =====\n";
print "--------------------\n";
foreach $zeroArr(@zeroArr){print "ZERO: $zeroArr\n";}

print " ... D O N e ... \n\n";
 
U

Uri Guttman

J> BEGIN { push(@INC, "/app/util/perl5.6.1/lib/5.6.1") }

use lib "/app/util/perl5.6.1/lib/5.6.1" ;

J> use strict;

J> my @nets = ();
J> my %count = ();
J> my %list = ();
J> my ($net, $x, $y, $netName, $side, $junk, $line, $checkHashVal);
J> my ($sortedNet, $tpList, $key, $val, $list, $checkHashKey, $found);
J> my ($oneArr, $verifyArr, $zeroArr, $line2Push);
J> my $lineCnt = 0;
J> my @tpList = ();
J> my @sortedNets = ();
J> my @zeroArr = ();
J> my @oneArr = ();
J> my @verifyArr = ();
J> my $cnt = 0;

don't declare variables until you need them. and there is no need to
initialize the arrays and hashes to () as they start out empty

J> open (TL, "tp_list") || die "Can't open tp_list for reading: $!";

J> while(<TL>){
J> chomp($_);
J> push(@tpList, $_);
J> }
J> close(TL);

use File::Slurp ;
chomp( my @tp_list = read_file( 'tp_list' ) ) ;

style point: use _ and not studly caps for names.


J> open (TC, "testpoint_chart") || die "Can't open testpoint_chart for
J> reading: $!";

J> while(<TC>){
J> $line = $_;

needless extra copy. either work with $_ or assign to $line in the
while:

while( my $line = <TC> ) {

J> ++$lineCnt;
J> if($lineCnt > 9){
J> ($x, $y, $netName, $side, $junk) = split(' ', $line, 5);
J> if($side eq "Top" || $side eq "Bottom"){
J> push(@nets, $netName);
J> }
J> }
J> }

i don't want to get into that logic but i bet you could slurp that in
also and then process the lines.

J> close(TC);

J> @sortedNets = @nets;

why this copy? i see no reason for it. if this is a large list it will
slow you down.

J> foreach $sortedNet(@sortedNets) {
J> $count{$sortedNet} = ++$count{$sortedNet};
J> }

huh???? why bump a hash element and then assign it to ITSELF?

$count{$_}++ for @sortedNets ;


J> foreach $sortedNet (keys %count) {
J> ($key, $val) = ($sortedNet, $count{$sortedNet});
J> $list{$key} = $val;
J> }

needless use of temp variables

foreach my $sorted_net (keys %count) {

$list{$sortedNet} = $count{$sortedNet} ;
}

but that is just a copy of the hash. is that what you really want?

you seem to do lots of extra and needless copying. that will kill any
program which handles large amounts of data.

J> foreach $tpList (@tpList) {
J> $found = "no";
J> ++$cnt;
J> foreach $key (keys %list) {
J> if ($key eq $tpList) {
J> $found = "yes";
J> $line2Push = "ARR: $key -- HASHKEY: $tpList VAL:
J> $list{$tpList}\n";
J> if($list{$tpList} eq "1"){
J> push(@oneArr, $line2Push);
J> }
J> elsif($list{$tpList} ne "1"){
J> push(@verifyArr, $line2Push);
J> }
J> last;
J> }
J> }
J> if($found eq "no"){
J> push(@zeroArr, $tpList);
J> }

instead of flag variables ($found) just do a next or last as
needed. flag variables are fortran, not perl. i am too fried to analyze
the logic in that loop so i can't optimize it now but i sense it can be
easily be done. i leave that for you and other posters.

J> print " ... D O N e ... \n\n";

well, the prompt coming back from the shell should be fine for that,
even though it isn't a cpu waster.

so your code does plenty of needless copies and slow loops. plenty of
room for speedups.

uri
 
J

Joe Smith

JRoot said:
while(<TC>){
$line = $_;
++$lineCnt;
if($lineCnt > 9){

if ($. > 9) { # Use $. when reading file line by line
foreach $sortedNet (keys %count) {
($key, $val) = ($sortedNet, $count{$sortedNet});
$list{$key} = $val;
}

%list = %count; # Copy entire hash. But why?
foreach $tpList (@tpList) {
$found = "no";
++$cnt;
foreach $key (keys %list) {
if ($key eq $tpList) {

foreach my $tp (@tp_list) {
if (exists $count{$tp}) {
push @one_arr,"$tp = $count{$tp}\n" if $count{$tp} == 1;
push @ver_arr,"$tp = $count{$tp}\n" if $count{$tp} > 1;
} else {
push @zer_arr,$tp;
}
}

That loop with 'keys %list' was killing your performance.
-Joe
 
M

Michele Dondi

BEGIN { push(@INC, "/app/util/perl5.6.1/lib/5.6.1") }

Uri already showed you a better way to do this. Why do you do it
anyway? It don't see what use it could serve...
use strict;
[snip rest]

From the description of your purpose and the part of the code I
actually read (the "big" loop was overly clumsy and complex for me to
try to understand what it did) I think that this *may* be what you're
after:


#!/app/util/perl/bin/perl -l

use strict;
use warnings;

warn "$0: Working...\n\n";

chomp(my @tp_list=do {
open my $tl, 'tp_list' or
die "$0: Can't open `tp_list' for reading: $!\n";
<$tl> });

my @nets;
{
open my $tc, 'testpoint_chart' or
die "$0: Can't open `testpoint_chart' for reading: $!\n";

while (<$tc>) {
next unless $. > 10;
my ($net_name, $side)=(split)[2,3];
push @nets, $net_name
if $side eq 'Top' or $side eq 'Bottom';
}
}

my %count;
$count{$_}++ for @nets;

my @result;
for (@tp_list) {
my $c=$count{$_} || 0;
push @{ $result[$c<2 ? $c : 2] }, $_
}

my @msg=qw/ZERO ONES VERIFY/;
for my $i (1,2,0) {
my $head="==== $msg[$i] ====";
print for '', $head, '-' x length $head;
print "$msg[$i]: $_" for @{ $result[$i] };
}

__END__


Hmmm, now that I think of it, since you're concerned about efficiency,
if chances are that 'testpoint_chart' be huge, then you may gain
something by skipping the first ten lines *before* the main cycle.
Also, you may avoid the C<for @nets> loop in unnecessary, just like
@nets itself. All in all this should be faster:


my %count;
{
open my $tc, '<', 'testpoint_chart' or
die "$0: Can't open `testpoint_chart' for reading: $!\n";

<$tc> for 1..10;
while (<$tc>) {
my ($net_name, $side)=(split)[2,3];
$count{$net_name}++
if $side eq 'Top' or $side eq 'Bottom'
}
}


Michele
 
J

John W. Krahn

JRoot said:
Below I posted my program in it's entirity and I hope that is within
the guidelines ...
The program works perectly but it seems to run slower than I had
expected based on other perl programs I've authored with much more
data involved.

I've been through the faq's and every web site that focuses on perl
and to be honest, I'm overwhelmed by the number of ways it's possible
to use perl so I'm asking if anyone could just look over this program
and see if it is obvious where the overhead lies so I can remove the
baggage and speed it up a bit. It runs in about 3-5 seconds on a 1 ghz
machine but my gut tells me it could be faster or more efficient if
you will.

Use perl's built-in variables whenever possible as they are usually optimized
for efficiency.
I'm not asking anyone to do my work for me ... just for a
pointer from anyone that deals with arrays and hashes a lot.

The roadmap is ...

I create an array from a file (one word per line)

Forget about the array.
I create a hash from a 2nd file, combining duplicate key words and
assign the duplicate quantity as the value. A key-val pair would look
like ... net_name 5

Fill this hash first and then use a while loop on the first file.

I iterate the array and search the hash keys to find a match

Use perl's exists() function to find matching hash keys instead of using a
foreach loop over keys( %hash ).

and then
report the key and val to different reports depending on the val
(quantity) which is either 0 (not found) 1 (single occurance) or
multiple which need to be verified.

I suspect I've arrayed and hashed one too many times when it could
have been done in a more simple manner.

Any help is appreciated -- thank you

Try something like this (UNTESTED):


open TC, 'testpoint_chart' or die "Can't open testpoint_chart for reading: $!";

my %list;
while ( <TC> ) {
next if 1 .. 9;
my ( $netName, $side ) = ( split )[ 2, 3 ];
$list{ $netName }++ if $side eq 'Top' or $side eq 'Bottom';
}

close TC;


open TL, 'tp_list' or die "Can't open tp_list for reading: $!";

my ( @zeroArr, @oneArr, @verifyArr );
while ( <TL> ) {
chomp;
if ( exists $list{ $_ } ) {
push @{ $list{ $_ } == 1 ? \@oneArr : \@verifyArr }, "ARR: $_ --
HASHKEY: $_ VAL: $list{$_}\n";
}
else {
push @zeroArr, "$_\n";
}
}

close TL;

print "\n===== ONES =====\n------------------\n";
print "ONES: $_" for @oneArr;

print "\n===== VERIFY =====\n--------------------\n";
print "VERIFY: $_" for @verifyArr;

print "\n===== ZERO =====\n--------------------\n";
print "ZERO: $_" for @zeroArr;

print " ... D O N e ... \n\n";

__END__



John
 
M

Michele Dondi

next unless $. > 10;

s/10/9/; # I'm an idiot...
<$tc> for 1..10;

s/10/9/; # I'm an idiot...
Also, you may avoid the C<for @nets> loop in unnecessary, just like
@nets itself. All in all this should be faster:

"Also, the C<for @nets> loop is unnecessary, just like @nets itself.
All in all this should be faster:"

While we're here, it occurred to me that there's no need to preload
@tp_list all at once either, but one can use a "traditional" C<while>
loop instead. Of course all this is obvious, or even more than
obvious, but since I basically started by translating your script to
something more human, I didn't realize it soon.

I apologize for the repeated errors/inaccuracies. All in all the
"latest version" of the script is:


#!/app/util/perl/bin/perl -l

use strict;
use warnings;

warn "$0: Working...\n";

my %count;
{
open my $tc, '<', 'testpoint_chart' or
die "$0: Can't open `testpoint_chart' for reading: $!\n";

<$tc> for 1..9;
while (<$tc>) {
my ($net_name, $side)=(split)[2,3];
$count{$net_name}++
if $side eq 'Top' or $side eq 'Bottom';
}
}

my @result;
{
open my $tl, '<', 'tp_list' or
die "$0: Can't open `tp_list' for reading: $!\n";

while (<$tl>) {
chomp;
my $c=$count{$_} || 0;
push @{ $result[$c<2 ? $c : 2] }, $_;
}
}

my @msg=qw/ZERO ONES VERIFY/;
for my $i (1,2,0) {
my $head="==== $msg[$i] ====";
print for '', $head, '-' x length $head;
print "$msg[$i]: $_" for @{ $result[$i] };
}

__END__


PS: I understand that yours could have been just a preliminary
version, but I think that it would be sensible to read the required
files from the cmd line rather than hardcoding them in the source.


Michele
 
M

Michele Dondi

s/9/10/; # or not?

s/^.*$//; # I'm an idiot!


PS: I know of C<$_=''>. I wanted to underline "delete everything of
what I wrote from the beginning to the end"


Michele
 
J

JRoot

To everyone that responded to my latest self-made crisis

Thank you so very much for the replies and critique and the bonus
style pointers.

I'm taking all the posts and re-arranging the good, the bad and the
ugly (bad and ugly being the part I did :)) and trying different
scenarios, all of which will certainly be better than what I had, to
speed up the program naturally but also to see what I can learn from
all this.

Youz guyz and galz are truly the perl gods and goddesses. Hanging
around here could turn a person into a real programmer if he's not
careful.

Thanks again and have a good one.

Jorge
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,074
Latest member
StanleyFra

Latest Threads

Top