nested foreach loop

T

torque

I have a nested foreach loop and this is really killing the
performance. The inner foreach is using the value from the outer loop
to check if it is a valid value by comparing it against an array of
valid values. I removed the inner foreach to eliminate validation and
the performance improved from 12 hours to 3 hours for 7 million
records.
But I would like to have this validation check done by the inner
foreach loop.

This is how the code looks like. Its fairly simple but I am new to
perl.

open(WVHFILE, "<file.txt");
while (<WVHFILE>)
{
chomp;
($id, $pk, $an, $eid, $vv, $val, $et) = split(/\|/);
my @token = split(";", $val);
my %rst = ();
my $flg = 0;
my $ed = substr($et,0,11);

foreach $nvpair (@token)
{
$flg = 0;
my ( $attr_id, $attr_val) = split ("=", $nvpair);
foreach $aid (@attr_lst){
if ($attr_id == $aid){
$flg = 1;
}

}

if ($flg == 1) {
$rst{$attr_id} = $attr_val . "|" . $lst{$attr_id .
$attr_val}. "|";
}

}

Please send me an alternative if you know it. I would appreciate it.
Thanks.
 
B

Brian Helterlilne

torque said:
I have a nested foreach loop and this is really killing the
performance. The inner foreach is using the value from the outer loop
to check if it is a valid value by comparing it against an array of
valid values. I removed the inner foreach to eliminate validation and
the performance improved from 12 hours to 3 hours for 7 million
records.
But I would like to have this validation check done by the inner
foreach loop.

This is how the code looks like. Its fairly simple but I am new to
perl.

Basically, you are using the wrong data structure. If you are
constantly searching @attr_lst, it should be a hash, with the list of
attributes as keys. Then, rather than looping through the array, you
look up the key in the hash.

use strict;
use warnings;


my %attr_hash;
$attr_hash{$_} = 1 foreach ( @attr_lst );
open(WVHFILE, "<file.txt");
open my $WVHFILE, "<", "file.txt" or die "can't open file.txt: $!\n";
while (<WVHFILE>) while ( $WVHFILE )
{
chomp;
($id, $pk, $an, $eid, $vv, $val, $et) = split(/\|/);
my @token = split(";", $val);
my %rst = ();
my $flg = 0;
my $ed = substr($et,0,11);

foreach $nvpair (@token)
{
$flg = 0;
my ( $attr_id, $attr_val) = split ("=", $nvpair);
foreach $aid (@attr_lst){
if ($attr_id == $aid){
$flg = 1;
}

}

if ( $attr_hash{ $aid } {
$rst{$attr_id} = $attr_val . "|" .
$lst{$attr_id . $attr_val}. |";
 
J

Jürgen Exner

torque said:
I have a nested foreach loop and this is really killing the
performance. The inner foreach is using the value from the outer loop
to check if it is a valid value by comparing it against an array of
valid values. I removed the inner foreach to eliminate validation and
the performance improved from 12 hours to 3 hours for 7 million
records. [...]
foreach $nvpair (@token)
{
$flg = 0;
my ( $attr_id, $attr_val) = split ("=", $nvpair);
foreach $aid (@attr_lst){
if ($attr_id == $aid){
$flg = 1;
}

Ouch, that hurts. Some comments:
- using flags like this is poor style
- at the very least you could have aborted the loop as soon as the item
was found. On average (assuming equal distribution and a few other
things) you would have saved 50%.
- but most important: instead of an array use a data structure that is
meant to be searched ! And for Perl you don't even have to create your
own like e.g. an n-ary tree but Perl comes with hashes. Just put those
$aid-values into a hash and then you can use exists() to check in near
constant time if that element exists.

Using a constant check instead of the inner loop changes your algorithm
from square to linear, a gain that you cannot achive any other way.

jue
 
S

sln

I have a nested foreach loop and this is really killing the
performance. The inner foreach is using the value from the outer loop
to check if it is a valid value by comparing it against an array of
valid values. I removed the inner foreach to eliminate validation and
the performance improved from 12 hours to 3 hours for 7 million
records.
But I would like to have this validation check done by the inner
foreach loop.

This is how the code looks like. Its fairly simple but I am new to
perl.

open(WVHFILE, "<file.txt");
while (<WVHFILE>)
{
chomp;
($id, $pk, $an, $eid, $vv, $val, $et) = split(/\|/);
my @token = split(";", $val);
my %rst = ();
my $flg = 0;
my $ed = substr($et,0,11);

foreach $nvpair (@token)
{
$flg = 0;
my ( $attr_id, $attr_val) = split ("=", $nvpair);
foreach $aid (@attr_lst){
if ($attr_id == $aid){
$flg = 1;
}

}

if ($flg == 1) {
$rst{$attr_id} = $attr_val . "|" . $lst{$attr_id .
$attr_val}. "|";
}

}

Please send me an alternative if you know it. I would appreciate it.
Thanks.


Here are some possibilites that may enhance performance.

sln

------------------------------------------------
Untested ...


# id | pk | an | eid | vv | val | et
# id | pk | an | eid | vv | @token = (attid=val : attid=val : attid=val) | et
# @attr_lst = (attid, attid, attid)
# -----------------------------------------------------

my %hattr_lst = map {$_ => ''} @attr_lst;


open(WVHFILE, "<file.txt");

while (<WVHFILE>)
{
chomp;
my %rst = ();
my %hseen = ();

####### if you just need token-list field ->

my @token = (); # only if need token
while ( /\s*([^|:]+)\s*=\s*([^|:]+)\s*[:|]/g )
{
# $1 = token:attid
# $2 = token:val
push @token, $1; # only if need token

if (exists $hattr_list{$1})
{
$rst{$1} = $2 . "|" . $lst{$1 . $2}. "|";
$hseen{$1}++;
}
}
# what is done down here?

####### or, if you need all the fields ->

($id, $pk, $an, $eid, $vv, $val, $et) = split(/\|/);
my $ed = substr($et,0,11);
my @token = (); # only if need token

while ( $val =~ /\s*([^:]+)\s*=\s*([^:]+)\s*/g )
{
# $1 = token:attid
# $2 = token:val
push @token, $1; # only if need token

if (exists $hattr_list{$1})
{
$rst{$1} = $2 . "|" . $lst{$1 . $2}. "|";
$hseen{$1}++;
}
}
# what is done down here?

####### or, to fix up original code ->

($id, $pk, $an, $eid, $vv, $val, $et) = split(/\|/);
my @token = split(";", $val);
my %rst = ();
my $ed = substr($et,0,11);

foreach $nvpair (@token)
{
my ( $attr_id, $attr_val) = split ("=", $nvpair);

if (exists $hattr_list{$attr_id})
{
$rst{$attr_id} = $attr_val . "|" . $lst{$attr_id . $attr_val}. "|";
$hseen{$attr_id}++;
}
}
# what is done down here?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,071
Latest member
MetabolicSolutionsKeto

Latest Threads

Top