efficient way to write multiple loops code

F

friend.05

Hi,

I am trying to analyze some data. I have big data files.

I have 3 different files in following format. ($file_1, $file_2,
$file_3)

ID | Time | IP | Code

Following is the pseudocode I am writing. I would like to know a more
efficient way to do the same thing.

open(INFO_1,$file_1);
open(INFO_2,$file_2);
open(INFO_3,$file_3);

@file1_lines = <INFO_1>;
@file2_lines = <INFO_2>;
@file3_lines = <INFO_3>;

foreach $file1_line (@file1_lines)
{
    @file1 = split('\|',$file1_line);

    #some code

    foreach $file2_line (@file2_lines)
    {
        @file2 = split('\|',$file2_line);

        #some code

        #if condition between File1 data and File2 data
        {
            #some code

            foreach $file3_line (@file3_lines)
            {
                @file3 = split('\|',$file3_line);

                #some code

                #if condition
            }
        }
    }
}



So I am going through each record of file 1 and, depending on whether
the data is present in file 2 and on some further condition, I look for
that data in file 3.

So each record of file 1 has to be compared against each record of
file 2, and each matching record then has to go through file 3.

So this code is taking a lot of time. I would like some suggestions for
more efficient code.

Can I use a hash (by reading a file into a hash)?



Thanks
 
X

xhoster

foreach $file1_line (@file1_lines)
{
@file1 = split('\|',$file1_line);
foreach $file2_line (@file2_lines)
{
@file2 = split('\|',$file2_line);
#if condition between File1 data and File2 data
....

So this code is taking lot of time. I want some suggestion for
efficient code.

Can I use Hash Array (by reading file in hash array)

Whether you can use a hash to speed this up depends on whether
"If condition between File1 data and File2 data" can be reduced
to (or protected by) fast hash look ups. We can't answer this for you
without knowing what the nature of that condition is.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
T

Tim Greer

The answer very much depends on what #some code is actually doing. Is
the data fixed in the files, what specific checks are you doing? Could
the data be anywhere in a file, inside of a line of data, or are you
trying to match lines from ^ start to $ end of line per file, or are
you doing some other type of processing?
 
F

friend.05

The answer very much depends on what #some code is actually doing.  Is
the data fixed in the files, what specific checks are you doing?  Could
the data be anywhere in a file, inside of a line of data, or are you
trying to match lines from ^ start to $ end of line per file, or are
you doing some other type of processing?
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!

I am checking fields within a line, not the whole line.

I want to check whether the IP and Code from file1 are present in file2
and, if they are, check again whether they are present in file3.

I am doing all this processing to analyze some data.

Let me know if it is still not clear.

Thanks.
 
T

Tim Greer

If there's not a lot of data involved, I'd personally build a hash
keyed on the values you want to compare, then open the next file and
check each line against that hash in a while loop, for file 2 and then
file 3 (if needed), instead of reading all three files into arrays. If
the files are potentially large, you'll want to avoid slurping them,
because that reads a lot of data into memory that wouldn't be
necessary. I'd open the first file, split each line in a while loop and
build the hash, close it, then open file 2 and loop over it, checking
whether each key exists in the hash. Then repeat for file 3.
There is probably a better way than that, but off the top of my head
it's a generally better idea than what you're attempting now.
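
A minimal sketch of that approach, assuming the key is the IP and Code joined with "|". The sample records are made up, and in-memory filehandles stand in for the real files so the example runs on its own:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for file 1 and file 2
# (format: ID|Time|IP|Code).
my $file1_data = "1|10:00|10.0.0.1|200\n2|10:01|10.0.0.2|404\n";
my $file2_data = "7|11:00|10.0.0.2|404\n8|11:01|10.0.0.9|200\n";

# Pass 1: remember every IP/Code pair seen in file 1.
my %in_file1;
open my $fh1, '<', \$file1_data or die "cannot open file 1: $!";
while (my $line = <$fh1>) {
    chomp $line;
    my ($ip, $code) = (split /\|/, $line)[2, 3];
    $in_file1{"$ip|$code"} = 1;
}
close $fh1;

# Pass 2: stream file 2 line by line and look each pair up in the hash.
my @matches;
open my $fh2, '<', \$file2_data or die "cannot open file 2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    my ($ip, $code) = (split /\|/, $line)[2, 3];
    push @matches, "$ip|$code" if exists $in_file1{"$ip|$code"};
}
close $fh2;

print "$_\n" for @matches;
```

Each file is read once, so the work grows with the total number of lines rather than with the product of the file sizes. File 3 could be handled with a third pass of the same shape.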
 
G

Grant

Who knows, without seeing your data and requirements, but I'll offer
this optimised database table loader as an example that follows all
the speedup clues from the camel book. Loads a ~100k record data
table followed by a ~250 record table on a slow 500MHz Celeron box
in about 3 seconds:
....
do_log("read: $indexfile");
open FILE, "< $indexfile" or do_die("$indexfile $!");
flock FILE, 1;
$ip2c_cn = 0;
while (<FILE>) {
    next if /^$/; next if /^#/; next if /^junkview/; chomp;
    (   $ip2c_lo[++$ip2c_cn],
        $ip2c_hi[$ip2c_cn],
        $ip2c_cc[$ip2c_cn]
    ) = split /\s+/, $_;
}
close FILE;

do_log("read: $namesfile");
open FILE, "< $namesfile" or do_die("$namesfile $!");
flock FILE, 1;
%cc_name = ();
while (<FILE>) {
    next if /^$/; next if /^#/; next if /^junkview/; chomp;
    my ($cc, $name) = split /:/, $_;
    $cc_name{$cc} = $name;
}
close FILE;
}

You can see that as far as possible you avoid useless processing of
irrelevant data, so plan on how to skip (with 'next') over sections
of your loop code rather than using 'if ... processing', avoid complex
regexps, don't chomp records that are about to be discarded.

From log file:
2008-10-07.21:28:17 - read: /etc/ip2cn-server.conf
2008-10-07.21:28:17 - read: /usr/local/share/ip2cn/ip2c-data
2008-10-07.21:28:20 - read: /usr/local/share/ip2cn/ip2c-names
2008-10-07.21:28:20 - listen: localhost:4743

Context: http://bugsplatter.id.au/ip2cn/ip2cn-server.txt

Grant.
 
S

sln

Nobody can know the performance impact of pseudocode, or what data
it processes. There is no general answer to be had.

The best you can do, through trial and error, is benchmark
it yourself:

use Benchmark ':hireswallclock';
my $t0 = Benchmark->new;

# ... code block to be timed ...

my $t1 = Benchmark->new;
my $tdif = timediff($t1, $t0);
print STDERR "the code took: ", timestr($tdif), "\n";

sln
 
I

Ilya Zakharevich

Nobody else commented on that yet:
@file1_lines = <INFO_1>;
@file2_lines = <INFO_2>;
@file3_lines = <INFO_3>;

foreach $file1_line (@file1_lines)
{
@file1 = split('\|',$file1_line);
foreach $file2_line (@file2_lines)
{
@file2 = split('\|',$file2_line);

This split is done again and again, once for every line of INFO_1.
The result is going to be the same anyway. Better to move it outside of
the loop:

@file2_fields = map [split '\|', $_], @file2_lines;

if you have enough memory. Likewise for other stuff.
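
A small sketch of what that looks like in context, with made-up sample lines. The comparison on fields 2 and 3 is only an assumed stand-in for the real condition:

```perl
use strict;
use warnings;

# Hypothetical sample lines (format: ID|Time|IP|Code).
my @file1_lines = ("1|10:00|10.0.0.1|200\n", "2|10:01|10.0.0.2|404\n");
my @file2_lines = ("7|11:00|10.0.0.2|404\n", "8|11:01|10.0.0.9|200\n");

# Split every line of file 2 exactly once, before the outer loop starts.
my @file2_fields = map { my $l = $_; chomp $l; [ split /\|/, $l ] } @file2_lines;

my $pairs_found = 0;
for my $file1_line (@file1_lines) {
    my $line = $file1_line;
    chomp $line;
    my @file1 = split /\|/, $line;
    for my $file2 (@file2_fields) {      # $file2 is an array ref of fields
        $pairs_found++
            if $file1[2] eq $file2->[2] && $file1[3] eq $file2->[3];
    }
}
print "matching pairs: $pairs_found\n";
```

The loops are still nested, so this only removes the repeated splitting. It does not change how many comparisons are made.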

Hope this helps,
Ilya
 
S

sln

Well, if it were my code, I would know exactly how to do it without benchmarks.
But you don't know yourself, it seems. Do you?
Instead, you post phoney PSEUDO code as if you know something, which you don't.
Yet you put the burden on the sucker who is stupid enough to respond to you.

Outta here... ignant

sln
 
F

friend.05

Thanks to all for replying.

Can I use a hash even if I don't have a unique key? In my data I
need IP and Code, which are not necessarily unique.

Below is my code:

open(INFO_1,$file_1);
open(INFO_2,$file_2);
open(INFO_3,$file_3);


@file1_lines = <INFO_1>;
@file2_lines = <INFO_2>;
@file3_lines = <INFO_3>;


foreach $file1_line (@file1_lines)
{
    @file1 = split('\|',$file1_line);
    $file1_ip = $file1[2];
    $file1_code = $file1[3];

    foreach $file2_line (@file2_lines)
    {
        @file2 = split('\|',$file2_line);
        $file2_ip = $file2[2];
        $file2_code = $file2[3];

        if($file1_ip eq $file2_ip)
        {
            $flag = 1;
            if($file1_code eq $file2_code)
            {
                $r_flag = 0;

                foreach $file3_line (@file3_lines)
                {
                    @file3 = split('\|',$file3_line);
                    $file3_ip = $file3[2];
                    $file3_code = $file3[3];

                    if(($file1_ip eq $file3_ip) &&
                       ($file1_code eq $file3_code))
                    {
                        #some flag
                    }
                }
                #depending on flag I increment some counter
            }
        }
    }
    #depending on flag I increment some counter
}
 
X

xhoster

Thanks to all for replying.

Can I use Hash even if I don't have unique key ?

Yes and no. Hashes only have unique keys, but the hash value for that
key can be an array or a hash, so effectively each key can have several
values.
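
As a rough illustration (the sample records and the name %ids_for are invented for this sketch), a hash whose values are array references can hold every record that shares an IP/Code pair:

```perl
use strict;
use warnings;

# Hypothetical records (format: ID|Time|IP|Code).
my @lines = (
    "1|10:00|10.0.0.1|200",
    "2|10:05|10.0.0.1|200",   # same IP and Code as the first line
    "3|10:09|10.0.0.2|404",
);

# One key per IP/Code pair; the value is a reference to an array of IDs.
my %ids_for;
for my $line (@lines) {
    my ($id, $ip, $code) = (split /\|/, $line)[0, 2, 3];
    push @{ $ids_for{"$ip|$code"} }, $id;
}

for my $key (sort keys %ids_for) {
    print "$key => @{ $ids_for{$key} }\n";
}
```

The key stays unique, but nothing is lost when two lines carry the same pair.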
Because in my data I
need IP and Code which are not necessary to be unique.

In what data structure are they not necessarily unique?
if($file1_ip eq $file2_ip)
{
$flag = 1;

$flag is not used elsewhere
if($file1_code eq $file2_code)

Since this if has no else, and has nothing done between its end and
the end of the previous if, they can be combined into one line

if($file1_ip eq $file2_ip and $file1_code eq $file2_code) {


{
$r_flag = 0;

foreach $file3_line (@file3_lines)
{
@file3 = split('\|',$file3_line);
$file3_ip = $file3[2];
$file3_code = $file3[3];

if(($file1_ip eq $file3_ip) &&
($file1_code eq $file3_code))
{
#some flag

Again with the pseudo-code? Let's say that means "$r_flag=1;".
Since any execution of that beyond the first is meaningless, it
doesn't matter if the same ip|code shows up more than once in
@file3_lines, so @file3_lines can be reduced to a simple hash instead.

}

}
#depending on flag I increment some counter

Incrementing a counter is of course count-sensitive, unlike setting a flag.
So it might matter if the same ip|code shows up multiple times in
@file2_lines. But that could probably be solved by storing the count
in the hash.
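
A sketch of both ideas together, on made-up data. The helper key_of and the two counters are hypothetical names, and the counting rule is only a guess at what the elided "increment some counter" code does:

```perl
use strict;
use warnings;

# Hypothetical sample lines (format: ID|Time|IP|Code).
my @file1_lines = ("1|10:00|10.0.0.1|200", "2|10:01|10.0.0.2|404");
my @file2_lines = ("5|11:00|10.0.0.1|200", "6|11:02|10.0.0.1|200",
                   "7|11:03|10.0.0.2|404");
my @file3_lines = ("9|12:00|10.0.0.1|200");

# Returns the "IP|Code" lookup key for one pipe-delimited line.
sub key_of { return join '|', (split /\|/, $_[0])[2, 3] }

# File 2: store how often each pair occurs, since counters care about that.
my %count_in_file2;
$count_in_file2{ key_of($_) }++ for @file2_lines;

# File 3: only presence matters, so a plain "seen" hash is enough.
my %in_file3;
$in_file3{ key_of($_) } = 1 for @file3_lines;

my ($in_both, $in_all_three) = (0, 0);
for my $line (@file1_lines) {
    my $key = key_of($line);
    my $n = $count_in_file2{$key} or next;   # pair does not occur in file 2
    $in_both      += $n;
    $in_all_three += $n if $in_file3{$key};
}
print "file1/file2 matches: $in_both, also in file3: $in_all_three\n";
```

Each file is walked once, and every lookup that used to be an inner loop becomes a single hash access.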

 
T

Tad J McClellan

open(INFO_1,$file_1);


You should always, yes *always*, check the return value from open():

open(INFO_1, $file_1) or die "could not open '$file_1' $!";

Even better, you should use the 3-arg form of open() and a lexical filehandle:

open my $INFO_1, '<', $file_1 or die "could not open '$file_1' $!";

@file1_lines = <INFO_1>;

@file1_lines = <$INFO_1>; # use the lexical filehandle


@file1 = split('\|',$file1_line);


A pattern match should *look like* a pattern match:

@file1 = split(/\|/,$file1_line);

$file1_ip = $file1[2];
$file1_code = $file1[3];


You can replace those 3 lines of code with 1 line using a
List Slice (see the "Slices" section in perldata.pod):

my($file1_ip, $file1_code) = (split /\|/, $file1_line)[2,3];

$flag = 1;


You should choose meaningful variable names.
 
F

friend.05

Hey Tad,

Thanks for your help.

Can you also suggest a more efficient way?

Since I am processing three files in nested loops, it is taking a lot of time.
 
I

Ilya Zakharevich

Tad J McClellan said:
A pattern match should *look like* a pattern match:
@file1 = split(/\|/,$file1_line);

In general, I do not agree. A split on a constant string WITHOUT
METACHARS is better written as a split on a string. However, in
this particular case, it is better to use something that looks like a REx.

However, do you really find /\|/ very esthetic? ;-) Can it be
written better than m'\|'?

Yours,
Ilya
 
T

Tad J McClellan

Ilya Zakharevich said:
Tad J McClellan said:
A pattern match should *look like* a pattern match:
@file1 = split(/\|/,$file1_line);

In general, I do not agree. A split on a constant string WITHOUT
METACHARS should better be written as a split on string.


I like that idea enough that I may actually change my preference...

However, in
this particular case, it is better to use something looking as a REx.

However, do you really find /\|/ very esthetic? ;-)


No. In this case, the nature of the beast precludes anything esthetic. :-(

Can it be
written better than m'\|'?


That is not too objectionable, though I kinda like /\Q|/
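
For comparison, here is a short sketch of the spellings discussed above, plus /[|]/ as one more option not mentioned in the thread. The sample line is made up. The last split shows why the escaping matters: split treats a plain string as a pattern too, and an unescaped | is an alternation of two empty patterns.

```perl
use strict;
use warnings;

my $line = "7|11:00|10.0.0.2|404";   # hypothetical record

# Four equivalent ways to split on a literal pipe character.
my @a = split /\|/,  $line;
my @b = split m'\|', $line;
my @c = split /\Q|/, $line;
my @d = split /[|]/, $line;

# Unescaped, the pattern matches the empty string between any two
# characters, so the line falls apart into single characters.
my @chars = split '|', $line;

print scalar(@a), " fields, ", scalar(@chars), " single characters\n";
```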
 
F

friend.05

Thanks to all.

I tried reading the file into an array, using @file1_lines = <INFO_P>;

But my results changed.

Below is the FULL code I am using.

Please suggest something to make it run faster.


#!/usr/local/bin/perl

$p_file = "out_p.txt";
$s_file = "out_s.txt";
$r_file = "out_r.txt";

open(INFO_P,$p_file);
open(INFO_S,$s_file);
open(INFO_R,$r_file);

@p_lines = <INFO_P>;
@s_lines = <INFO_S>;
@r_lines = <INFO_R>;

$fail_flag = 1;
$p_slow = 0;
$p_fail = 0;
$r_robin = 0;
$s_as_p = 0;

foreach $s_line (@s_lines)
{
    @sec = split('\|',$s_line);
    $s_cli_ip = $sec[4];
    $s_ser_ip = $sec[5];
    $s_id = $sec[7];

    $r_robin_flag = 1;
    $flag = 0;

    foreach $p_line (@p_lines)
    {
        @pri = split('\|',$p_line);
        $p_cli_ip = $pri[4];
        $p_ser_ip = $pri[5];
        $p_id = $pri[7];

        if($s_cli_ip eq $p_cli_ip)
        {
            $flag = 1;

            if($s_id eq $p_id)
            {
                $r_robin_flag = 0;
                $s_res_first = 0;
                $p_res_first = 0;
                foreach $r_line (@r_lines)
                {
                    @res = split('\|',$r_line);
                    $r_cli_ip = $res[4];
                    $r_ser_ip = $res[5];
                    $r_id = $res[7];

                    if(($s_cli_ip eq $r_cli_ip) && ($s_id eq $r_id))
                    {
                        if($r_ser_ip eq $s_ser_ip)
                        {
                            #chk if pri_res_first
                            if($p_res_first eq '0'){
                                $slow = 1;
                                $s_res_first = 1;
                            }
                        }elsif($r_ser_ip eq $p_ser_ip){
                            $fail_flag = 0;
                            if($s_res_first){
                                #$slow = 1;
                            }else{
                                #$s_res_first = 0;
                                $p_res_first = 1;
                            }
                        }
                    }
                    if($p_res_first){
                        $primary++ ;
                        last;
                    }
                }
                if($fail_flag){
                    $primary_fail++;
                }elsif($slow){
                    $slow = 0;
                    $fail_flag = 1;
                    $primary_slow++;
                }
                last;
            }
        }
    }
    if($flag == 0){
        $s_as_p++;
    }elsif($r_robin_flag){
        $r_robin++;
    }
}

close(INFO_P);
close(INFO_S);
close(INFO_R);
 
X

xhoster

Below is my FULL Code which I am using.

It is better to post real but simplified code. Especially when
your full code is so inscrutable.

open(INFO_P,$p_file);
open(INFO_S,$s_file);
open(INFO_R,$r_file);

@p_lines = <INFO_P>;
@s_lines = <INFO_S>;

As several people have said, you should use lexical file handles, you
should check the status of the open, and you should use strict.

@r_lines = <INFO_R>;

The innermost for loop doesn't seem to do anything except for when the
if statement if(($s_cli_ip eq $r_cli_ip) && ($s_id eq $r_id))
is satisfied. Thus, that loop can be reduced to loop over only
those lines of INFO_R that will cause the above to be true. Build a hash
of arrays that segregates lines according to $r_cli_ip and $r_id.

# this replaces "@r_lines = <INFO_R>;" and indexes the lines by "cli_ip|id"
my %r_lines;
while (<INFO_R>) {
    my @res = split('\|',$_);
    my $r_cli_ip = $res[4];
    my $r_ser_ip = $res[5];
    my $r_id = $res[7];
    push @{$r_lines{"$r_cli_ip|$r_id"}}, $_;
}


Then replace

foreach $r_line (@r_lines)

with

foreach $r_line (@{$r_lines{"$s_cli_ip|$s_id"}})

The same strategy could perhaps be employed in the middle foreach loop
as well. If I understood the motivation of your code, I might be able
to make it much simpler, but since I don't, I'll stick to the "minimal
possible changes" approach.
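
For the middle loop, a sketch of the same idea might index the P lines by client IP, so that only lines with a matching IP are visited. The records, the name %p_by_cli_ip and the two counters are invented for illustration, and the flag logic of the original code is left out:

```perl
use strict;
use warnings;

# Hypothetical records; only fields 4 (client IP) and 7 (id) matter here.
my @p_lines = (
    "a|b|c|d|10.0.0.1|192.168.0.1|x|42",
    "a|b|c|d|10.0.0.1|192.168.0.2|x|43",
    "a|b|c|d|10.0.0.7|192.168.0.1|x|42",
);
my @s_lines = ("a|b|c|d|10.0.0.1|192.168.0.9|x|43");

# Index the P lines once by client IP.
my %p_by_cli_ip;
for my $p_line (@p_lines) {
    my $p_cli_ip = (split /\|/, $p_line)[4];
    push @{ $p_by_cli_ip{$p_cli_ip} }, $p_line;
}

my ($visited, $id_matches) = (0, 0);
for my $s_line (@s_lines) {
    my ($s_cli_ip, $s_id) = (split /\|/, $s_line)[4, 7];
    # Only P lines with the same client IP are visited.
    for my $p_line (@{ $p_by_cli_ip{$s_cli_ip} || [] }) {
        $visited++;
        my $p_id = (split /\|/, $p_line)[7];
        $id_matches++ if $s_id eq $p_id;
    }
}
print "visited $visited of ", scalar(@p_lines), " P lines, $id_matches id match(es)\n";
```

An S line whose client IP never occurs in the P file finds no hash entry at all, which corresponds to the $flag == 0 case in the original code.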

Xho

 
T

Tad J McClellan

open(INFO_P,$p_file);


You should always, yes *always*, check the return value from open():

open(INFO_P, $p_file) or die "could not open '$p_file' $!";

Even better, you should use the 3-arg form of open() and a lexical filehandle:

open my $INFO_P, '<', $p_file or die "could not open '$p_file' $!";

@p_lines = <INFO_P>;


$flag = 1;


You should choose meaningful variable names.
 
D

Dr.Ruud

(e-mail address removed) wrote:
I am trying to analyze some data. I have big data files.

I have 3 different files in following format. ($file_1, $file_2,
$file_3)

Numbered variable names are a red flag. Normally you are better off
using a different data structure, like an array or a hash.

use strict;
use warnings;

use Data::Dumper;
$Data::Dumper::Sortkeys = $Data::Dumper::Indent =
    $Data::Dumper::Terse = 1;

my %data;
my @filenames = qw/a b x/;
for my $fn (@filenames) {
    open my $fh, "<", $fn or die "open $fn: $!";
    while ( <$fh> ) {
        chomp;    # keep the newline out of $code, the last field
        my ($ip, $code) = (split m'\|')[2,3];
        push @{$data{$ip}{$code}}, "$fn:$.";
    }
}
print Dumper( \%data );
__END__

(untested)
 
