ashutosh.gaur
Hi
I'm a Perl newbie. I've been given the task of parsing very large
(500MB) web server log files in an efficient manner. I need to parse
about 8 such files in parallel and create corresponding CSV files as
output. This needs to be done every hour; in other words, the entire
parsing of the 8 files should complete well within 30 minutes. The
remaining 30 minutes are needed for other database-related activities
that are performed on the CSV files the Perl script generates.
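To use all 8 CPUs I'm thinking of forking one child per log file,
something like the untested sketch below. The glob path and the
parse_file() wrapper are placeholders of mine; parse_file() would hold
the per-file loop shown further down.

use strict;
use warnings;

my @in_files = glob("/logs/access_*.log");    # placeholder paths

sub parse_file {
    my ($in_file, $out_file) = @_;
    # ... the per-file while-loop from the snippet below goes here ...
}

my @pids;
for my $in_file (@in_files) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                  # child: handle exactly one file
        parse_file($in_file, "$in_file.csv");
        exit 0;
    }
    push @pids, $pid;                 # parent: go fork the next one
}
waitpid($_, 0) for @pids;             # block until every child finishes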
Following is a snippet of my Perl routine (decrypt() below is a
stand-in for my actual decrypting subroutine)....
open(INFO, '<', $in_file)  or die "Can't read $in_file: $!";
open(DAT,  '>', $out_file) or die "Can't write $out_file: $!";
while (<INFO>) {
    # Combined log format: host, ident user, auth user,
    # [date:time zone], "method url protocol", status, bytes,
    # "referer", "agent"
    my ($host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $url, $protocol, $status, $bytes,
        $referer, $agent) =
        /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/
        or next;
    my $decrypt_url = decrypt($url);   # decrypt() = my decrypting subroutine
    # join the fields with commas so the output really is CSV
    print DAT join(',', $host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $decrypt_url, $protocol, $status,
        $bytes, $referer, $agent), "\n";
}
close DAT;
close INFO;
---------------------------------------------------------------------------------------------------
This script takes about 50 minutes to process all 8 files. I need
suggestions for improving the performance and bringing the processing
time down.
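One idea I want to try myself is caching the decrypt step, since the
same URL appears on many lines; the core Memoize module does this
transparently. Untested sketch, and again decrypt() is only my
placeholder name for the real subroutine:

use Memoize;

sub decrypt { my ($url) = @_; return $url }   # placeholder body

memoize('decrypt');   # repeat calls with the same $url now hit a cache

# the line inside the while-loop stays the same:
# my $decrypt_url = decrypt($url);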
The hardware is a decent 8-CPU (1.2GHz) machine with 8GB of memory.
The machine will be used solely for this file processing and for
running one other application (Informatica).
thanks
Ash