Inviting suggestions for performance optimization

Yash

The functionality required from my program is:
Text files containing around 2 million records in all should be read,
and every record should be split into its component fields. All
records have their fields separated by commas.
After splitting a line into fields, around 20 in-memory lookups have
to be applied to obtain around 20 new fields.
The original line, with the new fields appended to it, should be
written to a different file.
This has to be done within 5 minutes. The program would start just
once and then process 2 million records every 5 minutes.
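
A minimal sketch of the kind of loop I have in mind, with made-up file
names, field positions, and a single placeholder lookup table, would be:

#!/usr/bin/perl
use strict;
use warnings;

# Placeholder lookup table; the real program would load its ~20
# lookup tables once, at startup, before the main loop.
my %lookup;
$lookup{$_} = "mapped_$_" for 0 .. 999;

open my $in,  '<', 'records.txt'  or die "records.txt: $!";
open my $out, '>', 'enriched.txt' or die "enriched.txt: $!";

while (my $line = <$in>) {
    chomp $line;
    my @fields = split /,/, $line;

    # Around 20 lookups per record; which input fields feed which
    # lookups is a guess here, and every record is assumed to have
    # at least 20 fields.
    my @new = map { defined $lookup{$_} ? $lookup{$_} : '' }
              @fields[0 .. 19];

    print {$out} join(',', $line, @new), "\n";
}

close $out or die "enriched.txt: $!";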

Given that the program has to be in Perl, would you offer any
suggestions regarding performance optimization so that the speed
requirements can be met?


Thanks
Yash
 
A. Sinan Unur

(e-mail address removed) (Yash) wrote in @posting.google.com:
> Given that the program has to be in Perl, would you offer any
> suggestions regarding performance optimization so that the speed
> requirements can be met?

First, write something. Then make sure it is correct. Then measure how
much time it is taking. Then you'll know whether you need to do anything
about speed. Five minutes seems like a lot of time to me for just
2,000,000 records. Running the script below in the background on my
Celeron 1 GHz laptop (Win XP) took around 3 minutes and 40 seconds,
including startup _and_ the creation of the 80 MB input file.

#! perl

use strict;
use warnings;

use File::Temp;

my $tmp = File::Temp->new();

# Generate a 2,000,000-record test file of space-separated numbers.
for (1 .. 2_000_000) {
    printf $tmp "%8.8d: %d %d %d %d %d\n",
        some_number($_), some_number($_), some_number($_),
        some_number($_), some_number($_), some_number($_);
}

# Rewind, then read every record back and split it into fields.
$tmp->seek(0, 0);

my %hash;
while (my $line = $tmp->getline()) {
    my ($obs, @fields) = split ' ', $line;
    $hash{$obs} = \@fields;
}

sub some_number {
    my ($n) = @_;
    return $n * rand;
}

__END__
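
If you want the script to report its own elapsed time, the core
Time::HiRes module is one way to do it; a rough sketch:

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];

# ... the read/split/store loop from the script above would go here ...

printf "elapsed: %.2f seconds\n", tv_interval($t0);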
 
ctcgag

> The functionality required from my program is:
> Text files containing around 2 million records in all should be read,
> and every record should be split into its component fields. All
> records have their fields separated by commas.
> After splitting a line into fields, around 20 in-memory lookups have
> to be applied to obtain around 20 new fields.
> The original line, with the new fields appended to it, should be
> written to a different file.

OK, no problem. I can do that in 1.5 minutes. Of course, I'm looking
up all my fields in the same hash table, and it's a small hash table.

> This has to be done within 5 minutes. The program would start just
> once and then process 2 million records every 5 minutes.
>
> Given that the program has to be in Perl, would you offer any
> suggestions regarding performance optimization so that the speed
> requirements can be met?

Yes, I'd recommend writing an actual working program, and then, if
necessary, optimizing the actual working program.
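
If measurement does show a hot spot, the core Benchmark module is one
way to compare candidate rewrites of it. A purely hypothetical example
(whether a LIMIT argument to split buys anything depends on the real
data and on how many of the fields you actually need):

use strict;
use warnings;
use Benchmark qw(cmpthese);

# A fake 25-field record, just to have something to split.
my $line = join ',', 1 .. 25;

cmpthese(-3, {
    split_all   => sub { my @f = split /,/, $line },
    split_first => sub { my @f = split /,/, $line, 6 },
});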

Xho
 
