Inviting suggestions for performance optimization

Yash

The functionality required from my program is:
Text files containing around 2 million records in all should be read,
and every record should be split into its component fields. All
records have their fields separated by commas.
After splitting a line into fields, around 20 in-memory lookups have
to be applied to obtain around 20 new fields.
The original line, with the new fields appended to it, should be
written to a different file.
This has to be done within 5 minutes. The program would start just
once and then process 2 million records every 5 minutes.
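
A minimal sketch of the kind of loop I have in mind, with made-up file
names, field positions, and a single placeholder lookup table, would be:

#!/usr/bin/perl
use strict;
use warnings;

# Placeholder lookup table; the real program would load its ~20
# lookup tables once, at startup, before the main loop.
my %lookup;
$lookup{$_} = "mapped_$_" for 0 .. 999;

open my $in,  '<', 'records.txt'  or die "records.txt: $!";
open my $out, '>', 'enriched.txt' or die "enriched.txt: $!";

while (my $line = <$in>) {
    chomp $line;
    my @fields = split /,/, $line;

    # Around 20 lookups per record; which input fields feed which
    # lookups is a guess here, and every record is assumed to have
    # at least 20 fields.
    my @new = map { defined $lookup{$_} ? $lookup{$_} : '' }
              @fields[0 .. 19];

    print {$out} join(',', $line, @new), "\n";
}

close $out or die "enriched.txt: $!";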

Given that the program has to be in Perl, would you offer any
suggestions regarding performance optimization so that the speed
requirements can be met?


Thanks
Yash
 
A. Sinan Unur

(e-mail address removed) (Yash) wrote in @posting.google.com:
> Given that the program has to be in Perl, would you offer any
> suggestions regarding performance optimization so that the speed
> requirements can be met?

First, write something. Then make sure it is correct. Then measure how
much time it is taking. Then you'll know whether you need to do anything
about speed. Five minutes seems like a lot of time to me for just
2,000,000 records. Running the script below in the background on my
Celeron 1 GHz laptop (Win XP) took around 3 minutes and 40 seconds,
including startup _and_ the creation of the 80 MB input file.

#! perl

use strict;
use warnings;

use File::Temp;

my $tmp = File::Temp->new();

# Generate a 2,000,000-record test file of space-separated numbers.
for (1 .. 2_000_000) {
    printf $tmp "%8.8d: %d %d %d %d %d\n",
        some_number($_), some_number($_), some_number($_),
        some_number($_), some_number($_), some_number($_);
}

# Rewind, then read every record back and split it into fields.
$tmp->seek(0, 0);

my %hash;
while (my $line = $tmp->getline()) {
    my ($obs, @fields) = split ' ', $line;
    $hash{$obs} = \@fields;
}

sub some_number {
    my ($n) = @_;
    return $n * rand;
}

__END__
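
If you want the script to report its own elapsed time, the core
Time::HiRes module is one way to do it; a rough sketch:

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];

# ... the read/split/store loop from the script above would go here ...

printf "elapsed: %.2f seconds\n", tv_interval($t0);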
 
ctcgag

> The functionality required from my program is:
> Text files containing around 2 million records in all should be read,
> and every record should be split into its component fields. All
> records have their fields separated by commas.
> After splitting a line into fields, around 20 in-memory lookups have
> to be applied to obtain around 20 new fields.
> The original line, with the new fields appended to it, should be
> written to a different file.

OK, no problem. I can do that in 1.5 minutes. Of course, I'm looking
up all my fields in the same hash table, and it's a small hash table.

> This has to be done within 5 minutes. The program would start just
> once and then process 2 million records every 5 minutes.
>
> Given that the program has to be in Perl, would you offer any
> suggestions regarding performance optimization so that the speed
> requirements can be met?

Yes, I'd recommend writing an actual working program, and then, if
necessary, optimizing the actual working program.
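
If measurement does show a hot spot, the core Benchmark module is one
way to compare candidate rewrites of it. A purely hypothetical example
(whether a LIMIT argument to split buys anything depends on the real
data and on how many of the fields you actually need):

use strict;
use warnings;
use Benchmark qw(cmpthese);

# A fake 25-field record, just to have something to split.
my $line = join ',', 1 .. 25;

cmpthese(-3, {
    split_all   => sub { my @f = split /,/, $line },
    split_first => sub { my @f = split /,/, $line, 6 },
});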

Xho
 
