Inviting suggestions for performance optimization

Discussion in 'Perl Misc' started by Yash, Jan 16, 2004.

  1. Yash

    Yash Guest

    The functionality required from my program is:
    Text files containing around 2 million records in all should be read,
    and every record should be split into its component fields. All
    records have their fields separated by commas.
    After splitting a line into fields, around 20 in-memory lookups have
    to be applied to obtain around 20 new fields.
    The original line, with the new fields appended to it, should be
    written to a different file.
    This has to be done within 5 minutes. The program would start just
    once and process 2 million records every 5 minutes.

    Given that the program has to be in Perl, would you offer any
    suggestions regarding performance optimization so that the speed
    requirements can be met?
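
    A minimal Perl sketch of that pipeline, with an invented lookup table
    and a sample record standing in for the real data:

```perl
use strict;
use warnings;

# Invented lookup table; the real program would hold ~20 such mappings.
my %lookup = (a => 'X', b => 'Y', c => 'Z');

# In the real program these would be the input and output files:
# open my $in,  '<', 'records.csv'   or die $!;
# open my $out, '>', 'augmented.csv' or die $!;
while (my $line = <DATA>) {
    chomp $line;
    my @fields = split /,/, $line;
    # One lookup per field; unknown keys map to 'NA'.
    my @extra = map { exists $lookup{$_} ? $lookup{$_} : 'NA' } @fields;
    print join(',', $line, @extra), "\n";   # prints a,b,c,X,Y,Z
}

__DATA__
a,b,c
```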


    Thanks
    Yash
     
    Yash, Jan 16, 2004
    #1

  2. (Yash) wrote in news:[email protected]:

    > Given that the program has to be in Perl, would you offer any
    > suggestions regarding performance optimization so that the speed
    > requirements can be met?


    First, write something. Then make sure it is correct. Then measure how
    much time it is taking. Then you'll know whether you need to do anything
    about speed. Five minutes seems like a lot of time to me for just
    2,000,000 records. Running the script below in the background on my
    1 GHz Celeron laptop (Win XP) took around 3 minutes and 40 seconds,
    including startup _and_ the creation of the 80 MB input file.

    #! perl

    use strict;
    use warnings;

    use File::Temp;

    my $tmp = File::Temp->new();

    for (1 .. 2_000_000) {
        printf $tmp "%8.8d: %d %d %d %d %d\n",
            some_number($_), some_number($_), some_number($_),
            some_number($_), some_number($_), some_number($_);
    }

    $tmp->seek(0, 0);

    my %hash;
    while (my $line = $tmp->getline()) {
        my ($obs, @fields) = split ' ', $line;
        $hash{$obs} = \@fields;
    }

    sub some_number {
        my ($n) = @_;
        return $n * rand;
    }

    __END__



    --
    A. Sinan Unur
    (reverse each component for email address)
     
    A. Sinan Unur, Jan 16, 2004
    #2

  3. "A. Sinan Unur" <> wrote in
    news:Xns94726383E64DFasu1cornelledu@132.236.56.8:

    > $tmp->seek(0, 0);


    Oooops ... missed the error message regarding this in the output. Not awake
    yet I guess (or still frozen).


    --
    A. Sinan Unur
    (reverse each component for email address)
     
    A. Sinan Unur, Jan 16, 2004
    #3
  4. Yash

    Guest

    (Yash) wrote:
    > The functionality required from my program is:
    > Text files containing around 2 million records in all should be read,
    > and every record should be split into its component fields. All
    > records have their fields separated by commas.
    > After splitting a line into fields, around 20 in-memory lookups have
    > to be applied to obtain around 20 new fields.
    > The original line, with the new fields appended to it, should be
    > written to a different file.


    OK, no problem. I can do that in 1.5 minutes. Of course, I'm looking
    up all my fields in the same hash table, and it's a small hash table.

    > This has to be done within 5 minutes. The program would start just
    > once and process 2 million records every 5 minutes.
    >
    > Given that the program has to be in Perl, would you offer any
    > suggestions regarding performance optimization so that the speed
    > requirements can be met?


    Yes, I'd recommend writing an actual working program, and then, if
    necessary, optimizing the actual working program.
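
    For the measuring step, the core Benchmark module is enough. A minimal
    sketch (the lookup table and the two lookup styles compared here are
    invented for illustration):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Invented lookup table, standing in for the real ~20 mappings.
my %lookup = map { $_ => $_ * 2 } 1 .. 1000;

# timethese runs each sub the given number of times and prints
# wall-clock and CPU timings for each labeled variant.
timethese(100_000, {
    exists_check => sub { my $v = exists $lookup{500} ? $lookup{500} : 0 },
    plain_or     => sub { my $v = $lookup{500} || 0 },
});
```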

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service New Rate! $9.95/Month 50GB
     
    , Jan 16, 2004
    #4
  5. Yash

    Erik Tank Guest

    I would suggest that, once you have a working program, you use
    Devel::SmallProf
    (http://search.cpan.org/~salva/Devel-SmallProf-1.15/SmallProf.pm)
    to show you where the slowest parts of the code are.
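
    Assuming the module is installed from CPAN, it is loaded with the -d
    debugger switch; the script name here is a placeholder:

```shell
# Run the script under the profiler; per-line timings are written
# to ./smallprof.out in the current directory.
perl -d:SmallProf yourscript.pl

# The lines with the largest wall-clock column are the hot spots.
sort -k 2 -rn smallprof.out | head
```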

    On 16 Jan 2004 05:46:23 -0800, (Yash) wrote:

    >The functionality required from my program is:
    >Text files containing around 2 million records in all should be read,
    >and every record should be split into its component fields. All
    >records have their fields separated by commas.
    >After splitting a line into fields, around 20 in-memory lookups have
    >to be applied to obtain around 20 new fields.
    >The original line, with the new fields appended to it, should be
    >written to a different file.
    >This has to be done within 5 minutes. The program would start just
    >once and process 2 million records every 5 minutes.
    >
    >Given that the program has to be in Perl, would you offer any
    >suggestions regarding performance optimization so that the speed
    >requirements can be met?
    >
    >
    >Thanks
    >Yash
     
    Erik Tank, Jan 16, 2004
    #5
  6. On 16 Jan 2004 14:46:58 GMT, "A. Sinan Unur" <> wrote:

    >for (1 .. 2_000_000) {

    ^^^^^^^^^

    Hey, I learned something new! Guess I had never read the relevant
    section of perldata before...
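
    The underscore is purely cosmetic in numeric literals; perl discards
    it at compile time, so the two spellings below are the same number:

```perl
use strict;
use warnings;

my $n = 2_000_000;          # identical to 2000000, just easier to read
print $n == 2000000 ? "same\n" : "different\n";   # prints "same"
```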


    Michele
    --
    # This prints: Just another Perl hacker,
    seek DATA,15,0 and print q... <DATA>;
    __END__
     
    Michele Dondi, Jan 18, 2004
    #6
