Improving performance of a simple Perl script

vfoley

Hello everyone,

I wrote a little script in Perl to make stats on a log file. You can
view the source at: http://pastebin.ca/104391 . When running on an
OpenBSD 3.9 machine (Pentium III 450 MHz, 128 MB RAM), with Perl 5.8.6,
the script takes 35 seconds to crunch through a file with 175,000
lines. Not bad. However, a Ruby script (http://pastebin.ca/104393)
that I wrote a while ago does the same file in 23 seconds. As far as I
know, the Perl interpreter is faster than Ruby, so I would be
interested to know how I could improve the execution speed of the Perl
script. I'm still very new, so I don't know all the idioms and such,
so any help would be greatly appreciated.

Also, when I profiled the Perl script, dprofpp's time percentages only
added up to like 5 or 10%. Why is that? Where is the remaining 90%?

Regards,

Vincent
 
axel

> I wrote a little script in Perl to make stats on a log file. You can
> view the source at: http://pastebin.ca/104391 . When running on an

Why were you unable to post it here? Why should we go and visit your
website to look at your script... expecting more hits for adverts,
perhaps?

Axel
 
John W. Krahn

> I wrote a little script in Perl to make stats on a log file. You can
> view the source at: http://pastebin.ca/104391 . When running on an
> OpenBSD 3.9 machine (Pentium III 450 MHz, 128 MB RAM), with Perl 5.8.6,
> the script takes 35 seconds to crunch through a file with 175,000
> lines. Not bad. However, a Ruby script (http://pastebin.ca/104393)
> that I wrote a while ago does the same file in 23 seconds. As far as I
> know, the Perl interpreter is faster than Ruby, so I would be
> interested to know how I could improve the execution speed of the Perl
> script. I'm still very new, so I don't know all the idioms and such,
> so any help would be greatly appreciated.

    sub mktime {
        my ($day, $month_name, $year) = @_;
        my %months = (
            Jan => 0,
            Feb => 1,
            Mar => 2,
            Apr => 3,
            May => 4,
            Jun => 5,
            Jul => 6,
            Aug => 7,
            Sep => 8,
            Oct => 9,
            Nov => 10,
            Dec => 11
        );
        return timelocal(0, 0, 0, $day, $months{$month_name}, $year);
    }

You should declare and define the %months hash outside the subroutine so it
isn't rebuilt every time the subroutine is called.
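
A minimal sketch of that change, assuming the script only needs
timelocal() from Time::Local. The sample date at the end is made up
for illustration:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Built once at file scope instead of on every call to mktime().
my %months = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

sub mktime {
    my ($day, $month_name, $year) = @_;
    # Midnight local time on the given day, as epoch seconds.
    return timelocal(0, 0, 0, $day, $months{$month_name}, $year);
}

# Hypothetical sample date, not taken from the original log file.
my $epoch = mktime(15, 'Jul', 2006);
print scalar(localtime($epoch)), "\n";
```

The hash must be assigned before the first call to mktime(), which is
the case here because the call comes later in the file.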

Date::Calc may be faster than Time::Local.


    if (exists($counts{$date})) {
        unless (grep { $ip eq $_ } @{$counts{$date}}) {
            push(@{$counts{$date}}, $ip);
        }
    } else {
        $counts{$date} = [];
    }

Using a hash of hashes would be faster than grepping through an array for
every $ip. Perl also autovivifies, so the exists() test is superfluous.
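
A rough sketch of the hash-of-hashes approach. The @records list is a
hypothetical stand-in for the (date, IP) pairs the real script parses
out of each log line. Note that this version also records the first
address seen for a date, which the else branch quoted above does not:

```perl
use strict;
use warnings;

my %counts;    # date => { ip => 1, ... }

# Hypothetical (date, ip) pairs standing in for parsed log lines.
my @records = (
    [ '01-Jul-2006', '10.0.0.1' ],
    [ '01-Jul-2006', '10.0.0.2' ],
    [ '01-Jul-2006', '10.0.0.1' ],    # repeat visitor, counted once
    [ '02-Jul-2006', '10.0.0.1' ],
);

for my $record (@records) {
    my ($date, $ip) = @$record;
    # Autovivification creates the inner hash on first use, and a
    # repeated IP just overwrites the same key, so no grep is needed.
    $counts{$date}{$ip} = 1;
}

for my $date (sort keys %counts) {
    # keys in scalar context gives the number of distinct IPs.
    my $unique = keys %{ $counts{$date} };
    print "$date\t$unique\n";
}
```

Each insert is a single hash lookup, whereas the grep scans every IP
already stored for that date on every log line.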


    printf "%s\t%d\n", strftime("%d-%b-%Y", 0, 0, 0, $day, $month, $year),
        scalar(@{$counts{$key}});

print() is safer and faster than printf() and you don't need printf() anyway.
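
One way the same line could be written with print(), as a sketch. The
values of $day, $month, $year and $count are assumptions chosen for
illustration, using the 0-based month and years-since-1900 fields that
POSIX strftime() expects:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical values: 15 July 2006, with a made-up count.
my ($day, $month, $year) = (15, 6, 106);
my $count = 42;

# strftime() already returns a string, so concatenation plus
# interpolation does the job without a format string.
my $line = strftime("%d-%b-%Y", 0, 0, 0, $day, $month, $year) . "\t$count\n";
print $line;
```

The month abbreviation produced by %b depends on the current locale.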



John
 
vfoley

Thank you for your suggestions, John. Using a hash of hashes really
improved the performance: I went from 35-36 seconds to 6 seconds, a 6x
improvement. Thank you again!
 
