Strange speed-increase by separating "if"s

W

Wolfram Humann

I have a script for processing certain eps-files. What it basically does
is going through the file looking for "setgray"-lines. If it finds one,
it checks if it's followed by lines matching the values in @head and
then @dot (plus some coordinate-checks). If all matches, @head remains
in the file while @dot is discarded. If the match fails, the file
remains unchanged. Here is the script:

#!/usr/local/bin/perl -w
use strict;

my @head = (
'^N(\s+\d+)(\s+\d+)(\s+\d+) 0 360 arc sf N$',
'^\d+\s+slw$',
);
my @dot = (
'^(\d+)\s+(\d+)\s+M$',
('^(\d+)\s+(\d+)\s+D$') x 72,
);

my @c_head = map qr/$_/, @head;
my @c_dot = map qr/$_/, @dot;
my ($x, $y, $r);

while(<>)
{
print;
if(/^[0-9.]+\s+setgray\s+$/)
{
my ($h, $d) = (0,0);
my $l = "";
while(<>)
{
if ($head[$h])
{
print;
last unless $_ =~ $c_head[$h];
($x, $y, $r) = ($1, $2, $3) if defined $3;
$h++;
}
elsif ($dot[$d])
{
$l .= $_;
if ($_ !~ /$c_dot[$d]/ or
$1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2 )
{
print $l;
last;
}
$d++;
}
else
{
print $l if /^\d+\s+\d+\s+D$/;
print;
last;
}
}
}
}

I was surprised how long it took and profiled with Devel::SmallProf.
This showed that most time is spent in "if (($_ !~ /$c_dot[$d]/) or...".
To see if the pattern match or the comparisons took so long, I split the
"if" like this:

if ($_ !~ /$c_dot[$d]/)
{
if ($1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2 )
{
print $l;
last;
}
}

To my surprise a test run (without the profiler) ran *at least* twice as
fast as the original version. Also, the profiler says that "no time" is
spent for the comparisons and the time for the pattern match dropped to
one third of what is was before.

Any explanation?
Thanks,
Wolfram
 
A

Anno Siegel

Wolfram Humann said:
I have a script for processing certain eps-files. What it basically does
is going through the file looking for "setgray"-lines. If it finds one,
it checks if it's followed by lines matching the values in @head and
then @dot (plus some coordinate-checks). If all matches, @head remains
in the file while @dot is discarded. If the match fails, the file
remains unchanged. Here is the script:

#!/usr/local/bin/perl -w
use strict;

my @head = (
'^N(\s+\d+)(\s+\d+)(\s+\d+) 0 360 arc sf N$',
'^\d+\s+slw$',
);
my @dot = (
'^(\d+)\s+(\d+)\s+M$',
('^(\d+)\s+(\d+)\s+D$') x 72,
);

my @c_head = map qr/$_/, @head;
my @c_dot = map qr/$_/, @dot;
my ($x, $y, $r);

while(<>)
{
print;
if(/^[0-9.]+\s+setgray\s+$/)
{
my ($h, $d) = (0,0);
my $l = "";
while(<>)
{
if ($head[$h])
{
print;
last unless $_ =~ $c_head[$h];
($x, $y, $r) = ($1, $2, $3) if defined $3;
$h++;
}
elsif ($dot[$d])
{
$l .= $_;
if ($_ !~ /$c_dot[$d]/ or
$1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2 )
{
print $l;
last;
}
$d++;
}
else
{
print $l if /^\d+\s+\d+\s+D$/;
print;
last;
}
}
}
}

I was surprised how long it took and profiled with Devel::SmallProf.
This showed that most time is spent in "if (($_ !~ /$c_dot[$d]/) or...".
To see if the pattern match or the comparisons took so long, I split the
"if" like this:

if ($_ !~ /$c_dot[$d]/)
{
if ($1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2 )
{
print $l;
last;
}
}

The logic of the modified part is different from the original, and
makes no sense. You are making sure the regex *doesn't* match, and
then go on to use $1 and $2. In the original, you use them when the
regex *does* match. If you had warnings switched on, you'd have
noticed.
To my surprise a test run (without the profiler) ran *at least* twice as
fast as the original version.

Different control flow, different runtimes, so no surprise here.

Make your program strict- and warnings-safe and profile again, preferably
with programs that do the same thing. When benchmarking and profiling,
an important step is to monitor your variants to make sure that you
haven't built in a bug, as has happened to you.

Anno
 
W

Wolfram Humann

-----Original Message-----
From: Anno Siegel
Sent: 21.07.2005 12:22
Wolfram Humann said:
I have a script for processing certain eps-files. What it basically does
is going through the file looking for "setgray"-lines. If it finds one,
it checks if it's followed by lines matching the values in @head and
then @dot (plus some coordinate-checks). If all matches, @head remains
in the file while @dot is discarded. If the match fails, the file
remains unchanged. Here is the script:

#!/usr/local/bin/perl -w
use strict;

my @head = (
'^N(\s+\d+)(\s+\d+)(\s+\d+) 0 360 arc sf N$',
'^\d+\s+slw$',
);
my @dot = (
'^(\d+)\s+(\d+)\s+M$',
('^(\d+)\s+(\d+)\s+D$') x 72,
);

my @c_head = map qr/$_/, @head;
my @c_dot = map qr/$_/, @dot;
my ($x, $y, $r);

while(<>)
{
print;
if(/^[0-9.]+\s+setgray\s+$/)
{
my ($h, $d) = (0,0);
my $l = "";
while(<>)
{
if ($head[$h])
{
print;
last unless $_ =~ $c_head[$h];
($x, $y, $r) = ($1, $2, $3) if defined $3;
$h++;
}
elsif ($dot[$d])
{
$l .= $_;
if ($_ !~ /$c_dot[$d]/ or
$1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2 )
{
print $l;
last;
}
$d++;
}
else
{
print $l if /^\d+\s+\d+\s+D$/;
print;
last;
}
}
}
}

I was surprised how long it took and profiled with Devel::SmallProf.
This showed that most time is spent in "if (($_ !~ /$c_dot[$d]/) or...".
To see if the pattern match or the comparisons took so long, I split the
"if" like this:

if ($_ !~ /$c_dot[$d]/)
{
if ($1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2 )
{
print $l;
last;
}
}


The logic of the modified part is different from the original, and
makes no sense. You are making sure the regex *doesn't* match, and
then go on to use $1 and $2. In the original, you use them when the
regex *does* match. If you had warnings switched on, you'd have
noticed.
Of course you're right. I had the idea that splitting an "if" in two
parts like that was obvious (when indeed it was just stupid). I do have
warnings and strict on, but with the file I use, the inner "if" is never
hit...
If my brain isn't totally drained already the correctly split version
should be:

if ($_ =~ /$c_dot[$d]/)
{
if ($1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2)
{
print $l;
last;
}
}
else
{
print $l;
last;
}

Naturally, with this the timing is more or less back to what it was.
According to profiling, the four comparisons take about as much time as
the pattern match -- hence the speed increase when they never executed :)

Thanks for the help!
 
B

Big and Blue

Wolfram Humann wrote:
t...
If my brain isn't totally drained already the correctly split version
should be:

if ($_ =~ /$c_dot[$d]/)
{
if ($1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2)
{

Naturally, with this the timing is more or less back to what it was.
According to profiling, the four comparisons take about as much time as
the pattern match -- hence the speed increase when they never executed :)

You are (probably) doing 4 calculations to work out constants.

Might be quicker if you did:

$x_less_r2 = $x - $r - 2;
$x_plus_r2 = $x + $r + 2;
$y_less_r2 = $y - $r - 2;
$y_plus_r2 = $y + $r + 2;

(having declared them in a suitable scope) as soon as you get x and y then
change your test to use the calculated version. Of course, it depends on
how many times the test is run for each head branch taken (and whether Perl
optimizes this anyway).
 
W

Wolfram Humann

-----Original Message-----
From: Big and Blue
Sent: 22.07.2005 00:22
Wolfram Humann wrote:
t...
If my brain isn't totally drained already the correctly split version
should be:

if ($_ =~ /$c_dot[$d]/)
{
if ($1 < $x - $r - 2 or
$1 > $x + $r + 2 or
$2 < $y - $r - 2 or
$2 > $y + $r + 2)
{

Naturally, with this the timing is more or less back to what it was.
According to profiling, the four comparisons take about as much time
as the pattern match -- hence the speed increase when they never
executed :)


You are (probably) doing 4 calculations to work out constants.

Might be quicker if you did:

$x_less_r2 = $x - $r - 2;
$x_plus_r2 = $x + $r + 2;
$y_less_r2 = $y - $r - 2;
$y_plus_r2 = $y + $r + 2;

(having declared them in a suitable scope) as soon as you get x and y
then change your test to use the calculated version. Of course, it
depends on how many times the test is run for each head branch taken
(and whether Perl optimizes this anyway).
Might be worth a try. The comparisons are done up to 73 times with the
same x / y / r. Thanks for the suggestion!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top