Strange speed-increase by separating "if"s

Discussion in 'Perl Misc' started by Wolfram Humann, Jul 21, 2005.

  1. I have a script for processing certain eps-files. What it basically does
    is going through the file looking for "setgray"-lines. If it finds one,
    it checks if it's followed by lines matching the values in @head and
    then @dot (plus some coordinate-checks). If all matches, @head remains
    in the file while @dot is discarded. If the match fails, the file
    remains unchanged. Here is the script:

    #!/usr/local/bin/perl -w
    use strict;

    my @head = (
    '^N(\s+\d+)(\s+\d+)(\s+\d+) 0 360 arc sf N$',
    '^\d+\s+slw$',
    );
    my @dot = (
    '^(\d+)\s+(\d+)\s+M$',
    ('^(\d+)\s+(\d+)\s+D$') x 72,
    );

    my @c_head = map qr/$_/, @head;
    my @c_dot = map qr/$_/, @dot;
    my ($x, $y, $r);

    while(<>)
    {
    print;
    if(/^[0-9.]+\s+setgray\s+$/)
    {
    my ($h, $d) = (0,0);
    my $l = "";
    while(<>)
    {
    if ($head[$h])
    {
    print;
    last unless $_ =~ $c_head[$h];
    ($x, $y, $r) = ($1, $2, $3) if defined $3;
    $h++;
    }
    elsif ($dot[$d])
    {
    $l .= $_;
    if ($_ !~ /$c_dot[$d]/ or
    $1 < $x - $r - 2 or
    $1 > $x + $r + 2 or
    $2 < $y - $r - 2 or
    $2 > $y + $r + 2 )
    {
    print $l;
    last;
    }
    $d++;
    }
    else
    {
    print $l if /^\d+\s+\d+\s+D$/;
    print;
    last;
    }
    }
    }
    }

    I was surprised how long it took and profiled with Devel::SmallProf.
    This showed that most time is spent in "if (($_ !~ /$c_dot[$d]/) or...".
    To see if the pattern match or the comparisons took so long, I split the
    "if" like this:

    if ($_ !~ /$c_dot[$d]/)
    {
    if ($1 < $x - $r - 2 or
    $1 > $x + $r + 2 or
    $2 < $y - $r - 2 or
    $2 > $y + $r + 2 )
    {
    print $l;
    last;
    }
    }

    To my surprise a test run (without the profiler) ran *at least* twice as
    fast as the original version. Also, the profiler says that "no time" is
    spent for the comparisons and the time for the pattern match dropped to
    one third of what is was before.

    Any explanation?
    Thanks,
    Wolfram
     
    Wolfram Humann, Jul 21, 2005
    #1
    1. Advertising

  2. Wolfram Humann

    Anno Siegel Guest

    Wolfram Humann <> wrote in comp.lang.perl.misc:
    > I have a script for processing certain eps-files. What it basically does
    > is going through the file looking for "setgray"-lines. If it finds one,
    > it checks if it's followed by lines matching the values in @head and
    > then @dot (plus some coordinate-checks). If all matches, @head remains
    > in the file while @dot is discarded. If the match fails, the file
    > remains unchanged. Here is the script:
    >
    > #!/usr/local/bin/perl -w
    > use strict;
    >
    > my @head = (
    > '^N(\s+\d+)(\s+\d+)(\s+\d+) 0 360 arc sf N$',
    > '^\d+\s+slw$',
    > );
    > my @dot = (
    > '^(\d+)\s+(\d+)\s+M$',
    > ('^(\d+)\s+(\d+)\s+D$') x 72,
    > );
    >
    > my @c_head = map qr/$_/, @head;
    > my @c_dot = map qr/$_/, @dot;
    > my ($x, $y, $r);
    >
    > while(<>)
    > {
    > print;
    > if(/^[0-9.]+\s+setgray\s+$/)
    > {
    > my ($h, $d) = (0,0);
    > my $l = "";
    > while(<>)
    > {
    > if ($head[$h])
    > {
    > print;
    > last unless $_ =~ $c_head[$h];
    > ($x, $y, $r) = ($1, $2, $3) if defined $3;
    > $h++;
    > }
    > elsif ($dot[$d])
    > {
    > $l .= $_;
    > if ($_ !~ /$c_dot[$d]/ or
    > $1 < $x - $r - 2 or
    > $1 > $x + $r + 2 or
    > $2 < $y - $r - 2 or
    > $2 > $y + $r + 2 )
    > {
    > print $l;
    > last;
    > }
    > $d++;
    > }
    > else
    > {
    > print $l if /^\d+\s+\d+\s+D$/;
    > print;
    > last;
    > }
    > }
    > }
    > }
    >
    > I was surprised how long it took and profiled with Devel::SmallProf.
    > This showed that most time is spent in "if (($_ !~ /$c_dot[$d]/) or...".
    > To see if the pattern match or the comparisons took so long, I split the
    > "if" like this:
    >
    > if ($_ !~ /$c_dot[$d]/)
    > {
    > if ($1 < $x - $r - 2 or
    > $1 > $x + $r + 2 or
    > $2 < $y - $r - 2 or
    > $2 > $y + $r + 2 )
    > {
    > print $l;
    > last;
    > }
    > }


    The logic of the modified part is different from the original, and
    makes no sense. You are making sure the regex *doesn't* match, and
    then go on to use $1 and $2. In the original, you use them when the
    regex *does* match. If you had warnings switched on, you'd have
    noticed.

    > To my surprise a test run (without the profiler) ran *at least* twice as
    > fast as the original version.


    Different control flow, different runtimes, so no surprise here.

    Make your program strict- and warnings-safe and profile again, preferably
    with programs that do the same thing. When benchmarking and profiling,
    an important step is to monitor your variants to make sure that you
    haven't built in a bug, as has happened to you.

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Jul 21, 2005
    #2
    1. Advertising

  3. -----Original Message-----
    From: Anno Siegel
    Sent: 21.07.2005 12:22
    > Wolfram Humann <> wrote in comp.lang.perl.misc:
    >
    >>I have a script for processing certain eps-files. What it basically does
    >>is going through the file looking for "setgray"-lines. If it finds one,
    >>it checks if it's followed by lines matching the values in @head and
    >>then @dot (plus some coordinate-checks). If all matches, @head remains
    >>in the file while @dot is discarded. If the match fails, the file
    >>remains unchanged. Here is the script:
    >>
    >>#!/usr/local/bin/perl -w
    >>use strict;
    >>
    >>my @head = (
    >> '^N(\s+\d+)(\s+\d+)(\s+\d+) 0 360 arc sf N$',
    >> '^\d+\s+slw$',
    >>);
    >>my @dot = (
    >> '^(\d+)\s+(\d+)\s+M$',
    >> ('^(\d+)\s+(\d+)\s+D$') x 72,
    >>);
    >>
    >>my @c_head = map qr/$_/, @head;
    >>my @c_dot = map qr/$_/, @dot;
    >>my ($x, $y, $r);
    >>
    >>while(<>)
    >>{
    >> print;
    >> if(/^[0-9.]+\s+setgray\s+$/)
    >> {
    >> my ($h, $d) = (0,0);
    >> my $l = "";
    >> while(<>)
    >> {
    >> if ($head[$h])
    >> {
    >> print;
    >> last unless $_ =~ $c_head[$h];
    >> ($x, $y, $r) = ($1, $2, $3) if defined $3;
    >> $h++;
    >> }
    >> elsif ($dot[$d])
    >> {
    >> $l .= $_;
    >> if ($_ !~ /$c_dot[$d]/ or
    >> $1 < $x - $r - 2 or
    >> $1 > $x + $r + 2 or
    >> $2 < $y - $r - 2 or
    >> $2 > $y + $r + 2 )
    >> {
    >> print $l;
    >> last;
    >> }
    >> $d++;
    >> }
    >> else
    >> {
    >> print $l if /^\d+\s+\d+\s+D$/;
    >> print;
    >> last;
    >> }
    >> }
    >> }
    >>}
    >>
    >>I was surprised how long it took and profiled with Devel::SmallProf.
    >>This showed that most time is spent in "if (($_ !~ /$c_dot[$d]/) or...".
    >>To see if the pattern match or the comparisons took so long, I split the
    >>"if" like this:
    >>
    >>if ($_ !~ /$c_dot[$d]/)
    >>{
    >> if ($1 < $x - $r - 2 or
    >> $1 > $x + $r + 2 or
    >> $2 < $y - $r - 2 or
    >> $2 > $y + $r + 2 )
    >> {
    >> print $l;
    >> last;
    >> }
    >>}

    >
    >
    > The logic of the modified part is different from the original, and
    > makes no sense. You are making sure the regex *doesn't* match, and
    > then go on to use $1 and $2. In the original, you use them when the
    > regex *does* match. If you had warnings switched on, you'd have
    > noticed.
    >

    Of course you're right. I had the idea that splitting an "if" in two
    parts like that was obvious (when indeed it was just stupid). I do have
    warnings and strict on, but with the file I use, the inner "if" is never
    hit...
    If my brain isn't totally drained already the correctly split version
    should be:

    if ($_ =~ /$c_dot[$d]/)
    {
    if ($1 < $x - $r - 2 or
    $1 > $x + $r + 2 or
    $2 < $y - $r - 2 or
    $2 > $y + $r + 2)
    {
    print $l;
    last;
    }
    }
    else
    {
    print $l;
    last;
    }

    Naturally, with this the timing is more or less back to what it was.
    According to profiling, the four comparisons take about as much time as
    the pattern match -- hence the speed increase when they never executed :)

    Thanks for the help!
     
    Wolfram Humann, Jul 21, 2005
    #3
  4. Wolfram Humann

    Big and Blue Guest

    Wolfram Humann wrote:
    t...
    > If my brain isn't totally drained already the correctly split version
    > should be:
    >
    > if ($_ =~ /$c_dot[$d]/)
    > {
    > if ($1 < $x - $r - 2 or
    > $1 > $x + $r + 2 or
    > $2 < $y - $r - 2 or
    > $2 > $y + $r + 2)
    > {
    >
    > Naturally, with this the timing is more or less back to what it was.
    > According to profiling, the four comparisons take about as much time as
    > the pattern match -- hence the speed increase when they never executed :)


    You are (probably) doing 4 calculations to work out constants.

    Might be quicker if you did:

    $x_less_r2 = $x - $r - 2;
    $x_plus_r2 = $x + $r + 2;
    $y_less_r2 = $y - $r - 2;
    $y_plus_r2 = $y + $r + 2;

    (having declared them in a suitable scope) as soon as you get x and y then
    change your test to use the calculated version. Of course, it depends on
    how many times the test is run for each head branch taken (and whether Perl
    optimizes this anyway).


    --
    Just because I've written it doesn't mean that
    either you or I have to believe it.
     
    Big and Blue, Jul 21, 2005
    #4
  5. -----Original Message-----
    From: Big and Blue
    Sent: 22.07.2005 00:22
    > Wolfram Humann wrote:
    > t...
    >
    >> If my brain isn't totally drained already the correctly split version
    >> should be:
    >>
    >> if ($_ =~ /$c_dot[$d]/)
    >> {
    >> if ($1 < $x - $r - 2 or
    >> $1 > $x + $r + 2 or
    >> $2 < $y - $r - 2 or
    >> $2 > $y + $r + 2)
    >> {
    >>
    >> Naturally, with this the timing is more or less back to what it was.
    >> According to profiling, the four comparisons take about as much time
    >> as the pattern match -- hence the speed increase when they never
    >> executed :)

    >
    >
    > You are (probably) doing 4 calculations to work out constants.
    >
    > Might be quicker if you did:
    >
    > $x_less_r2 = $x - $r - 2;
    > $x_plus_r2 = $x + $r + 2;
    > $y_less_r2 = $y - $r - 2;
    > $y_plus_r2 = $y + $r + 2;
    >
    > (having declared them in a suitable scope) as soon as you get x and y
    > then change your test to use the calculated version. Of course, it
    > depends on how many times the test is run for each head branch taken
    > (and whether Perl optimizes this anyway).
    >
    >

    Might be worth a try. The comparisons are done up to 73 times with the
    same x / y / r. Thanks for the suggestion!
     
    Wolfram Humann, Jul 22, 2005
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. anupam
    Replies:
    4
    Views:
    883
    Allan Herriman
    Sep 3, 2004
  2. Casey Hawthorne
    Replies:
    16
    Views:
    1,231
  3. RAYYILDIZ
    Replies:
    15
    Views:
    914
    Malte Starostik
    Mar 23, 2005
  4. Replies:
    5
    Views:
    370
  5. Darsant
    Replies:
    8
    Views:
    537
Loading...

Share This Page