Performance implications of using the Switch module

GreenLight

I hate looking at rows and rows of elsif statements as much as the
next person, but using the Switch module just doesn't cut it. I wrote:

use strict;
use warnings;
use Benchmark;
use Switch;

sub use_if() {
    my ($tag, $value);
    my @parsed;
    my @tags = (
        "TAG001^VALUE001", "TAG002^VALUE002", "TAG003^VALUE003",
        "TAG004^VALUE004", "TAG005^VALUE005",
        "TAG006^VALUE006", "TAG007^VALUE007", "TAG008^VALUE008",
        "TAG009^VALUE009", "TAG010^VALUE010"
    );
    foreach my $next (@tags) {
        ($tag, $value) = ($next =~ /^(.*?)\^(.*?)$/);

        if    ($tag eq 'TAG001') { push @parsed, $value }
        elsif ($tag eq 'TAG002') { push @parsed, $value }
        elsif ($tag eq 'TAG003') { push @parsed, $value }
        elsif ($tag eq 'TAG004') { push @parsed, $value }
        elsif ($tag eq 'TAG005') { push @parsed, $value }
        elsif ($tag eq 'TAG006') { push @parsed, $value }
        elsif ($tag eq 'TAG007') { push @parsed, $value }
        elsif ($tag eq 'TAG008') { push @parsed, $value }
        elsif ($tag eq 'TAG009') { push @parsed, $value }
        elsif ($tag eq 'TAG010') { push @parsed, $value }
        else                     { die "Bad tag!" }
    }
}

sub use_switch() {
    my ($tag, $value);
    my @parsed;
    my @tags = (
        "TAG001^VALUE001", "TAG002^VALUE002", "TAG003^VALUE003",
        "TAG004^VALUE004", "TAG005^VALUE005",
        "TAG006^VALUE006", "TAG007^VALUE007", "TAG008^VALUE008",
        "TAG009^VALUE009", "TAG010^VALUE010"
    );
    foreach my $next (@tags) {
        ($tag, $value) = ($next =~ /^(.*?)\^(.*?)$/);

        switch ($tag) {
            case "TAG001" { push @parsed, $value }
            case "TAG002" { push @parsed, $value }
            case "TAG003" { push @parsed, $value }
            case "TAG004" { push @parsed, $value }
            case "TAG005" { push @parsed, $value }
            case "TAG006" { push @parsed, $value }
            case "TAG007" { push @parsed, $value }
            case "TAG008" { push @parsed, $value }
            case "TAG009" { push @parsed, $value }
            case "TAG010" { push @parsed, $value }
            else          { die "Bad tag!" }
        }
    }
}

timethese (100000, {
    "Using 'if'"     => \&use_if,
    "Using 'switch'" => \&use_switch
});

__END__

These subroutines adequately represent tasks performed thousands of
times per day at my client's site.
And the results:

Benchmark: timing 100000 iterations of Using 'if', Using 'switch'...
Using 'if': 17 wallclock secs (16.48 usr + 0.00 sys = 16.48 CPU) @ 6066.49/s (n=100000)
Using 'switch': 153 wallclock secs (150.68 usr + 0.00 sys = 150.68 CPU) @ 663.68/s (n=100000)

Using "switch" was nearly an order of magnitude slower.

Now my real question: Does anyone know if the "forthcoming" Perl6
version (given/when, as described in "perldoc Switch") will offer
better performance? That is, is anyone actually using an early
release who can comment on the performance?
 
Steven Kuo

GreenLight wrote:
> I hate looking at rows and rows of elsif statements as much as the
> next person, but using the Switch module just doesn't cut it.
>
> [benchmark code and results snipped]
>
> Using "switch" was nearly an order of magnitude slower.

I'd suggest that you use a hash instead. To me it's easier to read
and maintain:

{
    # closure defined in this scope

    my %good = map +('TAG' . sprintf("%03d", $_) => 1 ), ( 1 .. 10 );

    sub use_hash {
        my @parsed;
        my @tags = (
            "TAG001^VALUE001",
            "TAG002^VALUE002",
            "TAG003^VALUE003",
            "TAG004^VALUE004",
            "TAG005^VALUE005",
            "TAG006^VALUE006",
            "TAG007^VALUE007",
            "TAG008^VALUE008",
            "TAG009^VALUE009",
            "TAG010^VALUE010",
        );
        foreach my $next (@tags) {
            my ($tag, $value) = ($next =~ /^(.*?)\^(.*?)$/);
            if ($good{$tag}) {
                push @parsed, $value;
                print "DEBUG: pushed $value into array \@parsed\n";
            } else {
                die "Bad tag ($tag)!";
            }
        }
    }
}

use_hash();

I did not benchmark but it may be faster than what you've already
tried.
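
For anyone who wants numbers, here is a minimal sketch that times the hash lookup against the if/elsif chain with cmpthese from the core Benchmark module. The names parse_if and parse_hash are made up for this example, the DEBUG print is left out so that I/O does not dominate the timing, and no results are claimed here since they depend on the machine and the perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same ten records as in the original post, generated instead of typed out.
my @tags = map { sprintf('TAG%03d^VALUE%03d', $_, $_) } 1 .. 10;

# Lookup table of accepted tags, built once.
my %good = map { (sprintf('TAG%03d', $_) => 1) } 1 .. 10;

# Baseline: the if/elsif chain.
sub parse_if {
    my @parsed;
    for my $next (@tags) {
        my ($tag, $value) = ($next =~ /^(.*?)\^(.*?)$/);
        if    ($tag eq 'TAG001') { push @parsed, $value }
        elsif ($tag eq 'TAG002') { push @parsed, $value }
        elsif ($tag eq 'TAG003') { push @parsed, $value }
        elsif ($tag eq 'TAG004') { push @parsed, $value }
        elsif ($tag eq 'TAG005') { push @parsed, $value }
        elsif ($tag eq 'TAG006') { push @parsed, $value }
        elsif ($tag eq 'TAG007') { push @parsed, $value }
        elsif ($tag eq 'TAG008') { push @parsed, $value }
        elsif ($tag eq 'TAG009') { push @parsed, $value }
        elsif ($tag eq 'TAG010') { push @parsed, $value }
        else                     { die "Bad tag!" }
    }
    return \@parsed;
}

# Alternative: a single hash lookup per record.
sub parse_hash {
    my @parsed;
    for my $next (@tags) {
        my ($tag, $value) = ($next =~ /^(.*?)\^(.*?)$/);
        die "Bad tag ($tag)!" unless $good{$tag};
        push @parsed, $value;
    }
    return \@parsed;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    "Using 'if'"   => \&parse_if,
    "Using a hash" => \&parse_hash,
});
```

Both subs return the parsed values, so it is easy to check that they agree before trusting the timings.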
 
Tassilo v. Parseval

Also sprach GreenLight:
> I hate looking at rows and rows of elsif statements as much as the
> next person, but using the Switch module just doesn't cut it. I wrote:
>
> [benchmark code and results snipped]
>
> Using "switch" was nearly an order of magnitude slower.
>
> Now my real question: Does anyone know if the "forthcoming" Perl6
> version (given/when, as described in "perldoc Switch") will offer
> better performance (that is, is anyone actually using any early
> release and can comment upon the performance)?

Perl6 will have a switch statement (given/when) built into the language,
as part of Perl6's grammar. Perl5's switch, however, is done through cheating.
A source filter makes a preliminary run through your code at compile
time and translates it into code Perl5 can handle. Apparently, the
generated code is not very efficient. You can have a look at it by using
B::Deparse:

ethan@ethan:~$ perl -MSwitch -MO=Deparse
switch($var) {
    case "bla" { print "bla\n" }
    case "blu" { print "blu\n" }
}
^D
S_W_I_T_C_H: while (1) {
    local $_S_W_I_T_C_H;
    &Switch::switch($var);
    if (&Switch::case('bla')) {
        while (1) {
            print "bla\n";
            last S_W_I_T_C_H;
        }
        continue {
            goto C_A_S_E_1;
        }
        last S_W_I_T_C_H;
        C_A_S_E_1: ;
    }
    if (&Switch::case('blu')) {
        while (1) {
            print "blu\n";
            last S_W_I_T_C_H;
        }
        continue {
            goto C_A_S_E_2;
        }
        last S_W_I_T_C_H;
        C_A_S_E_2: ;
    }
}
continue {
    last;
}

This is much slower than simple if/elsif/else chains because every
condition is handled through function calls (which are themselves rather
slow in Perl). Switch::case() then calls a code reference that does the
actual comparison, based upon the type of the 'case' condition. For a
simple string, as in your case, the code reference called from case()
looks like this:

$::_S_W_I_T_C_H =
    sub { my $c_val = $_[0];
          my $c_ref = ref $c_val;
          return $s_val eq $c_val         if $c_ref eq "";
          return in([$s_val],$c_val)      if $c_ref eq 'ARRAY';
          return $c_val->($s_val)         if $c_ref eq 'CODE';
          return $c_val->call($s_val)     if $c_ref eq 'Switch';
          return scalar $s_val=~/$c_val/  if $c_ref eq 'Regexp';
          return scalar $c_val->{$s_val}  if $c_ref eq 'HASH';
          return;
    };

It stands to reason that this has to be slow.
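
To make the cost visible outside of the module, here is a simplified stand-alone model of that mechanism. It is a sketch, not Switch's actual source: make_matcher is a hypothetical name, the 'Switch' object branch is left out, and the array test uses a plain grep. The point is only that every case pays for a sub call, a ref() inspection and a run through the type tests before the one string comparison that an elsif would do directly.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the switch value and return a closure that
# tests one case value against it, picking the comparison by the case's type.
sub make_matcher {
    my ($s_val) = @_;
    return sub {
        my ($c_val) = @_;
        my $c_ref = ref $c_val;
        return $s_val eq $c_val                        if $c_ref eq '';
        return scalar(grep { $_ eq $s_val } @$c_val)   if $c_ref eq 'ARRAY';
        return $c_val->($s_val)                        if $c_ref eq 'CODE';
        return scalar($s_val =~ $c_val)                if $c_ref eq 'Regexp';
        return exists $c_val->{$s_val}                 if $c_ref eq 'HASH';
        return;
    };
}

# One closure per "switch", one closure call per "case".
my $case = make_matcher('TAG003');
for my $candidate ('TAG001', 'TAG003', [qw(TAG002 TAG003)], qr/^TAG\d{3}$/) {
    my $label = ref($candidate) || $candidate;
    printf "%-8s %s\n", $label, $case->($candidate) ? 'match' : 'no match';
}
```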

Tassilo
 
Anno Siegel

GreenLight said:
> I hate looking at rows and rows of elsif statements as much as the
> next person, but using the Switch module just doesn't cut it. I wrote:
>
> [benchmark code snipped]
>
> These subroutines adequately represent tasks performed thousands of
> times per day at my client's site.

These only perform a single push as "payload" of the decision. Is that
a realistic representation of the program?

> And the results:
>
> Benchmark: timing 100000 iterations of Using 'if', Using 'switch'...
> Using 'if': 17 wallclock secs (16.48 usr + 0.00 sys = 16.48 CPU) @ 6066.49/s (n=100000)
> Using 'switch': 153 wallclock secs (150.68 usr + 0.00 sys = 150.68 CPU) @ 663.68/s (n=100000)
>
> Using "switch" was nearly an order of magnitude slower.

Appalling, isn't it? And yet, the result is misleading.

You say that these routines are performed thousands of times per day.
Let's be generous and say they're called 10_000 times. Your benchmark
has called them 100_000 times and lost about 150 seconds through the
use of "switch" (which over-estimates the loss, counting *all* of the
consumed time as overhead). So on 10_000 calls per day you waste a
total of 15 cpu-seconds, out of 24*60*60 = 86400 that are available.
Reason for concern? I think not.
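
The arithmetic can be spelled out in a few lines. This is just a worked calculation using the CPU figures from the benchmark output in this thread; the 10_000 calls per day is the generous assumption made above, and daily_cost is a name invented for this sketch.

```perl
use strict;
use warnings;

# Figures from the benchmark output in this thread.
my $bench_calls = 100_000;
my $cpu_if      = 16.48;     # CPU seconds, if/elsif version
my $cpu_switch  = 150.68;    # CPU seconds, Switch version

# Assumption: a generous 10_000 calls per day.
my $calls_per_day = 10_000;
my $secs_per_day  = 24 * 60 * 60;

# Scale a benchmark total down to CPU seconds per day.
sub daily_cost {
    my ($bench_cpu, $n_bench, $n_daily) = @_;
    return $bench_cpu / $n_bench * $n_daily;
}

# Upper bound: count all of the Switch run as overhead.
my $upper = daily_cost($cpu_switch, $bench_calls, $calls_per_day);

# Net figure: only the time Switch costs beyond the if/elsif version.
my $net = daily_cost($cpu_switch - $cpu_if, $bench_calls, $calls_per_day);

printf "Upper bound: %.2f CPU s/day (%.4f%% of %d s)\n",
    $upper, 100 * $upper / $secs_per_day, $secs_per_day;
printf "Net of 'if': %.2f CPU s/day\n", $net;
```

Counting only the difference between the two runs gives a little over 13 CPU seconds per day, slightly below the 15-second upper bound.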

Also, if you put a more realistic payload in the benchmarks, the result
is less dramatic. Assume the equivalent of "1 for 1 .. 10_000" instead
of a single "push". (My machine can run that 300 times per second,
so a few thousand times per day would still amount to very little.)
According to my benchmarks, this reduces the advantage of "if" over
"switch" to 7%. Again, no reason for much concern.

I, too, would like to see a light-weight switch statement, comparable
in execution speed to "if", but in many cases the Switch module will
still be adequate. If it isn't, there are a lot of ad-hoc alternatives
in Perl. Some of them, like dispatch tables, are blindingly fast where
applicable. Your branching on fixed strings looks like an invitation
to use a dispatch table.
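
A minimal sketch of such a dispatch table for the tag/value records in this thread. The handler store_value and the wrapper parse_tags are hypothetical names; in the benchmark every tag does the same push, so all ten entries point at the same sub here, but each tag could just as well get its own handler.

```perl
use strict;
use warnings;

# Hypothetical handler: receives the result array and the value.
sub store_value {
    my ($parsed, $value) = @_;
    push @$parsed, $value;
    return;
}

# Dispatch table, built once: tag name => code reference.
# One hash lookup replaces up to ten string comparisons.
my %handler_for = map { (sprintf('TAG%03d', $_) => \&store_value) } 1 .. 10;

sub parse_tags {
    my @parsed;
    for my $next (@_) {
        my ($tag, $value) = ($next =~ /^(.*?)\^(.*?)$/)
            or die "Malformed record ($next)!";
        my $handler = $handler_for{$tag}
            or die "Bad tag ($tag)!";
        $handler->(\@parsed, $value);
    }
    return @parsed;
}

my @values = parse_tags("TAG001^VALUE001", "TAG007^VALUE007");
print "@values\n";
```

An unknown tag falls through to the die, which plays the role of the final else branch in the original chain.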

Anno
 
