Hundreds of cases in a switch, optimization?

C

Christian Bau

"He Shiming said:
Oh that's definitely a great idea, I'll try, thanks.

Please. PLEASE. Think about it for a second.

Compilers have been doing this kind of stuff for decades.

If there were any simple method that you could use to replace your
switch statement and make it faster, then your compiler would use it.

Of course, you are always free to spend a few hours writing functions,
setting up an array of function pointers, and so on and so on and the
next guy who will have to maintain your code will be cursing you. Might
be you, anyway.
 
R

Richard Tobin

He Shiming said:
I just wrote a function that has over 200 "cases" wrapped in a "switch"
statement. I'm wondering if there are performance issues in such
implementation. Do I need to optimize it some way?

Only you can tell us that! Try it and see if it's fast enough.

Obviously your results will depend on the compiler. When I last
looked at this (about 15 years ago), most compilers would generate a
jump table if the values were consecutive or mostly consecutive, and
perhaps generate several jump tables if there were several ranges of
consecutive values. So try to use consecutive values.

If you need to squeeze every last ounce of speed out of it (if, say,
you are implementing some other language with a virtual machine) there
are a number of things that you might do.

- Put in dummy cases to make the values consecutive.

- Bear in mind that switch is defined to do nothing if none of the
cases match, so even with a consecutive range it has to test against
the maximum and minimum. I once hoped that the use of enum
values would help here, but it doesn't because you can assign
"incorrect" values to an enum. Using an unsigned type starting
at zero will let you avoid the minimum test. If you use all
256 values of an 8-bit type there will be no need to check at all.

- If your portability requirements don't prevent it, you could use
an extension such as gcc's which allows you to assign labels to
variables, so instead of using a switch you can use a goto. You
can also (portably) use a table of function pointers but the overhead
of a function call (including loss of local variables) may be worse
than the switch overhead.

- If all else fails, modify the generated assembly code.

And of course, you should only consider any of this if you really need
to!

-- Richard
 
C

CBFalconer

Jean-Claude Arbaut said:
<[email protected]> a écrit :
.... snip ...

<OT>
Please avoid "b" "n" "2" "ur" and such things.
They make reading rather inconvenient (at least for a french reader)
;-) </OT>

Or to an English reader. Their prime effects are to make the
writer appear childish, and to lessen the probability of anyone
reading his or her output.

--
Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html>
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> (C-library}
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
 
M

Malcolm

He Shiming said:
I just wrote a function that has over 200 "cases" wrapped in a "switch"
statement. I'm wondering if there are performance issues in such
implementation. Do I need to optimize it some way?
Does the program run too slowly? Normally we are not too interested in how
fast a program executes until it takes more than about a tenth of a second
to respond, which is perceived by the users as instantaneous. If it takes a
second to respond, it may or may not be worth shaving off a tenth of a
second. If it takes ten minutes to respond, it probably is worth while
reducing that time to nine minutes.
In terms of generated machine code, how does hundreds of cases in a switch
differ from hundreds of if-elses? Do compilers or processors do any
optimization on such structured code? Do I need to worry about the
performance, usually?
Most compilers will try to generate a jump table for a large switch, which
is likely to be faster than an if .. else ladder. However sometimes you can
optimise. let's say you have a databse of millions of customers and are
doing a switch on the code for a title. Approximately half your customers
will be "Mr", about a third will probably be "Mrs", a lesser number "Miss"
and some "Ms" (depending on the characteristics of your customer base). You
will also have a sprinking of "Dr"s, "Sir"s, and if you are lucky a customer
you very much want to keep with the title "Queen".

Now if we arrange the if .. else ladder with "Mr" as the first condition,
"Mrs" as the second, and "Queen" as the last condition, then about half the
time if will evaluate only a single condition. Only once will it go all the
way down the ladder when it finds the title for the Elizabeth Windsor.

This could easily be faster than a switch, but may not be.

Generally you will only get a few percentage points improvement by this type
of micro-optimisation. This may be worth having, but it is usually the last
thing you should try.
 
F

forayer

Christian said:
Did you verify your understanding by measuring the actual speed or
looking at code that was generated? I don't think so.

step-by-step execution in a debugger works for me! and yes, the
switch-cases did jump.

forayer.
 
B

Ben Bacarisse

In terms of generated machine code, how does hundreds of cases in a switch
differ from hundreds of if-elses? Do compilers or processors do any
optimization on such structured code? Do I need to worry about the
performance, usually?

I got curious about this so I decided to see what one compiler did on one
architecture. I chose the simplest kind of test that would be the most
likely to optimised -- consecutive integers from 0 to N. I wrote a script
that made programs of this form:

#include <stdlib.h>

extern int f(int x, int nc);

void use_if_stmt(int n)
{
int x = 0;
while (n--) {
int v = f(n, <nc>);
if (v == 0) x++;
else if (v == 1) x++;
}
}

void use_switch_stmt(int n)
{
int x = 0;
while (n--) {
int v = f(n, <nc>);
switch (v) {
case 0: x++; break;
case 1: x++; break;
}
}
}

int main(int argc, char **argv)
{
int n = argc > 1 ? strtol(argv[1], NULL, 0) : 10000000;
use_if_stmt(n);
use_switch_stmt(n);
return 0;
}

where the number of cases (and "if" choices) could be varied (<nc> gets
replaced by this value as well). The function f could be one of "low",
"spread", or "high" and the resulting code was linked with the following:

int spread(int n, int nc) { return n % nc; }
int low (int n, int nc) { return 0; }
int high (int n, int nc) { return n + nc; }

to simulate the situation where the value to be matched was always the
first ("low"); evenly spread over the range of choices ("spread"); or
higher than the highest case ("high"). The resulting programs where
compiled both with (gcc -O2) and without optimisation. gprof was used to
compare the running times of the two functions "use_if_stmt" and
"use_switch_stmt". I simply calculated the ration of the "if" time
to the "switch" time, so a result of 1 means that they both took
the same time. A value of 5 means the if version took five times
as long as the switch version.

Without optimisation the results were:

N cases all low spread all high
2 0.8 1.1 0.8
20 0.8 1.7 5.5
200 0.9 8.1 66.4
2000 0.9 112.6 3042.0

almost exactly as I would have guessed. When the value being matched is
the first case, "if" is always faster. With 200 cases and all of the
value off the top of the case range, "if" takes 66 times longer.

What surprised me was the optimised numbers:

N cases all low spread all high
2 1.1 0.9 1.0
20 1.3 0.7 0.9
200 1.2 0.9 1.1
2000 0.8 1.1 1.0

I think that is quite impressive!

gcc (GCC) 3.4.3 on Pentium(R) M processor 1.70GHz
 
M

Martijn

I just wrote a function that has over 200 "cases" wrapped in a
"switch" statement. I'm wondering if there are performance issues in
such implementation. Do I need to optimize it some way?

In terms of generated machine code, how does hundreds of cases in a
switch differ from hundreds of if-elses? Do compilers or processors
do any optimization on such structured code? Do I need to worry about
the performance, usually?

If you know something about the range of values you are testing, you can a
binary search-like approach. Of course this system will not be beaten by a
jump table, and with nowadays pipes, caches and what-not, the misses by the
CPU might even make it less efficient, I haven't tested any of it. But what
I am implying, is something like this:

if ( n < SOMEVALUE )
{
switch ( n )
{
// cases go here
}
}
else
{
switch ( n )
{
// remaining cases go here
}
}

You can introduce multiple levels as desired. Of course this may or may not
be faster for your compiler. And it (may) hurt(s) readability a little bit
also.

And as always with optimizing: make sure it _needs_ to be optimized. If the
particular code is only executed rarely and performance is not an issue
otherwise (like for real-time software), it may not be worth the time and
trouble.

Good luck,
 
M

Martijn

In my experience, the biggest performance gains can be achieved by
finding the bits in your code that are not just slow but completely
braindamaged slow, and fixing them. Not the kind where you ask "what
is faster: Switch or If", but the kind where you ask yourself "how
could anyone be so stupid to write code that way? "

Learn to find these bits of code and learn to use tools to find them.

Would you care to suggest some? I am always on the look-out for
improvements... (if not in speed and resource consumption, in things like
maintainability).
 
C

Christian Bau

In my experience, the biggest performance gains can be achieved by
finding the bits in your code that are not just slow but completely
braindamaged slow, and fixing them. Not the kind where you ask "what
is faster: Switch or If", but the kind where you ask yourself "how
could anyone be so stupid to write code that way? "

Learn to find these bits of code and learn to use tools to find them.

Would you care to suggest some? I am always on the look-out for
improvements... (if not in speed and resource consumption, in things like
maintainability).[/QUOTE]

Well, lets see... Your compiler's documentation? Does it come with a
debugger, does that have documentation? Does it come with a profiler?
Does that have any documentation?
 
C

Christian Bau

"forayer said:
step-by-step execution in a debugger works for me! and yes, the
switch-cases did jump.

You are saying you use a compiler that will use 200 consecutive
comparisons and conditional branches for a switch statement with 200
cases?
 
C

Christian Bau

"Malcolm said:
Now if we arrange the if .. else ladder with "Mr" as the first condition,
"Mrs" as the second, and "Queen" as the last condition, then about half the
time if will evaluate only a single condition. Only once will it go all the
way down the ladder when it finds the title for the Elizabeth Windsor.

This could easily be faster than a switch, but may not be.

Hi Malcolm,

nowadays with good branch prediction and heavy penalties for incorrect
prediction in many processors, this becomes less significant. Most of
the cost is in _incorrectly predicted_ branches, and in a switch-like
situation, most often there will be exactly one incorrectly predicted
branch, no matter what you do. You can reduce the number of correctly
predicted branches in between, but they are less significant.
 
F

forayer

Christian said:
You are saying you use a compiler that will use 200 consecutive
comparisons and conditional branches for a switch statement with 200
cases?

no, i meant that the switch statement _didnt_ use 200 comparisions. it
jumped right to the appropriate case. such jumps do make sense for
switch-case structures, but how well can they be compared to an array
of function pointers (assuming that the cases can be easily indexed)?
 
M

Malcolm

Christian Bau said:
Hi Malcolm,

nowadays with good branch prediction and heavy penalties for incorrect
prediction in many processors, this becomes less significant. Most of
the cost is in _incorrectly predicted_ branches, and in a switch-like
situation, most often there will be exactly one incorrectly predicted
branch, no matter what you do. You can reduce the number of correctly
predicted branches in between, but they are less significant.
That's a good point.
The truth is that I hardly ever do any assembly programming these days, so
you tend to forget what the instruction set is like.
 
P

pete

Christian said:
Please. PLEASE. Think about it for a second.

Compilers have been doing this kind of stuff for decades.

If there were any simple method that you could use to replace your
switch statement and make it faster, then your compiler would use it.

Of course, you are always free to spend a few hours writing functions,
setting up an array of function pointers, and so on and so on and the
next guy who will have to maintain your code will be cursing you.
Might be you, anyway.

Why don't you like an array of function pointers?
 
J

Jean-Claude Arbaut

no, i meant that the switch statement _didnt_ use 200 comparisions. it
jumped right to the appropriate case. such jumps do make sense for
switch-case structures, but how well can they be compared to an array
of function pointers (assuming that the cases can be easily indexed)?

Well, in assembly it probably resembles an array of "jump" pointers...
If you want to emulate that in C, you just create an array of functions,
since computed goto is not available.
 
J

Jean-Claude Arbaut

Why don't you like an array of function pointers?

Function pointers are perfect, but I'd say "don't try to beat
the compiler unless there is a good reason". If the targeted compiler does
manage switches well, there is no reason IMHO to use function pointers here.
 
C

CBFalconer

J

Jean-Claude Arbaut

... snip ...


Yes it is. It just happens to be spelled 'switch'.

Not exactly. A switch has almost the same behaviour, but at assembly level,
(on my implementation) it will be translated as jumps, whereas functions
calls are translated as calls. It has some effect on the stack.
At the C level, there are also differences on scope I suppose.
 
J

Jean-Claude Arbaut

Not exactly. A switch has almost the same behaviour, but at assembly level,
(on my implementation) it will be translated as jumps, whereas functions
calls are translated as calls. It has some effect on the stack.
At the C level, there are also differences on scope I suppose.

And you have no guarantee a C compiler is smart enough to use a jump table.
That's not in the standard ;-) However, it is very likely to be smart
enough.
 
L

Lawrence Kirby

On Mon, 20 Jun 2005 08:42:31 +0000, pete wrote:

....
Why don't you like an array of function pointers?

It is likely that calling a function will have more overhead than
performing a jump.

Lawrence
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top