Hundreds of cases in a switch, optimization?

T

Tim Rentsch

Lawrence Kirby said:
On Mon, 20 Jun 2005 08:42:31 +0000, pete wrote:

...


It is likely that calling a function will have more overhead than
performing a jump.

I wouldn't necessarily disagree with that, but here is a data
point. Performance measurements on x86 (that I've done) show
that the cost of a 'switch()' with a dense case set (256 values,
say) is pretty much neck-and-neck with the cost of calling
through pointer-to-function to do the same thing. Depending on
the specifics, calling through pointer-to-function can be faster
than using 'switch()'.

Of course, that assumes other considerations aren't a factor, and
often they are - sometimes favoring using 'switch()', and
sometimes favoring using a function call.
 
C

Christian Bau

Jean-Claude Arbaut said:
And you have no guarantee a C compiler is smart enough to use a jump table.
That's not in the standard ;-) However, it is very likely to be smart
enough.

It is also quite likely not to generate a jump table if that is smarter.
The compiler will know how long conditional branches take, how long
branching using a branch table takes, and so on. What if you have
twohundred cases with wildly varying values so you cannot use a jump
table? Some implementations will actually use a hash function in that
case. Or they use if/else arranged in a binary tree.
 
C

Christian Bau

Tim Rentsch said:
I wouldn't necessarily disagree with that, but here is a data
point. Performance measurements on x86 (that I've done) show
that the cost of a 'switch()' with a dense case set (256 values,
say) is pretty much neck-and-neck with the cost of calling
through pointer-to-function to do the same thing. Depending on
the specifics, calling through pointer-to-function can be faster
than using 'switch()'.

If you use an old compiler on a Pentium 4, you might get very slow code
(if the compiler generates a table with offset in the same memory area
as the actual code; the processor basically throws up if you try to read
data and instructions from nearby memory locations).
 
J

Jean-Claude Arbaut

It is also quite likely not to generate a jump table if that is smarter.
The compiler will know how long conditional branches take, how long
branching using a branch table takes, and so on. What if you have
twohundred cases with wildly varying values so you cannot use a jump
table? Some implementations will actually use a hash function in that
case. Or they use if/else arranged in a binary tree.

Well, actually CBFalconer was talking about computed goto, and I completely
missed the point. He was kind enough not to correct me :)
 
K

Keith Thompson

Jean-Claude Arbaut said:
Not exactly. A switch has almost the same behaviour, but at assembly level,
(on my implementation) it will be translated as jumps, whereas functions
calls are translated as calls. It has some effect on the stack.
At the C level, there are also differences on scope I suppose.

I think he meant that a switch statement is essentially the same thing
as a computed goto. (It's a bit more limited (for example you can't
do a backward jump), but that's a good thing.)
 
I

icub3d

One way to tackle the problem may be to drill down in a binary search
tree type of manner. For example, Lets say you have cases 0-255. You
could do something like this:

switch (x/100) { /* Check the first digit and drill down. */
case 0:
switch (x/10 - (x/100) * 10) { /* check the second digit. */
case 0:
switch ... /* check the third digit. */
case 1:
... (similar to above)
case 2:
... (similar to above)
... (continue like this through 9)

case 1:
.... (similar to above)
case 2:
.... (similar to above)
default:
/* Error. */
}

So what you are basically doing is checking each digit to find the
value you want. This effectively means that each check on a 0-255 (or
to 999 for that matter) will only involve three checks. The only cases
where a single switch statement will be faster is if the first two
cases were selected most of the time.
 
D

Dik T. Winter

> So what you are basically doing is checking each digit to find the
> value you want. This effectively means that each check on a 0-255 (or
> to 999 for that matter) will only involve three checks. The only cases
> where a single switch statement will be faster is if the first two
> cases were selected most of the time.

Unless the switch is implemented with a jump table.
 
R

REH

icub3d said:
One way to tackle the problem may be to drill down in a binary search
tree type of manner. For example, Lets say you have cases 0-255. You
could do something like this:

switch (x/100) { /* Check the first digit and drill down. */
case 0:
switch (x/10 - (x/100) * 10) { /* check the second digit. */
case 0:
switch ... /* check the third digit. */
case 1:
... (similar to above)
case 2:
... (similar to above)
... (continue like this through 9)

case 1:
... (similar to above)
case 2:
... (similar to above)
default:
/* Error. */
}

So what you are basically doing is checking each digit to find the
value you want. This effectively means that each check on a 0-255 (or
to 999 for that matter) will only involve three checks. The only cases
where a single switch statement will be faster is if the first two
cases were selected most of the time.
You are assuming that a single switch statement does multiple checks. I
doubt any implementation does so. So, your "optimization" does three checks
(plus all the math), where as if you just let the compiler do the work,
there is only one.

REH
 
W

Walter Roberson

You are assuming that a single switch statement does multiple checks. I
doubt any implementation does so.

I just created a simple test program with four non-consequative
case labels. SGI's "cc" compiler and gcc generated different looking
code for the same program, but at key points, they both did multiple
comparisons and not just against the boundary values. [Oddly, for
both, the first pair of comparisions they did was against the
first case label and then against the second-largest case label -- not
against the largest.]
 
I

icub3d

Unfortunately, we can't determine ahead of time if the compiler will do
this work for us. As Walter suggested, two major compilers don't do
this optimization, so it's hard to assume that many others will.
 
R

REH

Walter Roberson said:
I just created a simple test program with four non-consequative
case labels. SGI's "cc" compiler and gcc generated different looking
code for the same program, but at key points, they both did multiple
comparisons and not just against the boundary values. [Oddly, for
both, the first pair of comparisions they did was against the
first case label and then against the second-largest case label -- not
against the largest.]
That seems odd and is very suprising to me (though admittingly I am not a
compiler writer--IANACR?). I think I do some research....

REH
 
J

Jean-Claude Arbaut

Unfortunately, we can't determine ahead of time if the compiler will do
this work for us. As Walter suggested, two major compilers don't do
this optimization, so it's hard to assume that many others will.

Just to compare: gcc4 -O4 on OSX

* with 4 consecutive cases: comparisons, and at most 2 jumps
* with 200 consecutive cases: jump table
 
D

David Mathog

He said:
Hi,

I just wrote a function that has over 200 "cases" wrapped in a "switch"
statement. I'm wondering if there are performance issues in such
implementation. Do I need to optimize it some way?

There are some applications that can't be done any other way but
frequently those 200 cases have some natural heirarchy to them and your
code may run better if you can make that heirarchy explicit to the
compiler. For instance, if 10% of the cases are "geometry operations"
and you index the cases by an integer, then by having an outer switch
which has a case for "geometry operations" (perhaps represented by
a bit in the index) it narrows down the internal switch to
only 20 cases. And so forth. My point being that it is a rare
program where there is not some fairly obvious way to cluster
the operations. If nothing else you could run the program for
a while, measure usage of each case, and then arrange the cases
such that the most frequently used cases are up at the top of
the switch.

Elsewhere in this thread someone mentioned using an array of pointers to
functions. That can be efficient _IF_ each of those functions runs for
a time significantly longer than the overhead of the function
call. However I've (unfortunately) often encountered code where
the "function" does nothing more than:

int blah(int a,int b){
if(a<b)return a;
return b;
}

or something equally trivial. That is, something which might
as well have been a define or just coded into the case
explicitly. Just to pour salt into the wound these are
invariably 3 levels down inside nested loops.

Regards,

David Mathog
(e-mail address removed)
 
L

Lawrence Kirby

A decent implementation will do so when that is appropriate.
I just created a simple test program with four non-consequative
case labels. SGI's "cc" compiler and gcc generated different looking
code for the same program, but at key points, they both did multiple
comparisons and not just against the boundary values. [Oddly, for
both, the first pair of comparisions they did was against the
first case label and then against the second-largest case label -- not
against the largest.]

There are some circumstances where a jump table isn't the best appraoch.
All you seem to be saying here is that the compilers are intelligent
enough to use a different approach when that is more appropriate.

The specific strategy could make sense if

1. There's an assumption that the first case listed is likely to be more
commonly taken than the others

2. Testing the 2nd to last nicely partitions the remaining cases: equal
and you've found a case, less and it just needs to test the 2nd case,
grerater and it just needs to test the 3rd case. OK I'm making some
assumptions about the ordering of values in the cases.

It looks like a good strategy.

Lawrence
 
P

pete

He said:
Oh that's definitely a great idea, I'll try, thanks.

I worked on an embedded project where several
CPU's were sending messages to each other.
The message handling functions
were called from an array of function pointers.
I think there were somewhere between 200 and 300
different types of messages.
Each message had a type number and that was used as the index.
That's the code that they had when I showed up;
I didn't think it was difficult to work with.
 
A

Alan Balmer

A decent implementation will do so when that is appropriate.
I just created a simple test program with four non-consequative
case labels. SGI's "cc" compiler and gcc generated different looking
code for the same program, but at key points, they both did multiple
comparisons and not just against the boundary values. [Oddly, for
both, the first pair of comparisions they did was against the
first case label and then against the second-largest case label -- not
against the largest.]

There are some circumstances where a jump table isn't the best appraoch.
All you seem to be saying here is that the compilers are intelligent
enough to use a different approach when that is more appropriate.

The specific strategy could make sense if

1. There's an assumption that the first case listed is likely to be more
commonly taken than the others

2. Testing the 2nd to last nicely partitions the remaining cases: equal
and you've found a case, less and it just needs to test the 2nd case,
grerater and it just needs to test the 3rd case. OK I'm making some
assumptions about the ordering of values in the cases.
I don't think any assumptions are needed - the compiler can reorder
the values any way it wants.
 
J

Jean-Claude Arbaut

I don't think any assumptions are needed - the compiler can reorder
the values any way it wants.

Of course, but it can do a much better job if it knows that one of the 4
possibilities will be chosen in 99% of cases. Then it's best to test for
that one first. Actually the reason why the compiler does not use a jump
table for only 4 values is probably to avoid a load. I you know your
processor very well, you can check with a "pipeline sketch" which solution
is best in worst case.
 
C

CBFalconer

Jean-Claude Arbaut said:
Of course, but it can do a much better job if it knows that one of
the 4 possibilities will be chosen in 99% of cases. Then it's best
to test for that one first. Actually the reason why the compiler
does not use a jump table for only 4 values is probably to avoid a
load. I you know your processor very well, you can check with a
"pipeline sketch" which solution is best in worst case.

Consider the following silly code fragment:

int ix;
....
switch (ix) {
case 1: foo(); break;
case 100: bar(); break;
case 1000: fie(); break;
default: fum();
}

At some point the code generator will probably consider a jump
table with 1000 entries, most of which point to fum(). Then a
smarter code generator will consider transforming the code to
something like:

if (1000 == ix) fie();
else if (100 == ix) bar();
else if (1 == ix) foo();
else fum();

However, if that code is written in the first place, and the
switched indices happen to be 1, 2, and 3, the code generator is
very unlikely to even consider a jump table. The optimum final
code probably lies somewhere between the extremes, and can be
discovered because the language insists that the cases be compile
time constants.

Note that I am talking about a code generator, not a compiler. The
compiler will have transformed the source into something that
allows the equivalence of the two sequences to be detected. The
code generator may well be common to a plethora of language
specific compilers, just as a compiler front end can feed a
multitude of machine specific code generators.

--
Some informative links:
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
 
L

lawrence.jones

REH said:
You are assuming that a single switch statement does multiple checks. I
doubt any implementation does so.

Any reasonable implementation does multiple checks when it makes sense to do
so. There are lots of ways to handle switch statements:

1) A directly indexed jump table
3) A hash-function indexed jump table
2) A decision tree
4) Some combination of the above

Which is most efficient depends on the range and density of the case
labels and the characteristics of the hardware. The compiler should
know all of those things, portable code cannot know the last. As I
recall, DMR's original PDP-11 compiler used all of them 30 years ago, so
I would be very surprised if any modern compiler did worse.

-Larry Jones

Archaeologists have the most mind-numbing job on the planet. -- Calvin
 
F

forayer

David said:
Elsewhere in this thread someone mentioned using an array of pointers to
functions. That can be efficient _IF_ each of those functions runs for
a time significantly longer than the overhead of the function
call. However I've (unfortunately) often encountered code where
the "function" does nothing more than:

int blah(int a,int b){
if(a<b)return a;
return b;
}

or something equally trivial. That is, something which might
as well have been a define or just coded into the case
explicitly. Just to pour salt into the wound these are
invariably 3 levels down inside nested loops.
Although the point David makes is valid, consider a swap function.
Wouldn't it be more readable to use a funtion to do the swapping than
write out the code for the same in a switch-case structure? Also,
consider functions where you rely on previous values of local variables
(static variables) such as sequence generators? Or recursive functions?
(I am thinking of fibonacci as an example in both cases). These
functions are usually small, but are better off as functions. If you
must expand, why not just inline them (though u are at the complier's
mercy here). And again if the switch is something like:
switch(x) {
case 1: f1(); break;
case 2: f2(); break;
case ...
}
then, i think, function pointers serve the purpose better. No
additional jump table look-ups, or (if no jump table is used) multiple
comparisons.

cheers,
forayer
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top