Horror! Using goto statement in finite state machines?

websnarf · Feb 11, 2007

What's wrong with nested switch statements or tables of function pointers?

Well, nothing if you don't care about performance. You can use bubble
sorts too if you like. Have a blast.

Keith Thompson · Feb 11, 2007

Correct. The C language standard itself creates this problem, for
really, no good reason. The simplest way to improve the C standard
for finite state machines is to extend either the continue or goto
statements to be able to go to a specific case label. In that way a
FSM can be implemented inside of a switch statement without paying the
exorbitant cost of rerunning a literal switch operation on every state
transition. This kind of thinking is basically beyond what the C
standards committee is capable of conceiving.

[...]

Ignoring the gratuitous insults, there's no reason you can't have
labels within a switch statement.

switch (x) {
case FOO: FOO:
/* do some stuff */
break;
case BAR: BAR:
/* do some other stuff */
break;
/* etc. */
}

Wrap it in a macro if you like:

#define LABEL(name) case name: name:

Ian Collins · Feb 11, 2007

Well, nothing if you don't care about performance. You can use bubble
sorts too if you like. Have a blast.

Assuming the branch logic dominates over the state actions, which would
appear to be both a premature and speculative optimisation.

I may well be wrong, I've only ever implemented FSMs (using function
pointers) on CPUs without an instruction cache.

Care to post an example?

Christopher Layne · Feb 11, 2007

The equivalent of an extra branch miss *PER* state transition. On
today's processors that's about 15 clocks or so. Think about that in
terms of a parser implemented as a FSM, where you are expecting each
state to be processed in, say, 5 clocks or so plus, say, a randomly
predicted branch (so half of those 15 clocks) at about 8 clocks. So
by adding an extra mis-predicted branch, you've just halved your
performance (a direct goto has an amortized cost of about 1 clock; but
it can be less in optimal circumstances).

Allow me to do some of my own tests. I'm not disagreeing.

Look. A few years back here, someone dared to challenge me to a
direct competition on this very issue for implementing a CSV parser
right here in this very newsgroup. We both implemented it using
FSMs. After he finally figured out that fread() can be faster then
millions of fgetc()'s (I basically told him so, so that the
competition would not be a complete joke), the real competition began,
and I was still able to clean his clock. Not even James Antill was
able to post much of a threat. The two other "competitors" in this
little duel, each had their own agenda and were incentivised to beat
me (the first, as a matter of honor/pride, and the second to push his
own string library).

Why couldn't they beat me? Because I was the only one who implemented
the FSM with gotos. And that was just too "gross" for them to deal
with. But the performance difference is very measurable.

http://groups.google.com/group/comp...f0724d9630/d769b3cbee182484?#d769b3cbee182484

That one?

websnarf · Feb 11, 2007

Allow me to do some of my own tests. I'm not disagreeing.

Good man. Find out for yourself.

http://groups.google.com/group/comp.lang.c/browse_frm/thread/86a3ddf0...

That one?

That's actually a follow up involving Michael B. Allen. (It all
started when he made the ridiculous claim that the C std library could
not be beat in terms of performance on strings.)

This is the original thread that I was thinking of:

http://groups.google.com/group/comp.lang.c/browse_frm/thread/4fb3b05cfb3818d7/

websnarf · Feb 11, 2007

Assuming the branch logic dominates over the state actions, which would
appear to be both a premature and speculative optimisation.

Sure, sometimes 15 clocks doesn't matter. So your per-state
operations needs to be 300-1000 clocks (or about 600 to 3000 pipelined
x86 instructions) before you decide that state transition performance
isn't that important. (Just like bubble sort for 25 elements is
probably good enough.) Of course, its easy to find examples where the
the per-state operations are way less than that (any parser). I can't
think of a single counter example where the state transition is really
high like that off the top of my head though. Perhaps you can?

I may well be wrong, I've only ever implemented FSMs (using function
pointers) on CPUs without an instruction cache.

Care to post an example?

In my response to Christopher Layne you have direct examples of two
users implementing "for() switch()" based state machines, and myself
who made a goto dominated state machine implementation. You can go
download those and see for yourself. They complained that my
implementation was obfuscated, because of the gotos, but of course
that was as a cover for the fact that my solution comes out faster
precisely because of this.

Ian Collins · Feb 11, 2007

In my response to Christopher Layne you have direct examples of two
users implementing "for() switch()" based state machines, and myself
who made a goto dominated state machine implementation. You can go
download those and see for yourself. They complained that my
implementation was obfuscated, because of the gotos, but of course
that was as a cover for the fact that my solution comes out faster
precisely because of this.

OK, I will. I must admit I've never looked at using gotos for FSMs.
By the way, your sig appears to be missing the space after the --, so
some news readers (like Mozilla) don't pick it up.

Christopher Layne · Feb 11, 2007

In my response to Christopher Layne you have direct examples of two
users implementing "for() switch()" based state machines, and myself
who made a goto dominated state machine implementation. You can go
download those and see for yourself. They complained that my
implementation was obfuscated, because of the gotos, but of course
that was as a cover for the fact that my solution comes out faster
precisely because of this.

I'm checking out the code right now.

BTW: This line fails parsing:

test5.csv:"1234 West "Q" St. (for "Quantum", or "Quality")", 0

Whereas this line does not:

test6.csv:"1234 West ""Q"" St. (for ""Quantum"", or ""Quality"")", 0

Should that be the case?

websnarf · Feb 11, 2007

Correct. The C language standard itself creates this problem, for
really, no good reason. The simplest way to improve the C standard
for finite state machines is to extend either the continue or goto
statements to be able to go to a specific case label. In that way a
FSM can be implemented inside of a switch statement without paying the
exorbitant cost of rerunning a literal switch operation on every state
transition. This kind of thinking is basically beyond what the C
standards committee is capable of conceiving.

Click to expand...

[...]

Ignoring the gratuitous insults, there's no reason you can't have
labels within a switch statement.

switch (x) {
case FOO: FOO:
/* do some stuff */
break;
case BAR: BAR:
/* do some other stuff */
break;
/* etc. */
}

Wrap it in a macro if you like:

#define LABEL(name) case name: name:

Not a bad idea, assuming you have only one FSM in a given function
scope. Of course, you are still using gotos as well. (See, this is
what structured programming is all about -- no you won't find
"structured programming" in the C language standard ...)

websnarf · Feb 11, 2007

I'm checking out the code right now.

BTW: This line fails parsing:

test5.csv:"1234 West "Q" St. (for "Quantum", or "Quality")", 0

Whereas this line does not:

test6.csv:"1234 West ""Q"" St. (for ""Quantum"", or ""Quality"")", 0

Should that be the case?

Yes, I did have some examples that were not parsable for severe
conditions testing purposes. I'm one of these people who doesn't
think "Undefined Behaviour at the least provocation" is an acceptable
state of affairs.

Christopher Layne · Feb 11, 2007

Yes, I did have some examples that were not parsable for severe
conditions testing purposes. I'm one of these people who doesn't
think "Undefined Behaviour at the least provocation" is an acceptable
state of affairs.

Alright, I ended up taking it out of the large test file I created (which was
just a concatenation).

I cannot find a copy of Michael B. Allen's version, referenced in this post:

http://groups.google.com/group/comp...hread/4fb3b05cfb3818d7/?#doc_1878074cf0c7861d

Getting a 404 on his provided URL. Do you have a copy of it laying around?

Flash Gordon · Feb 11, 2007

Sure, sometimes 15 clocks doesn't matter. So your per-state
operations needs to be 300-1000 clocks (or about 600 to 3000 pipelined
x86 instructions) before you decide that state transition performance
isn't that important. (Just like bubble sort for 25 elements is
probably good enough.) Of course, its easy to find examples where the
the per-state operations are way less than that (any parser). I can't
think of a single counter example where the state transition is really
high like that off the top of my head though. Perhaps you can?

I've done a few state machines where in each state the software is
collecting information from a number of HW devices and displaying it
with the change of state being on a specific key press. I did not do
timings, but I would be surprised if the minimum possible execution time
for one state was below several thousand clock cycles.

Currently I have a finite state machine implemented in an XML based
language which I have implemented in C where each run of the state
machine is a minimum of 1 minute apart (I was allowed to make it 10
seconds, but since it did not need to be that "fast" I slowed it down to
a more sensible one minute).

I've also done a fair bit or programming on processors without cache
where a conditional branch is either 1 or 3 clock cycles (due to
pipeline break) depending on whether the branch is taken or not, so the
overhead is a lot lower.

In my response to Christopher Layne you have direct examples of two
users implementing "for() switch()" based state machines, and myself
who made a goto dominated state machine implementation. You can go
download those and see for yourself. They complained that my
implementation was obfuscated, because of the gotos, but of course
that was as a cover for the fact that my solution comes out faster
precisely because of this.

So you measured and the performance using gotos was better. It's not a
premature optimisation when you have measurements that say the
performance will be significantly better.

I would have no problem with a decently commented FSM implemented with
gotos where required for performance reasons. I also have no problem
with well commented hairy assembler where required (I've had to do it
myself where C code was measured to be nowhere near fast enough).

websnarf · Feb 11, 2007

Alright, I ended up taking it out of the large test file I created (which
was just a concatenation).

I cannot find a copy of Michael B. Allen's version, referenced in this
post:

http://groups.google.com/group/comp.lang.c/tree/browse_frm/thread/4fb...

Getting a 404 on his provided URL. Do you have a copy of it laying around?

Hmm ... no; though it doesn't surprise me that he chucked that code
since it was so bug ridden and slow. I found an update of his general
csv parser here:

http://www.ioplex.com/~miallen/libmba/dl/src/csv.c

I don't know if it fits in the same code hoist, however. He's still
making the same performance mistake though (its amazing how
obstinantly stupid people can be about performance.)

Guest · Feb 11, 2007

There's not really anything the compiler can do unless its a forever
for (as in for(;) and the compiler is willing and able to do some
really serious optimizations for switches (something I have never
seen.)

It's something at least both Intel's compiler and GCC do.

Serve Laurijssen · Feb 11, 2007

Keith Thompson said:
What definition would that be?

see Joe Wright's post ^^

Serve Laurijssen · Feb 11, 2007

Keith Thompson said:
True.

Of course, in the *worst* case the whole program is one function.
}

my psychic powers told me you'd say that!

Mark McIntyre · Feb 11, 2007

What definition would that be?

The definition contained within the statement. Thats what "by
definition" means.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Christopher Layne · Feb 11, 2007

Hmm ... no; though it doesn't surprise me that he chucked that code
since it was so bug ridden and slow. I found an update of his general
csv parser here:

http://www.ioplex.com/~miallen/libmba/dl/src/csv.c

I don't know if it fits in the same code hoist, however. He's still
making the same performance mistake though (its amazing how
obstinantly stupid people can be about performance.)

This code is still using fgetc().

Without his workable code, there's not much one can do. I wrangled it around
to use fread on one huge ass buffer because I wasn't going to refactor his
entire code.

Anyways, in both of your code, the jumping around with gotos, and in his case,
inside the switch isn't even the majority of the cpu bandwidth at all. In
your case, it was where you predicted it "performance critical path here."
Kcachegrind showed 22% or so of instruction fetches resting on just that one
line:

11.33% 178 for (; n < len; n++) { /* Performance critical path is here. */
22.53% 179 if (!charSet[x[n]]) {

In his case, it was obviously on the memset() and getc() taking the majority
of the time. After I took those out of the equation it was the ctype
functions as expected. While trying to make his code do *something*, I got it
to parse in around .900 sec, and yours .800 sec. But I wouldn't put any
weight on that whatsoever - because I cannot verify anything his code is
really doing. Although I know it is sitting inside that switch and changing
state/calling ctype functions/setting variables, etc. for atleast the number
of bytes in the huge file I fed it. In both cases I turned off any kind of
stdio output to reduce it's cpu time.

We really just need his improved code. But from what I can see, the bottleneck
isn't in the switch or gotos. Even with my chopped up version of his code
that was doing some kind of parsing inside the switch cascade, the switch
itself accounted for only around 4%.

Keith Thompson · Feb 11, 2007

Serve Laurijssen said:
see Joe Wright's post ^^

I had to go to Google to find it. If you had told me that Joe Wright
had written:

| Bertrand Russell said (among other things)..
| "All generalizations are false, including this one."

it would have saved me some time.

I think Voltaire said it first; it's also been attributed to Mark
Twain.

But in any case, it's an observation, not a definition.

Keith Thompson · Feb 11, 2007

Mark McIntyre said:
The definition contained within the statement. Thats what "by
definition" means.

I see no definition in the statement. If there is one, what term is
being defined?

Finite state machine in python	1	Sep 12, 2009
learning vhdl - state machines	17	Feb 12, 2009
How to implement linked Finite State Machines	0	Apr 18, 2004
Using BRAM in state machines	0	Sep 18, 2007
GOTO in Javascript	18	Jul 7, 2008
[ANN] Finite State Ma-thing 0.6	0	May 27, 2005
Sending Python statement over socket in chunks	2	Feb 11, 2008
[ANN] SMC - State Machine Compiler v. 4.2.1 for C	3	Sep 14, 2005

Horror! Using goto statement in finite state machines?

websnarf

Keith Thompson

Ian Collins

Christopher Layne

websnarf

websnarf

Ian Collins

Christopher Layne

websnarf

websnarf

Christopher Layne

Flash Gordon

websnarf

Guest

Serve Laurijssen

Serve Laurijssen

Mark McIntyre

Christopher Layne

Keith Thompson

Keith Thompson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads