Horror! Using goto statement in finite state machines?


Mark McIntyre

I see no definition in the statement. If there is one, what term is
being defined?

Ok, I'll play: the statement "the one with unconditional jumps will
end up being faster" is a definition of the behaviour of unconditional
jumps.

Why are you persisting at this point? I'm sure you knew precisely what
Serve meant, and it seems bizarre.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 

Keith Thompson

Mark McIntyre said:
Ok, I'll play: the statement "the one with unconditional jumps will
end up being faster" is a definition of the behaviour of unconditional
jumps.

Why are you persisting at this point? I'm sure you knew precisely what
Serve meant, and it seems bizarre.

I'll just drop this; there's no point in discussing it further.
 

Thad Smith

Eric said:
Rui Maciel wrote on 02/09/07 14:47:
Ben said:
How about a switch statement or an array of function pointers?

As far as I can tell, using switch instead of the goto statement would
end up generating a nested mess inside continuous loops. [...]

If there's nothing going on during a state transition except
the transition itself, then switch-in-a-loop has no benefit that
I can see, aside from evading the wrath of the goto police. If
you just change `goto state42' to `state = 42', all you've done
is change the spelling.

However, lots of applications of state machines perform some
additional action at each transition: They consume the next input
character or emit the next output message or something of the
sort. In that setting, switch-in-a-loop is very attractive
because it provides convenient places for the "transitional"
actions, to wit, before and after the big switch block:

for (;;) {
    /* "prefix" actions ... */
    switch (state) {
    ...
    }
    /* "suffix" actions ... */
}

If both "prefix" and "suffix" are empty, though, there
seems little point to this structure.
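A minimal, compilable version of that shape, with the two states and the run-counting task invented purely for illustration, might look like:

```c
#include <stddef.h>

/* Two-state machine counting maximal runs of 'a' in a string.
 * The "prefix" action consumes the next input character; the
 * switch only handles the state-dependent logic. */
enum state { OUTSIDE, INSIDE };

size_t count_a_runs(const char *s)
{
    enum state state = OUTSIDE;
    size_t runs = 0;

    for (;;) {
        int c = (unsigned char)*s++;   /* "prefix" action: consume input */
        if (c == '\0')
            break;
        switch (state) {
        case OUTSIDE:
            if (c == 'a') {
                state = INSIDE;
                runs++;                /* action taken on this transition */
            }
            break;
        case INSIDE:
            if (c != 'a')
                state = OUTSIDE;
            break;
        }
        /* "suffix" actions would go here */
    }
    return runs;
}
```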

Very well put, Eric.

Often, for me, the state machine is a subprocess used within a program.
There may be a large processing loop for the program, one item of
which is to call a function to run a particular state machine. In that
case, the state is a static variable. The function does a switch which
processes the state and possibly updates the state, then exits.
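A sketch of that arrangement, with all names hypothetical: the program's main loop calls the step function once per pass, and the machine's state persists in a static variable between calls.

```c
/* One state machine step per call; the state survives between
 * calls as a static variable. Here the "machine" just toggles a
 * notional LED and reports the level it set. */
enum blink_state { LED_OFF, LED_ON };

int blink_step(void)
{
    static enum blink_state state = LED_OFF;

    switch (state) {
    case LED_OFF:
        state = LED_ON;    /* process the state, update it, exit */
        return 1;
    case LED_ON:
        state = LED_OFF;
        return 0;
    }
    return -1;             /* not reached */
}
```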
 

William Ahern

Correct. The C language standard itself creates this problem for,
really, no good reason. The simplest way to improve the C standard
for finite state machines is to extend either the continue or goto
statements to be able to go to a specific case label. In that way a
FSM can be implemented inside of a switch statement without paying the
exorbitant cost of rerunning a literal switch operation on every state
transition. This kind of thinking is basically beyond what the C
standards committee is capable of conceiving.

For a while I got into the practice of pairing switch cases with
equivalently named labels:

struct foo {
    struct { enum foo_states this, next; } state;
};

switch (foo->state.this) {
one:
    foo->state.this = one;
case one:
    ...
    foo->state.next = two;

    do_something_asynchronously(&fsm_callback);

    break;
two:
    foo->state.this = two;
case two:
    ...
    goto one;

    break;
}


Ultimately I stopped and fell back to just using switch. With network
server development most of the switch cases end up initiating some
asynchronous operation which will need to call back into the function, so
the optimization just clutters things up. Though, being able to do `goto
(state.this = two)' would be nice.

Tail call optimization would be even nicer. One of my I/O libraries
implements a type of trampoline, otherwise the stack grows too big. Though
in that case the C compiler would really need to be smart, since the
states are mutually recursive among many functions.
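For what it's worth, such a trampoline can be written in portable C without casting between function and object pointers, by wrapping the next-state pointer in a struct. Everything below is a hypothetical sketch, not William's library; the two states just count pairs of input characters:

```c
#include <stddef.h>

struct ctx { const char *in; int count; };

/* The classic "function returning its own type" problem is solved
 * by returning a one-member struct instead. */
struct next;
typedef struct next (*state_fn)(struct ctx *);
struct next { state_fn fn; };

struct next st_even(struct ctx *c);
struct next st_odd(struct ctx *c);

struct next st_even(struct ctx *c)
{
    if (*c->in == '\0')
        return (struct next){ NULL };  /* halt */
    c->in++;
    return (struct next){ st_odd };    /* "tail call" via the driver */
}

struct next st_odd(struct ctx *c)
{
    if (*c->in == '\0')
        return (struct next){ NULL };
    c->in++;
    c->count++;                        /* one action per pair of chars */
    return (struct next){ st_even };
}

int run(const char *s)
{
    struct ctx c = { s, 0 };
    struct next n = { st_even };
    while (n.fn != NULL)               /* the trampoline: stack stays flat */
        n = n.fn(&c);
    return c.count;
}
```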
 

William Ahern

On Sat, 10 Feb 2007 19:06:14 -0800, websnarf wrote:
That's actually a follow up involving Michael B. Allen. (It all
started when he made the ridiculous claim that the C std library could
not be beat in terms of performance on strings.)

This is the original thread that I was thinking of:

http://groups.google.com/group/comp.lang.c/browse_frm/thread/4fb3b05cfb3818d7/

Ridiculous? Michael's comments echo exactly my experience. I write network
servers day-in and day-out. The standard library functions work great
for strings, memcpy() in particular. snprintf() is also very useful, but
like Michael pointed out I often deal with string parsing in situ, using
FSMs. I've written stateful base64, base16, base32, and URL
percent decoders/encoders. Instead of allocating memory willy-nilly, as
much as possible I write all of my code to deal with "strings" in terms of
streams, not discrete objects. I've even written a streaming MIME parser.

Actually, if a committee wanted to standardize something they'd be better
off standardizing a vector structure (not unlike the bstring package, just
less distasteful). As was discussed in the above thread, the problem
with C strings is that everybody and their grandmother has invented some
kind of wrapper which does the same thing using different names.

Personally, of late I often make use of the iovec structure from POSIX's
sys/uio.h. If I were to revamp string handling in C I'd make two modifications.

1) Add an svec structure. struct svec { unsigned char *sv_base;
size_t sv_len; }.

2a) Mandate that (char) was unsigned (very radical), or
2b) Mandate that (char) and (unsigned char) were interchangeable without
maddening signedness compiler warnings in GCC-4.1, and add a new type
uchar (typedef unsigned char uchar), 'cause I'm lazy.

Then, if I wanted to push my luck, I'd deprecate all the wide-character
interfaces, and include a small suite of functions for manipulating UTF-8
encoded Unicode strings. I might start by adding a struct uvec, something
like: struct uvec { uchar *uv_base, [a bunch of opaque members,
noticeably and intentionally missing uv_len] }.
 

Christopher Layne

William said:
Ridiculous? Michael's comments echo exactly my experience. I write network
servers day-in and day-out. The standard library functions work great
for strings, memcpy() in particular. snprintf() is also very useful, but

I'd shorten that to say that the mem* stdlib functions work great. The str*
functions are just about useless for network code.
Personally, of late I often make use of the iovec structure from POSIX's
sys/uio.h. If I were to revamp string handling in C I'd make two
modifications.

1) Add an svec structure. struct svec { unsigned char *sv_base;
size_t sv_len; }.

2a) Mandate that (char) was unsigned (very radical), or
2b) Mandate that (char) and (unsigned char) were interchangeable without
maddening signedness compiler warnings in GCC-4.1, and add a new type
uchar (typedef unsigned char uchar), 'cause I'm lazy.

BTW: I have an iovec based buffer tokenizer that utilizes a source,
source_len, delimiter array, and optional character class flags (the standard
ctypes) if you're interested. Approx 3M iovec/sec tokenization on a P
(celeron) without modifying any of the source buffer. Basically, it's like
this:

const char *delim = " \t\n\r";
struct iovec v[16]; /* or whatever you feel is appropriate */
struct iovec *vp;
c = vtok(v, NE(v), buf, buf_len, delim, VT_SPACE | VT_CNTRL);
/* or */
c = vtoka(&vp, 0, buf, buf_len, delim, VT_SPACE | VT_CNTRL);

v[0..c - 1].iov_base = token
v[0..c - 1].iov_len = token_len

Returns c (count of vectors) when either len has been consumed, or max iovecs
have been used. Last iovec is the tail if len has been consumed.
Then, if I wanted to push my luck, I'd deprecate all the wide-character
interfaces, and include a small suite of functions for manipulating UTF-8
encoded Unicode strings. I might start by adding a struct uvec, something
like: struct uvec { uchar *uv_base, [a bunch of opaque members,
noticeably and intentionally missing uv_len] }.

Absolutely do NOT remove the length.
 

William Ahern

William Ahern wrote:

BTW: I have an iovec based buffer tokenizer that utilizes a source,
source_len, delimiter array, and optional character class flags (the
standard ctypes) if you're interested. Approx 3M iovec/sec tokenization
on a P (celeron) without modifying any of the source buffer. Basically,
it's like this:

I have something similar (almost exact, actually), which I call splitv()
(from the example split() function included in the strtok man page on many
Unices). But I never put in character class support; it just takes an
optional (1 << CHAR_BIT)-character map for selecting delimiters.

I'd still be interested to see yours. I had a flag in mine to return empty
fields, but it's broken, meaning something in my loop should be smarter.
I've never gone back to it because so far I've never wanted to have the
empty fields.
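A sketch of what such a delimiter-map tokenizer might look like; the names, signature, and slice type here are hypothetical stand-ins, not William's actual splitv(). The caller passes a (1 << CHAR_BIT)-byte map marking delimiter characters, and the function fills base/length pairs pointing into the unmodified source buffer:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

struct slice { const char *base; size_t len; };

/* Fill up to max slices with the tokens of src; a nonzero map entry
 * marks that byte value as a delimiter. Empty fields are skipped. */
size_t splitv(struct slice *v, size_t max,
              const char *src, size_t srclen,
              const unsigned char map[1 << CHAR_BIT])
{
    size_t n = 0, i = 0;

    while (i < srclen && n < max) {
        while (i < srclen && map[(unsigned char)src[i]])
            i++;                        /* skip leading delimiters */
        if (i == srclen)
            break;
        v[n].base = &src[i];
        while (i < srclen && !map[(unsigned char)src[i]])
            i++;                        /* scan the token body */
        v[n].len = (size_t)(&src[i] - v[n].base);
        n++;
    }
    return n;                           /* count of tokens found */
}

/* Convenience wrapper: split on spaces only. */
size_t split_spaces(struct slice *v, size_t max, const char *s)
{
    static unsigned char map[1 << CHAR_BIT];
    map[' '] = 1;
    return splitv(v, max, s, strlen(s), map);
}
```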
Then, if I wanted to push my luck, I'd deprecate all the wide-character
interfaces, and include a small suite of functions for manipulating UTF-8
encoded Unicode strings. I might start by adding a struct uvec, something
like: struct uvec { uchar *uv_base, [a bunch of opaque members,
noticeably and intentionally missing uv_len] }.

Absolutely do NOT remove the length.

Well, the notion is that, given the history of C strings, the terminology
is overloaded. Is the "length" the "sizeof" of the object, where the units
are C's notion of bytes, or are we talking about the number of graphemes,
etc.? I would remove something like uv_len and replace it with
two or more members or functions, each with more precise, less
ambiguous names. The problem with Unicode string processing is that it
violates many of the supposed intrinsic properties of strings that have
variously benefited and plagued programmers for years. Take the tokenizer
above, for instance. In written Thai there generally aren't any word or
symbol delimiters at all, just one long string of syllables. To
tokenize you actually need a language dictionary, and you parse from right
to left trying to form words; when the next syllable couldn't
possibly result in a legitimate word, you break. So with Thai it's an all-
or-nothing proposition; you either manipulate the text using a
sophisticated and capable interface, or you treat it as an opaque chunk of
memory. All this "wide-char" nonsense is completely and utterly useless.
We must permanently put to rest this practice in text processing
of conflating "characters" and "bytes" (indeed, at many levels remove the
notion of characters altogether; in one sense the usefulness rests solely
with parsing so-called human readable network protocols which employ
ASCII, but then that's not really "text").

Thus, this is also why I'd deprecate the wide-character support. It
fails to provide any sort of usefulness at the memory management level,
nor does it comprehensively solve the internationalization issue (heck,
technically it doesn't even allow you the privilege of conflating
characters and bytes if you so chose, which will always be useful
for historical and technical reasons, as mentioned above). Also, the
whole notion of environment locales is broken. But I digress. In any
event, I think the answer to both of those is to standardize on Unicode
strings, and a UTF-8 encoding specifically, so when you need to you can
fallback to more traditional string and memory management, while also
opening the door to sophisticated and comprehensive internationalized text
processing. This, I think, proceeds from and enhances the best qualities
of C. And if the idea of including such a large string interface with C were
too repugnant, I'd still implement everything else: rip out
wide characters, fix the char signedness headaches, etc. They stand on
their own merits.
 

Christopher Layne

William said:
I have something similar (almost exact, actually), which I call splitv()
(from the example split() function included in the strtok man page on many
Unices). But, I never put in character class support; it just takes an
optional (1 << CHAR_BIT)-character map for selecting delimiters.

I'd still be interested to see yours. I had a flag in mine to return empty
fields, but it's broken, meaning something in my loop should be smarter.
I've never gone back to it because so far I've never wanted to have the
empty fields.

Give me a chance to clean it up some and finish up some remaining
functionality. And I hear you on the empty field, delimiters only, etc.
business.
Well, the notion is that given the history of C strings, the terminology
is overloaded. Is the "length" the "sizeof" of the object, where the units
are C's notion of bytes, or are we talking about the number of graphemes,

Length being purely bytes.
It's a structure member that should always be accessible to facilitate any
additional external handling of data.
Thus, this is also why I'd deprecate the wide-character support. It
fails to provide any sort of usefulness at the memory management level,
nor does it comprehensively solve the internationalization issue (heck,
technically it doesn't even allow you the privilege of conflating
characters and bytes if you so chose, which will always be useful
for historical and technical reasons, as mentioned above). Also, the

Unicode, UTF-8, and UTF-16, and how they relate to C: I wonder if these
issues will honestly ever be solved.
 
