Command language parsing - how formal to get?

Chris Carlen · Aug 10, 2007

Hi:

Having completed enough serial driver code for a TMS320F2812
microcontroller to talk to a terminal, I am now trying different
approaches to command interpretation.

I have a very simple command set consisting of several single letter
commands which take no arguments. A few additional single letter
commands take arguments:

Command language (somewhat generalized):

command argument(s) description

a do action_a
b do action_b
c do action_c
...
n do action_n
...
y [var_name] display variable(s)
z var_name value set variable to value

where var_name is a variable identifier. The 'd' command lets one
display the value of a variable identified by var_name. If no var_name
is provide, this command will display all variables. The 'e' command
lets one set the value of the variable identified by var_name to value.

Presently I am using the simplest yet least flexible approach to parse
commands. I scan and lexically analyze on the fly, and use if, else if,
else logic blocks to parse. (I know, pathetic.) This approach requires
one if, else if, else block for each field of the command language.

Next I plan to split the lexical analysis to a first-pass, and create a
list of tokens, with additional processing such as converting keywords
and variable names to keys, and numeric text fields to machine numbers.
This will eliminate the scanning/lexing on the fly to make the
parsing code much more readable.

Then a second pass of parsing could still be done using if, else if,
else it seems in the case that a language has a fixed and relatively
small maximum number of fields per command.

The question is then, for a language similar to the above and
considering these factors:

1. need to implement in an embedded microcontroller in not more than a
few K of program memory
2. not a particularly high speed of parsing is required. Ccommands will
be issued intermittently over a slow interface of perhaps 115.2kbps.
3. ease of understanding and adapting the code to changes in the command
set is very valuable. Most likely additions would be a command to tell
the thing to accept a binary data stream representing a file/table to be
stored in the uC RAM.

Is it common to hardcode parsing with if, else if, else blocks or at
what point does one decide to use more generalized techniques such as
formalizing the language description in BNR form, and using lexer/parser
generation tools such as lex/yacc ?

I have always found the topic of parsers/interpreters/compilers
fascinating, and would like to spend considerable time learning more
about this. Yet I also have to "just get the thing to work." I feel
like I will ultimately need to perform quite a few experiments with
different parsing techniques to get a feel for the best method to choose
for this and future applications.

Thanks for input.

--
Good day!

________________________________________
Christopher R. Carlen
Principal Laser&Electronics Technologist
Sandia National Laboratories CA USA
(e-mail address removed)
NOTE, delete texts: "RemoveThis" and
"BOGUS" from email address to reply.

Jonathan Kirwan · Aug 10, 2007

Having completed enough serial driver code for a TMS320F2812
microcontroller to talk to a terminal, I am now trying different
approaches to command interpretation.

I have a very simple command set consisting of several single letter
commands which take no arguments. A few additional single letter
commands take arguments:

Command language (somewhat generalized):

command argument(s) description

a do action_a
b do action_b
c do action_c
...
n do action_n
...
y [var_name] display variable(s)
z var_name value set variable to value

where var_name is a variable identifier. The 'd' command lets one
display the value of a variable identified by var_name. If no var_name
is provide, this command will display all variables. The 'e' command
lets one set the value of the variable identified by var_name to value.

Presently I am using the simplest yet least flexible approach to parse
commands. I scan and lexically analyze on the fly, and use if, else if,
else logic blocks to parse. (I know, pathetic.) This approach requires
one if, else if, else block for each field of the command language.

Next I plan to split the lexical analysis to a first-pass, and create a
list of tokens, with additional processing such as converting keywords
and variable names to keys, and numeric text fields to machine numbers.
This will eliminate the scanning/lexing on the fly to make the
parsing code much more readable.

Then a second pass of parsing could still be done using if, else if,
else it seems in the case that a language has a fixed and relatively
small maximum number of fields per command.

The question is then, for a language similar to the above and
considering these factors:

1. need to implement in an embedded microcontroller in not more than a
few K of program memory
2. not a particularly high speed of parsing is required. Ccommands will
be issued intermittently over a slow interface of perhaps 115.2kbps.
3. ease of understanding and adapting the code to changes in the command
set is very valuable. Most likely additions would be a command to tell
the thing to accept a binary data stream representing a file/table to be
stored in the uC RAM.

Is it common to hardcode parsing with if, else if, else blocks or at
what point does one decide to use more generalized techniques such as
formalizing the language description in BNR form, and using lexer/parser
generation tools such as lex/yacc ?

I have always found the topic of parsers/interpreters/compilers
fascinating, and would like to spend considerable time learning more
about this. Yet I also have to "just get the thing to work." I feel
like I will ultimately need to perform quite a few experiments with
different parsing techniques to get a feel for the best method to choose
for this and future applications.

Thanks for input.

Hi, Chris. Just a few comments.

(1)
I wouldn't design the commands and queries as you did. I like the
recommendations of IEEE-488-2. One of them is to use a ? for queries
and to reply to such queries with a command that can also be parsed.

For example, in your above list, you list 'y' and 'z' for reading and
setting variable values. The recommended procedure would be to use
the same text for both actions, but in this way:

var? foo

which would cause the following reply:

var foo 5.3

and you'd say:

var foo 7

to change it to 7.

The query form is just a ? added to the end of a command form, in
other words.

It also recommends that you implement a single, simple command that
dumps out the current state. That state should be listed out in such
a way that if the receiver of it simply sent the entire body straight
back to your device, it would have the effect of restoring the state.
So if you had three variables that represented all of the important
state in your device, call them S1, S2, and S3, then someone might
issue the following query:

state?

and get the following response:

var S1 4.1
var S2 6.5
var S3 -1.2

and that if the receiver then fired those straight back in, the state
(no matter what it was before) would be restored to those values.

(2)
Here's an option to consider:

for ( ; ; ) {
auto void (*func)( const char * cmd );
auto int status;
static char commandname[MAXCOMMANDNAME];
skipwhitespace( );
parsecmd( commandname, MAXCOMMANDNAME );
if ( (func= commandsearch( commandname )) != 0 )
status= (*func)( command );
else
status= NOSUCHCOMMAND;
if ( status != SUCCESS ) {
skipremainderofline( );
/* print some kind of parsing error here */
}
}

Something alone these lines. The 'parsecmd' function might use the
if..else structure you already have. I tend to use a table that lists
the text of each command/query along with a function pointer to the
function to call for the rest of the parsing and execution. There
have been times in my life when, for legacy reasons, I've wanted to
have several command/query names for the same effect, for example, and
this makes that pretty easy to do.

Anyway, just a thought.

(3)
I don't know if you intend to do other things, while parsing. In most
stuff I do, it's essential that I don't get stuck, nested down in some
parsing code, when there is other work that also needs monitoring. For
example, I may need to take action on the basis of a push button that
might also be pressed at about the same time I'm down parsing serial
port text. And I can't afford to sit around waiting for the next
character of a command, with the code buried in the middle of some
if..else, while there is a keypress that is also demanding attention
right now.

So I almost always include something along the lines of an operating
system, here. In many cases, this might be nothing more than just
implementing cooperating task switches with a switchaway() function.
In other words, if I'm in the middle of the serial port code, buried
down inside an if..else, and I'm now calling the serial port code to
"get the next character" and it responds that there are no more
characters just yet... then I call this switchaway() function. When
it returns, I'm allowed some more time so I re-call the serial port
code for that 'next character' thing. If it is still empty, I recall
the switchaway() function, again. And so on. That way, I don't lose
my place in the middle of my parsing code but at the same time I can
pass along the cpu to other functions so they can get their work done.

It doesn't take more than a few lines of assembly code to implement
such a function, by the way. Most compilers set aside only a few
registers that need to be saved across a function call like that, so
it only needs to push a few registers, switch to a new stack pointer,
and then pop the same registers and return. That's it. So it isn't
hard and it helps keep your code clean. Otherwise, you'd have to
somehow save the state of your parsing function, return from it so
other work can get done, then somehow rewind yourself back into some
nasty position the next time you get called at the top of it. Or just
sit there and ignore everything else while you wait for the next
serial port character to arrive.

Best of luck,
Jon

Robert Adsett · Aug 10, 2007

Hi:

Having completed enough serial driver code for a TMS320F2812
microcontroller to talk to a terminal, I am now trying different
approaches to command interpretation.

I have a very simple command set consisting of several single letter
commands which take no arguments. A few additional single letter
commands take arguments:

For a simple human (as opposed to machine) interface I've often used a
simple lookup table composed of command name/function pointer pairs.
Simple brute force lookup, and you can add a simple help string for each
command. The functions themselves were responsible for parsing any
arguments.

Simple but limited.

Robert

Chris Carlen · Aug 10, 2007

Jonathan said:
>>[edit]
Command language (somewhat generalized):

command argument(s) description

a do action_a
b do action_b
c do action_c
...
n do action_n
...
y [var_name] display variable(s)
z var_name value set variable to value
>>[edit]
The question is then, for a language similar to the above and
considering these factors:

1. need to implement in an embedded microcontroller in not more than a
few K of program memory
2. not a particularly high speed of parsing is required. Ccommands will
be issued intermittently over a slow interface of perhaps 115.2kbps.
3. ease of understanding and adapting the code to changes in the command
set is very valuable. Most likely additions would be a command to tell
the thing to accept a binary data stream representing a file/table to be
stored in the uC RAM.
[edit]

Click to expand...

>
Hi, Chris. Just a few comments.

(1)
I wouldn't design the commands and queries as you did. I like the
recommendations of IEEE-488-2. One of them is to use a ? for queries
and to reply to such queries with a command that can also be parsed.

For example, in your above list, you list 'y' and 'z' for reading and
setting variable values. The recommended procedure would be to use
the same text for both actions, but in this way:

var? foo

which would cause the following reply:

var foo 5.3

and you'd say:

var foo 7

to change it to 7.

The query form is just a ? added to the end of a command form, in
other words.

Thanks for the input.

Interesting suggestion. I suppose since I have 0 experience with
designing command languages, most suggestinos will seem interesting

Actually, I have taken inspiration for this command set from DOS debug.
I'm also writing a hex line editor in Python (as a Python learning
exercise) which also uses this single letter business.

I just downloaded IEEE-488.2 Good grief! I have often shyed away from
standards since they are always long and agonizing to read. Would take
months just to get familiar with this. So I have no interest in making
the gadget fully 488 compliant. It doesn't appear that you were
implying that I should.

I had thought about becoming familiar with SCPI to see if it would be
applicable to this project as well. It just all seems way too complicated.

It also recommends that you implement a single, simple command that
dumps out the current state. That state should be listed out in such
a way that if the receiver of it simply sent the entire body straight
back to your device, it would have the effect of restoring the state.
So if you had three variables that represented all of the important
state in your device, call them S1, S2, and S3, then someone might
issue the following query:

state?

and get the following response:

var S1 4.1
var S2 6.5
var S3 -1.2

and that if the receiver then fired those straight back in, the state
(no matter what it was before) would be restored to those values.

This is pretty cool.

(2)
Here's an option to consider:

for ( ; ; ) {
auto void (*func)( const char * cmd );
auto int status;
static char commandname[MAXCOMMANDNAME];

commandname will get filled with a command keyword by parsecmd()?

skipwhitespace( );
parsecmd( commandname, MAXCOMMANDNAME );

parsecmd() is basically a lexer which extracts the next word ?

if ( (func= commandsearch( commandname )) != 0 )
status= (*func)( command );

Lookup and execute a function to perform the action of that command.

else
status= NOSUCHCOMMAND;

if ( status != SUCCESS ) {
skipremainderofline( );
/* print some kind of parsing error here */
}
}
Something alone these lines. The 'parsecmd' function might use the
if..else structure you already have. I tend to use a table that lists
the text of each command/query along with a function pointer to the
function to call for the rest of the parsing and execution. There
have been times in my life when, for legacy reasons, I've wanted to
have several command/query names for the same effect, for example, and
this makes that pretty easy to do.
Anyway, just a thought.

It will take considerable thought. I sort of get what your code does.
Just a little vague. I think I can come up with lots of ideas here, though.

(3)
I don't know if you intend to do other things, while parsing. In most
stuff I do, it's essential that I don't get stuck, nested down in some
parsing code, when there is other work that also needs monitoring. For
example, I may need to take action on the basis of a push button that
might also be pressed at about the same time I'm down parsing serial
port text. And I can't afford to sit around waiting for the next
character of a command, with the code buried in the middle of some
if..else, while there is a keypress that is also demanding attention
right now.

So I almost always include something along the lines of an operating
system, here. In many cases, this might be nothing more than just
implementing cooperating task switches with a switchaway() function.
In other words, if I'm in the middle of the serial port code, buried
down inside an if..else, and I'm now calling the serial port code to
"get the next character" and it responds that there are no more
characters just yet... then I call this switchaway() function. When
it returns, I'm allowed some more time so I re-call the serial port
code for that 'next character' thing. If it is still empty, I recall
the switchaway() function, again. And so on. That way, I don't lose
my place in the middle of my parsing code but at the same time I can
pass along the cpu to other functions so they can get their work done.

It doesn't take more than a few lines of assembly code to implement
such a function, by the way. Most compilers set aside only a few
registers that need to be saved across a function call like that, so
it only needs to push a few registers, switch to a new stack pointer,
and then pop the same registers and return. That's it. So it isn't
hard and it helps keep your code clean. Otherwise, you'd have to
somehow save the state of your parsing function, return from it so
other work can get done, then somehow rewind yourself back into some
nasty position the next time you get called at the top of it. Or just
sit there and ignore everything else while you wait for the next
serial port character to arrive.

I have two critical threads on my gadget, and a non-critical thread
which is the serial interface. The top #1 priority is a
non-preemptable, timer interrupt driven periodic thread. It is
guaranteed to finish in some finite maximum time, but it's period can
vary according to parameters which can be influenced by the user
interface. Basically, it's a waveform synthesizer with variable
clocking period. It's got a max clocking freq. of about

Next is #2, a preemptable (by the top thread) periodic thread which is
also timer interrupt driven. This second one has a brief assembly intro
which reenables interrupts ASAP so that it can't block #1 for more than
a few cycles. This is the hardware user interface (panel buttons)
polling thread. It can tolerate being chopped up by #1. It also has a
finite maximum exec time.

Thus, the serial code can block for as long as it likes. The only thing
that matters is that it doesn't block itself long enough to fail to
clear out the serial receive FIFO before the FIFO overflows.

I plan to use the maximum exec times for all the stuff and their
periodicities to compute the maximum baud rate at which the device can
receive while being certain never to miss characters. At least I think
this can be determined.

So that's my OS. With only a few threads, it's easy to just let timers
trigger the important ones, and round-robin the rest.

--
Good day!

________________________________________
Christopher R. Carlen
Principal Laser&Electronics Technologist
Sandia National Laboratories CA USA
(e-mail address removed)
NOTE, delete texts: "RemoveThis" and
"BOGUS" from email address to reply.

Gene · Aug 11, 2007

For a simple human (as opposed to machine) interface I've often used a
simple lookup table composed of command name/function pointer pairs.
Simple brute force lookup, and you can add a simple help string for each
command. The functions themselves were responsible for parsing any
arguments.

Simple but limited.

Don't apologize! Table-driven methods are nearly always easier to
understand and maintain than equivalent "hard code." And they avoid
redundancy (such as the comparisons in an elsif chain) that you can't
afford in a scarce memory environment. Thus all the old assemblers
and BIOS interfaces are implemented that way. Even if you choose a
more sophisticated command parsing method than table search, table-
oriented implementations of a DFAs and parsers are among the most
compact, and yet they are quite fast.

Steve at fivetrees · Aug 11, 2007

Robert Adsett said:
For a simple human (as opposed to machine) interface I've often used a
simple lookup table composed of command name/function pointer pairs.
Simple brute force lookup, and you can add a simple help string for
each
command. The functions themselves were responsible for parsing any
arguments.

Simple but limited.

Instead of a function pointer, think in terms of an enum of commands, a
structure (class, if you prefer) defining all the properties of the
command, and a table of such (constant) structures. One structure
element could indeed be a handler function, called with an enum reason
value, decoded via a switch within the handler (query vs command etc).
Others could define the number of operands and their types (handlers)
etc...

As for keeping state while handling other tasks: JK made it sound
complex

. A state machine makes it trivial to pick up the position
when a character comes in, and parse the complete line on CR/LF (etc).

Steve
http://www.fivetrees.com

Jonathan Kirwan · Aug 11, 2007

<snip>
As for keeping state while handling other tasks: JK made it sound
complex . A state machine makes it trivial to pick up the position
when a character comes in, and parse the complete line on CR/LF (etc).

On this point, I agree that state machines aren't hard to do when you
have a complete specification. They can be rather hard to document
inside the code, though.

As an example where I am really glad to use state machines would be
for driving things where I have clock edges to drive in software with
special timing of the signals relative to the clock edges. Especially
in cases like this because the hardware is usually well specified and
not likely to change in the future -- except in very predictable ways
(increased addressing width in a serial protocol, for example.)

In parsing cases, it is much more natural to use a recursive descent
approach and the co-routine/cooperative switch mechanism fits the
paradigm much better in my opinion. You often go around changing
parsing in ways that would make maintaining a state machine a bit
tricky. This is one reason why there are compiler-compilers, in fact,
to automate the construction of state machines for parsing. If you
are doing the parsing and execution coding by hand, though, and not
with some tool to automate the generation of a state machine, I would
rather use a cooperative hand-off mechanism to keep the parsing code
simple and straightforward.

Jon

Mark Borgerson · Aug 11, 2007

Instead of a function pointer, think in terms of an enum of commands, a
structure (class, if you prefer) defining all the properties of the
command, and a table of such (constant) structures. One structure
element could indeed be a handler function, called with an enum reason
value, decoded via a switch within the handler (query vs command etc).
Others could define the number of operands and their types (handlers)
etc...

As for keeping state while handling other tasks: JK made it sound
complex . A state machine makes it trivial to pick up the position
when a character comes in, and parse the complete line on CR/LF (etc).

I find it useful to buffer the input line, then activate the parser
when the CR/LF is received. This allows you to support the backspace
key for users who are not perfect typists.

I used a structure to hold the command characters (I used 2-letter
commands), a pointer to the command function, and the number and
type of parameters.

Mark Borgerson

CC · Aug 11, 2007

Mark said:
I find it useful to buffer the input line, then activate the parser
when the CR/LF is received. This allows you to support the backspace
key for users who are not perfect typists.

I used a structure to hold the command characters (I used 2-letter
commands), a pointer to the command function, and the number and
type of parameters.

Mark Borgerson

Yeah. I have made a function:

int SCI_gets(char *str, int count);

which behaves much like fgets(), except that a global enum:

typedef enum {NO_ECHO=0, ECHO=1} SCIgetsEchoModes;
static SCIgetsEchoModes Gets_echo_mode = ECHO;

control whether it echos. It also handles backspace, and interprets any
of '\n', '\r', or '\0' as EOL.

It will return a SCI read error passed up from the underlying
SCI_getc(), or an EOF if nothing was available when first called. It
blocks however, once one char has been read. It also echos non-printing
characters as '?' with red coloring on ANSI compliant terminals. Just
plain '?' if a #define is set to NON_ANSI.

So I plan to parse only buffered, terminated strings.

Robert Adsett · Aug 11, 2007

Gene said:
Don't apologize! Table-driven methods are nearly always easier to
understand and maintain than equivalent "hard code." And they avoid
redundancy (such as the comparisons in an elsif chain) that you can't
afford in a scarce memory environment. Thus all the old assemblers
and BIOS interfaces are implemented that way. Even if you choose a
more sophisticated command parsing method than table search, table-
oriented implementations of a DFAs and parsers are among the most
compact, and yet they are quite fast.

I'm sorry I didn't mean to apologize

Really, I was just mentioning a simple, quick method. I was under the
impression Chris was really looking for something with a little more
sophisticated syntax checking than a strcmp loop.

Speaking of which a friend of mine implemented a simple command
interpreter on a PDP (actually a clone I think) in fortran that used two
characters in the command. The characters were in a 6 bit encoding and
two of them formed a 12 bit word that was used in a command lookup.
Apparently you can get quite fast with two character commands if you
don't have to worry about things like whitespace and other syntactic
fluff.

Robert

John Larkin · Aug 11, 2007

Hi:

Having completed enough serial driver code for a TMS320F2812
microcontroller to talk to a terminal, I am now trying different
approaches to command interpretation.

I have a very simple command set consisting of several single letter
commands which take no arguments. A few additional single letter
commands take arguments:

Command language (somewhat generalized):

command argument(s) description

a do action_a
b do action_b
c do action_c
...
n do action_n
...
y [var_name] display variable(s)
z var_name value set variable to value

where var_name is a variable identifier. The 'd' command lets one
display the value of a variable identified by var_name. If no var_name
is provide, this command will display all variables. The 'e' command
lets one set the value of the variable identified by var_name to value.

Presently I am using the simplest yet least flexible approach to parse
commands. I scan and lexically analyze on the fly, and use if, else if,
else logic blocks to parse. (I know, pathetic.) This approach requires
one if, else if, else block for each field of the command language.

Next I plan to split the lexical analysis to a first-pass, and create a
list of tokens, with additional processing such as converting keywords
and variable names to keys, and numeric text fields to machine numbers.
This will eliminate the scanning/lexing on the fly to make the
parsing code much more readable.

Then a second pass of parsing could still be done using if, else if,
else it seems in the case that a language has a fixed and relatively
small maximum number of fields per command.

The question is then, for a language similar to the above and
considering these factors:

1. need to implement in an embedded microcontroller in not more than a
few K of program memory
2. not a particularly high speed of parsing is required. Ccommands will
be issued intermittently over a slow interface of perhaps 115.2kbps.
3. ease of understanding and adapting the code to changes in the command
set is very valuable. Most likely additions would be a command to tell
the thing to accept a binary data stream representing a file/table to be
stored in the uC RAM.

Is it common to hardcode parsing with if, else if, else blocks or at
what point does one decide to use more generalized techniques such as
formalizing the language description in BNR form, and using lexer/parser
generation tools such as lex/yacc ?

I have always found the topic of parsers/interpreters/compilers
fascinating, and would like to spend considerable time learning more
about this. Yet I also have to "just get the thing to work." I feel
like I will ultimately need to perform quite a few experiments with
different parsing techniques to get a feel for the best method to choose
for this and future applications.

Thanks for input.

It's cleaner to do a table-driven thing, rather than a bunch of case
statements. With a single letter per command, you can just index to 26
small table entries, each with a handler address, maybe a pointer to a
variable, and maybe a flag word.

I recently did a parser with command lines like...

ADelay 12.5n; TRigger REmote; FIre <cr>

where each command is a 2-char token (more chars are allowed but
ignored), a token plus an arg, or two tokens. All table driven, with
maybe 140 commands. The parser proper is about 100 lines of assembly
code. It just packs the command chars into a longword as "AD " or
"TRRE" or "FI ", looks it up in the table, and dispatches to the
named handler.

XTABLE:

; TOKEN1 TOKEN2 HANDLER PARAM

.WORD "US", 0, XLONG, FUSE ; USec timer
.WORD "UW", 0, QWORD, FUSE+2 ; USec timer, as a word!
.WORD "WA", 0, XWAIT, 0 ; WAit nnn microseconds
.WORD "FI", 0, XFIRE, FIRE ; FIre
.WORD "GA", "FI", XGFIR, FBFIRE ; GAte FIre
.WORD "FE", 0, XFEOD, FEOD ; FEod
.WORD "CO", 0, XOK, 0 ; COmment!

.WORD "AD", 0, XTSET, CCBA+CDHI ; ADelay 123.45N
.WORD "AW", 0, XTSET, CCBA+CWHI ; AWidth 123.45N
.WORD "AS", "ON", XSON, PCBA ; ASet ON
.WORD "AS", "OF", XSOF, PCBA ; ASet OFf
.WORD "AS", "PO", XSPOS, PCBA ; ASet POs
.WORD "AS", "NE", XSNEG, PCBA ; ASet NEg
.WORD "AS", 0, XAINQ, CCBA ; ASet inquiry
.WORD "AP", 0, XAINQ, PCBA ; APend pending inquiry

John

Andrew Smallshaw · Aug 11, 2007

For a simple human (as opposed to machine) interface I've often used a
simple lookup table composed of command name/function pointer pairs.
Simple brute force lookup, and you can add a simple help string for each
command. The functions themselves were responsible for parsing any
arguments.

A slightly more sophisticated approach is to ensure that the lookup
table is sorted and apply a binary search on the table. On each
iteration test the mid-point of the range between a known too low
and too high value (these begin set to 0 and the last entry in the
table). I've used this approach in the past and in C it amounts
to maybe 3 or 4 additional lines of code. ISTR one of the examples
in K&R uses this approach somewhere if you want a more concrete
example.

Ark Khasin · Aug 11, 2007

Chris said:
Hi:

Having completed enough serial driver code for a TMS320F2812
microcontroller to talk to a terminal, I am now trying different
approaches to command interpretation.

I have a very simple command set consisting of several single letter
commands which take no arguments. A few additional single letter
commands take arguments:

Command language (somewhat generalized):

command argument(s) description

a do action_a
b do action_b
c do action_c
...
n do action_n
...
y [var_name] display variable(s)
z var_name value set variable to value

where var_name is a variable identifier. The 'd' command lets one
display the value of a variable identified by var_name. If no var_name
is provide, this command will display all variables. The 'e' command
lets one set the value of the variable identified by var_name to value.

Presently I am using the simplest yet least flexible approach to parse
commands. I scan and lexically analyze on the fly, and use if, else if,
else logic blocks to parse. (I know, pathetic.) This approach requires
one if, else if, else block for each field of the command language.

Next I plan to split the lexical analysis to a first-pass, and create a
list of tokens, with additional processing such as converting keywords
and variable names to keys, and numeric text fields to machine numbers.
This will eliminate the scanning/lexing on the fly to make the
parsing code much more readable.

Then a second pass of parsing could still be done using if, else if,
else it seems in the case that a language has a fixed and relatively
small maximum number of fields per command.

The question is then, for a language similar to the above and
considering these factors:

1. need to implement in an embedded microcontroller in not more than a
few K of program memory
2. not a particularly high speed of parsing is required. Ccommands will
be issued intermittently over a slow interface of perhaps 115.2kbps.
3. ease of understanding and adapting the code to changes in the command
set is very valuable. Most likely additions would be a command to tell
the thing to accept a binary data stream representing a file/table to be
stored in the uC RAM.

Is it common to hardcode parsing with if, else if, else blocks or at
what point does one decide to use more generalized techniques such as
formalizing the language description in BNR form, and using lexer/parser
generation tools such as lex/yacc ?

I have always found the topic of parsers/interpreters/compilers
fascinating, and would like to spend considerable time learning more
about this. Yet I also have to "just get the thing to work." I feel
like I will ultimately need to perform quite a few experiments with
different parsing techniques to get a feel for the best method to choose
for this and future applications.

Thanks for input.

Lookup table would be by far the simplest and fastest. For
maintainability, you can exploit designated initializers for your
constant lookup table (the [8]=2007 stuff) if you have C99.
If you worry about the size of the table, in many cases it can be
compressed, but for that you'd need an external preprocessor or another
code (actually, data) generator. That, along with compression, will
automate the maintenance.

You may want to take a look at Unimal
(http://www.macroexpressions.com/unimal.html) and in particular App Note
4 there. Sorry for the shameless self-promotion but yours is the sort of
a problem that brought Unimal into existence.

-- Ark

Paul Keinanen · Aug 12, 2007

A slightly more sophisticated approach is to ensure that the lookup
table is sorted and apply a binary search on the table. On each
iteration test the mid-point of the range between a known too low
and too high value (these begin set to 0 and the last entry in the
table). I've used this approach in the past and in C it amounts
to maybe 3 or 4 additional lines of code. ISTR one of the examples
in K&R uses this approach somewhere if you want a more concrete
example.

Unless you are using some Zuse style relay computers, I really do not
see the point of optimizing the command line _human_ interface for
speed. The user would not notice the difference anyway.

Paul

CC · Aug 12, 2007

Paul said:
Unless you are using some Zuse style relay computers, I really do not
see the point of optimizing the command line _human_ interface for
speed. The user would not notice the difference anyway.

In this case, the interface will also be machine driven, though the
serial link limits the speed such that I can't imagine any parser
implementation that wouldn't keep up. So the simplest approach that can
do the job yet be flexible for possible modifications will suffice. The
fastest it ever might have to go is perhaps 1Mbps or so, if we move to a
FTDI based USB connection.

The thing that's really hard to foresee at this point is how the command
set might evolve.

Good day!

Command Line Arguments	0	Mar 7, 2023
How to get started with C programming:	1	Aug 7, 2023
Can't decide which language to get back into programming with	1	Mar 28, 2023
What programming language to choose?	4	Jul 3, 2022
How do i get this Curl command to work on Linux	2	Dec 19, 2020
C language. work with text	3	Dec 10, 2021
Windows command line to python	0	Sep 29, 2021
Bash: time command on PNF	0	Jan 8, 2023

Command language parsing - how formal to get?

Chris Carlen

Jonathan Kirwan

Robert Adsett

Chris Carlen

Gene

Steve at fivetrees

Jonathan Kirwan

Mark Borgerson

CC

Robert Adsett

John Larkin

Andrew Smallshaw

Ark Khasin

Paul Keinanen

CC

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads