Rui said:
I've just added a small C++ implementation example for a table-based LL parser to wikipedia's
article on LL parsers. It would be great if the code was reviewed by more knowledgeable and
experienced people, as I've added a couple of crude hacks to it.
So, if anyone is willing to take a crack at it then please take a look at:
http://en.wikipedia.org/wiki/LL_parser#Parser_implementation_in_C.2B.2B
Thanks in advance,
Rui Maciel
Funny thing just happened: I tried to change your code to my taste and
the extra '1' rule in the result went away, so now the result is the
expected sequence of rules, 2133. I swear I did not even try to fix any
behavior -- don't even know what the issue was.
For the changes, my only goal was to make the code clearer to a
Wikipedia reader, even if I had to sacrifice some performance (which I
may or may not have done -- replacing std::map with 2-dimensional array
should give some wins and replacing switch with std::find may lose
some). The summary of the changes:
1. Eliminated magic numbers (I would even feel comfortable removing some
comments now).
2. Replaced std::map-based table with a 2-dimensional array as a more
natural representation of a table
3. Got rid of a switch in a lexer() (just to make it shorter); in real
parser I would use direct indexation by character value but I felt it
would be an unnecessary complication for illustrative code.
I tried to preserve the original indentation/braces style although it
was definitely not my favorite. Below is the code.
One last suggestion for the article: a reader may wonder what's so cool
about LL-grammars as opposed to LR (that maybe consider more powerful)
so it may be worth to mention the advantages. (If I understand it
correctly LL(1) parsers can be much faster in principle than LR).
Hope this will help,
-Pavel
#include <algorithm>
#include <cassert>
#include <iostream>
#include <stack>
enum GrammarElements {
L_PAREN, // '('
R_PAREN, // ')'
A_CHAR, // 'a'
PLUS, // '+'
END_OF_STREAM, // $, '\0' in our case
UNRECOGNIZED_TOKEN,
N_TERMINALS,
SUM_EXPR = N_TERMINALS, // S non-terminal
ADDEND, // F non-terminal
N_GRAMMAR_ELEMENTS
};
enum RuleId {
DEFAULTED,
ON_AUGEND, // a funny old name for a first summand
ON_OPENING_PAREN,
ON_ADDEND // a second summand..
};
// compiler initializes to DEFAULTED
static RuleId table[N_GRAMMAR_ELEMENTS][N_TERMINALS];
/**
Translates ASCII characters into terminal symbol reference numbers
**/
int lexer(char c)
{
static const char ABC[N_TERMINALS] = "()a+";
const char *res = std::find(ABC, ABC + N_TERMINALS, c);
return res == ABC + N_TERMINALS ? UNRECOGNIZED_TOKEN : res - ABC;
}
/**
The following program parses the text passed through the first command
line argument
**/
int main(int argc, char **argv)
{
using namespace std;
if (argc < 2)
{
cout << "usage: ll <symbols>" << endl;
return 0;
}
stack<GrammarElements> ss; // symbol stack
// initialize the stack
ss.push(END_OF_STREAM); // terminal, $
ss.push(SUM_EXPR); // non-terminal, S
// initializes the input stream
char *p = &argv[1][0]; // input buffer
// setup the parsing table for this grammar
table[SUM_EXPR][L_PAREN] = ON_OPENING_PAREN;
table[SUM_EXPR][A_CHAR] = ON_AUGEND;
table[ADDEND][A_CHAR] = ON_ADDEND;
while(!ss.empty())
{
if(lexer(*p) == ss.top())
{
// input symbol and top of the stack match, pop stacks
cout << "Matched " << lexer(*p) << endl;
p++;
ss.pop();
}
else
{
// input symbol and top of the stack don't match, execute rule
according to the parsing table
RuleId ruleId = table[ss.top()][lexer(*p)];
cout << "Rule " << ruleId << endl;
switch(ruleId)
{
case ON_AUGEND: // 1. S -> F
ss.pop();
ss.push(ADDEND); // F
break;
case ON_OPENING_PAREN: // 2. S -> ( S + F )
ss.pop();
ss.push(R_PAREN); // )
ss.push(ADDEND); // F
ss.push(PLUS); // +
ss.push(SUM_EXPR); // S
ss.push(L_PAREN); // )
break;
case ON_ADDEND: // 3. F -> a
ss.pop();
ss.push(A_CHAR); // a
break;
default:
assert(ruleId == DEFAULTED);
cout << "parser table defaulted" << endl;
return 0;
}
}
}
cout << "finished parsing" << endl;
return 0;
}