Rudra Banerjee said:
I will be grateful if someone show me a sample code so that I
can build on that and come back if I face any problem.
To write a parser for a language, one does not want to use
examples of that language, but a grammar for that language.
Give a C programmer a grammar and some money and he'll
happily write a parser for you. As for an example:
(When replying to the following post, please do not quote
all of it, but only the few lines you refer to directly.)
In order to interpret or translate an expression (term), it is
decomposed into lexical units (tokens, words), which then are
used by a parser to build symbols and a structured
representation of the input. This representation then might be
evaluated or translated into some other representation.
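The decomposition into tokens might be sketched as follows.
This is only an illustration: the names token_kind, token and
next_token are made up for this sketch, and it accepts numbers
with several digits although the grammars further down use
single digits only.

```c
#include <ctype.h> /* isdigit, isspace */

/* hypothetical token kinds, invented for this sketch */
enum token_kind { TOKEN_NUMBER, TOKEN_OPERATOR, TOKEN_END };

/* value holds the number for TOKEN_NUMBER and the character
   for TOKEN_OPERATOR */
struct token { enum token_kind kind; int value; };

/* reads one token starting at *cursor and advances *cursor
   past it */
struct token next_token( char const ** const cursor )
{ struct token result = { TOKEN_END, 0 };
  while( isspace( ( unsigned char )**cursor ))++*cursor;
  if( isdigit( ( unsigned char )**cursor ))
  { result.kind = TOKEN_NUMBER;
    while( isdigit( ( unsigned char )**cursor ))
      result.value = result.value * 10 +( *( *cursor )++ - '0' ); }
  else if( **cursor )
  { result.kind = TOKEN_OPERATOR;
    result.value = *( *cursor )++; }
  return result; }
```

Calling next_token repeatedly on "12+4" yields the number 12,
the operator '+', the number 4, and then the end token.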
The syntactical structure mirrors the rules for constructing
an expression, which are often given as so-called
"productions" in EBNF (extended Backus-Naur form) and which
are sometimes left-recursive.
When writing a parser, the left-recursive productions sometimes
are a worry to the author, because it is not obvious how to
avoid an infinite recursion. The solution is to rewrite them as
right-recursive productions.
Addition with a binary infix operator, for example, is
left-associative. However, it is simpler to analyze in a
right-associative manner. Therefore, one analyzes the source
using right-associative rules and then creates the result
using a left-associative interpretation.
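With addition alone the difference cannot be observed, because
both groupings give the same value. The following sketch
therefore assumes a '-' operator on single digits, which is not
part of the grammars in this post; the function names
difference_right and difference_left are made up for the
illustration.

```c
/* evaluates the structure of a right-recursive rule directly:
   "8-3-2" is treated as 8-(3-2) */
int difference_right( char const * const s )
{ int const left = s[ 0 ] - '0';
  return '-' == s[ 1 ] ? left - difference_right( s + 2 ) : left; }

/* reads the same input from left to right, but folds every
   operand into the running result: "8-3-2" is treated
   as (8-3)-2 */
int difference_left( char const * s )
{ int result = *s++ - '0';
  while( '-' == *s ){ ++s; result -= *s++ - '0'; }
  return result; }
```

For the input "8-3-2", difference_right gives 7, while
difference_left gives 3, which is the usual meaning.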
A left-recursive grammar, which describes left-associative
addition, might be, for example, as follows.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral> | <expression> '+' <numeral>.
start symbol: <expression>.
To analyze this using a recursive descent parser, one
prefers to use the following grammar.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>[ '+' <expression> ].
start symbol: <expression>.
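A direct transcription of this right-recursive grammar into C
might look like the following sketch. Passing the input
position as a parameter named cursor is an assumption of this
sketch, and the numeral is not checked against the three
allowed digits.

```c
/* <numeral> ::= '2' | '4' | '5'. (not validated here) */
int numeral( char const ** const cursor )
{ return *( *cursor )++ - '0'; }

/* <expression> ::= <numeral>[ '+' <expression> ]. */
int expression( char const ** const cursor )
{ int const left = numeral( cursor );
  if( '+' == **cursor )
  { ++*cursor; /* consume the '+' */
    return left + expression( cursor ); }
  return left; }
```

Each call consumes one numeral before it recurses, so the
recursion ends when the input does. For "2+4+5)" the value is
computed as 2+(4+5), and the position is left at the ")".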
The same grammar can be written using iteration as follows.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>{ '+' <numeral> }.
start symbol: <expression>.
However, the result is computed in the sense of the first
grammar, that is, left-associatively. Example code follows.
#include <stdio.h> /* printf */
/* scanner: returns the next character and advances */
static inline char get( void )
{ static char const * const source = "2+4+5)";
  static int pos = 0;
  return source[ pos++ ]; }
/* parser */
static inline int numeral( void ){ return get() - '0'; }
/* <expression> ::= <numeral>{ '+' <numeral> }; the loop ends
   at the first character that is not '+', here ')' */
static int sum( void ){ int result = numeral();
  while( '+' == get() )result += numeral();
  return result; }
/* main */
int main( void ){ printf( "sum = %d\n", sum() ); }
To be able to parse operators with a higher priority,
such as multiplication, the grammar can be extended.
<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral> | <product> '*' <numeral>.
<sum> ::= <product> | <sum> '+' <product>.
start symbol: <sum>.
In iterative notation:
<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral>{ '*' <numeral> }.
<sum> ::= <product>{ '+' <product> }.
start symbol: <sum>.
In C:
#include <stdio.h> /* printf */
/* scanner: returns the current character, then advances by
   move; get( 0 ) peeks, get( 1 ) consumes */
static inline char get( int const move )
{ static char const * const source = "2+4*5)";
  static int pos = 0;
  char const c = source[ pos ];
  pos += move;
  return c; }
/* parser */
static inline int numeral( void ){ return get( 1 )- '0'; }
static int product( void ){ int result = numeral();
  while( '*' == get( 0 )){ get( 1 ); result *= numeral(); }
  return result; }
static int sum( void ){ int result = product();
  while( '+' == get( 1 ))result += product();
  return result; }
/* main */
int main( void ){ printf( "sum = %d\n", sum() ); }
Exercises
- What is the output of the above programs?
- Extend the last grammar and the last program so as
to handle subtraction.
- Extend the result of the last exercise in order
to handle division.
- Extend the result of the last exercise so that also
numbers with multiple digits are accepted.
- Extend the result of the last exercise so that also
terms in parentheses are accepted. The input "(2+4)*5)"
should give the result "30".
- Extend the result of the last exercise so that
also a unary minus "-" is recognized.
- Extend the result of the last exercise so that
more operators and functions are recognized.
- Extend the result of the last exercise so that
meaningful error messages are created for all
inputs that do not fulfill the rules of the input
language.
- Extend the result of the last exercise so that the
error messages also show the location where the error
was detected. It should be possible to enter an expression
that spans multiple lines, and an error message should
contain the number of the line where the error was
detected.
See also:
http://compilers.iecc.com/crenshaw/