Ivo said:
I need a parser for a simple expression
(When replying to this post, please do not quote all of it,
only the short paragraphs you refer to directly.)
In order to interpret or translate an expression (term), it is
first decomposed into lexical units (tokens, words). A parser
then uses these to build symbols and a structured
representation of the input. This representation can then be
evaluated or translated into some other representation.
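The decomposition into tokens can be sketched as follows. This is only a minimal illustration under assumptions that are not taken from the text above: the class name `Lexer` is hypothetical, whitespace is skipped, a token is either a run of digits or one of the single characters + - * / ( ), and tokens are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal lexer. Assumption: a token is either a run of
// digits or one of the single characters + - * / ( ).
class Lexer {
    static List<String> tokenize(final String source) {
        final List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            final char c = source.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++; // blanks only separate tokens
            } else if (Character.isDigit(c)) {
                final int start = pos;
                while (pos < source.length() && Character.isDigit(source.charAt(pos))) pos++;
                tokens.add(source.substring(start, pos)); // "12" becomes one token
            } else if ("+-*/()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c)); // operator or parenthesis
                pos++;
            } else {
                throw new IllegalArgumentException(
                    "unexpected character '" + c + "' at position " + pos);
            }
        }
        return tokens;
    }

    public static void main(final String[] args) {
        System.out.println(tokenize("12 + 4*(5-2)")); // [12, +, 4, *, (, 5, -, 2, )]
    }
}
```

The programs further below skip a separate lexer, because in their input languages every token is a single character.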
The syntactic structure mirrors the rules for the construction
of an expression. These rules are often given as so-called
"productions" in EBNF (Extended Backus-Naur Form), and some of
them are left-recursive.
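When writing a parser, left-recursive productions can worry the
author, because it is not obvious how to avoid an infinite
recursion: a procedure for a production such as
<expression> ::= <expression> '+' <numeral> would call itself
before consuming any input. The solution is to rewrite such
productions as right-recursive productions.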
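The problem can be seen in a small, hypothetical Java sketch (the class and method names are made up for this illustration). The naive translation of <expression> ::= <numeral> | <expression> '+' <numeral> tries the left-recursive alternative first, calls itself before reading any input, and eventually fails with a StackOverflowError. The right-recursive version, for <expression> ::= <numeral>[ '+' <expression> ], consumes a numeral before each recursive call and therefore terminates. Unlike the programs below, this sketch checks for the end of the string instead of using a ')' end marker.

```java
// Hypothetical illustration of left versus right recursion in recursive descent.
class LeftRecursionDemo {
    private static String source;
    private static int pos;

    // <numeral> ::= a single digit
    private static int numeral() { return source.charAt(pos++) - '0'; }

    // Naive translation of  <expression> ::= <expression> '+' <numeral> | ...
    // It calls itself before consuming any input, so it never terminates normally.
    private static int leftRecursive() {
        final int left = leftRecursive();
        if (pos < source.length() && source.charAt(pos) == '+') { pos++; return left + numeral(); }
        return left;
    }

    // Translation of  <expression> ::= <numeral>[ '+' <expression> ].
    // A numeral is consumed before every recursive call.
    private static int rightRecursive() {
        final int left = numeral();
        if (pos < source.length() && source.charAt(pos) == '+') { pos++; return left + rightRecursive(); }
        return left;
    }

    static int parseRightRecursive(final String input) { source = input; pos = 0; return rightRecursive(); }

    // Returns true if the left-recursive version ran out of stack.
    static boolean leftRecursionOverflows(final String input) {
        source = input; pos = 0;
        try { leftRecursive(); return false; } catch (final StackOverflowError e) { return true; }
    }

    public static void main(final String[] args) {
        System.out.println(parseRightRecursive("5+4+2"));    // 11
        System.out.println(leftRecursionOverflows("5+4+2")); // true
    }
}
```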
For example, addition written with a binary infix operator is
left-associative: 5+4+2 means (5+4)+2. However, it is simpler to
analyze in a right-associative manner. Therefore, one analyzes
the source using right-associative rules and then creates the
result using a left-associative interpretation.
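Why the interpretation matters can be seen with subtraction, which is not part of the grammars here but, unlike addition, gives different results under the two associativities. In the following hypothetical sketch (class and method names are assumptions), both methods analyze the input with the right-recursive rule <expression> ::= <numeral>[ '-' <expression> ]. The first combines the values right-associatively; the second passes the value computed so far down as an accumulator and thus yields the left-associative result.

```java
// Hypothetical sketch: right-recursive analysis, two ways of combining values.
class AssociativityDemo {
    private static String source;
    private static int pos;

    private static int numeral() { return source.charAt(pos++) - '0'; }

    private static boolean at(final char c) { return pos < source.length() && source.charAt(pos) == c; }

    // Combines naively on the way back: 5-4-2 is computed as 5-(4-2).
    private static int naive() {
        final int left = numeral();
        if (at('-')) { pos++; return left - naive(); }
        return left;
    }

    // <rest> ::= [ '-' <numeral> <rest> ], still right-recursive, but the
    // value so far is passed down: 5-4-2 is computed as (5-4)-2.
    private static int rest(final int accumulator) {
        if (at('-')) { pos++; return rest(accumulator - numeral()); }
        return accumulator;
    }

    static int evalNaive(final String input) { source = input; pos = 0; return naive(); }

    static int evalLeftAssociative(final String input) { source = input; pos = 0; return rest(numeral()); }

    public static void main(final String[] args) {
        System.out.println(evalNaive("5-4-2"));           // 3, i.e. 5-(4-2)
        System.out.println(evalLeftAssociative("5-4-2")); // -1, i.e. (5-4)-2
    }
}
```

Only the second result agrees with the usual left-associative reading of subtraction. The iterative programs below achieve the same effect with a loop that updates a running result.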
For example, a left-associative grammar might look as follows.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral> | <expression> '+' <numeral>.
start symbol: <expression>.
To analyze this with a recursive-descent parser, one prefers
the following right-recursive grammar, which describes the
same language.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>[ '+' <expression> ].
start symbol: <expression>.
This can be written using iteration as follows.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>{ '+' <numeral> }.
start symbol: <expression>.
However, the result is computed in the sense of the first
grammar, that is, left-associatively. Example code follows.
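class Scan
{ // the input; the final ')' serves as an end marker
  static String source = "5+4+2)";
  static int pos = 0;
  // returns the next character and advances the position
  static char get(){ return source.charAt( pos++ ); }}
class Parse
{ // <numeral> ::= '2' | '4' | '5'.
  static int numeral(){ return Scan.get() - '0'; }
  // <expression> ::= <numeral>{ '+' <numeral> }.
  // The loop adds from left to right, i.e., left-associatively;
  // it ends when the character read is not '+' (here: the end marker).
  static int expression(){ int result = numeral();
    while( '+' == Scan.get() )result += numeral();
    return result; }}
public class Start
{
  public static void main( final String[] args )
  { System.out.println( Parse.expression() ); }}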
To parse expressions that contain operators of different
priority (precedence), the grammar can be extended.
<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral> | <product> '*' <numeral>.
<sum> ::= <product> | <sum> '+' <product>.
start symbol: <sum>.
In iterative notation:
<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral>{ '*' <numeral> }.
<sum> ::= <product>{ '+' <product> }.
start symbol: <sum>.
In Java:
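class Scan
{ // the input; the final ')' serves as an end marker
  static String source = "5+4*2)";
  static int pos = 0;
  // returns the current character; advances only if advance is true
  static char get( final boolean advance )
  { return source.charAt( advance ? pos++ : pos ); }}
class Parse
{ // <numeral> ::= '2' | '4' | '5'.
  static int numeral(){ return Scan.get( true ) - '0'; }
  // <product> ::= <numeral>{ '*' <numeral> }.
  // Peeks at the next character and consumes it only if it is '*',
  // so that a following '+' is left for sum().
  static int product(){ int result = numeral();
    while( '*' == Scan.get( false )){ Scan.get( true ); result *= numeral(); }
    return result; }
  // <sum> ::= <product>{ '+' <product> }.
  // Consumes the character after each product: '+' or the end marker.
  static int sum(){ int result = product();
    while( '+' == Scan.get( true ))result += product();
    return result; }}
public class Start
{
  public static void main( final String[] args )
  { System.out.println( Parse.sum() ); }}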
Exercises
- What is the output of the above programs?
- Extend the last grammar and the last program so as
to handle subtraction.
- Extend the result of the last exercise in order
to handle division.
- Extend the result of the last exercise so that numbers
with multiple digits are also accepted.
- Extend the result of the last exercise so that terms in
parentheses are also accepted. The input "(2+4)*5)"
should give the result "30".
- Extend the result of the last exercise so that a unary
minus "-" is also recognized.
- Extend the result of the last exercise so that more
operators and functions are recognized.
- Extend the result of the last exercise so that
meaningful error messages are created for all
inputs that do not follow the rules of the input
language.
- Extend the result of the last exercise so that the
error messages also show the location where the error
was detected. It should be possible to enter an expression
that spans multiple lines, and an error message should
contain the number of the line where the error was
detected.
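One possible solution for several of the exercises (subtraction, division, multi-digit numbers, parentheses, a unary minus, and simple error messages with a character position) is sketched below. It is a hypothetical example, not the intended or the only solution: the class name `Calc`, the use of IllegalArgumentException for errors, the explicit check for division by zero, and letting the unary minus bind more tightly than '*' and '/' are assumptions. It keeps the convention of the programs above that the input ends with a ')' end marker. Line numbers for multi-line input (the last exercise) are not handled.

```java
// Hypothetical solution sketch for several of the exercises above.
class Calc {
    private final String source;
    private int pos = 0;

    private Calc(final String source) { this.source = source; }

    // Returns the current character without consuming it, or '\0' at the end.
    private char peek() { return pos < source.length() ? source.charAt(pos) : '\0'; }

    private IllegalArgumentException error(final String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }

    private void expect(final char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    // <number> ::= <digit>{ <digit> }.
    private int number() {
        if (!Character.isDigit(peek())) throw error("expected a digit");
        int result = 0;
        while (Character.isDigit(peek())) result = 10 * result + (source.charAt(pos++) - '0');
        return result;
    }

    // <factor> ::= <number> | '(' <sum> ')' | '-' <factor>.
    private int factor() {
        if (peek() == '(') { pos++; final int result = sum(); expect(')'); return result; }
        if (peek() == '-') { pos++; return -factor(); }
        return number();
    }

    // <product> ::= <factor>{ ( '*' | '/' ) <factor> }.
    private int product() {
        int result = factor();
        while (peek() == '*' || peek() == '/') {
            final char op = source.charAt(pos++);
            final int right = factor();
            if (op == '*') result *= right;
            else if (right == 0) throw error("division by zero");
            else result /= right;
        }
        return result;
    }

    // <sum> ::= <product>{ ( '+' | '-' ) <product> }.
    private int sum() {
        int result = product();
        while (peek() == '+' || peek() == '-') {
            final char op = source.charAt(pos++);
            final int right = product();
            result = op == '+' ? result + right : result - right;
        }
        return result;
    }

    // Evaluates a whole input such as "(2+4)*5)"; the final ')' is the end marker.
    static int evaluate(final String source) {
        final Calc calc = new Calc(source);
        final int result = calc.sum();
        calc.expect(')');
        if (calc.pos != source.length()) throw calc.error("unexpected character '" + calc.peek() + "'");
        return result;
    }

    public static void main(final String[] args) {
        System.out.println(evaluate("(2+4)*5)"));   // 30
        System.out.println(evaluate("10-4-3)"));    // 3
        System.out.println(evaluate("-(12/4)*2)")); // -6
    }
}
```

For the input "2+)" this sketch throws an exception with the message "expected a digit at position 2".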
See also:
JEP - Java Mathematical Expression Parser
http://www.singularsys.com/jep/
Steven Metsker: Building Parsers with Java.
Addison-Wesley 2001, ISBN 0201719622.
A.W. Appel: Modern Compiler Implementation in Java.
Cambridge University Press 1998, ISBN 0521586542.