Parser for a simple expression

I

Ivo

People,

I need a parser for a simple expression

it must support keywords like AND, NOT and OR and "(" and ")" constructions.

I have been looking into ANTLR and JavaCC but am really curious if
anyone can point me to an easier method.

thx,
Ivo.
 
S

Stefan Ram

Ivo said:
I need a parser for a simple expression

(If answering to the following post, one should please not
quote all of it, but only the short parapgrahs one directly
refers to.)

In order to interpret or translate an expression (term), it is
decomposed into lexical units (tokens, words), which then are
used by a parser to build symbols and a structured
representation of the input. This representation then might be
evaluated or translated into some other representation.

The syntactial structuring resembles the rules for the
construction of an expression, which often is given by so-
called "productions" of the EBNF (extended Backus-Nauer-Form)
and which sometimes are left-recursive.

When writing a parser, the left-recursive productions sometimes
are a worry to the author, because it is not obvious how to
avoid an infinite recursion. The solution is to rewrite them as
right-recursive productions.

The addition with a binary infix Operator, for example, is
left associative. However, it is simpler to analyze in a
right-associative manner. Therefore, one analyzes the source
using right-associative rules and then creates a result
using a left-associative interpretation.

A left-associative grammar might be, for example, as follows.

<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral> | <expression> '+' <numeral>.
start symbol: <expression>.

To analyze this using a recursive descent parser one
prefers to use the following grammar.

<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>[ '+' <expression> ].
start symbol: <expression>.

This can be written using iteration as follows.

<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>{ '+' <numeral> }.
start symbol: <expression>.

However, the product is created in the sense of the
first grammar. Example code follows.

class Scan
{ static String source = "5+4+2)";
static int pos = 0;
static char get(){ return source.charAt( pos++ ); }}

class Parse
{
static int numeral(){ return Scan.get() - '0'; }

static int expression(){ int result = numeral();
while( '+' == Scan.get() )result += numeral();
return result; }}

public class Start
{
public static void main( final String[] args )
{ System.out.println( Parse.expression() ); }}

To be able to parse expressions with higher
priority, the grammar can be extended.

<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral> | <product> '*' <numeral>.
<sum> ::= <product> | <sum> '+' <product>.
start symbol: <sum>.

In iterative notation:

<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral>{ '*' <numeral> }.
<sum> ::= <product>{ '+' <product> }.
start symbol: <sum>.

In Java:

class Scan
{ static String source = "5+4*2)";
static int pos = 0;
static char get( final boolean advance )
{ return source.charAt( advance ? pos++ : pos ); }}

class Parse
{
static int numeral(){ return Scan.get( true ) - '0'; }

static int product(){ int result = numeral();
while( '*' == Scan.get( false )){ Scan.get( true ); result *= numeral(); }
return result; }

static int sum(){ int result = product();
while( '+' == Scan.get( true ))result += product();
return result; }}

public class Start
{
public static void main( final String[] args )
{ System.out.println( Parse.sum() ); }}

Exercises

- What is the output of the above programs?

- Extend the last grammar and the last program so as
to handle subtraction.

- Extend the result of the last exercise in order
to handle division.

- Extend the result of the last exercise so that also
numbers with multiple digits are accepted.

- Extend the result of the last exercise so that also
terms in parentheses are accepted. The input "(2+4)*5)"
should give the result "30".

- Extend the result of the last exercise so that
also a unary minus "-" is recognized.

- Extend the result of the last exercise so that
more operators and functions are recognized.

- Extend the result of the last exercise so that
meaningful error messages are created for all
inputs that to not fulfill the rules of the input
language.

- Extend the result of the last exercise so that the
error messages also show the location where the error
was detected. It should be possible to enter an expression
that spans multiple lines, and an error message should
contain the number of the line where the error was
detected.

See also:

JEP - Java Mathematical Expression Parser
http://www.singularsys.com/jep/

Steven Metsker: Building Parsers with Java.
Addison-Wesley 2001, ISBN 0201719622.

A.W. Appel: Modern Compiler Implementation in Java.
Cambridge University Press 1998, ISBN 0521586542.
 
D

derek

have you thought of embedding a scripting language into your application? you would get a complete language instead of just some simple expression support. embedding a language should only take a few lines.
 
I

Ivo

Ivo said:
People,

I need a parser for a simple expression

it must support keywords like AND, NOT and OR and "(" and ")"
constructions.

I have been looking into ANTLR and JavaCC but am really curious if
anyone can point me to an easier method.

thx,
Ivo.

OK here my ANTLR gramar:
===
grammar AndOrNotVoter;

@header {
package test;

}

@lexer::header {package test;}

@members {
/** Map variable name to Integer object holding value */
}


orexpression
: andexpression ('or' andexpression)*
;

andexpression
: notexpression ('and' notexpression)*
;

notexpression
: ('not')? atom
;

atom
: role
| LPAREN orexpression RPAREN
;

role
:'ROLE_LEVEL1'
:'ROLE_LEVEL2'
:'ROLE_LEVEL3'
:'ROLE_LEVEL4'
|'ROLE_EDITOR'
|'ROLE_OWNER'
|'ROLE_ADMIN'
;

LPAREN : '('
;

RPAREN : ')'
;

WS : ( ' '
| '\t'
| '\r'
| '\n'
)+
{ $channel=HIDDEN; }
;
===

I have to evaluate a couple of roles a person can have or may not have
in order to get access to certain data.

So a person that has logged on gets a couple of roles. These roles are
maintained in a class called GrantedAuthorities .
When a certain page is requested I have to check of the granted
Authorities to allow this person access to all or some of the data on
the page.

So there is a custom tag that accepts an expression like;
===example===
(ROLE_OWNER and ROLE_LEVEL4) or ROLE_ADMIN
===/example===
This expression needs to be parsed and compared to the
GrantedAuthorities I get from the logon procedure. I have no idea how to
go on from here... Anyone? I can sure use the help.

It is not like evaluating a mathematical expression where all the data
needed is provided in the expression. In order to give meaning to this
expression I need data from GrantedAuthorities.

How do I go about doing this?

Thanks for your patience

Ivo.
 
I

Ivo

derek said:
have you thought of embedding a scripting language into your application? you would get a complete language instead of just some simple expression support. embedding a language should only take a few lines.

The company I work for will not allow it.
DSL = allowed as long as the end result is one of the supported
languages (Java or COBOL) haha.

Ivo.
 
M

Martin Gregorie

Ivo said:
No I haven't. It seems that ANTLR has more support (BTW your link
doesn't seem to work?)
Here's a correction. The other was a mis-paste that I failed to catch:

http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/

CoCo/R has flavors for several languages and, when I first used it a
couple of years ago, I don't recall any searches finding ANTLR. In
summary, a single .ATR input file generates a Scanner and a Parser class
that includes custom code from the .ATR file. So much, so standard. An
advantage over some other code generators is that the framework files
for the two classes can be modified fairly easily, e.g. to generate code
to parse a String rather than reading its input from a file (I needed
this ability).

If you're C literate and have used lex and yacc then Coco/R is a
complete doddle to use.
 
I

Ivo

Martin said:
Here's a correction. The other was a mis-paste that I failed to catch:

http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/

CoCo/R has flavors for several languages and, when I first used it a
couple of years ago, I don't recall any searches finding ANTLR. In
summary, a single .ATR input file generates a Scanner and a Parser class
that includes custom code from the .ATR file. So much, so standard. An
advantage over some other code generators is that the framework files
for the two classes can be modified fairly easily, e.g. to generate code
to parse a String rather than reading its input from a file (I needed
this ability).

If you're C literate and have used lex and yacc then Coco/R is a
complete doddle to use.
thanks for the advice.
I have use Lex and Yacc a lot.

I am still going to use ANTLR I think because it has the same kind of
lex and yacc like method of working and it now has a cool IDE for testing.

My problem is still not solved though. My grammar works but now I have
to do something with it. My boolean expressions are dependent on
information that has to be provided before parsing is started. Still
looking into that one.

Help is appreciated

cheerz,
Ivo.
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Ivo said:
I need a parser for a simple expression

it must support keywords like AND, NOT and OR and "(" and ")"
constructions.

I have been looking into ANTLR and JavaCC but am really curious if
anyone can point me to an easier method.

If you need a parser for a specific grammar then use the other
advise you have received.

If you just need some form of expression evaluation then
look at the script language integration in Java 1.6 and
JavaScript's nice eval function.

Arne
 
P

Piotr Kobzda

Ivo said:
I need a parser for a simple expression

it must support keywords like AND, NOT and OR and "(" and ")"
constructions.

I have been looking into ANTLR and JavaCC but am really curious if
anyone can point me to an easier method.

If strict validation of your expressions is not required (what I think
is your situation), then the above seems not to be very difficult to
implement without any third-party tools. See the code below (not tested
intensively!).

Regardless of what expression parsing/compiling/processing technique
you'll choose, there is also presented the way I think you could use it
in your code (in this scenario your granted Authorities must simply
implement RoleSet interface).


HTH,
piotr


import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class Roles {

public interface RoleSet {
boolean contains(String role);
}

public interface CompiledExpression {
boolean matches(RoleSet roles);
}

public static CompiledExpression compile(String expression) {
return new Compiler(new Tokenizer(expression)).compile();
}

static class Tokenizer {
StringTokenizer st;

Tokenizer(String expression) {
st = new StringTokenizer(expression, " ()", true);
}

String nextToken() {
while (st.hasMoreTokens()) {
String token = st.nextToken().trim();
if (token.length() > 0) {
return token;
}
}
return null;
}
}

static class Compiler {
Tokenizer tokenizer;

public Compiler(Tokenizer tokenizer) {
this.tokenizer = tokenizer;
}

public CompiledExpression compile() {
return compile(compile(null));
}

CompiledExpression compile(final CompiledExpression left) {
final String token = tokenizer.nextToken();
if(token == null) {
return left;
}

abstract class NextExpression implements CompiledExpression {
CompiledExpression right;
NextExpression(CompiledExpression right) {
this.right = right;
}
}

if(token.equalsIgnoreCase("not")) {
return compile(new NextExpression(compile(null)) {

public boolean matches(RoleSet roles) {
return ! right.matches(roles);
}
});
}
if(token.equalsIgnoreCase("or")) {
return new NextExpression(compile(compile(null))) {

public boolean matches(RoleSet roles) {
return left.matches(roles) || right.matches(roles);
}
};
}
if(token.equalsIgnoreCase("and")) {
return compile(new NextExpression(compile(null)) {

public boolean matches(RoleSet roles) {
return left.matches(roles) && right.matches(roles);
}
});
}
if(token.equals("(")) {
return compile(new NextExpression(compile(null)) {

public boolean matches(RoleSet roles) {
return right.matches(roles);
}
});
}
if(token.equals(")")) {
return left;
}

return new CompiledExpression() {

public boolean matches(RoleSet roles) {
return roles.contains(token);
}
};
}
}

public static void main(String[] args) throws Exception {
final String expression = "a or b and not c";
final Roles.CompiledExpression ce = Roles.compile(expression);
System.out.println("expression: " + expression);

new RoleSet() {
Set<String> roles = new HashSet<String>();

public boolean contains(String role) {
boolean enabled = roles.contains(role);
System.out.println(
"role " + role + " " + (enabled ? "set" : "not set"));
return enabled;
}

void check() {
System.out.println(roles + " " +
(ce.matches(this) ? "matches" : "don't match"));
}

{ // test...
check();

roles.add("b");
check();

roles.add("c");
check();

roles.add("a");
check();
}
};
}
}
 
I

Ivo

Piotr said:
Ivo said:
I need a parser for a simple expression

it must support keywords like AND, NOT and OR and "(" and ")"
constructions.

I have been looking into ANTLR and JavaCC but am really curious if
anyone can point me to an easier method.

If strict validation of your expressions is not required (what I think
is your situation), then the above seems not to be very difficult to
implement without any third-party tools. See the code below (not tested
intensively!).

Regardless of what expression parsing/compiling/processing technique
you'll choose, there is also presented the way I think you could use it
in your code (in this scenario your granted Authorities must simply
implement RoleSet interface).


HTH,
piotr


import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class Roles {

public interface RoleSet {
boolean contains(String role);
}

public interface CompiledExpression {
boolean matches(RoleSet roles);
}

public static CompiledExpression compile(String expression) {
return new Compiler(new Tokenizer(expression)).compile();
}

static class Tokenizer {
StringTokenizer st;

Tokenizer(String expression) {
st = new StringTokenizer(expression, " ()", true);
}

String nextToken() {
while (st.hasMoreTokens()) {
String token = st.nextToken().trim();
if (token.length() > 0) {
return token;
}
}
return null;
}
}

static class Compiler {
Tokenizer tokenizer;

public Compiler(Tokenizer tokenizer) {
this.tokenizer = tokenizer;
}

public CompiledExpression compile() {
return compile(compile(null));
}

CompiledExpression compile(final CompiledExpression left) {
final String token = tokenizer.nextToken();
if(token == null) {
return left;
}

abstract class NextExpression implements CompiledExpression {
CompiledExpression right;
NextExpression(CompiledExpression right) {
this.right = right;
}
}

if(token.equalsIgnoreCase("not")) {
return compile(new NextExpression(compile(null)) {

public boolean matches(RoleSet roles) {
return ! right.matches(roles);
}
});
}
if(token.equalsIgnoreCase("or")) {
return new NextExpression(compile(compile(null))) {

public boolean matches(RoleSet roles) {
return left.matches(roles) || right.matches(roles);
}
};
}
if(token.equalsIgnoreCase("and")) {
return compile(new NextExpression(compile(null)) {

public boolean matches(RoleSet roles) {
return left.matches(roles) && right.matches(roles);
}
});
}
if(token.equals("(")) {
return compile(new NextExpression(compile(null)) {

public boolean matches(RoleSet roles) {
return right.matches(roles);
}
});
}
if(token.equals(")")) {
return left;
}

return new CompiledExpression() {

public boolean matches(RoleSet roles) {
return roles.contains(token);
}
};
}
}

public static void main(String[] args) throws Exception {
final String expression = "a or b and not c";
final Roles.CompiledExpression ce = Roles.compile(expression);
System.out.println("expression: " + expression);

new RoleSet() {
Set<String> roles = new HashSet<String>();

public boolean contains(String role) {
boolean enabled = roles.contains(role);
System.out.println(
"role " + role + " " + (enabled ? "set" : "not set"));
return enabled;
}

void check() {
System.out.println(roles + " " +
(ce.matches(this) ? "matches" : "don't match"));
}

{ // test...
check();

roles.add("b");
check();

roles.add("c");
check();

roles.add("a");
check();
}
};
}
}
Piotr,

You Rock!
Thanks. This was what I am looking for.
I need to refine it a bit but It seems like you made my day.
I have an implementation in ANTLR almost finished and I will probably
finish it just for the practice and the experience but I will probably
use a variant of your version. It eliminates the need of 3rd party stuff
and because this role checking will be done a lot (really a lot) I like
this "lightweight" solution very much.
I had looked into StringTokenizer a bit but it seems not deep enough.

Thanks again.

Ivo.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top