Simple arithmetic parser example

Francis Moreau · Dec 8, 2009

Hello,

I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

Thanks

Eric Sosman · Dec 8, 2009

Francis said:
Hello,

I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

You can find some algorithm explanations and code samples at:

http://www.google.com/search?q=arithmetic+expression+parser

Nick Keighley · Dec 8, 2009

I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

there was a long rambling thread recently on the conversion of infix
to postfix some of which used full-blown parsers to solve the problem.
Try "How to convert infix to postfix"; it *is* excessivly long but
does contain some code.

Ben Bacarisse · Dec 8, 2009

Francis Moreau said:
I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

This came up quite recently in a huge thread called "How to convert
Infix notation to postfix notation". I posted an example of parsing
using operator precedence tables as did, if I recall, a poster called
Gene. If you can't find it, I'll post again.

[The old code get corrected by posting here. The string type should
have been called String.]

Ben Bacarisse · Dec 8, 2009

Ben Bacarisse said:
This came up quite recently in a huge thread called "How to convert
Infix notation to postfix notation". I posted an example of parsing
using operator precedence tables as did, if I recall, a poster called
Gene. If you can't find it, I'll post again.

[The old code get corrected by posting here. The string type should
have been called String.]

Message ID:
(e-mail address removed)

Edward A. Falk · Dec 8, 2009

Hello,

I would be interested see some piece of C code that parses arithmetic
expressions.

First, read up on "recursive descent parsers". They're really easy
and fun to write, and can be used for a million things.

Here's a quick & dirty one I wrote some years ago. It handles the
four basic functions (plus mod). No variables, no functions. It
should do what you want, or at least get you started.

Since all the operators were a single character, it was trivial to
break the input into tokens with a simple regex.

It doesn't do error handling (e.g. divide by zero) and it stops
parsing on syntax error rather than reporting it. Adding these
features is left as an exercise for the reader.

#!/usr/bin/python

'''Evaluate expressions.'''
# Written by Ed Falk

import re

parse_re = re.compile('([-+*/%()])')

def expr(str):
'''Evaluate simple expression using the operators ()+-*/%'''
tokens = [x.strip() for x in parse_re.split(str) if x]
return __sum(tokens)

def __sum(tokens):
rval = __prod(tokens)
while tokens:
if tokens[0] == '+':
tokens.pop(0)
rval += __prod(tokens)
elif tokens[0] == '-':
tokens.pop(0)
rval -= __prod(tokens)
else: break;
return rval

def __prod(tokens):
rval = __val(tokens)
while tokens:
if tokens[0] == '*':
tokens.pop(0)
rval *= __val(tokens)
elif tokens[0] == '/':
tokens.pop(0)
rval /= __val(tokens)
elif tokens[0] == '%':
tokens.pop(0)
rval %= __val(tokens)
else: break;
return rval

def __val(tokens):
if tokens[0] == '(':
tokens.pop(0)
rval = __sum(tokens)
if tokens[0] != ')': raise ArithmeticError
tokens.pop(0)
return rval
elif tokens[0] == '-':
tokens.pop(0)
return -__val(tokens)
else:
return float(tokens.pop(0))

if __name__ == "__main__":
print expr("1")
print expr("2+2")
print expr("1+7/8")
print expr(".875*64")
print expr("1 + 2 * 3/1.5")
print expr("(1+2)*(3+4)")
print expr("-(1+2)*(3+4)")
print expr("(1+2)*-(3+4)")
print expr("22/7")
print expr("355/113")

Phil Carmody · Dec 8, 2009

First, read up on "recursive descent parsers". They're really easy
and fun to write, and can be used for a million things.

Here's a quick & dirty one I wrote some years ago. It handles the
four basic functions (plus mod). No variables, no functions. It
should do what you want, or at least get you started.

Since all the operators were a single character, it was trivial to
break the input into tokens with a simple regex.

Regular expressions aren't C. You'd need a 3rd-party library to do
that. (Or roll your own.)

It doesn't do error handling (e.g. divide by zero) and it stops
parsing on syntax error rather than reporting it. Adding these
features is left as an exercise for the reader.

#!/usr/bin/python

Well, I think we agreed that was legal C

'''Evaluate expressions.'''

But that isn't.

rval = __prod(tokens)
if tokens[0] == '+':

I don't have many positive things to say about the white-space-
incontinent python, but am glad to see that other languages have
made the same 'mistake' as C when it comes to the length of the
assignment and equality-test operator tokens. I have never quite
understood the vehemence of those who take the contrary PoV.

Phil

Francis Moreau · Dec 9, 2009

Ben Bacarisse said:
Ben Bacarisse said:

This came up quite recently in a huge thread called "How to convert
Infix notation to postfix notation". I posted an example of parsing
using operator precedence tables as did, if I recall, a poster called
Gene. If you can't find it, I'll post again.

Click to expand...

[The old code get corrected by posting here. The string type should
have been called String.]

Click to expand...

Message ID:
(e-mail address removed)

Thank you Ben, I'll look into this code and the loooong thread you
pointed out.

Francis Moreau · Dec 9, 2009

First, read up on "recursive descent parsers". They're really easy
and fun to write, and can be used for a million things.

Indeed, looks quite simple although the recursivity may be a bad idea
for long expressions.

Here's a quick & dirty one I wrote some years ago. It handles the
four basic functions (plus mod). No variables, no functions. It
should do what you want, or at least get you started.

Thanks

Nick Keighley · Dec 9, 2009

(e-mail address removed) (Edward A. Falk) writes:

First, read up on "recursive descent parsers". They're really easy
and fun to write, and can be used for a million things.

Click to expand...

Here's a quick & dirty one I wrote some years ago. It handles the
four basic functions (plus mod). No variables, no functions. It
should do what you want, or at least get you started.

Click to expand...

Since all the operators were a single character, it was trivial to
break the input into tokens with a simple regex.

Click to expand...

Regular expressions aren't C. You'd need a 3rd-party library to do
that. (Or roll your own.)

It doesn't do error handling (e.g. divide by zero) and it stops
parsing on syntax error rather than reporting it. Adding these
features is left as an exercise for the reader.

Click to expand...

#!/usr/bin/python

Click to expand...

Well, I think we agreed that was legal C

'''Evaluate expressions.'''

Click to expand...

But that isn't.

rval = __prod(tokens)
if tokens[0] == '+':

Click to expand...

I don't have many positive things to say about the white-space-
incontinent python,

since it looks like my pseudo-code I quite like it. Though I'm pretty
sure they read my mind...

but am glad to see that other languages have
made the same 'mistake' as C when it comes to the length of the
assignment and equality-test operator tokens.

'cos python lifted all its operators from C. I've also thought making
the decision based on length was bizzare.

I have never quite
understood the vehemence of those who take the contrary PoV.

because the mathematics people have been using = as an equality
operator since Al-Khwarizmi was a lad (I exagerate to make a point).
Everyone knows assignment should be indicated <- (or, at a push, :=).
It took me ages to train my brain to the C-way of doing things.

Ben Bacarisse · Dec 9, 2009

Francis Moreau said:
Indeed, looks quite simple although the recursivity may be a bad idea
for long expressions.

Expressions are inherently recursive so trying to avoid a recursive
parser complicates the work. If the code is not recursive, it will
have to push and pop state using an explicit stack anyway.

The recursion is not (usually) linked to the length of the expression
but to its depth so there is little danger of running out of stack
unless (a) you have virtually no stack (as in some embedded systems)
or (b) you have to parse mechanically generated expressions with
unbounded depth.

If (a) is your problem, look up LL(1) parsers.

<snip>

Nick · Dec 9, 2009

Francis Moreau said:
Indeed, looks quite simple although the recursivity may be a bad idea
for long expressions.

Don't forget that you've got two, related, tasks. One is to parse the
input into a series of "tokens" (numbers, operators etc) and the second
is to do things with the tokens.

I tend to use a pair of stacks myself. It's a technique I learnt from
reading the dissembly of the Sinclair Spectrum ROM.

You put numbers on one stack and operators on another. When you come to
push an operator you first see if the top operator on the stack has
higher precedence than the one you are about to push. If so, you pop
and execute it first (popping values from the number stack and putting the
answer there), repeating if necessary.

When finished, start popping and operating on the operator stack. If
either stack runs out unexpectedly, or something is left at the end, the
expresion was malformed.

Francis Moreau · Dec 10, 2009

Expressions are inherently recursive so trying to avoid a recursive
parser complicates the work. If the code is not recursive, it will
have to push and pop state using an explicit stack anyway.

I agree but recursive descent parser looks doing more function calls
than the one you implemented at a first glance.

Actually I was wondering which parser would be the most efficient (in
term of speed an memory consumption) according to the simple grammar
(basic arithmetic expressions) I want to parse.

The recursion is not (usually) linked to the length of the expression
but to its depth so there is little danger of running out of stack
unless (a) you have virtually no stack (as in some embedded systems)
or (b) you have to parse mechanically generated expressions with
unbounded depth.

If (a) is your problem, look up LL(1) parsers.

I have no problem at all, I'm just wondering which parser would be the
best candidate for this purpose.

In fact a good exercice would be to implement the different possible
parsers and see how they are in practice...

Francis Moreau · Dec 10, 2009

Francis Moreau said:
Francis Moreau said:

Don't forget that you've got two, related, tasks. One is to parse the
input into a series of "tokens" (numbers, operators etc) and the second
is to do things with the tokens.

I tend to use a pair of stacks myself. It's a technique I learnt from
reading the dissembly of the Sinclair Spectrum ROM.

You put numbers on one stack and operators on another. When you come to
push an operator you first see if the top operator on the stack has
higher precedence than the one you are about to push. If so, you pop
and execute it first (popping values from the number stack and putting the
answer there), repeating if necessary.

When finished, start popping and operating on the operator stack. If
either stack runs out unexpectedly, or something is left at the end, the
expresion was malformed.

Click to expand...

It sounds that you implement the Shunting-yard algorithm, don't you ?

Click to expand...

Nick · Dec 10, 2009

Francis Moreau said:
It sounds that you implement the Shunting-yard algorithm, don't you ?

Click to expand...

Having looked it up (I'd not heard of the name) it's effectively that,
but with the subsequent evaluation of the RPN version folded into the
processing. So in 3*5+2, the shunting yard would do:
3 -> output [now 3]
* -> stack [now *]
5 -> output [now 3 5]
+ pop operator stack, push +. Output now [3 5 *], operator stack [+]
2 -> output [now 3 5 * 2]
pop operator stack. output 3 5 * 2 +

Mine (well, the Spectrum's) algorithm does:
3 -> value stack [now 3]
* -> operator stack [now *]
5 -> value stack [now 3 5]
+ pop operator stack, execute '*' (pop value stack twice, do *, push
result), + -> operator stack. Value stack now [8], operator stack [+]
2 -> value stack [now 8 2]
pop operator stack. Execute +, result 10.

You lose the benefit of having a "compiled" version of the expression
for later use, but save having two separate phases.

Very simple parser... not for me	36	Jan 27, 2014
Arithmetic will null pointer	19	Jun 16, 2010
Need to evaluate simple arithmetic expression	6	Nov 20, 2007
With this artifact, everyone can easily invent new languages	5	Jan 11, 2014
Making a simple parser	8	Apr 16, 2011
[SUMMARY] Modular Arithmetic (#179)	0	Oct 17, 2008
How to create a simple configurationfile parser?	1	Feb 26, 2004
ANN: pyparsing 1.5.6 released!	1	Jul 1, 2011

Simple arithmetic parser example

Francis Moreau

Eric Sosman

Nick Keighley

Ben Bacarisse

Ben Bacarisse

Edward A. Falk

Phil Carmody

Francis Moreau

Francis Moreau

Nick Keighley

Ben Bacarisse

Nick

Francis Moreau

Francis Moreau

Nick

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads