Simple arithmetic parser example

F

Francis Moreau

Hello,

I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

Thanks
 
E

Eric Sosman

Francis said:
Hello,

I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

You can find some algorithm explanations and code samples at:

http://www.google.com/search?q=arithmetic+expression+parser
 
N

Nick Keighley

I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

there was a long rambling thread recently on the conversion of infix
to postfix some of which used full-blown parsers to solve the problem.
Try "How to convert infix to postfix"; it *is* excessivly long but
does contain some code.
 
B

Ben Bacarisse

Francis Moreau said:
I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

This came up quite recently in a huge thread called "How to convert
Infix notation to postfix notation". I posted an example of parsing
using operator precedence tables as did, if I recall, a poster called
Gene. If you can't find it, I'll post again.

[The old code get corrected by posting here. The string type should
have been called String.]
 
B

Ben Bacarisse

Ben Bacarisse said:
This came up quite recently in a huge thread called "How to convert
Infix notation to postfix notation". I posted an example of parsing
using operator precedence tables as did, if I recall, a poster called
Gene. If you can't find it, I'll post again.

[The old code get corrected by posting here. The string type should
have been called String.]

Message ID:
(e-mail address removed)
 
E

Edward A. Falk

Hello,

I would be interested see some piece of C code that parses arithmetic
expressions.

First, read up on "recursive descent parsers". They're really easy
and fun to write, and can be used for a million things.

Here's a quick & dirty one I wrote some years ago. It handles the
four basic functions (plus mod). No variables, no functions. It
should do what you want, or at least get you started.

Since all the operators were a single character, it was trivial to
break the input into tokens with a simple regex.

It doesn't do error handling (e.g. divide by zero) and it stops
parsing on syntax error rather than reporting it. Adding these
features is left as an exercise for the reader.


#!/usr/bin/python

'''Evaluate expressions.'''
# Written by Ed Falk

import re

parse_re = re.compile('([-+*/%()])')

def expr(str):
'''Evaluate simple expression using the operators ()+-*/%'''
tokens = [x.strip() for x in parse_re.split(str) if x]
return __sum(tokens)

def __sum(tokens):
rval = __prod(tokens)
while tokens:
if tokens[0] == '+':
tokens.pop(0)
rval += __prod(tokens)
elif tokens[0] == '-':
tokens.pop(0)
rval -= __prod(tokens)
else: break;
return rval

def __prod(tokens):
rval = __val(tokens)
while tokens:
if tokens[0] == '*':
tokens.pop(0)
rval *= __val(tokens)
elif tokens[0] == '/':
tokens.pop(0)
rval /= __val(tokens)
elif tokens[0] == '%':
tokens.pop(0)
rval %= __val(tokens)
else: break;
return rval

def __val(tokens):
if tokens[0] == '(':
tokens.pop(0)
rval = __sum(tokens)
if tokens[0] != ')': raise ArithmeticError
tokens.pop(0)
return rval
elif tokens[0] == '-':
tokens.pop(0)
return -__val(tokens)
else:
return float(tokens.pop(0))

if __name__ == "__main__":
print expr("1")
print expr("2+2")
print expr("1+7/8")
print expr(".875*64")
print expr("1 + 2 * 3/1.5")
print expr("(1+2)*(3+4)")
print expr("-(1+2)*(3+4)")
print expr("(1+2)*-(3+4)")
print expr("22/7")
print expr("355/113")
 
P

Phil Carmody

First, read up on "recursive descent parsers". They're really easy
and fun to write, and can be used for a million things.

Here's a quick & dirty one I wrote some years ago. It handles the
four basic functions (plus mod). No variables, no functions. It
should do what you want, or at least get you started.

Since all the operators were a single character, it was trivial to
break the input into tokens with a simple regex.

Regular expressions aren't C. You'd need a 3rd-party library to do
that. (Or roll your own.)
It doesn't do error handling (e.g. divide by zero) and it stops
parsing on syntax error rather than reporting it. Adding these
features is left as an exercise for the reader.

#!/usr/bin/python

Well, I think we agreed that was legal C
'''Evaluate expressions.'''

But that isn't.
rval = __prod(tokens)
if tokens[0] == '+':

I don't have many positive things to say about the white-space-
incontinent python, but am glad to see that other languages have
made the same 'mistake' as C when it comes to the length of the
assignment and equality-test operator tokens. I have never quite
understood the vehemence of those who take the contrary PoV.

Phil
 
F

Francis Moreau

Ben Bacarisse said:
This came up quite recently in a huge thread called "How to convert
Infix notation to postfix notation".  I posted an example of parsing
using operator precedence tables as did, if I recall, a poster called
Gene.  If you can't find it, I'll post again.
[The old code get corrected by posting here.  The string type should
have been called String.]

Message ID:
(e-mail address removed)

Thank you Ben, I'll look into this code and the loooong thread you
pointed out.
 
F

Francis Moreau

First, read up on "recursive descent parsers".  They're really easy
and fun to write, and can be used for a million things.

Indeed, looks quite simple although the recursivity may be a bad idea
for long expressions.
Here's a quick & dirty one I wrote some years ago.  It handles the
four basic functions (plus mod).  No variables, no functions.  It
should do what you want, or at least get you started.

Thanks
 
N

Nick Keighley

(e-mail address removed) (Edward A. Falk) writes:




First, read up on "recursive descent parsers".  They're really easy
and fun to write, and can be used for a million things.
Here's a quick & dirty one I wrote some years ago.  It handles the
four basic functions (plus mod).  No variables, no functions.  It
should do what you want, or at least get you started.
Since all the operators were a single character, it was trivial to
break the input into tokens with a simple regex.

Regular expressions aren't C. You'd need a 3rd-party library to do
that. (Or roll your own.)
It doesn't do error handling (e.g. divide by zero) and it stops
parsing on syntax error rather than reporting it.  Adding these
features is left as an exercise for the reader.
#!/usr/bin/python

Well, I think we agreed that was legal C
'''Evaluate expressions.'''

But that isn't.
  rval = __prod(tokens)
    if tokens[0] == '+':

I don't have many positive things to say about the white-space-
incontinent python,

since it looks like my pseudo-code I quite like it. Though I'm pretty
sure they read my mind...
but am glad to see that other languages have
made the same 'mistake' as C when it comes to the length of the
assignment and equality-test operator tokens.

'cos python lifted all its operators from C. I've also thought making
the decision based on length was bizzare.
I have never quite
understood the vehemence of those who take the contrary PoV.

because the mathematics people have been using = as an equality
operator since Al-Khwarizmi was a lad (I exagerate to make a point).
Everyone knows assignment should be indicated <- (or, at a push, :=).
It took me ages to train my brain to the C-way of doing things.
 
B

Ben Bacarisse

Francis Moreau said:
Indeed, looks quite simple although the recursivity may be a bad idea
for long expressions.

Expressions are inherently recursive so trying to avoid a recursive
parser complicates the work. If the code is not recursive, it will
have to push and pop state using an explicit stack anyway.

The recursion is not (usually) linked to the length of the expression
but to its depth so there is little danger of running out of stack
unless (a) you have virtually no stack (as in some embedded systems)
or (b) you have to parse mechanically generated expressions with
unbounded depth.

If (a) is your problem, look up LL(1) parsers.

<snip>
 
N

Nick

Francis Moreau said:
Indeed, looks quite simple although the recursivity may be a bad idea
for long expressions.

Don't forget that you've got two, related, tasks. One is to parse the
input into a series of "tokens" (numbers, operators etc) and the second
is to do things with the tokens.

I tend to use a pair of stacks myself. It's a technique I learnt from
reading the dissembly of the Sinclair Spectrum ROM.

You put numbers on one stack and operators on another. When you come to
push an operator you first see if the top operator on the stack has
higher precedence than the one you are about to push. If so, you pop
and execute it first (popping values from the number stack and putting the
answer there), repeating if necessary.

When finished, start popping and operating on the operator stack. If
either stack runs out unexpectedly, or something is left at the end, the
expresion was malformed.
 
F

Francis Moreau

Expressions are inherently recursive so trying to avoid a recursive
parser complicates the work.  If the code is not recursive, it will
have to push and pop state using an explicit stack anyway.

I agree but recursive descent parser looks doing more function calls
than the one you implemented at a first glance.

Actually I was wondering which parser would be the most efficient (in
term of speed an memory consumption) according to the simple grammar
(basic arithmetic expressions) I want to parse.
The recursion is not (usually) linked to the length of the expression
but to its depth so there is little danger of running out of stack
unless (a) you have virtually no stack (as in some embedded systems)
or (b) you have to parse mechanically generated expressions with
unbounded depth.

If (a) is your problem, look up LL(1) parsers.

I have no problem at all, I'm just wondering which parser would be the
best candidate for this purpose.

In fact a good exercice would be to implement the different possible
parsers and see how they are in practice...
 
F

Francis Moreau

Francis Moreau said:
Don't forget that you've got two, related, tasks.  One is to parse the
input into a series of "tokens" (numbers, operators etc) and the second
is to do things with the tokens.

I tend to use a pair of stacks myself.  It's a technique I learnt from
reading the dissembly of the Sinclair Spectrum ROM.  

You put numbers on one stack and operators on another.  When you come to
push an operator you first see if the top operator on the stack has
higher precedence than the one you are about to push.  If so, you pop
and execute it first (popping values from the number stack and putting the
answer there), repeating if necessary.

When finished, start popping and operating on the operator stack.  If
either stack runs out unexpectedly, or something is left at the end, the
expresion was malformed.

It sounds that you implement the Shunting-yard algorithm, don't you ?
 
N

Nick

Francis Moreau said:
It sounds that you implement the Shunting-yard algorithm, don't you ?

Having looked it up (I'd not heard of the name) it's effectively that,
but with the subsequent evaluation of the RPN version folded into the
processing. So in 3*5+2, the shunting yard would do:
3 -> output [now 3]
* -> stack [now *]
5 -> output [now 3 5]
+ pop operator stack, push +. Output now [3 5 *], operator stack [+]
2 -> output [now 3 5 * 2]
pop operator stack. output 3 5 * 2 +

Mine (well, the Spectrum's) algorithm does:
3 -> value stack [now 3]
* -> operator stack [now *]
5 -> value stack [now 3 5]
+ pop operator stack, execute '*' (pop value stack twice, do *, push
result), + -> operator stack. Value stack now [8], operator stack [+]
2 -> value stack [now 8 2]
pop operator stack. Execute +, result 10.

You lose the benefit of having a "compiled" version of the expression
for later use, but save having two separate phases.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top