# Simple arithmetic parser example

Discussion in 'C Programming' started by Francis Moreau, Dec 8, 2009.

1. ### Francis MoreauGuest

Hello,

I would be interested see some piece of C code that parses arithmetic
expressions.

For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
precedence of each operators) and outputs the result.

I know I can simply use yacc/bison to achieve this goal, but I'm not
really interested in it.

Could anybody point out some code ?

Thanks

Francis Moreau, Dec 8, 2009

2. ### Eric SosmanGuest

Francis Moreau wrote:
> Hello,
>
> I would be interested see some piece of C code that parses arithmetic
> expressions.
>
> For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
> precedence of each operators) and outputs the result.
>
> I know I can simply use yacc/bison to achieve this goal, but I'm not
> really interested in it.
>
> Could anybody point out some code ?

You can find some algorithm explanations and code samples at:

--
Eric Sosman
lid

Eric Sosman, Dec 8, 2009

3. ### Nick KeighleyGuest

On 8 Dec, 12:31, Francis Moreau <> wrote:

> I would be interested see some piece of C code that parses arithmetic
> expressions.
>
> For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
> precedence of each operators) and outputs the result.
>
> I know I can simply use yacc/bison to achieve this goal, but I'm not
> really interested in it.
>
> Could anybody point out some code ?

there was a long rambling thread recently on the conversion of infix
to postfix some of which used full-blown parsers to solve the problem.
Try "How to convert infix to postfix"; it *is* excessivly long but
does contain some code.

Nick Keighley, Dec 8, 2009
4. ### Ben BacarisseGuest

Francis Moreau <> writes:

> I would be interested see some piece of C code that parses arithmetic
> expressions.
>
> For example, the parser could parse : "(2+3) / 10 - 4" (respecting the
> precedence of each operators) and outputs the result.
>
> I know I can simply use yacc/bison to achieve this goal, but I'm not
> really interested in it.
>
> Could anybody point out some code ?

This came up quite recently in a huge thread called "How to convert
Infix notation to postfix notation". I posted an example of parsing
using operator precedence tables as did, if I recall, a poster called
Gene. If you can't find it, I'll post again.

[The old code get corrected by posting here. The string type should
have been called String.]

--
Ben.

Ben Bacarisse, Dec 8, 2009
5. ### Ben BacarisseGuest

Ben Bacarisse <> writes:

> This came up quite recently in a huge thread called "How to convert
> Infix notation to postfix notation". I posted an example of parsing
> using operator precedence tables as did, if I recall, a poster called
> Gene. If you can't find it, I'll post again.
>
> [The old code get corrected by posting here. The string type should
> have been called String.]

Message ID:

--
Ben.

Ben Bacarisse, Dec 8, 2009
6. ### Edward A. FalkGuest

In article <>,
Francis Moreau <> wrote:
>Hello,
>
>I would be interested see some piece of C code that parses arithmetic
>expressions.

First, read up on "recursive descent parsers". They're really easy
and fun to write, and can be used for a million things.

Here's a quick & dirty one I wrote some years ago. It handles the
four basic functions (plus mod). No variables, no functions. It
should do what you want, or at least get you started.

Since all the operators were a single character, it was trivial to
break the input into tokens with a simple regex.

It doesn't do error handling (e.g. divide by zero) and it stops
parsing on syntax error rather than reporting it. Adding these
features is left as an exercise for the reader.

#!/usr/bin/python

'''Evaluate expressions.'''
# Written by Ed Falk

import re

parse_re = re.compile('([-+*/%()])')

def expr(str):
'''Evaluate simple expression using the operators ()+-*/%'''
tokens = [x.strip() for x in parse_re.split(str) if x]
return __sum(tokens)

def __sum(tokens):
rval = __prod(tokens)
while tokens:
if tokens[0] == '+':
tokens.pop(0)
rval += __prod(tokens)
elif tokens[0] == '-':
tokens.pop(0)
rval -= __prod(tokens)
else: break;
return rval

def __prod(tokens):
rval = __val(tokens)
while tokens:
if tokens[0] == '*':
tokens.pop(0)
rval *= __val(tokens)
elif tokens[0] == '/':
tokens.pop(0)
rval /= __val(tokens)
elif tokens[0] == '%':
tokens.pop(0)
rval %= __val(tokens)
else: break;
return rval

def __val(tokens):
if tokens[0] == '(':
tokens.pop(0)
rval = __sum(tokens)
if tokens[0] != ')': raise ArithmeticError
tokens.pop(0)
return rval
elif tokens[0] == '-':
tokens.pop(0)
return -__val(tokens)
else:
return float(tokens.pop(0))

if __name__ == "__main__":
print expr("1")
print expr("2+2")
print expr("1+7/8")
print expr(".875*64")
print expr("1 + 2 * 3/1.5")
print expr("(1+2)*(3+4)")
print expr("-(1+2)*(3+4)")
print expr("(1+2)*-(3+4)")
print expr("22/7")
print expr("355/113")
--
-Ed Falk,
http://thespamdiaries.blogspot.com/

Edward A. Falk, Dec 8, 2009
7. ### Phil CarmodyGuest

(Edward A. Falk) writes:
> In article <>,
> Francis Moreau <> wrote:
>>Hello,
>>
>>I would be interested see some piece of C code that parses arithmetic
>>expressions.

>
> First, read up on "recursive descent parsers". They're really easy
> and fun to write, and can be used for a million things.
>
> Here's a quick & dirty one I wrote some years ago. It handles the
> four basic functions (plus mod). No variables, no functions. It
> should do what you want, or at least get you started.
>
> Since all the operators were a single character, it was trivial to
> break the input into tokens with a simple regex.

Regular expressions aren't C. You'd need a 3rd-party library to do

> It doesn't do error handling (e.g. divide by zero) and it stops
> parsing on syntax error rather than reporting it. Adding these
> features is left as an exercise for the reader.
>
> #!/usr/bin/python

Well, I think we agreed that was legal C

> '''Evaluate expressions.'''

But that isn't.

> rval = __prod(tokens)
> if tokens[0] == '+':

I don't have many positive things to say about the white-space-
incontinent python, but am glad to see that other languages have
made the same 'mistake' as C when it comes to the length of the
assignment and equality-test operator tokens. I have never quite
understood the vehemence of those who take the contrary PoV.

Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1

Phil Carmody, Dec 8, 2009
8. ### Francis MoreauGuest

On Dec 8, 3:00 pm, Ben Bacarisse <> wrote:
> Ben Bacarisse <> writes:
> > This came up quite recently in a huge thread called "How to convert
> > Infix notation to postfix notation".  I posted an example of parsing
> > using operator precedence tables as did, if I recall, a poster called
> > Gene.  If you can't find it, I'll post again.

>
> > [The old code get corrected by posting here.  The string type should
> > have been called String.]

>
> Message ID:
>
>

Thank you Ben, I'll look into this code and the loooong thread you
pointed out.

Francis Moreau, Dec 9, 2009
9. ### Francis MoreauGuest

On Dec 8, 10:02 pm, (Edward A. Falk) wrote:
> In article <..com>,
> Francis Moreau  <> wrote:
>
> >Hello,

>
> >I would be interested see some piece of C code that parses arithmetic
> >expressions.

>
> First, read up on "recursive descent parsers".  They're really easy
> and fun to write, and can be used for a million things.
>

Indeed, looks quite simple although the recursivity may be a bad idea
for long expressions.

> Here's a quick & dirty one I wrote some years ago.  It handles the
> four basic functions (plus mod).  No variables, no functions.  It
> should do what you want, or at least get you started.
>

Thanks

Francis Moreau, Dec 9, 2009
10. ### Nick KeighleyGuest

On 8 Dec, 23:59, Phil Carmody <> wrote:
> (Edward A. Falk) writes:
>
>
>
>
>
> > In article <>,
> > Francis Moreau  <> wrote:
> >>Hello,

>
> >>I would be interested see some piece of C code that parses arithmetic
> >>expressions.

>
> > First, read up on "recursive descent parsers".  They're really easy
> > and fun to write, and can be used for a million things.

>
> > Here's a quick & dirty one I wrote some years ago.  It handles the
> > four basic functions (plus mod).  No variables, no functions.  It
> > should do what you want, or at least get you started.

>
> > Since all the operators were a single character, it was trivial to
> > break the input into tokens with a simple regex.

>
> Regular expressions aren't C. You'd need a 3rd-party library to do
> that. (Or roll your own.)
>
> > It doesn't do error handling (e.g. divide by zero) and it stops
> > parsing on syntax error rather than reporting it.  Adding these
> > features is left as an exercise for the reader.

>
> > #!/usr/bin/python

>
> Well, I think we agreed that was legal C
>
> > '''Evaluate expressions.'''

>
> But that isn't.
>
> >   rval = __prod(tokens)
> >     if tokens[0] == '+':

>
> I don't have many positive things to say about the white-space-
> incontinent python,

since it looks like my pseudo-code I quite like it. Though I'm pretty

> but am glad to see that other languages have
> made the same 'mistake' as C when it comes to the length of the
> assignment and equality-test operator tokens.

'cos python lifted all its operators from C. I've also thought making
the decision based on length was bizzare.

> I have never quite
> understood the vehemence of those who take the contrary PoV.

because the mathematics people have been using = as an equality
operator since Al-Khwarizmi was a lad (I exagerate to make a point).
Everyone knows assignment should be indicated <- (or, at a push, :=).
It took me ages to train my brain to the C-way of doing things.

Nick Keighley, Dec 9, 2009
11. ### Ben BacarisseGuest

Francis Moreau <> writes:

> On Dec 8, 10:02Â pm, (Edward A. Falk) wrote:
>> In article <>,
>> Francis Moreau Â <> wrote:

<snip>
>> >I would be interested see some piece of C code that parses arithmetic
>> >expressions.

>>
>> First, read up on "recursive descent parsers". Â They're really easy
>> and fun to write, and can be used for a million things.

>
> Indeed, looks quite simple although the recursivity may be a bad idea
> for long expressions.

Expressions are inherently recursive so trying to avoid a recursive
parser complicates the work. If the code is not recursive, it will
have to push and pop state using an explicit stack anyway.

The recursion is not (usually) linked to the length of the expression
but to its depth so there is little danger of running out of stack
unless (a) you have virtually no stack (as in some embedded systems)
or (b) you have to parse mechanically generated expressions with
unbounded depth.

If (a) is your problem, look up LL(1) parsers.

<snip>
--
Ben.

Ben Bacarisse, Dec 9, 2009
12. ### NickGuest

Francis Moreau <> writes:

> On Dec 8, 10:02Â pm, (Edward A. Falk) wrote:
>> In article <>,
>> Francis Moreau Â <> wrote:
>>
>> >Hello,

>>
>> >I would be interested see some piece of C code that parses arithmetic
>> >expressions.

>>
>> First, read up on "recursive descent parsers". Â They're really easy
>> and fun to write, and can be used for a million things.
>>

>
> Indeed, looks quite simple although the recursivity may be a bad idea
> for long expressions.

Don't forget that you've got two, related, tasks. One is to parse the
input into a series of "tokens" (numbers, operators etc) and the second
is to do things with the tokens.

I tend to use a pair of stacks myself. It's a technique I learnt from
reading the dissembly of the Sinclair Spectrum ROM.

You put numbers on one stack and operators on another. When you come to
push an operator you first see if the top operator on the stack has
higher precedence than the one you are about to push. If so, you pop
and execute it first (popping values from the number stack and putting the

When finished, start popping and operating on the operator stack. If
either stack runs out unexpectedly, or something is left at the end, the
expresion was malformed.
--
Online waterways route planner: http://canalplan.org.uk
development version: http://canalplan.eu

Nick, Dec 9, 2009
13. ### Francis MoreauGuest

On Dec 9, 2:55 pm, Ben Bacarisse <> wrote:
>
> Expressions are inherently recursive so trying to avoid a recursive
> parser complicates the work.  If the code is not recursive, it will
> have to push and pop state using an explicit stack anyway.

I agree but recursive descent parser looks doing more function calls
than the one you implemented at a first glance.

Actually I was wondering which parser would be the most efficient (in
term of speed an memory consumption) according to the simple grammar
(basic arithmetic expressions) I want to parse.

>
> The recursion is not (usually) linked to the length of the expression
> but to its depth so there is little danger of running out of stack
> unless (a) you have virtually no stack (as in some embedded systems)
> or (b) you have to parse mechanically generated expressions with
> unbounded depth.
>
> If (a) is your problem, look up LL(1) parsers.

I have no problem at all, I'm just wondering which parser would be the
best candidate for this purpose.

In fact a good exercice would be to implement the different possible
parsers and see how they are in practice...

Francis Moreau, Dec 10, 2009
14. ### Francis MoreauGuest

On Dec 9, 7:54 pm, Nick <> wrote:
> Francis Moreau <> writes:
> > On Dec 8, 10:02 pm, (Edward A. Falk) wrote:
> >> In article <>,
> >> Francis Moreau  <> wrote:

>
> >> >Hello,

>
> >> >I would be interested see some piece of C code that parses arithmetic
> >> >expressions.

>
> >> First, read up on "recursive descent parsers".  They're really easy
> >> and fun to write, and can be used for a million things.

>
> > Indeed, looks quite simple although the recursivity may be a bad idea
> > for long expressions.

>
> Don't forget that you've got two, related, tasks.  One is to parse the
> input into a series of "tokens" (numbers, operators etc) and the second
> is to do things with the tokens.
>
> I tend to use a pair of stacks myself.  It's a technique I learnt from
> reading the dissembly of the Sinclair Spectrum ROM.
>
> You put numbers on one stack and operators on another.  When you come to
> push an operator you first see if the top operator on the stack has
> higher precedence than the one you are about to push.  If so, you pop
> and execute it first (popping values from the number stack and putting the
> answer there), repeating if necessary.
>
> When finished, start popping and operating on the operator stack.  If
> either stack runs out unexpectedly, or something is left at the end, the
> expresion was malformed.

It sounds that you implement the Shunting-yard algorithm, don't you ?

Francis Moreau, Dec 10, 2009
15. ### NickGuest

Francis Moreau <> writes:

> On Dec 9, 7:54Â pm, Nick <> wrote:
>> Francis Moreau <> writes:
>> > On Dec 8, 10:02Â pm, (Edward A. Falk) wrote:
>> >> In article <>,
>> >> Francis Moreau Â <> wrote:

>>
>> >> >Hello,

>>
>> >> >I would be interested see some piece of C code that parses arithmetic
>> >> >expressions.

>>
>> >> First, read up on "recursive descent parsers". Â They're really easy
>> >> and fun to write, and can be used for a million things.

>>
>> > Indeed, looks quite simple although the recursivity may be a bad idea
>> > for long expressions.

>>
>> Don't forget that you've got two, related, tasks. Â One is to parse the
>> input into a series of "tokens" (numbers, operators etc) and the second
>> is to do things with the tokens.
>>
>> I tend to use a pair of stacks myself. Â It's a technique I learnt from
>> reading the dissembly of the Sinclair Spectrum ROM. Â
>>
>> You put numbers on one stack and operators on another. Â When you come to
>> push an operator you first see if the top operator on the stack has
>> higher precedence than the one you are about to push. Â If so, you pop
>> and execute it first (popping values from the number stack and putting the
>> answer there), repeating if necessary.
>>
>> When finished, start popping and operating on the operator stack. Â If
>> either stack runs out unexpectedly, or something is left at the end, the
>> expresion was malformed.

>
> It sounds that you implement the Shunting-yard algorithm, don't you ?

Having looked it up (I'd not heard of the name) it's effectively that,
but with the subsequent evaluation of the RPN version folded into the
processing. So in 3*5+2, the shunting yard would do:
3 -> output [now 3]
* -> stack [now *]
5 -> output [now 3 5]
+ pop operator stack, push +. Output now [3 5 *], operator stack [+]
2 -> output [now 3 5 * 2]
pop operator stack. output 3 5 * 2 +

Mine (well, the Spectrum's) algorithm does:
3 -> value stack [now 3]
* -> operator stack [now *]
5 -> value stack [now 3 5]
+ pop operator stack, execute '*' (pop value stack twice, do *, push
result), + -> operator stack. Value stack now [8], operator stack [+]
2 -> value stack [now 8 2]
pop operator stack. Execute +, result 10.

You lose the benefit of having a "compiled" version of the expression
for later use, but save having two separate phases.
--
Online waterways route planner: http://canalplan.org.uk
development version: http://canalplan.eu

Nick, Dec 10, 2009