associativity and precedence

junky_fellow · Oct 13, 2006

Hi,

I have a very basic doubt about associativity and precedence of
operators. I am not a computer science person and may find it quite
weird.

Consider an expression

4+5*2

As per the tutorials that I read earlier, there could be two
different parse trees possible for this.

+ and *
/ \ / \
4 * + 3
/ \ / \
5 2 4 5

So, to avoid such ambiguities, the language specify the associativity
and precedence rules so that the compiler always generate a unique
parse tree for an expression.

But, what I feel that the lexical analyzer always starts reading the
tokens from left. If this is the case it will always generate the
second parse tree and the expression is equivalent to
(4+5)*2.

and there will be no ambiguity. If we start from left we will always
have only one parse tree and there is no need for any associativity and
precedence.

Consider other example,
4+5+8
The articles say that it may be treated as (4+5)+8 Or 4+(5+8) and to
avoid this associativity rule is required that says "+" operator is
left associative.
I feel that if the lexical analyzer starts from left it should always
be equivalent to (4+5)+8 and therefore no associativity rule is
required.

I know there is something terribly wrong in my understanding. So,
please let me know what is that I am missing ? Any pointers to some
good links or books that would be useful for a beginner like me.

thanks a lot for any help ....

Tak-Shing Chan · Oct 13, 2006

Hi,

I have a very basic doubt about associativity and precedence of
operators. I am not a computer science person and may find it quite
weird.

Consider an expression

4+5*2

As per the tutorials that I read earlier, there could be two
different parse trees possible for this.

+ and *
/ \ / \
4 * + 3
/ \ / \
5 2 4 5

So, to avoid such ambiguities, the language specify the associativity
and precedence rules so that the compiler always generate a unique
parse tree for an expression.

But, what I feel that the lexical analyzer always starts reading the
tokens from left. If this is the case it will always generate the
second parse tree and the expression is equivalent to
(4+5)*2.

and there will be no ambiguity. If we start from left we will always
have only one parse tree and there is no need for any associativity and
precedence.

Consider other example,
4+5+8
The articles say that it may be treated as (4+5)+8 Or 4+(5+8) and to
avoid this associativity rule is required that says "+" operator is
left associative.
I feel that if the lexical analyzer starts from left it should always
be equivalent to (4+5)+8 and therefore no associativity rule is
required.

I know there is something terribly wrong in my understanding. So,
please let me know what is that I am missing ? Any pointers to some
good links or books that would be useful for a beginner like me.

thanks a lot for any help ....

The C standard resolves this issue by specifying an
unambiguous grammar for * and +:

multiplicative-expression:
cast-expression
multiplicative-expression * cast-expression
multiplicative-expression / cast-expression
multiplicative-expression % cast-expression

additive-expression:
multiplicative-expression
additive-expression + multiplicative-expression
additive-expression - multiplicative-expression

Given this, there is only one way to parse 4+5*2, which is
your parse tree on the left.

Tak-Shing

Chris Dollin · Oct 13, 2006

Hi,

I have a very basic doubt about associativity and precedence of
operators. I am not a computer science person and may find it quite
weird.

This isn't really topical for comp.lang.c, but since precedence and
associativity (not to mention their half-brother, evaluation order)
seem to cause problems to C beginners, I'll give it a go.

Consider an expression

4+5*2

As per the tutorials that I read earlier,

(which ones?)

there could be two different parse trees possible for this.

No parse tree without a grammar. What the parse tree is depends
on the grammar that's specified. For an expression grammar,
likely productions are

exp ::= number | exp + exp | exp * exp ...

This grammar is ambiguous, as you say, since:

+ and *
/ \ / \
4 * + 3
/ \ / \
5 2 4 5

would both be valid parse trees.

So, to avoid such ambiguities, the language specify the associativity
and precedence rules so that the compiler always generate a unique
parse tree for an expression.

No. To avoid such ambiguities, an unambiguous grammar is used. /One
way/ to disambiguate is to apply precedence and associativity rules.
Another is to write the grammar so that they're unnecessary:

exp ::= additive
additive ::= additive | additive + multiplicative
multiplicative ::= operand | operand * multiplicative
operand ::= integer

Here we can say "+ associates to the left", because we can't
parse "1 + 2 + 3" as "1 + additive" because the right operand
of `+` has to be multiplicative and hence (in this grammar) can't
include a `+` operator.

Similarly we can say that `* has precedence over +` because
"1 + 2 * 3" can't be parsed as "operand * multiplicative" because
`operand` can't include `+` in this grammar.

But these "associative" and "precedence" rules are just shorthands
for talking about the grammar. The C grammar of the (either) Standard
/does not use them/; it has explicit productions in the grammar
instead.

It is, I think, a matter of opinion/style (and parser-generation tool)
which is the "better" way of describing how operators and operands
group in a language. [I lean toward defining it with a full grammar
but ensuring that the shorthand terms work clearly with it.]

But, what I feel that the lexical analyzer always starts reading the
tokens from left. If this is the case it will always generate the
second parse tree and the expression is equivalent to
(4+5)*2.

Well ... no.

First, a piece of terminology: it's not the lexical analyser that parses
the expression, it's the parser. The parser eats the lexical tokens
the lexical analyser provides.

and there will be no ambiguity. If we start from left we will always
have only one parse tree and there is no need for any associativity and
precedence.

If the parser starts from the left and eats as few tokens at a time as
possible, it has /imposed/ (left) associativity and (no) precedence on
the grammar.

Here's an expression parser. (Incomplete pseudo-code, alas, for brevity.)

parseExpression():
var L = parseOperand()
if next token is operator then
var op = parseOperator()
var R = parseOperand()
return infix( L, op, R )
else
return L
endif

This does /right/-association, without precedence. If you find an
operator, you have to /decide/ whether you want to eat the
minimum (left association) or maximum (right association) or
some intermediate (precedence) size of operand.

Consider other example,
4+5+8
The articles say that it may be treated as (4+5)+8 Or 4+(5+8) and to
avoid this associativity rule is required that says "+" operator is
left associative.

You have to disambiguate /somehow/, yes: you have to answer the
question "what should this mean?", or, at least, "what structure
should this have?"

I feel that if the lexical analyzer starts from left it should always
be equivalent to (4+5)+8 and therefore no associativity rule is
required.

The choice doesn't go away if you go left-to-right, but it's not so
easy to /see/ that it's a choice.

I made a mistake like this once [1]. I was designing and implementing
a (Pure-)functional programming language, in which I had the usual
expressions (leaving out a whole host of details):

Exp ::=
let Dec in Exp [Dec ::= Id = Exp, if it matters]
| Exp Exp
| Id

Writing the parser is dead easy:

parseExp()
var F = parseOperand()
while current token can start an Exp do
F := apply( F, parseExp() )
endwhile
return F

parseOperand()
if current token is `let` then
skip token, var D = parseDec()
skip `in`, var E = parseExp()
return let( D, E )
else
return parseId()
endif

Brill, no problem, this ensures that the Exp of a let-expression is as
long as possible, as tradition required.

What I /hadn't/ noticed was that this grammar is ambiguous, because
not only does

let D in F X

have two parses /according to the grammar/, viz

let D in (F X)
(let D in F) X

but there's a whole other issue of

F let D in X Y

which I hadn't even noticed. It took exposure to an LL(1) parser
generator to show me that although my /code/ was one-token-lookahead,
no backtracking, potter along left-to-right, my /grammar/ was not LL(1)
and Decisions Had To Be Made.

I made them by requiring that `let`s had to appear as "full expressions",
ie not embedded in other expressions unless bracketed. Since this covered
about 99.17254% of the actual code written, it was not a problem. (I
paid off the other 0.82746% of the code with a promise of a quick
assignment with the garbage collector.)

[1] OK, once /that I noticed/.

Dave Thompson · Nov 6, 2006

No. To avoid such ambiguities, an unambiguous grammar is used. /One
way/ to disambiguate is to apply precedence and associativity rules.
Another is to write the grammar so that they're unnecessary:

If the parser starts from the left and eats as few tokens at a time as
possible, it has /imposed/ (left) associativity and (no) precedence on
the grammar.

'eats' here is informal and ambiguous(!) to me. If you mean the parser
always matches fewest and most-left tokens to a rule (aka production)
RHS (right hand side), concur; in the conventional LR terminology this
is choosing reduce over shift.

Here's an expression parser. (Incomplete pseudo-code, alas, for brevity.)

parseExpression():
var L = parseOperand()
if next token is operator then
var op = parseOperator()
var R = parseOperand()
return infix( L, op, R )
else
return L
endif

This does /right/-association, without precedence. If you find an
operator, you have to /decide/ whether you want to eat the
minimum (left association) or maximum (right association) or
some intermediate (precedence) size of operand.

Assuming parseOperand() never 'eats' an (unnested) operator token,
that is either stops at or backs up over such, which is the only way
this works, this code doesn't associate at all. You need to make the
var R = parseExpression() to do right association in this technique,
which is conventionally called recursive descent. If you want left
association with the usual unidirectional stream model of input in RD,
you need backtracking. (Which is possible, but not easy to show in a
pedagogical example.)

- David.Thompson1 at worldnet.att.net

Chris Dollin · Nov 6, 2006

Dave said:
'eats' here is informal and ambiguous(!) to me.

It's meant to be informal. If that ambiguated it, that's a shame.

If you mean the parser
always matches fewest and most-left tokens to a rule (aka production)
RHS (right hand side), concur; in the conventional LR terminology this
is choosing reduce over shift.

No, I didn't mean that, although it's a possible formalisation.
GRR GRR GRR re-reading this I see I messed up completely. That's
what comes of writing code on the Friday afternoon before
taking a week's holiday (and possibly missing corrective
posts). Apologies to anyone who was mislead by the kidneys.

As Dave observed:

You need to make the
var R = parseExpression() to do right association in this technique
Yes.

which is conventionally called recursive descent.
Yes.

If you want left
association with the usual unidirectional stream model of input in RD,
you need backtracking. (Which is possible, but not easy to show in a
pedagogical example.)

You don't need backtracking:

parseExpression():
var L = parseOperand()
while next token is operator then
var op = parseOperator()
var R = parseOperand()
L := infix( L, op, R )
endwhile
return L

Unless, of course, the dingoes have been at my brain again.

raxitsheth2000 · Nov 6, 2006

Hi,

I have a very basic doubt about associativity and precedence of
operators. I am not a computer science person and may find it quite
weird.

Consider an expression

4+5*2

As per the tutorials that I read earlier, there could be two
different parse trees possible for this.

+ and *
/ \ / \
4 * + 3
/ \ / \
5 2 4 5

So, to avoid such ambiguities, the language specify the associativity
and precedence rules so that the compiler always generate a unique
parse tree for an expression.

But, what I feel that the lexical analyzer always starts reading the
tokens from left. If this is the case it will always generate the
second parse tree and the expression is equivalent to
(4+5)*2.

Reading and Evaluating are Not Same.
Lexical Ana. just separate out token (ideally) and mostly left to
right.
Parser ask LA to give next token

and there will be no ambiguity. If we start from left we will always
have only one parse tree and there is no need for any associativity and
precedence.

I think you are telling like 4+5*8 should be evaluated as (4+5)*8 by
ignoring the precedence.
as a language designer it is not good.

continuing with above rule what if someone want to do evaluation like
4+(5*8) one need to enter the expression like 5*8+4. In General if
precedence is not explicitly defined by language, then PROGRAMMER has
to manually enter the expression in left-to-right (or language defined)
order and in language defined precedence rule.

Homework, how would u write manually the expression so the equivalent
evaluation is same as of 2+(3*5)+(4*6) <--there are really no bracket
in expression.

3*5+4*6+2<----This is not Correct it is doing

i think this will help you to make clear need of operator precedence in
good language.

Consider other example,
4+5+8
The articles say that it may be treated as (4+5)+8 Or 4+(5+8) and to
avoid this associativity rule is required that says "+" operator is
left associative.
I feel that if the lexical analyzer starts from left it should always
be equivalent to (4+5)+8 and therefore no associativity rule is
required.

but precedence required so that wrong evaluation not carried out.

I know there is something terribly wrong in my understanding. So,
please let me know what is that I am missing ? Any pointers to some
good links or books that would be useful for a beginner like me.

If we start from left we will always
have only one parse tree and there is no need for any associativity and
precedence.

if compiler alwas start from left (evaluation ) then programmer is in
trouble

Think

--raxit

Order of precedence and associativity	22	Jun 15, 2010
using precedence and associativity to parse an expression	6	Feb 15, 2010
Precedence Misunderstanding	43	Jul 17, 2012
Logical operator precedence & associativity	9	Nov 11, 2006
Operator precedence.	41	May 7, 2012
Precedence,Associativity and order of evaluation	7	Nov 26, 2007
verifying precedence	15	May 1, 2014
associativity question	5	Feb 6, 2007

associativity and precedence

junky_fellow

Tak-Shing Chan

Chris Dollin

Dave Thompson

Chris Dollin

raxitsheth2000

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads