resolving ambiguities while parsing a c expression

J

junky_fellow

Hi,

Sorry, for asking similar questions again and again.

1) I want to know how should we reslove the ambiguities in a c
expression ?
Should we use precedence table as mentioned in K&R book or should
we refer to the ANSI C grammar ?

2) As told by many people in this newsgroup that ANSI C grammar does
not use
precedence to resolve the ambiguities. In fact, the grammar has been
written in
such a way so that no precedence is required to resolve the
ambiguities (please
correct me if I am wrong).
My question is, why "precedence" is not used in ANSI C grammar to
resolve the
ambiguities ? Is it possible to write a new ANSI C grammar file (yacc
file) that
uses precedence ? Is there any such file already available on the web
?

3) Personally, I feel "precedence and associativity table" more handy
and easy
to remember to parse a C expression. Should I continue to use this
table or
refer the C grammar for this ?

4) Any idea in which language the first C compiler was written ?

5) Do the people use lex/yacc to write modern C compilers (like gcc
and other) ?

thanks for any help .....
 
S

santosh

Hi,

Sorry, for asking similar questions again and again.

1) I want to know how should we reslove the ambiguities in a c
expression ?
Should we use precedence table as mentioned in K&R book or should
we refer to the ANSI C grammar ?

Use the precedence table as a handy guide, but the grammar as given in
the standard is the ultimate reference, if any confusion persists.
2) As told by many people in this newsgroup that ANSI C grammar does
not use
precedence to resolve the ambiguities.

To be more precise, ISO C's grammar doesn't know anything about
'precedence' of operators as such. It merely specifies the manner in
which C expressions are parsed.
In fact, the grammar has been
written in
such a way so that no precedence is required to resolve the
ambiguities (please
correct me if I am wrong).

Essentially yes, though most people find a operator precedence table
handy.
My question is, why "precedence" is not used in ANSI C grammar to
resolve the ambiguities ?

Because a grammar is much more precise than a few rules of precedence.
In fact, the apparent operator precedence in C arises out of an
interpretation of it's grammar.
Is it possible to write a new ANSI C grammar file (yacc
file) that
uses precedence ? Is there any such file already available on the web
?

Not if you define precedence as simply operator precedence. If on the
other hand, by precedence, you mean expression parsing precedence, then
yes, but in that case you've essentially reinvented a type of language
grammar. Better to use the existing well researched ones. :)
3) Personally, I feel "precedence and associativity table" more handy
and easy
to remember to parse a C expression. Should I continue to use this
table or
refer the C grammar for this ?

You're choice. However a good understanding of C's grammar will place
you in a much stronger position in understanding complex C statements.
4) Any idea in which language the first C compiler was written ?

Not sure, but I believe it was the B langauge, (which was in itself a
bare bones vesion of the earlier BPCL), with perhaps some assembly code
too.
5) Do the people use lex/yacc to write modern C compilers (like gcc
and other) ?

Sometimes. Sometimes hand-written parsers are used.
 
J

Jack Klein

Hi,

Sorry, for asking similar questions again and again.

1) I want to know how should we reslove the ambiguities in a c
expression ?

You should use the grammar in the C language standard.
Should we use precedence table as mentioned in K&R book or should
we refer to the ANSI C grammar ?

Unless you use the grammar defined by the standard, you cannot be sure
that your parser will be correct.
2) As told by many people in this newsgroup that ANSI C grammar does
not use
precedence to resolve the ambiguities. In fact, the grammar has been
written in
such a way so that no precedence is required to resolve the
ambiguities (please
correct me if I am wrong).
My question is, why "precedence" is not used in ANSI C grammar to
resolve the
ambiguities ? Is it possible to write a new ANSI C grammar file (yacc
file) that
uses precedence ? Is there any such file already available on the web
?

If you define a new computer language, as hundreds if not thousands of
others have done over the years, you can decide how to define its
syntax and structure.

In the meantime, many millions of lines of C program code have been
written by many thousands of programmers. The language has managed to
become the most universally available and one of the most widely used
in all aspects of programming.

Somehow it has managed to become extremely successful despite the fact
that its design includes choices that you don't happen to like.

As to whether it is possible to write a correct C parser using only
precedence and associativity rules, I do not know the answer, but I
suspect not.
3) Personally, I feel "precedence and associativity table" more handy
and easy
to remember to parse a C expression. Should I continue to use this
table or
refer the C grammar for this ?

Fortunately or unfortunately, your preference is not going to change
the definition of the language.
4) Any idea in which language the first C compiler was written ?

No idea. Perhaps B or BCPL.
5) Do the people use lex/yacc to write modern C compilers (like gcc
and other) ?

What people use to write compilers is off-topic here, as the C
standard does not define how compilers are written, only the results
they must produce when compiling a strictly conforming compiler.

That is true whether you are interested in C, FORTRAN, COBOL, or any
other compiler.
thanks for any help .....

If you are interested in writing compilers, you should take a look at
It is a moderated group without a high volume of
traffic, but most of the traffic it does get is relevant.
 
C

Chris Torek

(I started on a web version of this answer some time ago, but it
is currently in sad shape.)

My question is, why "precedence" is not used in ANSI C grammar to
resolve the ambiguities?

This is only answerable by those who wrote the standards: it was
their choice; the grammar they wrote is only one possible way to
write it. (Well, we can say for certain that operator precedence
alone is insufficient: associativity is required as well, if one
is to go that route.)
Is it possible to write a new ANSI C grammar file (yacc file) that
uses precedence?

In general, one can rewrite a "real" language expressed in some
original grammar into a new form (using different nonterminals) as
long as the new grammar is at least as expressive as the original.
(The second clause is required since one cannot always rewrite a
language that is recognized/generated by a LALR(1) grammar using
an LL(1) grammar, for instance.) I mean this in more than just the
trivial sense of renaming nonterminals:

<S> ::= <a>;
<a> ::= term;

(which accepts only the terminal "term") can be rewritten trivially
as, e.g.:

<S> ::= <b>;
<b> ::= term;

where we have just renamed the nonterminal <a> to <b>. Real
languages tend to be less rigid, providing opportunities for
nontrivial transforms. The tricky part is proving that one's
rewrite recognizes/generates exactly the same language.

For (much) more about grammars, other newsgroups would be more
appropriate (comp.compilers and probably comp.theory, to name two).
There is also some introductory material in the Wikipedia.
Is there any such file already available on the web?

No doubt there are many -- for instance, GNUC uses (or used to use)
a LALR(1) grammar written in Bison, using an operator precedence
sub-grammar for expressions. (C is a little tricky to parse with
a LALR(1) grammar, due to "typedef", but people have come up with
lexer hacks.)
3) Personally, I feel "precedence and associativity table" more handy
and easy to remember to parse a C expression.

This is, indeed, why people (human beings) prefer them.
 
C

Chris Torek

(The second clause is required since one cannot always rewrite a
language that is recognized/generated by a LALR(1) grammar using
an LL(1) grammar, for instance.)

Oops, too much yacc/bison in mind: that should read LR(1), not
LALR(1).
 
R

Richard Heathfield

CBFalconer said:
Use parentheses when in doubt.

That isn't always sufficient. Sometimes you need to break up the statement.
Example:

j = i++ + i++; /* ambiguous and indeed undefined */

So don't write that. Replace it with:

j = i++;
j += i++;

(assuming that's what you meant, obviously).

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
C

Chris Dollin

(That isn't /grammatically/ ambiguous, though. There's only
the one parse for it.)
But, is this ambiguous ?
j = (i++) + (i++);

It is exactly as ambiguous as the form without parentheses, and
means (or fails to mean) the same thing.

C doesn't have [1] syntactic ambiguities: if an expression is
grammatically correct, it's grammatically correct in just one
way. (That way might not have been the one you intended, or the
one the reader might expect, which is why judicious use of
parentheses can be helpful.)

But in `j = i++ + i++;` there's a more serious kind of
ambiguity: when do the three effects of the expression,
the assignment to `j` and two assignments to `i`, happen?
Well, there's two things to know. One is that C /does not
specify/ the order of evaluation of the components of
most expressions [2]: the left and right operands of `+`
can be evaluated in any order, and the implied assignment
in `E++` can happen at /any/ time in the expression evaluation.
The other thing to know is that it is /not allowed/ to
update the same variable twice in a single expression [3],
or to read a variable as well as updating it /except/ when
the reading is essential to the updating.

So `j = i++ + i++` is ambiguous, in the sense that you don't
know what order the increments happen in or when they happen,
and because it's so ambiguous, it's defined that it's /not/
defined what the effect is. Anything can happen, and it's
legal. (The C Standard doesn't even prohibit physically
impossible things from happening, although many implementations
have QoI requirements which prevent this. But /not/ the
DeathStation 2000.) Supposing `i` is initially `0`, after
`j = i++ + i++;`:

* i might be 2, or 1, or 0, or 17, or -1
* j might be 1, or 0, or 2, or 1066, or 1829.
* there might be a diagnostic on stdout, in your email,
in your boss's email, or on usenet
* the process might have terminated, or be hung

/Usually/ the results will be something boring, but the Standard
doesn't require /anything/ specific.

[1] Assuming that you can tell the difference between a type
identifier and a non-type-identifier. And that you've
fixed the dangling-else problem. Neither of which matters
inside an expression. Phew!

[2] There are specific rules for &&, ||, ? :, the comma /operator
(not argument separator), and for some details of function
call.

[3] There are rules for what counts as a "single expression",
which isn't a term of the Standard, just something I'm using
to summarise without having to say "sequence point" [4].

[4] Oops.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,576
Members
45,054
Latest member
LucyCarper

Latest Threads

Top