Example of why C++ is NOT a context free grammar?

R

Robert

Is this an example of why C++ is not a nice grammar ( context free )?

A = B = ARR[i+1] = x+y*f(z) ;

The parser doesn't know how many equal signs to expect, so it can only
assume that each expression is an arithmetic expression in general,
since the last one does not have to be a left value ( variable name or
array expression ), but all the other ones must be a left value.
 
R

red floyd

Robert said:
Is this an example of why C++ is not a nice grammar ( context free )?

A = B = ARR[i+1] = x+y*f(z) ;

The parser doesn't know how many equal signs to expect, so it can only
assume that each expression is an arithmetic expression in general,
since the last one does not have to be a left value ( variable name or
array expression ), but all the other ones must be a left value.

Irrelevant. 90% of all programming languages aren't CFGs.
 
K

Kaz Kylheku

Is this an example of why C++ is not a nice grammar ( context free )?

A = B = ARR[i+1] = x+y*f(z) ;

No, this is not such an example. The above string can easily be
matched by some context free grammar consisting of a few production
rules governing the relationship of some simple operators.

C++ does in fact have a context-free grammar.

Its grammar, however, is ambiguous. Semantic information must be taken
into account in order to resolve ambiguities in the syntax.

This means that there are situations which require a C++ parser to
consider the emerging meaning of the translation unit being processed
before it can decide how to parse the remainder of it.
The parser doesn't know how many equal signs to expect, so it can only

That is not what context free means. A context-free grammar is one
whose production rules strictly expand single nonterminals.

For instance if you have the sentential form A B C, and there is a
rule B -> x y z for expanding B, you can derive the form A x y z
C. Consider that this B occurs in a context: it is surrounded by A
and C. But the rule can be applied without regard for context: it says
that any B whatsoever in a sentential form may be blindly replaced by
x y z. That is why a grammar which only has rules that expand a single
nonterminal symbol is called context-free. This expansion can be done
in reverse; you can identify the x y z as being a B, and reduce to
A B C, which is the essence of parsing. The x y z subsequence is
recognized as being a B without any context; A and C is not taken into
account, only the rule B -> x y z.

A non-context free grammar would have a rule with more than one symbol
in the left hand side. More than one symbol means that a pattern has
to be matched before the expansion takes place. For instance consider
the rule A B -> A x y z . This is very similar to B -> x y z
because the net effect is that B is replaced by x y z. But the A on
both sides means that this replacement can only take place for a B
that is preceded by an A. In other words, it can take place only /in
the context of/ A. Hence, not context-free. But this type of case is
still reasonably nice. In the general case, the rules can be
arbitrary, like A B -> C x y . This one means that some
nonterminal A followed by a nonterminal B, can derive a nonterminal C
followed by terminals x and y. This kind of thing would be a real
hairball to have in a programming language.
assume that each expression is an arithmetic expression in general,
since the last one does not have to be a left value ( variable name or
array expression ), but all the other ones must be a left value.

It is a semantic attribute, not a syntactic attribute, whether or not
an expression is an lvalue. As such it has nothing to do with whether
or not a grammar is context-free.
 
K

Kaz Kylheku

Robert said:
Is this an example of why C++ is not a nice grammar ( context free )?
A = B = ARR[i+1] = x+y*f(z) ;
The parser doesn't know how many equal signs to expect, so it can only
assume that each expression is an arithmetic expression in general,
since the last one does not have to be a left value ( variable name or
array expression ), but all the other ones must be a left value.

Irrelevant.  90% of all programming languages aren't CFGs.

The vast majority of programming languages do not have grammar
productions with more than one nonterminal symbol on the left. They by
and large have grammar productions with a single nonterminal on the
left of each rule. Therefore, they are context-free.

I suspect you are confusing lack of ambiguity with context-freedom.

Context-free grammars can be ambiguous. There can be more than way to
parse a string generated by a context-free grammar.

A language whose syntactic structure is defined by an ambiguous
grammar needs additional semantic rules which govern how those
ambiguities are resolved or diagnosed, but that is irrelevant to the
question of the grammar being context-free.
 
J

James Kanze

Is this an example of why C++ is not a nice grammar (context
free)?
A = B = ARR[i+1] = x+y*f(z) ;
No, this is not such an example. The above string can easily
be matched by some context free grammar consisting of a few
production rules governing the relationship of some simple
operators.
C++ does in fact have a context-free grammar.

Not according to the definition in Chomsky.
Its grammar, however, is ambiguous. Semantic information must
be taken into account in order to resolve ambiguities in the
syntax.

Which is precisely what isn't allowed in a context free grammar.

It's not necessarily really a problem, of course.
Pragmatically, it's fairly easy for e.g. the scanner to look up
a symbol in the symbol table, and return a different token if it
is the name of a type. Such hacks are necessary for a number of
languages, not just C++. (Thus, for example, the four token
sequence A(x) must be parsed differently depending on whether A
names a type or not.)

In practice, C++ has other problems, which can't be solved by
such simple hacks. Thus, the reduction of an expression like
"A(x)", where A is a typename (the previous hack), and x isn't,
depends on context: in
void f( A( x ) ) ;
it is a declaration, but in:
A a = A( x ) ;
it is an expression. (Declarations and expressions must be
separate productions, because they occur in different contexts
in the grammar.)

Unlike the hack of using feedback in the scanner to modify the
token type, this ambiguity doesn't have any simple solutions.
(I think most compilers use recursive descent with backtracking
to handle it.)
This means that there are situations which require a C++
parser to consider the emerging meaning of the translation
unit being processed before it can decide how to parse the
remainder of it.

Which means that the grammar isn't context free.

[...]
It is a semantic attribute, not a syntactic attribute, whether
or not an expression is an lvalue. As such it has nothing to
do with whether or not a grammar is context-free.

That's actually an arbitrary choice. But one made by pretty
much all statically typed languages: in any normal sense of the
words, agreement in type and number (or gender and number, in
real languages) is part of the grammar. But for pragmatic
reasons, languages prefer considering it as "semantics" in their
descriptions.
 
P

peter koch

Is this an example of why C++ is not a nice grammar ( context free )?

A = B = ARR[i+1] = x+y*f(z) ;

It is not an example of the grammer being context-free. It could be an
example of a grammer that is not nice, but that is a subjective
judgement.
The parser doesn't know how many equal signs to expect, so it can only
assume that each expression is an arithmetic expression in general,
since the last one does not have to be a left value ( variable name or
array expression ), but all the other ones must be a left value.

I do not quite understand your "left-value" here. The value of an
assignment in C is the value of the variable assigned. In C++ this is
also the case for all built-ins, and you should follow the conventions
while creating your userdefined structures.

/Peter
 
R

red floyd

Kaz said:
Robert said:
Is this an example of why C++ is not a nice grammar ( context free )?
A = B = ARR[i+1] = x+y*f(z) ;
The parser doesn't know how many equal signs to expect, so it can only
assume that each expression is an arithmetic expression in general,
since the last one does not have to be a left value ( variable name or
array expression ), but all the other ones must be a left value.
Irrelevant. 90% of all programming languages aren't CFGs.

The vast majority of programming languages do not have grammar
productions with more than one nonterminal symbol on the left. They by
and large have grammar productions with a single nonterminal on the
left of each rule. Therefore, they are context-free.

I suspect you are confusing lack of ambiguity with context-freedom.

Context-free grammars can be ambiguous. There can be more than way to
parse a string generated by a context-free grammar.

You are correct. My mistake. Thank you.
 
J

Jerry Coffin

[ ... ]
C++ does in fact have a context-free grammar.

Not really. In fact, I can hardly think of a language that really does
-- though C++ is certainly further from it than most. As Aho, Sethi and
Ullman note (example 4.11):

While it is beyond the scope of this book, the non-context-freedom
of L1 directly implies the non-context-freedom of languages like
Algol and Pascal, which require declaration of identifiers before
their use, and which allow identifiers of arbitrary length.

They go on to note the requirement to use semantic analysis to ensure
that identifiers are declared before use. In C++, this happens more
often, and (more importantly) changes the basic type of a statement.

I doubt it's possible to disambiguate most of this using a context
sensitive grammar either though. Consider the widely known situation
with:

int x(X);

and the fact that it's a function declaration if X is a type but a
variable definition if X is a rvalue; To disambiguate this with a CSG,
we'd like to do something like this:

TDEF = // a type definition
VDEF = // a variable definition
LIST = // an arbitrary list of definitions and declarations

AMBIG = TYPE IDENT '(' IDENT ')'

VDEF LIST AMBIG = DEFINITION
TDEF LIST AMBIG = FUNC_DECLARATION

but even this doesn't really work, because it provides no way to specify
that the VDEF/TDEF has to define the same name used as the second IDENT
in the AMBIG production and/or that the name it specifies must be in
scope at that point.
 
J

Jerry Coffin

Is this an example of why C++ is not a nice grammar ( context free )?

A = B = ARR[i+1] = x+y*f(z) ;

The parser doesn't know how many equal signs to expect, so it can only
assume that each expression is an arithmetic expression in general,
since the last one does not have to be a left value ( variable name or
array expression ), but all the other ones must be a left value.

Not really. This can all be recognized by a context-free grammar,
something like this:

ASSIGNMENT = LVALUE '=' RVALUE
LVALUE = IDENT
RVALUE = IDENT | EXPRESSION
EXPRESSION = ADDITION | ASSIGNMENT
ADDITION = TERM '+' TERM | TERM '-' TERM
TERM = VALUE '*' VALUE | VALUE '/' VALUE
VALUE = IDENT | '(' EXPRESSION ')' | FUNC_CALL | NUMBER
FUNC_CALL = IDENT '(' PARAM_LIST ')'
PARAM_LIST = VALUE | PARAM_LIST ',' VALUE
IDENT = NAME | NAME '[' EXPRESSION ']'

For the sake of simplicity, I've included only a smattering of operators
(arithmetic, function calls, subscripting), rather than trying to define
all of those included in C++. I've also left out things like rules for
numbers and names -- recognizing these properly would add a great deal
of irrelevant complexity.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top