Parsing, BNF, TreeTop, GhostWheel, ...

P

Philipp Kempgen

Hi,

I'm looking for a parser which can be fed with some (A)BNF-style rules.

Gave TreeTop a try
---cut---------------------------------------------------------
grammar MathGrammar
rule additive
multitive '+' additive / multitive
end
rule multitive
primary '*' multitive / primary
end
rule primary
'(' additive ')' / number
end
rule number
[1-9] [0-9]*
end
end
---cut---------------------------------------------------------

as well as GhostWheel
---cut---------------------------------------------------------
MathParser = GhostWheel.build_parser {

rule( :additive , alt( :multitive,
seq( :multitive, '+', :additive )
))
rule( :multitive , alt( seq( :primary, '*', :multitive ),
:primary
))
rule( :primary , alt( seq( '(', :additive, ')' ),
:number
))
rule( :number , seq( /[1-9]/ , zplus( /0-9/ ) ))

parser( :main, seq:)additive, eof()) )
}
---cut---------------------------------------------------------

'1+1' parses fine.
However when I change the definition of "additive" from
multitive '+' additive / multitive
to
multitive / multitive '+' additive
it fails to parse.

Is this a problem with packrat/PEG parsers in general?
If so, which parser is more appropriate?
It should hand back a parse tree.

Memory consumption is not an issue.

Regards,
Philipp
 
P

Philipp Kempgen

Philipp said:
I'm looking for a parser which can be fed with some (A)BNF-style rules. ...
If so, which parser is more appropriate?
It should hand back a parse tree.

Alternatively I'd appreciate if Regexps could return *all*
occurrences of named capture groups inside repetitions etc.
instead of just the last match for each name. Feasible?

Regards,
Philipp
 
I

Iain Barnett

rules.
=20
Alternatively I'd appreciate if Regexps could return *all*
occurrences of named capture groups inside repetitions etc.
instead of just the last match for each name. Feasible?
=20
Regards,
Philipp
=20

I was trying to do the same thing and asked about repeated named =
captures.
http://osdir.com/ml/ruby-talk/2010-07/msg00361.html

It seems that using String#scan is the closest anything Ruby has, as the =
Oniguruma regex engine doesn't support it. I think it's a real shame as =
named capture groups are really useful.

Regards,
Iain
 
R

Robert Klemme

As I understand you want /f(o)+/ =~ "foo" to return ["o", "o"] as match
for the group (used normal capturing groups for simplicity).
I was trying to do the same thing and asked about repeated named captures.
http://osdir.com/ml/ruby-talk/2010-07/msg00361.html

It seems that using String#scan is the closest anything Ruby has, as the Oniguruma regex engine doesn't support it. I think it's a real shame as named capture groups are really useful.

Regular expression engines generally do only return one match per group
- regardless of naming or not naming groups. I'm not up to date with
current Perl's regular expressions which are the only ones I can imagine
to be crazy enough to provide such a feature. :) Otherwise String#scan
is indeed the proper tool to get multiple matches. The example above
could be done like this

if /f(o+)/ =~ s
$1.scan /o/ do |match|
p match
end
end

or this

if /f(o+)/ =~ s
matches = $1.scan /o/
p matches
end

Kind regards

robert
 
I

Iain Barnett

=20
Regular expression engines generally do only return one match per =
group - regardless of naming or not naming groups. I'm not up to date =
with current Perl's regular expressions which are the only ones I can =
imagine to be crazy enough to provide such a feature. :) =20

Net definitely offers it, which is ironic as none of the .Net devs I =
used to work with ever used regex.=20

Scan is a really poor alternative (for this kind of task), I think. =
Instead of just writing one regex which expresses what I want to do much =
better, I end up having to write a lot of extra code to get the same =
effect. Scan's cool for more simple things.

Regards
Iain
 
P

Philipp Kempgen

Robert said:
As I understand you want /f(o)+/ =~ "foo" to return ["o", "o"] as match
for the group (used normal capturing groups for simplicity).

FRUIT = '(?<FRUIT> Apple|Banana|Pear|Orange)'
FRUIT_COLLECTION = '(?<FRUIT_COLLECTION> ' << FRUIT << '*)'

re = Regexp.new( '^' << FRUIT_COLLECTION << '$', Regexp::EXTENDED )
re.match( 'PearBananaApple' )[:FRUIT]
=> "Apple"

I would want e.g. matchdata[:FRUIT] to be an Array
[ 'Pear', 'Banana', 'Apple' ]
and not just 'Apple' (the last FRUIT).

Actually something similar to a concrete syntax tree / parse tree
would be even better:

ROOT "PearBananaApple"
FRUIT_COLLECTION "PearBananaApple"
FRUIT "Pear"
FRUIT "Banana"
FRUIT "Apple"

Exactly.
Anyway thanks for your input.

Regards,
Philipp
 
C

Clifford Heath

(Sent previously, but it didn't appear. Sorry if you get this twice).

Philipp said:
Is this a problem with packrat/PEG parsers in general?

Yes. PEG parsers are greedy (iterate as far as possible and never
backtrack on that), and also take the first alternative that succeeds,
without trying subsequent alternatives. It's actually quite useful
and natural, once you get the hang of it.

Pure PEGs have no lookahead, as Treetop does. It's often necessary
though, in the light of the above.

Also, your approach will yield the wrong results when you add '-' into
the mix, because of associativity. "1-2-3" will calculate as 1-(2-3).
A better way is to use iteration, see

<http://github.com/nathansobo/treetop/blob/master/examples/lambda_calculus/arithmetic.treetop>

Clifford Heath.
 
C

Clifford Heath

Christian said:

Actually, I think Citrus does a better job than Treetop (of which I'm primary maintainer and a serious user). It provides a BNF built on ruby functions and operators, and also a Treetop/PEG-like parser for grammar files as an alternative. A brief look at Kanocc indicates Citrus may be preferable over it also.

Citrus doesn't memoize by default however, you have to enable that. It also doesn't have support for semantic predicates, an important feature (for me) which I added to Treetop. Otherwise it is preferable in most ways, and I think Michael Jackson has done a sterling job. Kudos!

<http://github.com/mjijackson/citrus/>

Clifford Heath.
 
P

Philipp Kempgen

Clifford said:
Yes. PEG parsers are greedy (iterate as far as possible and never
backtrack on that), and also take the first alternative that succeeds,
without trying subsequent alternatives. It's actually quite useful
and natural, once you get the hang of it.

Pure PEGs have no lookahead, as Treetop does. It's often necessary
though, in the light of the above.

What I wanted to do is to construct/generate a parser from a bunch
of (A)BNF rules (dozends) from an RFC so backtracking is a
requirement while lookahead is not. The grammar specification for
the parser (/ parser generator) does not need to be exactly like BNF
but should not require me to think about where lookahead is needed
as this manual conversion can be error-prone.
That seems to rule out PEG parsers.

Clifford said:
Actually, I think Citrus does a better job than Treetop (of which I'm primary maintainer and a serious user). It provides a BNF built on ruby functions and operators, and also a Treetop/PEG-like parser for grammar files as an alternative. A brief look at Kanocc indicates Citrus may be preferable over it also.

Citrus doesn't memoize by default however, you have to enable that. It also doesn't have support for semantic predicates, an important feature (for me) which I added to Treetop.

Thanks a lot for the pointers Clifford and Christian!
Will have a look as soon as time permits.


Philipp
 
C

Clifford Heath

Philipp said:
Clifford Heath wrote:
What I wanted to do is to construct/generate a parser from a bunch
of (A)BNF rules (dozends) from an RFC so backtracking is a
requirement while lookahead is not. The grammar specification for
the parser (/ parser generator) does not need to be exactly like BNF
but should not require me to think about where lookahead is needed
as this manual conversion can be error-prone.

Most of the BNFs in RFCs cannot be directly automated by any parser
generator. In addition, quite a few of them are slightly incorrect
and/or incomplete, but haven't been fixed because they're always
implemented by humans, who paper over the issues, and introduce new
errors instead ;-)

The upshot is that you're not going to find a direct implementation
that works without thinking.
That seems to rule out PEG parsers.

On the contrary, I believe that a PEG parser might represent the most
direct mapping. PEGs do backtrack, just not on a successful rule or
iteration. You simply have to introduce enough lookahead to prevent
false success or excessive iteration... which is actually pretty easy
once you get the hang of it... the point here is that the *rest* of
the grammar is much more likely to look like the EBNF you started
with. Oh, yeah, you have to refactor any left-recursion too, which can
be difficult.

Clifford Heath.
 
J

Jörg W Mittag

Clifford said:
Most of the BNFs in RFCs cannot be directly automated by any parser
generator. In addition, quite a few of them are slightly incorrect
and/or incomplete, but haven't been fixed because they're always
implemented by humans, who paper over the issues, and introduce new
errors instead ;-)

The upshot is that you're not going to find a direct implementation
that works without thinking.

On the contrary, I believe that a PEG parser might represent the most
direct mapping. PEGs do backtrack, just not on a successful rule or
iteration. You simply have to introduce enough lookahead to prevent
false success or excessive iteration... which is actually pretty easy
once you get the hang of it... the point here is that the *rest* of
the grammar is much more likely to look like the EBNF you started
with. Oh, yeah, you have to refactor any left-recursion too, which can
be difficult.

Actually, while it is true that PEGs as specificied in Bryan Ford's
paper do not support left recursion, PEGs are usually implemented
using packrat parsers which *can* support direct left recursion just
fine as Alessandro Warth showed in his aptly titled paper "Packrat
Parsers Can Support Left Recursion", although with the caveat that you
lose the linear time complexity guarantee that makes packrat parsing
so compelling in the first place.

jwm
 
C

Clifford Heath

Jörg W Mittag said:
Actually, while it is true that PEGs as specificied in Bryan Ford's
paper do not support left recursion, PEGs are usually implemented
using packrat parsers which *can* support direct left recursion just
fine as Alessandro Warth showed in his aptly titled paper "Packrat
Parsers Can Support Left Recursion",

Nice, thanks for the pointer. I'm not brave enough to try to implement
it in Treetop (the builder used for code gen is a bit of a pain), but
it'd certainly be cool to have. Probably easier to add to Citrus, where
there's no code generator in the way.

Clifford Heath.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top