Parsing, BNF, TreeTop, GhostWheel, ...

Discussion in 'Ruby' started by Philipp Kempgen, Aug 12, 2010.

  1. Hi,

    I'm looking for a parser which can be fed with some (A)BNF-style rules.

    Gave TreeTop a try
    ---cut---------------------------------------------------------
    grammar MathGrammar
    rule additive
    multitive '+' additive / multitive
    end
    rule multitive
    primary '*' multitive / primary
    end
    rule primary
    '(' additive ')' / number
    end
    rule number
    [1-9] [0-9]*
    end
    end
    ---cut---------------------------------------------------------

    as well as GhostWheel
    ---cut---------------------------------------------------------
    MathParser = GhostWheel.build_parser {

    rule( :additive , alt( :multitive,
    seq( :multitive, '+', :additive )
    ))
    rule( :multitive , alt( seq( :primary, '*', :multitive ),
    :primary
    ))
    rule( :primary , alt( seq( '(', :additive, ')' ),
    :number
    ))
    rule( :number , seq( /[1-9]/ , zplus( /0-9/ ) ))

    parser( :main, seq:)additive, eof()) )
    }
    ---cut---------------------------------------------------------

    '1+1' parses fine.
    However when I change the definition of "additive" from
    multitive '+' additive / multitive
    to
    multitive / multitive '+' additive
    it fails to parse.

    Is this a problem with packrat/PEG parsers in general?
    If so, which parser is more appropriate?
    It should hand back a parse tree.

    Memory consumption is not an issue.

    Regards,
    Philipp
     
    Philipp Kempgen, Aug 12, 2010
    #1
    1. Advertising

  2. Regexp & named capture groups (was: Parsing, BNF, TreeTop,GhostWheel, ...)

    Philipp Kempgen wrote:
    > I'm looking for a parser which can be fed with some (A)BNF-style rules.

    ...
    > If so, which parser is more appropriate?
    > It should hand back a parse tree.


    Alternatively I'd appreciate if Regexps could return *all*
    occurrences of named capture groups inside repetitions etc.
    instead of just the last match for each name. Feasible?

    Regards,
    Philipp
     
    Philipp Kempgen, Aug 12, 2010
    #2
    1. Advertising

  3. Philipp Kempgen

    Iain Barnett Guest

    Re: Regexp & named capture groups (was: Parsing, BNF, TreeTop,GhostWheel, ...)

    On 12 Aug 2010, at 19:52, Philipp Kempgen wrote:

    > Philipp Kempgen wrote:
    >> I'm looking for a parser which can be fed with some (A)BNF-style =

    rules.
    > ...
    >> If so, which parser is more appropriate?
    >> It should hand back a parse tree.

    >=20
    > Alternatively I'd appreciate if Regexps could return *all*
    > occurrences of named capture groups inside repetitions etc.
    > instead of just the last match for each name. Feasible?
    >=20
    > Regards,
    > Philipp
    >=20


    I was trying to do the same thing and asked about repeated named =
    captures.
    http://osdir.com/ml/ruby-talk/2010-07/msg00361.html

    It seems that using String#scan is the closest anything Ruby has, as the =
    Oniguruma regex engine doesn't support it. I think it's a real shame as =
    named capture groups are really useful.

    Regards,
    Iain
     
    Iain Barnett, Aug 12, 2010
    #3
  4. Re: Regexp & named capture groups

    On 13.08.2010 00:50, Iain Barnett wrote:
    >
    > On 12 Aug 2010, at 19:52, Philipp Kempgen wrote:
    >
    >> Philipp Kempgen wrote:
    >>> I'm looking for a parser which can be fed with some (A)BNF-style rules.

    >> ...
    >>> If so, which parser is more appropriate?
    >>> It should hand back a parse tree.

    >>
    >> Alternatively I'd appreciate if Regexps could return *all*
    >> occurrences of named capture groups inside repetitions etc.
    >> instead of just the last match for each name. Feasible?


    As I understand you want /f(o)+/ =~ "foo" to return ["o", "o"] as match
    for the group (used normal capturing groups for simplicity).

    > I was trying to do the same thing and asked about repeated named captures.
    > http://osdir.com/ml/ruby-talk/2010-07/msg00361.html
    >
    > It seems that using String#scan is the closest anything Ruby has, as the Oniguruma regex engine doesn't support it. I think it's a real shame as named capture groups are really useful.


    Regular expression engines generally do only return one match per group
    - regardless of naming or not naming groups. I'm not up to date with
    current Perl's regular expressions which are the only ones I can imagine
    to be crazy enough to provide such a feature. :) Otherwise String#scan
    is indeed the proper tool to get multiple matches. The example above
    could be done like this

    if /f(o+)/ =~ s
    $1.scan /o/ do |match|
    p match
    end
    end

    or this

    if /f(o+)/ =~ s
    matches = $1.scan /o/
    p matches
    end

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Aug 13, 2010
    #4
  5. Philipp Kempgen

    Iain Barnett Guest

    Re: Regexp & named capture groups

    On 13 Aug 2010, at 11:15, Robert Klemme wrote:
    >>=20

    >=20
    > Regular expression engines generally do only return one match per =

    group - regardless of naming or not naming groups. I'm not up to date =
    with current Perl's regular expressions which are the only ones I can =
    imagine to be crazy enough to provide such a feature. :) =20

    Net definitely offers it, which is ironic as none of the .Net devs I =
    used to work with ever used regex.=20

    Scan is a really poor alternative (for this kind of task), I think. =
    Instead of just writing one regex which expresses what I want to do much =
    better, I end up having to write a lot of extra code to get the same =
    effect. Scan's cool for more simple things.

    Regards
    Iain
     
    Iain Barnett, Aug 13, 2010
    #5
  6. Re: Regexp & named capture groups

    Robert Klemme wrote:
    > On 13.08.2010 00:50, Iain Barnett wrote:
    >> On 12 Aug 2010, at 19:52, Philipp Kempgen wrote:
    >>
    >>> Philipp Kempgen wrote:
    >>>> I'm looking for a parser which can be fed with some (A)BNF-style rules.
    >>> ...
    >>>> It should hand back a parse tree.
    >>>
    >>> Alternatively I'd appreciate if Regexps could return *all*
    >>> occurrences of named capture groups inside repetitions etc.
    >>> instead of just the last match for each name. Feasible?

    >
    > As I understand you want /f(o)+/ =~ "foo" to return ["o", "o"] as match
    > for the group (used normal capturing groups for simplicity).


    FRUIT = '(?<FRUIT> Apple|Banana|Pear|Orange)'
    FRUIT_COLLECTION = '(?<FRUIT_COLLECTION> ' << FRUIT << '*)'

    re = Regexp.new( '^' << FRUIT_COLLECTION << '$', Regexp::EXTENDED )
    re.match( 'PearBananaApple' )[:FRUIT]
    => "Apple"

    I would want e.g. matchdata[:FRUIT] to be an Array
    [ 'Pear', 'Banana', 'Apple' ]
    and not just 'Apple' (the last FRUIT).

    Actually something similar to a concrete syntax tree / parse tree
    would be even better:

    ROOT "PearBananaApple"
    FRUIT_COLLECTION "PearBananaApple"
    FRUIT "Pear"
    FRUIT "Banana"
    FRUIT "Apple"

    >> the Oniguruma regex engine doesn't support it. I think it's a real shame as named capture groups are really useful.


    Exactly.
    Anyway thanks for your input.

    Regards,
    Philipp
     
    Philipp Kempgen, Aug 13, 2010
    #6
  7. (Sent previously, but it didn't appear. Sorry if you get this twice).

    Philipp Kempgen wrote:
    > Is this a problem with packrat/PEG parsers in general?


    Yes. PEG parsers are greedy (iterate as far as possible and never
    backtrack on that), and also take the first alternative that succeeds,
    without trying subsequent alternatives. It's actually quite useful
    and natural, once you get the hang of it.

    Pure PEGs have no lookahead, as Treetop does. It's often necessary
    though, in the light of the above.

    Also, your approach will yield the wrong results when you add '-' into
    the mix, because of associativity. "1-2-3" will calculate as 1-(2-3).
    A better way is to use iteration, see

    <http://github.com/nathansobo/treetop/blob/master/examples/lambda_calculus/arithmetic.treetop>

    Clifford Heath.
     
    Clifford Heath, Aug 17, 2010
    #7
  8. Christian Surlykke wrote:
    > You may want to look at: http://kanocc.rubyforge.org/


    Actually, I think Citrus does a better job than Treetop (of which I'm primary maintainer and a serious user). It provides a BNF built on ruby functions and operators, and also a Treetop/PEG-like parser for grammar files as an alternative. A brief look at Kanocc indicates Citrus may be preferable over it also.

    Citrus doesn't memoize by default however, you have to enable that. It also doesn't have support for semantic predicates, an important feature (for me) which I added to Treetop. Otherwise it is preferable in most ways, and I think Michael Jackson has done a sterling job. Kudos!

    <http://github.com/mjijackson/citrus/>

    Clifford Heath.
     
    Clifford Heath, Aug 18, 2010
    #8
  9. Clifford Heath wrote:
    > Yes. PEG parsers are greedy (iterate as far as possible and never
    > backtrack on that), and also take the first alternative that succeeds,
    > without trying subsequent alternatives. It's actually quite useful
    > and natural, once you get the hang of it.
    >
    > Pure PEGs have no lookahead, as Treetop does. It's often necessary
    > though, in the light of the above.


    What I wanted to do is to construct/generate a parser from a bunch
    of (A)BNF rules (dozends) from an RFC so backtracking is a
    requirement while lookahead is not. The grammar specification for
    the parser (/ parser generator) does not need to be exactly like BNF
    but should not require me to think about where lookahead is needed
    as this manual conversion can be error-prone.
    That seems to rule out PEG parsers.

    Clifford Heath wrote:
    > Christian Surlykke wrote:
    >> You may want to look at: http://kanocc.rubyforge.org/

    >
    > Actually, I think Citrus does a better job than Treetop (of which I'm primary maintainer and a serious user). It provides a BNF built on ruby functions and operators, and also a Treetop/PEG-like parser for grammar files as an alternative. A brief look at Kanocc indicates Citrus may be preferable over it also.
    >
    > Citrus doesn't memoize by default however, you have to enable that. It also doesn't have support for semantic predicates, an important feature (for me) which I added to Treetop.


    > <http://github.com/mjijackson/citrus/>


    Thanks a lot for the pointers Clifford and Christian!
    Will have a look as soon as time permits.


    Philipp
     
    Philipp Kempgen, Aug 19, 2010
    #9
  10. Philipp Kempgen wrote:
    > Clifford Heath wrote:
    > What I wanted to do is to construct/generate a parser from a bunch
    > of (A)BNF rules (dozends) from an RFC so backtracking is a
    > requirement while lookahead is not. The grammar specification for
    > the parser (/ parser generator) does not need to be exactly like BNF
    > but should not require me to think about where lookahead is needed
    > as this manual conversion can be error-prone.


    Most of the BNFs in RFCs cannot be directly automated by any parser
    generator. In addition, quite a few of them are slightly incorrect
    and/or incomplete, but haven't been fixed because they're always
    implemented by humans, who paper over the issues, and introduce new
    errors instead ;-)

    The upshot is that you're not going to find a direct implementation
    that works without thinking.

    > That seems to rule out PEG parsers.


    On the contrary, I believe that a PEG parser might represent the most
    direct mapping. PEGs do backtrack, just not on a successful rule or
    iteration. You simply have to introduce enough lookahead to prevent
    false success or excessive iteration... which is actually pretty easy
    once you get the hang of it... the point here is that the *rest* of
    the grammar is much more likely to look like the EBNF you started
    with. Oh, yeah, you have to refactor any left-recursion too, which can
    be difficult.

    Clifford Heath.
     
    Clifford Heath, Aug 20, 2010
    #10
  11. Clifford Heath wrote:
    > Philipp Kempgen wrote:
    >> Clifford Heath wrote:
    >> What I wanted to do is to construct/generate a parser from a bunch
    >> of (A)BNF rules (dozends) from an RFC so backtracking is a
    >> requirement while lookahead is not. The grammar specification for
    >> the parser (/ parser generator) does not need to be exactly like BNF
    >> but should not require me to think about where lookahead is needed
    >> as this manual conversion can be error-prone.

    > Most of the BNFs in RFCs cannot be directly automated by any parser
    > generator. In addition, quite a few of them are slightly incorrect
    > and/or incomplete, but haven't been fixed because they're always
    > implemented by humans, who paper over the issues, and introduce new
    > errors instead ;-)
    >
    > The upshot is that you're not going to find a direct implementation
    > that works without thinking.
    >
    >> That seems to rule out PEG parsers.

    > On the contrary, I believe that a PEG parser might represent the most
    > direct mapping. PEGs do backtrack, just not on a successful rule or
    > iteration. You simply have to introduce enough lookahead to prevent
    > false success or excessive iteration... which is actually pretty easy
    > once you get the hang of it... the point here is that the *rest* of
    > the grammar is much more likely to look like the EBNF you started
    > with. Oh, yeah, you have to refactor any left-recursion too, which can
    > be difficult.


    Actually, while it is true that PEGs as specificied in Bryan Ford's
    paper do not support left recursion, PEGs are usually implemented
    using packrat parsers which *can* support direct left recursion just
    fine as Alessandro Warth showed in his aptly titled paper "Packrat
    Parsers Can Support Left Recursion", although with the caveat that you
    lose the linear time complexity guarantee that makes packrat parsing
    so compelling in the first place.

    jwm
     
    Jörg W Mittag, Sep 3, 2010
    #11
  12. Jörg W Mittag wrote:
    > Actually, while it is true that PEGs as specificied in Bryan Ford's
    > paper do not support left recursion, PEGs are usually implemented
    > using packrat parsers which *can* support direct left recursion just
    > fine as Alessandro Warth showed in his aptly titled paper "Packrat
    > Parsers Can Support Left Recursion",


    Nice, thanks for the pointer. I'm not brave enough to try to implement
    it in Treetop (the builder used for code gen is a bit of a pain), but
    it'd certainly be cool to have. Probably easier to add to Citrus, where
    there's no code generator in the way.

    Clifford Heath.
     
    Clifford Heath, Sep 7, 2010
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Chris
    Replies:
    1
    Views:
    803
    Patrick TJ McPhee
    Feb 10, 2004
  2. David List

    BNF for XML Schema?

    David List, Apr 27, 2004, in forum: XML
    Replies:
    1
    Views:
    2,253
  3. Venky
    Replies:
    0
    Views:
    538
    Venky
    Jun 27, 2005
  4. Eric
    Replies:
    1
    Views:
    1,135
    Jack Klein
    Jan 27, 2005
  5. Christopher Rose

    Treetop - parsing to nil result

    Christopher Rose, Feb 16, 2011, in forum: Ruby
    Replies:
    2
    Views:
    181
    Clifford Heath
    Feb 16, 2011
Loading...

Share This Page