ANTLR Target for Ruby

Discussion in 'Ruby' started by arcadio, Sep 16, 2008.

  1. arcadio

    arcadio Guest

    Hi everyone,

    I've asked the following question in the ANTLR mailing list, with no
    luck. Since it's quite Ruby-centric, perhaps someone here knows the
    answer, given the popularity of ANTLR.

    I'm in the process of migrating a YACC grammar to ANTLR.

    However, since the rest of the project is written in Ruby, I'd like to
    generate the ANTLR code in Ruby, if possible. Currently I'm generating
    Java and using JRuby to circumvent this limitation. I'd prefer to use
    MRI Ruby, though.

    I've read the wiki page related to the Ruby Target
    (http://www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget) and the blog
    of its developer (http://split-s.blogspot.com/2005/12/antlr-for-ruby.html).
    Development seems to have halted at version 3.0b4. This version could
    suit my needs, as I'm only using basic lexer and parser generation. No
    tree parsers.

    Does anyone know where to get this version? It's not listed in the
    Download directory (http://www.antlr.org/download/) and I couldn't
    find it in the development repository. Has anyone used it
    successfully?

    Thanks in advance.
     
    arcadio, Sep 16, 2008
    #1

  2. arcadio wrote:
    > I've asked the following question in the ANTLR mailing list, with no
    > luck. Since it's quite Ruby-centric, perhaps someone here knows the
    > answer, given the popularity of ANTLR.
    >
    > I'm in the process of migrating a YACC grammar to ANTLR.
    >
    > However, since the rest of the project is written in Ruby, I'd like to
    > generate the ANTLR code in Ruby, if possible.


    Have you tried just adding the required options section below your grammar
    statement? It just worked for me, though the generated code is a little
    quirky. The Ruby generator has been a standard part of ANTLR for years.

    options {
      language = Ruby;
      output = AST;
    }

    I found that ANTLR was very poor at reporting situations for which it
    couldn't generate correct code (or rather, the designer has a strange
    idea of "correct"), and in particular the lexing behaviour is very
    non-intuitive. I was attempting a very difficult grammar that needs
    LL(*) behaviour, and as I know now, needs to handle lexing using parse
    rules (no lexer at all). After some unhelpful arguments, I gave up and
    implemented my grammar using Treetop. Terribly slow, but powerful and
    simple to use, once you get the hang of PEG parsing.

    Clifford Heath.
     
    Clifford Heath, Sep 21, 2008
    #2

  3. arcadio

    Eric Mahurin Guest


    On Sun, Sep 21, 2008 at 7:56 AM, Clifford Heath <> wrote:

    > arcadio wrote:
    >
    >> I've asked the following question in the ANTLR mailing list, with no
    >> luck. Since it's quite Ruby-centric, perhaps someone here knows the
    >> answer, given the popularity of ANTLR.
    >>
    >> I'm in the process of migrating a YACC grammar to ANTLR.
    >>
    >> However, since the rest of the project is written in Ruby, I'd like to
    >> generate the ANTLR code in Ruby, if possible.
    >>

    >
    > Have you tried just adding the required options section below your grammar
    > statement? It just worked for me, though the generated code is a little
    > quirky. The Ruby generator has been a standard part of ANTLR for years.
    >
    > options {
    > language = Ruby;
    > output = AST;
    > }
    >
    > I found that ANTLR was very poor at reporting situations for which it
    > couldn't generate correct code (or rather, the designer has a strange
    > idea of "correct"), and in particular the lexing behaviour is very
    > non-intuitive. I was attempting a very difficult grammar that needs
    > LL(*) behaviour, and as I know now, needs to handle lexing using parse
    > rules (no lexer at all). After some unhelpful arguments, I gave up and
    > implemented my grammar using Treetop. Terribly slow, but powerful and
    > simple to use, once you get the hang of PEG parsing.
    >
    > Clifford Heath.



    Hey Clifford,

    I still don't know enough about packrat parsing yet. Do you think it's
    possible for this type of parser to reach the same performance as LL (or
    LALR) parsers or is there just extra overhead that you need at parse time
    (as opposed to accounted for at parser generation time)? In my
    benchmarking, the 3 packrat parsers I looked at (Treetop, GhostWheel, and
    Peggy) are 10-100x slower than pure ruby LL and LALR parsers.

    I also was a previous ANTLR user (you might find my C preprocessor in ANTLR
    2.7.6 releases) before writing my own parser. I decided to stick with LL
    parsing for my Grammar project. It does handle lexer-free parsers (but you
    can have a lexer) and handles LL(*) parsing, but with a performance penalty
    - backtracking. From my understanding, packrat parsers shine in
    backtracking by memoizing to maintain linear performance. I'm wondering if
    I could use this technique and still maintain my non-backtracking
    performance. I'm thinking I could also make an "engine" for Grammar that did
    packrat or even LALR parsing instead of LL parsing. Independently I should
    be able to translate a PEG syntax, Regexp syntax, or BNF (YACC/ANTLR) to a
    Grammar.
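    The memoization idea can be sketched in a few lines of plain Ruby (this is
    an illustration of the packrat technique only, not the Grammar gem's API):
    cache each rule's result by input position, so that re-trying a rule at
    the same offset after a backtrack is just a hash lookup.

```ruby
# A packrat-style memo table: results cached by [rule, position], so that
# re-trying a rule at the same offset after backtracking is a hash lookup.
# (Illustrative only -- not the Grammar gem's API.)
class PackratParser
  def initialize(input)
    @input = input
    @memo  = {}   # [rule_name, pos] => result
  end

  # Run a rule at pos, consulting the memo table first.
  def apply(rule_name, pos)
    key = [rule_name, pos]
    return @memo[key] if @memo.key?(key)
    @memo[key] = yield(pos)
  end

  # Example rule: one or more digits; returns [matched?, next_pos].
  def digits(pos)
    apply(:digits, pos) do |p|
      n = p
      n += 1 while n < @input.size && @input[n] =~ /[0-9]/
      n > p ? [true, n] : [false, p]
    end
  end
end

parser = PackratParser.new("123abc")
first  = parser.digits(0)   # computed once
again  = parser.digits(0)   # second call comes straight from the memo table
```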

    Also, do any of you have a JSON parser for ANTLR with a Ruby target? JSON is
    what I've been benchmarking with (because of the ruby quiz) and I'd like to
    compare against ANTLR.

    Eric
     
    Eric Mahurin, Sep 21, 2008
    #3
  4. Eric Mahurin wrote:
    > I still don't know enough about packrat parsing yet. Do you think it
    > possible for this type of parser to reach the same performance as LL (or
    > LALR) parsers or is there just extra overhead that you need at parse time


    Unless you can analyse the heck out of the source grammar, as ANTLR does,
    I think there's always going to be an overhead. Terence Parr's achievement
    is to figure out where backtracking might be needed and do it efficiently
    only when needed.

    Treetop could be improved with some local optimisations, like using Regexp,
    and perhaps looking at the rest of the rule to reduce memoization, which
    might pick up a 10x improvement or more, but I expect there'll always be a
    large overhead over a hand-crafted parser. There's a suggestion in the wind
    to implement "skip" rules, that produce no syntax nodes, for things like
    whitespace skipping.

    Currently, saying "space*" will generate one node for *each* space - Regexp
    would fix that. So even for a PEG parser, Treetop is unnecessarily slow.
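    The difference is easy to see in plain Ruby (the node shapes here are
    hypothetical, not Treetop's real classes):

```ruby
input = "   x"   # three spaces, then content

# PEG-style repetition: one syntax node allocated per matched space.
nodes = []
i = 0
while input[i] == " "
  nodes << { rule: :space, offset: i }
  i += 1
end

# Regexp-style: the whole run consumed with a single match object.
run = /\A */.match(input)[0]

nodes.size   # three objects for three spaces
run.size     # the same three characters, one MatchData
```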

    > It does handle lexer-free parsers (but you
    > can have a lexer) and handles LL(*) parsing, but with a performance penalty
    > - backtracking. From my understanding, packrat parsers shine in
    > backtracking by memoizing to maintain linear performance. I'm wondering if
    > I could use this technique and still maintain my non-backtracking
    > performance.


    The major part of the cost is the construction of so many objects.
    If you provide memoize-when-hinted, I think that'd be best.
    If you also provided a sweet metagrammar (I find your earlier Grammar
    examples make my eyes bleed), I'd be onto it like a shot. I need the
    prospect of non-Ruby code generation however...

    Perhaps an option to "memoize everything" could gather statistics based
    on actual parser behaviour with real input, and produce the backtracking
    hints automatically? That'd be really sweet, because you wouldn't need to
    be a language wizard to work out where to put them.

    > I'm thinking I could also make an "engine" for Grammar that did
    > packrat or even LALR parsing instead of LL parsing.


    Nathan was thinking about implementing Treetop using a VM. Truth is,
    if the VM had the right hints, it'd be awesome.

    > Also, any of you have JSON parser for ANTLR with a Ruby target? JSON is
    > what I've been benchmarking with (because of the ruby quiz) and I'd like to
    > compare against ANTLR.


    Not I, but I'd be surprised if Google doesn't find one. It'd be pretty
    simple to throw one together anyhow.

    Have you used ANTLRWorks BTW? It's excellent!

    Clifford Heath.
     
    Clifford Heath, Sep 23, 2008
    #4
  5. arcadio

    Eric Mahurin Guest


    On Tue, Sep 23, 2008 at 4:20 AM, Clifford Heath <> wrote:

    > Eric Mahurin wrote:
    >
    >> I still don't know enough about packrat parsing yet. Do you think it
    >> possible for this type of parser to reach the same performance as LL (or
    >> LALR) parsers or is there just extra overhead that you need at parse time
    >>

    >
    > Unless you can analyse the heck out of the source grammar, as ANTLR does,
    > I think there's always going to be an overhead. Terence Parr's achievement
    > is to figure out where backtracking might be needed and do it efficiently
    > only when needed.
    >
    > Treetop could be improved with some local optimisations, like using Regexp,
    > and perhaps looking at the rest of the rule to reduce memoization, which
    > might pick up a 10x improvement or more, but I expect there'll always be a
    > large overhead over a hand-crafted parser. There's a suggestion in the
    > wind to implement "skip" rules, that produce no syntax nodes, for things
    > like whitespace skipping.
    >
    > Currently, saying "space*" will generate one node for *each* space - Regexp
    > would fix that. So even for a PEG parser, Treetop is unnecessarily slow.
    >
    >> It does handle lexer-free parsers (but you
    >> can have a lexer) and handles LL(*) parsing, but with a performance
    >> penalty
    >> - backtracking. From my understanding, packrat parsers shine in
    >> backtracking by memoizing to maintain linear performance. I'm wondering
    >> if
    >> I could use this technique and still maintain my non-backtracking
    >> performance.
    >>

    >
    > The major part of the cost is the construction of so many objects.
    > If you provide memoize-when-hinted, I think that'd be best.
    > If you also provided a sweet metagrammar (I find your earlier Grammar
    > examples make my eyes bleed), I'd be onto it like a shot. I need the
    > prospect of non-Ruby code generation however...
    >


    I'm curious, could you give an example of what makes your eyes bleed? I
    would like to improve the usability where possible. The Grammar layer
    itself will always be just ruby objects with a ruby DSL. But, I would like
    to provide translators from various formats (BNF, PEG, regex, etc) to
    Grammar objects. I'm sure I could even give you a faster replacement for
    Regexp (using ruby2cext). These translators would simply be Grammars that
    build Grammars (or maybe the code, so you could see/edit it).

    Also, with Grammar 0.8, I've completely separated the parser generation.
    Grammar is just a light-weight layer used to hold the description of the
    language you are parsing. The real work is in the "engine" that does parser
    generation. There aren't many bounds as to what an "engine" could do with a
    Grammar. An engine could:

    - generate a parser in another target language (engine might allow an
    "action" to be simply a code string)
    - directly parse while traversing a Grammar
    - generate another type of parser - LALR, packrat ???
    - build a Graphviz diagram of a Grammar tree
    - "inspect" a Grammar (Grammar#inspect isn't useful because every Grammar
    level mainly just has a lambda)
    - translate to another format - regex, BNF, PEG

    Right now I've implemented only a few engines (some built on each other) -
    Ruby (generates very flat LL ruby parsers), RubyCall (puts method calls in
    certain places), Ruby0 (parses directly instead of generating a parser),
    Ruby2CExt (uses ruby2cext to generate a Ruby/C parser).


    > Perhaps an option to "memoize everything" could gather statistics based
    > on actual parser behaviour with real input, and produce the backtracking
    > hints automatically? That'd be really sweet, because you wouldn't need to
    > be a language wizard to work out where to put them.



    Personally, I don't like the idea of automatic backtracking. Regexp does
    this. The big problem I see with auto-backtracking is that when the parse
    fails, you just get a simple "it failed" response. Or you could possibly
    generate a tree of possible failing locations (and expectations) in the
    input. Nothing as simple as "expected this, got that, at line 123". Ever
    try to debug a large Regexp? If it doesn't parse your input, it just
    returns false and you have nothing to go on. I see the auto-backtracking
    feature of Regexp as the reason it can't do much more than that. If/when I
    make a Regexp translator, it would generate a Grammar that can backtrack on
    every alternative (except with the (?>...) no backtrack construct) to be
    compatible. I guess an "engine" could also backtrack by default.

    In my very first incarnation of Grammar (called Syntax on RAA), I did do
    auto-backtracking, but found the above problem. I'd rather it be the
    exception than the rule.

    Does Treetop auto-backtrack? How do you deal with error reporting?

    If backtracking isn't always on, do you memoize only when you might possibly
    backtrack? In Grammar, I'm wondering if it is possible to take a
    memoizing performance hit only when backtracking is allowed.

    > I'm thinking I could also make an "engine" for Grammar that did
    > packrat or even LALR parsing instead of LL parsing.
    >


    > Nathan was thinking about implementing Treetop using a VM. Truth is,
    > if the VM had the right hints, it'd be awesome.



    I could also make a Grammar engine for generating VM bytecode if that were
    possible.

    >> Also, any of you have a JSON parser for ANTLR with a Ruby target? JSON is
    >> what I've been benchmarking with (because of the ruby quiz) and I'd like
    >> to
    >> compare against ANTLR.
    >>

    >
    > Not I, but I'd be surprised if Google doesn't find one. It'd be pretty
    > simple to throw one together anyhow.



    I already found it on the antlr site, but not with a ruby target. I didn't
    want to go relearn antlr (v3 now) and how to get it to generate ruby.

    > Have you used ANTLRWorks BTW? It's excellent!


    Once. I'd like to make something like this for Grammar (at least an engine
    that graphically displays a Grammar).

    Eric
     
    Eric Mahurin, Sep 23, 2008
    #5
  6. arcadio

    James Gray Guest

    On Sep 23, 2008, at 8:58 AM, Eric Mahurin wrote:

    > I already found it on the antlr site, but not with a ruby target. I
    > didn't want to go relearn antlr (v3 now) and how to get it to
    > generate ruby.


    Hopefully it's fixed now, but last time I tried using ANTLR for Ruby
    output it was very broken. That was a while ago though (shortly after
    version three was released, I think).

    James Edward Gray II
     
    James Gray, Sep 23, 2008
    #6
  7. arcadio

    Ryan Davis Guest

    On Sep 23, 2008, at 07:08 , James Gray wrote:

    > On Sep 23, 2008, at 8:58 AM, Eric Mahurin wrote:
    >
    >> I already found it on the antlr site, but not with a ruby target.
    >> I didn't want to go relearn antlr (v3 now) and how to get it to
    >> generate ruby.

    >
    > Hopefully it's fixed now, but last time I tried using ANTLR for Ruby
    > output it was very broken. That was a while ago though (shortly
    > after version three released, I think).


    I doubt it is fixed... I tried to submit patches to get the ruby code
    to work and Terr was a PITA to no end so I walked away. He later came
    to me asking if I wanted to maintain the ruby side, but... too little,
    too late.

    I just checked perforce and the last change to ruby runtime was:

    Change 3846 on 2007/06/26 by 3 '- fixed
    various bugs related...'

    My last emails from Terr were 2007-08, so ... no, probably not fixed
    in the slightest.
     
    Ryan Davis, Sep 23, 2008
    #7
  8. arcadio

    James Gray Guest

    On Sep 23, 2008, at 1:30 PM, Ryan Davis wrote:

    >
    > On Sep 23, 2008, at 07:08 , James Gray wrote:
    >
    >> On Sep 23, 2008, at 8:58 AM, Eric Mahurin wrote:
    >>
    >>> I already found it on the antlr site, but not with a ruby target.
    >>> I didn't want to go relearn antlr (v3 now) and how to get it to
    >>> generate ruby.

    >>
    >> Hopefully it's fixed now, but last time I tried using ANTLR for
    >> Ruby output it was very broken. That was a while ago though
    >> (shortly after version three released, I think).

    >
    > I doubt it is fixed... I tried to submit patches to get the ruby
    > code to work and Terr was a PITA to no end so I walked away. He
    > later came to me asking if I wanted to maintain the ruby side,
    > but... too little, too late.
    >
    > I just checked perforce and the last change to ruby runtime was:
    >
    > Change 3846 on 2007/06/26 by 3 '- fixed
    > various bugs related...'
    >
    > My last emails from Terr were 2007-08, so ... no, probably not fixed
    > in the slightest.


    Yeah, I too felt like Ruby was a second class citizen. I was trying
    to make the switch to ANTLR as my parsing tool, but their week support
    ran me off.

    James Edward Gray II
     
    James Gray, Sep 23, 2008
    #8
  9. arcadio

    Ryan Davis Guest

    On Sep 23, 2008, at 12:19 , James Gray wrote:

    > Yeah, I too felt like Ruby was a second class citizen. I was trying
    > to make the switch to ANTLR as my parsing tool, but their week
    > support ran me off.


    I doubt they even gave it that long... *rimshot*
     
    Ryan Davis, Sep 23, 2008
    #9
  10. Eric Mahurin wrote:
    >> The major part of the cost is the construction of so many objects.
    >> If you provide memoize-when-hinted, I think that'd be best.
    >> If you also provided a sweet metagrammar (I find your earlier Grammar
    >> examples make my eyes bleed), I'd be onto it like a shot. I need the
    >> prospect of non-Ruby code generation however...

    > I'm curious, could you give an example of what makes your eyes bleed?


    I must admit I haven't looked closely at Grammar for a while, however...

    DSLs are fine for folk who don't know how to make proper grammars :).
    They tend to have a somewhat low signal-to-noise ratio, by which I mean
    there are too many extraneous characters/symbols, or symbols that aren't
    intuitively communicative.

    Every non-essential symbol in a sentence is a significant barrier
    to a newcomer, though experienced users learn to skip them and don't
    understand what the fuss is about. The problem is that the difficulty
    of grasping which symbols are significant and which aren't goes with
    the *square* of the noise-to-signal ratio.

    XML is a perfect example of how to do this wrongly :) sendmail.cf
    and Perl are even better examples ;-).

    > I would like
    > to provide translators from various formats (BNF, PEG, regex, etc) to
    > Grammar objects.


    That's what I'm talking about, good stuff.

    > Also, with Grammar 0.8, I've completely separated the parser generation.
    > Grammar is just a light-weight layer used to hold the description of the
    > language you are parsing.


    So you're already halfway to a VM, where the generated parser doesn't have
    to be human-readable at all :)

    > Personally, I don't like the idea of automatic backtracking. Regexp does
    > this. The big problem I see with auto-backtracking is that when the parse
    > fails, you just get a simple "it failed" response.


    Not the case with Treetop. It assumes that the furthest point reached is
    the failure point (not often true but an assumption that works, given...)
    and lists the text leading up to that, and enumerates the terminal tokens
    that would have allowed it to get further. This works *really* well.
    Try it - call the "failure_reason" method on a failed parse to see it
    in action.
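    The heuristic is simple enough to sketch in a few lines of Ruby (an
    illustration of the idea, not Treetop's implementation; the class and
    method names are made up): record the deepest offset any alternative
    reached and what was expected there, and report that on overall failure.

```ruby
# Track the "furthest failure" during a backtracking parse: the deepest
# input offset any alternative reached, plus the tokens expected there.
# (A sketch of the heuristic, not Treetop's actual code.)
class FurthestFailure
  attr_reader :max_pos, :expected

  def initialize
    @max_pos  = -1
    @expected = []
  end

  # Called on every mismatch; deeper failures supersede shallower ones.
  def mismatch(pos, token)
    if pos > @max_pos
      @max_pos  = pos
      @expected = [token]
    elsif pos == @max_pos
      @expected |= [token]   # merge expectations at the same depth
    end
  end

  # "expected this, got that, at line N" -- the comprehensible diagnosis.
  def reason(input)
    line = input[0, @max_pos].count("\n") + 1
    "expected #{@expected.join(' or ')}, got #{input[@max_pos].inspect} at line #{line}"
  end
end

ff = FurthestFailure.new
ff.mismatch(3, "','")   # a shallow failure...
ff.mismatch(7, "'}'")   # ...is superseded by a deeper one
ff.mismatch(7, "']'")
```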

    > Nothing as simple as "expected this, got that, at line 123".


    It's exactly as simple as that. The diagnosis might be wrong, but to
    the human, it's a comprehensible mistake.

    > Does Treetop auto-backtrack?


    That's what memoizing is for - it's the core of how PEG works.

    > If backtracking isn't always on, do you memoize only when you might possibly
    > backtrack? In Grammar, I'm wondering if it is possible to take to a
    > memoizing performance hit only when backtracking is allowed.


    I don't think you can guess when backtracking will be needed.
    As I said, it requires heavy analysis - Terence didn't get his
    PhD for nothing!

    But if you always allow backtracking ala PEG, but only memoize in
    places where experience (aka statistics collected) has shown it's
    needed, you could have the best of both worlds.

    > I already found it on the antlr site, but not with a ruby target. I didn't
    > want to go relearn antlr (v3 now) and how to get it to generate ruby.


    You just ask for Ruby in your options block at the top of the grammar.
    The Ruby target is quite strange and limited though IIRC. The lexer
    in all ANTLR grammars is definitely non-intuitive: it doesn't choose
    the longest possible match like every other lexer in the world. Terr
    isn't concerned about his users, just about big-noting himself. I
    wouldn't say "their weak support ran me off" so much as just "they ran
    me off". When he pisses someone off, that just proves his superiority
    more... and I have the emails to prove it. You mighta thought DHH had
    an attitude problem, heh, he's tame!

    >> Have you used ANTLRWorks BTW? It's excellent!
    > Once. I'd like to make something like this for Grammar (at least an engine
    > that graphically displays a Grammar).


    ANTLRWorks shows railroad diagrams, which I used to produce syntax
    documentation for CQL. But the ability to interactively test individual
    rules is great. A lot of work though - I'd have thought your skills
    were better spent elsewhere.

    Clifford Heath.
     
    Clifford Heath, Sep 24, 2008
    #10
  11. I wrote a PEG parser generator a couple of years ago to support my
    Walrus object-oriented templating system (inspired by the Cheetah
    template engine in Python, and carrying over a lot of its syntax and
    directives). Actually, the reason I did this was because I didn't have
    any luck getting ANTLR to do what I wanted it to do.

    The parser generator uses a Ruby DSL to specify the grammar,
    dynamically constructing the parser on the fly. To get an idea of what
    the DSL looks like and what it can do, check out the parser for the
    Walrus language itself; it is a fairly complex example because it
    demonstrates things like "here documents", include files, and parses a
    subset of Ruby itself in addition to the Walrus markup:

    http://git.wincent.com/Walrus.git?a=blob;f=lib/walrus/parser.rb

    Like Treetop, it is fairly slow, but it works. The major performance
    penalty actually seems to be the memory bloat associated with
    memoization. There is also a resource leak in there where a lot of the
    memoized data doesn't appear to be getting garbage-collected when you
    run the parser over lots of documents in the same session; never got
    to the bottom of that one despite spending quite a few hours on it. My
    workaround was to process multiple files in batches, running Walrus
    once for each file rather than passing it all the files at once. Not
    very elegant but it worked.

    Given that it works I haven't really done any more development on it.
    But an academic itch that I've wanted to scratch for some time now has
    been turning it into a performance powerhouse; either by replacing the
    dynamically generated parser with a custom Ragel lexer (packaged as a
    C extension for Ruby) or by doing static grammar analysis like ANTLR
    does to identify places where prediction could be used to circumvent
    unnecessary backtracking and jump straight to the right alternative.

    But I think both of those options would require me to lose the Ruby
    DSL and switch to a custom, non-Ruby language for specifying grammars.
    One of the things I always liked about the system was that you wrote
    your grammars in Ruby. Evidently, you can only do static analysis once
    you've parsed the grammar, and that would be fiendishly difficult (or
    impossible?) to do with a dynamically-constructed one built on the fly
    by evaluating the Ruby DSL.

    Cheers,
    Wincent
     
    Wincent Colaiuta, Sep 24, 2008
    #11
  12. arcadio

    Eric Mahurin Guest


    On Tue, Sep 23, 2008 at 9:04 PM, Clifford Heath <> wrote:

    > DSLs are fine for folk who don't know how to make proper grammars :).
    >


    OK, I guess that's the source of the "bleeding eyes"... It sounds like I
    won't be able to convince you why a DSL is better than a specific format,
    and vice-versa. A quick summary is: although you lose the exact syntax
    that you may want, you gain in power and flexibility. Using Grammar
    (directly), you could template Grammars (method/lambda could return a
    Grammar based on Grammar args and other params), use plain ruby for extra
    control when specifying a Grammar tree, or simply embed Grammar objects in a
    ruby program (like Regexp objects can be - although they parse the grammar
    from a string).
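    A tiny sketch of the templating point (the combinator shapes here are
    hypothetical, not the real Grammar API): because rules are plain Ruby
    objects, an ordinary method can manufacture a whole family of them from
    parameters.

```ruby
# "Rules" here are lambdas from (input, pos) to the next position, or nil
# on mismatch -- made-up shapes, just to show grammars as Ruby values.
lit = ->(s) {
  ->(input, pos) { input[pos, s.size] == s ? pos + s.size : nil }
}

# A grammar *template*: a delimited list for any item/separator pair.
delimited = ->(item, sep) {
  ->(input, pos) {
    pos = item.call(input, pos) or return nil
    loop do
      p2 = sep.call(input, pos) or break
      p3 = item.call(input, p2) or break
      pos = p3
    end
    pos
  }
}

digit = ->(input, pos) { input[pos] =~ /[0-9]/ ? pos + 1 : nil }
csv   = delimited.call(digit, lit.call(","))

csv.call("1,2,3", 0)   # consumes the whole list
csv.call(",2,3", 0)    # fails: the list must start with an item
```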

    >> I would like
    >> to provide translators from various formats (BNF, PEG, regex, etc) to
    >> Grammar objects.
    >>

    >
    > That's what I'm talking about, good stuff.
    >
    >> Also, with Grammar 0.8, I've completely separated the parser generation.
    >> Grammar is just a light-weight layer used to hold the description of the
    >> language you are parsing.
    >>

    >
    > So you're already halfway to a VM, where the generated parser doesn't have
    > to be human-readable at all :)



    Better - it is a compiler. Compiles from a Grammar to a target language.
    Grammar has always done this. Previously, Ruby code generation was done in
    the Grammar class itself. I've separated it to add quite a bit of
    flexibility.


    >
    >> Personally, I don't like the idea of automatic backtracking. Regexp does
    >> this. The big problem I see with auto-backtracking is that when the parse
    >> fails, you just get a simple "it failed" response.
    >>

    >
    > Not the case with Treetop. It assumes that the furthest point reached is
    > the failure point (not often true but an assumption that works, given...)
    > and lists the text leading up to that, and enumerates the terminal tokens
    > that would have allowed it to get further. This works *really* well.
    > Try it - call the "failure_reason" method on a failed parse to see it
    > in action.
    >


    OK. I guess it just remembers the furthest point it reached when
    backtracking and when all possibilities are exhausted, it reports this
    furthest point. Sounds reasonable. I guess I could try something like this
    when I backtrack. Right now, I ignore errors/mismatches that cause a
    backtrack.


    > Does Treetop auto-backtrack?
    >>

    >
    > That's what memoizing is for - it's the core of how PEG works.
    >


    Don't you mean packrat instead of PEG? I thought PEG was just a format for
    describing something to parse, like BNF. And packrat refers to the type of
    parser - like LL or LALR. From my understanding packrat is a cousin of LL,
    but adds memoizing to achieve linear backtracking performance.


    >
    >> If backtracking isn't always on, do you memoize only when you might
    >> possibly
    >> backtrack? In Grammar, I'm wondering if it is possible to take a
    >> memoizing performance hit only when backtracking is allowed.
    >>

    >
    > I don't think you can guess when backtracking will be needed.
    > As I said, it requires heavy analysis - Terence didn't get his
    > PhD for nothing!
    >


    In Grammar, I don't try to guess. Backtracking is normally not done. To
    allow for backtracking, I provide a Grammar#backtrack method that returns a
    new Grammar that can backtrack to the beginning of where the Grammar started
    parsing the input when it fails in the middle. I would only apply
    memoization for one of these - if this is doable.


    >> I already found it on the antlr site, but not with a ruby target. I
    >> didn't
    >> want to go relearn antlr (v3 now) and how to get it to generate ruby.
    >>

    >
    > You just ask for Ruby in your options block at the top of the grammar.
    > The Ruby target is quite strange and limited though IIRC. The lexer
    > in all ANTLR grammars is definitely non-intuitive, it doesn't choose
    > the longest possible match like every other lexer in the world. Terr
    > isn't concerned about his users, just about big-noting himself. I
    > wouldn't say "their weak support ran me off" so much as just "they ran
    > me off". When he pisses someone off, that just proves his superiority
    > more... and I have the emails to prove it. You mighta thought DHH had
    > an attitude problem, heh, he's tame!
    >


    I didn't encounter that a few years ago. I mostly just learned a lot about
    LL parsing and wanted to make something better. It does sound like others
    have been run off though...


    > Have you used ANTLRWorks BTW? It's excellent!
    >> Once. I'd like to make something like this for Grammar (at least an
    >> engine
    >> that graphically displays a Grammar).
    >>

    >
    > ANTLRWorks shows railroad diagrams, which I used to produce syntax
    > documentation for CQL. But the ability to interactively test individual
    > rules is great. A lot of work though - I'd have thought your skills
    > were better spent elsewhere.
    >


    I was only thinking of making a Grammar engine that generates dot files for
    Graphviz to get a visualization of a Grammar.

    Eric
     
    Eric Mahurin, Sep 24, 2008
    #12
  13. arcadio

    Eric Mahurin Guest


    On Wed, Sep 24, 2008 at 3:49 AM, Wincent Colaiuta <> wrote:

    > I wrote a PEG parser generator a couple of years ago to support my
    > Walrus object-oriented templating system (inspired by the Cheetah
    > template engine in Python, and carrying over a lot of its syntax and
    > directives). Actually, the reason I did this was because I didn't have
    > any luck getting ANTLR to do what I wanted it to do.



    It seems like everybody writing parser (generators) is coming from ANTLR. I
    count Clifford Heath, James Gray, Ryan Davis, Wincent, and me.

    Wincent, do you happen to have something in Walrus that parses JSON and
    generates ruby (a tree of hashes and arrays)? I've benchmarked a bunch of
    parser generators out there using JSON and would always like to add more.
     
    Eric Mahurin, Sep 24, 2008
    #13
  14. arcadio

    Dave Thomas Guest

    On Sep 24, 2008, at 7:23 AM, Eric Mahurin wrote:

    > Wincent, do you happen to have something in Walrus that parses JSON
    > and
    > generates ruby (tree of hashes and arrays)? I've benchmarked a bunch
    > of
    > parser generators out there using JSON and would always like to add
    > more.


    Have you tried the JSON parser built into 1.9? It's a native
    extension, but I don't know how its performance compares with other
    JSON parsers.
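    The "tree of hashes and arrays" output under discussion is what the
    standard json library produces; a quick sketch:

    ```ruby
    require 'json'

    text = '{"name": "arcadio", "langs": ["ruby", "java"], "posts": 3}'
    tree = JSON.parse(text)

    tree.class     # => Hash
    tree["langs"]  # => ["ruby", "java"]
    tree["posts"]  # => 3
    ```

    Objects become Hashes with string keys, arrays become Arrays, and
    scalars map onto the corresponding Ruby types.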


    Dave
     
    Dave Thomas, Sep 24, 2008
    #14
  15. arcadio

    James Gray Guest

    On Sep 24, 2008, at 7:23 AM, Eric Mahurin wrote:

    > It sounds like I won't be able to convince you that a DSL is better
    > than a specific format, and vice versa. A quick summary is: although
    > you lose the exact syntax that you may want, you gain in power and
    > flexibility. Using Grammar (directly), you could template Grammars (a
    > method/lambda could return a Grammar based on Grammar args and other
    > params), use plain ruby for extra control when specifying a Grammar
    > tree, or simply embed Grammar objects in a ruby program (like Regexp
    > objects can be - although they parse the grammar from a string).


    When I wrote GhostWheel, I was convinced the DSL was better too. Then
    I tried using it and changed my mind. :)

    The DSL adds a lot of Ruby noise that makes it harder to think about
    and understand the language being parsed. It keeps you from focusing
    on the right things.

    For example, compare this JSON parser using GhostWheel's DSL:

    http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ruby.rb

    and this using a grammar:

    http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ghost_wheel.rb

    I'm not arguing my grammar is perfect either. I think it could get
    even more readable. But have a look at how clear rules like
    value_space, value_content, value, and value_with_array_sep play out.

    I also think most of the advantages you list can be gained with non-DSL
    grammars. I mean, we can always dynamically build up a String in Ruby
    too and that's the dumbest thing I could possibly think of.

    Anyway, the two don't have to be exclusive. As you see, GhostWheel
    supports both.

    James Edward Gray II
     
    James Gray, Sep 24, 2008
    #15
  16. arcadio

    Eric Mahurin Guest


    On Wed, Sep 24, 2008 at 7:54 AM, Dave Thomas <> wrote:

    >
    > On Sep 24, 2008, at 7:23 AM, Eric Mahurin wrote:
    >
    >> Wincent, do you happen to have something in Walrus that parses JSON
    >> and generates ruby (a tree of hashes and arrays)? I've benchmarked a
    >> bunch of parser generators out there using JSON and would always like
    >> to add more.
    >>

    >
    > Have you tried the JSON parser one built in to 1.9? It's a native
    > extension, but I don't know how its performance compares with other JSON
    > parsers.
    >
    >
    > Dave
    >
    >

    Yes. This is the same as the "json" gem that Florian Frank did. Florian
    used Ragel to generate a pure C extension. The Ragel file is about 20X
    larger than my grammar specification (and most other ruby parsers')
    because all the actions are C. But it is really, really fast. This parser
    is a key benchmark for me. My Grammar-generated parser (using ruby2cext) is
    about half the speed of it now. Still 100X-1000X faster than most other
    ruby parsers out there. Dominik just gave me developer privileges for
    ruby2cext and I have ideas to double the speed again.
     
    Eric Mahurin, Sep 24, 2008
    #16
  17. arcadio

    Eric Mahurin Guest


    On Wed, Sep 24, 2008 at 8:32 AM, James Gray <>wrote:

    > On Sep 24, 2008, at 7:23 AM, Eric Mahurin wrote:
    >
    >> It sounds like I won't be able to convince you that a DSL is better
    >> than a specific format, and vice versa. A quick summary is: although
    >> you lose the exact syntax that you may want, you gain in power and
    >> flexibility. Using Grammar (directly), you could template Grammars (a
    >> method/lambda could return a Grammar based on Grammar args and other
    >> params), use plain ruby for extra control when specifying a Grammar
    >> tree, or simply embed Grammar objects in a ruby program (like Regexp
    >> objects can be - although they parse the grammar from a string).
    >>

    >
    > When I wrote GhostWheel, I was convinced the DSL was better too. Then I
    > tried using it and changed my mind. :)
    >
    > The DSL adds a lot of Ruby noise that makes it harder to think about and
    > understand the language being parsed. It keeps you from focusing on the
    > right things.
    >
    > For example, compare this JSON parser using GhostWheel's DSL:
    >
    > http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ruby.rb
    >
    > and this using a grammar:
    >
    >
    > http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ghost_wheel.rb
    >
    > I'm not arguing my grammar is perfect either. I think it could get even
    > more readable. But have a look at how clear rules like value_space,
    > value_content, value, and value_with_array_sep play out.
    >
    > I also think most of the advantages you list can be gained with non-DSL
    > grammars. I mean, we can always dynamically build up a String in Ruby too
    > and that's the dumbest thing I could possibly think of.
    >
    > Anyway, the two don't have to be exclusive. As you see, GhostWheel
    > supports both.



    I think my DSL is still fairly clean because you can simply use variable
    assignments for "rules" and use the "+" and "|" operators for sequences and
    alternations. Those help a lot. Ideally I would use a space instead of "+"
    to match BNF, but it doesn't seem that bad.
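
    Eric's actual Grammar API isn't shown in this thread, but the "+"/"|"
    operator style he describes can be sketched with plain Ruby operator
    overloading; the Node/Term/Seq/Alt names below are invented for
    illustration, not his library's classes:

    ```ruby
    # Minimal combinator sketch: each node responds to parse(str, pos),
    # returning the new position on success or nil on failure.
    class Node
      def +(other); Seq.new(self, other); end  # sequence
      def |(other); Alt.new(self, other); end  # ordered alternation
    end

    class Term < Node
      def initialize(s); @s = s; end
      def parse(str, pos)
        str[pos, @s.length] == @s ? pos + @s.length : nil
      end
    end

    class Seq < Node
      def initialize(a, b); @a, @b = a, b; end
      def parse(str, pos)
        mid = @a.parse(str, pos)
        mid && @b.parse(str, mid)
      end
    end

    class Alt < Node
      def initialize(a, b); @a, @b = a, b; end
      def parse(str, pos)
        @a.parse(str, pos) || @b.parse(str, pos)
      end
    end

    # bool <- "true" | "false";  pair <- "[" bool "]"
    bool = Term.new("true") | Term.new("false")
    pair = Term.new("[") + bool + Term.new("]")
    pair.parse("[true]", 0)   # => 6
    pair.parse("[maybe]", 0)  # => nil
    ```

    Variable assignments stand in for rules, exactly as described above,
    so grammars can be built and composed with ordinary Ruby code.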

    But, I think I'll also have converters (PEG, BNF, regex) to give the non-DSL
    approach.
     
    Eric Mahurin, Sep 24, 2008
    #17
  18. On 24 sep, 14:23, Eric Mahurin <> wrote:

    > Wincent, do you happen to have something in Walrus that parses JSON
    > and generates ruby (a tree of hashes and arrays)? I've benchmarked a
    > bunch of parser generators out there using JSON and would always like
    > to add more.


    Nope, sorry. Although my cursory knowledge of JSON syntax leads me to
    believe that the JSON grammar would be pretty short and easy to whip
    up.

    Cheers,
    Wincent
     
    Wincent Colaiuta, Sep 24, 2008
    #18
  19. On 24 sep, 14:23, Eric Mahurin <> wrote:
    >
    > On Tue, Sep 23, 2008 at 9:04 PM, Clifford Heath <> wrote:
    > > Does Treetop auto-backtrack?

    >
    > > That's what memoizing is for - it's the core of how PEG works.

    >
    > Don't you mean packrat instead of PEG? I thought PEG was just a format
    > for describing something to parse, like BNF. And packrat refers to the
    > type of parser - like LL or LALR. From my understanding packrat is a
    > cousin of LL, but adds memoizing to achieve linear backtracking
    > performance.


    I think you're right about the first part, but not about the second
    part; Packrat has nothing to do with LL or LALR parsing.

    A PEG is a Parsing Expression Grammar. Its distinguishing features
    are:

    - has the usual repetition (zero or more, one or more) operators,
    sequence operators, alternative operators, as well as "and predicates"
    and "not predicates"

    - grammars are non-ambiguous because alternatives are specified as a
    set of ordered choices

    - lends itself to recognition by recursive descent, and maps quite
    well onto the way human beings perform recognition in their own minds

    A Packrat parser is a memoizing recursive-descent parser that
    recognizes input based on a PEG. Its distinguishing feature is the
    memoization.
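
    That distinction can be sketched in a few lines of Ruby. This is an
    illustrative memoizing recursive-descent recognizer, not any particular
    library's implementation; the rule name value and its grammar are made
    up for the example:

    ```ruby
    # A memoizing ("packrat") recursive-descent recognizer: each
    # (rule, position) pair is computed at most once, which keeps
    # backtracking over ordered choices linear in the input size.
    class PackratParser
      def initialize(input)
        @input = input
        @memo  = {}  # [rule_name, pos] => result position or nil
      end

      def apply(rule, pos)
        @memo.fetch([rule, pos]) { @memo[[rule, pos]] = send(rule, pos) }
      end

      # value <- "ab" / "a"   (PEG-style ordered choice)
      def value(pos)
        literal("ab", pos) || literal("a", pos)
      end

      def literal(s, pos)
        @input[pos, s.length] == s ? pos + s.length : nil
      end
    end

    p = PackratParser.new("ab")
    p.apply(:value, 0)  # => 2 (first alternative wins)
    ```

    Dropping the @memo cache would leave a plain PEG-style recursive-descent
    recognizer; the memo table is the only thing that makes it "packrat".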

    At least, that's my understanding of it.

    Cheers,
    Wincent
     
    Wincent Colaiuta, Sep 24, 2008
    #19
  20. arcadio

    Ryan Davis Guest

    Ryan Davis, Sep 24, 2008
    #20