ANTLR Target for Ruby


Eric Mahurin


I think you're right about the first part, but not about the second
part; Packrat has nothing to do with LL or LALR parsing.

A PEG is a Parsing Expression Grammar. Its distinguishing features
are:

- has the usual repetition (zero or more, one or more) operators,
sequence operators, alternative operators, as well as "and predicates"
and "not predicates"

- grammars are non-ambiguous because alternatives are specified as a
set of ordered choices

- lends itself to recognition by recursive descent, and maps quite
well onto the way human beings perform recognition in their own minds

A Packrat parser is a memoizing recursive-descent parser that
recognizes input based on a PEG. Its distinguishing feature is the
memoization.

At least, that's my understanding of it.

Recursive descent parsing is a form of LL parsing. See here:

http://en.wikipedia.org/wiki/LL_parser

I'd also call packrat parsing a form of LL parsing, since recursive descent
is a form of LL parsing and packrat parsing is a form of recursive descent.
Maybe it would be LL(*) with memoization.

Like ANTLR, my Grammar project also generates recursive descent parsers,
although mine flattens all methods except where recursion is actually
required. My Grammars are non-ambiguous (first alternative wins) like
packrat parsers.
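
For concreteness, here is a minimal sketch of what memoized, ordered-choice
(packrat-style) recognition looks like. This is illustrative only; it is
neither ANTLR's nor my Grammar project's actual code:

    # Packrat-style recognition: each (rule, position) result is cached,
    # so backtracking over the same input never repeats work, and
    # ordered choice commits to the first alternative that succeeds.
    class PackratSketch
      def initialize(input)
        @input = input
        @memo  = {}   # [rule_name, pos] => end position, or nil on failure
      end

      # Apply a rule at pos, caching success *and* failure.
      def apply(rule_name, pos)
        key = [rule_name, pos]
        return @memo[key] if @memo.key?(key)
        @memo[key] = send(rule_name, pos)
      end

      # Ordered choice: the first alternative that matches wins, which
      # is what makes a PEG unambiguous by construction.
      def choice(pos, *rule_names)
        rule_names.each do |r|
          result = apply(r, pos)
          return result if result
        end
        nil
      end

      # A terminal: match a literal string, returning the new position.
      def literal(pos, str)
        @input[pos, str.length] == str ? pos + str.length : nil
      end
    end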
 

Clifford Heath

Eric said:
A quick summary is: although you lose the exact syntax
that you may want, you gain in power and flexibility.

As James has said, I don't believe the latter is necessarily true.
It's incumbent on those of us who want folks to create proper grammars
to use best practice ourselves, anyhow.
Don't you mean packrat instead of PEG?

You're right, but I identify the two, as I've never seen one used
without the other.
I was only thinking of making a Grammar engine that generated dot files for
Graphviz to get a visualization of a Grammar.

OK. That wouldn't be useful to me, because when I have to document a
grammar, those diagrams aren't as appropriate for that audience as a railroad
diagram. See my "Introduction to CQL" here, which has railroads extracted
from ANTLRWorks: <dataconstellation.com/ActiveFacts/CQLIntroduction.pdf>.

Clifford Heath.
 

Clifford Heath

Wincent said:
Like Treetop, it is fairly slow, but it works.
Nice!

But an academic itch that I've wanted to scratch for some time now has
been to turn it into a performance powerhouse: either by replacing the
dynamically generated parser with a custom Ragel lexer (packaged as a
C extension for Ruby), or by doing static grammar analysis like ANTLR
does to identify places where prediction could be used to circumvent
unnecessary backtracking and jump straight to the right alternative.

What do you think of my third option: Provide memoization only where a
test workload shows the need for it? Backtrack in every situation just
in case, but memoize where testing shows it's useful - totally pragmatic
approach...
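
Concretely, I imagine something like this (hypothetical names, just to
show the shape of the idea):

    # Always-backtracking rules, with memoization bolted on per rule,
    # only after a test workload shows it pays off.
    def memoize_rule(parser, rule_name)
      memo = {}
      bare = parser.method(rule_name)
      parser.define_singleton_method(rule_name) do |pos|
        memo.key?(pos) ? memo[pos] : (memo[pos] = bare.call(pos))
      end
    end

    # e.g. profiling shows :expression is re-tried at the same positions:
    #   memoize_rule(parser, :expression)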
One of the things I always liked about the system was that you wrote
your grammars in Ruby.

I think that's an error, based on the presumption that "is Ruby, is good".
Not all that's Ruby is golden... err... red ;-). By which I mean that
one size doesn't fit all, or we wouldn't need parsers in the first place.

Clifford Heath.
 

Clifford Heath

James said:
When I wrote GhostWheel, I was convinced the DSL was better too. Then
I tried using it and changed my mind. :)
For example, compare this JSON parser using GhostWheel's DSL:
http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ghost_wheel.rb
I also think most of the advantages you list can be gained with non-
DSL grammars. I mean, we can always dynamically build up a String in
Ruby too and that's the dumbest thing I could possibly think of.

A trick I've been wanting to use in a DSL would make your Ghostwheel
example start like this:

def setup
  @parser = GhostWheel.build_parser {%q{
    ...
  }}
end

By passing a block to build_parser, you're also passing that block's binding.
Now, after build_parser calls the block to get the string it must parse, it
can identify references to variables, including local variables in the "setup"
method, and it can get and set values for those variables, or call methods on
the referenced objects.
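
In sketch form (hypothetical code; GhostWheel doesn't currently work this
way), build_parser would look something like:

    module GhostWheel
      def self.build_parser(&block)
        grammar_source = block.call    # evaluates to the %q{...} string
        env = block.binding            # captures the caller's scope
        # While interpreting grammar_source, a bare identifier such as
        # `limit` can be resolved against the caller's local variables:
        #   env.local_variable_get(:limit)
        # and likewise set with env.local_variable_set(:limit, value).
        # ... build and return the parser, closing over env ...
      end
    end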

I plan to use this to access query parameters when I embed CQL queries into
Ruby. No need for string interpolation, just name the variable in the CQL and
the query engine can access the Ruby object that contains the value - nice!

Just another reason why a parsed language doesn't have to be second-class to
a DSL.

Clifford Heath.
 

Eric Mahurin


But I think both of those options would require me to lose the Ruby
DSL and switch to a custom, non-Ruby language for specifying grammars.
One of the things I always liked about the system was that you wrote
your grammars in Ruby. Evidently, you can only do static analysis once
you've parsed the grammar, and that would be fiendishly difficult (or
impossible?) to do with a dynamically-constructed one built on the fly
by evaluating the Ruby DSL.

Hi Wincent,

Just because you are using a Ruby DSL, you shouldn't be limited. There is
nothing that says your DSL couldn't build exactly the same data structures
that you'd build if you read/parsed in a non-Ruby DSL. In my Grammar
project, I create a Grammar tree with a Ruby DSL (simply methods that create
and combine Grammars). I can parse directly using this tree, or I can
generate highly optimized code. Here are a few optimizations: flattening
(method calls only needed for recursion), tail call optimization (right
recursion), left recursion (normally unheard of with LL), some refactoring,
and boolean minimizations. The main thing that should matter in terms of
analysis is your data structures, not how you get there. A Ruby DSL
will only enhance flexibility.
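
As a toy illustration (not my project's real API), the DSL methods can
simply return plain data nodes:

    # Each DSL call builds data; nothing is parsed until later.
    Seq = Struct.new(:items)     # sequence
    Alt = Struct.new(:choices)   # ordered choice: first match wins
    Lit = Struct.new(:string)    # literal terminal
    Rep = Struct.new(:item)      # zero-or-more repetition

    def seq(*items); Seq.new(items); end
    def alt(*choices); Alt.new(choices); end
    def lit(s); Lit.new(s); end
    def rep(item); Rep.new(item); end

    # Running the DSL just yields a tree, ready for analysis,
    # optimization, or code generation:
    digit  = alt(*("0".."9").map { |d| lit(d) })
    number = seq(digit, rep(digit))   # one or more digits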

Eric
 

Wincent Colaiuta

I think that's an error, based on the presumption that "is Ruby, is good".
Not all that's Ruby is golden... err... red ;-).

Well, using a Ruby DSL brought with it three advantages:

1. Could develop incrementally seeing as I already had a working
interpreter (the Ruby interpreter); no need to develop a custom parser
to parse the DSL. This allowed me to develop very very quickly with
minimal time between "initial idea" and "first basic, working
implementation".

2. Could leverage existing Ruby functionality to take shortcuts; for
example, regular expressions were already just sitting there waiting
to be used, or lambda blocks to create miniature "sub-parsers" and
handle embedded languages etc.

3. (Debatably) A shorter learning curve because I already know Ruby.

None of these are show-stopping "deal-winners", but they were things
that led me down the DSL-in-Ruby path. That and the fact that there
are so many DSLs written in Ruby out there that you find yourself
using them all the time (to a person with a hammer, everything looks
like a nail).

Just because you are using a Ruby DSL, you shouldn't be limited. There is
nothing that says your DSL couldn't build exactly the same data structures
that you'd build if you read/parsed in a non-Ruby DSL. In my Grammar
project, I create a Grammar tree with a Ruby DSL (simply methods that create
and combine Grammars). I can parse directly using this tree, or I can
generate highly optimized code. Here are a few optimizations: flattening
(method calls only needed for recursion), tail call optimization (right
recursion), left recursion (normally unheard of with LL)

Yes, although Pappy, written in Haskell and one of the first Packrat
parsers (if not the first), does it using higher-order functions. I do
it using Ruby continuations.
some refactoring,
and boolean minimizations. The main thing that should matter in terms of
analysis is your data structures, not how you get there. A Ruby DSL
will only enhance flexibility.

I think part of the problem is that the Ruby DSL is too open-ended, at
least as I've used it. This is simultaneously a great advantage but
also a great disadvantage when it comes to static analysis.

Let's take regular expressions, for example. By using a Ruby DSL you
can allow the user to specify something using a regular expression, and
you suddenly recognize all those regular language structures
basically "for free". But if you want to do static grammar analysis on
something that uses Ruby regular expressions, you either have to look
inside the expressions at runtime (effectively writing yet another
parser to break them down), or you have to forgo Ruby's regular
expressions and roll your own. This is because in order to do any kind
of useful static analysis you have to be able to reduce the regular
expression to a canonical DFA.
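
To make the contrast concrete (illustrative code only):

    # Opaque: an analyzer sees only a black-box Regexp object; its
    # structure is locked inside the regex source.
    digit_regexp = /[0-9]/

    # Transparent: a hypothetical DSL-level primitive keeps enough
    # structure that first-sets (and hence a DFA) can be computed.
    CharRange = Struct.new(:from, :to) do
      def first_set
        (from..to).to_a
      end
    end
    digit = CharRange.new("0", "9")
    digit.first_set   # => ["0", "1", ..., "9"]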

Another example would be the use of lambdas to provide dynamic
context-sensitivity at runtime. I use these, for example, to parse
"here documents", where you don't know what the end marker is going to
look like at the time you construct the grammar. This is a great
convenience for the user and allows you to recognize some pretty
powerful stuff, but it is also pretty much immune to static analysis
because you can't know what's going on inside the lambda.
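
A stripped-down version of the here-document case (hypothetical API,
using StringScanner):

    require "strscan"

    # The end marker isn't known until parse time, so the rule is a
    # lambda that receives it from earlier parse state.
    heredoc_body = lambda do |scanner, marker|
      lines = []
      until scanner.eos?
        line = scanner.scan(/.*\n/) or break
        return lines.join if line.strip == marker
        lines << line
      end
      nil  # marker never seen: the parse fails here
    end

    # After recognizing `<<EOS`, the grammar calls the lambda with the
    # captured marker:
    #   body = heredoc_body.call(scanner, "EOS")

No analyzer can see through heredoc_body; all it knows is that some
arbitrary computation happens at that point in the grammar.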

Cheers,
Wincent
 

Eric Mahurin


Well, using a Ruby DSL brought with it three advantages:

1. Could develop incrementally seeing as I already had a working
interpreter (the Ruby interpreter); no need to develop a custom parser
to parse the DSL. This allowed me to develop very very quickly with
minimal time between "initial idea" and "first basic, working
implementation".

2. Could leverage existing Ruby functionality to take shortcuts; for
example, regular expressions were already just sitting there waiting
to be used, or lambda blocks to create miniature "sub-parsers" and
handle embedded languages etc.

3. (Debatably) A shorter learning curve because I already know Ruby.

None of these are show-stopping "deal-winners", but they were things
that led me down the DSL-in-Ruby path. That and the fact that there
are so many DSLs written in Ruby out there that you find yourself
using them all the time (to a person with a hammer, everything looks
like a nail).



Yes, although Pappy, written in Haskell and one of the first Packrat
parsers (if not the first), does it using higher-order functions. I do
it using Ruby continuations.


If you mean parser combinator when you say "higher-order function", I would
assume that many/most parser DSLs specified in a programming language use
parser combinators. Mine does. My first approach was to parse directly
when the DSL is "executed", and I assume that is what you are doing. But
instead, you can make the DSL build a data structure (which might have
user-specified blocks/lambdas in it). You can then analyze and then parse.


Let's take regular expressions, for example. By using a Ruby DSL you
can allow the user to specify something using a regular expression and
you can suddenly recognize all those regular language structures
basically "for free". But if you want to do static grammar analysis on
something that uses Ruby regular expressions you either have to look
inside the expressions at runtime (effectively writing yet another
parser to break them down), or you have to forgo Ruby's and roll your
own. This is because in order to do any kind of useful static analysis
you have to be able to reduce the regular expression into a canonical
DFA.


I think I see what you are saying. I assume you are talking about the
"actions" primarily. If you allow user actions to play a role in parsing
(not just AST generation), it would be an issue. I suppose this could limit
some optimizations for mine (actions could make a grammar context-sensitive,
for example), but I haven't found it to be performance limiting.

If you use something like the rubynode library, which can extract Ruby parse
trees from lambdas (or any Ruby in the current interpreter), you could
possibly do some analysis of even the actions. But that probably isn't
worth the effort.

Eric
 

Wincent Colaiuta

If you mean parser combinator when you say "higher order function", I would
assume that many/most parser DSLs specified in a programming language use
parser combinators.  Mine does.  My first approach was to parse directly
when the DSL is "executed" and I assume that is what you are doing.  But,
instead you can make the DSL build a data structure (which might have user
specified blocks/lambdas in it).  You can then analyze and then parse.

Not sure if we're using the same terminology here, so I'll clarify
what I meant.

The left-recursion thing I'm talking about is described in this
thesis:

http://pdos.csail.mit.edu/~baford/packrat/thesis/

See page 69, which talks about how it "rewrites" simple left recursion
(but not indirect left recursion) by refactoring it into the
equivalent right recursive rules, and page 40, where it describes how
the desired associativity of the original left recursive construct is
preserved through the use of "higher-order functions as intermediate
parser results".

As for whether this constitutes a "parser combinator", I think so, if
we agree that a parser combinator is a "function that can take
functions as input and can also produce functions as
output" (paraphrased from the Wikipedia article on parser
combinators).

According to that definition I don't actually think that many/most
parser DSLs use parser combinators. Mine, for example, for most
purposes passes around "parslet" instances rather than functions (or
lambdas).

Cheers,
Wincent
 

Eric Mahurin


Not sure if we're using the same terminology here, so I'll clarify
what I meant.

The left-recursion thing I'm talking about is described in this
thesis:

http://pdos.csail.mit.edu/~baford/packrat/thesis/

See page 69, which talks about how it "rewrites" simple left recursion
(but not indirect left recursion) by refactoring it into the
equivalent right recursive rules, and page 40, where it describes how
the desired associativity of the original left recursive construct is
preserved through the use of "higher-order functions as intermediate
parser results".

As for whether this constitutes a "parser combinator", I think so, if
we agree that a parser combinator is a "function that can take
functions as input and can also produce functions as
output" (paraphrased from the Wikipedia article on parser
combinators).

According to that definition I don't actually think that many/most
parser DSLs use parser combinators. Mine, for example, for most
purposes passes around "parslet" instances rather than functions (or
lambdas).

Sorry, Wincent. I didn't realize you were talking about left recursion
specifically. I thought your previous response was about doing analysis and
optimizations in general. Thank you for pointing out the packrat history
with left recursion.

I also used continuations at one time to accomplish left recursion. It gave
me a nice education in their use, but they are soooo slow. I found it not
too difficult to form loops when I find left recursion. My left recursion
can do direct or indirect. I'm guessing that since you are using
continuations, your left recursion is fairly flexible too.
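
For direct left recursion the loop version is straightforward; a sketch
(hand-written, not my generated code):

    # expr <- expr '-' num / num  becomes a loop that folds left as it
    # goes, so no recursion (and no continuation) is needed at all.
    def parse_expr(s)
      value, rest = parse_num(s)
      while rest.start_with?("-")
        right, rest = parse_num(rest[1..-1])
        value -= right    # ((a - b) - c)... left associative by construction
      end
      [value, rest]
    end

    def parse_num(s)
      digits = s[/\A\d+/] or raise "expected number"
      [digits.to_i, s[digits.length..-1]]
    end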

My take on parser combinators is that functions/methods are used to combine
smaller parsers together to make larger ones. It sounds like yours does
this by building larger "parslet"s from smaller ones. Mine combines Grammar
objects.

Eric
 
