Is pyparsing really a recursive descent parser?

Discussion in 'Python' started by Just Another Victim of the Ambient Morality, Nov 2, 2007.

  1. Is pyparsing really a recursive descent parser? I ask this because
    there are grammars it can't parse that my recursive descent parser would
    parse, should I have written one. For instance:


    from pyparsing import *

    grammar = OneOrMore(Word(alphas)) + Literal('end')
    grammar.parseString('First Second Third end')


    Amazingly enough, this will fail to parse!
    Now, maybe I have no idea what I'm talking about but shouldn't a
    recursive descent parser recursively descend through all possible rule
    combinations to discover one that works? Why does pyparsing fail to parse
    this? Is there a way to get pyparsing to parse a grammar like this?
    Thank you...
     
    Just Another Victim of the Ambient Morality, Nov 2, 2007
    #1

  2. On Fri, 02 Nov 2007 06:05:13 +0000, Just Another Victim of the Ambient
    Morality wrote:

    > Is pyparsing really a recursive descent parser? I ask this because
    > there are grammars it can't parse that my recursive descent parser would
    > parse, should I have written one. For instance:
    >
    >
    > from pyparsing import *
    >
    > grammar = OneOrMore(Word(alphas)) + Literal('end')
    > grammar.parseString('First Second Third end')
    >
    >
    > Amazingly enough, this will fail to parse!
    > Now, maybe I have no idea what I'm talking about but shouldn't a
    > recursive descent parser recursively descend through all possible rule
    > combinations to discover one that works? Why does pyparsing fail to parse
    > this?



    Pyparsing is no recursive descent parser. It doesn't go back in the input
    stream. The ``OneOrMore(Word(alphas))`` part "eats" the 'end' and when it
    can't get more, the parser moves to the ``Literal('end')`` part which
    fails because the 'end' is already gone.
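
    A minimal reproduction of that behaviour, for readers following along
    (the exact ParseException message varies between pyparsing versions):

    from pyparsing import OneOrMore, Word, alphas, Literal, ParseException

    grammar = OneOrMore(Word(alphas)) + Literal('end')
    try:
        grammar.parseString('First Second Third end')
    except ParseException as err:
        # OneOrMore has already consumed 'end' as an ordinary word, so the
        # trailing Literal('end') has nothing left to match against.
        print(err)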

    > Is there a way to get pyparsing to parse a grammar like this?


    Negative lookahead maybe:

    grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
    + Literal('end'))
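
    Run end to end, this suggestion does accept the original input (output
    shown as a comment; ParseResults formatting may differ slightly by
    version):

    from pyparsing import OneOrMore, NotAny, Word, alphas, Literal

    grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
               + Literal('end'))
    print(grammar.parseString('First Second Third end'))
    # -> ['First', 'Second', 'Third', 'end']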

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Nov 2, 2007
    #2

  3. Just Another Victim of the Ambient Morality

    Paul McGuire Guest

    On Nov 2, 5:47 am, Marc 'BlackJack' Rintsch <> wrote:
    >
    > Pyparsing is no recursive descent parser. It doesn't go back in the input
    > stream. The ``OneOrMore(Word(alphas))`` part "eats" the 'end' and when it
    > can't get more, the parser moves to the ``Literal('end')`` part which
    > fails because the 'end' is already gone.
    >
    > > Is there a way to get pyparsing to parse a grammar like this?

    >
    > Negative lookahead maybe:
    >
    > grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
    > + Literal('end'))
    >
    > Ciao,
    > Marc 'BlackJack' Rintsch


    Well I'll be darned! All this time, I thought "recursive descent"
    described the recursive behavior of the parser, which pyparsing
    definitely has. I never knew that backing up in the face of parse
    mismatches was a required part of the picture.

    In pyparsing, each construction gets composed from more fine-grained
    constructions, and they are called recursively to match or not match
    against the input string.

    For example, taking your working parser example:

    grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
    + Literal('end'))

    This creates the following data structure:

    - And
      - OneOrMore
        - And
          - NotAny
            - Literal('end')
          - Word(alphas)
      - Literal('end')

    Every instance in this structure derives from the base class
    ParserElement, which defines a method parse(). parse() in turn calls
    preParse(), parseImpl(), and postParse(). If parseImpl succeeds, it
    returns a ParseResults object and the next location within the input
    string to continue parsing.

    The parseImpl methods are most often overridden in the subclasses (a
    few override postParse as well), such as:
    - And.parseImpl invokes parse() (note the recursion) on each of the
    expressions in its list. All must succeed or And.parseImpl throws an
    exception.
    - OneOrMore.parseImpl invokes parse() on its contained expression,
    which must succeed at least once; if not, the exception is rethrown.
    If the contained expression succeeds once, then its parse() method is
    called again and again until it fails, but no exceptions are rethrown,
    since only one match was actually required.
    - NotAny inverts the success/failure of its contained expression. If
    the expression's parse() method succeeds, NotAny.parseImpl throws an
    exception. If the contained expression's parse() method throws an
    exception, NotAny returns successfully. (NotAny does not advance the
    parse location, nor does it return any tokens.)
    - Literal and Word are terminals, in that they do not invoke any other
    expression's parse() method. Literal.parseImpl tests whether its
    string exists at the current parse location, and throws an exception
    if it doesn't. Word.parseImpl tests whether the current parse
    location contains a letter in the Word instance's set of valid initial
    characters - if so success; if not, throws an exception. It then
    advances through the input string, matching characters in the Word
    instance's set of valid body characters. The entire matched string is
    then returned, along with an updated string index at which to continue
    parsing.
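
    To make the shape of that recursion concrete, here is a toy model of the
    dispatch just described (the names ParseFail, Lit, Wd, Seq and Repeat are
    made up for illustration and are not pyparsing's real classes; error
    handling and whitespace skipping are reduced to the bare minimum):

    class ParseFail(Exception):
        pass

    def skip_ws(s, loc):
        # implicit whitespace skipping, greatly simplified
        while loc < len(s) and s[loc].isspace():
            loc += 1
        return loc

    class Lit:                               # like Literal: a terminal
        def __init__(self, text):
            self.text = text
        def parse(self, s, loc):
            loc = skip_ws(s, loc)
            if s.startswith(self.text, loc):
                return [self.text], loc + len(self.text)
            raise ParseFail('expected %r at %d' % (self.text, loc))

    class Wd:                                # like Word(alphas): a terminal
        def parse(self, s, loc):
            loc = skip_ws(s, loc)
            end = loc
            while end < len(s) and s[end].isalpha():
                end += 1
            if end == loc:
                raise ParseFail('expected a word at %d' % loc)
            return [s[loc:end]], end

    class Seq:                               # like And
        def __init__(self, *exprs):
            self.exprs = exprs
        def parse(self, s, loc):
            toks = []
            for e in self.exprs:             # recursive calls to parse()
                t, loc = e.parse(s, loc)
                toks += t
            return toks, loc

    class Repeat:                            # like OneOrMore
        def __init__(self, expr):
            self.expr = expr
        def parse(self, s, loc):
            toks, loc = self.expr.parse(s, loc)    # must match at least once
            while True:
                try:
                    t, loc = self.expr.parse(s, loc)
                    toks += t
                except ParseFail:
                    return toks, loc         # later failures end the loop

    # The same failure as the original post, reproduced by the toy model:
    g = Seq(Repeat(Wd()), Lit('end'))
    try:
        print(g.parse('First Second Third end', 0))
    except ParseFail as e:
        print(e)   # Repeat has already eaten 'end', so Lit('end') fails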

    In my concept of "recursive descent" parsing, I was under the
    impression that pyparsing's use of this data structure, and
    *recursive* calls of parse() as it *descends* through the data
    structure, constituted a recursive descent parser. What the OP
    requested was a more regular expression-ish matcher, with backup (or
    backtracking).

    Your example did not include either of the alternation classes,
    MatchFirst or Or. These classes implement a form of backtracking on
    the stack, that if we descend into an expression on the list of
    alternatives, and that expression fails, then MatchFirst or Or will
    back up to the same starting location and try the next alternative.
    (MatchFirst is an "eager" matcher, in that it will pick the first
    matching expression in its list - Or is an "exhaustive" matcher, in
    that it will evaluate all of the alternatives, and select the longest
    match.)
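
    A small illustration of the eager/exhaustive distinction (the example
    expressions here are invented for the purpose; '|' builds a MatchFirst
    and '^' builds an Or):

    from pyparsing import Word, nums, Literal

    int_part = Word(nums)
    number = Word(nums) + Literal('.') + Word(nums)

    eager = int_part | number          # MatchFirst: first match wins
    exhaustive = int_part ^ number     # Or: longest match wins

    print(eager.parseString('3.14'))        # -> ['3']
    print(exhaustive.parseString('3.14'))   # -> ['3', '.', '14']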

    The Wikipedia entry for "Recursive Descent Parser" describes this
    parser model as a "predictive parser", and later goes on to say that
    some (uncited) authors equate "predictive parser" with "recursive
    descent parsers". The article makes a special distinction for a
    "recursive parser with backup", which is what I believe the OP was
    asking for. Yes, pyparsing is definitely *not* a "recursive descent
    parser with backup." The solution, as you correctly posted, is to add
    a negative lookahead. NotAny is also represented using the '~'
    operator.

    By the way, there are a couple of other places where pyparsing
    diverges from the standard model:
    - implicit whitespace skipping
    - user-definable comment skipping
    - attachment of parse actions to expressions
    These features, while atypical, provide some added capability and ease-
    of-use in creating quick and simple parsers.

    So I will take my stance with the uncited authors who lump predictive
    parsers in with recursive descent parsers.

    -- Paul
     
    Paul McGuire, Nov 2, 2007
    #3
  4. "Paul McGuire" <> wrote in message
    news:...
    >
    > Well I'll be darned! All this time, I thought "recursive descent"
    > described the recursive behavior of the parser, which pyparsing
    > definitely has. I never knew that backing up in the face of parse
    > mismatches was a required part of the picture.


    It has recursion in it but that's not sufficient to call it a recursive
    descent parser any more than having a recursive implementation of the
    factorial function. The important part is what it recurses through...


    > In pyparsing, each construction gets composed from more fine-grained
    > constructions, and they are called recursively to match or not match
    > against the input string.
    >
    > For example, taking your working parser example:
    >
    > grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
    > + Literal('end'))
    >
    > This creates the following data structure:
    >
    > - And
    >   - OneOrMore
    >     - And
    >       - NotAny
    >         - Literal('end')
    >       - Word(alphas)
    >   - Literal('end')
    >
    > Every instance in this structure derives from the base class
    > ParserElement, which defines a method parse(). parse() in turn calls
    > preParse(), parseImpl(), and postParse(). If parseImpl succeeds, it
    > returns a ParseResults object and the next location within the input
    > string to continue parsing.
    >
    > The parseImpl methods are most often overridden in the subclasses (a
    > few override postParse as well), such as:
    > - And.parseImpl invokes parse() (note the recursion) on each of the
    > expressions in its list. All must succeed or And.parseImpl throws an
    > exception.
    > - OneOrMore.parseImpl invokes parse() on its contained expression,
    > which must succeed at least once; if not, the exception is rethrown.
    > If the contained expression succeeds once, then its parse() method is
    > called again and again until it fails, but no exceptions are rethrown,
    > since only one match was actually required.
    > - NotAny inverts the success/failure of its contained expression. If
    > the expression's parse() method succeeds, NotAny.parseImpl throws an
    > exception. If the contained expression's parse() method throws an
    > exception, NotAny returns successfully. (NotAny does not advance the
    > parse location, nor does it return any tokens.)
    > - Literal and Word are terminals, in that they do not invoke any other
    > expression's parse() method. Literal.parseImpl tests whether its
    > string exists at the current parse location, and throws an exception
    > if it doesn't. Word.parseImpl tests whether the current parse
    > location contains a letter in the Word instance's set of valid initial
    > characters - if so success; if not, throws an exception. It then
    > advances through the input string, matching characters in the Word
    > instance's set of valid body characters. The entire matched string is
    > then returned, along with an updated string index at which to continue
    > parsing.


    Thank you for the detailed description of pyparsing's implementation.


    > In my concept of "recursive descent" parsing, I was under the
    > impression that pyparsing's use of this data structure, and
    > *recursive* calls of parse() as it *descends* through the data
    > structure, constituted a recursive descent parser. What the OP
    > requested was a more regular expression-ish matcher, with backup (or
    > backtracking).


    In my concept of "recursive descent" parsing, I was under the impression
    that one should recurse through all rule combinations to ensure that the
    grammar is fully applied. As I have mentioned before, merely having
    recursion in your algorithm is insufficient. What you recurse through is
    key. pyparsing recurses through rules but what's important is to recurse
    through rule combinations. Otherwise, the parser won't be correct. That is
    to say, there will be strings that are grammatically correct that the
    parser will fail to parse. Honestly, what is the point of a recursive
    descent parser if it doesn't parse correct strings? If you're willing to
    have correct strings go unparsed, you might as well go for LALR parsing, or
    some such...
    In my opinion, the rule set I mentioned in my original post:


    grammar = OneOrMore(Word(alphas)) + Literal('end')


    ...should be translated (internally) to something like this:


    word = Word(alphas)
    grammar = Forward()
    grammar << ((word + grammar) | (word + Literal('end')))


    This allows a recursive search to find the string correct without any
    need for "backtracking," if I understand what you mean by that. Couldn't
    pyparsing have done something like this?
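
    For what it's worth, the proposed translation does run under pyparsing as
    written (Paul confirms as much later in the thread):

    from pyparsing import Word, alphas, Literal, Forward

    word = Word(alphas)
    grammar = Forward()
    grammar << ((word + grammar) | (word + Literal('end')))
    print(grammar.parseString('First Second Third end'))
    # -> ['First', 'Second', 'Third', 'end']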


    > Your example did not include either of the alternation classes,
    > MatchFirst or Or. These classes implement a form of backtracking on
    > the stack, that if we descend into an expression on the list of
    > alternatives, and that expression fails, then MatchFirst or Or will
    > back up to the same starting location and try the next alternative.
    > (MatchFirst is an "eager" matcher, in that it will pick the first
    > matching expression in its list - Or is an "exhaustive" matcher, in
    > that it will evaluate all of the alternatives, and select the longest
    > match.)


    I guess I was hoping that I could simply specify, mathematically, a
    grammar and have the software Do The Right Thing(tm)...


    > The Wikipedia entry for "Recursive Descent Parser" describes this
    > parser model as a "predictive parser", and later goes on to say that
    > some (uncited) authors equate "predictive parser" with "recursive
    > descent parsers". The article makes a special distinction for a
    > "recursive parser with backup", which is what I believe the OP was
    > asking for. Yes, pyparsing is definitely *not* a "recursive descent
    > parser with backup." The solution, as you correctly posted, is to add
    > a negative lookahead. NotAny is also represented using the '~'
    > operator.
    >
    > So I will take my stance with the uncited authors who lump predictive
    > parsers in with recursive descent parsers.


    Well, the good thing about Wikipedia is that, if you don't like the
    definition on the page, you're welcome to change it! Although, I'd
    recommend discussing it on the talk page before you do, just to make sure
    there isn't a good reason for the page to be as it is...
    Thank you...
     
    Just Another Victim of the Ambient Morality, Nov 3, 2007
    #4
  5. Just Another Victim of the Ambient Morality

    Kay Schluehr Guest

    On Nov 3, 6:33 am, "Just Another Victim of the Ambient Morality"
    <> wrote:
    > "Paul McGuire" <> wrote in message
    >
    > news:...
    >
    >
    >
    > > On Nov 2, 5:47 am, Marc 'BlackJack' Rintsch <> wrote:

    >
    > >> Pyparsing is no recursive descent parser. It doesn't go back in the
    > >> input
    > >> stream. The ``OneOrMore(Word(alphas))`` part "eats" the 'end' and when
    > >> it
    > >> can't get more, the parser moves to the ``Literal('end')`` part which
    > >> fails because the 'end' is already gone.

    >
    > >> > Is there a way to get pyparsing to parse a grammar like this?

    >
    > >> Negative lookahead maybe:

    >
    > >> grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
    > >> + Literal('end'))

    >
    > >> Ciao,
    > >> Marc 'BlackJack' Rintsch- Hide quoted text -

    >
    > >> - Show quoted text -

    >
    > > Well I'll be darned! All this time, I thought "recursive descent"
    > > described the recursive behavior of the parser, which pyparsing
    > > definitely has. I never knew that backing up in the face of parse
    > > mismatches was a required part of the picture.

    >
    > It has recursion in it but that's not sufficient to call it a recursive
    > descent parser any more than having a recursive implementation of the
    > factorial function. The important part is what it recurses through...
    >
    >
    >
    > > In pyparsing, each construction gets composed from more fine-grained
    > > constructions, and they are called recursively to match or not match
    > > against the input string.

    >
    > > For example, taking your working parser example:

    >
    > > grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
    > > + Literal('end'))

    >
    > > This creates the following data structure:

    >
    > > - And
    > > - OneOrMore
    > > - And
    > > - NotAny
    > > - Literal('end')
    > > - Word(alphas)
    > > - Literal('end')

    >
    > > Every instance in this structure derives from the base class
    > > ParserElement, which defines a method parse(). parse() in turn calls
    > > preParse(), parseImpl(), and postParse(). If parseImpl succeeds, it
    > > returns a ParseResults object and the next location within the input
    > > string to continue parsing.

    >
    > > The parseImpl methods are most often overridden in the subclasses (a
    > > few override postParse as well), such as:
    > > - And.parseImpl invokes parse() (note the recursion) on each of the
    > > expressions in its list. All must succeed or And.parseImpl throws an
    > > exception.
    > > - OneOrMore.parseImpl invokes parse() on its contained expression,
    > > which must succeed at least once; if not, the exception is rethrown.
    > > If the contained expression succeeds once, then its parse() method is
    > > called again and again until it fails, but no exceptions are rethrown,
    > > since only one match was actually required.
    > > - NotAny inverts the success/failure of its contained expression. If
    > > the expression's parse() method succeeds, NotAny.parseImpl throws an
    > > exception. If the contained expression's parse() method throws an
    > > exception, NotAny returns successfully. (NotAny does not advance the
    > > parse location, nor does it return any tokens.)
    > > - Literal and Word are terminals, in that they do not invoke any other
    > > expression's parse() method. Literal.parseImpl tests whether its
    > > string exists at the current parse location, and throws an exception
    > > if it doesn't. Word.parseImpl tests whether the current parse
    > > location contains a letter in the Word instance's set of valid initial
    > > characters - if so success; if not, throws an exception. It then
    > > advances through the input string, matching characters in the Word
    > > instance's set of valid body characters. The entire matched string is
    > > then returned, along with an updated string index at which to continue
    > > parsing.

    >
    > Thank you for the detailed description of pyparsing's implementation.
    >
    > > In my concept of "recursive descent" parsing, I was under the
    > > impression that pyparsing's use of this data structure, and
    > > *recursive* calls of parse() as it *descends* through the data
    > > structure, constituted a recursive descent parser. What the OP
    > > requested was a more regular expression-ish matcher, with backup (or
    > > backtracking).

    >
    > In my concept of "recursive descent" parsing, I was under the impression
    > that one should recurse through all rule combinations to ensure that the
    > grammar is fully applied. As I have mentioned before, merely having
    > recursion in your algorithm is insufficient. What you recurse through is
    > key. pyparsing recurses through rules but what's important is to recurse
    > through rule combinations.


    I think the confusing aspect of pyparsing for someone like me, coming
    from an (E)BNF and formal-language background, is that scanning and
    parsing are merged into one. pyparsing is scannerless: where a
    tokenizer such as Python's performs longest-match tokenization and the
    parser then interprets CFGs correctly (within the bounds of the
    parser's power), in pyparsing one has to disambiguate the grammar by
    hand where one would expect a CFG simply to be established.

    Note that I don't like pyparsing's approach, and it is clearly not for
    everyone. My own experience is rather that token definitions can be
    reused across different languages. I count 65 tokens for Python, and I
    added 10 tokens for an experimental C++ parser and deactivated some
    others. The distance between the two languages at the lexical level is
    not that huge.

    What pyparsing has established in the Python community is something
    different, IMO: a readable, verbose pattern-matching syntax that is not
    regexp line noise. Around 8 years ago, when I started using Python, I
    found a little module called reverb.py. It was actually nothing but a
    front end for regular expressions. It never made much of a career, but I
    think it is the only module of its kind that, in my own case, survived
    the anagrammatical release hopping from 1.5.2 to 2.5.1:

    >>> from reverb import *
    >>> p = required(required(alphanum)+whitespace)+text("end")
    >>> p
    <reverb.raw instance at 0x00C0D260>
    >>> RE(p).pattern
    '(?:\\w+\\s)+end'
     
    Kay Schluehr, Nov 3, 2007
    #5
  6. Just Another Victim of the Ambient Morality

    Paul McGuire Guest

    On Nov 3, 12:33 am, "Just Another Victim of the Ambient Morality"
    <> wrote:
    > It has recursion in it but that's not sufficient to call it a recursive
    > descent parser any more than having a recursive implementation of the
    > factorial function. The important part is what it recurses through...


    <snip>

    > In my opinion, the rule set I mentioned in my original post:
    >
    > grammar = OneOrMore(Word(alphas)) + Literal('end')
    >
    > ...should be translated (internally) to something like this:
    >
    > word = Word(alphas)
    > grammar = Forward()
    > grammar << ((word + grammar) | (word + Literal('end')))
    >
    > This allows a recursive search to find the string correct without any
    > need for "backtracking," if I understand what you mean by that. Couldn't
    > pyparsing have done something like this?
    >


    Dear JAVotAM -

    This was a great post! (Although I'm not sure the comment in the
    first paragraph was entirely fair - I know the difference between
    recursive string parsing and recursive multiplication.)

    You really got me thinking more about how this recursion actually
    behaves, especially with respect to elements such as OneOrMore. Your
    original question is really quite common, and up until now, my stock
    answer has been to use negative lookahead. The expression you posted
    is the first time I've seen this solution, and it *does* work.

    I was all set to write a cocky response on why your expression
    wouldn't work. I've seen it many times before, where people (usually
    coming from EBNF experience) implement listOfXs = OneOrMore(x) as:

    listOfXs = Forward()
    listOfXs << ( x + listOfXs | x )

    Actually, what they usually write is:

    listOfXs << ( listOfXs + x )

    but this sends pyparsing into a recursive tailspin.
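
    An illustration of that tailspin (modern Pythons raise RecursionError,
    which is a subclass of RuntimeError, so the broader class is caught
    here):

    from pyparsing import Forward, Word, alphas

    x = Word(alphas)
    listOfXs = Forward()
    listOfXs << (listOfXs + x)    # left-recursive: refers to itself before
                                  # consuming any input
    try:
        listOfXs.parseString('a b c')
    except RuntimeError as err:
        print('recursion blew up:', err)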

    So I fired up SciTE and copy/pasted your code into the editor and ran
    it, and it worked just fine - this was a shock! I studied this for a
    few minutes, and then realized what was happening. First of all, I
    misread what you posted. You posted this:

    grammar << ((word + grammar) | (word + Literal('end')))

    which works. I *thought* you posted this:

    grammar << ((word + grammar) | word) + Literal('end')

    which doesn't work. In fact this behaves the same as your original
    post, except it iterates over the input string's words recursively,
    vs. repetition in a for-loop, as is done by OneOrMore.

    So going back to your working version, I had to see why it works.
    Initially, the first term in the MatchFirst (the '|' operator creates
    MatchFirst instances) is just the same, and by grammar referencing
    itself, it just goes word by word through the input trying to find a
    match. I'll try to visualize the steps:

    level "First Second Third end"
    1 word grammar
    2 word grammar
    3 word grammar
    4 word grammar <- fails!
    4 word end <- fails!
    (therefore level 3 grammar fails!)
    3 word end <-- success!!!

    grammar has 2 options: to match a word followed by a grammar, or to
    match a word followed by 'end'. At 4 levels deep into the Forward's
    recursion, the first option fails, because after parsing "end" as the
    initial word, there is no more text to try to match against grammar.
    Level 4's Forward then also tries to match a word followed by 'end',
    but this fails for the same reason. So at this point, the 4th level
    Forward fails to match either of its options, so it throws its
    exception back up to level 3, indicating that the first alternative,
    word followed by grammar, failed. Level 3 then moves on to see if
    word followed by the literal 'end' matches, and it does - success!

    Here's where I am stuck now. In the original grammar that you posted,
    which you want to render into this behavior, grammar is defined as:

    grammar = OneOrMore(Word(alphas)) + Literal('end')

    Unfortunately, when the OneOrMore gets constructed, it does not have
    any visibility beyond knowing what is to be repeated. Again, here is
    the data structure that is being built:

    - And
      - OneOrMore
        - Word(alphas)
      - Literal('end')

    Only at the level of the And is there any awareness that the OneOrMore
    is followed by anything, let alone by something which could be
    misinterpreted as something matching the OneOrMore's repetition
    expression.

    Can you suggest a way I could generalize this, so that OneOrMore stops
    matching before it gets to 'end'?
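
    (An aside from much later: pyparsing eventually grew essentially this
    generalization, a stopOn argument on OneOrMore and ZeroOrMore that adds
    the negative lookahead internally; it was not available in the 2007-era
    versions discussed here.)

    from pyparsing import OneOrMore, Word, alphas, Literal

    # Requires a pyparsing release that supports stopOn:
    grammar = OneOrMore(Word(alphas), stopOn=Literal('end')) + Literal('end')
    print(grammar.parseString('First Second Third end'))
    # -> ['First', 'Second', 'Third', 'end']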

    -- Paul
     
    Paul McGuire, Nov 3, 2007
    #6
  7. Just Another Victim of the Ambient Morality

    Neil Cerutti Guest

    On 2007-11-03, Paul McGuire <> wrote:
    >
    > Here's where I am stuck now. In the original grammar that you posted,
    > which you want to render into this behavior, grammar is defined as:
    >
    > grammar = OneOrMore(Word(alphas)) + Literal('end')


    Is there not an ambiguity in the grammar?

    In EBNF:

    goal --> WORD { WORD } END

    WORD is '[a-zA-Z]+'
    END is 'end'

    I think it is fine that PyParsing can't guess what the composer
    of that grammar meant.

    --
    Neil Cerutti
     
    Neil Cerutti, Nov 3, 2007
    #7
  8. "Paul McGuire" <> wrote in message
    news:...
    >
    > Dear JAVotAM -
    >
    > This was a great post! (Although I'm not sure the comment in the
    > first paragraph was entirely fair - I know the difference between
    > recursive string parsing and recursive multiplication.)


    I often use hyperbole to emphasize a point. I honestly mean no offense.
    That comment wasn't even meant to be fair, I was hoping to be provocative...


    > You really got me thinking more about how this recursion actually
    > behaves, especially with respect to elements such as OneOrMore. Your
    > original question is really quite common, and up until now, my stock
    > answer has been to use negative lookahead. The expression you posted
    > is the first time I've seen this solution, and it *does* work.


    I'm glad I got you thinking! I'd hate to be another newbie with another
    of a thousand questions answered in the FAQ....
    Hey, are you the author of pyparsing?


    > I was all set to write a cocky response on why your expression
    > wouldn't work. I've seen it many times before, where people (usually
    > coming from EBNF experience) implement listOfXs = OneOrMore(x) as:
    >
    > listOfXs = Forward()
    > listOfXs << ( x + listOfXs | x )
    >
    > Actually, what they usually write is:
    >
    > listOfXs << ( listOfXs + x )
    >
    > but this sends pyparsing into a recursive tailspin.
    >
    > So I fired up SciTE and copy/pasted your code into the editor and ran
    > it, and it worked just fine - this was a shock! I studied this for a
    > few minutes, and then realized what was happening. First of all, I
    > misread what you posted. You posted this:
    >
    > grammar << ((word + grammar) | (word + Literal('end')))
    >
    > which works. I *thought* you posted this:
    >
    > grammar << ((word + grammar) | word) + Literal('end')
    >
    > which doesn't work. In fact this behaves the same as your original
    > post, except it iterates over the input string's words recursively,
    > vs. repetition in a for-loop, as is done by OneOrMore.


    I'm grateful that you actually tested my code before posting your cocky
    response!


    > So going back to your working version, I had to see why it works.
    > Initially, the first term in the MatchFirst (the '|' operator creates
    > MatchFirst instances) is just the same, and by grammar referencing
    > itself, it just goes word by word through the input trying to find a
    > match. I'll try to visualize the steps:
    >
    > level "First Second Third end"
    > 1 word grammar
    > 2 word grammar
    > 3 word grammar
    > 4 word grammar <- fails!
    > 4 word end <- fails!
    > (therefore level 3 grammar fails!)
    > 3 word end <-- success!!!
    >
    > grammar has 2 options: to match a word followed by a grammar, or to
    > match a word followed by 'end'. At 4 levels deep into the Forward's
    > recursion, the first option fails, because after parsing "end" as the
    > initial word, there is no more text to try to match against grammar.
    > Level 4's Forward then also tries to match a word followed by 'end',
    > but this fails for the same reason. So at this point, the 4th level
    > Forward fails to match either of its options, so it throws its
    > exception back up to level 3, indicating that the first alternative,
    > word followed by grammar, failed. Level 3 then moves on to see if
    > word followed by the literal 'end' matches, and it does - success!


    This is, literally, what it's doing. I'm not exactly a programming whiz
    so I think of it a little more abstractly. In pyparsing's implementation,
    it recurses through the first rule, OneOrMore, then iteratively moves to the
    next rule, only to fail. So, obviously that rule must be part of the
    recursion if we're not to use backtracking or lookaheads. If it's part of
    the recursion, it has to be the terminal case. We know that the OneOrMore
    rule is necessarily followed by the literal, so we can safely conclude that
    the terminal case will necessarily be followed by the literal, hence the
    production mentioned...


    > Here's where I am stuck now. In the original grammar that you posted,
    > which you want to render into this behavior, grammar is defined as:
    >
    > grammar = OneOrMore(Word(alphas)) + Literal('end')
    >
    > Unfortunately, when the OneOrMore gets constructed, it does not have
    > any visibility beyond knowing what is to be repeated. Again, here is
    > the data structure that is being built:
    >
    > - And
    >   - OneOrMore
    >     - Word(alphas)
    >   - Literal('end')
    >
    > Only at the level of the And is there any awareness that the OneOrMore
    > is followed by anything, let alone by something which could be
    > misinterpreted as something matching the OneOrMore's repetition
    > expression.
    >
    > Can you suggest a way I could generalize this, so that OneOrMore stops
    > matching before it gets to 'end'?


    Again, I'm not an elite hacker so it's hard to give a good suggestion.
    I might have implemented the parser by having its input be a string
    specifying the grammar but, then again, I'd need a parser to parse that
    string! Oh, the irony...
    Probably, I would have constructed the rule tree differently. I like
    how pyparsing uses the Python interpreter to parse its input, the Python
    code itself being pyparsing's input. That's worth preserving...
    So, instead of organizing the tree by rule categories, as pyparsing
    does:


    - And
      - OneOrMore
        - Word(alphas)
      - Literal('end')


    ...I would have organized it by what the grammar actually expects. It's
    hard for me to describe what I'm thinking as a simple hierarchy. I could
    use a directed graph but that's hard to type, so I'll just use "labels":


    start:
        call OneOrMore:

    OneOrMore:
        Or(
            And(
                Word(alphas),
                call OneOrMore:),
            And(
                Word(alphas),
                Literal('end')))


    ...hopefully, that indentation makes some sort of sense. I would
    imagine that it would be constructed like this:


    # This translates to temp << ((word + temp) | word)
    temp = OneOrMore(word)

    # Instead of wrapping this in an And structure,
    # tell temp to And this with its terminal case...
    grammar = temp + literal
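
    A sketch of that construction with ordinary pyparsing pieces (the helper
    name repeat_then is made up for illustration):

    from pyparsing import Forward, Word, alphas, Literal

    def repeat_then(expr, follower):
        # Build the repetition as a right-recursive Forward whose terminal
        # case already carries the expression that follows the repetition.
        g = Forward()
        g << ((expr + g) | (expr + follower))
        return g

    grammar = repeat_then(Word(alphas), Literal('end'))
    print(grammar.parseString('First Second Third end'))
    # -> ['First', 'Second', 'Third', 'end']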


    In my parser, everything would be recursive, hence the term "recursive
    descent" parser. So, a list of Ands like:


    grammar = Literal('a') + Literal('b') + Literal('c')


    ...would look like this:


    start:
        call And_a:

    And_a:
        Literal('a') + call And_b:

    And_b:
        Literal('b') + call And_c:

    And_c:
        Literal('c')


    ...notice the utter lack of iteration.
    Now, this example isn't, strictly speaking, recursive, but it descends
    through a set of function calls that are, potentially, recursive...
    Now, obviously things like Or will have to iterate through their cases but
    that's unavoidable and should be okay. The other constructions will need
    some thought, too. However, hopefully, this technique will render some
    constructions unnecessary...
    So, in my parser, the grammar is represented by a directed graph (the
    temporary variable flowing through operations like a + b + c + d) and all
    operations modify this graph. Obviously, some research needs to be done on
    how feasible this implementation is, especially with the rich set of
    operations available in pyparsing, but I'm hopeful...
    Hopefully this all makes sense. Thanks for reading...
     
    Just Another Victim of the Ambient Morality, Nov 3, 2007
    #8
  9. "Neil Cerutti" <> wrote in message
    news:oz3Xi.39812$...
    >
    > Is there not an ambiguity in the grammar?
    >
    > In EBNF:
    >
    > goal --> WORD { WORD } END
    >
    > WORD is '[a-zA-Z]+'
    > END is 'end'
    >
    > I think it is fine that PyParsing can't guess what the composer
    > of that grammar meant.


    First, I don't know if that constitutes an ambiguity in the grammar.
    'end' is a word but it's unambiguous that this grammar must end in a literal
    'end'. You could interpret the input as just a sequence of words or you
    could interpret it as a sequence of words terminated by the word 'end'. One
    interpretation conforms to the grammar while the other doesn't. You would
    assume that the interpretation that agrees with the grammar would be the
    preferable choice and so should the program...
    Secondly, even if it is an ambiguity... so what? pyparsing's current
    behaviour is to return a parse error, pretending that the string can't be
    parsed. Ideally, perhaps it should alert you to the ambiguity but, surely,
    it's better to return _a_ valid parsing than to pretend that the string
    can't be parsed at all...
     
    Just Another Victim of the Ambient Morality, Nov 4, 2007
    #9
  10. Just Another Victim of the Ambient Morality

    Neil Cerutti Guest

    On 2007-11-04, Just Another Victim of the Ambient Morality
    <> wrote:
    >>
    >> Is there not an ambiguity in the grammar?
    >>
    >> In EBNF:
    >>
    >> goal --> WORD { WORD } END
    >>
    >> WORD is '[a-zA-Z]+'
    >> END is 'end'
    >>
    >> I think it is fine that PyParsing can't guess what the composer
    >> of that grammar meant.

    >
    > First, I don't know if that constitutes an ambiguity in the
    > grammar. 'end' is a word but it's unambiguous that this grammar
    > must end in a literal 'end'. You could interpret the input as
    > just a sequence of words or you could interpret it as a
    > sequence of words terminated by the word 'end'.


    Yeah. If it were a regex, it would be: '[ab]+b'. That is a fine
    regex, because a regex is generally just a recognizer; the
    ambiguity doesn't have to do with recognizing the language. But
    PyParsing is a parser. The ambiguity is in deciding what's a
    Word(alphas) and what's an 'end'.

    > One interpretation conforms to the grammar while the other
    > doesn't. You would assume that the interpretation that agrees
    > with the grammar would be the preferable choice and so should
    > the program. Secondly, even if it is an ambiguity... so what?
    > pyparsing's current behaviour is to return a parse error,
    > pretending that the string can't be parsed. Ideally, perhaps
    > it should alert you to the ambiguity but, surely, it's better
    > to return _a_ valid parsing than to pretend that the string
    > can't be parsed at all...


    I wouldn't characterize it as pretending. How would you parse:

    hello end hello end

    "WORD END WORD END" and "WORD WORD WORD END" are both valid
    interpretations, according to the grammar.

    As soon as you remove the ambiguity from the grammar, PyParsing
    starts to work correctly.

    Consider writing a recursive descent parser by hand to parse the
    language '[ab]+b'.

    goal --> ab_list 'b'
    ab_list --> 'a' list_tail
    ab_list --> 'b' list_tail
    list_tail --> 'a' list_tail
    list_tail --> 'b' list_tail
    list_tail --> null


    The above has the exact same bug (and probably some others--I'm
    sorry, I'm unable to test it just now) as the PyParsing solution.

    The error is in the grammar. It might be fixed by specifying that
    'b' must be followed by EOF, and then it could be coded by using
    more than one character of lookahead.

    class ParseError(Exception):
        pass

    class Parser:
        def __init__(self, s):
            self.s = s
            self.i = 0
            self.c = self.s[self.i]

        def match(self, c):
            if self.c != c:
                raise ParseError('expected %s got %s' % (c, self.c))
            self.i += 1
            if self.i == len(self.s):
                raise ParseError('unexpected EOF')
            self.c = self.s[self.i]

        def goal(self):
            self.ab_list()
            self.match('b')

        def ab_list(self):
            if self.c in 'ab':
                self.match(self.c)
                self.list_tail()
            else:
                raise ParseError('expected list of a or b, got %s' % self.c)

        def list_tail(self):
            if self.c in 'ab':
                self.match(self.c)
                self.list_tail()
            else:
                raise ParseError('expected a list of a or b, got %s' % self.c)

    >>> Parser('aaab').goal()
    Traceback (most recent call last):
      File "py.py", line 37, in ?
        Parser('aaab').goal()
      File "py.py", line 20, in goal
        self.ab_list()
      File "py.py", line 26, in ab_list
        self.list_tail()
      File "py.py", line 33, in list_tail
        self.list_tail()
      File "py.py", line 33, in list_tail
        self.list_tail()
      File "py.py", line 32, in list_tail
        self.match(self.c)
      File "py.py", line 16, in match
        raise ParseError('unexpected EOF')
    __main__.ParseError: unexpected EOF

    It's not a coincidence that it has the same bug as the PyParsing
    solution. You can remove the ambiguity in several ways, perhaps
    by specifying where EOF should be (you seem to be assuming this
    in your interpretation of the grammar, but PyParsing does not).

    goal --> ab_list 'b' EOF
    ab_list --> 'a' list_tail
    ab_list --> 'b' list_tail
    list_tail --> 'a' list_tail
    list_tail --> 'b' list_tail
    list_tail --> null

    I need to change goal, match and list_tail for this new grammar,
    and add an EOF object and a peek method.

    ....
    EOF = object()
    ....
        def match(self, c):
            if self.c != c:
                raise ParseError('expected %s got %s' % (c, self.c))
            if self.i >= len(self.s)-1:
                self.c = EOF
                self.i = len(self.s)
            else:
                self.i += 1
                self.c = self.s[self.i]

        def peek(self, lookahead):
            if self.i+lookahead >= len(self.s):
                return EOF
            else:
                return self.s[self.i + lookahead]
    ...
        def list_tail(self):
            if self.c == 'a':
                self.match('a')
                self.list_tail()
            elif self.c == 'b':
                if self.peek(1) != EOF:
                    self.match('b')
                    self.list_tail()
            else:
                raise ParseError('expected list of a or b, got %s' % self.c)

    The grammar now has the problem that it's LL(2) rather than
    LL(1). PyParsing can handle that type of grammar with negative
    lookahead assertions, I think.
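
    A quick check of the revised parser (assuming the amended match, peek and
    list_tail methods above have been substituted into the Parser class
    defined earlier):

    Parser('aaab').goal()          # accepted: list 'aaa' plus the final 'b'
    try:
        Parser('aaa').goal()       # rejected: no final 'b' is left over
    except ParseError as err:
        print('rejected as expected:', err)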

    --
    Neil Cerutti
     
    Neil Cerutti, Nov 4, 2007
    #10
  11. Just Another Victim of the Ambient Morality

    Kay Schluehr Guest

    On 4 Nov., 03:07, Neil Cerutti <> wrote:
    >
    > I wouldn't characterize it as pretending. How would you parse:
    >
    > hello end hello end
    >
    > "WORD END WORD END" and "WORD WORD WORD END" are both valid
    > interpretations, according to the grammar.
    >
    > As soon as you remove the ambiguity from the grammar, PyParsing
    > starts to work correctly.


    I think you are correct about this. But I'm not sure how much it
    matters. Just take a look at Python's Grammar

    http://svn.python.org/view/python/trunk/Grammar/Grammar?rev=55446&view=markup

    Without special keyword treatment this grammar would be ambiguous and
    couldn't be parsed using an LL(1) parser. The "grammar compiler" which
    builds the parser tables creates a special label for each keyword.
    This label is filtered when a NAME token is fed into the parser.
    With the label that belongs to e.g. 'if' or 'while', the correct
    statement can be selected in constant time. The same happens when I use
    the parser generator with your EBNF grammar. With a little more
    adaptation, NUMBER tokens could also be filtered. But this would be
    overdesign.

    Theoretical beauty is compromised here using reasonable default
    assumptions for keeping the grammar simple ( "convention over
    configuration" to borrow a Rails slogan ).

    Tokenization is another issue in Python. It is indeed somewhat special
    due to significant whitespace and line continuation but tokenization
    is conceptually much simpler and one would likely throw all kinds of
    options and special case handling in the lexical analysis phase.
     
    Kay Schluehr, Nov 4, 2007
    #11
  12. Just Another Victim of the Ambient Morality

    Neil Cerutti Guest

    On 2007-11-04, Kay Schluehr <> wrote:
    > On 4 Nov., 03:07, Neil Cerutti <> wrote:
    >> I wouldn't characterize it as pretending. How would you parse:
    >>
    >> hello end hello end
    >>
    >> "WORD END WORD END" and "WORD WORD WORD END" are both valid
    >> interpretations, according to the grammar.
    >>
    >> As soon as you remove the ambiguity from the grammar, PyParsing
    >> starts to work correctly.

    >
    > I think you are correct about this. But I'm not sure how much
    > it matters. Just take a look at Python's Grammar
    >
    > http://svn.python.org/view/python/trunk/Grammar/Grammar?rev=55446&view=markup
    >
    > Without special keyword treatment this grammar would be
    > ambiguous and couldn't be parsed using an LL(1) parser.


    I agree. I don't know how easy it is to create keywords using
    PyParsing, or if the grammar in question would still be
    considered correct by the author.

    > The "grammar compiler" which builds the parser tables creates a
    > special label for each keyword. This label is filtered when a
    > NAME token is feeded into the parser. With the label that
    > belongs to e.g. 'if' or 'while' the correct statement can be
    > selected in constant time. Same happens when I use the parser
    > generator with your EBNF grammar. With a little more adaption
    > also NUMBER token could be filtered. But this would be
    > overdesign.
    >
    > Theoretical beauty is compromised here using reasonable default
    > assumptions for keeping the grammar simple ( "convention over
    > configuration" to borrow a Rails slogan ).


    Keywords are practically ubiquitous. I never thought of them as
    unbeautiful before.

    > Tokenization is another issue in Python. It is indeed somewhat
    > special due to significant whitespace and line continuation but
    > tokenization is conceptually much simpler and one would likely
    > throw all kinds of options and special case handling in the
    > lexical analysis phase.


    It might be a quick fix in PyParsing, which includes a Keyword
    type, but without the semantics that are needed in this case. You
    have to use (as suggested earlier) negative lookahead in either a
    Regex or with the NotAny type.

    >>> goal = OneOrMore(NotAny(Literal('end')) + Word(alphas)) + Literal('end')
    >>> goal.parseString('hello hello hello end')
    (['hello', 'hello', 'hello', 'end'], {})
    >>> goal.parseString('hello end hello end')
    (['hello', 'end'], {})

    No scanner/tokenizer needed! ;)
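
    For completeness, the Regex-based variant of the same lookahead might
    look something like this (an untested sketch; the pattern simply refuses
    to treat the bare word 'end' as one of the repeated words):

    from pyparsing import Literal, OneOrMore, Regex

    # A word that is anything alphabetic *except* the keyword 'end',
    # using a regex negative lookahead instead of NotAny.
    word = Regex(r'(?!end\b)[A-Za-z]+')
    goal = OneOrMore(word) + Literal('end')

    print(goal.parseString('hello hello hello end'))
    # expected: ['hello', 'hello', 'hello', 'end']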

    --
    Neil Cerutti
     
    Neil Cerutti, Nov 4, 2007
    #12
  13. "Neil Cerutti" <> wrote in message
    news:wP9Xi.39822$...
    > On 2007-11-04, Just Another Victim of the Ambient Morality
    > <> wrote:
    >>
    >> "Neil Cerutti" <> wrote in message
    >> news:eek:z3Xi.39812$...
    >>>
    >>> Is there not an ambiguity in the grammar?
    >>>
    >>> In EBNF:
    >>>
    >>> goal --> WORD { WORD } END
    >>>
    >>> WORD is '[a-zA-Z]+'
    >>> END is 'end'
    >>>
    >>> I think it is fine that PyParsing can't guess what the composer
    >>> of that grammar meant.

    >>
    >> One interpretation conforms to the grammar while the other
    >> doesn't. You would assume that the interpretation that agrees
    >> with the grammar would be the preferable choice and so should
    >> the program. Secondly, even if it is an ambiguity... so what?
    >> pyparsing's current behaviour is to return a parse error,
    >> pretending that the string can't be parsed. Ideally, perhaps
    >> it should alert you to the ambiguity but, surely, it's better
    >> to return _a_ valid parsing than to pretend that the string
    >> can't be parsed at all...

    >
    > I wouldn't characterize it as pretending. How would you parse:
    >
    > hello end hello end
    >
    > "WORD END WORD END" and "WORD WORD WORD END" are both valid
    > interpretations, according to the grammar.


    ...and it would be nice if the parser were to parse one of them since
    they are both right. Having more than one right answer is not the same as
    having no answer, which is what pyparsing claims...


    > As soon as you remove the ambiguity from the grammar, PyParsing
    > starts to work correctly.


    This is simply not true. Try this:


    grammar = OneOrMore(Word(alphas)) + Literal('end') + Literal('.')
    grammar.parseString('First Second Third end.')


    ...again, this will fail to parse. Where's the ambiguity?
    Besides, parsing ambiguous grammars is a useful feature. Not all
    grammars being parsed are designed by those doing the parsing...


    > Consider writing a recursive descent parser by hand to parse the
    > language '[ab]+b'.
    >
    > goal --> ab_list 'b'
    > ab_list --> 'a' list_tail
    > ab_list --> 'b' list_tail
    > list_tail --> 'a' list_tail
    > list_tail --> 'b' list_tail
    > list_tail --> null
    >
    >
    > The above has the exact same bug (and probably some others--I'm
    > sorry unable to test it just now) as the PyParsing solution.
    >
    > The error is in the grammar. It might be fixed by specifying that
    > 'b' must be followed by EOF, and then it could be coded by using
    > more than one character of lookahead.


    I don't exactly understand the syntax you used to describe the
    productions of your recursive descent parser, so not only did I not
    follow it, but I couldn't make out the rest of your post either. Could
    you explain in a little more detail? The last part that points to
    'null' is especially confusing...
    As demonstrated earlier, it's not just the grammar. There are
    unambiguous situations that pyparsing simply can't parse, and
    there's no reason for it.
    Besides, ambiguous grammars are a fact of life and some of us need to
    parse them. It's usually okay, too. Consider a previous example:


    grammar = OneOrMore(Word(alphas)) + Literal('end')


    While you may consider this inherently ambiguous, it's usually not.
    That is to say, as long as it is rare for 'end' to appear anywhere but
    at the end of the string, this will parse just fine, and yet pyparsing
    will consistently fail to parse it...
     
    Just Another Victim of the Ambient Morality, Nov 4, 2007
    #13
  14. Just Another Victim of the Ambient Morality

    Neil Cerutti Guest

    On 2007-11-04, Just Another Victim of the Ambient Morality
    <> wrote:
    >> Consider writing a recursive descent parser by hand to parse
    >> the language '[ab]+b'.
    >>
    >> goal --> ab_list 'b'
    >> ab_list --> 'a' list_tail
    >> ab_list --> 'b' list_tail
    >> list_tail --> 'a' list_tail
    >> list_tail --> 'b' list_tail
    >> list_tail --> null
    >>
    >>
    >> The above has the exact same bug (and probably some others--I'm
    >> sorry unable to test it just now) as the PyParsing solution.
    >>
    >> The error is in the grammar. It might be fixed by specifying that
    >> 'b' must be followed by EOF, and then it could be coded by using
    >> more than one character of lookahead.

    >
    > I don't exactly understand the syntax you used to describe the
    > productions of your recursive descent parser so not only did I not follow it
    > but I couldn't make out the rest of your post. Could you explain in a
    > little more detail? The last part that points to 'null' is especially
    > confusing...


    It's the BNF spelling of

    goal --> ab_list 'b'
    ab_list --> ab { ab }
    ab --> 'a' | 'b'

    The null is to say that list_tail can match nothing, i.e., an
    empty string.

    Then, in the Parser class, every method (except for match, which
    is used as a central place to consume characters) corresponds to
    one of the productions in the BNF. Breaking things down into
    BNF-based productions often makes implementation, debugging and
    code generation easier.

    PyParsing saves me that step, since I can often directly
    implement the EBNF using PyParsing.
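
    For concreteness, here is one minimal hand-written version of that
    grammar, written only as a sketch, with one method per production and a
    central match(), and with no backtracking; it runs into exactly the
    failure being discussed:

    class Parser:
        # Non-backtracking recursive descent recognizer for
        #   goal      --> ab_list 'b'
        #   ab_list   --> ('a' | 'b') list_tail
        #   list_tail --> ('a' | 'b') list_tail | null
        def __init__(self, text):
            self.text = text
            self.pos = 0

        def match(self, ch):              # central place to consume characters
            if self.pos < len(self.text) and self.text[self.pos] == ch:
                self.pos += 1
                return True
            return False

        def list_tail(self):              # greedily eats 'a'/'b' until it can't
            if self.match('a') or self.match('b'):
                return self.list_tail()
            return True                   # the 'null' production

        def ab_list(self):
            if self.match('a') or self.match('b'):
                return self.list_tail()
            return False

        def goal(self):
            return self.ab_list() and self.match('b') and self.pos == len(self.text)

    print(Parser('aab').goal())   # False -- list_tail already consumed the final 'b'
    print(Parser('ab').goal())    # False, for the same reason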

    > As demonstrated earlier, it's not just the grammar. There are
    > unambiguous situations that pyparsing simply can't parse, and
    > there's no reason for it.


    Yes, many parser generators have many more limitations than just
    the requirement of an unambiguous grammar.

    > Besides, ambiguous grammars are a fact of life and some of us
    > need to parse them. It's usually okay, too. Consider a
    > previous example:
    >
    > grammar = OneOrMore(Word(alphas)) + Literal('end')
    >
    > While you may consider this inherently ambiguous, it's usually
    > not. That is to say, as long as it is rare for 'end' to appear
    > anywhere but at the end of the string, this will parse just fine,
    > and yet pyparsing will consistently fail to parse it...


    I believe there's no cure for the confusion you're having except
    for implementing a parser for your proposed grammar.
    Alternatively, try implementing your grammar in one of your other
    favorite parser generators.

    --
    Neil Cerutti
     
    Neil Cerutti, Nov 4, 2007
    #14
  15. "Neil Cerutti" <> wrote in message
    news:nPnXi.39845$...
    > On 2007-11-04, Just Another Victim of the Ambient Morality
    > <> wrote:
    >>> Consider writing a recursive descent parser by hand to parse
    >>> the language '[ab]+b'.
    >>>
    >>> goal --> ab_list 'b'
    >>> ab_list --> 'a' list_tail
    >>> ab_list --> 'b' list_tail
    >>> list_tail --> 'a' list_tail
    >>> list_tail --> 'b' list_tail
    >>> list_tail --> null
    >>>
    >>>
    >>> The above has the exact same bug (and probably some others--I'm
    >>> sorry unable to test it just now) as the PyParsing solution.
    >>>
    >>> The error is in the grammar. It might be fixed by specifying that
    >>> 'b' must be followed by EOF, and then it could be coded by using
    >>> more than one character of lookahead.

    >>
    >> I don't exactly understand the syntax you used to describe the
    >> productions of your recursive descent parser so not only did I not follow
    >> it
    >> but I couldn't make out the rest of your post. Could you explain in a
    >> little more detail? The last part that points to 'null' is especially
    >> confusing...

    >
    > It's the BNF spelling of
    >
    > goal --> ab_list 'b'
    > ab_list --> ab { ab }
    > ab --> 'a' | 'b'
    >
    > The null is to say that list_tail can match nothing, i.e., an
    > empty string.
    >
    > Then, in the Parser class, every method (except for match, which
    > is used as a central place to consume characters) corresponds to
    > one of the productions in the BNF. Breaking things down into
    > BNF-based productions often makes implementation, debugging and
    > code generation easier.
    >
    > PyParsing saves me that step, since I can often directly
    > implement the EBNF using PyParsing.


    Okay, I see that now, thank you.
    Your statement from the previous post:


    >> Consider writing a recursive descent parser by hand to parse
    >> the language '[ab]+b'.
    >>
    >> goal --> ab_list 'b'
    >> ab_list --> 'a' list_tail
    >> ab_list --> 'b' list_tail
    >> list_tail --> 'a' list_tail
    >> list_tail --> 'b' list_tail
    >> list_tail --> null
    >>
    >>
    >> The above has the exact same bug (and probably some others--I'm
    >> sorry unable to test it just now) as the PyParsing solution.



    ...merely demonstrates that this grammar is similarly ambiguous. There
    are many ways to parse this correctly and pyparsing chooses none of these!
    Instead, it returns the same error it does when the string has no
    solutions...


    >> As demonstrated earlier, it's not just the grammar. There are
    >> unambiguous situations that pyparsing simply can't parse, and
    >> there's no reason for it.

    >
    > Yes, many parser generators have many more limitations than just
    > the requirement of an unambiguous grammar.


    Yes, but from a recursive descent parser? I expect such limitations from
    LALR and others, but I expect a recursive descent parser not only to
    parse grammars correctly but even to parse ambiguous ones, in that it is
    the only technique prepared to find more than one solution...


    >> Besides, ambiguous grammars are a fact of life and some of us
    >> need to parse them. It's usually okay, too. Consider a
    >> previous example:
    >>
    >> grammar = OneOrMore(Word(alphas)) + Literal('end')
    >>
    >> While you may consider this inherently ambiguous, it's usually
    >> not. That is to say, as long as it is rare for 'end' to appear
    >> anywhere but at the end of the string, this will parse just fine,
    >> and yet pyparsing will consistently fail to parse it...

    >
    > I believe there's no cure for the confusion you're having except
    > for implementing a parser for your proposed grammar.
    > Alternatively, try implementing your grammar in one of your other
    > favorite parser generators.


    I believe there is a cure and it's called recursive descent parsing.
    It's slow, obviously, but it's correct and, sometimes (arguably, often),
    that's more important than execution speed.

    I spent this morning whipping up a proof of concept parser whose
    interface greatly resembles pyparsing but, barring unknown bugs, works and
    works as I'd expect a recursive descent parser to work. I don't know Python
    very well so the parser is pretty simple. It only lexes single characters
    as tokens. It only supports And, Or, Optional, OneOrMore and ZeroOrMore
    rules but I already think this is a rich set of rules. I'm sure others can
    be added. Finally, I'm not sure it's safely copying all its parameter input
    the same way pyparsing does but surely those bugs can be worked out. It's
    merely a proof of concept to demonstrate a point.
    Everyone, please look it over and tell me what you think.
    Unfortunately, my news client is kind of poor, so I can't simply cut and
    paste the code into here. All the tabs get turned into single spacing, so I
    will post this link, instead:


    http://theorem.ca/~dlkong/new_pyparsing.zip


    I hope you can all deal with .zip files. Let me know if this is a
    problem.
    Thank you...
     
    Just Another Victim of the Ambient Morality, Nov 4, 2007
    #15
  16. Just Another Victim of the Ambient Morality

    Neil Cerutti Guest

    On 2007-11-04, Just Another Victim of the Ambient Morality
    <> wrote:
    > "Neil Cerutti" <> wrote in message
    > news:nPnXi.39845$...
    >> I believe there's no cure for the confusion you're having except
    >> for implementing a parser for your proposed grammar.
    >> Alternatively, try implementing your grammar in one of your other
    >> favorite parser generators.

    >
    > I believe there is a cure and it's called recursive descent
    > parsing. It's slow, obviously, but it's correct and, sometimes
    > (arguably, often), that's more important than execution speed.
    >
    > I spent this morning whipping up a proof of concept parser
    > whose interface greatly resembles pyparsing but, barring unknown
    > bugs, works and works as I'd expect a recursive descent parser
    > to work. I don't know Python very well so the parser is pretty
    > simple. It only lexes single characters as tokens. It only
    > supports And, Or, Optional, OneOrMore and ZeroOrMore rules but
    > I already think this is a rich set of rules. I'm sure others
    > can be added. Finally, I'm not sure it's safely copying all
    > its parameter input the same way pyparsing does but surely
    > those bugs can be worked out. It's merely a proof of concept
    > to demonstrate a point.
    > Everyone, please look it over and tell me what you think.
    > Unfortunately, my news client is kind of poor, so I can't
    > simply cut and paste the code into here. All the tabs get
    > turned into single spacing, so I will post this link, instead:
    >
    > http://theorem.ca/~dlkong/new_pyparsing.zip


    Your program doesn't necessarily address the ambiguity in the
    grammar in question, since right now it is only a recognizer.
    Will it be hard to get it to return a parse tree?

    The grammar in your implementation is:

    >>> goal = OneOrMore(RuleAnd('a') | RuleAnd('b')) + RuleAnd('b')
    >>> goal.parse(0, 'ab')
    True
    >>> goal.parse(0, 'ba')
    False
    >>> goal.parse(0, 'b')
    False
    >>> goal.parse(0, 'aaab')
    True
    >>> goal.parse(0, 'abc')
    True

    So far so good. :)

    --
    Neil Cerutti
     
    Neil Cerutti, Nov 4, 2007
    #16
  17. "Neil Cerutti" <> wrote in message
    news:qzsXi.39852$...
    > On 2007-11-04, Just Another Victim of the Ambient Morality
    > <> wrote:
    >> "Neil Cerutti" <> wrote in message
    >> news:nPnXi.39845$...
    >>> I believe there's no cure for the confusion you're having except
    >>> for implementing a parser for your proposed grammar.
    >>> Alternatively, try implementing your grammar in one of your other
    >>> favorite parser generators.

    >>
    >> I believe there is a cure and it's called recursive descent
    >> parsing. It's slow, obviously, but it's correct and, sometimes
    >> (arguably, often), that's more important than execution speed.
    >>
    >> I spent this morning whipping up a proof of concept parser
    >> whose interface greatly resembles pyparsing but, barring unknown
    >> bugs, works and works as I'd expect a recursive descent parser
    >> to work. I don't know Python very well so the parser is pretty
    >> simple. It only lexes single characters as tokens. It only
    >> supports And, Or, Optional, OneOrMore and ZeroOrMore rules but
    >> I already think this is a rich set of rules. I'm sure others
    >> can be added. Finally, I'm not sure it's safely copying all
    >> its parameter input the same way pyparsing does but surely
    >> those bugs can be worked out. It's merely a proof of concept
    >> to demonstrate a point.
    >> Everyone, please look it over and tell me what you think.
    >> Unfortunately, my news client is kind of poor, so I can't
    >> simply cut and paste the code into here. All the tabs get
    >> turned into single spacing, so I will post this link, instead:
    >>
    >> http://theorem.ca/~dlkong/new_pyparsing.zip

    >
    > Your program doesn't necessarily address the ambiguity in the
    > grammar in question, since right now it is only a recognizer.
    > Will it be hard to get it to return a parse tree?


    Hey, it's only a proof of concept. If you can parse the tree, surely
    you can record what you parsed, right?
    Did you notice that the parse() functions have the rather serious bug of
    not returning how much of the string they could parse? It just so happens
    that the constructions that I made only ever had to increment the matches
    by one, so they just happen to work. That's an easy bug to fix but a pretty
    major one to have overlooked. Hence, my enthusiasm for input...
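
    One way that bug might be fixed (a sketch only, not the code in the zip;
    the class names are modelled loosely on the ones quoted earlier in the
    thread) is to have every parse() return the index it reached, or None on
    failure, so each combinator advances by exactly what its children
    actually consumed:

    class RuleAnd:
        def __init__(self, ch):
            self.ch = ch
        def parse(self, pos, text):
            # Return the position after the match, not just True/False.
            if pos < len(text) and text[pos] == self.ch:
                return pos + 1
            return None

    class Sequence:
        def __init__(self, *rules):
            self.rules = rules
        def parse(self, pos, text):
            for rule in self.rules:
                pos = rule.parse(pos, text)
                if pos is None:
                    return None
            return pos

    seq = Sequence(RuleAnd('a'), RuleAnd('b'))
    print(seq.parse(0, 'ab'))   # 2
    print(seq.parse(0, 'ba'))   # None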


    > The grammar in your implementation is:
    >
    >>>> goal = OneOrMore(RuleAnd('a') | RuleAnd('b')) + RuleAnd('b')
    >>>> goal.parse(0, 'ab')
    > True
    >>>> goal.parse(0, 'ba')
    > False
    >>>> goal.parse(0, 'b')
    > False
    >>>> goal.parse(0, 'aaab')
    > True
    >>>> goal.parse(0, 'abc')
    > True
    >
    > So far so good. :)


    Good! Keep hammering at it!
    More importantly, study it to understand the idea I'm trying to convey.
    This is what I thought a recursive descent parser would do...
     
    Just Another Victim of the Ambient Morality, Nov 5, 2007
    #17
  18. Just Another Victim of the Ambient Morality

    Kay Schluehr Guest

    On Nov 4, 10:44 pm, "Just Another Victim of the Ambient Morality"
    <>

    > I believe there is a cure and it's called recursive descent parsing.
    > It's slow, obviously, but it's correct and, sometimes (arguably, often),
    > that's more important than execution speed.


    Recursive descent parsing is not necessarily slow but from your
    remarks above I infer you want a general RD parser with backtracking:
    when one rule doesn't match, try another one to derive the current
    symbol in the input stream.
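
    A toy sketch of that backtracking behaviour, written with generators
    purely for illustration (the helper names are made up and this is not
    how pyparsing works): each rule yields every position at which it could
    stop, so when one alternative fails further downstream, the next one is
    tried.

    def lit(ch):
        def rule(text, pos):
            if pos < len(text) and text[pos] == ch:
                yield pos + 1
        return rule

    def alt(*rules):                     # try each alternative in turn
        def rule(text, pos):
            for r in rules:
                yield from r(text, pos)
        return rule

    def seq(*rules):                     # thread positions through the children
        def rule(text, pos, i=0):
            if i == len(rules):
                yield pos
            else:
                for p in rules[i](text, pos):
                    yield from rule(text, p, i + 1)
        return rule

    # ('a' 'a' | 'a') 'b' on 'ab': the 'aa' alternative fails, 'a' is
    # retried, and then 'b' matches, so the input is derived after all.
    goal = seq(alt(seq(lit('a'), lit('a')), lit('a')), lit('b'))
    print(list(goal('ab', 0)))   # [2]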

    I'm not sure one needs to start again with a naive approach just to
    avoid any parser theory. For a user of a parser it is quite important
    whether she has to wait 50 seconds for a parse to run or 50
    milliseconds. I don't like to compromise speed for implementation
    simplicity here.
     
    Kay Schluehr, Nov 5, 2007
    #18
  19. "Kay Schluehr" <> wrote in message
    news:...
    > On Nov 4, 10:44 pm, "Just Another Victim of the Ambient Morality"
    > <>
    >
    >> I believe there is a cure and it's called recursive descent parsing.
    >> It's slow, obviously, but it's correct and, sometimes (arguably, often),
    >> that's more important than execution speed.

    >
    > Recursive descent parsing is not necessarily slow but from your
    > remarks above I infer you want a general RD parser with backtracking:
    > when one rule doesn't match, try another one to derive the current
    > symbol in the input stream.


    I think I've just discovered a major hurdle in my understanding of the
    problem.
    You keep saying "with backtracking." Why? Isn't "backtracking"
    inherent in recursion? So, why can't these alleged "recursive descent
    parsers" find valid parsings? How are they not already backtracking? What
    was the point of being recursive if not to take advantage of the inherent
    backtracking in it?
    Obviously, these parsers aren't recursing through what I think they
    should be recursing. The question is "why not?"

    Correct me if I'm wrong but I'm beginning to think that pyparsing
    doesn't typically use recursion at all. It only employs it if you
    explicitly create a recursive rule, using the Forward class. Otherwise,
    it does everything iteratively, hence the lack of "backtracking."
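
    That intuition about repetition can be illustrated with two toy rules in
    plain Python (purely illustrative; neither is pyparsing's actual code):
    a greedy one that consumes as much as it can and never gives anything
    back, and a backtracking one that yields every possible stopping point
    so that a following Literal('end') still gets a chance to match.

    import re

    def greedy_words(text, pos):
        # Greedy repetition as described in this thread: consume as many
        # words as possible and commit to that, never reconsidering.
        end = pos
        while True:
            m = re.match(r'[A-Za-z]+\s*', text[end:])
            if not m:
                break
            end += m.end()
        return end if end > pos else None

    def backtracking_words(text, pos):
        # Yield every possible stopping point, longest first, so whatever
        # follows can be tried at each of them.
        ends, end = [], pos
        while True:
            m = re.match(r'[A-Za-z]+\s*', text[end:])
            if not m:
                break
            end += m.end()
            ends.append(end)
        for e in reversed(ends):
            yield e

    s = 'First Second Third end'
    print(greedy_words(s, 0))                # 22 -- 'end' has already been eaten
    print(list(backtracking_words(s, 0)))    # [22, 19, 13, 6]; at 19 'end' can still match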


    > I'm not sure one needs to start again with a naive approach just to
    > avoid any parser theory. For a user of a parser it is quite important
    > whether she has to wait 50 seconds for a parse to run or 50
    > milliseconds. I don't like to compromise speed for implementation
    > simplicity here.


    This attitude is all too prevalent among computer professionals... Of
    course it's a useful thing to shield users from the intricacies of parser
    theory! Just as much as it is useful to shield drivers from needing
    automotive engineering, or software users from programming. How many
    people have come to this newsgroup asking about anomalous pyparsing
    behaviour, despite their grammars being mathematically correct?
    Think of it this way. You can force all the clients of pyparsing to
    duplicate work on figuring out how to massage pyparsing to their grammars,
    or you can do the work of getting pyparsing to solve people's problems,
    once. That's what a library is supposed to do...
    Finally, I can't believe you complain about potential speed problems.
    First, depending on the size of the string, it's likely to be the difference
    between 2ms and 200ms. Secondly, if speed were an issue, you wouldn't go
    with a recursive descent parser. You'd go with LALR or the many other
    parsing techniques available. Recursive descent parsing is for those
    situations where you need correctness, regardless of execution time. These
    situations happen...
    I've said this before, albeit for a different language, but it applies
    to Python just as well. I don't use Python to write fast code, I use it to
    write code fast.
    If _you_ "don't like to compromise speed for implementation simplicity"
    then you have a plethora of choices available to you. What about the guy who
    needs to parse correctly and is unconcerned about speed?
     
    Just Another Victim of the Ambient Morality, Nov 5, 2007
    #19
  20. Just Another Victim of the Ambient Morality

    Neil Cerutti Guest

    On 2007-11-05, Just Another Victim of the Ambient Morality <> wrote:
    >
    > "Neil Cerutti" <> wrote in message
    > news:qzsXi.39852$...
    >> On 2007-11-04, Just Another Victim of the Ambient Morality
    >> <> wrote:
    >>> "Neil Cerutti" <> wrote in message
    >>> news:nPnXi.39845$...
    >>>> I believe there's no cure for the confusion you're having except
    >>>> for implementing a parser for your proposed grammar.
    >>>> Alternatively, try implementing your grammar in one of your other
    >>>> favorite parser generators.
    >>>
    >>> I believe there is a cure and it's called recursive descent
    >>> parsing. It's slow, obviously, but it's correct and, sometimes
    >>> (arguably, often), that's more important than execution speed.
    >>>
    >>> I spent this morning whipping up a proof of concept parser
    >>> whose interface greatly resembles pyparsing but, barring unknown
    >>> bugs, works and works as I'd expect a recursive descent parser
    >>> to work. I don't know Python very well so the parser is pretty
    >>> simple. It only lexes single characters as tokens. It only
    >>> supports And, Or, Optional, OneOrMore and ZeroOrMore rules but
    >>> I already think this is a rich set of rules. I'm sure others
    >>> can be added. Finally, I'm not sure it's safely copying all
    >>> its parameter input the same way pyparsing does but surely
    >>> those bugs can be worked out. It's merely a proof of concept
    >>> to demonstrate a point.
    >>> Everyone, please look it over and tell me what you think.
    >>> Unfortunately, my news client is kind of poor, so I can't
    >>> simply cut and paste the code into here. All the tabs get
    >>> turned into single spacing, so I will post this link, instead:
    >>>
    >>> http://theorem.ca/~dlkong/new_pyparsing.zip

    >>
    >> Your program doesn't necessarily address the ambiguity in the
    >> grammar in question, since right now it is only a recognizer.
    >> Will it be hard to get it to return a parse tree?

    >
    > Hey, it's only a proof of concept. If you can parse the tree, surely
    > you can record what you parsed, right?
    > Did you notice that the parse() functions have the rather serious bug of
    > not returning how much of the string they could parse?


    Unfortunately I haven't had much time to play with it today; just
    barely enough to put it through a very few paces.

    > It just so happens that the constructions that I made only
    > ever had to increment the matches by one, so they just happen
    > to work. That's an easy bug to fix but a pretty major one to
    > have overlooked. Hence, my enthusiasm for input...
    >
    >> The grammar in your implementation is:
    >>
    >>>>> goal = OneOrMore(RuleAnd('a') | RuleAnd('b')) + RuleAnd('b')
    >>>>> goal.parse(0, 'ab')
    >> True
    >>>>> goal.parse(0, 'ba')
    >> False
    >>>>> goal.parse(0, 'b')
    >> False
    >>>>> goal.parse(0, 'aaab')
    >> True
    >>>>> goal.parse(0, 'abc')
    >> True
    >>
    >> So far so good. :)

    >
    > Good! Keep hammering at it!
    > More importantly, study it to understand the idea I'm
    > trying to convey. This is what I thought a recursive descent
    > parser would do...


    Kay has pointed out how it works. Strangely enough, I've never
    studied a backtracking RDP before (trying to teach yourself a
    subject like parsing can be tricky--I've had to somehow avoid all
    the texts that overuse Greek letters--those incomprehensible
    symbols confuse the hell out of me). It does simplify the job of
    the grammar designer, but Kay's message makes it sound like it
    won't scale very well.

    It might, perhaps, be an interesting feature for PyParsing to
    entertain by setting a 'backtracking' option, for when you're
    writing a quick script and don't want to fuss too much with a
    non-conformant grammar.

    I'll have more time to look at it tomorrow.

    --
    Neil Cerutti
     
    Neil Cerutti, Nov 5, 2007
    #20