N
Neil Cerutti
I think I've just discovered a major hurdle in my understand of the
problem.
You keep saying "with backtracking." Why? Isn't "backtracking"
inherent in recursion? So, why can't these alleged "recursive descent
parsers" find valid parsings? How are they not already backtracking? What
was the point of being recursive if not to take advantage of the inherent
backtracking in it?
Obviously, these parsers aren't recursing through what I think they
should be recursing. The question is "why not?"
There are different kinds of recursion. Compare:
def fac1(x, y=1):
""" Compute factorials with a recursive function (it calls
itself), but the stack is not actually used for storing
anything important, i.e., it is tail-recursive. """
if x < 0:
raise ValueError('non-negative integer')
elif x == 0:
return y
else:
return fac1(x-1, y*x)
to
def fac2(x):
""" Computes factorials with a recursive process, keeping
the state of the calculation on the stack. """
if x < 0:
raise ValueError('non-negative integer')
if x == 0:
return 1
else:
return fac2(x-1) * x
to
def Ack(x, y):
""" The Ackermann function. Creates a humongous mess even
with quite tiny numbers. """
if x < 0 or y < 0:
raise ValueError('non-negative integer')
elif x == 0:
return y + 1
elif y == 0:
return foo3(x-1, 1)
else:
return foo3(x-1, foo3(x, y-1))
There's probably a word for the type of recursive process built
by fac2; the RDP's I'm most familiar with create a fac2 sort of
process, which stores valuable info on the stack.
And even though fac1 defines an iterative process, the code
itself is recursive, and you can call it a recursive function if
you wish (and in Python you might as well).
Correct me if I'm wrong but I'm beginning to think that
pyparsing doesn't typically use recursion, at all. It only
employs it if you create one, using the Forward class.
Otherwise, it does everything iteratively, hence the lack of
"backtracking."
It's recursive because each production rule calls other
production rules to define itself. A rule regularly ends up
calling itself. Consider the Parser class I built earlier.
list_tail keeps calling itself to continue consuming characters
in an ab_list. The stack is used to keep track of where we are in
the grammar; at any time you can look up the stack and see how
you got where you are--you 'descend' down from the topmost
productions to the most primitive productions, and then back up
once everything has been sorted out. Take another look
at the exception raised in my Parsing class example for an
illustrative traceback.
Finally, I can't believe you complain about potential speed
problems. First, depending on the size of the string, it's
likely to be the difference between 2ms and 200ms. Secondly,
if speed were an issue, you wouldn't go with a recursive
descent parser. You'd go with LALR or the many other parsing
techniques available. Recursive descent parsing is for those
situations where you need correctness, regardless of execution
time. These situations happen...
RDP is plenty fast; speed has never been one of it's
disadvantages, as far as I know. Today there are many
excellent parser generators and compiler builders that compose an
RDP under the hood, e.g., Antlr and Gentle.
I've said this before, albeit for a different language, but
it applies to Python just as well. I don't use Python to write
fast code, I use it to write code fast.
If _you_ "don't like to compromise speed for implementation
simplicity" then you have a plethora choices available to you.
What about the guy who needs to parse correctly and is
unconcerned about speed?
You have to be concerned about speed when something runs so
slowly in common circumstances compared to other well-known
algotithms that you can't practically wait for an answer. Would
you consider bubble-sort a suitable general-purpose sorting
algorithm for Python?