robert.muller2
I am trying to implement a lexer and parser for a subset of Python
using lexer and parser generators. (It doesn't matter, but I happen to
be using ocamllex and ocamlyacc.) I've run into the following annoying
problem and am hoping someone can tell me what I'm missing. Lexers
generated by such tools return tokens in a stream as they consume the
input text. But Python's indentation appears to require interrupting
that stream. For example, in:
def f(x):
    statement1;
    statement2;
        statement3;
        statement4;
A
Between the '\n' at the end of statement4 and the A, a lexer for
Python should return two DEDENT tokens. But there is no way to
interject two DEDENT tokens into the token stream between the tokens
for NEWLINE and A: the generated lexer doesn't have any way to freeze
the input text pointer.
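
The only workaround I can think of is to wrap the generated lexer in a
stateful function that keeps a queue of pending tokens and a stack of
open indentation levels, roughly like the sketch below. (The token
constructors NEWLINE/INDENT/DEDENT and the entry point Lexer.raw_token
are just placeholders for whatever names the generated code actually
uses.)

let pending : Parser.token Queue.t = Queue.create ()

let indents : int Stack.t =
  let s = Stack.create () in
  Stack.push 0 s;                       (* column 0 is always open *)
  s

(* Called from the lexer action that matches a newline plus leading
   whitespace; [col] is the new line's indentation width.  It may
   enqueue several DEDENTs at once. *)
let handle_indent col =
  if col > Stack.top indents then begin
    Stack.push col indents;
    Queue.add Parser.INDENT pending
  end else
    while col < Stack.top indents do
      ignore (Stack.pop indents);
      Queue.add Parser.DEDENT pending
    done

(* The function handed to the ocamlyacc parser: drain the queue before
   consuming more input, so the extra tokens get interjected into the
   stream between NEWLINE and whatever follows. *)
let token lexbuf =
  if Queue.is_empty pending
  then Lexer.raw_token lexbuf
  else Queue.take pending

The idea is that the lexer rule for a newline returns NEWLINE itself
and calls handle_indent as a side effect, so the queued INDENT/DEDENT
tokens come out on the next calls to token before any more input is
consumed.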
Is a wrapper along those lines the standard trick? Or does this mean
that Python lexers are all written by hand? If not, how do you do it
using your favorite lexer generator?
Thanks!
Bob Muller