Python parser that records source ranges

Jonathan Edwards · Sep 29, 2003

The parser library module only records source line numbers for tokens. I
need a parser that records ranges of line and character locations for
each AST node, so I can map back to the source. Does anyone know of such
a thing? Thanks

Jonathan

Jeff Epler · Sep 29, 2003

The tokenize module will give column information for each token, but
it produces a stream of tokens only, not an AST.
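
As a rough illustration of what tokenize reports, here is a minimal sketch using the standard library's tokenize.generate_tokens. The helper name token_ranges and the sample source line are made up for this example; each token carries a start and end (line, column) pair, with 1-based lines and 0-based columns.

```python
import io
import tokenize


def token_ranges(source):
    """Return (token_name, text, start, end) for every token in source."""
    readline = io.StringIO(source).readline
    rows = []
    for tok in tokenize.generate_tokens(readline):
        # tok.start and tok.end are (line, column) pairs:
        # lines are 1-based, columns are 0-based.
        rows.append((tokenize.tok_name[tok.type], tok.string, tok.start, tok.end))
    return rows


# Hypothetical sample line, only for demonstration.
for row in token_ranges("x = f(10, abc)\n"):
    print(row)
```

Note that this is still a flat token stream; nothing here says which tokens belong to which syntax node.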

Jeff

logistix at cathoderaymission.net · Sep 29, 2003

Jonathan Edwards said:
The parser library module only records source line numbers for tokens. I
need a parser that records ranges of line and character locations for
each AST node, so I can map back to the source. Does anyone know of such
a thing? Thanks

Jonathan

You know there's not going to be a one-to-one relationship, right?
Most AST nodes are symbols and aren't going to match any tokens.
Python ASTs also use a lot of intermediate nodes to enforce operator
precedence.

Anyway, I have some rather specialized code in PyXR that syncs tokens
to an ast. You probably won't be able to use it out of the box but it
should give you a good start:

http://www.cathoderaymission.net/~logistix/PyXR/

The source file of particular interest to you would be astToHtml.py:

http://tinyurl.com/p3cn

Jonathan Edwards · Oct 1, 2003

So the basic idea is to match up the leaves of the AST with the list of
tokens from tokenizer, which do contain location info. I had thought of
that, but was hoping there was a more informative parser out there.
Thanks.
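
For readers on Python 3.8 or later: the standard ast module records lineno, col_offset, end_lineno and end_col_offset on statement and expression nodes, which may cover this use case without a third-party parser. The sketch below is only an illustration; node_ranges and the sample line are invented names, and col_offset values are UTF-8 byte offsets rather than character counts.

```python
import ast


def node_ranges(source):
    """Return (node_type, start, end, text) for every positioned AST node."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        # Nodes such as Module or Store carry no position attributes; skip them.
        if hasattr(node, "lineno") and hasattr(node, "end_lineno"):
            rows.append((
                type(node).__name__,
                (node.lineno, node.col_offset),
                (node.end_lineno, node.end_col_offset),
                ast.get_source_segment(source, node),
            ))
    return rows


# Hypothetical sample line, only for demonstration.
for row in node_ranges("x = f(10, abc)\n"):
    print(row)
```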

Jonathan

logistix at cathoderaymission.net · Oct 1, 2003

Jonathan Edwards said:
So the basic idea is to match up the leaves of the AST with the list of
tokens from tokenizer, which do contain location info. I had thought of
that, but was hoping there was a more informative parser out there.
Thanks.

Jonathan

It's really not that bad. The more I think about it, the code
reference I sent you is way overcomplicated. General pseudocode for
walking ASTs generated via parser.ast2tuple(parser.suite(code)) is:

def walk_node(node):
    # A token (leaf) is a 2-tuple whose second item is the token text;
    # anything else is a grammar symbol followed by its child nodes.
    if len(node) == 2 and type(node[1]) is not tuple:
        walk_token(node)
    else:
        walk_symbol(node)

def walk_symbol(node):
    symbol_type = node[0]
    symbol_leaves = node[1:]
    for leaf in symbol_leaves:
        walk_node(leaf)

def walk_token(node):
    token_type = node[0]
    token_value = node[1]
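
The leaf-to-token matching discussed above can be sketched roughly as follows. This is not the PyXR code. The parser module was removed in Python 3.10, so the example uses a hand-written stand-in for ast2tuple output, with readable names instead of numeric symbol ids. SAMPLE_TREE, collect_leaves and leaf_ranges are all hypothetical names.

```python
import io
import tokenize

# Hand-written stand-in for a parser.ast2tuple() result (assumption):
# symbols are (name, child, ...) and leaves are (token_type, text).
SAMPLE_TREE = (
    "expr_stmt",
    ("testlist", ("atom", ("NAME", "x"))),
    ("EQUAL", "="),
    ("testlist", ("atom", ("NUMBER", "1"))),
)


def collect_leaves(node, leaves):
    """Append every token leaf under node to leaves, in source order."""
    if len(node) == 2 and not isinstance(node[1], tuple):
        leaves.append(node)
    else:
        for child in node[1:]:
            collect_leaves(child, leaves)
    return leaves


def leaf_ranges(tree, source):
    """Pair each leaf with the next tokenize token that has the same text."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    ranges = []
    for leaf_type, text in collect_leaves(tree, []):
        for tok in tokens:  # the generator resumes where the last search stopped
            if tok.string == text:
                ranges.append((leaf_type, text, tok.start, tok.end))
                break
    return ranges


for row in leaf_ranges(SAMPLE_TREE, "x = 1\n"):
    print(row)
```

A symbol's range would then run from the start of its first leaf to the end of its last one. A leaf with no matching token is silently skipped here; real code would want to report that.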

Paul Paterson · Oct 2, 2003

Jonathan Edwards said:
The parser library module only records source line numbers for tokens. I
need a parser that records ranges of line and character locations for
each AST node, so I can map back to the source. Does anyone know of such
a thing? Thanks

Jonathan

If I understand you correctly, then the Simpleparse parser may be just what
you are looking for:

http://simpleparse.sourceforge.net

It is very powerful but still easy to use. The AST it produces gives the
start and end points of the matching tokens. Below is an example for parsing
a statement (from a VB grammar) ... you will see each node comprises a tuple
of (token_name, start_char, end_char, [sub_node1, sub_node2, ...]).

The example below looks rather complex because of the grammar, but you can
see that most of the nested sub_node matches cover the same characters in
the source. Using the start and end offsets, you can easily match each node
to the corresponding text in the source.

Paul
1 15
[('line_body',
  0,
  15,
  [('single_statement',
    0,
    14,
    [('assignment_statement',
      0,
      14,
      [('object', 0, 1, [('primary', 0, 1, [('identifier', 0, 1, [])])]),
       ('expression',
        4,
        14,
        [('par_expression',
          4,
          14,
          [('base_expression',
            4,
            14,
            [('simple_expr',
              4,
              14,
              [('call',
                4,
                14,
                [('object',
                  4,
                  14,
                  [('primary',
                    4,
                    5,
                    [('identifier', 4, 5, [])]),
                   ('parameter_list',
                    5,
                    14,
                    [('list',
                      5,
                      14,
                      [('bare_list',
                        6,
                        13,
                        [('bare_list_item',
                          6,
                          8,
                          [('expression',
                            6,
                            8,
                            [('par_expression',
                              6,
                              8,
                              [('base_expression',
                                6,
                                8,
                                [('simple_expr',
                                  6,
                                  8,
                                  [('atom',
                                    6,
                                    8,
                                    [('literal',
                                      6,
                                      8,
                                      [('integer',
                                        6,
                                        8,
                                        [('decimalinteger',
                                          6,
                                          8,
                                          None)])])])])])])])]),
                         ('bare_list_item',
                          10,
                          13,
                          [('expression',
                            10,
                            13,
                            [('par_expression',
                              10,
                              13,
                              [('base_expression',
                                10,
                                13,
                                [('simple_expr',
                                  10,
                                  13,
                                  [('call',
                                    10,
                                    13,
                                    [('object',
                                      10,
                                      13,
                                      [('primary',
                                        10,
                                        13,
                                        [('identifier',
                                          10,
                                          13,
                                          [])])])])])])])])])])])])])])])])])])])]),
   ('line_end', 14, 15, [('NEWLINE', 14, 15, None)])])]
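
Mapping a tree of this shape back to source text can be sketched as below. It does not import SimpleParse; it only assumes the (token_name, start_char, end_char, sub_nodes) tuple layout described above. TREE is a much-shortened hand-written tree, and SOURCE is a hypothetical line chosen to be consistent with the offsets in the output (the original VB statement is not shown in the post).

```python
# Hypothetical source line, 15 characters including the newline (assumption).
SOURCE = "a = f(10, abc)\n"

# Shortened, hand-written tree in the (name, start, end, children) layout.
TREE = [
    ("assignment_statement", 0, 14, [
        ("identifier", 0, 1, []),
        ("call", 4, 14, [
            ("identifier", 4, 5, []),
            ("decimalinteger", 6, 8, None),
            ("identifier", 10, 13, []),
        ]),
    ]),
    ("NEWLINE", 14, 15, None),
]


def node_texts(nodes, source, depth=0):
    """Return (depth, name, matched_text) for every node, depth-first."""
    rows = []
    for name, start, end, children in nodes or []:  # children may be None
        rows.append((depth, name, source[start:end]))
        rows.extend(node_texts(children, source, depth + 1))
    return rows


for depth, name, text in node_texts(TREE, SOURCE):
    print("  " * depth + name, repr(text))
```

Because every node carries character offsets, slicing the source is all that is needed; converting an offset to a (line, column) pair would take an extra pass over the line breaks.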
