tokenize module

Jim · Jun 27, 2009

I'm trying to understand the output of the tokenize.generate_tokens()
generator. The token types returned seem to be more general than I'd
expect. For example, when fed the following line of code:

def func_a():

the (abbreviated) returned token tuples are as follows:

(NAME, def, ..., def func_a()

(NAME, func_a, ..., def func_a()

(OP, (, ..., def func_a()

(OP, ), ..., def func_a()

(OP, :, ..., def func_a()

(NEWLINE, NEWLINE, ..., def func_a()

It seems to me that the token '(' should be identified as 'LPAR' and
')' as 'RPAR', as found in the dictionary token.tok_name. What am I
missing here?

bootkey · Jun 29, 2009

Did you read the module? Right at the top, it says:

<quote>
It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators
</quote>
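
If you need the specific operator type, here is a minimal sketch. It assumes Python 3.3 or later, where each token returned by tokenize.generate_tokens() is a named tuple with an exact_type attribute; that attribute did not exist in the Python 2 releases current when this thread was written. The helper name exact_tokens is made up for illustration.

```python
import io
import token
import tokenize


def exact_tokens(src):
    """Return (generic type name, exact type name, token string) triples.

    exact_tokens is a hypothetical helper, not part of the tokenize module.
    """
    rows = []
    readline = io.StringIO(src).readline
    for tok in tokenize.generate_tokens(readline):
        generic = token.tok_name[tok.type]      # e.g. OP for every operator
        exact = token.tok_name[tok.exact_type]  # e.g. LPAR, RPAR, COLON
        rows.append((generic, exact, tok.string))
    return rows


if __name__ == "__main__":
    for row in exact_tokens("def func_a():\n"):
        print(row)
```

For tokens that are not operators, exact_type is the same as type, so the first two columns only differ on the OP rows.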

Thanks for the reply. I wonder why the tokenizer classifies all
operators simply as OP, instead of the various operators listed in the
tok_name dictionary.

Jim Cook
