Where are the regexes listed for the Python language's tokenizer/lexer?

Chris Seberino

If I'm not mistaken, the grammar alone is not sufficient to specify the
language. You also need to specify the regexes that define the tokens,
right? Where are those listed?

chris
 
Miles Kaufmann

The Python tokenization process is described here:

http://docs.python.org/reference/lexical_analysis.html

The tokenizer can't be expressed in terms of regular expressions,
because it's non-regular (thanks to things like matching nested braces
and keeping track of the indentation level).
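To see the indentation tracking and bracket matching in action, here is a minimal sketch that runs the standard library's `tokenize` module over a small snippet. The sample source and the helper name `token_names` are made up for illustration; only `tokenize.generate_tokens` and `tokenize.tok_name` come from the standard library.

```python
import io
import tokenize

# Hypothetical sample: an indented block plus a line break inside parentheses.
source = "if x:\n    y = (1,\n         2)\nz = 3\n"

def token_names(text):
    """Return the token type names the standard tokenizer emits for text."""
    readline = io.StringIO(text).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]

names = token_names(source)
print(names)
```

The output should contain exactly one INDENT and one DEDENT for the `if` block, and the line break inside the parentheses should show up as NL rather than NEWLINE. The extra leading whitespace on the continuation line produces no INDENT, because the tokenizer knows it is still inside an open bracket. That kind of state is what a plain regular expression cannot carry.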

-Miles
 
Dennis Lee Bieber

Pardon... I've been out of the "market", but I don't recall EVER
seeing a "regex" used in a textbook for compiler/interpreter design.

BNF (or Pascal's bubble diagram equivalent) has always been used to
define the syntactical components in those books in my possession, and
parsers (tokenizers) were written using those implied algorithms (if the
first character is numeric or "." it starts a number, otherwise treat it
as an identifier, etc.).
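The hand-written approach described above might look roughly like the following sketch, which picks the token kind from the first character alone. The function name `hand_lex`, the token kind labels, and the fallback that treats any other character as a one-character operator are assumptions added for illustration, not anything taken from a particular textbook or from CPython.

```python
def hand_lex(text):
    """Split text into (kind, lexeme) pairs by dispatching on the first character."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Whitespace only separates tokens.
            i += 1
            continue
        start = i
        if ch.isdigit() or ch == ".":
            # A digit or "." starts a number: consume digits and dots.
            # (Simplification: a lone "." or "1.2.3" is accepted as well.)
            while i < n and (text[i].isdigit() or text[i] == "."):
                i += 1
            kind = "NUMBER"
        elif ch.isalpha() or ch == "_":
            # A letter or underscore starts an identifier.
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            kind = "NAME"
        else:
            # Assumed fallback: anything else is a one-character operator.
            i += 1
            kind = "OP"
        tokens.append((kind, text[start:i]))
    return tokens

print(hand_lex("x1 = .5 + 42"))
# [('NAME', 'x1'), ('OP', '='), ('NUMBER', '.5'), ('OP', '+'), ('NUMBER', '42')]
```

No regex appears anywhere, yet each branch is effectively a tiny hand-coded state machine for one token class.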
 
Paul McGuire

I think the OP is asking for the regexes that define the terminals
referenced in the Python grammar, similar to those found in yacc token
definitions. He's not implying that there are regexes that implement
the whole grammar.

-- Paul
 
Robert Kern

Dennis said:
Pardon... I've been out of the "market", but I don't recall EVER
seeing a "regex" used in a textbook for compiler/interpreter design.

BNF (or Pascal's bubble diagram equivalent) has always been used to
define the syntactical components in those books in my possession, and
parsers (tokenizers) were written using those implied algorithms (if the
first character is numeric or "." it starts a number, otherwise treat it
as an identifier, etc.).

In actual implementations of lexers and the lexical analysis components of
parsers, regexes are fairly common. For example, from ply:

http://www.dabeaz.com/ply/ply.html#ply_nn6
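As a rough idea of what regex-defined tokens look like, here is a sketch using only the standard `re` module rather than ply itself, so it runs without installing anything. The token names and patterns in `TOKEN_SPEC` are illustrative assumptions for a toy expression language; they are not Python's actual token definitions, and `regex_lex` is a made-up helper name.

```python
import re

# Hypothetical token definitions: one (name, pattern) pair per terminal.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d*)?|\.\d+"),  # 42, 3.14, .5
    ("NAME", r"[A-Za-z_]\w*"),           # identifiers
    ("OP", r"[-+*/=()]"),                # single-character operators
    ("SKIP", r"[ \t]+"),                 # whitespace between tokens
    ("MISMATCH", r"."),                  # any other character is an error
]

# Combine the patterns into one regex with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def regex_lex(text):
    """Return (kind, lexeme) pairs for text, raising ValueError on bad input."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

print(regex_lex("x1 = .5 + 42"))
# [('NAME', 'x1'), ('OP', '='), ('NUMBER', '.5'), ('OP', '+'), ('NUMBER', '42')]
```

The order of the entries matters: alternatives are tried left to right, so the catch-all MISMATCH pattern has to come last.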

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
