Basic tokenizer

  • Thread starter Dale Strickland-Clak

Dale Strickland-Clak

I'm looking for a simple string tokenizer. The Python tokenize module is
too specific and is geared to Python syntax.

I just want something that can be set up to read the basic stuff like
- identifiers
- integers (and/or reals)
- strings
- special characters such as operators and brackets.

Is anyone aware of such a module?

Thanks.
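
For the token classes listed above, the standard "re" module alone may be enough. The following is only a minimal sketch, not a module anyone in this thread recommends: the token names (REAL, INTEGER, IDENT, STRING, OP) and the exact patterns are assumptions chosen for illustration and would need adjusting to the input at hand.

```python
import re

# Token classes from the question. Names and patterns are illustrative
# assumptions, not taken from any existing module.
TOKEN_SPEC = [
    ("REAL",    r"\d+\.\d*|\.\d+"),          # must come before INTEGER
    ("INTEGER", r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("STRING",  r'"(?:\\.|[^"\\])*"' r"|'(?:\\.|[^'\\])*'"),
    ("OP",      r"[-+*/^%=<>!]=?|[()\[\]{},;:.?]"),
    ("SKIP",    r"\s+"),                     # whitespace, not reported
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, lexeme) pairs; raise SyntaxError on an unknown character."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(
                "unexpected character %r at position %d" % (text[pos], pos))
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
        pos = m.end()
```

Calling list(tokenize('x = 3.14 * (y + 42)')) gives a flat list of (kind, lexeme) pairs that a parser can then consume.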
 

Miki Tebeka

Hello Dale,
> I'm looking for a simple string tokenizer. The Python tokenize module is
> too specific and is geared to Python syntax.
>
> I just want something that can be set up to read the basic stuff like
> - identifiers
> - integers (and/or reals)
> - strings
> - special characters such as operators and brackets.
>
> Is anyone aware of such a module?
>
> Thanks.

There are several parsing/lexing packages for Python. My favorite is PLY
(http://systems.cs.uchicago.edu/ply/).

Bye.
 

Andrea Griffini

> There are several parsing/lexing packages for Python. My favorite is PLY
> (http://systems.cs.uchicago.edu/ply/).

I gave it a very quick look and I have to say that I was
impressed... but it wasn't a good impression.

For example the pycalc.py example is just a few lines
shorter than a recursive descent parser that implements
a similar calculator *without importing any module, not
even "re"*. If implementing something from scratch requires
the same amount of code as using a tool, I begin wondering
what's the point in using such a tool.

I was told that the good point about parser generators
is that at least you can be sure the code is correct.

Try feeding "10/2" or ")" to pycalc.py ... :-(

Andrea
 

Andrea Griffini

> I'd be interested in your feedback on this pyparsing example:
> http://www.geocities.com/ptmcg/python/fourFn.py.txt
>
> (The "fourFn.py" title is a legacy - this now does 5-function arithmetic
> plus trig functions.)

pyparsing impressed me... but positively :)

I don't like the example you're talking about that much;
first there are at least a couple of problems in the
specific calculator logic... the solution found for "^"
right-association is broken (try evaluating 2^3+4) and
there is no provision for unary minus.

The solution for "^" is however very simple; instead of
using "expr" you can use this structure:

    factor = Forward()
    factor << (atom + Optional(expop + factor))

Usually when I (hand) write a parser I've a single function
for parsing all binary operations that reads from a
table the operator and associativity. For left-association
the rule is

expr(n) = expr(n-1) + ZeroOrMore( op+expr(n-1) )

while for right-association it is

expr(n) = expr(n-1) + ZeroOrMore( op+expr(n) )

(in the latter ZeroOrMore could be Optional instead,
but ZeroOrMore is fine anyway and that way the code
is simpler).
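
The two rules above can be sketched as a hand-written, table-driven parser. This is only an illustration under stated assumptions, not the code behind any calculator mentioned in this thread: the OPS table, the level numbering (level 1 binds tightest) and the names Parser and evaluate are all made up for the example, and unary minus is handled at the atom level as one possible choice.

```python
import re

# Operator table: symbol -> (level, associativity, function).
# Level 1 binds tightest; expr(n) is built from expr(n-1) operands.
# The table contents are assumptions for this sketch.
OPS = {
    "^": (1, "right", lambda a, b: a ** b),
    "*": (2, "left", lambda a, b: a * b),
    "/": (2, "left", lambda a, b: a / b),
    "+": (3, "left", lambda a, b: a + b),
    "-": (3, "left", lambda a, b: a - b),
}
TIGHTEST, TOP_LEVEL = 1, 3
TOKEN = re.compile(r"\s*(\d+\.?\d*|\.\d+|[-+*/^()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("bad input at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self, level):
        # One function handles every binary operator level.
        if level == 0:
            return self.atom()
        left = self.expr(level - 1)
        while True:
            op = self.peek()
            if op not in OPS or OPS[op][0] != level:
                return left
            _, assoc, fn = OPS[op]
            self.advance()
            if assoc == "right":
                # Right operand parsed at the same level: op + expr(n).
                return fn(left, self.expr(level))
            # Left-association: op + expr(n-1), repeated by the loop.
            left = fn(left, self.expr(level - 1))

    def atom(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "-":
            # Unary minus applied to a "^"-level expression, so -2^2 is -(2^2).
            return -self.expr(TIGHTEST)
        if tok == "(":
            value = self.expr(TOP_LEVEL)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok in OPS or tok == ")":
            raise SyntaxError("unexpected %r" % tok)
        return float(tok)


def evaluate(text):
    parser = Parser(text)
    value = parser.expr(TOP_LEVEL)
    if parser.peek() is not None:
        raise SyntaxError("unexpected %r" % parser.peek())
    return value
```

With this table, evaluate("2^3+4") groups as (2^3)+4 and evaluate("2^3^2") groups as 2^(3^2), while "-" and "/" stay left-associative. Adding an operator only means adding a row to the table.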

Also I don't like the idea of writing to the global
exprstack and the way parsing expression is used
(things like (lpar + expr + rpar).suppress() look
really weird).

The module seems to me very easy to use, however, and in
little time and very few lines of code I was able to change
fourFn to parse the five binary operations, unary minus,
comparisons, the C++ "?:" ternary operator, variable
reading and writing (with assignment being a statement
rather than an operator, as in Python) and to generate as
output a list of opcodes for a stack-based VM including
jump codes for "?:" (and this without global variables,
and so handling the backtracking correctly).
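
To illustrate what "a list of opcodes with jump codes for ?:" can look like, here is a minimal sketch. It is not the modified fourFn code, which is not shown in this thread; the tuple-based tree shape, the opcode names (PUSH, LOAD, NEG, JUMP, JUMP_IF_FALSE) and the functions compile_expr and run are all assumptions made for the example. The opcode list is passed in explicitly rather than kept in a global.

```python
import operator

# Binary opcodes understood by the toy VM (names are assumptions).
BINARY = {
    "ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul,
    "DIV": operator.truediv, "POW": operator.pow,
    "LT": operator.lt, "GT": operator.gt, "EQ": operator.eq,
}


def compile_expr(node, code):
    """Append opcodes for `node` (a nested tuple) to the list `code`."""
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    elif kind == "neg":
        compile_expr(node[1], code)
        code.append(("NEG",))
    elif kind == "binop":
        _, op, left, right = node
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((op,))
    elif kind == "cond":                      # test ? if_true : if_false
        _, test, if_true, if_false = node
        compile_expr(test, code)
        jump_false = len(code)
        code.append(None)                     # placeholder, patched below
        compile_expr(if_true, code)
        jump_end = len(code)
        code.append(None)                     # placeholder, patched below
        code[jump_false] = ("JUMP_IF_FALSE", len(code))
        compile_expr(if_false, code)
        code[jump_end] = ("JUMP", len(code))
    else:
        raise ValueError("unknown node kind %r" % kind)
    return code


def run(code, variables):
    """Execute the opcode list on a stack and return the final value."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        name = op[0]
        if name == "PUSH":
            stack.append(op[1])
        elif name == "LOAD":
            stack.append(variables[op[1]])
        elif name == "NEG":
            stack.append(-stack.pop())
        elif name == "JUMP":
            pc = op[1]
        elif name == "JUMP_IF_FALSE":
            if not stack.pop():
                pc = op[1]
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[name](left, right))
    return stack.pop()
```

For example, the tree for x < 10 ? x * 2 : -x compiles to ten opcodes, with the conditional jump skipping the "true" branch and the unconditional jump skipping the "false" branch.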

I've yet to see how nicely it is possible to handle error
reporting (another no-brainer for hand-coded parsers);
but so far I haven't read the documentation :).

Andrea
 

Andrea Griffini

> I'd be interested in your feedback on this pyparsing example:
> http://www.geocities.com/ptmcg/python/fourFn.py.txt

Like I said in another post I like pyparsing so far... but
I tried writing a bare-bone calculator and I got this...

http://www.gripho.it/barecalc.py.txt

It implements support for 5 operations, a few functions
and variables with assignment. The only imported module is "math",
and the line count is not really much higher than fourFn.py.

If you wonder whether you could write a full programming
language parser this way, I can assure you that it's possible;
I've been there a few times (of course I'm not talking about
languages in which the grammar is a nightmare).

On the plus side, if you need to do a few little hacks (like
deciding how to parse something depending on the *meaning*
of a symbol) it's really easy. For an example of this kind
of hack in the barecalc calculator if an identifier is a
known function then the parser requires "("+expr+")" and
IMO this is crystal clear in the source.
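
A minimal sketch of that kind of hack, for readers without access to barecalc.py: this is not the barecalc source, and the FUNCTIONS table, the token pattern and the evaluate function are assumptions made for illustration. Only "+" and "-" are supported to keep it short. The point is the branch in atom(): whether an identifier names a known function decides which syntax the parser demands next.

```python
import math
import re
from collections import deque

FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}  # assumed set
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+\.?\d*|[-+()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("bad input at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(text, variables=None):
    variables = variables or {}
    tokens = deque(tokenize(text))

    def take(expected=None):
        tok = tokens.popleft() if tokens else None
        if expected is not None and tok != expected:
            raise SyntaxError("expected %r, got %r" % (expected, tok))
        return tok

    def expr():
        value = atom()
        while tokens and tokens[0] in ("+", "-"):
            if take() == "+":
                value += atom()
            else:
                value -= atom()
        return value

    def atom():
        tok = take()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            value = expr()
            take(")")
            return value
        if tok in FUNCTIONS:
            # The *meaning* of the identifier decides the syntax:
            # a known function must be followed by "(" expr ")".
            take("(")
            arg = expr()
            take(")")
            return FUNCTIONS[tok](arg)
        if tok[0].isalpha() or tok[0] == "_":
            if tok not in variables:
                raise NameError("unknown variable %r" % tok)
            return variables[tok]
        if tok in ("+", "-", ")"):
            raise SyntaxError("unexpected %r" % tok)
        return float(tok)

    result = expr()
    if tokens:
        raise SyntaxError("unexpected %r" % tokens[0])
    return result
```

Here evaluate("sqrt(16) + x", {"x": 1}) parses "sqrt" as a function call, whereas "sqrt 16" is rejected because the parenthesis is missing.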

Andrea
 
