Basic tokenizer

Discussion in 'Python' started by Dale Strickland-Clak, Sep 1, 2004.

  1. I'm looking for a simple string tokenizer. The Python tokenize module is
    too specific and is geared to Python syntax.

    I just want something that can be set up to read the basic stuff like
    .. identifiers
    .. integers (and/or reals)
    .. strings
    .. special characters such as operators and brackets.

    Is anyone aware of such a module?

    Thanks.
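
For the kinds of tokens listed above (identifiers, integers or reals, strings, and special characters), a small tokenizer can be sketched with nothing more than the standard `re` module. This is only an illustration under stated assumptions: the token names, the patterns, and the set of operator characters are choices made for this example, not part of any module mentioned in the thread.

```python
import re

# Token kinds are tried in order; names and patterns are illustrative choices.
TOKEN_SPEC = [
    ("REAL",   r"\d+\.\d*|\.\d+"),
    ("INT",    r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("STRING", r'"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\''),
    ("SKIP",   r"\s+"),
    ("OP",     r"[-+*/^=<>!?:,;()\[\]{}]"),
]

# One master pattern with a named group per token kind.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs; raise ValueError on an unexpected character."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(
                "unexpected character %r at position %d" % (text[pos], pos)
            )
        pos = m.end()
        if m.lastgroup != "SKIP":  # whitespace is matched but not reported
            yield m.lastgroup, m.group()
```

Calling `list(tokenize('x1 = 3.14 * (y + 42)'))` yields pairs such as `("IDENT", "x1")`, `("REAL", "3.14")` and `("INT", "42")`. String tokens keep their surrounding quotes, so unquoting is left to the caller.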

    --
    Dale Strickland-Clark
    Riverhall Systems Ltd, www.riverhall.co.uk
     
    Dale Strickland-Clak, Sep 1, 2004
    #1

  2. Miki Tebeka (Guest)

    Hello Dale,

    > I'm looking for a simple string tokenizer. The Python tokenize module is
    > too specific and is geared to Python syntax.
    >
    > I just want something that can be set up to read the basic stuff like
    > . identifiers
    > . integers (and/or reals)
    > . strings
    > . special characters such as operators and brackets.
    >
    > Is anyone aware of such a module?
    >
    > Thanks.

    There are several parsing/lexing packages for Python. My favorite is PLY
    (http://systems.cs.uchicago.edu/ply/).

    Bye.
    --
    ------------------------------------------------------------------------
    Miki Tebeka
    http://tebeka.spymac.net
    The only difference between children and adults is the price of the toys
     
    Miki Tebeka, Sep 2, 2004
    #2

  3. On Thu, 2 Sep 2004 08:24:19 +0200, "Miki Tebeka" wrote:

    >There are several parsing/lexing packages for Python. My favorite is PLY
    >(http://systems.cs.uchicago.edu/ply/).


    I gave it a very quick look and I have to say that I was
    impressed... but it wasn't a good impression.

    For example, the pycalc.py example is just a few lines
    shorter than a recursive descent parser that implements
    a similar calculator *without importing any module, not
    even "re"*. If implementing something from scratch requires
    the same amount of code as using a tool, I begin to wonder
    what the point of using such a tool is.

    I was told that the good point about parser generators
    is that at least you can be sure the code is correct.

    Try feeding "10/2" or ")" to pycalc.py ... :-(

    Andrea
     
    Andrea Griffini, Sep 2, 2004
    #3
  4. Paul McGuire (Guest)

    "Andrea Griffini" <> wrote in message
    news:...
    > On Thu, 2 Sep 2004 08:24:19 +0200, "Miki Tebeka"
    > <> wrote:
    >
    > >There are several parsing/lexing packages for Python. My favorite is PLY
    > >(http://systems.cs.uchicago.edu/ply/).

    >
    > I gave it a very quick look and I've to say that I was
    > impressed... but it wasn't a good impression.
    >


    I'd be interested in your feedback on this pyparsing example:
    http://www.geocities.com/ptmcg/python/fourFn.py.txt

    (The "fourFn.py" title is a legacy - this now does 5-function arithmetic
    plus trig functions.)

    -- Paul
     
    Paul McGuire, Sep 2, 2004
    #4
  5. On Thu, 02 Sep 2004 13:02:09 GMT, "Paul McGuire" wrote:

    >I'd be interested in your feedback on this pyparsing example:
    >http://www.geocities.com/ptmcg/python/fourFn.py.txt
    >
    >(The "fourFn.py" title is a legacy - this now does 5-function arithmetic
    >plus trig functions.)


    pyparsing impressed me... but positively :)

    I don't like the example you're talking about that much;
    first there are at least a couple of problems in the
    specific calculator logic... the solution found for "^"
    right-association is broken (try evaluating 2^3+4) and
    there is no provision for unary minus.

    The fix for "^" is however very simple; instead of
    using "expr" you can use this structure:

    factor = Forward()
    factor << (atom + Optional(expop + factor))

    Usually when I write a parser by hand I have a single function
    for parsing all binary operations that reads the operators
    and their associativity from a table. For left-association
    the rule is

    expr(n) = expr(n-1) + ZeroOrMore( op+expr(n-1) )

    while for right-association it is

    expr(n) = expr(n-1) + ZeroOrMore( op+expr(n) )

    (in the latter ZeroOrMore could be Optional instead,
    but ZeroOrMore is fine anyway and that way the code
    is simpler).
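
The two rules above can be sketched as one hand-written function driven by a table of operators and associativity. This is a minimal illustration and not the poster's actual code: the `LEVELS` table, the function names, and the one-line `re.findall` tokenizer are all assumptions made for this example, and unary minus is deliberately left out.

```python
import operator
import re

# Precedence levels from tightest (level 1) to loosest; level 0 is an atom.
# Each entry maps operator symbols to functions and names the associativity.
LEVELS = [
    ({"^": operator.pow}, "right"),
    ({"*": operator.mul, "/": operator.truediv}, "left"),
    ({"+": operator.add, "-": operator.sub}, "left"),
]


def parse_atom(tokens, pos):
    """Parse a number or a parenthesized expression; return (value, new_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        value, pos = parse_level(tokens, pos + 1, len(LEVELS))
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    try:
        return float(tok), pos + 1
    except ValueError:
        raise SyntaxError("unexpected token %r" % tok) from None


def parse_level(tokens, pos, n):
    """expr(n) = expr(n-1) + ZeroOrMore(op + expr(n-1 or n))."""
    if n == 0:
        return parse_atom(tokens, pos)
    ops, assoc = LEVELS[n - 1]
    value, pos = parse_level(tokens, pos, n - 1)
    while pos < len(tokens) and tokens[pos] in ops:
        fn = ops[tokens[pos]]
        # Right-association recurses into the same level, left into the next
        # tighter one. For "right" the loop body runs at most once.
        rhs_level = n if assoc == "right" else n - 1
        rhs, pos = parse_level(tokens, pos + 1, rhs_level)
        value = fn(value, rhs)
    return value, pos


def evaluate(text):
    """Tokenize and evaluate an arithmetic expression, rejecting leftovers."""
    tokens = re.findall(r"\d+\.?\d*|\S", text)
    value, pos = parse_level(tokens, 0, len(LEVELS))
    if pos != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[pos])
    return value
```

With these definitions `evaluate("2^3+4")` gives 12.0, `evaluate("2^3^2")` gives 512.0 (right-association), and `evaluate("10-4-3")` gives 3.0 (left-association). Adding a precedence level only means adding a row to the table.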

    Also, I don't like the idea of writing to the global
    exprstack, or the way parse expressions are used
    (things like (lpar + expr + rpar).suppress() look
    really weird).

    The module seems very easy to use, however, and in
    little time and very few lines of code I was able to change
    fourFn to parse the five binary operations, unary minus,
    comparisons, the C++ "?:" ternary operator, and variable
    reading and writing (but with assignment being a "statement",
    as in Python, and not an operator), and to generate as
    output a list of opcodes for a stack-based VM including
    jump codes for "?:" (and this without global variables,
    and so handling backtracking correctly).
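
To make the idea of jump codes for "?:" concrete, here is a small sketch of how a conditional could be compiled to a flat opcode list and executed on a stack machine. Everything in it is hypothetical: the opcode names (`PUSH`, `LOAD`, `JZ`, `JMP`), the tuple-based AST, and the `compile_expr` and `run` helpers are assumptions for illustration, not the code described in this post.

```python
import operator

# Hypothetical binary opcodes and the Python functions that implement them.
BINOPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "LT": operator.lt,
}


def compile_expr(node, code):
    """Append opcodes for a tuple-based AST node to `code` and return it."""
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    elif kind == "binop":
        _, op, left, right = node
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((op, None))
    elif kind == "cond":  # cond ? then : otherwise
        _, cond, then, otherwise = node
        compile_expr(cond, code)
        jz_at = len(code)
        code.append(None)  # placeholder, patched once the target is known
        compile_expr(then, code)
        jmp_at = len(code)
        code.append(None)  # placeholder, patched once the target is known
        code[jz_at] = ("JZ", len(code))    # false: jump to the 'otherwise' code
        compile_expr(otherwise, code)
        code[jmp_at] = ("JMP", len(code))  # true: jump past the 'otherwise' code
    else:
        raise ValueError("unknown node %r" % (kind,))
    return code


def run(code, variables):
    """Execute the opcode list on a value stack and return the final result."""
    stack, pc = [], 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op == "JZ":
            if not stack.pop():
                pc = arg
        elif op == "JMP":
            pc = arg
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINOPS[op](left, right))
    return stack.pop()


# AST for: x < 10 ? x + 1 : x * 2
AST = (
    "cond",
    ("binop", "LT", ("var", "x"), ("num", 10)),
    ("binop", "ADD", ("var", "x"), ("num", 1)),
    ("binop", "MUL", ("var", "x"), ("num", 2)),
)
CODE = compile_expr(AST, [])
```

Because the opcode list is passed in and returned rather than kept in a global, a caller can discard a partial list when a parse alternative fails. Running `run(CODE, {"x": 3})` takes the first branch and `run(CODE, {"x": 20})` takes the second.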

    I have yet to see how nicely error reporting can be
    handled (another no-brainer for hand-coded parsers),
    but so far I haven't read the documentation :).

    Andrea
     
    Andrea Griffini, Sep 4, 2004
    #5
  6. On Thu, 02 Sep 2004 13:02:09 GMT, "Paul McGuire" wrote:

    >I'd be interested in your feedback on this pyparsing example:
    >http://www.geocities.com/ptmcg/python/fourFn.py.txt


    As I said in another post, I like pyparsing so far... but
    I tried writing a bare-bones calculator and I got this...

    http://www.gripho.it/barecalc.py.txt

    It implements support for 5 operations, a few functions
    and variables with assignment. The only imported module is "math",
    and the line count is not really much higher than fourFn.py.

    If you wonder whether you could write a full programming
    language parser this way, I can assure you that it's possible;
    I've been there a few times (of course I'm not talking about
    languages in which the grammar is a nightmare).

    On the plus side, if you need to do a few little hacks (like
    deciding how to parse something depending on the *meaning*
    of a symbol) it's really easy. As an example of this kind
    of hack, in the barecalc calculator, if an identifier is a
    known function then the parser requires "("+expr+")", and
    IMO this is crystal clear in the source.
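
The kind of meaning-dependent parsing described above can be sketched in a few lines of hand-written recursive descent. This is not the barecalc source: the `Parser` class, the `FUNCTIONS` table, and the `calc` helper are assumptions made for this illustration, and only "+", "-", "*" and "/" are supported to keep it short.

```python
import math
import re

# Names the parser treats as functions; any other identifier is a variable.
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}


class Parser:
    def __init__(self, text, variables):
        self.tokens = re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|\S", text)
        self.pos = 0
        self.variables = variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError("expected %r, found %r" % (tok, self.peek()))
        self.pos += 1

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.tokens[self.pos]
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.atom()
        while self.peek() in ("*", "/"):
            op = self.tokens[self.pos]
            self.pos += 1
            rhs = self.atom()
            value = value * rhs if op == "*" else value / rhs
        return value

    def atom(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if tok == "(":
            value = self.expr()
            self.expect(")")
            return value
        if tok in FUNCTIONS:
            # The hack: the *meaning* of the identifier decides the syntax.
            # A known function name must be followed by "(" expr ")".
            self.expect("(")
            arg = self.expr()
            self.expect(")")
            return FUNCTIONS[tok](arg)
        if tok[0].isalpha() or tok[0] == "_":
            if tok not in self.variables:
                raise NameError("unknown variable %r" % tok)
            return self.variables[tok]
        try:
            return float(tok)
        except ValueError:
            raise SyntaxError("unexpected token %r" % tok) from None


def calc(text, variables=None):
    """Evaluate `text`, rejecting any tokens left over after the expression."""
    parser = Parser(text, variables or {})
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected token %r" % parser.peek())
    return value
```

Here `calc("sqrt(16) + x * 2", {"x": 3})` evaluates to 10.0, while `calc("sqrt + 1")` is rejected with a SyntaxError because `sqrt` is known to be a function and no "(" follows it.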

    Andrea
     
    Andrea Griffini, Sep 4, 2004
    #6