simpleparse parsing problem

  • Thread starter David Hirschfield

David Hirschfield

Anyone out there use simpleparse? If so, I have a problem that I can't
seem to solve...I need to be able to parse this line:

"""Cen2 = Cen(OUT, "Cep", "ies", wh, 544, (wh/ht));"""

with this grammar:

grammar = r'''
declaration := ws, line, (ws, line)*, ws
line := (statement / assignment), ';', ws
assignment := identifier, ws, '=', ws, statement
statement := identifier, '(', arglist?, ')', chars?
identifier := ([a-zA-Z0-9_.:])+
arglist := arg, (',', ws, arg)*
arg := expr/ statement / identifier / num / str /
curve / spline / union / conditional / definition
definition := typedef?, ws, identifier, ws, '=', ws, arg
typedef := ([a-zA-Z0-9_])+
expr := termlist, ( operator, termlist )+
termlist := ( '(', expr, ')' ) / term
term := call / identifier / num
call := identifier, '(', arglist?, ')'
union := '{{', ws, (arg, ws, ';', ws)*, arg, ws, '}}'
operator := ( '+' / '-' / '/' / '*' /
'==' / '>=' / '<=' / '>' / '<' )
conditional := termlist, ws, '?', ws, termlist, ws, ':', ws, termlist
curve := (list / num), '@', num
spline := (cv, ',')*, cv
cv := identifier, '@', num
list := '[', arg, (',', ws, arg)*, ']'
str := '"', ([;] / chars)*, '"'
num := ( scinot / float / int )
<chars> := ('-' / '/' / '?' / [a-zA-Z0-9_.!@#$%^&\*\+=<> :])+
<int> := ([-+]?, [0-9]+)
<float> := ([-+]?, [0-9\.]+)
<scinot> := (float, 'e', int)
<ws> := [ \t\n]*
'''

But it fails. The problem is with how arglist/arg/expr are defined,
which makes it unable to handle the parenthesized expression at the end
of the line:

(wh/ht)

But everything I've tried to correct that problem fails. In the end, it
needs to be able to parse that line with those parentheses around wh/ht,
or without them.
Recursive parsing of expressions just seems hard to do in simpleparse,
and is beyond my parsing knowledge.

Here's the code to get the parser going:

from simpleparse.parser import Parser
p = Parser(grammar, 'line')
import pprint
bad_line = """Cen2 = Cen(OUT, "Cep", "ies", wh, 544, (wh/ht));"""

pprint.pprint(p.parse(bad_line))


Any help greatly appreciated, thanks,
-Dave
 

Paul McGuire

David -

I converted your simpleparse grammar to pyparsing, which I could then
troubleshoot. Here is a working pyparsing grammar, perhaps you can convert
it back to simpleparse form and see if you make any better progress.

-- Paul


test = """Cen2 = Cen(OUT, "Cep", "ies", wh, 544, (wh/ht));"""


from pyparsing import *

# recursive items need forward decl - assign contents later using
# the '<<' operator
arg = Forward()
expr = Forward()
statement = Forward()

float_ = Regex (r"[-+]?[0-9]+\.[0-9]*")
int_ = Regex (r"[-+]?[0-9]+")
scinot = Combine(float_ + oneOf(list("eE")) + int_)
num = scinot | float_ | int_
str_ = dblQuotedString
list_ = "[" + delimitedList(arg) + "]"
identifier = Word(alphas, srange("[a-zA-Z0-9_.:]"))
cv = identifier + "@" + num
spline = delimitedList(cv)
curve = (list_ | num) + "@" + num
conditional = expr + "?" + expr + ":" + expr
operator = oneOf( ('+', '-', '/', '*', '==', '>=', '<=', '>', '<') )
union = "{{" + delimitedList( arg, delim=";" ) + "}}"
call = identifier + "(" + delimitedList(arg) + ")"
term = call | identifier | num | Group( "(" + expr + ")" )
expr << (term + ZeroOrMore( operator+term ) )
typedef = Word( alphas, alphanums )
definition = ( (typedef + identifier) | identifier ) + "=" + arg
arg << (expr | statement | identifier | num | str_ | "." |
curve | spline | union | conditional | definition)
assignment = identifier + "=" + statement
statement << ( call | assignment )
line_ = statement + ';'
declaration = OneOrMore(line_)

print(declaration.parseString(test))

Prints:
['Cen2', '=', 'Cen', '(', 'OUT', '"Cep"', '"ies"', 'wh', '544', ['(', 'wh',
'/', 'ht', ')'], ')', ';']
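
For anyone converting this back to simpleparse: the key difference appears to be that the original expr rule requires at least one (operator, termlist) pair (the trailing '+'), while the pyparsing version above uses ZeroOrMore. Once termlist has consumed "(wh/ht)", nothing is left to satisfy the mandatory operator tail, so expr fails, and none of the other arg alternatives matches a bare parenthesized expression. The sketch below is a hand-written stand-in in plain Python, not simpleparse itself. It models only expr, termlist, and a term reduced to a bare identifier; the function names and the min_tail parameter are made up for illustration.

```python
import re

# Simplified stand-ins for the grammar's identifier and operator rules.
IDENT = re.compile(r"[a-zA-Z0-9_.:]+")
OPERATOR = re.compile(r"==|>=|<=|[-+/*<>]")


def match_term(text, pos):
    # term, reduced here to a bare identifier (call and num are left out)
    m = IDENT.match(text, pos)
    return m.end() if m else None


def match_termlist(text, pos, min_tail):
    # termlist := ( '(', expr, ')' ) / term
    if text.startswith("(", pos):
        end = match_expr(text, pos + 1, min_tail)
        if end is not None and text.startswith(")", end):
            return end + 1
    return match_term(text, pos)


def match_expr(text, pos, min_tail):
    # min_tail == 1 models: expr := termlist, ( operator, termlist )+
    # min_tail == 0 models: expr := termlist, ( operator, termlist )*
    pos = match_termlist(text, pos, min_tail)
    if pos is None:
        return None
    count = 0
    while True:
        op = OPERATOR.match(text, pos)
        if not op:
            break
        nxt = match_termlist(text, op.end(), min_tail)
        if nxt is None:
            break
        pos = nxt
        count += 1
    # Fail if fewer operator/termlist pairs were found than required.
    return pos if count >= min_tail else None


def parses_fully(text, min_tail):
    # True only if expr consumes the whole input.
    return match_expr(text, 0, min_tail) == len(text)


print(parses_fully("wh/ht", 1))    # unparenthesized form, '+' rule
print(parses_fully("(wh/ht)", 1))  # parenthesized form, '+' rule
print(parses_fully("(wh/ht)", 0))  # parenthesized form, '*' rule
```

If this reading is right, the matching simpleparse change would be to turn the '+' in the expr rule into '*', or to add termlist as an arg alternative. That suggestion is untested against simpleparse itself.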