Parser or regex?


Fuzzyman

Hello all,

I'm writing a module that takes user input as strings and (effectively)
translates them to function calls with arguments and keyword
arguments. To pass a list I use a sort of 'list constructor' - so the
syntax looks a bit like :

checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2'))

Worst case anyway :)

I can handle this with regular expressions but they are becoming truly
horrible. I wonder if anyone has any suggestions on optimising them. I
could hand write a parser - which would be more code, probably slower -
but less error prone. (Regular expressions are subject to obscure
errors - especially the ones I create).

The trouble is that I have to pull out the separate arguments, then
pull apart the keyword arguments and the list keyword arguments. This
makes it a 'multi-pass' task - and I wondered if there was a better way
to do it.

Because I use ``findall`` to pull out all the arguments, I also have to
use a *very similar* regex first to check that there are no errors (as
findall will just miss out badly formed parts of the input).

My current approach is :

pull out the checkname and *all* the arguments using :

'(.+?)\((.*)\)'

I then have :


_paramstring = r'''
(?:
(
(?:
[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*list\(
(?:
\s*
(?:
(?:".*?")| # double quotes
(?:'.*?')| # single quotes
(?:[^'",\s\)][^,\)]*?) # unquoted
)
\s*,\s*
)*
(?:
(?:".*?")| # double quotes
(?:'.*?')| # single quotes
(?:[^'",\s\)][^,\)]*?) # unquoted
)? # last one
\)
)|
(?:
(?:".*?")| # double quotes
(?:'.*?')| # single quotes
(?:[^'",\s=][^,=]*?)| # unquoted
(?: # keyword argument
[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*
(?:
(?:".*?")| # double quotes
(?:'.*?')| # single quotes
(?:[^'",\s=][^,=]*?) # unquoted
)
)
)
)
(?:
(?:\s*,\s*)|(?:\s*$) # comma
)
)
'''

I can use ``_paramstring`` with findall to pull out all the arguments.
However - as I said, I first need to check that the entire input is
well formed. So I do a match against :

_matchstring = '^%s*' % _paramstring

Having done a match I can use findall and ``_paramstring`` to pull out
*all* the parameters as a list - and go through each one checking if it
is a single argument, keyword argument or list constructor.
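The validate-then-extract pattern described above can be sketched as follows. This is only an illustration in Python 3, not the module's code: `VALUE`, `ITEM` and `split_call` are hypothetical names, and the pattern is a deliberately simplified stand-in for `_paramstring` that handles quoted and unquoted positional values only (no keyword arguments, no list constructor).

```python
import re

# Step 1: split "name(args)" into the check name and the raw argument text.
CALL_RE = re.compile(r'\s*(.+?)\((.*)\)\s*$', re.DOTALL)

# Hypothetical, simplified stand-in for _paramstring: one quoted or
# unquoted positional value (no keywords, no list(...) constructor).
VALUE = r'''(?:"[^"]*"|'[^']*'|[^'",\s()=][^,()=]*?)'''
ITEM = r'\s*(' + VALUE + r')\s*(?:,|$)'

# Used with findall to extract the individual values.
ITEM_RE = re.compile(ITEM)
# Used with match to check that the *whole* string is made of valid items.
WHOLE_RE = re.compile(r'(?:' + ITEM + r')*\s*$')


def split_call(text):
    """Return (name, [raw argument strings]) or raise ValueError."""
    m = CALL_RE.match(text)
    if m is None:
        raise ValueError("not of the form name(args): %r" % text)
    name, args = m.group(1).strip(), m.group(2)
    # Validate first: findall on its own silently skips malformed parts.
    if WHOLE_RE.match(args) is None:
        raise ValueError("badly formed argument list: %r" % args)
    return name, ITEM_RE.findall(args)


print(split_call('checkname(arg1, "arg 2", \'arg 3\')'))
```

The trailing `$` in the validation pattern matters: a repeated group followed by nothing can match zero items, so without an end anchor `match` would accept any input at all.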

For keyword arguments and list constructors I use another regular
expression (the appropriate part of _paramstring basically) to pull out
the values from that.

Now this approach works - but it's hardly "optimal" (for some value of
optimal). I wondered if anyone could suggest a better approach.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 

Tim Arnold

Fuzzyman said:
Hello all,

I'm writing a module that takes user input as strings and (effectively)
translates them to function calls with arguments and keyword
arguments. To pass a list I use a sort of 'list constructor' - so the
syntax looks a bit like :

checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2'))

Worst case anyway :)

pyparsing is great, easy to configure and very powerful--I think it looks
like a great tool for your inputs.

http://pyparsing.sourceforge.net/


--Tim
 

Michael Spencer

Fuzzyman said:
Hello all,

I'm writing a module that takes user input as strings and (effectively)
translates them to function calls with arguments and keyword
arguments. To pass a list I use a sort of 'list constructor' - so the
syntax looks a bit like :

checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2'))

Worst case anyway :)
...

Perhaps you could simply use Python's parser - the syntax appears to be Python's.

e.g., a very quick hack using eval, which is easier without the list call, so
I'm cheating and replacing it with a list literal for now:
>>> source = """checkname(arg1, "arg 2", 'arg 3', keywarg="value",
...     keywarg2='value2', default=["val1", 'val2'])"""
>>>

We need some way to ensure bare names don't cause NameErrors:

>>> class lazynames(dict):
...     def __getitem__(self, key):
...         if key in self:
...             return dict.__getitem__(self, key)
...         return "%s" % key # if name not found, return it as a str constant
...
>>> d = lazynames()
>>> def checkname(*args, **kw):
...     return args, kw
...
>>> d["checkname"] = checkname

With this set up, you can parse in one line!

>>> eval(source, globals(), d)
(('arg1', 'arg 2', 'arg 3'), {'default': ['val1', 'val2'], 'keywarg': 'value',
'keywarg2': 'value2'})

If you don't like the risks of eval, then compiler.parse gives a form of the
parse output that is fairly easy to deal with.
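For readers on Python 3, where the `compiler` package no longer exists, the standard-library `ast` module fills a similar role. The following is a minimal sketch under that assumption (Python 3.8 or later); `parse_check` and `_value` are hypothetical helper names, and bare names are treated as plain strings, as in the eval hack above. Nothing is ever evaluated, so the eval risks do not apply.

```python
import ast


def _value(node):
    """Convert one argument node into a plain Python value."""
    if isinstance(node, ast.Constant):      # quoted strings, numbers
        return node.value
    if isinstance(node, ast.Name):          # bare word such as arg1
        return node.id
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "list" and not node.keywords):
        # list(...) constructor, possibly nested
        return [_value(arg) for arg in node.args]
    raise ValueError("unsupported syntax: %s" % ast.dump(node))


def parse_check(source):
    """Return (name, args, kwargs) for a string like 'checkname(a, k="v")'."""
    tree = ast.parse(source.strip(), mode="eval")   # SyntaxError if malformed
    call = tree.body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("expected name(args...)")
    args = [_value(arg) for arg in call.args]
    kwargs = {}
    for kw in call.keywords:
        if kw.arg is None:                  # **mapping is not part of the syntax
            raise ValueError("'**' arguments are not supported")
        kwargs[kw.arg] = _value(kw.value)
    return call.func.id, args, kwargs


source = """checkname(arg1, "arg 2", 'arg 3', keywarg="value",
    keywarg2='value2', default=list("val1", 'val2'))"""
print(parse_check(source))
```

One limitation of borrowing Python's grammar: unquoted values must be valid Python expressions, so something like `foo-bar` would be rejected rather than read as a bare string.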

Cheers
Michael
 

Graham Fawcett

Fuzzyman said:
Hello all,

I'm writing a module that takes user input as strings and (effectively)
translates them to function calls with arguments and keyword
arguments. To pass a list I use a sort of 'list constructor' - so the
syntax looks a bit like :

checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2'))

Worst case anyway :)

Since you've taken the care to use Python syntax, you could do worse
than using standard-library tools that are designed for parsing Python.

>>> import compiler
>>> source = """
... checkname(arg1, "arg 2", 'arg 3', keywarg="value",
... keywarg2='value2', default=list("val1", 'val2'))
... """
>>> node = compiler.parse(source, mode='eval')
>>> node
Expression(CallFunc(Name('checkname'), [Name('arg1'), Const('arg 2'),
Const('arg 3'), Keyword('keywarg', Const('value')), Keyword('keywarg2',
Const('value2')), Keyword('default', CallFunc(Name('list'),
[Const('val1'), Const('val2')], None, None))], None, None))
>>> dir(node)
['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'asList',
'getChildNodes', 'getChildren', 'node']

In this case, compiler.parse returns a compiler.ast.Expression object,
acting as the root node of an abstract syntax tree.

The compiler.visitor package may also be of help in writing tree-walker
code to extract the bits you want. With the example session above,

class KwVisitor(compiler.visitor.ASTVisitor):

    def visitKeyword(self, node):
        print '>>', node

compiler.visitor.walk(node, KwVisitor())

would output:

>> Keyword('keywarg', Const('value'))
>> Keyword('keywarg2', Const('value2'))
>> Keyword('default', CallFunc(Name('list'), [Const('val1'), Const('val2')], None, None))


Note that with a bit of black magic, you can modify the AST, and then
compile the transformed AST into a callable codeblock. Tools like
Quixote's PTL template language use this approach to introduce new
behaviours into Python-like code at compilation time.
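As a rough illustration of the transform-then-compile idea, here is a sketch using Python 3's `ast` module in place of the Python 2 `compiler` package (an assumption about the reader's environment, Python 3.8 or later). `BareNamesToStrings` and `run_check` are hypothetical names. The transformer rewrites every bare name into a string constant, except the names of permitted callables, and the result is compiled and evaluated. Emptying `__builtins__` is not a real sandbox, so this should not be used on untrusted input.

```python
import ast


class BareNamesToStrings(ast.NodeTransformer):
    """Rewrite bare names into string constants, keeping permitted callables."""

    def __init__(self, keep):
        self.keep = set(keep)

    def visit_Name(self, node):
        if node.id in self.keep:
            return node                     # e.g. checkname, list
        return ast.copy_location(ast.Constant(value=node.id), node)


def run_check(source, functions):
    """Transform the AST of `source`, compile it, and call the named check."""
    # The thread's list(...) constructor takes items, unlike the builtin.
    allowed = {"list": lambda *items: list(items)}
    allowed.update(functions)
    tree = ast.parse(source.strip(), mode="eval")
    tree = BareNamesToStrings(allowed).visit(tree)
    ast.fix_missing_locations(tree)
    code = compile(tree, "<check>", "eval")
    # NOT a sandbox: removing builtins does not make eval safe.
    env = dict(allowed, __builtins__={})
    return eval(code, env)


def checkname(*args, **kw):
    return args, kw


print(run_check('checkname(arg1, "arg 2", keywarg=value2, '
                'default=list("val1", val2))', {"checkname": checkname}))
```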

Graham
 

Fredrik Lundh

Fuzzyman said:
I'm writing a module that takes user input as strings and (effectively)
translates them to function calls with arguments and keyword
arguments. To pass a list I use a sort of 'list constructor' - so the
syntax looks a bit like :

checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2'))

Worst case anyway :)

I can handle this with regular expressions but they are becoming truly
horrible. I wonder if anyone has any suggestions on optimising them. I
could hand write a parser - which would be more code, probably slower -
but less error prone. (Regular expressions are subject to obscure
errors - especially the ones I create).

The trouble is that I have to pull out the separate arguments, then
pull apart the keyword arguments and the list keyword arguments. This
makes it a 'multi-pass' task - and I wondered if there was a better way
to do it.

I'd use some variation of:

http://online.effbot.org/2005_11_01_archive.htm#simple-parser-1

(that version can parse tuples, but it shouldn't be too hard to extend
it to handle keyword arguments)
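To give a feel for what a small hand-written parser with keyword-argument support might look like, here is a sketch built on the standard `tokenize` module. It is not the code from the linked article: `CallParser` and `parse_call` are hypothetical names, it is written for Python 3, and bare names are simply returned as strings.

```python
import ast
import io
import token
import tokenize

# Layout-only tokens that carry no meaning for this tiny grammar.
_SKIP = {token.NEWLINE, tokenize.NL, tokenize.COMMENT,
         token.INDENT, token.DEDENT, token.ENDMARKER}


class CallParser:
    """Recursive-descent parser for strings like: name(a, "b", k=list(1, 2))."""

    def __init__(self, source):
        readline = io.StringIO(source.strip()).readline
        self.toks = [(t.type, t.string)
                     for t in tokenize.generate_tokens(readline)
                     if t.type not in _SKIP]
        self.pos = 0

    def peek(self):
        if self.pos < len(self.toks):
            return self.toks[self.pos]
        return (token.ENDMARKER, "")

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, text):
        got = self.advance()[1]
        if got != text:
            raise SyntaxError("expected %r, got %r" % (text, got))

    def value(self):
        kind, text = self.advance()
        if kind in (token.STRING, token.NUMBER):
            return ast.literal_eval(text)   # literals only, nothing is executed
        if kind == token.NAME:
            if text == "list" and self.peek()[1] == "(":
                return self.arguments(allow_keywords=False)[0]
            return text                     # bare name becomes a string
        raise SyntaxError("unexpected token %r" % text)

    def arguments(self, allow_keywords):
        """Parse '(' item, item, ... ')' into (positional list, keyword dict)."""
        self.expect("(")
        args, kwargs = [], {}
        while self.peek()[1] != ")":
            kind = self.peek()[0]
            val = self.value()
            if (allow_keywords and kind == token.NAME
                    and isinstance(val, str) and self.peek()[1] == "="):
                self.advance()              # consume '='
                kwargs[val] = self.value()
            else:
                args.append(val)
            if self.peek()[1] == ",":
                self.advance()
            elif self.peek()[1] != ")":
                raise SyntaxError("expected ',' or ')', got %r" % self.peek()[1])
        self.expect(")")
        return args, kwargs


def parse_call(source):
    """Return (name, args, kwargs) for the whole input string."""
    parser = CallParser(source)
    kind, name = parser.advance()
    if kind != token.NAME:
        raise SyntaxError("expected a check name, got %r" % name)
    args, kwargs = parser.arguments(allow_keywords=True)
    if parser.pos != len(parser.toks):
        raise SyntaxError("unexpected trailing input")
    return name, args, kwargs


print(parse_call("""checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2', list("val3.1", 'val3.2')))"""))
```

Because the tokenizer does the lexical work, the parser is a single pass: positional arguments, keyword arguments and nested list constructors all come out of the same loop, and malformed input raises an error instead of being silently skipped.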

</F>
 

Paul McGuire

Fuzzyman said:
Thanks - I considered it. It's actually quite a small module (about
38k). I don't want to introduce a dependency on an external module.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Fuzzy -

This won't make the pyparsing module any smaller or less external, but here's
a pyparsing grammar for you.
- handles parameter values of single or double quoted strings, numbers, or
lists
- handles nested list values
- defines the grammar in just 8 readable lines

The simple version shows the basic pyparsing code, the fancy version adds
results names, which make the results processing step simpler.

-- Paul


data = """checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2',list("val3.1",
'val3.2')))"""

# simple version
from pyparsing import *

ident = Word(alphas,alphanums+"_$")
fname = ident
number = Word(nums,nums+".")
listDef = Forward()
val = quotedString | number | listDef
listDef << Group( Literal("list") + "(" + delimitedList(val) + ")" )
param = Group((ident | quotedString) +
              Optional(Literal("=").suppress() + val))
# parenthesised so the expression can continue on the next line
fn = (fname.setResultsName("func") + "(" +
      Group(Optional(delimitedList(param))) + ")")

res = fn.parseString(data)
import pprint
pprint.pprint(res.asList())

prints:
['checkname',
'(',
[['arg1'],
['"arg 2"'],
["'arg 3'"],
['keywarg', '"value"'],
['keywarg2', "'value2'"],
['default',
['list',
'(',
'"val1"',
"'val2'",
['list', '(', '"val3.1"', "'val3.2'", ')'],
')']]],
')']

# fancy version, using results names
ident = Word(alphas,alphanums+"_$")
fname = ident
number = Word(nums,nums+".")
listDef = Forward()
val = quotedString | number | listDef
listDef << Group( Literal("list") + Suppress("(") + delimitedList(val) +
Suppress(")") )
noParam = object()
param = Group((ident | quotedString).setResultsName("name") + \
Optional(Literal("=").suppress() + val,
default=noParam).setResultsName("val") )
fn = fname.setResultsName("func") + "(" + \
Group(Optional(delimitedList(param))).setResultsName("params") + ")"

res = fn.parseString(data)

print "func:", res.func
print "params:"
for p in res.params:
    if p.val != noParam:
        print "-",p.name,"=",p.val
    else:
        print "-",p.name

prints:

func: checkname
params:
- arg1
- "arg 2"
- 'arg 3'
- keywarg = "value"
- keywarg2 = 'value2'
- default = ['list', '"val1"', "'val2'", ['list', '"val3.1"', "'val3.2'"]]
 
