pyparser and recursion problem

P

pyscottishguy

Hi,

Using pyparser, I'm trying to parse a string like this:

:Start: first SECOND THIRD :SECOND: second1 | second2 :THIRD: third1 |
FOURTH :FOURTH: fourth1 | fourth2

I want the parser to do the following:
1) Get the text for the :Start: label e.g ('first SECOND THIRD')
2) Do nothing with the lower-case words e.g ('first')
3) For each upper-case word find the corresponding entries, and
replace the word
with these entries (the '|' indicates separate records)
e.g. for 'SECOND', replace the word with ("second1", "second2")
4 Do this recursively, because each item in '3' can have upper-case
words

I can do this - but not within pyparser. I had to write a recursive
function to do it. I would like to do it within pyparser however.

I'm pretty sure I have to use the Forward() function along with a few
setResultsName() - but after reading the documentation, many examples,
and trying for hours, I'm still totally lost. Please help!

Here is the program I have so far:
#!/usr/bin/python

from pyparsing import Word, Optional, OneOrMore, Group, alphas,
alphanums, Suppress, Dict

import string

def allIn( as, members ):
"Tests that all elements of as are in members"""
for a in as:
if a not in members:
return False
return True

def allUpper( as ):
"""Tests that all strings in as are uppercase"""
return allIn( as, string.uppercase )

def getItems(myArray, myDict):
"""Recursively get the items for each CAPITAL word"""
myElements=[]
for element in myArray:
myWords=[]
for word in element:
if allUpper(word):
items = getItems(myDict[word], myDict)
myWords.append(items)
else:
myWords.append(word)
myElements.append(myWords)

return myElements

testData = """
:Start: first SECOND THIRD fourth FIFTH

:SECOND: second1_1 second1_2 | second2 | second3

:THIRD: third1 third2 | SIXTH

:FIFTH: fifth1 | SEVENTH

:SIXTH: sixth1_1 sixth1_2 | sixth2

:SEVENTH: EIGHTH | seventh1

:EIGHTH: eighth1 | eighth2

"""

label = Suppress(":") + Word(alphas + "_") + Suppress(":")

words = Group(OneOrMore(Word(alphanums + "_"))) +
Suppress(Optional("|"))

data = ~label + OneOrMore(words)

line = Group(label + data)

doc = Dict(OneOrMore(line))

res = doc.parseString(testData)

# This prints out what pyparser gives us
for line in res:
print line

print
print

startString = res["Start"]
items = getItems([startString], res)[0]
# This prints out what we want
for line in items:
print line
 
N

Neil Cerutti

Using pyparser, I'm trying to parse a string like this:

:Start: first SECOND THIRD :SECOND: second1 | second2 :THIRD: third1 |
FOURTH :FOURTH: fourth1 | fourth2

I want the parser to do the following:
1) Get the text for the :Start: label e.g ('first SECOND THIRD')
2) Do nothing with the lower-case words e.g ('first')
3) For each upper-case word find the corresponding entries, and
replace the word
with these entries (the '|' indicates separate records)
e.g. for 'SECOND', replace the word with ("second1", "second2")
4 Do this recursively, because each item in '3' can have upper-case
words

I can do this - but not within pyparser. I had to write a
recursive function to do it. I would like to do it within
pyparser however.

pyparser is a great parser, but out of the box it only parses. To
get it to convert a text file into a new form you'll have to
write most of the conversion code yourself, and merely hook it
into pyparser.

The critical step you're missing (and that I missed until my
third try) is the setParseAction method, with which you can
monkey around with the resulting parse tree.

On the other hand, since you got it working without pyparsing,
probably you're problem doesn't need pyparsing.

Hopefully I'll have time to help you a bit more later, or Paul
MaGuire will swoop down in his pyparsing powered super-suit. ;)
 
P

Paul McGuire

Hopefully I'll have time to help you a bit more later, or Paul
MaGuire will swoop down in his pyparsing powered super-suit. ;)
There's no need to fear...!

Neil was dead on, and your parser is almost exactly right.
Congratulations for delving into the arcane Dict class, not an easy
element for first time pyparsers!

Forward() would have been okay if you were going to have your macros
defined in advance of referencing them. However, since you are
defining them after-the-fact, you'll have to wait until all the text
is parsed into a tree to start doing the macro substitution.

Your grammar as-is was almost exactly right (I've shown the minimal
mod needed to make this work, plus an alternative grammar that might
be a bit neater-looking). To perform some work after the tree is
built, you attach a parse action to the top-level doc element. This
parse action's job is to begin with the "Start" element, and
recursively replace words in all caps with their corresponding
substitution. As you surmised, the Dict class automagically builds
the lookup dictionary for you during the parsing phase.

After the parse action runse, the resulting res["Start"] element gives
the desired results. (This looks vaguely YAML-ish, am I right?)

-- Paul

Here is your working code:

from pyparsing import Word, Optional, OneOrMore, Group, alphas, \
alphanums, Suppress, Dict, Combine, delimitedList, traceParseAction, \
ParseResults


import string


def allIn( as, members ):
"Tests that all elements of as are in members"""
for a in as:
if a not in members:
return False
return True


def allUpper( as ):
"""Tests that all strings in as are uppercase"""
return allIn( as, string.uppercase )


def getItems(myArray, myDict):
"""Recursively get the items for each CAPITAL word"""
myElements=[]
for element in myArray:
myWords=[]
for word in element:
if allUpper(word):
items = getItems(myDict[word], myDict)
myWords.append(items)
else:
myWords.append(word)
myElements.append(myWords)


return myElements


testData = """
:Start: first SECOND THIRD fourth FIFTH

:SECOND: second1_1 second1_2 | second2 | second3

:THIRD: third1 third2 | SIXTH

:FIFTH: fifth1 | SEVENTH

:SIXTH: sixth1_1 sixth1_2 | sixth2

:SEVENTH: EIGHTH | seventh1

:EIGHTH: eighth1 | eighth2

"""

#> original grammar - very close!
#> just needed to enclose definition of data in a Group
label = Suppress(":") + Word(alphas + "_") + Suppress(":")
words = Group(OneOrMore(Word(alphanums + "_"))) + \
Suppress(Optional("|"))
#~ data = ~label + OneOrMore(words)
data = Group( OneOrMore(words) )
line = Group(label + data)
doc = Dict(OneOrMore(line))

#> suggested alternative grammar
#> - note use of Combine and delimitedList
#~ COLON = Suppress(":")
#~ label = Combine( COLON + Word(alphas + "_") + COLON )
#~ entry = Word(alphanums + "_")
#~ data = delimitedList( Group(OneOrMore(entry)), delim="|" )
#~ line = Group(label + data)
#~ doc = Dict(OneOrMore(line))

# recursive reference fixer-upper
def fixupRefsRecursive(tokens, lookup):
if isinstance(tokens, ParseResults):
subs = [ fixupRefsRecursive(t, lookup) for t in tokens ]
tokens = ParseResults( subs )
else:
if tokens.isupper():
tokens = fixupRefsRecursive(lookup[tokens], lookup)
return tokens

#> add this parse action to doc, which invokes recursive
#> reference fixer-upper
def fixupRefs(tokens):
tokens["Start"] = fixupRefsRecursive( tokens["Start"], tokens )

doc.setParseAction( fixupRefs )

res = doc.parseString(testData)

# This prints out what pyparser gives us
#~ for line in res:
#~ print line
#> not really interested in all of res, just the fixed-up
#> "Start" entry
print res["Start"][0].asList()

print

startString = res["Start"]
items = getItems([startString], res)[0]
# This prints out what we want
for line in items:
print line

Prints:
['first', [['second1_1', 'second1_2'], ['second2'], ['second3']],
[['third1', 'third2'], [[['sixth1_1', 'sixth1_2'], ['sixth2']]]],
'fourth', [['fifth1'], [[[[['eighth1'], ['eighth2']]],
['seventh1']]]]]

['first', [['second1_1', 'second1_2'], ['second2'], ['second3']],
[['third1', 'third2'], [[['sixth1_1', 'sixth1_2'], ['sixth2']]]],
'fourth', [['fifth1'], [[[[['eighth1'], ['eighth2']]],
['seventh1']]]]]
 
P

pyscottishguy

Hey,

Thanks Neil and Paul!

After reading Neil's advice I started playing around with the
setParseAction method, and then I found Paul's script
'macroExpander.py' (http://pyparsing.wikispaces.com/space/showimage/
macroExpander.py).

With only a few modifications to macroExpander.py (and reversing my
string!) I had an almost complete answer of:

first [second1_1 second1_2 | second2 | second3 ] [third1 third2 |
[sixth1_1 sixth1_2 | sixth2 ] ] | fourth [fifth1 | [[eighth1 |
eighth2] | seventh1]]

I was in the process of tyding this up so that I could do an eval() to
get the array when I saw Paul's answer (thanks!)

Here is macroExpander.py with my minimal changes:

#!/usr/bin/python

from pyparsing import *

# define the structure of a macro definition (the empty term is used
# to advance to the next non-whitespace character)
label = Suppress(":") + Word(alphas + "_").setResultsName("macro") +
Suppress(":")

values = restOfLine.setResultsName("value")

macroDef = label + empty + values

# define a placeholder for defined macros - initially nothing
macroExpr = Forward()

# global dictionary for macro definitions
macros = {}

# parse action for macro definitions
def processMacroDefn(s,l,t):
macroVal = macroExpander.transformString(t.value)
macros[t.macro] = macroVal
macroExpr << MatchFirst( map(Keyword,macros.keys()) )

# parse action to replace macro references with their respective
definition
def processMacroRef(s,l,t):
return '[' + macros[t[0]] +']'

# attach parse actions to expressions
macroExpr.setParseAction(processMacroRef)
macroDef.setParseAction(processMacroDefn)

# define pattern for scanning through the input string
macroExpander = macroExpr | macroDef

# test macro substitution using transformString
testString = """
:EIGHTH: eighth1 | eighth2

:SEVENTH: EIGHTH | seventh1

:SIXTH: sixth1_1 sixth1_2 | sixth2

:FIFTH: fifth1 | SEVENTH

:THIRD: third1 third2 | SIXTH

:SECOND: second1_1 second1_2 | second2 | second3

:Start: first SECOND THIRD | fourth FIFTH

"""

macroExpander.transformString(testString)

print macros['Start']
 
P

Paul McGuire

Hey,

Thanks Neil and Paul!

After reading Neil's advice I started playing around with the
setParseAction method, and then I found Paul's script
'macroExpander.py' (http://pyparsing.wikispaces.com/space/showimage/
macroExpander.py).
<snip>

Great! I'm glad those examples are useful (I keep meaning to get them
a little better organized.)

Just one side note. macroExpander uses the Forward class in a non-
traditional way. The Forward class is usually used to define a
recursive grammar, like in this simple list grammar that defines a
comma-separated list, with nested sublists enclosed in parentheses:

listExpr = Forward()
listItem = Word(alphanums) | Group( '(' + listExpr + ')' )
listExpr << delimitedList( listItem )

I need to use listExpr as part of the definition of listItem, but I
need to use listItem to define listExpr. So I "forward declare"
listExpr by calling it a Forward instance, then I "inject" the grammar
contents using the shift '<<' operator when I have defined listItem.

macroExpander is a little different. In this case, I'm using a
Forward as a parse-time dynamic grammar element. A Forward is used as
a placeholder for all previously seen macro names, beginning as an
empty Forward (an empty Forward never matches). As new macro
definitions are found, the placeholder content gets defined as a
MatchFirst of all macro definitions that have been seen so far - sort
of a self-modifying parser.

This explains why you had to reverse the order of your example string
when you adapted macroExpander to your example. Your original string
referenced macros that hadn't been encountered yet, so the only way to
handle that was to parse everything, and then recursively post-process
the structure with a parse action on the top-level expression. (I
tried a version using parse actions on the all-uppercase macro
expression, and a dict of previously-seen references and placeholders
in the cumulative ParseResults, but I couldn't get this working, so I
went with the recursive post-process instead - much simpler and more
straightforward.)

Glad you worked out your question, hope to hear from you further in
your pyparsing exploits.

-- Paul
 
P

pyscottishguy

Hey,

Thanks for the further explanations. I'm going to play around more
with the 'recursive grammar' and 'parse-time dynamic grammar element'
stuff so that I understand it a bit better.

I liked the changes you suggested. The alternative grammar definitely
makes the code easier to understand, plus the 'delimitedList' function
is very handy. I can see the usefulness of 'Combine' but it's not
needed here.

This didn't work though:

print res["Start"][0].asList()

But that's ok, because I was more than happy to do:

myArray = [res["Start"]][0]

and have the whole thing as a list. Very nice indeed :)

Thanks again for your help!

For anyone thats interested, here is the updated code (my original
code with Paul's enhancements)

#!/usr/bin/python

from pyparsing import Word, OneOrMore, Group, alphas, \
alphanums, Suppress, Dict, delimitedList, ParseResults

import string

def allIn( as, members ):
"Tests that all elements of as are in members"""
for a in as:
if a not in members:
return False
return True

def allUpper( as ):
"""Tests that all strings in as are uppercase"""
return allIn( as, string.uppercase )

# recursive reference fixer-upper
def fixupRefsRecursive(tokens, lookup):
if isinstance(tokens, ParseResults):
subs = [ fixupRefsRecursive(t, lookup) for t in tokens ]
tokens = ParseResults( subs )
else:
if tokens.isupper():
tokens = fixupRefsRecursive(lookup[tokens], lookup)
return tokens

def fixupRefs(tokens):
tokens["Start"] = fixupRefsRecursive( tokens["Start"], tokens )

testData = """
:Start: first SECOND THIRD fourth FIFTH

:SECOND: second1_1 second1_2 | second2 | second3

:THIRD: third1 third2 | SIXTH

:FIFTH: fifth1 | SEVENTH

:SIXTH: sixth1_1 sixth1_2 | sixth2

:SEVENTH: EIGHTH | seventh1

:EIGHTH: eighth1 | eighth2

"""

COLON = Suppress(":")
label = COLON + Word(alphas + "_") + COLON
entry = Word(alphanums + "_")
data = delimitedList( Group(OneOrMore(entry)), delim="|" )
line = Group(label + data)
doc = Dict(OneOrMore(line))

doc.setParseAction( fixupRefs )

res = doc.parseString(testData)

myArray = [res["Start"]][0]

print myArray
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,139
Latest member
JamaalCald
Top