Pyparsing: Grammar Suggestion


Khoa Nguyen

I am trying to come up with a grammar that describes the following:

record = f1,f2,...,fn END_RECORD
All the f(i) have to be in that order.
Any f(i) can be absent (e.g. f1,,f3,f4,,f6 END_RECORD).
The number of f(i)'s can vary. For example, the following are allowed:
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD

Any suggestions?

Thanks,
Khoa

Paul McGuire

pyparsing includes a built-in expression, commaSeparatedList, for just such
a case. Here is a simple pyparsing program to crack your input text:


from pyparsing import commaSeparatedList

data = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD"""

# scanString yields (tokens, start, end) for every match found in data
for tokens, start, end in commaSeparatedList.scanString(data):
    print(tokens)


This prints:
['f1', 'f2', 'f3', 'f4', 'f5', 'f6 END_RECORD']
['f1', 'f2 END_RECORD']
['f1', 'f2', '', 'f4', '', 'f6 END_RECORD']

Note that consecutive commas in the input return empty strings at the
corresponding places in the results.
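On pyparsing 3.x the top-level commaSeparatedList name is no longer available; pyparsing_common.comma_separated_list is its replacement. A minimal sketch of the same scan, assuming pyparsing 3.x is installed (scan_string is the 3.x spelling of scanString). Empty slots between consecutive commas still come back as empty strings:

```python
from pyparsing import pyparsing_common

data = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD"""

# pyparsing 3.x replacement for the old commaSeparatedList expression
csv_items = pyparsing_common.comma_separated_list

# Collect each match as a plain Python list of field strings
rows = [tokens.as_list() for tokens, start, end in csv_items.scan_string(data)]
for row in rows:
    print(row)
```

Because pyparsing_common.comma_separated_list is a shared object, any parse action you attach to it is better attached to a copy (comma_separated_list.copy()) so other code using it is not affected.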

Unfortunately, commaSeparatedList embeds its own definition of what is
allowed between commas, so the last field always comes back with
END_RECORD attached (for example 'f6 END_RECORD'). We could copy the
definition of commaSeparatedList and change it to exclude END_RECORD, but it
is simpler just to add a parse action to commaSeparatedList that removes
END_RECORD from the last (-1'th) list element:

def stripEND_RECORD(s, l, t):
    last = t[-1]
    if last.endswith("END_RECORD"):
        # return a copy of t with the last element trimmed of "END_RECORD"
        return t[:-1] + [last[:-len("END_RECORD")].rstrip()]
    # otherwise return None, which leaves the tokens unchanged

commaSeparatedList.setParseAction(stripEND_RECORD)


for tokens, start, end in commaSeparatedList.scanString(data):
    print(tokens)


This prints:

['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
['f1', 'f2']
['f1', 'f2', '', 'f4', '', 'f6']

As one of my wife's 3rd graders concluded on a science report - "wah-lah!"
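The other option mentioned above, a grammar that keeps END_RECORD out of the fields altogether, might look something like the following minimal sketch. It assumes pyparsing 3.x names (Opt, parse_string, as_list) and assumes each field is a word of letters, digits and underscores. The grammar fixes field positions only by counting commas; it does not check that the field in slot i is literally named f(i).

```python
from pyparsing import Group, Keyword, OneOrMore, Opt, Suppress, Word, ZeroOrMore, alphanums

# Sketch of:  record ::= field ("," field)* END_RECORD
END_RECORD = Keyword("END_RECORD")

# A field is any word except the END_RECORD keyword itself; an empty slot
# (two adjacent commas) produces "" through Opt's default value.
field = Opt(~END_RECORD + Word(alphanums + "_"), default="")

# Group keeps each record's fields together; the terminator is matched but dropped.
record = Group(field + ZeroOrMore(Suppress(",") + field) + Suppress(END_RECORD))
records = OneOrMore(record)

data = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD
f1,,f3,f4,,f6 END_RECORD"""

result = records.parse_string(data, parse_all=True).as_list()
for fields in result:
    print(fields)
```

Since END_RECORD is matched as a Keyword and suppressed, no parse action is needed to clean up the last field, and OneOrMore lets a single parse_string call handle the whole multi-record text.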

Python also includes a csv module if this example doesn't work for you, but
you asked for a pyparsing solution, so there it is.
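For comparison, a minimal sketch of the csv route using only the standard library. The helper name read_records is hypothetical, and like the parse action above it assumes END_RECORD only ever appears at the end of the last field:

```python
import csv
import io

MARKER = "END_RECORD"

def read_records(text):
    """Yield each record as a list of fields with the trailing END_RECORD removed."""
    for row in csv.reader(io.StringIO(text)):
        # csv.reader turns empty slots (",,") into "" just as commaSeparatedList does
        if row and row[-1].endswith(MARKER):
            row[-1] = row[-1][:-len(MARKER)].rstrip()
        yield row

data = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD"""

records = list(read_records(data))
for fields in records:
    print(fields)
```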

-- Paul