Parsing files -- pyparsing to the rescue?

rh0dium · Jan 16, 2006

Hi all,

I have a file which I need to parse and I need to be able to break it
down by sections. I know it's possible but I can't seem to figure this
out.

The sections are broken by <> with one or more keywords in the <>.
What I want to do is to be able to parse a particular section of the
file. So for example I need to be able to look at the SYSLIB section.
Presumably the sections are

<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>

So if I wanted to break them down, the section headers are matched by this:

secH = (pyparsing.LineStart() +
        pyparsing.Suppress(pyparsing.Literal("<")) +
        pyparsing.OneOrMore(pyparsing.Word(pyparsing.alphanums)) +
        pyparsing.Suppress(pyparsing.Literal(">")))

But how do I say that <SECTIONn> stops at the start of the next
<SECTIONm>?

Giovanni Bajo · Jan 16, 2006

rh0dium said:
I have a file which I need to parse and I need to be able to break it
down by sections. I know it's possible but I can't seem to figure this
out.

The sections are broken by <> with one or more keywords in the <>.
What I want to do is to be able to parse a particular section of the
file. So for example I need to be able to look at the SYSLIB section.
Presumably the sections are

<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>

Given your description, pyparsing doesn't feel like the correct tool:

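import re

secs = {}
name = None
for L in open("foo.txt"):
    L = L.rstrip("\n")
    if re.match(r"<.*>", L):
        # a header line starts a new section
        name = L[1:-1]
        secs[name] = []
    elif name is not None:
        # skip any lines that appear before the first header
        secs[name].append(L)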

Paul McGuire · Jan 17, 2006

rh0dium said:
Hi all,

I have a file which I need to parse and I need to be able to break it
down by sections. I know it's possible but I can't seem to figure this
out.

The sections are broken by <> with one or more keywords in the <>.
But how do I say that <SECTIONn> stops at the start of the next
<SECTIONm>?

See the attached working example - the comments and definition of dataLine
show how this is done.

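This is something of a trick in pyparsing, but it follows from a basic
characteristic of its recursive descent parser: a repetition such as
ZeroOrMore keeps matching for as long as it can and does not look ahead on
its own, so dataLine has to rule out the next section label explicitly.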

-- Paul

data="""<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>
"""

from pyparsing import *

# basic pyparsing version
secLabel = (Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") +
            LineEnd().suppress())
# need to indicate which entries are *not* valid datalines - next secLabel,
# or end of string
dataLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()

# a data section is a section label, followed by zero or more data lines
section = Group(secLabel + ZeroOrMore(dataLine))

# a config data contains one or more sections
configData = OneOrMore(section)

# parse the input data and print the results
res = configData.parseString(data)
print res

# prints:
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS',
# 'Data', 'Data', 'Data', 'Data'], ['SOME', 'SECTION', 'Data', 'Data', 'Data',
# 'Data'], ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]

# enhanced version, constructing a ParseResults with dict-like access
# (reuses previous expression definitions)

# combine multiword keys into a single string
# - want <SOME SECTION> to return 'SOME SECTION', not
# 'SOME', 'SECTION'
def joinKeyWords(s,l,t):
    return " ".join(t)
secLabel.setParseAction(joinKeyWords)
section = Group(secLabel + ZeroOrMore(dataLine))
configData = Dict(OneOrMore(section))

# parse the input data, and access the results by section name
res = configData.parseString(data)
print res
print res["SYSLIB"]
print res["SOME SECTION"]
print res.keys()

# prints:
#[['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS', 'Data',
# 'Data', 'Data', 'Data'], ['SOME SECTION', 'Data', 'Data', 'Data', 'Data'],
# ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]
#['Sys Data', 'Sys-Data', 'asdkData', 'Data']
#['Data', 'Data', 'Data', 'Data']
#['LOGLVS', 'NET', 'NETLIST', 'SYSLIB', 'SOME SECTION']
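
For readers without pyparsing at hand, the role of ~secLabel in dataLine can
be sketched with the standard re module. This is only an analogy and not how
pyparsing works internally; the miniature input and the names label, greedy,
guarded and sections are made up for illustration. A repeated "any line"
pattern swallows the next header, while the same pattern guarded by a
negative lookahead stops in front of it.

```python
import re

# Hypothetical miniature input: two sections, A and B.
data = "<A>\none\ntwo\n<B>\nthree\n"

# A section label on a line of its own, e.g. <A>
label = r"<([^>\n]+)>\n"

# Unguarded: "zero or more lines" also accepts the next label line.
greedy = re.compile(label + r"((?:.*\n)*)")

# Guarded: (?!...) rejects any line that is itself a label, which plays
# the role of ~secLabel in front of restOfLine.
guarded = re.compile(label + r"((?:(?!<[^>\n]+>\n).*\n)*)")

def sections(pattern, text):
    """Return (label, data lines) pairs for every match of pattern."""
    return [(m.group(1), m.group(2).splitlines())
            for m in pattern.finditer(text)]

greedy_result = sections(greedy, data)
guarded_result = sections(guarded, data)
print(greedy_result)   # one section that has swallowed '<B>' as data
print(guarded_result)  # two sections, split at the '<B>' label
```

The unguarded pattern yields a single section whose data includes the '<B>'
line; the guarded one yields A and B separately, which is the effect the
lookahead has in the pyparsing grammar.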

Allan Zhang · Jan 17, 2006

Try this

code
=====
import re
p = re.compile(r'<SYSLIB>([^<]*)<')
s = open("file").read()
m = re.search(p, s)
if m:
    # only touch res when the section was actually found
    res = m.groups()[0]
    res = res.lstrip("\n")
    res = res.rstrip("\n")
    print res

result:
=======
%python parser.py
Sys Data
Sys-Data
asdkData
Data
%

Thanks
Allan
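
A possible generalisation of the regex above, as a sketch only: the helper
name get_section and the sample text are assumptions and not part of the
original post. It takes the section name as a parameter, escapes it with
re.escape, and also accepts the end of the text as a terminator, so the last
section (which has no following "<") is still found.

```python
import re

def get_section(text, name):
    """Return the data lines of section <name>, or None if it is absent."""
    pattern = re.compile(
        # header line, then lazily everything up to the next header or the end
        r"^<%s>\n?(.*?)(?=^<[^>\n]*>$|\Z)" % re.escape(name),
        re.MULTILINE | re.DOTALL,
    )
    m = pattern.search(text)
    return m.group(1).splitlines() if m else None

# Hypothetical sample in the same layout as the file in the question.
sample = "<SYSLIB>\nSys Data\nSys-Data\n<SOME SECTION>\nData\n<NET>\n"

print(get_section(sample, "SYSLIB"))
print(get_section(sample, "SOME SECTION"))
print(get_section(sample, "NET"))
print(get_section(sample, "MISSING"))
```

An empty section comes back as an empty list and an unknown name as None, so
the caller can tell the two cases apart.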
