Pyparsing help

rh0dium · Mar 22, 2008

Hi all,

I am struggling with parsing the following data:

test1 = """
Technology {
name = "gtc"
dielectric = 2.75e-05
unitTimeName = "ns"
timePrecision = 1000
unitLengthName = "micron"
lengthPrecision = 1000
gridResolution = 5
unitVoltageName = "v"
voltagePrecision = 1000000
unitCurrentName = "ma"
currentPrecision = 1000
unitPowerName = "pw"
powerPrecision = 1000
unitResistanceName = "kohm"
resistancePrecision = 10000000
unitCapacitanceName = "pf"
capacitancePrecision = 10000000
unitInductanceName = "nh"
inductancePrecision = 100
}

Tile "unit" {
width = 0.22
height = 1.69
}

Layer "PRBOUNDARY" {
layerNumber = 0
maskName = ""
visible = 1
selectable = 1
blink = 0
color = "cyan"
lineStyle = "solid"
pattern = "blank"
pitch = 0
defaultWidth = 0
minWidth = 0
minSpacing = 0
}

Layer "METAL2" {
layerNumber = 36
maskName = "metal2"
isDefaultLayer = 1
visible = 1
selectable = 1
blink = 0
color = "yellow"
lineStyle = "solid"
pattern = "blank"
pitch = 0.46
defaultWidth = 0.2
minWidth = 0.2
minSpacing = 0.21
fatContactThreshold = 1.4
maxSegLenForRC = 2000
unitMinResistance = 6.1e-05
unitNomResistance = 6.3e-05
unitMaxResistance = 6.9e-05
unitMinHeightFromSub = 1.21
unitNomHeightFromSub = 1.237
unitMaxHeightFromSub = 1.267
unitMinThickness = 0.25
unitNomThickness = 0.475
unitMaxThickness = 0.75
fatTblDimension = 3
fatTblThreshold = (0,0.39,10.005)
fatTblParallelLength = (0,1,0)
fatTblSpacing = (0.21,0.24,0.6,
0.24,0.24,0.6,
0.6,0.6,0.6)
minArea = 0.144
}
"""

So it looks like, starting from the inside out, I have a key and a
value, where the value can be a quoted string, a Word(nums), or a
list of numbers.

So my code to catch this looks like this..

atflist = Suppress("(") + commaSeparatedList + Suppress(")")
atfstr = quotedString.setParseAction(removeQuotes)
atfvalues = ( Word(nums) | atfstr | atflist )

l = ("36", '"metal2"', '(0.21,0.24,0.6,0.24,0.24,0.6)')

for x in l:
    print(atfvalues.parseString(x))

But this isn't parsing the list with commaSeparatedList. Can someone
point out my errors?

As a side note: is this the right approach to using pyparsing? Do we
start from the inside and work our way out, or should I have started
by looking at the bigger picture ( keyword + "{" + OneOrMore key /
vals + "}" )? I started there but couldn't figure out how to handle
multiple lines - I'm assuming I'd just join them all up?

Thanks

Paul McGuire · Mar 22, 2008

I think your "inside-out" approach is just fine. Start by composing
expressions for the different "pieces" of your input text, then
steadily build up more and more complex forms.

I think the main complication you have is that of using
commaSeparatedList for your list of real numbers. commaSeparatedList
is a very generic helper expression. From the online example
(http://pyparsing.wikispaces.com/space/showimage/commasep.py), here is
a sample of the data that commaSeparatedList will handle:

"a,b,c,100.2,,3",
"d, e, j k , m ",
"'Hello, World', f, g , , 5.1,x",
"John Doe, 123 Main St., Cleveland, Ohio",
"Jane Doe, 456 St. James St., Los Angeles , California ",

In other words, the content of the items between commas is pretty much
anything that is *not* a comma. If you change your definition of
atflist to:

atflist = Suppress("(") + commaSeparatedList # + Suppress(")")

(that is, comment out the trailing right paren), you'll get this
successful parse result:

['0.21', '0.24', '0.6', '0.24', '0.24', '0.6)']

In your example, you are parsing a list of floating point numbers, in
a list delimited by commas, surrounded by parens. This definition of
atflist should give you more control over the parsing process, and
give you real floats to boot:

floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional('e'+oneOf("+ -")+Word(nums)))
floatnum.setParseAction(lambda t:float(t[0]))
atflist = Suppress("(") + delimitedList(floatnum) + Suppress(")")

Now I get this output for your parse test:

[0.20999999999999999, 0.23999999999999999, 0.59999999999999998,
0.23999999999999999, 0.23999999999999999, 0.59999999999999998]

So you can see that this has actually parsed the numbers and converted
them to floats.

I went ahead and added support for scientific notation in floatnum,
since I see that you have several atfvalues that are standalone
floats, some using scientific notation. To add these, just expand
atfvalues to:

atfvalues = ( floatnum | Word(nums) | atfstr | atflist )

(At this point, I'll go on to show how to parse the rest of the data
structure - if you want to take a stab at it yourself, stop reading
here, and then come back to compare your results with my approach.)

To parse the overall structure, now that you have expressions for the
different component pieces, look into using Dict (or more simply using
the helper function dictOf) to define results names automagically for
you based on the attribute names in the input. Dict does *not* change
any of the parsing or matching logic, it just adds named fields in the
parsed results corresponding to the key names found in the input.

Dict is a complex pyparsing class, but dictOf simplifies things.
dictOf takes two arguments:

dictOf(keyExpression, valueExpression)

This translates to:

Dict( OneOrMore( Group(keyExpression + valueExpression) ) )

For example, to parse the lists of entries that look like:

name = "gtc"
dielectric = 2.75e-05
unitTimeName = "ns"
timePrecision = 1000
unitLengthName = "micron"
etc.

just define that this is "a dict of entries each composed of a key
consisting of a Word(alphas), followed by a suppressed '=' sign and an
atfvalues", that is:

attrDict = dictOf(Word(alphas), Suppress("=") + atfvalues)

dictOf takes care of all of the repetition and grouping necessary for
Dict to do its work. These attribute dicts are nested within an outer
main dict, which is "a dict of entries, each with a key of
Word(alphas), and a value of an optional quotedString (an alias,
perhaps?), a left brace, an attrDict, and a right brace," or:

mainDict = dictOf(
    Word(alphas),
    Optional(quotedString)("alias") +
        Suppress("{") + attrDict + Suppress("}")
    )

By adding this code to what you already have:

attrDict = dictOf(Word(alphas), Suppress("=") + atfvalues)
mainDict = dictOf(
    Word(alphas),
    Optional(quotedString)("alias") +
        Suppress("{") + attrDict + Suppress("}")
    )

You can now write:

md = mainDict.parseString(test1)
print(md.dump())
print(md.Layer.lineStyle)

and get this output:

[['Technology', ['name', 'gtc'], ['dielectric',
2.7500000000000001e-005], ['unitTimeName', 'ns'], ['timePrecision',
'1000'], ['unitLengthName', 'micron'], ['lengthPrecision', '1000'],
['gridResolution', '5'], ['unitVoltageName', 'v'],
['voltagePrecision', '1000000'], ['unitCurrentName', 'ma'],
['currentPrecision', '1000'], ['unitPowerName', 'pw'],
['powerPrecision', '1000'], ['unitResistanceName', 'kohm'],
['resistancePrecision', '10000000'], ['unitCapacitanceName', 'pf'],
['capacitancePrecision', '10000000'], ['unitInductanceName', 'nh'],
['inductancePrecision', '100']], ['Tile', 'unit', ['width', 0.22],
['height', 1.6899999999999999]], ['Layer', 'PRBOUNDARY',
['layerNumber', '0'], ['maskName', ''], ['visible', '1'],
['selectable', '1'], ['blink', '0'], ['color', 'cyan'], ['lineStyle',
'solid'], ['pattern', 'blank'], ['pitch', '0'], ['defaultWidth', '0'],
['minWidth', '0'], ['minSpacing', '0']]]
- Layer: ['PRBOUNDARY', ['layerNumber', '0'], ['maskName', ''],
['visible', '1'], ['selectable', '1'], ['blink', '0'], ['color',
'cyan'], ['lineStyle', 'solid'], ['pattern', 'blank'], ['pitch', '0'],
['defaultWidth', '0'], ['minWidth', '0'], ['minSpacing', '0']]
  - alias: PRBOUNDARY
  - blink: 0
  - color: cyan
  - defaultWidth: 0
  - layerNumber: 0
  - lineStyle: solid
  - maskName:
  - minSpacing: 0
  - minWidth: 0
  - pattern: blank
  - pitch: 0
  - selectable: 1
  - visible: 1
- Technology: [['name', 'gtc'], ['dielectric',
2.7500000000000001e-005], ['unitTimeName', 'ns'], ['timePrecision',
'1000'], ['unitLengthName', 'micron'], ['lengthPrecision', '1000'],
['gridResolution', '5'], ['unitVoltageName', 'v'],
['voltagePrecision', '1000000'], ['unitCurrentName', 'ma'],
['currentPrecision', '1000'], ['unitPowerName', 'pw'],
['powerPrecision', '1000'], ['unitResistanceName', 'kohm'],
['resistancePrecision', '10000000'], ['unitCapacitanceName', 'pf'],
['capacitancePrecision', '10000000'], ['unitInductanceName', 'nh'],
['inductancePrecision', '100']]
  - capacitancePrecision: 10000000
  - currentPrecision: 1000
  - dielectric: 2.75e-005
  - gridResolution: 5
  - inductancePrecision: 100
  - lengthPrecision: 1000
  - name: gtc
  - powerPrecision: 1000
  - resistancePrecision: 10000000
  - timePrecision: 1000
  - unitCapacitanceName: pf
  - unitCurrentName: ma
  - unitInductanceName: nh
  - unitLengthName: micron
  - unitPowerName: pw
  - unitResistanceName: kohm
  - unitTimeName: ns
  - unitVoltageName: v
  - voltagePrecision: 1000000
- Tile: ['unit', ['width', 0.22], ['height', 1.6899999999999999]]
  - alias: unit
  - height: 1.69
  - width: 0.22
solid
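
For anyone who wants to try the attrDict part on its own, here is a trimmed-down sketch that can be run as a script. It assumes a pyparsing release that still exports the camelCase helper names used in this thread, and it uses quotedString.copy() (my addition, not in the code above) so that the shared quotedString object is left untouched. The three-line sample is made up from the Technology block.

```python
from pyparsing import (Combine, Optional, Suppress, Word, alphas, nums,
                       delimitedList, dictOf, oneOf, quotedString,
                       removeQuotes)

# Value expressions, as built up earlier in this post.
floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional("e" + oneOf("+ -") + Word(nums)))
floatnum.setParseAction(lambda t: float(t[0]))
# copy() is an addition here: it keeps the module-level quotedString unchanged.
atfstr = quotedString.copy().setParseAction(removeQuotes)
atflist = Suppress("(") + delimitedList(floatnum) + Suppress(")")
atfvalues = floatnum | Word(nums) | atfstr | atflist

# One key, a suppressed "=", one value; dictOf handles repetition and grouping.
attrDict = dictOf(Word(alphas), Suppress("=") + atfvalues)

sample = '''
name = "gtc"
dielectric = 2.75e-05
timePrecision = 1000
'''
result = attrDict.parseString(sample)
print(result["name"])           # string with the quotes removed
print(result["dielectric"])     # converted to a float by the parse action
print(result["timePrecision"])  # still a string: Word(nums) has no parse action
```

Note that timePrecision comes back as the string '1000', which matches the quoted integers in the dump above.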

Cheers!
-- Paul

Paul McGuire · Mar 23, 2008

Oof, I see that you have multiple "Layer" entries, with different
qualifying labels. Since the dicts use "Layer" as the key, you only
get the last "Layer" value, with qualifier "PRBOUNDARY", and lose the
"Layer" for "METAL2". To fix this, you'll have to move the optional
alias term to the key, and merge "Layer" and "PRBOUNDARY" into a
single key, perhaps "Layer/PRBOUNDARY" or "Layer(PRBOUNDARY)" - a
parse action should take care of this for you. Unfortunately, these
forms will not allow you to use object attribute form
(md.Layer.lineStyle), you will have to use dict access form
(md["Layer(PRBOUNDARY)"].lineStyle), since these keys have characters
that are not valid attribute name characters.
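
A minimal sketch of the merged-key idea, assuming a pyparsing release that still exports the camelCase names used in this thread. The names mergeKey and sectionKey are made up for illustration, and the attribute values are restricted to quoted strings to keep the example short.

```python
from pyparsing import (Optional, Suppress, Word, alphas, dictOf,
                       quotedString, removeQuotes)

atfstr = quotedString.copy().setParseAction(removeQuotes)
attrDict = dictOf(Word(alphas), Suppress("=") + atfstr)

def mergeKey(toks):
    # ['Layer', 'PRBOUNDARY'] -> 'Layer(PRBOUNDARY)'; ['Technology'] stays as is
    if len(toks) == 2:
        return "%s(%s)" % (toks[0], toks[1])
    return toks[0]

# The optional label is now part of the key, not part of the value.
sectionKey = (Word(alphas) + Optional(atfstr)).setParseAction(mergeKey)
mainDict = dictOf(sectionKey, Suppress("{") + attrDict + Suppress("}"))

sample = '''
Technology { name = "gtc" }
Layer "PRBOUNDARY" { color = "cyan" lineStyle = "solid" }
Layer "METAL2" { color = "yellow" lineStyle = "solid" }
'''
md = mainDict.parseString(sample)
# Both layers survive because their keys differ.
print(md["Layer(PRBOUNDARY)"]["color"])
print(md["Layer(METAL2)"]["color"])
```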

Or you could add one more level of Dict nesting to your grammar, to
permit access like "md.Layer.PRBOUNDARY.lineStyle".

-- Paul

rh0dium · Mar 23, 2008

OK - Well, I got as far as you did, but I did it a bit differently.
Then I merged some of your data with my data. But now I am at the
point of adding another level of the dict and am struggling. Here is
what I have:

# punctuation literals
LPAR = Literal("(")
RPAR = Literal(")")
LBRACE = Literal("{")
RBRACE = Literal("}")
EQUAL = Literal("=")

# This will get the values all figured out..
# "metal2" 1 6.05E-05 30
cvtInt = lambda toks: int(toks[0])
cvtReal = lambda toks: float(toks[0])

integer = Combine(Optional(oneOf("+ -")) + Word(nums))\
    .setParseAction( cvtInt )
real = Combine(Optional(oneOf("+ -")) + Word(nums) + "." +
               Optional(Word(nums)) +
               Optional(oneOf("e E") + Optional(oneOf("+ -")) +
                        Word(nums)))\
    .setParseAction( cvtReal )
atfstr = quotedString.setParseAction(removeQuotes)
atflist = Group( LPAR.suppress() +
                 delimitedList(real, ",") +
                 RPAR.suppress() )

atfvalues = ( real | integer | atfstr | atflist )

# Now this should work out a single line inside a section
#     maskName = "metal2"
#     isDefaultLayer = 1
#     visible = 1
#     fatTblSpacing = (0.21,0.24,0.6,
#                      0.6,0.6,0.6)
#     minArea = 0.144
atfkeys = Word(alphanums)
attrDict = dictOf( atfkeys , EQUAL.suppress() + atfvalues)

# Now we need to take care of the "Metal2" { one or more attrDict }
#     "METAL2" {
#         layerNumber = 36
#         maskName = "metal2"
#         isDefaultLayer = 1
#         visible = 1
#         fatTblSpacing = (0.21,0.24,0.6,
#                          0.24,0.24,0.6,
#                          0.6,0.6,0.6)
#         minArea = 0.144
#     }
attrType = dictOf(atfstr, LBRACE.suppress() + attrDict +
                  RBRACE.suppress())

# Lastly we need to get the ones without attributes (Technology)
attrType2 = LBRACE.suppress() + attrDict + RBRACE.suppress()
mainDict = dictOf(atfkeys, attrType2 | attrType )

md = mainDict.parseString(test1)

But I too am only getting the last layer. I thought if I broke out the
"alias" area and then built on that I'd be set, but I did something
wrong.

Paul McGuire · Mar 23, 2008

There are a couple of bugs in our program so far.

First of all, our grammar isn't parsing the METAL2 entry at all. We
should change this line:

md = mainDict.parseString(test1)

to

md = (mainDict+stringEnd).parseString(test1)

The parser is reading as far as it can, but then stopping once
successful parsing is no longer possible. Since there is at least one
valid entry matching the OneOrMore expression, then parseString raises
no errors. By adding "+stringEnd" to our expression to be parsed, we
are saying "once parsing is finished, we should be at the end of the
input string". By making this change, we now get this parse
exception:

pyparsing.ParseException: Expected stringEnd (at char 1948), (line:54,
col:1)
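
The same effect can be seen on a much smaller input. The sketch below is not from the thread's grammar; it is a made-up example, using a plain list of integers, that shows a partial parse succeeding quietly and then failing once StringEnd is required.

```python
from pyparsing import OneOrMore, ParseException, StringEnd, Word, nums

numbers = OneOrMore(Word(nums))
text = "1 2 3 oops 4"

# Without an end-of-input check the parser stops quietly in front of "oops".
partial = numbers.parseString(text)
print(partial.asList())

# With StringEnd appended, the same input is rejected.
try:
    (numbers + StringEnd()).parseString(text)
    rejected = False
except ParseException as err:
    rejected = True
    print(err)
```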

So what is the matter with the METAL2 entries? After using brute
force "divide and conquer" (I deleted half of the entries and got a
successful parse, then restored half of the entries I removed, until I
added back the entry that caused the parse to fail), I found these
lines in the input:

fatTblThreshold = (0,0.39,10.005)
fatTblParallelLength = (0,1,0)

Both of these violate the atflist definition, because they contain
integers, not just floatnums. So we need to expand the definition of
atflist:

floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional('e'+oneOf("+ -")+Word(nums)))
floatnum.setParseAction(lambda t:float(t[0]))
integer = Word(nums).setParseAction(lambda t:int(t[0]))
atflist = Suppress("(") + delimitedList(floatnum|integer) + \
          Suppress(")")

Then we need to tackle the issue of adding nesting for those entries
that have sub-keys. This is actually kind of tricky for your data
example, because nesting within Dict expects input data to be nested.
That is, nesting Dict's is normally done with data that is input like:

main
  Technology
  Layer
    PRBOUNDARY
    METAL2
  Tile
    unit

But your data is structured slightly differently:

main
  Technology
  Layer PRBOUNDARY
  Layer METAL2
  Tile unit

Because Layer is repeated, the second entry creates a new node named
"Layer" at the second level, and the first "Layer" entry is lost. To
fix this, we need to combine Layer and the layer id into a composite
type of key. I did this by using Group, and adding the Optional alias
(which I see now is a poor name, "layerId" would be better) as a
second element of the key:

mainDict = dictOf(
    Group(Word(alphas)+Optional(quotedString)),
    Suppress("{") + attrDict + Suppress("}")
    )

But now if we parse the input with this mainDict, we see that the keys
are no longer nice simple strings, but they are 1- or 2-element
ParseResults objects. Here is what I get from the command "print
md.keys()":

[(['Technology'], {}), (['Tile', 'unit'], {}), (['Layer',
'PRBOUNDARY'], {}), (['Layer', 'METAL2'], {})]

So to finally clear this up, we need one more parse action, attached
to the mainDict expression, that rearranges the subdicts using the
elements in the keys. The parse action looks like this, and it will
process the overall parse results for the entire data structure:

def rearrangeSubDicts(toks):
    # iterate over a snapshot of the key-value pairs in the dict,
    # since the loop body adds and deletes entries in toks
    for key,value in list(toks.items()):
        # key is of the form ['name'] or ['name', 'name2']
        # and the value is the attrDict

        # if key has just one element, use it to define
        # a simple string key
        if len(key)==1:
            toks[key[0]] = value
        else:
            # if the key has two elements, create a
            # subnode with the first element
            if key[0] not in toks:
                toks[key[0]] = ParseResults([])

            # add an entry for the second key element
            toks[key[0]][key[1]] = value

        # now delete the original key that is the form
        # ['name'] or ['name', 'name2']
        del toks[key]

It looks a bit messy, but the point is to modify the tokens in place,
by rearranging the attrdicts to nodes with simple string keys, instead
of keys nested in structures.

Lastly, we attach the parse action in the usual way:

mainDict.setParseAction(rearrangeSubDicts)

Now you can access the fields of the different layers as:

print(md.Layer.METAL2.lineStyle)

I guess this all looks pretty convoluted. You might be better off
just doing your own Group'ing, and then navigating the nested lists to
build your own dict or other data structure.
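
Here is a rough sketch of that do-it-yourself alternative, assuming a pyparsing release that still exports the camelCase names used in this thread. The names attr, section, grammar and build are made up for illustration, and the sample is a shortened version of the original data. Everything is wrapped in Group, and an ordinary Python dict is built from the nested lists afterwards.

```python
from pyparsing import (Combine, Group, Optional, StringEnd, Suppress, Word,
                       ZeroOrMore, alphanums, alphas, delimitedList, nums,
                       oneOf, quotedString, removeQuotes)

floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional("e" + oneOf("+ -") + Word(nums)))
floatnum.setParseAction(lambda t: float(t[0]))
integer = Word(nums).setParseAction(lambda t: int(t[0]))
atfstr = quotedString.copy().setParseAction(removeQuotes)
atflist = Group(Suppress("(") + delimitedList(floatnum | integer) +
                Suppress(")"))
atfvalue = floatnum | integer | atfstr | atflist

# Each attribute becomes a [key, value] pair.
attr = Group(Word(alphanums) + Suppress("=") + atfvalue)
# Each section becomes [kind, label, [attributes]]; label is "" if absent.
section = Group(Word(alphas) + Optional(atfstr, default="") +
                Suppress("{") + Group(ZeroOrMore(attr)) + Suppress("}"))
grammar = ZeroOrMore(section) + StringEnd()

def build(text):
    tree = {}
    for kind, label, attrs in grammar.parseString(text).asList():
        values = dict(attrs)
        if label:
            # labelled sections nest one level deeper: tree[kind][label]
            tree.setdefault(kind, {})[label] = values
        else:
            tree[kind] = values
    return tree

sample = '''
Technology {
    name = "gtc"
    dielectric = 2.75e-05
}
Layer "PRBOUNDARY" {
    layerNumber = 0
    color = "cyan"
}
Layer "METAL2" {
    layerNumber = 36
    pitch = 0.46
    fatTblThreshold = (0,0.39,10.005)
}
'''
tree = build(sample)
print(tree["Layer"]["METAL2"]["fatTblThreshold"])
print(tree["Layer"]["PRBOUNDARY"]["color"])
```

The result is a plain nested dict, so it uses subscript access only (no md.Layer.METAL2 attribute form), but both Layer entries are kept without any rearranging parse action.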

-- Paul

Arnaud Delobelle · Mar 23, 2008

Hi,

I know it's cheating, but the grammar of your example is actually
quite simple and the values are valid python expressions, so here is a
solution without pyparsing (or regexps, for that matter). *WARNING*
it uses the exec statement.

from textwrap import dedent

def parse(txt):
    globs, parsed = {}, {}
    units = txt.strip().split('}')[:-1]
    for unit in units:
        label, params = unit.split('{')
        paramdict = {}
        # tuple form of exec: accepted by both Python 2 and Python 3
        exec(dedent(params), globs, paramdict)
        try:
            label, key = label.split()
            parsed.setdefault(label, {})[eval(key)] = paramdict
        except ValueError:
            parsed[label.strip()] = paramdict
    return parsed

>>> p = parse(test1)
>>> p['Layer']['PRBOUNDARY']
{'maskName': '', 'defaultWidth': 0, 'color': 'cyan', 'pattern':
'blank', 'layerNumber': 0, 'minSpacing': 0, 'blink': 0, 'minWidth': 0,
'visible': 1, 'pitch': 0, 'selectable': 1, 'lineStyle': 'solid'}
>>> p['Layer']['METAL2']['maskName']
'metal2'
>>> p['Technology']['gridResolution']
5
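
If running exec on the file contents is a concern, a variation on the same idea is sketched below. It swaps exec and eval for ast.parse plus ast.literal_eval from the standard library, so the values are read as literals and never executed. The function name parse_literals and the shortened sample are made up for illustration, and the sketch assumes the lines inside each block share the same indentation.

```python
import ast
from textwrap import dedent

def parse_literals(txt):
    parsed = {}
    for unit in txt.strip().split("}")[:-1]:
        header, body = unit.split("{")
        params = {}
        # ast.parse only builds a syntax tree; nothing in body is executed.
        for stmt in ast.parse(dedent(body)).body:
            is_simple = (isinstance(stmt, ast.Assign)
                         and len(stmt.targets) == 1
                         and isinstance(stmt.targets[0], ast.Name))
            if not is_simple:
                raise ValueError("expected 'name = literal'")
            # literal_eval only accepts Python literals (numbers, strings,
            # tuples, ...) and raises ValueError for anything else.
            params[stmt.targets[0].id] = ast.literal_eval(stmt.value)
        words = header.split()
        if len(words) == 2:
            # 'Layer "METAL2"' -> parsed['Layer']['METAL2']
            parsed.setdefault(words[0], {})[ast.literal_eval(words[1])] = params
        else:
            parsed[words[0]] = params
    return parsed

sample = '''
Technology {
    name = "gtc"
    dielectric = 2.75e-05
}

Layer "METAL2" {
    layerNumber = 36
    fatTblSpacing = (0.21,0.24,0.6,
                     0.24,0.24,0.6)
}
'''
p = parse_literals(sample)
print(p["Layer"]["METAL2"]["fatTblSpacing"])
print(p["Technology"]["dielectric"])
```

Like the version above, it splits on braces, so a string value containing "{" or "}" would break it.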

HTH

rh0dium · Mar 23, 2008

Hi Paul,

Before I continue, I must thank you for your help. You really did an
outstanding job on this code, and it is really straightforward to use
and learn from. This was a fun weekend task, and I really wanted to
use pyparsing for it because this is one of several types of files I
want to parse. I (as I'm sure you would agree) think rearrangeSubDicts
is a bit of a hack, but it is nevertheless absolutely required because
of the limitations of the data I am parsing. Once again, thanks for
your great help. Now the problem..

I attempted to use this code on another test case. This test case had
tabs in it. I think 1.4.11 is missing the expandtabs attribute. I ran
my code on the input (which had tabs) and I got this:

AttributeError: 'builtin_function_or_method' object has no attribute
'expandtabs'

Uh oh. Is this a pyparsing problem or am I just an idiot?
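
The thread does not say what the actual cause was, so the following is only a guess at one common way to get this exact message. parseString calls expandtabs() on whatever it is given, and a built-in method object (for example a file's read without the parentheses) has no such attribute. The stand-in below uses io.StringIO from the standard library and no pyparsing at all.

```python
import io

handle = io.StringIO("width\t=\t0.22\n")

# Mistake: the bound method itself is passed along, not the text it returns.
not_a_string = handle.read
try:
    not_a_string.expandtabs()
    message = None
except AttributeError as err:
    message = str(err)
print(message)

# Calling the method yields a real string, which does have expandtabs().
text = handle.read()
print(text.expandtabs(4))
```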

Thanks again!

rh0dium · Mar 23, 2008

Doh!! Never mind, I am an idiot. I got it; what a bonehead..

I needed to tweak it a bit to ignore the comments. Namely, this fixed
it up:

mainDict = dictOf(
    Group(Word(alphas)+Optional(quotedString)),
    Suppress("{") + attrDict + Suppress("}")
    ) | cStyleComment.suppress()

Thanks again. Now I just need to figure out how to use your dicts to
do some work..

Francesco Bochicchio · Mar 24, 2008

On Sat, 22 Mar 2008 14:11:16 -0700, rh0dium wrote:

I am struggling with parsing the following data:

[...]

Did you think of using something a bit more sophisticated than
pyparsing? I have had a good experience using ply, a pure-Python
implementation of the yacc/lex tools, which I used to extract
significant data from C programs to automate documentation.

I had never used yacc or similar tools before, but having a bit of
experience with BNF notation, I found ply easy enough. In my case, the
major problem was coping with yacc's limitations in describing C
syntax (which I solved by "relaxing" the rules a bit, since I was only
going to process already-compiled C code). In your much simpler case,
I'd say that a few production rules should be enough.

P.S.: there are other, faster and maybe more complete Python parsers,
but as I said ply is pure Python: no external libraries, and it runs
everywhere.

Ciao

Paul McGuire · Mar 24, 2008

I'm glad this is coming around to some reasonable degree of completion
for you. One last thought - your handling of comments is a bit crude,
and will not handle comments that crop up in the middle of dict
entries, as in:

color = /* using non-standard color during testing */
"plum"

The more comprehensive way to handle comments is to call ignore.
Using ignore will propagate the comment handling to all embedded
expressions, so you only need to call ignore once on the top-most
pyparsing expression, as in:

mainDict.ignore(cStyleComment)

Also, ignore does token suppression automatically.
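
A small self-contained check of this, assuming a pyparsing release that still exports the camelCase names used in this thread. The grammar is cut down to quoted-string values only, and the second comment and the pattern entry in the sample are made up.

```python
from pyparsing import (Suppress, Word, alphas, cStyleComment, dictOf,
                       quotedString, removeQuotes)

atfstr = quotedString.copy().setParseAction(removeQuotes)
attrDict = dictOf(Word(alphas), Suppress("=") + atfstr)

# One call on the outermost expression; it is passed down to the
# key and value expressions inside.
attrDict.ignore(cStyleComment)

text = '''
color = /* using non-standard color during testing */
        "plum"
/* a comment between entries */
pattern = "blank"
'''
result = attrDict.parseString(text)
print(result["color"])
print(result["pattern"])
```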

-- Paul
