Pyparsing...

Raoul

I am futzing with pyparsing and for the most part enjoying it.
However, I'm running into trouble with whitespace-delimited lists. I
get data in blocks like this:

[QC1]
Type=15
NumberCells=1925
CellHeader=X Y PROBE PLEN ATOM INDEX
Cell1=132 0 N 25 0 132
Cell2=652 0 N 25 0 652
Cell3=648 0 N 25 0 648
....

I'd like to be able to parse this structure.

Ideally, I'd like each QC node to produce a dictionary like:
{'number': 1,
 'type': 15,
 'NumberCells': 1925,
 'Table': [{'cell': 1, 'x': 132, 'y': 0, 'probe': 25, 'plen': 0, 'atom': 132,
            'index': None}, {'cell': 2 ....

I'm running into the following problems:

1. I can't seem to use delimitedList() to define a rule that parses the
right-hand side of the table into
['x','y','probe','plen','atom','index']. I think it's because my lists
are whitespace-delimited.

2. I can't seem to convert values into integers. For example, I can
parse each row in the table to:
['Cell','2','=', '652 0 N 25 0 652']
but I am unable to get setParseAction (see below) to convert the values
and substitute in the right ones.
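A sketch (not from the thread) of how both problems might be handled in pyparsing, assuming pyparsing is installed. `delimitedList` is meant for lists with an explicit separator such as a comma; because pyparsing skips whitespace between tokens by default, a whitespace-separated list is simply `OneOrMore(token)`. Attaching the parse action to the integer token itself, rather than to a `restOfLine` value, makes the conversion happen wherever that token is used. The input is parsed one line at a time here because pyparsing's default whitespace includes newlines, so an unanchored `OneOrMore` could otherwise run on into the next line. The grammar names and the `parse_qc_block` helper are hypothetical, and the result keys are illustrative; each row follows the CellHeader column layout.

```python
from pyparsing import (Group, OneOrMore, ParseException, Suppress, Word,
                       alphanums, alphas, nums)

EQ = Suppress("=")

# Problem 2: put the int conversion on the token, so every use of `integer`
# yields a Python int instead of a string.
integer = Word(nums).setParseAction(lambda s, l, t: [int(t[0])])

# Problem 1: whitespace between tokens is skipped by default, so a
# whitespace-separated list is just OneOrMore(...).
header_line = Suppress("CellHeader") + EQ + Group(OneOrMore(Word(alphanums)))("names")
cell_line = (Suppress("Cell") + integer("cell") + EQ
             + Group(OneOrMore(integer | Word(alphas)))("values"))
qc_line = Suppress("[QC") + integer("number") + Suppress("]")
kv_line = Word(alphas)("key") + EQ + integer("value")


def parse_qc_block(text):
    """Parse one [QCn] block line by line (hypothetical helper)."""
    result = {"table": []}
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Order matters: "CellHeader" also starts with "Cell".
        for expr in (qc_line, header_line, cell_line, kv_line):
            try:
                toks = expr.parseString(line, parseAll=True)
                break
            except ParseException:
                continue
        else:
            raise ValueError("unrecognised line: %r" % line)

        if expr is qc_line:
            result["number"] = toks["number"]
        elif expr is header_line:
            names = [n.lower() for n in toks["names"]]
        elif expr is cell_line:
            # Map the row values onto the CellHeader column names.
            row = dict(zip(names, toks["values"]))
            row["cell"] = toks["cell"]
            result["table"].append(row)
        else:
            result[toks["key"]] = toks["value"]
    return result
```

With the sample block above, `parse_qc_block(...)["table"][0]` comes out as `{'x': 132, 'y': 0, 'probe': 'N', 'plen': 25, 'atom': 0, 'index': 132, 'cell': 1}`.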

Any hints will help a great deal. Thanks...

Raoul-Sam


I have some ugly, non-functional code below:

from pyparsing import (Combine, Group, Literal, OneOrMore, Word,
                       alphanums, delimitedList, nums, printables,
                       restOfLine)

cdfbnf = None

def cdffile_BNF():
    global cdfbnf

    if not cdfbnf:
        makeint = Word(nums).setParseAction(lambda s, l, t: [int(t[0])])
        equals = Literal("=").suppress()
        nonequals = "".join([c for c in printables if c != "="]) + " \t"

        key = Word(nonequals)
        value = Word(nonequals)
        kvp = Group(key + equals + restOfLine)
        kvpBlk = OneOrMore(kvp)

        headerCell = delimitedList(Word(alphanums), " ")
        rowHeader = Combine(Literal("CellHeader") + equals +
                            headerCell)
        row = Combine(Literal("Cell").suppress() + restOfLine)
        rows = OneOrMore(row)

        CDF = Literal("[CDF]")
        CDFBlk = Group(CDF + kvpBlk)

        CHIP = Literal("[CHIP]")
        CHIPBlk = Group(CHIP + kvpBlk)
        CHIPBlk.setResultsName("chip")

        QC = Combine(Literal("[QC").suppress() + Word(nums) +
                     Literal("]").suppress())
        QCBlk = QC + kvp + kvp + rowHeader + rows

        cdfbnf = CDFBlk + CHIPBlk + QCBlk

    return cdfbnf
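One detail in the code above that is easy to miss: `setResultsName()` returns a named copy of the expression rather than renaming it in place, so a bare `CHIPBlk.setResultsName("chip")` has no effect on `CHIPBlk`. A minimal illustration, assuming pyparsing is installed; the `Word(alphanums)` body is just a stand-in for `kvpBlk`:

```python
from pyparsing import Group, Literal, Word, alphanums

CHIP = Literal("[CHIP]")
CHIPBlk = Group(CHIP + Word(alphanums))  # stand-in body instead of kvpBlk

# The return value is discarded here, so CHIPBlk itself stays unnamed.
CHIPBlk.setResultsName("chip")
unnamed = CHIPBlk.parseString("[CHIP] abc")

# Keeping the returned copy (or writing CHIPBlk("chip")) does attach the name.
CHIPBlk = CHIPBlk.setResultsName("chip")
named = CHIPBlk.parseString("[CHIP] abc")
```

After this, `"chip" in unnamed` is false, while `named["chip"]` holds the grouped tokens `['[CHIP]', 'abc']`.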
 
Larry Bates

Note: The CellHeader layout and the cell data
layout don't appear to match the data you show
in your sample dictionary. My solution follows
the CellHeader layout.

The format of your data file is perfect for
parsing with ConfigParser, with [QC1] as the section
name and Type, NumberCells, etc. as options.

try:
    import configparser as ConfigParser  # Python 3
except ImportError:
    import ConfigParser                  # Python 2

inputfilename = 'data.ini'  # Insert input filename
INI = ConfigParser.ConfigParser()
INI.read(inputfilename)
data = {'type': None, 'numbercells': None, 'table': []}

# Errors raised when a section or option is absent
MISSING = (ConfigParser.NoSectionError, ConfigParser.NoOptionError)

section = 'QC1'
option = 'type'
try:
    data['type'] = INI.getint(section, option)
except MISSING + (ValueError,):
    #
    # Insert code to handle missing type option
    #
    pass

option = 'numbercells'
try:
    data['numbercells'] = INI.getint(section, option)
except MISSING + (ValueError,):
    #
    # Insert code to handle missing numbercells option
    #
    pass

option = 'cellheader'
try:
    data['cellheader'] = INI.get(section, option)
except MISSING:
    #
    # Insert code to handle missing cellheader option
    #
    pass

# ConfigParser lowercases option names, so 'Cell1' comes back as 'cell1'
CELLS = [x for x in INI.options(section)
         if x.startswith('cell')]

#
# Must get rid of 'cellheader' or maybe change the key name?
#
CELLS = [x for x in CELLS if x != 'cellheader']

celldatalist = []
for CELL in CELLS:
    celldata = {}
    # Columns follow the CellHeader layout: X Y PROBE PLEN ATOM INDEX
    x, y, probe, plen, atom, index = INI.get(section, CELL).split()
    celldata['cell'] = int(CELL[4:])   # 'cell12' -> 12
    celldata['x'] = int(x)
    celldata['y'] = int(y)
    celldata['probe'] = probe          # non-numeric, e.g. 'N'
    celldata['plen'] = int(plen)
    celldata['atom'] = int(atom)
    celldata['index'] = int(index)
    celldatalist.append(celldata)

data['table'] = celldatalist


This is tested, so it should be close (I do this quite a lot in
my code).

You could wrap a loop around the outside of this if you have
multiple QC instances.
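A sketch of that outer loop, assuming Python 3's `configparser` and section names of the form `QC<n>`; the `read_qc_sections` helper is hypothetical. Rather than hard-coding the column order, it maps each `CellN` value onto the `CellHeader` names, converting purely numeric fields to `int` (anything else, such as `N` or a negative number, stays a string):

```python
import configparser
import re


def read_qc_sections(ini):
    """Collect every [QCn] section into a dict keyed by n (hypothetical helper)."""
    blocks = {}
    for section in ini.sections():
        match = re.fullmatch(r"QC(\d+)", section)
        if not match:
            continue  # skip [CDF], [CHIP], and other sections
        names = [n.lower() for n in ini.get(section, "cellheader").split()]
        table = []
        for option in ini.options(section):
            # Option names come back lowercased, e.g. 'cell12'.
            if not option.startswith("cell") or option == "cellheader":
                continue
            row = {"cell": int(option[len("cell"):])}
            for name, value in zip(names, ini.get(section, option).split()):
                # Assumes numeric fields are non-negative integers.
                row[name] = int(value) if value.isdigit() else value
            table.append(row)
        table.sort(key=lambda row: row["cell"])  # numeric order: 1, 2, 10
        blocks[int(match.group(1))] = {
            "type": ini.getint(section, "type"),
            "numbercells": ini.getint(section, "numbercells"),
            "table": table,
        }
    return blocks
```

Usage would be along the lines of `ini = configparser.ConfigParser(); ini.read('data.ini'); blocks = read_qc_sections(ini)`. Unlike the try/except version above, this raises `NoOptionError` if a QC section lacks Type, NumberCells or CellHeader.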

Hope it helps.
Larry Bates



Raoul

Larry,

This is an elegant little python! Thanks for your help. I am actually
trying to use pyparsing, not just in my project but as a learning
experience. I think pyparsing has a lot of potential.

I'm making progress, but I am running into bugs with my pyparsing
grammar.

R-S
