Newbie code review of parsing program Please

len · Nov 16, 2008

I have created the following program to read a text file which happens
to be a COBOL file definition. The program then writes a file containing
a list definition which I can later copy and paste into a Python program.
I will eventually expand the program to also output an SQL script to
create a SQL file in MySQL.

The program still needs a little work; it does not handle the following
items yet:

1. It does not handle OCCURS yet.
2. It does not handle REDEFINES yet.
3. GROUP structures will need work.
4. Does not create SQL script yet.

I expect that any files created by this program may need manual tweaking,
but I have a large number of COBOL file definitions which I may need to
work with, and this seemed like a better solution than typing each list
definition and SQL create file script by hand.

What I would like is if some kind soul could review my code and give me
some suggestions on how I might improve it. I think the use of regular
expressions might cut the code down or at least simplify the parsing, but
I'm just starting to read those chapters in the book.

*** SAMPLE INPUT FILE ***

000100 FD SALESMEN-FILE
000200 LABEL RECORDS ARE STANDARD
000300 VALUE OF FILENAME IS "SALESMEN".
000400
000500 01 SALESMEN-RECORD.
000600 05 SALESMEN-NO PIC 9(3).
000700 05 SALESMEN-NAME PIC X(30).
000800 05 SALESMEN-TERRITORY PIC X(30).
000900 05 SALESMEN-QUOTA PIC S9(7) COMP.
001000 05 SALESMEN-1ST-BONUS PIC S9(5)V99 COMP.
001100 05 SALESMEN-2ND-BONUS PIC S9(5)V99 COMP.
001200 05 SALESMEN-3RD-BONUS PIC S9(5)V99 COMP.
001300 05 SALESMEN-4TH-BONUS PIC S9(5)V99 COMP.

*** PROGRAM CODE ***

#!/usr/bin/python

import sys

f_path = '/home/lenyel/Bruske/MCBA/Internet/'
f_name = sys.argv[1]

fd = open(f_path + f_name, 'r')

def fmtline(fieldline):
    # fieldline is a split field line, e.g. ['05', 'SALESMEN-NO', 'PIC', '9(3).']
    size = ''
    type = ''
    dec = ''
    codeline = []
    if fieldline.count('COMP.') > 0:
        left = fieldline[3].find('(') + 1
        right = fieldline[3].find(')')
        num = fieldline[3][left:right].lstrip()
        if fieldline[3].count('V'):
            left = fieldline[3].find('V') + 1
            dec = int(len(fieldline[3][left:]))
            # // keeps the division an integer one on any Python version
            size = ((int(num) + int(dec)) // 2) + 1
        else:
            size = (int(num) // 2) + 1
            dec = 0
        type = 'Pdec'
    elif fieldline[3][0] in ('X', '9'):
        dec = 0
        left = fieldline[3].find('(') + 1
        right = fieldline[3].find(')')
        size = int(fieldline[3][left:right].lstrip('0'))
        if fieldline[3][0] == 'X':
            type = 'Xstr'
        else:
            type = 'Xint'
    else:
        dec = 0
        left = fieldline[3].find('(') + 1
        right = fieldline[3].find(')')
        size = int(fieldline[3][left:right].lstrip('0'))
        if fieldline[3][0] == 'X':
            type = 'Xint'
    codeline.append(fieldline[1].replace('-', '_').replace('.', '').lower())
    codeline.append(size)
    codeline.append(type)
    codeline.append(dec)
    return codeline

wrkfd = []
rec_len = 0

for line in fd:
    if line[6] == '*':      # drop comment lines
        continue
    newline = line.split()
    if len(newline) == 1:   # drop blank line
        continue
    newline = newline[1:]   # drop the sequence number
    if 'FILENAME' in newline:
        filename = newline[-1].replace('"','').lower()
        filename = filename.replace('.','')
        output = open('/home/lenyel/Bruske/MCBA/Internet/'+filename+'.fd', 'w')
        code = filename + ' = [\n'
        output.write(code)
    elif newline[0].isdigit() and 'PIC' in newline:
        wrkfd.append(fmtline(newline))
        rec_len += wrkfd[-1][1]

fd.close()

fmtfd = []

# every entry but the last gets a trailing comma
for wrkline in wrkfd[:-1]:
    fmtline = str(tuple(wrkline)) + ',\n'
    output.write(fmtline)

fmtline = tuple(wrkfd[-1])
fmtline = str(fmtline) + '\n'
output.write(fmtline)

lastline = ']\n'
output.write(lastline)

lenrec = filename + '_len = ' + str(rec_len)
output.write(lenrec)

output.close()

*** RESULTING OUTPUT ***

salesmen = [
('salesmen_no', 3, 'Xint', 0),
('salesmen_name', 30, 'Xstr', 0),
('salesmen_territory', 30, 'Xstr', 0),
('salesmen_quota', 4, 'Pdec', 0),
('salesmen_1st_bonus', 4, 'Pdec', 2),
('salesmen_2nd_bonus', 4, 'Pdec', 2),
('salesmen_3rd_bonus', 4, 'Pdec', 2),
('salesmen_4th_bonus', 4, 'Pdec', 2)
]
salesmen_len = 83
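
As a rough sketch of the SQL step that is still on the to-do list (it is not part of the program above), the list definition could be turned into a CREATE TABLE statement along these lines. The function names `column_type` and `create_table_sql` and the mapping from 'Xstr', 'Xint' and 'Pdec' to MySQL column types are assumptions made for illustration; the right column types depend on how the data is actually used.

```python
# Hypothetical mapping from the list-definition type codes to MySQL types.
def column_type(size, kind, dec):
    if kind == 'Xstr':
        return 'CHAR(%d)' % size
    if kind == 'Xint':
        return 'DECIMAL(%d, 0)' % size
    if kind == 'Pdec':
        # Assumption: a packed field of `size` bytes holds at most
        # 2*size - 1 digits, so that is used as the precision.
        return 'DECIMAL(%d, %d)' % (2 * size - 1, dec)
    raise ValueError('unknown type code: %r' % kind)


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement from (name, size, type, dec) tuples."""
    cols = ['    %s %s' % (name, column_type(size, kind, dec))
            for name, size, kind, dec in fields]
    return 'CREATE TABLE %s (\n%s\n);' % (table, ',\n'.join(cols))


salesmen = [
    ('salesmen_no', 3, 'Xint', 0),
    ('salesmen_name', 30, 'Xstr', 0),
    ('salesmen_quota', 4, 'Pdec', 0),
    ('salesmen_1st_bonus', 4, 'Pdec', 2),
]
print(create_table_sql('salesmen', salesmen))
```

Because the list stores byte sizes rather than digit counts for packed fields, the precision chosen here is an upper bound, not necessarily the original PIC width.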

If you find this code useful please feel free to use any or all of it
at your own risk.

Thanks
Len S

len · Nov 16, 2008

You might want to check out the pyparsing library.

-Mark

Thanks Mark, I will check it out right now.

Len

Steve Holden · Nov 16, 2008

Mark said:
You might want to check out the pyparsing library.

And you might want to trim your messages to avoid quoting irrelevant
stuff. This is not directed personally at Mark, but at all readers.

Loads of us do it, and I wish we'd stop it. It's poor netiquette because
it forces people to skip past stuff that isn't relevant to the point
being made. It's also a global waste of bandwidth and storage space,
though that's less important than it used to be.

regards
Steve

Lawrence D'Oliveiro · Nov 17, 2008

len said:
if fieldline.count('COMP.') > 0:

I take it you're only handling a particular subset of COBOL constructs: thus, "COMP" is never "COMPUTATIONAL" or "USAGE IS COMPUTATIONAL", and it always occurs just before the full-stop (can't remember enough COBOL syntax to be sure if anything else can go afterwards).

elif newline[0].isdigit() and 'PIC' in newline:

Similarly, "PIC" is never "PICTURE" or "PICTURE IS".
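
For what it's worth, a regular expression can tolerate those spelling variants. The following is only a minimal sketch using the standard `re` module; the names `FIELD_RE` and `parse_field` are made up for illustration, and the pattern covers just the simple elementary items in the sample input, not full COBOL syntax.

```python
import re

# Hypothetical pattern for one elementary field line.
FIELD_RE = re.compile(r"""
    ^\s*(?:\d{6}\s+)?              # optional six-digit sequence number
    (?P<level>\d{2})\s+            # level number, e.g. 05
    (?P<name>[A-Z0-9-]+)\s+        # data name
    PIC(?:TURE)?(?:\s+IS)?\s+      # PIC, PICTURE, or PICTURE IS
    (?P<picture>[SXV9()0-9]+)      # picture string such as S9(5)V99
    (?:\s+(?:USAGE\s+(?:IS\s+)?)?(?P<usage>COMP(?:UTATIONAL)?(?:-\d)?))?
    \s*\.\s*$                      # terminating full stop
""", re.VERBOSE)


def parse_field(line):
    """Return (level, name, picture, usage), or None if the line is not a field."""
    m = FIELD_RE.match(line.upper())
    if m is None:
        return None
    return (m.group('level'), m.group('name'),
            m.group('picture'), m.group('usage'))


print(parse_field('000900 05 SALESMEN-QUOTA PIC S9(7) COMP.'))
print(parse_field('05 SALESMEN-QUOTA PICTURE IS S9(7) USAGE IS COMPUTATIONAL.'))
```

Lines that carry no picture clause, such as the 01 record line, simply return None.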

Aargh, I think I have to stop. I'm remembering more than I ever wanted to about COBOL. Must ... rip ... brain ... out ...

Lawrence D'Oliveiro · Nov 17, 2008

Mark said:
Point taken...or I could top post ;^)

A: A Rolls seats six.
Q: What's the saddest thing about seeing a Rolls with five top-posters in it going over a cliff?

John Machin · Nov 17, 2008

A: A Rolls seats six.
Q: What's the saddest thing about seeing a Rolls with five top-posters in it going over a cliff?

+1 but you forgot the boot & the roof rack AND if it was a really old
one there'd be space for a few on the running boards (attached like
the Norwegian Blue parrot)

Paul McGuire · Nov 17, 2008

Thanks Mark, I will check it out right now.

Len

Len -

Here is a rough pyparsing starter for your problem:

from pyparsing import *

COMP = Optional("USAGE IS") + oneOf("COMP COMPUTATIONAL")
PIC = oneOf("PIC PICTURE") + Optional("IS")
PERIOD,LPAREN,RPAREN = map(Suppress,".()")

ident = Word(alphanums.upper()+"_-")
integer = Word(nums).setParseAction(lambda t:int(t[0]))
lineNum = Suppress(Optional(LineEnd()) + LineStart() + Word(nums))

rep = LPAREN + integer + RPAREN
repchars = "X" + rep
repchars.setParseAction(lambda tokens: ['X']*tokens[1])
strdecl = Combine(OneOrMore(repchars | "X"))

SIGN = Optional("S")
repdigits = "9" + rep
repdigits.setParseAction(lambda tokens: ['9']*tokens[1])
intdecl = SIGN("sign") + Combine(OneOrMore(repdigits | "9"))("intpart")
realdecl = SIGN("sign") + Combine(OneOrMore(repdigits | "9"))("intpart") + "V" + \
    Combine(OneOrMore("9" + rep | "9"))("realpart")

type = Group((strdecl | realdecl | intdecl) +
             Optional(COMP("COMP")))

fieldDecl = lineNum + "05" + ident("name") + \
    PIC + type("type") + PERIOD
structDecl = lineNum + "01" + ident("name") + PERIOD + \
    OneOrMore(Group(fieldDecl))("fields")

It prints out:

SALESMEN-RECORD
SALESMEN-NO ['999']
SALESMEN-NAME ['XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX']
SALESMEN-TERRITORY ['XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX']
SALESMEN-QUOTA ['S', '9999999', 'COMP']
SALESMEN-1ST-BONUS ['S', '99999', 'V', '99', 'COMP']
SALESMEN-2ND-BONUS ['S', '99999', 'V', '99', 'COMP']
SALESMEN-3RD-BONUS ['S', '99999', 'V', '99', 'COMP']
SALESMEN-4TH-BONUS ['S', '99999', 'V', '99', 'COMP']
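
The repeat-count expansion seen in that output (X(30) becoming thirty X characters) can also be sketched with the standard library alone, which may help when comparing against the hand-written parser. This is an illustrative sketch, not pyparsing code; `expand_picture` and `describe_picture` are hypothetical names, and nothing here validates the picture string.

```python
import re


def expand_picture(picture):
    """Expand repeat counts, e.g. 'S9(5)V99' becomes 'S99999V99'."""
    return re.sub(r'([X9])\((\d+)\)',
                  lambda m: m.group(1) * int(m.group(2)),
                  picture)


def describe_picture(picture):
    """Summarise a simple picture string as a small dict (no validation)."""
    expanded = expand_picture(picture.upper())
    if set(expanded) <= {'X'}:
        return {'kind': 'string', 'length': len(expanded)}
    signed = expanded.startswith('S')
    digits = expanded[1:] if signed else expanded
    # Everything before V is the integer part, everything after is decimals.
    intpart, _, realpart = digits.partition('V')
    return {'kind': 'number', 'signed': signed,
            'intpart': len(intpart), 'realpart': len(realpart)}


print(expand_picture('S9(5)V99'))
print(describe_picture('X(30)'))
```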

I too have some dim, dark, memories of COBOL. I seem to recall having
to infer from the number of digits in an integer or real what size the
number would be. I don't have that logic implemented, but here is an
extension to the above program, which shows you where you could put
this kind of type inference logic (insert this code before the call to
searchString):

class TypeDefn(object):
    @staticmethod
    def intType(tokens):
        self = TypeDefn()
        self.str = "int(%d)" % (len(tokens.intpart),)
        self.isSigned = bool(tokens.sign)
        return self
    @staticmethod
    def realType(tokens):
        self = TypeDefn()
        self.str = "real(%d.%d)" % (len(tokens.intpart),len(tokens.realpart))
        self.isSigned = bool(tokens.sign)
        return self
    @staticmethod
    def charType(tokens):
        self = TypeDefn()
        self.str = "char(%d)" % len(tokens)
        self.isSigned = False
        self.isComp = False
        return self
    def __repr__(self):
        return ("+-" if self.isSigned else "") + self.str
intdecl.setParseAction(TypeDefn.intType)
realdecl.setParseAction(TypeDefn.realType)
strdecl.setParseAction(TypeDefn.charType)

This prints:

SALESMEN-RECORD
SALESMEN-NO [int(3)]
SALESMEN-NAME [char(1)]
SALESMEN-TERRITORY [char(1)]
SALESMEN-QUOTA [+-int(7), 'COMP']
SALESMEN-1ST-BONUS [+-real(5.2), 'COMP']
SALESMEN-2ND-BONUS [+-real(5.2), 'COMP']
SALESMEN-3RD-BONUS [+-real(5.2), 'COMP']
SALESMEN-4TH-BONUS [+-real(5.2), 'COMP']
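
The size inference itself might look something like the sketch below. It is an assumption-laden illustration: `packed_size` uses the same digits/2 + 1 arithmetic the original program applies to its COMP fields, while `binary_size` follows the common 2/4/8-byte convention for binary fields. Which rule applies to a plain COMP item depends on the compiler, so check against the real data files.

```python
def packed_size(digits):
    """Bytes for packed decimal: two digits per byte plus a sign nibble."""
    return digits // 2 + 1


def binary_size(digits):
    """Bytes for a binary field, assuming the usual 2/4/8-byte convention."""
    if digits <= 4:
        return 2
    if digits <= 9:
        return 4
    if digits <= 18:
        return 8
    raise ValueError('too many digits for a binary field: %d' % digits)


# S9(7) has 7 digits; S9(5)V99 has 5 + 2 = 7 digits as well.
for digits in (7, 5 + 2):
    print(digits, packed_size(digits), binary_size(digits))
```

Either function could be called from the parse actions once the integer and decimal digit counts are known.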

You can post more questions about pyparsing on the Discussion tab of
the pyparsing wiki home page.

Best of luck!
-- Paul

len · Nov 17, 2008

Lawrence D'Oliveiro said:
I take it you're only handling a particular subset of COBOL constructs: thus, "COMP" is never "COMPUTATIONAL" or "USAGE IS COMPUTATIONAL" [...]

Similarly, "PIC" is never "PICTURE" or "PICTURE IS".

Most of the COBOL code originally comes from packages and is
relatively consistent.

Thanks
Len

len · Nov 17, 2008

Thanks Paul

I will be going over your code today. I started looking at pyparsing
last night, and it just got too late and my brain started to fog over.
I would really like to thank you for taking the time to provide me with
the code sample; I'm sure it will really help. Again, thank you very much.

Len
