How to read files written with COBOL

Batista, Facundo

People:

I'm trying to convert my father from using COBOL to Python. :)

One difficulty we ran into is how to read, from Python, files written
with COBOL.

Do you know of a module that allows me to do that?

It would save us the work of writing a COBOL program that opens the COBOL
file and writes a CSV one (easily readable from Python).

Thank you all!

Facundo Batista
Desarrollo de Red
(e-mail address removed)
(54 11) 5130-4643
Cel: 15 5132 0132
 
John Roth

Batista said:
People:

I'm trying to convert my father from using COBOL to Python. :)

One difficulty we ran into is how to read, from Python, files written
with COBOL.

Do you know of a module that allows me to do that?

It would save us the work of writing a COBOL program that opens the COBOL
file and writes a CSV one (easily readable from Python).

What's the OS for the two languages? COBOL from mainframe
to X86ish is very different from some flavor of Windows or Unix
COBOL.

Also, are we talking fixed or variable length records? And if
variable, how are they structured?

In either case, I think the struct module (under String Services)
is what you're looking for.
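As a minimal sketch of what that could look like for a fixed-length record: the layout below (a 10-byte text field followed by a 4-byte and a 2-byte big-endian integer) is invented for illustration, and a real file would need a format string derived from its own record description.

```python
import struct

# Hypothetical layout: 10-byte text field, then 4-byte and 2-byte
# big-endian unsigned integers (">" also disables alignment padding).
RECORD = struct.Struct(">10sIH")

def parse_record(raw):
    """Unpack one fixed-length record into a (name, account, branch) tuple."""
    name, account, branch = RECORD.unpack(raw)
    return name.decode("ascii").rstrip(), account, branch

# Made-up sample record, 16 bytes long.
sample = b"SMITH     " + (123456).to_bytes(4, "big") + (42).to_bytes(2, "big")
print(parse_record(sample))
```

struct only covers text and binary integer/float fields; decimal formats such as packed decimal need their own decoding.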

John Roth
 
asdf sdf

i'm going to watch this thread with interest. a couple of weeks ago, i
asked about python to legacy mvs particularly for DB2 and Adabas access.
i got zero responses which suggested to me that no tools or modules
are in wide use.

i think you are undertaking a simpler problem generally. if all your
records are text it should be fairly straightforward. if not, you'll
need to figure out how to map COBOL data representations into python.

i seem to remember COMP-3, COMP-5 and packed decimal formats, among
others. what they mean, i don't know, but generally various floating
and fixed point formats.
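For reference, COMP-3 is the packed decimal format: two digits per byte, with the sign in the last half-byte. A rough sketch of decoding it in Python follows; the function name and the `scale` argument (number of implied decimal places) are made up for this example.

```python
from decimal import Decimal

def unpack_comp3(raw, scale=0):
    """Decode a COMP-3 (packed decimal) field.

    Each byte holds two 4-bit digits; the low nibble of the last byte is
    the sign (0xD or 0xB negative, otherwise non-negative). `scale` is the
    number of implied decimal places, a parameter invented for this sketch.
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()
    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError("invalid packed digit")
        value = value * 10 + digit
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)

# Three bytes as they would be stored for PIC S9(3)V99 COMP-3.
print(unpack_comp3(b"\x12\x34\x5C", scale=2))
```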

you also need to handle REDEFINES which is used to produce a c-union
sort of arrangement, where multiple formats can be used to access the
same record.
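One way to picture the union arrangement from the Python side is to unpack the same bytes with different layouts, chosen by a type code. This is only a sketch: the 12-byte record, the type codes and both layouts are invented, and real files may not carry such a code at all.

```python
import struct

# Hypothetical 12-byte record: byte 0 is a record-type code and the other
# 11 bytes are laid out differently depending on that code.
LAYOUTS = {
    b"H": ("header", struct.Struct(">c8s3s")),  # type, date text, branch text
    b"D": ("detail", struct.Struct(">c7sI")),   # type, item text, binary amount
}

def parse(raw):
    """Pick a layout from the type byte and unpack the same bytes with it."""
    kind, layout = LAYOUTS[raw[:1]]
    return kind, layout.unpack(raw)[1:]

print(parse(b"H20040131NYC"))
print(parse(b"DWIDGET " + (1999).to_bytes(4, "big")))
```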

88-Levels are a similar problem.

after Y2K, a lot of COBOL files contain some non-obvious date handling,
which could involve bit manipulation.

if you learn of any sorts of tools at all, please post them back here.
python screen scrapers, python compatible database drivers, anything at all.

interesting project idea: a COBOL to python _code_ converter. should
be feasible, in light of COBOL's very limited syntax.

ah, COBOL fun. all us old guys are reflecting on how glad we are we
left it behind.

it might be a good exercise for your dad, if he wants to retool himself,
and he already knows all the data format stuff.
 
John Roth

asdf sdf said:
i'm going to watch this thread with interest. a couple of weeks ago, i
asked about python to legacy mvs particularly for DB2 and Adabas access.
i got zero responses which suggested to me that no tools or modules
are in wide use.

I missed seeing it, somehow, but you're also right: I don't know
of any tools either.

i think you are undertaking a simpler problem generally. if all your
records are text it should be fairly straightforward. if not, you'll
need to figure out how to map COBOL data representations into python.

In other words, take the 01s under the FD and create an object
that would expose all the converted data elements for the record?
Could be a somewhat interesting project, and it shouldn't be all
that hard since data descriptions are a fairly limited syntax.

you also need to handle REDEFINES which is used to produce a c-union
sort of arrangement, where multiple formats can be used to access the
same record.

Redefines is implicit - it's just multiple level 01s under the same FD.

88-Levels are a similar problem.

Aren't an issue. 88s are basically an isXXX type function call. That's not
how they're implemented, but that's the basic semantics.
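In Python terms, the isXXX idea could be sketched as read-only properties on a record object. The class, the field and the condition names below are all hypothetical.

```python
class AccountRecord:
    """Hypothetical record with a one-character status field.

    Mirrors COBOL along the lines of:
        05 ACCT-STATUS      PIC X.
           88 ACCT-ACTIVE   VALUE 'A'.
           88 ACCT-CLOSED   VALUE 'C' 'X'.
    """

    def __init__(self, status):
        self.status = status

    @property
    def is_active(self):
        # True when the field holds the value named by the 88 level.
        return self.status == "A"

    @property
    def is_closed(self):
        # An 88 level may list several values.
        return self.status in ("C", "X")

rec = AccountRecord("X")
print(rec.is_active, rec.is_closed)
```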
after Y2K, a lot of COBOL files contain some non-obvious date handling,
which could involve bit manipulation.

if you learn of any sorts of tools at all, please post them back here.
python screen scrapers, python compatible database drivers, anything at all.

interesting project idea: a COBOL to python _code_ converter. should
be feasible, in light of COBOL's very limited syntax.

ah, COBOL fun. all us old guys are reflecting on how glad we are we
left it behind.

Ain't that the truth!

John Roth
 
Steve Williams

I wrote an ETL system in python for a client to convert from Microfocus
COBOL to DB2. Here are some of the problems I saw:

1) COBOL has a very rich set of datatypes defined by the PICTURE clause

character
unsigned integer
zoned signed integer
integer trailing sign separate
integer leading sign separate
packed signed decimal
packed unsigned decimal
floating point

with the usual COBOL zoo of implied decimal points and scaling

Not to mention COBOL allowing formatted numeric data to be
used as source fields in arithmetic operations.

In my application, each of these types was converted by a
parameter-driven function.

That is, I took the original COBOL 01 level definition and
converted it to a list with definition parameters name, type,
length, decimal point, etc. to make it easy for Python and
to add some stuff to make DB2 happy (convert to title case. . .)

I doubt you can easily write a parser for the COBOL PICTURE
clause, and for most cases it would be a waste of time. I just
converted the definition by using 'replace all occurrences' in
a text processor.

I had the most problem with Microfocus unsigned decimal, as
I'd never seen it before.
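The parameter-driven approach described in this item might look roughly like the sketch below. The field list, the type names and the two converters are assumptions made up for the example, not the code from the actual application.

```python
from decimal import Decimal

# Hypothetical field list standing in for a hand-converted 01 level:
# (name, type, offset, length, implied decimal places).
FIELDS = [
    ("Cust_Name", "char",     0, 10, 0),
    ("Cust_Id",   "unsigned", 10, 6, 0),
    ("Balance",   "unsigned", 16, 7, 2),
]

def convert_char(raw, places):
    return raw.decode("ascii").rstrip()

def convert_unsigned(raw, places):
    # Display digits with an implied decimal point `places` from the right.
    return Decimal(int(raw.decode("ascii"))).scaleb(-places)

# One converter per type; more types would be added to this table.
CONVERTERS = {"char": convert_char, "unsigned": convert_unsigned}

def convert_record(raw):
    """Apply the converter named in each field definition."""
    return {
        name: CONVERTERS[kind](raw[start:start + length], places)
        for name, kind, start, length, places in FIELDS
    }

print(convert_record(b"ACME CORP 0001230012345"))
```

Supporting another data type then means adding one function and one table entry, without touching the record loop.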

2) Reading fixed and variable length records wasn't much of a problem

Reading Microfocus keyed sequential data with embedded indexes
took some bit-level coding.
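For the plain fixed-length case, a reader can be as small as the sketch below. It does not attempt the Microfocus indexed format, and the record length and sample data are invented.

```python
import io

def read_fixed(stream, record_length):
    """Yield successive fixed-length records from a binary stream."""
    while True:
        record = stream.read(record_length)
        if not record:
            return
        if len(record) != record_length:
            raise ValueError("truncated final record")
        yield record

# io.BytesIO stands in for a file opened in "rb" mode; the data is made up.
data = io.BytesIO(b"AAAA0001BBBB0002CCCC0003")
for rec in read_fixed(data, 8):
    print(rec)
```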

3) None of this would be remotely attractive to a COBOL programmer.
Converting the data to CSV, however, might get his attention
as it's pretty easy in Python and not much fun in COBOL.
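To give an idea of how little code the CSV step takes, here is a sketch using the standard csv module. The record layout (10-character name, 6-digit id, 7-digit amount with two implied decimals) and the sample data are made up.

```python
import csv
import io

# Made-up fixed-width text records: 10-char name, 6-digit id, 7-digit
# amount with two implied decimal places.
records = ["ACME CORP 0001230012345", "BOLT, INC 0004560000999"]

out = io.StringIO()  # stands in for open("out.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["name", "id", "amount"])
for rec in records:
    name = rec[0:10].rstrip()
    cust_id = int(rec[10:16])
    cents = int(rec[16:23])
    # Format from integer cents so no floating point is involved.
    amount = f"{cents // 100}.{cents % 100:02d}"
    writer.writerow([name, cust_id, amount])

print(out.getvalue())
```

The csv module takes care of quoting, so the comma inside the second name does not break the output.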

If you want to sell dad, talk about text and string processing
in Python.
 
asdf sdf

Steve said:
I wrote an ETL system in python for a client to convert from Microfocus
COBOL to DB2. Here are some of the problems I saw:

1) COBOL has a very rich set of datatypes defined by the PICTURE clause
That is, I took the original COBOL 01 level definition and
converted it to a list with definition parameters name, type,
length, decimal point, etc. to make it easy for Python and
to add some stuff to make DB2 happy (convert to title case. . .)

Steve,

I've been looking for ideas on getting at DB2 and Adabas from Python.
You might have some thoughts.

Is it feasible to go directly to MVS/DB2/Adabas from Python on Unix
or Win?

Is it more realistic to hit DB2 on AIX or Linux and use some kind of DB2
linking or replication to reach DB2/MVS?

Other ideas? Maybe 3270 emulation with screen scraping? How about
telnet 3270? (Hundreds of years ago, I could dial into a command line
MVS environment.)

I don't mean to hijack the thread. I think this is related and might be
helpful to unfortunates who have to interoperate with legacy systems.
 
Steve Williams

Well, the application processed a lot of data on a nightly basis. It
used FTP to connect to the COBOL machine (an AIX box) and FTP callbacks
to sequentially read the files and convert the data. There are two
bugs in the Python FTP module that surface if the file size is larger
than 2 gig, but they're easily fixed.
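The callback idea can be sketched as follows, assuming fixed-length records. The blocks handed to an FTP callback do not line up with record boundaries, so leftovers are carried over to the next call. The class name, record length, host, login and file name are all placeholders, and the network part is left as a comment.

```python
class RecordSplitter:
    """Collect arbitrary-sized chunks and hand out whole fixed-length records."""

    def __init__(self, record_length, handler):
        self.record_length = record_length
        self.handler = handler
        self.pending = b""

    def feed(self, chunk):
        # Keep any partial record left over from the previous chunk.
        self.pending += chunk
        n = self.record_length
        whole = len(self.pending) - len(self.pending) % n
        for i in range(0, whole, n):
            self.handler(self.pending[i:i + n])
        self.pending = self.pending[whole:]

records = []
splitter = RecordSplitter(8, records.append)
for chunk in (b"AAAA00", b"01BBBB0002CC", b"CC0003"):  # simulated FTP blocks
    splitter.feed(chunk)
print(records)

# With a real server (host, login and file name are placeholders):
#   from ftplib import FTP
#   ftp = FTP("cobol-host")
#   ftp.login("user", "password")
#   ftp.retrbinary("RETR CUSTOMER.DAT", splitter.feed)
```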

I developed this application on Windows, initially targeting a test DB2
database on Windows and then moving the DB2 database to AIX and posting
with ODBC over the network from Windows.

In the full production environment I moved the Python
application to AIX. The moves were straightforward--Python was platform
independent for my purposes.

Initially I used ODBC or the API to post the data to DB2, but
that turned out to be slow. To get the speed I needed, I just wrote
the converted data to a CSV flat file and passed the file to the
DB2 loader utilities. No matter how good your code is, you'll never
outperform the database utilities.

I've never used replication or linking. I know nothing about DB2 on
MVS. In general, my experience with DB2 on networks (admittedly Unix
and Windows boxes) tells me accessing DB2 on MVS over a network would
not be a problem. I know nothing about ADABAS.

Python will certainly do TELNET and screen scraping, but life is short.

Other than the overall success of the project (I've been told successful
data warehouse projects are rare) the major benefit of using Python was
the ability to try new concepts quickly. With python you have
enormous flexibility, as opposed to compiled languages (COBOL, C, etc)
or third party ETL utilities.

As an example, my application converted accounting data on
a nightly basis. With no advance warning, the Accounting department
converted to another package. The python code to extract and load
the data from the new system was written and in production in 2 days.
 
Buck Nuggets

At least for DB2, going directly to MVS from Python on Unix or Windows
shouldn't be a problem - but it would typically involve a separate
product, called "DB2 Connect". Shouldn't be cheap
or require any MVS components:
http://www-306.ibm.com/software/data/db2/db2connect/

No, you don't need to hit DB2 on AIX or Linux first: DB2 Connect should
give you odbc, jdbc, cli, etc protocols directly to mvs. You can go
through another db2 database, but that's probably extra work & complexity.

Steve said:
Other than the overall success of the project (I've been told successful
data warehouse projects are rare) the major benefit of using Python was
the ability to try new concepts quickly. With python you have
enormous flexibility, as opposed to compiled languages (COBOL, C, etc)
or third party ETL utilities.

Nice case study. I've been building ETL systems for twelve years and
am on my second python etl project right now. Python has proved
itself the best option - there's nothing like adaptability when you've
got a dozen system interfaces to maintain! And its quick learning
curve has meant that bringing others up to speed has been a snap.

Most of my communication with db2 is just over the command line (via
popen2.Popen3) which is the only way to issue commands such as load,
export, force application, list application, etc. However, quite a
few of my summaries are run this way as well (typically mass inserts)
and aside from the primitive error codes, it works fine. There's also
at least one db2 python package (PyDB2). Here's a link to the
package:
http://sourceforge.net/projects/pydb2/
and here's a link to a tutorial for it:
https://www6.software.ibm.com/reg/devworks/dw-db2pylnx-i?S_TACT=102B7W91&S_CMP=DB2DD
I'm not using it yet, though a coworker just installed and started
using a python db2 module - I assume that it is this one.

And as far as reading files written in COBOL, here are a few thoughts:
1. don't make python read all the COBOL data types, instead make the
COBOL program write out a plain ascii record. Writing to a
fixed-length ascii record is very simple (if a little tedious to parse
on the other side).
2. if you can't modify the COBOL output...then you could consider a
commercial (perhaps with a free trial license) product that already
provides COBOL 'copybook' interpretation. There are quite a few of
these, though the least expensive ones I'm aware of are SyncSort, Data
Junction, and perhaps Compuware's FileAid. Don't think any have a
regular license for less than $1500.
3. if you have to read non-character cobol files, then I'd try to
just keep the number of options down to a reasonable number: you may
only need to support a few formats - such as zoned & packed decimal
(comp-3) for instance. Variable length files, float, comp-4, isam,
etc aren't that common. Redefines are often used in conjunction with
record types, and this can be sometimes simplified by just splitting
the file into multiple separate files by record type. And all the
formatting in the picture clause can be easily handled in the program
that reads the files (implied decimal places, signs, etc are all very
simple).
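As an illustration of the last point, here is a sketch of decoding a zoned decimal field with an implied decimal point. It assumes EBCDIC data with the sign carried in the high half of the last byte; ASCII-based compilers may encode the sign differently. The function name and the `scale` argument are invented for the example.

```python
from decimal import Decimal

def unpack_zoned(raw, scale=0):
    """Decode an EBCDIC zoned-decimal field (sign in the last byte's zone).

    The low nibble of every byte is a digit; the high nibble of the final
    byte is 0xD for negative values and 0xC or 0xF otherwise. `scale` is
    the number of implied decimal places (a parameter made up here).
    """
    value = 0
    for byte in raw:
        digit = byte & 0x0F
        if digit > 9:
            raise ValueError("invalid zoned digit")
        value = value * 10 + digit
    if raw[-1] >> 4 == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)

# Five bytes as they would be stored for PIC S9(3)V99, negative value.
print(unpack_zoned(b"\xF1\xF2\xF3\xF4\xD5", scale=2))
```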

buck
 
