How to read files written with COBOL

Discussion in 'Python' started by Batista, Facundo, May 10, 2004.

  1. People:

    I'm trying to convert my father from using COBOL to Python, :)

    One difficulty we've run into is how to read, from Python, files written
    with COBOL.

    Do you know of a module that lets me do that?

    It would save us the work of writing a COBOL program that opens the COBOL
    file and writes a CSV one (easily readable from Python).

    Thank you all!

    Facundo Batista
    Desarrollo de Red

    (54 11) 5130-4643
    Cel: 15 5132 0132
    Batista, Facundo, May 10, 2004
    #1

  2. Batista, Facundo

    John Roth Guest

    "Batista, Facundo" <> wrote in message
    news:...
    > People:
    >
    > I'm trying to convert my father from using COBOL to Python, :)
    >
    > One difficulty we've run into is how to read, from Python, files written
    > with COBOL.
    >
    > Do you know of a module that lets me do that?
    >
    > It would save us the work of writing a COBOL program that opens the COBOL
    > file and writes a CSV one (easily readable from Python).


    What's the OS for the two languages? COBOL from mainframe
    to X86ish is very different from some flavor of Windows or Unix
    COBOL.

    Also, are we talking fixed or variable length records? And if
    variable, how are they structured?

    In either case, I think the struct module (under String Services)
    is what you're looking for.
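
    As a rough illustration of the struct suggestion, here is a minimal sketch for one fixed-length record. The 30-byte layout (item number, description, binary quantity) and the big-endian byte order are assumptions made up for the example, not anything taken from the original poster's files.

```python
import struct

# Hypothetical 30-byte record layout (an assumption for illustration):
#   6 bytes  item number, ASCII digits
#   20 bytes description, space padded
#   4 bytes  quantity, big-endian signed binary integer
RECORD = struct.Struct(">6s20si")


def parse_record(raw):
    """Unpack one fixed-length record into Python values."""
    item_no, description, quantity = RECORD.unpack(raw)
    return int(item_no), description.decode("ascii").rstrip(), quantity


sample = b"000042" + b"WIDGET".ljust(20) + struct.pack(">i", 1999)
print(parse_record(sample))  # (42, 'WIDGET', 1999)
```

    The ">" prefix turns off alignment padding, so the struct size matches the sum of the field lengths. Packed or zoned decimal fields are not something struct decodes by itself; they would come out as raw bytes and need a second step.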

    John Roth
    John Roth, May 10, 2004
    #2

  3. Batista, Facundo

    asdf sdf Guest

    Batista, Facundo wrote:
    > People:
    >
    > I'm trying to convert my father from using COBOL to Python, :)
    >
    > One difficulty we've run into is how to read, from Python, files written
    > with COBOL.
    >
    > Do you know of a module that lets me do that?
    >
    > It would save us the work of writing a COBOL program that opens the COBOL
    > file and writes a CSV one (easily readable from Python).

    i'm going to watch this thread with interest. a couple of weeks ago, i
    asked about python to legacy mvs particularly for DB2 and Adabas access.
    i got zero responses which suggested to me that no tools or modules
    are in wide use.

    i think you are undertaking a simpler problem generally. if all your
    records are text it should be fairly straightforward. if not, you'll
    need to figure out how to map COBOL data representations into python.

    i seem to remember COMP-3, COMP-5 and packed decimal formats, among
    others. what they mean, i don't know, but generally various floating
    and fixed point formats.

    you also need to handle REDEFINES which is used to produce a c-union
    sort of arrangement, where multiple formats can be used to access the
    same record.

    88-Levels are a similar problem.

    after Y2K, a lot of COBOL files contain some non-obvious date handling,
    which could involve bit manipulation.

    if you learn of any sorts of tools at all, please post them back here.
    python screen scrapers, python compatible database drivers, anything at all.

    interesting project idea: a COBOL to python _code_ converter. should
    be feasible, in light of COBOL's very limited syntax.

    ah, COBOL fun. all us old guys are reflecting on how glad we are we
    left it behind.

    it might be a good exercise for your dad, if he wants to retool himself,
    and he already knows all the data format stuff.
    asdf sdf, May 10, 2004
    #3
  4. Batista, Facundo

    John Roth Guest

    "asdf sdf" <> wrote in message
    news:VYSnc.46990$...
    > Batista, Facundo wrote:
    > > People:
    > >
    > > I'm trying to convert my father from using COBOL to Python, :)
    > >
    > > One difficulty we've run into is how to read, from Python, files written
    > > with COBOL.
    > >
    > > Do you know of a module that lets me do that?
    > >
    > > It would save us the work of writing a COBOL program that opens the COBOL
    > > file and writes a CSV one (easily readable from Python).

    > i'm going to watch this thread with interest. a couple of weeks ago, i
    > asked about python to legacy mvs particularly for DB2 and Adabas access.
    > i got zero responses which suggested to me that no tools or modules
    > are in wide use.


    I missed seeing it, somehow, but you're also right: I don't know
    of any tools either.

    > i think you are undertaking a simpler problem generally. if all your
    > records are text it should be fairly straightforward. if not, you'll
    > need to figure out how to map COBOL data representations into python.


    In other words, take the 01s under the FD and create an object
    that would expose all the converted data elements for the record?
    Could be a somewhat interesting project, and it shouldn't be all
    that hard since data descriptions are a fairly limited syntax.
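
    One way to picture such a record object is sketched below. It assumes an all-text record whose layout has been transcribed by hand from the 01 level; the field names, offsets and the make_record_class helper are hypothetical.

```python
from collections import namedtuple

# Hypothetical layout transcribed by hand from an 01-level description:
# (field name, offset, length, converter). Names and sizes are assumptions.
CUSTOMER_LAYOUT = [
    ("cust_id", 0, 6, int),
    ("name", 6, 20, str.strip),
    ("balance_cents", 26, 9, int),
]


def make_record_class(class_name, layout):
    """Build a namedtuple class with a from_line() parser for the layout."""
    base = namedtuple(class_name, [name for name, _, _, _ in layout])

    class Record(base):
        __slots__ = ()

        @classmethod
        def from_line(cls, line):
            # Slice each field out of the line and convert it.
            values = [conv(line[off:off + length])
                      for _, off, length, conv in layout]
            return cls(*values)

    Record.__name__ = class_name
    return Record


Customer = make_record_class("Customer", CUSTOMER_LAYOUT)
rec = Customer.from_line("000123" + "ACME LTD".ljust(20) + "000004550")
print(rec.name, rec.balance_cents)
```

    Each data element then reads as an attribute (rec.cust_id, rec.name), which is roughly the "expose all the converted data elements" idea.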

    > you also need to handle REDEFINES which is used to produce a c-union
    > sort of arrangement, where multiple formats can be used to access the
    > same record.


    Redefines is implicit - it's just multiple level 01s under the same FD.

    > 88-Levels are a similar problem.


    Those aren't an issue. 88s are basically an isXXX type function call. That's not
    how they're implemented, but that's the basic semantics.
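
    To make the isXXX comparison concrete, here is a small sketch of how an 88-level could be mirrored in Python. The ACCOUNT-STATUS field, its condition names and the Account class are invented for the example.

```python
# Hypothetical COBOL being mirrored (invented for this sketch):
#   05 ACCOUNT-STATUS        PIC X.
#      88 ACCOUNT-ACTIVE     VALUE 'A'.
#      88 ACCOUNT-CLOSED     VALUE 'C' 'X'.


class Account:
    """Record with one status field and two 88-level style tests."""

    def __init__(self, status):
        self.status = status

    @property
    def is_active(self):
        # Corresponds to the condition name ACCOUNT-ACTIVE.
        return self.status == "A"

    @property
    def is_closed(self):
        # An 88 may list several values; any of them makes it true.
        return self.status in ("C", "X")


acct = Account("X")
print(acct.is_active, acct.is_closed)  # False True
```

    The 88s occupy no bytes in the record, so they affect only how a field is interpreted, not how the file is read.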

    > after Y2K, a lot of COBOL files contain some non-obvious date handling,
    > which could involve bit manipulation.
    >
    > if you learn of any sorts of tools at all, please post them back here.
    > python screen scrapers, python compatible database drivers, anything at
    > all.
    >
    > interesting project idea: a COBOL to python _code_ converter. should
    > be feasible, in light of COBOL's very limited syntax.
    >
    > ah, COBOL fun. all us old guys are reflecting on how glad we are we
    > left it behind.


    Ain't that the truth!

    John Roth
    John Roth, May 10, 2004
    #4
  5. Batista, Facundo wrote:
    > People:
    >
    > I'm trying to convert my father from using COBOL to Python, :)
    >
    > One difficulty we've run into is how to read, from Python, files written
    > with COBOL.
    >
    > Do you know of a module that lets me do that?
    >
    > It would save us the work of writing a COBOL program that opens the COBOL
    > file and writes a CSV one (easily readable from Python).

    I wrote an ETL system in python for a client to convert from Microfocus
    COBOL to DB2. Here are some of the problems I saw:

    1) COBOL has a very rich set of datatypes defined by the PICTURE clause

    character
    unsigned integer
    zoned signed integer
    integer trailing sign separate
    integer leading sign separate
    packed signed decimal
    packed unsigned decimal
    floating point

    with the usual COBOL zoo of implied decimal points and scaling

    Not to mention COBOL allowing formatted numeric data to be
    used as source fields in arithmetic operations.

    In my application, each of these types was converted by a
    parameter-driven function.

    That is, I took the original COBOL 01 level definition and
    converted it to a list with definition parameters name, type,
    length, decimal point, etc. to make it easy for Python and
    to add some stuff to make DB2 happy (convert to title case. . .)

    I doubt if you can easily write a parser for the COBOL PICTURE
    clause and for most cases it would be a waste of time. I just
    converted the definition by using 'replacing all occurrences' in
    a text processor.

    I had the most problem with Microfocus unsigned decimal, as
    I'd never seen it before.

    2) Reading fixed and variable length records wasn't much of a problem

    Reading Microfocus keyed sequential data with embedded indexes
    took some bit-level coding.

    3) None of this would be remotely attractive to a COBOL programmer.
    Converting the data to CSV, however, might get his attention
    as it's pretty easy in Python and not much fun in COBOL.

    If you want to sell dad, talk about text and string processing
    in Python.
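
    The parameter-driven approach in point 1 might look something like the sketch below. It is not the code from that project: the layout tuple format (name, type, length, decimals), the type names and the sample record are assumptions, and only four of the listed types are covered. The packed decimal decoder relies on the usual COMP-3 convention of two digits per byte with the sign in the last half-byte (hex D meaning negative).

```python
from decimal import Decimal


def unpack_comp3(raw):
    """Decode packed decimal bytes to a signed int (sign nibble D = negative)."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # the last half-byte holds the sign
    value = int("".join(str(d) for d in nibbles))
    return -value if sign == 0x0D else value


def scale(value, decimals):
    """Apply the implied decimal point."""
    return Decimal(value).scaleb(-decimals)


def trailing_sign(raw, decimals):
    """Digits followed by a separate '+' or '-' character."""
    text = raw.decode("ascii")
    return scale(int(text[-1] + text[:-1]), decimals)


# One conversion function per field type, selected by name.
CONVERTERS = {
    "char": lambda raw, dec: raw.decode("ascii").rstrip(),
    "unsigned": lambda raw, dec: scale(int(raw.decode("ascii")), dec),
    "trailing_sign": trailing_sign,
    "packed": lambda raw, dec: scale(unpack_comp3(raw), dec),
}

# Hypothetical layout: (name, type, length in bytes, implied decimals).
LAYOUT = [
    ("name", "char", 10, 0),
    ("qty", "unsigned", 5, 0),
    ("delta", "trailing_sign", 6, 2),
    ("amount", "packed", 3, 2),
]


def convert_record(raw, layout):
    """Walk the layout left to right and convert each field."""
    out, pos = {}, 0
    for name, ftype, length, decimals in layout:
        out[name] = CONVERTERS[ftype](raw[pos:pos + length], decimals)
        pos += length
    return out


sample = b"WIDGET    " + b"00042" + b"00150-" + bytes([0x12, 0x34, 0x5D])
print(convert_record(sample, LAYOUT))
```

    Zoned signed fields are left out on purpose, since how the sign is folded into the last digit depends on the platform and compiler.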
    Steve Williams, May 11, 2004
    #5
  6. Batista, Facundo

    asdf sdf Guest

    Steve Williams wrote:

    > I wrote an ETL system in python for a client to convert from Microfocus
    > COBOL to DB2. Here are some of the problems I saw:
    >
    > 1) COBOL has a very rich set of datatypes defined by the PICTURE clause

    <...snipping various items...>

    > That is, I took the original COBOL 01 level definition and
    > converted it to a list with definition parameters name, type,
    > length, decimal point, etc. to make it easy for Python and
    > to add some stuff to make DB2 happy (convert to title case. . .)

    Steve,

    I've been looking for ideas on getting at DB2 and Adabas from Python.
    You might have some thoughts.

    Is it feasible to go directly to MVS/DB2/Adabas from Python on Unix
    or Win?

    Is it more realistic to hit DB2 on AIX or Linux and use some kind of DB2
    linking or replication to reach DB2/MVS?

    Other ideas? Maybe 3270 emulation with screen scraping? How about
    telnet 3270? (Hundreds of years ago, I could dial into a command line
    MVS environment.)

    I don't mean to hijack the thread. I think this is related and might be
    helpful to unfortunates who have to interoperate with legacy systems.
    asdf sdf, May 11, 2004
    #6
  7. asdf sdf wrote:
    > Steve Williams wrote:
    >
    >> I wrote an ETL system in python for a client to convert from
    >> Microfocus COBOL to DB2. Here are some of the problems I saw:
    >>
    >> 1) COBOL has a very rich set of datatypes defined by the PICTURE clause
    >
    > <...snipping various items...>
    >
    >> That is, I took the original COBOL 01 level definition and
    >> converted it to a list with definition parameters name, type,
    >> length, decimal point, etc. to make it easy for Python and
    >> to add some stuff to make DB2 happy (convert to title case. . .)
    >
    > Steve,
    >
    > I've been looking for ideas on getting at DB2 and Adabas from Python.
    > You might have some thoughts.
    >
    > Is it feasible to go directly to MVS/DB2/Adabas from Python on Unix
    > or Win?
    >
    > Is it more realistic to hit DB2 on AIX or Linux and use some kind of DB2
    > linking or replication to reach DB2/MVS?
    >
    > Other ideas? Maybe 3270 emulation with screen scraping? How about
    > telnet 3270? (Hundreds of years ago, I could dial into a command line
    > MVS environment.)
    >
    > I don't mean to hijack the thread. I think this is related and might be
    > helpful to unfortunates who have to interoperate with legacy systems.

    Well, the application processed a lot of data on a nightly basis. It
    used FTP to connect to the COBOL machine (an AIX box) and FTP callbacks
    to sequentially read the files and convert the data. There are two
    bugs in the Python FTP module that surface if the file size is larger
    than 2 gig, but they're easily fixed.
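
    A sketch of the FTP callback idea, assuming fixed-length records: ftplib delivers the file in arbitrary-sized blocks, so the callback has to buffer them and cut out whole records. The RecordSplitter class and the fetch_records arguments are hypothetical; only FTP, login and retrbinary are actual ftplib calls.

```python
from ftplib import FTP


class RecordSplitter:
    """Collect FTP data blocks and hand out complete fixed-length records."""

    def __init__(self, record_length, handler):
        self.record_length = record_length
        self.handler = handler
        self.buffer = b""

    def __call__(self, block):
        # Blocks rarely end on a record boundary, so keep the remainder.
        self.buffer += block
        while len(self.buffer) >= self.record_length:
            record = self.buffer[:self.record_length]
            self.buffer = self.buffer[self.record_length:]
            self.handler(record)


def fetch_records(host, user, password, remote_name, record_length, handler):
    """Stream a remote file through the splitter (hypothetical arguments)."""
    splitter = RecordSplitter(record_length, handler)
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrbinary("RETR " + remote_name, splitter)
    return splitter.buffer  # leftover bytes; empty if the file was well formed


# Offline demonstration of the splitter, no server needed.
records = []
splitter = RecordSplitter(4, records.append)
for block in (b"AAAABB", b"BBCC", b"CC"):
    splitter(block)
print(records)  # [b'AAAA', b'BBBB', b'CCCC']
```

    Binary mode matters here: a text-mode transfer could alter bytes inside packed or binary fields.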

    I developed this application on Windows, initially targeting a test DB2
    database on Windows and then moving the DB2 database to AIX and posting
    with ODBC over the network from Windows.

    In the full production environment I moved the Python
    application to AIX. The moves were straightforward--Python was platform
    independent for my purposes.

    Initially I used ODBC or the API to post the data to DB2, but
    that turned out to be slow. To get the speed I needed, I just wrote
    the converted data to a CSV flat file and passed the file to the
    DB2 loader utilities. No matter how good your code is, you'll never
    outperform the database utilities.
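
    Writing the converted rows out as CSV is the easy part. A minimal sketch with the standard csv module follows; the field names and rows are made up, and whatever delimiter or quoting the DB2 loader expects would need to be checked against its own documentation.

```python
import csv
import io


def write_csv(rows, fieldnames, stream):
    """Write dict rows in a fixed column order, without a header line."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    for row in rows:
        writer.writerow(row)


# Hypothetical converted records.
rows = [
    {"name": "WIDGET", "qty": 42, "amount": "-123.45"},
    {"name": "BOLT, 5mm", "qty": 7, "amount": "0.10"},
]

out = io.StringIO()
write_csv(rows, ["name", "qty", "amount"], out)
print(out.getvalue())
```

    The csv module quotes the second name because it contains a comma, which is the kind of detail that is tedious to get right by hand.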

    I've never used replication or linking. I know nothing about DB2 on
    MVS. In general, my experience with DB2 on networks (admittedly Unix
    and Windows boxes) tells me accessing DB2 on MVS over a network would
    not be a problem. I know nothing about ADABAS.

    Python will certainly do TELNET and screen scraping, but life is short.

    Other than the overall success of the project (I've been told successful
    data warehouse projects are rare) the major benefit of using Python was
    the ability to try new concepts quickly. With python you have
    enormous flexibility, as opposed to compiled languages (COBOL, C, etc)
    or third party ETL utilities.

    As an example, my application converted accounting data on
    a nightly basis. With no advance warning, the Accounting department
    converted to another package. The python code to extract and load
    the data from the new system was written and in production in 2 days.
    Steve Williams, May 12, 2004
    #7
  8. Batista, Facundo

    Buck Nuggets Guest

    Steve Williams <> wrote in message news:<nJhoc.186646$>...
    > asdf sdf wrote:
    > > Is it feasible to go directly to MVS/DB2/Adabas from Python on Unix
    > > or Win?


    At least for DB2 this shouldn't be a problem - but would typically
    involve a separate product - called "DB2 Connect". Shouldn't be cheap
    or require any MVS components:
    http://www-306.ibm.com/software/data/db2/db2connect/

    > > Is it more realistic to hit DB2 on AIX or Linux and use some kind of DB2
    > > linking or replication to reach DB2/MVS?


    No, DB2 Connect should give you odbc, jdbc, cli, etc protocols
    directly to mvs. You can go through another db2 database, but that's
    probably extra work & complexity.

    > Other than the overall success of the project (I've been told successful
    > data warehouse projects are rare) the major benefit of using Python was
    > the ability to try new concepts quickly. With python you have
    > enormous flexibility, as opposed to compiled languages (COBOL, C, etc)
    > or third party ETL utilities.


    Nice case study. I've been building ETL systems for twelve years and
    am on my second python etl project right now. Python has proved
    itself the best option - there's nothing like adaptability when you've
    got a dozen system interfaces to maintain! And its quick learning
    curve has meant that bringing others up to speed has been a snap.

    Most of my communication with db2 is just over the command line (via
    popen2.Popen3) which is the only way to issue commands such as load,
    export, force application, list application, etc. However, quite a
    few of my summaries are run this way as well (typically mass inserts)
    and aside from the primitive error codes, it works fine. There's also
    at least one db2 python package (PyDB2). Here's a link to the
    package:
    http://sourceforge.net/projects/pydb2/
    and here's a link to a tutorial for it:
    https://www6.software.ibm.com/reg/devworks/dw-db2pylnx-i?S_TACT=102B7W91&S_CMP=DB2DD
    I'm not using it yet, though a coworker just installed and started
    using a python db2 module - I assume that it is this one.

    And as far as reading files written in COBOL, here's a few thoughts:
    1. don't make python read all the COBOL data types, instead make the
    COBOL program write out a plain ascii record. Writing to a
    fixed-length ascii record is very simple (if a little tedious to parse
    on the other side).
    2. if you can't modify the COBOL output...then you could consider a
    commercial (perhaps with a free trial license) product that already
    provides COBOL 'copybook' interpretation. There are quite a few of
    these, though the least expensive ones I'm aware of are SyncSort, Data
    Junction, and perhaps Compuware's FileAid. Don't think any have a
    regular license for less than $1500.
    3. if you have to read non-character cobol files, then I'd try to
    just keep the number of options down to a reasonable number: you may
    only need to support a few formats - such as zoned & packed decimal
    (comp-3) for instance. Variable length files, float, comp-4, isam,
    etc aren't that common. Redefines are often used in conjunction with
    record types, and this can be sometimes simplified by just splitting
    the file into multiple separate files by record type. And all the
    formatting in the picture clause can be easily handled in the program
    that reads the files (implied decimal places, signs, etc are all very
    simple).
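
    The "split by record type" suggestion in point 3 can be sketched as below, assuming each record carries a one-byte type code at a known offset. The sample records and the H/D/T codes are invented; in practice each group would be written to its own file and parsed with its own layout.

```python
from collections import defaultdict


def split_by_record_type(records, type_offset=0, type_length=1):
    """Group raw records by their type code so each group has one layout."""
    groups = defaultdict(list)
    for record in records:
        code = record[type_offset:type_offset + type_length]
        groups[code].append(record)
    return dict(groups)


# Hypothetical file with header, detail and trailer records.
records = [
    b"H2004-05-14",
    b"D000042WIDGET",
    b"D000043BOLT",
    b"T000002",
]

groups = split_by_record_type(records)
for code in sorted(groups):
    print(code, len(groups[code]))
```

    Once the file is split this way, the REDEFINES problem mostly disappears, because every output holds records of a single shape.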

    buck
    Buck Nuggets, May 14, 2004
    #8