Working with fixed format text db's

Discussion in 'Python' started by Neil Cerutti, Jun 8, 2007.

  1. Neil Cerutti

    Neil Cerutti Guest

    Many of the file formats I have to work with are so-called
    fixed-format records, where every line in the file is a record,
    and every field in a record takes up a specific amount of space.

    For example, one of my older Python programs contains the
    following to create a fixed-format text record for a batch of new
    students:

    new = file("new.dat", "w")
    if not new:
        print "Error. Could not open file new.dat for writing."
        raw_input("Press Return To Exit.")
        sys.exit(1)

    for s in freshmen:
        new.write(s.ssn.ljust(9))
        new.write(s.id.ljust(10))
        new.write(s.last[:16].ljust(16))
        new.write(s.first[:11].ljust(11))
        new.write(' '.ljust(10)) # Phone Number
        new.write(' '.ljust(1254)) # Empty 'filler' space.
        new.write('2813  ')
        new.write(s.major.ljust(5))

    # Etc...

    Luckily, the output format has not changed yet, so issues with
    maintaining the above haven't arisen.

    However, I'd like something better.

    Is there already a good module for working with fixed format
    records available? I couldn't find one.

    If not, please suggest how I might improve the above code.

    --
    Neil Cerutti
    When "yearn" was sung, the performers sounded like they were in a state of
    yearning. --Music Lit Essay
     
    Neil Cerutti, Jun 8, 2007
    #1

  2. Neil Cerutti <> wrote:

    > Luckily, the output format has not changed yet, so issues with
    > maintaining the above haven't arisen.


    The problem surely is that when you want to change the format you have to do
    so in all files (and what about the backups then?) and all programs
    simultaneously.

    Maintaining the code is the least of your problems, I'd say.

    You could change the data layout so that, e.g., each field was terminated by a
    marker character, then read/write delimited values. But unless you also
    review all the other parts of your programs, you need to be sure that you
    don't have any other code anywhere that implicitly relies on a particular
    field being a known fixed length.

    >
    > However, I'd like something better.


    What precisely do you want to achieve?


    --
    Jeremy C B Nicoll - my opinions are my own.
     
    Jeremy C B Nicoll, Jun 8, 2007
    #2

  3. Neil Cerutti

    Neil Cerutti Guest

    On 2007-06-08, Jeremy C B Nicoll <> wrote:
    > Neil Cerutti <> wrote:
    >> Luckily, the output format has not changed yet, so issues with
    >> maintaining the above haven't arisen.

    >
    > The problem surely is that when you want to change the format
    > you have to do so in all files (and what about the backups
    > then?) and all programs simultaneously.


    I don't have control of the format, unfortunately. It's an import
    file format for a commercial database application.

    > Maintaining the code is the least of your problems, I'd
    > say.
    >
    > You could change the data layout so that eg each field was
    > terminated by a marker character, then read/write delimited
    > values. But unless you also review all the other parts of your
    > programs, you need to be sure that you don't have any other
    > code anywhere that implicitly relies on a particular field
    > being a known fixed length.
    >
    >> However, I'd like something better.

    >
    > What precisely do you want to achieve?


    I was hoping for a module that provides a way for me to specify a
    fixed file format, along with some sort of interface for writing
    and reading files that are in said format.
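
    One possible shape for such an interface, as a minimal sketch and not an existing module: the layout is a list of (name, width) pairs and a helper slices each line accordingly. The names `LAYOUT` and `parse_record` are made up for illustration, and the widths are a shortened stand-in for a real spec.

```python
# Hypothetical layout: (field name, width in characters), in record order.
LAYOUT = [("ssn", 9), ("id", 10), ("last", 16), ("first", 11)]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of right-stripped field values."""
    record = {}
    pos = 0
    for name, width in layout:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record


# Build a sample line by hand, then read it back through the layout.
line = "123456789" + "A000000001" + "Doe".ljust(16) + "Jane".ljust(11)
print(parse_record(line))
```

    Because the widths live in one table, a change to the spec would only touch `LAYOUT`, not the slicing code.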

    It is not actually *hard* to do this with ad-hoc code, but then
    the program is indecipherable without a hardcopy of the spec in
    hand. And also, as you say, if the spec ever does change, the
    hand-written batch of ljust, rjust and slice will be somewhat of
    a pain to reconfigure.

    But the biggest weakness, to me, is that the specification is not in
    the code, or read and used by the code, and I think it should be.

    If nothing exists already I guess I'll roll my own. But I'd like
    to be lazier, and virtually all published modules are better than
    what I'll write for myself. ;)

    The underlying problem, of course, is the archaic flat-file
    format with fixed-width data fields. Even the Department of
    Education has moved on to XML for most of its data files, which
    are much simpler for me to parse.

    --
    Neil Cerutti
     
    Neil Cerutti, Jun 8, 2007
    #3
  4. In <>, Neil Cerutti wrote:

    > new = file("new.dat", "w")
    > if not new:
    >     print "Error. Could not open file new.dat for writing."
    >     raw_input("Press Return To Exit.")
    >     sys.exit(1)


    Hey, Python is not C. File objects should *always* be "true". An error
    is handled via exceptions.
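
    A minimal sketch of the exception-based version, written for current Python 3 (the thread's own code is Python 2). The helper name `open_for_writing` is invented for this example.

```python
import sys


def open_for_writing(path):
    """Open path for writing; on failure, report the error and exit."""
    try:
        return open(path, "w")
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        print("Error. Could not open file %s for writing: %s" % (path, exc))
        sys.exit(1)


# Typical use (commented out so the sketch has no side effects):
# new = open_for_writing("new.dat")
```

    A failed open never returns a "false" file object; it raises, so the `if not new:` test can simply go away.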

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Jun 8, 2007
    #4
  5. Neil Cerutti

    Neil Cerutti Guest

    On 2007-06-08, Marc 'BlackJack' Rintsch <> wrote:
    > In <>, Neil Cerutti wrote:
    >
    >> new = file("new.dat", "w")
    >> if not new:
    >>     print "Error. Could not open file new.dat for writing."
    >>     raw_input("Press Return To Exit.")
    >>     sys.exit(1)

    >
    > Hey, Python is not C. File objects should *always* be "true".
    > An error is handled via exceptions.


    Thanks. Update in progress.

    --
    Neil Cerutti
    The doctors X-rayed my head and found nothing. --Dizzy Dean
     
    Neil Cerutti, Jun 8, 2007
    #5
  6. Mark Carter

    Mark Carter Guest

    Neil Cerutti wrote:

    > The underlying problem, of course, is the archaic flat-file
    > format with fixed-width data fields. Even the Department of
    > Education has moved on to XML for most of its data files,


    :(

    I'm writing a small app, and was wondering the best way to store data.
    Currently the fields are separated by spaces. I was toying with the idea
    of using sqlite, yaml or json, but I think I've settled on CSV. Dull,
    but it's easy to parse for humans and computers.
     
    Mark Carter, Jun 8, 2007
    #6
  7. Neil Cerutti <> wrote:

    > On 2007-06-08, Jeremy C B Nicoll <> wrote:
    > > Neil Cerutti <> wrote:
    > >> Luckily, the output format has not changed yet, so issues with
    > >> maintaining the above haven't arisen.

    > >
    > > The problem surely is that when you want to change the format
    > > you have to do so in all files (and what about the backups
    > > then?) and all programs simultaneously.

    >
    > I don't have control of the format, unfortunately. It's an import
    > file format for a commercial database application.


    You're saying your program merely has to read data files created by that
    database app? It's not that you have a whole suite of programs that create
    and read these files, nor that you have years worth of old files that would
    need their format converted if the programs were changed?


    > It is not actually *hard* to do this with ad-hoc code, but then
    > the program is indecipherable without a hardcopy of the spec in
    > hand. And also, as you say, if the spec ever does change, the
    > hand-written batch of ljust, rjust and slice will be somewhat of
    > a pain to reconfigure.


    You could presumably define a list (of some sort, might be the wrong
    terminology) that defines the 'name', type, length, justification and
    padding of each field, and then make the explicit code you showed loop
    through that list and do what's needed field by field.
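
    Such a list might look like the following. This is only a sketch under assumed names (`FIELDS`, `format_record`), in current Python 3, covering the first few fields of the record shown earlier in the thread.

```python
# Hypothetical field table: (name, width, justification, pad character).
FIELDS = [
    ("ssn",   9,  "left", " "),
    ("id",    10, "left", " "),
    ("last",  16, "left", " "),
    ("first", 11, "left", " "),
    ("phone", 10, "left", " "),
]


def format_record(values, fields=FIELDS):
    """Build one fixed-width line from a dict of field values."""
    parts = []
    for name, width, justify, pad in fields:
        text = str(values.get(name, ""))[:width]  # truncate overlong values
        if justify == "right":
            parts.append(text.rjust(width, pad))
        else:
            parts.append(text.ljust(width, pad))
    return "".join(parts)


line = format_record({"ssn": "123456789", "id": "42", "last": "Doe", "first": "Jane"})
print(repr(line), len(line))
```

    Missing fields (here, the phone number) come out as padding only, which matches how the original code writes blank filler.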

    There's a risk that abstracting the definitions will make the code less
    clear to anyone else; at least it's clear what the current stuff does.

    > But the biggest weakness, to me, is that the specification is not in
    > the code, or read and used by the code, and I think it should be.


    It'd be better if you could read the data layout spec from some file
    produced by the database system. No chance perhaps of having the dat files
    include some sort of dummy first record that contains the necessary info in
    a form that you could interpret?


    --
    Jeremy C B Nicoll - my opinions are my own.
     
    Jeremy C B Nicoll, Jun 8, 2007
    #7
  8. Ben Finney

    Ben Finney Guest

    Neil Cerutti <> writes:

    > I was hoping for a module that provides a way for me to specify a
    > fixed file format, along with some sort of interface for writing and
    > reading files that are in said format.


    Isn't that done by the 'struct' module
    <URL:http://www.python.org/doc/lib/module-struct>?

    >>> import struct
    >>> records = [
    ...     "Foo     13 Bar     ",
    ...     "Spam    23 Eggs    ",
    ...     "Guido   666Robot   ",
    ... ]
    >>> record_format = "8s3s8s"
    >>> for record in [struct.unpack(record_format, r) for r in records]:
    ...     print record
    ...
    ('Foo     ', '13 ', 'Bar     ')
    ('Spam    ', '23 ', 'Eggs    ')
    ('Guido   ', '666', 'Robot   ')

    --
    \ "Buy not what you want, but what you need; what you do not need |
    `\ is expensive at a penny." -- Cato, 234-149 BC, Relique |
    _o__) |
    Ben Finney
     
    Ben Finney, Jun 9, 2007
    #8
  9. John Machin

    John Machin Guest

    On Jun 9, 7:55 am, Jeremy C B Nicoll <> wrote:
    > Neil Cerutti <> wrote:
    > > On 2007-06-08, Jeremy C B Nicoll <> wrote:
    > > > Neil Cerutti <> wrote:
    > > >> Luckily, the output format has not changed yet, so issues with
    > > >> maintaining the above haven't arisen.

    >
    > > > The problem surely is that when you want to change the format
    > > > you have to do so in all files (and what about the backups
    > > > then?) and all programs simultaneously.

    >
    > > I don't have control of the format, unfortunately. It's an import
    > > file format for a commercial database application.

    >
    > You're saying your program merely has to read data files created by that
    > database app? It's not that you have a whole suite of programs that create
    > and read these files, nor that you have years worth of old files that would
    > need their format converted if the programs were changed?
    >
    > > It is not actually *hard* to do this with ad-hoc code, but then
    > > the program is indecipherable without a hardcopy of the spec in
    > > hand. And also, as you say, if the spec ever does change, the
    > > hand-written batch of ljust, rjust and slice will be somewhat of
    > > a pain to reconfigure.

    >
    > You could presumably define a list (of some sort, might be the wrong
    > terminology) that defines the 'name', type, length, justification and
    > padding of each field, and then make the explicit code you showed loop
    > through that list and do what's needed field by field.
    >
    > There's a risk that abstracting the definitions will make the code less
    > clear to anyone else; at least it's clear what the current stuff does.
    >
    > > But the biggest weakness, to me, is that the specification is not in
    > > the code, or read and used by the code, and I think it should be.

    >
    > It'd be better if you could read the data layout spec from some file
    > produced by the database system. No chance perhaps of having the dat files
    > include some sort of dummy first record that contains the necessary info in
    > a form that you could interpret?


    The OP is *WRITING* not reading.
     
    John Machin, Jun 9, 2007
    #9
  10. Neil Cerutti wrote:
    > The underlying problem, of course, is the archaic flat-file
    > format with fixed-width data fields. Even the Department of
    > Education has moved on to XML for most of its data files, which
    > are much simpler for me to parse.


    XML easier to parse than fixed position file. Wow!

    Very likely this file is created by a COBOL program, because this is
    what COBOL loves.

    01  my-record.
        05  ssn         pic 9(9).
        05  id          pic 9(10).
        05  last-name   pic x(16).
        05  first-name  pic x(11).
        05  phone-nbr   pic 9(10).
        05  filler      pic x(1254).
        05  filler      pic x(6) value '2813'.
        05  major       pic x(5).

    write my-record

    Haha. I'm just amused that new languages make simpler some things that
    were hard in older languages, but in turn make more difficult things
    that were simple!

    Frank
    COBOL expert/Python newbie
     
    Frank Swarbrick, Jun 9, 2007
    #10
  11. Neil Cerutti

    Guest

    On Jun 8, 6:18 pm, Ben Finney <>
    wrote:
    > Neil Cerutti <> writes:
    > > I was hoping for a module that provides a way for me to specify a
    > > fixed file format, along with some sort of interface for writing and
    > > reading files that are in said format.

    >
    > Isn't that done by the 'struct' module
    > <URL:http://www.python.org/doc/lib/module-struct>?
    >
    > >>> import struct
    > >>> records = [
    > ...     "Foo     13 Bar     ",
    > ...     "Spam    23 Eggs    ",
    > ...     "Guido   666Robot   ",
    > ... ]
    > >>> record_format = "8s3s8s"
    > >>> for record in [struct.unpack(record_format, r) for r in records]:
    > ...     print record
    > ...
    > ('Foo     ', '13 ', 'Bar     ')
    > ('Spam    ', '23 ', 'Eggs    ')
    > ('Guido   ', '666', 'Robot   ')


    But when you pack a struct, the padding is null bytes,
    not spaces.
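
    A short illustration of the difference, in current Python 3 where `struct` works on bytes. The second half shows one possible workaround (padding each value with spaces before packing); it is a sketch, not a recommendation from the thread.

```python
import struct

# Values shorter than the field width are padded with null bytes by pack().
packed = struct.pack("8s3s8s", b"Foo", b"13", b"Bar")
print(packed)

# Possible workaround: space-pad every value to its width before packing.
widths = (8, 3, 8)
values = (b"Foo", b"13", b"Bar")
padded = struct.pack("8s3s8s", *(v.ljust(w) for v, w in zip(values, widths)))
print(padded)
```

    With the manual `ljust` in place, `struct` contributes little for writing; it is mainly convenient for splitting records on the way in.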


    >
    > --
    > \ "Buy not what you want, but what you need; what you do not need |
    > `\ is expensive at a penny." -- Cato, 234-149 BC, Relique |
    > _o__) |
    > Ben Finney
     
    , Jun 9, 2007
    #11
  12. On Jun 8, 5:50 pm, Neil Cerutti <> wrote:
    > Many of the file formats I have to work with are so-called
    > fixed-format records, where every line in the file is a record,
    > and every field in a record takes up a specific amount of space.
    >
    > For example, one of my older Python programs contains the
    > following to create a fixed-format text record for a batch of new
    > students:
    >
    > new = file("new.dat", "w")
    > if not new:
    >     print "Error. Could not open file new.dat for writing."
    >     raw_input("Press Return To Exit.")
    >     sys.exit(1)
    >
    > for s in freshmen:
    >     new.write(s.ssn.ljust(9))
    >     new.write(s.id.ljust(10))
    >     new.write(s.last[:16].ljust(16))
    >     new.write(s.first[:11].ljust(11))
    >     new.write(' '.ljust(10)) # Phone Number
    >     new.write(' '.ljust(1254)) # Empty 'filler' space.
    >     new.write('2813  ')
    >     new.write(s.major.ljust(5))
    >


    I have to do this occasionally, and also find it cumbersome.

    I toyed with the idea of posting a feature request for a new 'fixed
    length' string formatting operator, with optional parameters for left/
    right-justified and space/zero-filled.

    We already have '%-12s' to space fill for a length of 12, but it is
    not truly fixed-length, as if the value has a length greater than 12
    you need it to be truncated, and this construction will not do that.

    Assume we have a new flag '!n', which defaults to left-justified and
    space-filled, but allows an optional 'r' and '0' to override the
    defaults.

    Then the above example could be written as

    format = '%!9s%!10s%!16s%!11s%!10s%!1254s%!6s%!5s'
    for s in freshmen:
        new.write(format %
                  (s.ssn, s.id, s.last, s.first,
                   ' ', ' ', '2813', s.major))

    I never felt strongly enough about it to propose it, but I thought I
    would mention it.

    Frank Millman
     
    Frank Millman, Jun 9, 2007
    #12
  13. John Machin

    John Machin Guest

    On Jun 9, 5:48 am, Mark Carter <> wrote:
    > Neil Cerutti wrote:
    > > The underlying problem, of course, is the archaic flat-file
    > > format with fixed-width data fields. Even the Department of
    > > Education has moved on to XML for most of its data files,

    >
    > :(
    >
    > I'm writing a small app, and was wondering the best way to store data.
    > Currently the fields are separated by spaces. I was toying with the idea
    > of using sqlite, yaml or json, but I think I've settled on CSV. Dull,
    > but it's easy to parse for humans and computers.


    Yup, humans find that parsing stuff like the following is quite easy:

    "Jack ""The Ripper"" Jones","""Eltsac Ruo"", 123 Smith St",,Paris TX 12345
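
    For the computer side of that claim, the standard `csv` module does untangle the doubled quotes. A small sketch in current Python 3, assuming the trailing 12345 is part of the last field:

```python
import csv
import io

line = '"Jack ""The Ripper"" Jones","""Eltsac Ruo"", 123 Smith St",,Paris TX 12345\n'

# csv.reader accepts any iterable of lines; StringIO stands in for a file.
row = next(csv.reader(io.StringIO(line)))
for field in row:
    print(repr(field))
```

    The reader yields four fields, the third of them empty.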

    Cheers,
    John
     
    John Machin, Jun 9, 2007
    #13
  14. Lloyd Zusman

    Lloyd Zusman Guest

    Frank Millman <> writes:

    > On Jun 8, 5:50 pm, Neil Cerutti <> wrote:
    >> Many of the file formats I have to work with are so-called
    >> fixed-format records, where every line in the file is a record,
    >> and every field in a record takes up a specific amount of space.
    >>
    >> [ ... ]

    >
    > We already have '%-12s' to space fill for a length of 12, but it is
    > not truly fixed-length, as if the value has a length greater than 12
    > you need it to be truncated, and this construction will not do that.


    In this case, we can use '%-12.12s'.
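
    The precision part (`.12`) truncates and the width part (`-12`) pads, so the result is always exactly 12 characters. A minimal sketch in current Python 3; building the whole record format from a list of widths is an illustration added here, using the widths from the format string quoted earlier in the thread.

```python
# Width pads, precision truncates: together they give a true fixed-length field.
short = "%-12.12s" % "abc"
long_ = "%-12.12s" % "abcdefghijklmnopqrstuvwxyz"
print(repr(short), repr(long_))

# Build a full record format from the field widths.
widths = [9, 10, 16, 11, 10, 1254, 6, 5]
record_format = "".join("%%-%d.%ds" % (w, w) for w in widths)

values = ("123456789", "42", "A-very-long-last-name-indeed", "Jane",
          "", "", "2813", "MATH")
line = record_format % values
print(len(line))
```

    The sample values are invented; any string longer than its field is silently cut, which may or may not be what a given import format wants.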

    --
    Lloyd Zusman

    God bless you.
     
    Lloyd Zusman, Jun 9, 2007
    #14