newbie file/DB processing

len · May 18, 2005

I am in the process of learning Python. I have bought Learning Python
by Mark Lutz, printed a copy of Dive into Python and various other
books, and looked at several tutorials. I have started a stupid little
project in Python and things are proceeding well. I am an old-time
COBOL programmer from the IBM 360/370 era, and I believe the ingrained
idea of file processing using file definitions (FDs) is causing me
problems, because I think Python requires a different way of looking at
data files and I haven't really gotten my brain around it yet. I would
like to create a small sequential file, at first to store a
group id, name, amount, and date, which I can add to, delete from, and
update. Could someone point me to some code that would show me how this
is done in Python? Eventually, I intend to expand my little program to
process this file as a flat comma-delimited file, then move it to some
type of indexed file, and finally to some RDBMS. My little program
started out at about 9 lines of code and is now at about 100, with five or
six functions which I will eventually change to classes (I need to
learn OOP too, but one step at a time).

So any recommendations of code, online tutorials, or books that might
address this file processing/database question would be appreciated.

Thanks
Len

Dennis Lee Bieber · May 19, 2005

len said:
I would like to create a small sequential file, at first to store a
group id, name, amount, and date, which I can add to, delete from, and
update. Could someone point me to some code that would show me how this
is done in Python? Eventually, I intend to expand my little program to
process this file as a flat comma-delimited file, then move it to some
type of indexed file, and finally to some RDBMS. My little program
started out at about 9 lines of code and is now at about 100, with five or
six functions which I will eventually change to classes (I need to
learn OOP too, but one step at a time).

So any recommendations of code, online tutorials, or books that might
address this file processing/database question would be appreciated.

I suspect your biggest, uhm, complication... will be that once
you get away from the era of COBOL, FORTRAN (and even basic Ada), you
find that files are NOT record oriented, but are streams (byte/character
sequences).

If you intend to perform "in-place" updates to a file, you are
going to have to write routines to make sure each of your fields is
padded out to some maximum length. With fixed-length fields, you can
work with logical records -- that is, you can seek to (rec# -
1) * rec_len to get to the start of a record, and then read rec_len
bytes. I'd recommend that you don't try to use line-ends in this
situation, as some OSs convert the single line-end into two bytes, which
would affect your rec_len.
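
A minimal sketch of the fixed-length idea described above, in current Python. The field widths, the file name, and the helper names (pack_record, read_record and so on) are made up for illustration; only the (rec# - 1) * rec_len seek comes from the text. The file is opened in binary mode so no line-end translation can change the record length.

```python
import os
import tempfile

# Hypothetical layout: id (6) + name (20) + amount (10) + date (10) = 46 bytes
WIDTHS = (6, 20, 10, 10)
REC_LEN = sum(WIDTHS)

def pack_record(fields):
    """Pad each field with spaces to its fixed width and encode to bytes."""
    parts = []
    for value, width in zip(fields, WIDTHS):
        text = str(value)
        if len(text) > width:
            raise ValueError("field too long: %r" % text)
        parts.append(text.ljust(width))
    return "".join(parts).encode("ascii")

def unpack_record(raw):
    """Slice a fixed-length record back into stripped string fields."""
    text = raw.decode("ascii")
    fields, pos = [], 0
    for width in WIDTHS:
        fields.append(text[pos:pos + width].rstrip())
        pos += width
    return fields

def read_record(f, rec_no):
    """Read record number rec_no (1-based)."""
    f.seek((rec_no - 1) * REC_LEN)
    return unpack_record(f.read(REC_LEN))

def write_record(f, rec_no, fields):
    """Overwrite record number rec_no (1-based) in place."""
    f.seek((rec_no - 1) * REC_LEN)
    f.write(pack_record(fields))

path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Create a file holding two records
with open(path, "wb") as f:
    f.write(pack_record(["G1", "Alice", "10.50", "2005-05-18"]))
    f.write(pack_record(["G2", "Bob", "20.00", "2005-05-19"]))

# Update record 2 in place, then read both records back
with open(path, "r+b") as f:
    write_record(f, 2, ["G2", "Bob", "25.00", "2005-05-19"])
    second = read_record(f, 2)
    first = read_record(f, 1)
```

Because every record has the same length, the update touches only the 46 bytes of record 2 and the file size does not change.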

Any other format -- CSV, new-line separated, etc. will require
you to either: 1) read the entire contents into memory, modify, write
entire contents, or 2) use two files, one for "old" and one for "new",
and process each record of the "old" generating records in "new" -- then
delete "old" and rename "new".
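
Option 2 could look roughly like the sketch below. It assumes space-separated lines of id, name, amount, date; the function name update_amount and the ".new" suffix are invented for the example. os.replace does the "delete old, rename new" step in a single call.

```python
import os
import tempfile

def update_amount(path, target_id, new_amount):
    """Copy "old" to "new" line by line, changing one record, then swap."""
    new_path = path + ".new"
    with open(path, "r") as old, open(new_path, "w") as new:
        for line in old:
            fields = line.split()
            if fields and fields[0] == target_id:
                fields[2] = new_amount
                line = " ".join(fields) + "\n"
            new.write(line)
    # Replace the old file with the new one
    os.replace(new_path, path)

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("G1 Alice 10.50 2005-05-18\n")
    f.write("G2 Bob 20.00 2005-05-19\n")

update_amount(path, "G2", "25.00")

with open(path) as f:
    lines = f.read().splitlines()
```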

Not sure what you consider an indexed file -- closest in Python
might be a bsddb (or other dbm) file. These only, as I recall, support a
single key and a variable-length data string. Which reminds me -- if you
intend to store /binary/ data, you will need either something like
pickle/shelve or the struct module; stream files are just a sequence of
bytes -- you have to supply the meaning.

Unless you really intend the intervening methods as a learning
tool, it might be faster just to go directly to the RDBMS... Plenty of
them available...

Mike Meyer · May 19, 2005

len said:
I am in the process of learning Python. I have bought Learning Python
by Mark Lutz, printed a copy of Dive into Python and various other
books, and looked at several tutorials. I have started a stupid little
project in Python and things are proceeding well. I am an old-time
COBOL programmer from the IBM 360/370 era, and I believe the ingrained
idea of file processing using file definitions (FDs) is causing me
problems, because I think Python requires a different way of looking at
data files and I haven't really gotten my brain around it yet. I would
like to create a small sequential file, at first to store a
group id, name, amount, and date, which I can add to, delete from, and
update. Could someone point me to some code that would show me how this
is done in Python? Eventually, I intend to expand my little program to
process this file as a flat comma-delimited file, then move it to some
type of indexed file, and finally to some RDBMS. My little program
started out at about 9 lines of code and is now at about 100, with five or
six functions which I will eventually change to classes (I need to
learn OOP too, but one step at a time).

What you're looking for isn't so much the Python way of doing things;
it's the Unix way of doing things. The OS doesn't present the file as
a sequence of records; files are presented as a sequence of bytes. Any
structure beyond that is provided by the application - possibly via a
library. This is a sufficiently powerful way of looking at files that
every modern OS I'm familiar with uses this view of files.

You might want to look at <URL: http://www.faqs.org/docs/artu/ >. It's
not really an answer to your question, but looks at Unix programming
in general. It uses fetchmail as an example application, including
examining the configuration editor written in Python.

A classic Unix approach to small databases is to use text files. When
you need to update the file, you just rewrite the whole thing. This
works well on Unix, because it comes with a multitude of tools for
processing text files. Such an approach is simple and easy to
implement, but not very efficient for large files. A classic example
is a simple phone book application: you have a simple tool for
updating the phone book, and use the "grep" command for searching
it. Works like a charm for small files, and allows for some amazingly
sophisticated queries.

To provide some (simple) code, assume your file is a list of lines,
with id, name, amount, date on each line, separated by spaces. Loading
this into a list in memory is trivial:

    datafile = open("file", "r")
    datalist = []
    for line in datafile:
        # split() breaks the line on whitespace into a list of strings
        datalist.append(line.split())
    datafile.close()

At this point, datalist is a list of lists. datalist[0] is a list of
[id, name, amount, date]. You could (for example) sum all the amounts
like so:

    total = 0
    for datum in datalist:
        # the fields are strings, so convert the amount before adding
        total += float(datum[2])

There are more concise ways to write this, but this is simple and
obvious.
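
One of the more concise forms, as a sketch: it assumes the amounts are strings that float() can parse, and the sample datalist here is made up.

```python
# Sample data in the same shape as datalist: [id, name, amount, date]
datalist = [
    ["G1", "Alice", "10.50", "2005-05-18"],
    ["G2", "Bob", "20.00", "2005-05-19"],
]

# sum() over a generator expression replaces the explicit loop
total = sum(float(datum[2]) for datum in datalist)
```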

Writing the list back out is also trivial:

datafile = open("file", "w")
datafile.writelines(" ".join(x) + "\n" for x in datalist)
datafile.close()

Note that that uses a 2.4 feature. The more portable - and obvious -
way to write this is:

    datafile = open("file", "w")
    for datum in datalist:
        datafile.write(" ".join(datum) + "\n")
    datafile.close()

For comma-delimited files, there's the csv module. It loads files
formatted as Comma Separated Values (a common interchange format for
spreadsheets) into memory, and writes them back out again. This is a
slightly more structured version of the simple text file approach. It
may be just what you're looking for.
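
A small round trip through the csv module, as a sketch in current Python. The file name and sample rows are invented; note that a name containing a comma survives, because the module quotes it on the way out.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.csv")
rows = [
    ["G1", "Smith, Alice", "10.50", "2005-05-18"],
    ["G2", "Bob", "20.00", "2005-05-19"],
]

# newline="" lets the csv module manage line endings itself
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Each row comes back as a list of strings
with open(path, newline="") as f:
    loaded = list(csv.reader(f))
```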

If you want to store string objects selectable by a single string key,
the various Unix db libraries are just what the doctor ordered. The
underlying C libraries allow arbitrary memory chunks as keys/objects,
but the Python libraries use Python strings, and dbm files look like
dictionaries. The shelve module is built on top of these, allowing you
to store arbitrary Python objects instead of just strings.
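
As a rough illustration of the dictionary-like interface, here is a shelve sketch in current Python. The file name and the record layout (a dict per group id) are assumptions for the example.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "groups")

# Add records: keys are strings, values can be any picklable object
with shelve.open(path) as db:
    db["G1"] = {"name": "Alice", "amount": 10.50, "date": "2005-05-18"}
    db["G2"] = {"name": "Bob", "amount": 20.00, "date": "2005-05-19"}

# Reopen, update one record, delete another
with shelve.open(path) as db:
    record = db["G2"]
    record["amount"] = 25.00
    db["G2"] = record      # reassign so the change is written back
    del db["G1"]
    keys = sorted(db.keys())
    amount = db["G2"]["amount"]
```

The reassignment matters: by default, changing the dict you got back from the shelf does not change what is stored until you store it again.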

Finally, for RDBMS, you almost always get SQL these days. The options
run from small embedded databases built in python to network
connections to full-blown SQL servers. I'd put that decision off until
you really need a database.

<mike

Magnus Lycka · May 19, 2005

len said:
I am an old-time
COBOL programmer from the IBM 360/370 era, and I believe the ingrained
idea of file processing using file definitions (FDs) is causing me
problems, because I think Python requires a different way of looking at
data files and I haven't really gotten my brain around it yet.

Yup, Python uses the same world view as C, C++, Unix etc.

I would
like to create a small sequential file, at first to store a
group id, name, amount, and date, which I can add to, delete from, and
update. Could someone point me to some code that would show me how this
is done in Python? Eventually, I intend to expand my little program to
process this file as a flat comma-delimited file, then move it to some
type of indexed file, and finally to some RDBMS. My little program
started out at about 9 lines of code and is now at about 100, with five or
six functions which I will eventually change to classes (I need to
learn OOP too, but one step at a time).

I think it's much easier to go directly to SQL without those
diversions. In a way, an SQL database maps better to your idea
of files / records, and it just takes much less code and effort
to use SQL than to twist "normal" (for me) files into behaving
like their mainframe counterparts.

I'd suggest that you download pysqlite. Then you have a small
embedded SQL database in your python program and don't need to
bother with a server. See http://initd.org/tracker/pysqlite
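
A sketch of what that looks like, using the sqlite3 module that ships in the standard library of later Python versions (it grew out of pysqlite). The table and column names are made up, and an in-memory database is used so nothing is left on disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file name for a persistent database
conn.execute(
    "CREATE TABLE entries ("
    "group_id TEXT PRIMARY KEY, name TEXT, amount REAL, entry_date TEXT)"
)

# Add
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("G1", "Alice", 10.50, "2005-05-18"),
        ("G2", "Bob", 20.00, "2005-05-19"),
        ("G3", "Carol", 5.00, "2005-05-19"),
    ],
)

# Update and delete
conn.execute("UPDATE entries SET amount = ? WHERE group_id = ?", (25.00, "G2"))
conn.execute("DELETE FROM entries WHERE group_id = ?", ("G1",))
conn.commit()

# Query
rows = conn.execute(
    "SELECT group_id, name, amount FROM entries ORDER BY group_id"
).fetchall()
total = conn.execute("SELECT SUM(amount) FROM entries").fetchone()[0]
conn.close()
```

Each row plays the part of a record and the primary key plays the part of the record key, which is why this tends to feel familiar coming from indexed files.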

Another obvious solution, since you are saying that it's a
small file, is to always read the whole file into memory, and
to rewrite the whole file when you change things.

For info on CSV handling, see http://docs.python.org/lib/module-csv.html

For other non-SQL solutions, please have a look at
http://www-106.ibm.com/developerworks/library/l-pypers.html
and http://docs.python.org/lib/node77.html

len · May 19, 2005

Thanks for the reply.

I just read your response and will be taking your suggestion immediately.

Len Sumnler

len · May 19, 2005

Thanks for the reply

I think you might be right. I have been playing around with Linux at
home. What I may have to do is switch my mindset from IBM/Microsoft to
a more Unix way of thinking.

Also thanks for the code samples.

Len Sumnler

len · May 19, 2005

Thanks for the reply

Everyone seems to be saying the same thing, which is to jump into some
RDBMS.

Len Sumnler

Paul Watson · May 19, 2005

len said:
I am an old-time
COBOL programmer from the IBM 360/370 era, and I believe the ingrained
idea of file processing using file definitions (FDs) is causing me
problems, because I think Python requires a different way of looking at
data files and I haven't really gotten my brain around it yet.

Welcome, Len.

I would
like to create a small sequential file, at first to store a
group id, name, amount, and date, which I can add to, delete from, and
update

In addition to the suggestions already given, you might take a look at the
struct module. This will let you use fixed-width binary records.
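
A minimal struct sketch follows. The layout (6-byte id, 20-byte name, 8-byte float amount, 10-byte date) and the helper names are assumptions for illustration, not a recommended format.

```python
import struct

# "<" means little-endian with no alignment padding:
# 6s = 6-byte string, 20s = 20-byte string, d = 8-byte float, 10s = 10-byte string
RECORD = struct.Struct("<6s20sd10s")

def pack(group_id, name, amount, date):
    """Build one fixed-width binary record; short strings are NUL-padded."""
    return RECORD.pack(
        group_id.encode("ascii"), name.encode("ascii"), amount, date.encode("ascii")
    )

def unpack(raw):
    """Turn one binary record back into Python values."""
    gid, name, amount, date = RECORD.unpack(raw)

    def strip(b):
        return b.rstrip(b"\x00").decode("ascii")

    return strip(gid), strip(name), amount, strip(date)

raw = pack("G1", "Alice", 10.5, "2005-05-18")
record = unpack(raw)
size = RECORD.size
```

Every record is exactly RECORD.size bytes, so record n of a file starts at offset (n - 1) * RECORD.size.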

The concept of streams found in UNIX takes some getting used to. Many files
are maintained as text using delimited, variable length fields with a
newline at the end. Try 'cat /etc/passwd' on a UNIX/Linux host to see such
a file using a colon ':' as the delimiter.

I turn to the 'od' command when I want the truth. Use it to see what bytes
are -really- in the file. The following should work on Linux or under
Cygwin if you are still using Windows.

od -Ax -tcx1 thefile.dat

You can use od to look at data in the stream. The output of the print
command is going into the od command.

$ print "now"|od -Ax -tcx1
000000   n   o   w  \n
        6e  6f  77  0a
000004

Dennis Lee Bieber · May 19, 2005

len said:
What I may have to do is switch my mindset from IBM/Microsoft to
a more Unix way of thinking.

Even Windows follows "stream I/O" concepts. The last OSs I've
worked on that still had OS support for "records" were DEC VMS and
Radio Shack's TRS-DOS (yes, that OS did handle fixed-length direct
access; the application programmer did not have to do the offset
calculations, just open the file with a specified record length).

jlach · May 20, 2005

Hi Len,
If you still want to try this with a Windows programming language, try
OZEXE Lite at http://www.ozdevelopment.com. With built-in support for
ODBC and a simplified language, you'll be up and running in no time. You
could just use a datasource for a text file. It's free, by the way.

Regards,
James
