newbie write to file question


ProvoWallis

Hi,

I'm trying to create a script that will search an SGML file for the
numbers and titles of the hierarchical elements (section level
headings) and create a dictionary with the section number as the key
and the title as the value.

I've managed to make some progress but I'd like to get some general
feedback on my progress so far plus ask a question. When I run this
script on a directory that contains multiple files even the files that
don't contain any matches generate log files and usually with the
contents of the last file that contained matches. I'm not sure what I'm
missing so I'd appreciate some advice.

Thanks,

Greg


Here's a very simplified version of my SGML:

<sec-main no="1.01"><title>section title 1.01
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-sub2 no="a"><title>title a
<sec-sub2 no="b"><title>title b
<sec-sub3 no="i"><title>title i
<sec-main no="2.02"><title>section title 2.02
<sec-main no="3.03"><title>section title 3.03
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-main no="4.04"><title>section title 4.04
<sec-main no="5.05"><title>section title 5.05

And here's what I've written so far:

import os
import re

setpath = raw_input("Enter the path where the program should run: ")
print

table ={}

for root, dirs, files in os.walk(setpath):
    fname = files
    for fname in files:
        inputFile = file(os.path.join(root,fname), 'r')

        while 1:
            lines = inputFile.readlines(10000)
            if not lines:
                break
            for line in lines:
                main = re.search(r'(?i)<sec-main no=\"(\d+\.\d\d)\">\n?<title>(.*?)\n' , line)
                sub_one = re.search(r'(?i)<sec-sub1 no=\"(\w*)\">\n?<title>(.*?)\n' , line)
                sub_two = re.search(r'(?i)<sec-sub2 no=\"(\w*)\">\n?<title>(.*?)\n' , line)
                sub_three = re.search(r'(?i)<sec-sub3 no=\"(\w*)\">\n?<title>(.*?)\n' , line)
                if main is not None:
                    table[main.group(1)] = main.group(2)
                    m = main.group(1)
                if main is None:
                    pass
                if sub_one is not None:
                    one = m + '[' + sub_one.group(1) + ']'
                    table[one] = sub_one.group(2)
                if sub_one is None:
                    pass
                if sub_two is not None:
                    two = one + '[' + sub_two.group(1) + ']'
                    table[two] = sub_two.group(2)
                if sub_two is None:
                    pass
                if sub_three is not None:
                    three = two + '[' + sub_three.group(1) + ']'
                    table[three] = sub_three.group(2)
                if sub_three is None:
                    pass

                    str_table = str(table)
                    (name,ext) = os.path.splitext(fname)
                    output_name = name + '.log'
                    outputFile = file(os.path.join(root,output_name), 'w')
                    outputFile.write(str_table)
                    outputFile.close()
 

Rob E

> I'm not sure what I'm
> missing so I'd appreciate some advice.
Your question is pretty general and I'm not going to go over this in any
great detail, but I will make a few comments.

* In your if section, use if ... else constructs rather than the paired
"if x is not None" / "if x is None" blocks. Also get rid of all those
unneeded ifs with the pass in them. They do nothing.
* You may want to put the heart of the code in a separate function or, if
you need persistence, use a class. This is optional and depends on how
complex your analysis code is going to be. Generally a function should
not be more than two levels deep in block nesting, for readability
and maintainability.
* The table initialization, i.e. table = {}, was outside of your main file
scan loop. That seemed strange to me since I think you were doing this
file by file.
* Your log writing code was indented below the "if sub_three is None:"
block, which means that it's inside that block -- that's probably not
what you want. Remember Python defines blocks by indentation. The
indentation is a nice feature because Python blocking is in fact what
it looks like (unlike C++).
* If you're parsing XML, and maybe SGML, the Python library has some special
tools for this. You might want to look at the lib or search the net.
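
The first two points could look roughly like the sketch below. It assumes Python 3, and it assumes each tag and its title share one line, as in the simplified sample. The names build_table, SEC_RE and LEVELS are made up for illustration; the key format (main number followed by bracketed sub-numbers) follows the original script.

```python
import re

# One pattern for all four levels: group 1 is the level name,
# group 2 the "no" attribute, group 3 the title text.
SEC_RE = re.compile(
    r'<sec-(main|sub1|sub2|sub3)\s+no="([^"]*)">\s*<title>(.*)', re.I)

LEVELS = {'main': 0, 'sub1': 1, 'sub2': 2, 'sub3': 3}

def build_table(lines):
    """Return {section key: title} for an iterable of SGML lines."""
    table = {}
    path = []  # keys of the enclosing sections, outermost first
    for line in lines:
        match = SEC_RE.search(line)
        if match is None:
            continue
        depth = LEVELS[match.group(1).lower()]
        number = match.group(2)
        if depth == 0:
            key = number
        elif depth <= len(path):
            key = path[depth - 1] + '[' + number + ']'
        else:
            continue  # sub-level with no parent seen yet; skipped here
        del path[depth:]      # forget deeper levels from earlier sections
        path.append(key)
        table[key] = match.group(3).strip()
    return table
```

Because the function builds and returns its own table, calling it once per file gives each file a fresh dictionary.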

Take care,
Rob
 

Scott David Daniels

ProvoWallis wrote:
....
> for root, dirs, files in os.walk(setpath):
>     fname = files
>     for fname in files:
>         inputFile = file(os.path.join(root,fname), 'r')
>         while 1:
>             lines = inputFile.readlines(10000)
>             if not lines:
>                 break
>             for line in lines:
>                 ...
This is pretty verbose. Also, open is meant to be the function to use.
Try:
    for root, dirs, files in os.walk(setpath):
        for fname in files:
            inputFile = open(os.path.join(root,fname), 'r')
            for line in inputFile:
                ...
            inputFile.close()
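
On Python 2.6 and later (including Python 3), a with block is another option: it closes the file even if the loop raises. A small sketch, where count_section_lines is a made-up helper used only to show the pattern:

```python
def count_section_lines(path):
    """Count the lines in the file at `path` that contain a <sec- tag."""
    count = 0
    with open(path, 'r') as input_file:   # closed automatically on exit
        for line in input_file:           # reads one line at a time
            if '<sec-' in line.lower():
                count += 1
    return count
```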


There is no point to lines like:
>     if main is None:
>         pass
Just drop them.

Perhaps what you meant was:
    if main is None:
        continue
If so, change:
>     if main is not None:
>         table[main.group(1)] = main.group(2)
>         m = main.group(1)
>     if main is None:
>         pass
To:
    if main is None:
        continue
    table[main.group(1)] = main.group(2)
    m = main.group(1)

As Rob E pointed out, your postlude has the wrong indentation.
You may want:
    for root, dirs, files in os.walk(setpath):
        for fname in files:
            inputFile = open(os.path.join(root, fname), 'r')
            for line in inputFile:
                ...
            inputFile.close()
            if table:
                name, ext = os.path.splitext(fname)
                output_name = name + '.log'
                outputFile = open(os.path.join(root, output_name), 'w')
                outputFile.write(str(table))
                outputFile.close()
                table = {}
so that you only write logs if something was found.
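
Putting the pieces together, here is a rough sketch in Python 3 (so with/open rather than file). The names scan_file and write_logs are illustrative, the pattern handles only the sec-main level to keep it short, and skipping existing .log files is an extra assumption of mine. It also assumes every file under the directory is readable as text. Building a fresh table for each file is what stops one file's matches from leaking into the next log.

```python
import os
import re

MAIN_RE = re.compile(r'<sec-main\s+no="(\d+\.\d\d)">\s*<title>(.*)', re.I)

def scan_file(path):
    """Return {section number: title} for the sec-main lines in one file."""
    table = {}  # new dictionary on every call
    with open(path, 'r') as input_file:
        for line in input_file:
            match = MAIN_RE.search(line)
            if match is not None:
                table[match.group(1)] = match.group(2).strip()
    return table

def write_logs(top):
    """Walk `top` and write name.log beside each file that had matches."""
    written = []
    for root, dirs, files in os.walk(top):
        for fname in files:
            if fname.endswith('.log'):
                continue  # assumption: don't rescan logs from earlier runs
            table = scan_file(os.path.join(root, fname))
            if not table:
                continue  # nothing found, so no log for this file
            name, ext = os.path.splitext(fname)
            log_path = os.path.join(root, name + '.log')
            with open(log_path, 'w') as output_file:
                output_file.write(str(table))
            written.append(log_path)
    return written
```

Calling write_logs(setpath) would then return the list of log files it created.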
 
