Newbie Count Question


ProvoWallis

I have a newbie count question.

I have a number of SGML documents divided into sections, but over the
course of editing them some sections have been deleted (and perhaps
others added). I'd like to renumber them. The input documents look like
this:

<sec-main no="1.01">
<content>
<sec-main no="1.02">
<content>
<sec-main no="1.03">
<content>
<sec-main no="1.08">
<content>
<sec-main no="1.09">
<content>
<sec-main no="1.10">
<content>
<sec-main no="1.15">

and after renumbering I would like the sections to look like this:

<sec-main no="1.01">
<content>
<sec-main no="1.02">
<content>
<sec-main no="1.03">
<content>
<sec-main no="1.04">
<content>
<sec-main no="1.05">
<content>
<sec-main no="1.06">
<content>
<sec-main no="1.07">

so they are numbered sequentially from 1 through to the total number of
sections.

I've managed to get this far thanks to looking at other posts on the
board, but no matter what I try all of the sections end up being
numbered with the total number of sections in the document. E.g., if
there are 100 sections in the document, the "no" attribute is "1.100"
for each one.

import os, re

setpath = raw_input("Enter the path where the program should run: ")
print

for root, folders, files in os.walk(setpath):
    for name in files:
        filepath = os.path.join(root, name)
        fileopen = open(filepath, 'r')
        data = fileopen.read()
        fileopen.close()

        secmain_pattern = re.compile(r'<sec-main no=\"(\d*)\.(\d*)\">',
                                     re.IGNORECASE)
        m = secmain_pattern.search(data)
        all = secmain_pattern.findall(data)

        counter = 0
        for i in range(0,len(all)):
            counter = counter + 1
            print counter

        if m is not None:
            def new_number(match):
                return '<sec-main no="%s.%s">' % (match.group(1),
                                                  counter)
            data = secmain_pattern.sub(new_number, data)

        outputFile = file(os.path.join(root,name), 'w')
        outputFile.write(data)
        outputFile.close()


Thanks for your help!
 

George Sakkis

ProvoWallis said:
I've managed to get this far thanks to looking at other
posts on the board, but no matter what I try all of the
sections end up being numbered with the total number of
sections in the document. E.g., if there are 100 sections
in the document, the "no" attribute is "1.100"
for each one.

Of course it is; the counter you compute is fixed and equal to
len(all). What you need is a substitution function that keeps track of
the counter and increments it by one for every substitution. This means
that the counter becomes part of the function's state. When you hear
"function" and "state" together, the typical solution is "class":

import re

class ReplaceSecMain(object):
    def __init__(self):
        self._count = 0
        self._secmain_re = re.compile(r'''
            (?<= <sec-main\ no=" )  # positive lookbehind assertion
            ( \d* )                 # first number
            (?: \.\d* )             # dot and second number (ignored)
            (?= "> )                # positive lookahead assertion
            ''', re.IGNORECASE | re.VERBOSE)

    def sub(self, text):
        return self._secmain_re.sub(self._subNum, text)

    def _subNum(self, match):
        self._count += 1
        return '%s.%.2d' % (match.group(1), self._count)


print ReplaceSecMain().sub(open("myfile.txt").read())


I also cleaned up the regular expression a little; google for
lookahead/lookbehind assertions if you haven't seen them before.
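
To see what the lookbehind and lookahead assertions buy you, here is a small sketch. The names number_re and sample are made up for illustration, and the pattern is a compact, non-verbose variant of the one above. The assertions check the text around the number without consuming it, so the match (and therefore the replacement) covers only the number, not the whole tag:

```python
import re

# Lookbehind (?<=...) and lookahead (?=...) must match the surrounding
# text, but that text is not part of the match itself.
number_re = re.compile(r'(?<=<sec-main no=")\d*\.\d*(?=">)', re.IGNORECASE)

sample = '<sec-main no="1.08">\n<content>'  # illustrative input

match = number_re.search(sample)
matched_text = match.group(0)             # only the number between the quotes
replaced = number_re.sub('1.04', sample)  # the tag around it is left untouched
```

Because only the number is matched, the substitution function does not have to rebuild the '<sec-main no="' and '">' parts of the tag.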

HTH,
George
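
For comparison, the same "counter as state" idea can also be sketched with a closure instead of a class. This is only an illustration, not part of George's answer: renumber, replacement and SECMAIN_RE are hypothetical names, and the code is written so that it should also run on Python 3. The key point is unchanged: the counter advances inside the substitution function, once per match, rather than being computed in full before the substitution starts.

```python
import itertools
import re

# Same idea as the regex above: match only the number inside the tag.
SECMAIN_RE = re.compile(r'(?<=<sec-main no=")(\d*)\.\d*(?=">)', re.IGNORECASE)

def renumber(text):
    """Renumber every sec-main tag in text sequentially, starting at 1."""
    counter = itertools.count(1)  # fresh counter for each document

    def replacement(match):
        # re.sub calls this once per match, so next() advances per section.
        return '%s.%02d' % (match.group(1), next(counter))

    return SECMAIN_RE.sub(replacement, text)

sample = ('<sec-main no="1.01">\n<content>\n'
          '<sec-main no="1.08">\n<content>\n'
          '<sec-main no="1.15">')
result = renumber(sample)
```

Calling renumber once per file restarts the numbering for each document, which matches creating a new ReplaceSecMain instance per file.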
 
