Newbie Class/Counter question


ProvoWallis

Hi,

I've always struggled with classes and this one is no exception.

I'm working in an SGML file and I want to renumber a couple of elements
in the hierarchy based on the previous level.

E.g.,

My document looks like this

<level1>A. Title Text
<level2>1. Title Text
<level2>1. Title Text
<level2>1. Title Text
<level1>B. Title Text
<level2>1. Title Text
<level2>1. Title Text

but I want to change the numbering of the second level to sequential
numbers like 1, 2, 3, etc. so my output would look like this

<level1>A. Title Text
<level2>1. Title Text
<level2>2. Title Text
<level2>3. Title Text
<level1>B. Title Text
<level2>1. Title Text
<level2>2. Title Text

This is what I've come up with on my own but it doesn't work. I was
hoping someone could critique this and point me in the right or better
direction.

Thanks,

Greg

###


def Fix(m):

    new = m.group(1)

    class ReplacePtSubNumber(object):

        def __init__(self):
            self._count = 0
            self._ptsubtwo_re = re.compile(r'<pt-sub2 no=\"[0-9]\">', re.IGNORECASE| re.UNICODE)
            # self._ptsubone_re = re.compile(r'<pt-sub1', re.IGNORECASE| re.UNICODE)

        def sub(self, new):
            return self._ptsubtwo_re.sub(self._ptsubNum, new)

        def _ptsubNum(self, match):
            self._count +=1
            return '<pt-sub2 no="%s">' % (self._count)


    new = ReplacePtSubNumber().sub(new)
    return '<pt-sub1%s<pt-sub1' % (new)

data = re.sub(r'(?i)(?m)(?s)<pt-sub1(.*?)<pt-sub1', Fix, data)
 

Paul McGuire

This may not be as elegant as your RE approach, but it seems more readable
to me. Using pyparsing, we can define search patterns, attach callbacks to
be invoked when a match is found, and the callbacks can return modified text
to replace the original. Although the running code matches your text
sample, I've also included commented statements that match your source code
sample.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul


testData = """<level1>A. Title Text
<level2>1. Title Text
<level2>1. Title Text
<level2>1. Title Text
<level1>B. Title Text
<level2>1. Title Text
<level2>1. Title Text
"""

from pyparsing import *

class Fix(object):
    def __init__(self):
        self.curItem = 0

    def resetCurItem(self,s,l,t):
        self.curItem = 0

    def nextCurItem(self,s,l,t):
        self.curItem += 1
        return "<level2>%d." % self.curItem
        # return '<pt-sub2 no="%d">' % self.curItem

    def fixText(self,data):
        # set up patterns for searching
        lev1 = Literal("<level1>")
        lev2 = Literal("<level2>") + Word(nums) + "."
        # lev1 = CaselessLiteral("<pt-sub1>")
        # lev2 = CaselessLiteral('<pt-sub2 no="') + Word(nums) + '">'

        # when level 1 encountered, reset the cur item counter
        lev1.setParseAction(self.resetCurItem)

        # when level 2 encountered, use next cur item counter value
        lev2.setParseAction(self.nextCurItem)

        patterns = (lev1 | lev2)
        return patterns.transformString( data )

f = Fix()
print(f.fixText( testData ))

returns:
<level1>A. Title Text
<level2>1. Title Text
<level2>2. Title Text
<level2>3. Title Text
<level1>B. Title Text
<level2>1. Title Text
<level2>2. Title Text
 

Michael Tobis

Paul, thanks for the enlightening intro to pyparsing!

We don't really know what the application is (I suspect it's homework),
but whether it's homework or a real-world one-off this all seems like
overkill to me. There's a time for elegance and a time for quick and
dirty. Assuming line-oriented input as shown, I can do it in 9 simple
lines (or 7, if there is no "level3" or higher case) with no imports,
no regular expressions, and no user-defined classes. However, I hate to
submit someone's homework.

Anyway, Provo seems confused, and should focus more on technique, I
think. Remember that "simple is better than complex" and "readability
counts". Try to find a way to take smaller steps toward your goal.

mt
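
For readers who want to see what such a line-oriented approach might look like, here is a minimal sketch. It is not Michael's code (he did not post any); the function name renumber is made up for illustration, and it assumes every second-level line has exactly the shape "<level2>N. text" shown in the sample, with one heading per line.

```python
def renumber(text):
    """Renumber <level2> lines sequentially, restarting after each <level1>."""
    count = 0
    out = []
    for line in text.splitlines():
        if line.startswith("<level1>"):
            count = 0  # a new first-level heading restarts the numbering
        elif line.startswith("<level2>"):
            count += 1
            # Assumption: the old number ends at the first "."; keep the rest.
            _, _, rest = line.partition(".")
            line = "<level2>%d.%s" % (count, rest)
        out.append(line)
    return "\n".join(out)
```

Lines that start with neither tag pass through unchanged, so a "level3" or deeper case would need an extra branch.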
 

Michael Spencer

Here's a fixed-up version of your approach:

import re

source_text = """
<level1>A. Title Text
<level2>1. Title Text
<level2>1. Title Text
<level2>1. Title Text
<level1>B. Title Text
<level2>1. Title Text
<level2>1. Title Text"""


class ReplacePtSubNumber(object):

    line_pattern = re.compile(r"""
        (<\w+>)             # tag, e.g. <level1>
        (\w)                # item label: a letter at level 1, a digit at level 2
        (\.\s+\w+\s*\w+)    # . Title Text
        """, re.VERBOSE)

    def __init__(self):
        self._count = 0

    def sub(self, match):
        level, second, rest = match.groups()
        if second.isalpha():
            # a lettered heading starts a new section: restart the count
            self._count = 0
        else:
            self._count += 1
            second = str(self._count)
        return "%s%s%s" % (level, second, rest)

    def replace(self, source):
        return self.line_pattern.sub(self.sub, source)

print(ReplacePtSubNumber().replace(source_text))

<level1>A. Title Text
<level2>1. Title Text
<level2>2. Title Text
<level2>3. Title Text
<level1>B. Title Text
<level2>1. Title Text
<level2>2. Title Text

HTH
Michael
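
The same stateful-callback idea can be sketched against the tag names used in the original code rather than the simplified sample. This is only an illustration under assumptions: the tag shape <pt-sub2 no="N"> is taken from the regular expression in the question, the real SGML layout is not shown in the thread, and the names TAG_RE and renumber_pt_sub2 are invented here. One likely problem with the original attempt is that its pattern consumes the following <pt-sub1, so that section cannot start the next match; a single pass that resets the counter whenever it sees <pt-sub1 avoids this.

```python
import re

# Matches either tag. Groups 1 and 2 are set only for a <pt-sub2 no="N"> tag.
TAG_RE = re.compile(r'<pt-sub1\b|(<pt-sub2\s+no=")\d+(">)', re.IGNORECASE)


def renumber_pt_sub2(data):
    """Number <pt-sub2> tags 1, 2, 3... within each <pt-sub1> section."""
    count = 0

    def fix(match):
        nonlocal count
        if match.group(1) is None:
            # A <pt-sub1 tag: restart the counter, leave the text unchanged.
            count = 0
            return match.group(0)
        count += 1
        return "%s%d%s" % (match.group(1), count, match.group(2))

    return TAG_RE.sub(fix, data)
```

Using \d+ rather than [0-9] also lets the pattern match existing numbers of 10 and above.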
 
