Refactoring a generator function

K

Kent Johnson

Here is a simple function that scans through an input file and groups the lines of the file into
sections. Sections start with 'Name:' and end with a blank line. The function yields sections as
they are found.

def makeSections(f):
currSection = []

for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)

elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []

else:
# Accumulate into a section
currSection.append(line)

# Yield the last section
if currSection:
yield currSection

There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):
if currSection:
yield currSection
currSection = []

As a firm believer in Once and Only Once, I would like to factor this out into a separate function,
either a nested function of makeSections(), or as a separate method of a class implementation.
Something like this:

def makeSections(f): ### DOESN'T WORK ###
currSection = []

def yieldSection():
if currSection:
yield currSection
del currSection[:]

for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
yieldSection()
currSection.append(line)

elif not line:
# Blank line ends a section
yieldSection()

else:
# Accumulate into a section
currSection.append(line)

# Yield the last section
yieldSection()


The problem is that yieldSection() now is the generator, and makeSections() is not, and the result
of calling yieldSection() is a new iterator, not the section...

Is there a way to do this or do I have to live with the duplication?

Thanks,
Kent


Here is a complete program:

data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
.....................
xxxxxxxxxxxxxxxxxxxx


Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx

'''

import cStringIO # just for test

def makeSections(f):
''' This is a generator function. It will return successive sections
of f until EOF.

Sections are every line from a 'Name:' line to the first blank line.
Sections are returned as a list of lines with line endings stripped.
'''

currSection = []

for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)

elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []

else:
# Accumulate into a section
currSection.append(line)

# Yield the last section
if currSection:
yield currSection


f = cStringIO.StringIO(data)

for section in makeSections(f):
print 'Section'
for line in section:
print ' ', line
print
 
S

Steven Bethard

Kent said:
Here is a simple function that scans through an input file and groups
the lines of the file into sections. Sections start with 'Name:' and end
with a blank line. The function yields sections as they are found.

def makeSections(f):
currSection = []

for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)

elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []

else:
# Accumulate into a section
currSection.append(line)

# Yield the last section
if currSection:
yield currSection

There is some obvious code duplication in the function - this bit is
repeated 2.67 times ;-):
if currSection:
yield currSection
currSection = []

You can write:

for section in yieldSection():
yield section

in both places, but I assume you still don't like the code duplication
this would create.

How about something like (completely untested):

if line == 'Name:' or not line:
if currSection:
yield currSection
currSection = []
if line == 'Name:'
currSection.append(line)

Another consideration: in Python 2.4, itertools has a groupby function
that you could probably get some benefit from:
.... def __init__(self):
.... self.is_section = False
.... def __call__(self, line):
.... if line == 'Name:\n':
.... self.is_section = True
.... elif line == '\n':
.... self.is_section = False
.... return self.is_section
........ for _, section in itertools.groupby(f, Sections()):
.... result = ''.join(section)
.... if result != '\n':
.... yield result
....['Name:\nA\nx\ny\nz\n', 'Name:\nB\na\nb\nc\n']
 
M

max

Here is a simple function that scans through an input file and
groups the lines of the file into sections. Sections start with
'Name:' and end with a blank line. The function yields sections
as they are found.

def makeSections(f):
currSection = []

for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)

elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []

else:
# Accumulate into a section
currSection.append(line)

# Yield the last section
if currSection:
yield currSection

There is some obvious code duplication in the function - this bit
is repeated 2.67 times ;-):
if currSection:
yield currSection
currSection = []

As a firm believer in Once and Only Once, I would like to factor
this out into a separate function, either a nested function of
makeSections(), or as a separate method of a class
implementation. Something like this:


The problem is that yieldSection() now is the generator, and
makeSections() is not, and the result of calling yieldSection()
is a new iterator, not the section...

Is there a way to do this or do I have to live with the
duplication?

Thanks,
Kent

This gets rid of some duplication by ignoring blanklines altogether,
which might be a bug...

def makeSections2(f):
currSection = []
for line in f:
line = line.strip()
if line:
if line == 'Name:':
if currSection:
yield cs
currSection = []
currSection.append(line)
if currSection:
yield currSection

but

def makeSections2(f):
currSection = []
for line in f:
line = line.strip()

if line:
if line == 'Name:':
if currSection:
yield currSection
currSection = []
currSection.append(line)

elif currSection:
yield currSection

if currSection:
yield currSection

should be equivalent.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,276
Latest member
Sawatmakal

Latest Threads

Top