Simple text parsing gets difficult when line continues to next line

J

Jacob Rael

Hello,

I have a simple script to parse a text file (a visual basic program)
and convert key parts to tcl. Since I am only working on specific
sections and I need it quick, I decided not to learn/try a full blown
parsing module. My simple script works well until it runs into
functions that straddle multiple lines. For example:

Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0,
&H8, &H0, _
&H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE,
&H0, &HF, &H0, -1)


I read in each line with:

for line in open(fileName).readlines():

I would line to identify if a line continues (if line.endswith('_'))
and concate with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

jr
 
L

Larry Bates

Jacob said:
Hello,

I have a simple script to parse a text file (a visual basic program)
and convert key parts to tcl. Since I am only working on specific
sections and I need it quick, I decided not to learn/try a full blown
parsing module. My simple script works well until it runs into
functions that straddle multiple lines. For example:

Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0,
&H8, &H0, _
&H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE,
&H0, &HF, &H0, -1)


I read in each line with:

for line in open(fileName).readlines():

I would line to identify if a line continues (if line.endswith('_'))
and concate with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

jr
Something like (not tested):

fp=open(filename, 'r')
for line in fp:
while line.rstrip().endswith('_'):
line+=fp.next()


fp.close()

-Larry
 
R

Roberto Bonvallet

Jacob Rael wrote:
[...]
I would line to identify if a line continues (if line.endswith('_'))
and concate with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

Don't use readlines.

# NOT TESTED
program = open(fileName)
for line in program:
while line.rstrip("\n").endswith("_"):
line = line.rstrip("_ \n") + program.readline()
do_the_magic()

Cheers,
 
D

Dennis Lee Bieber

I read in each line with:

for line in open(fileName).readlines():

I would line to identify if a line continues (if line.endswith('_'))
and concate with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?
Well, besides the stereotypical ("Doctor, it hurts when I do this";
"Don't do that")...

UNTESTED

line = ""
inCont = False
for ln in open(whatever).readlines():
if inCont:
line = line + ln #stripped of newlines, of course
else:
line = ln

inCont = line.endswith("_")

if not inCont:
#process completed line
line = ""


Though I suspect newer versions of Python, where file objects can be
used as iterators, would be better...

REALLY UNTESTED

f = open(whatever)
for ln in f:
#strip newline, of course
while ln.endswith("_"):
ln = ln + f.readline() #strip the newline here too
#process ln

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
J

John Machin

Jacob said:
Hello,

I have a simple script to parse a text file (a visual basic program)
and convert key parts to tcl. Since I am only working on specific
sections and I need it quick, I decided not to learn/try a full blown
parsing module. My simple script works well until it runs into
functions that straddle multiple lines. For example:

Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0,
&H8, &H0, _
&H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE,
&H0, &HF, &H0, -1)


I read in each line with:

for line in open(fileName).readlines():

I would line to identify if a line continues (if line.endswith('_'))
and concate with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly e.g. fp.next(). Here's a function that takes a
list of lines and returns another with all trailing whitespace removed
and the continued lines glued together. It uses a simple state machine
approach.

def continue_join(linesin):
linesout = []
buff = ""
NORMAL = 0
PENDING = 1
state = NORMAL
for line in linesin:
line = line.rstrip()
if state == NORMAL:
if line.endswith('_'):
buff = line[:-1]
state = PENDING
else:
linesout.append(line)
else:
if line.endswith('_'):
buff += line[:-1]
else:
buff += line
linesout.append(buff)
buff = ""
state = NORMAL
if state == PENDING:
raise ValueError("last line is continued: %r" % line)
return linesout

import sys
fp = open(sys.argv[1])
rawlines = fp.readlines()
cleanlines = continue_join(rawlines)
for line in cleanlines:
print repr(line)
===
Tested with following files:
C:\junk>type contlinet1.txt
only one line

C:\junk>type contlinet2.txt
line 1
line 2

C:\junk>type contlinet3.txt
line 1
line 2a _
line 2b _
line 2c
line 3

C:\junk>type contlinet4.txt
line 1
_
_
line 2c
line 3

C:\junk>type contlinet5.txt
line 1
_
_
line 2c
line 3 _

C:\junk>

HTH,
John
 
T

Tim Hochberg

John said:
Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly e.g. fp.next(). Here's a function that takes a
list of lines and returns another with all trailing whitespace removed
and the continued lines glued together. It uses a simple state machine
approach.

I agree that mixing the line assembly and parsing is probably a mistake
although using next explicitly is fine as long as your careful with it.
For instance, I would be wary to use the mixed for-loop, next strategy
that some of the previous posts suggested. Here's a different,
generator-based implementation of the same idea that, for better or for
worse is considerably less verbose:

def continue_join_2(linesin):
getline = iter(linesin).next
while True:
buffer = getline().rstrip()
try:
while buffer.endswith('_'):
buffer = buffer[:-1] + getline().rstrip()
except StopIteration:
raise ValueError("last line is continued: %r" % line)
yield buffer

-tim

[SNIP]
 
J

John Machin

Tim Hochberg wrote:
[snip]
I agree that mixing the line assembly and parsing is probably a mistake
although using next explicitly is fine as long as your careful with it.
For instance, I would be wary to use the mixed for-loop, next strategy
that some of the previous posts suggested. Here's a different,
generator-based implementation of the same idea that, for better or for
worse is considerably less verbose:
[snip]

Here's a somewhat less verbose version of the state machine gadget.

def continue_join_3(linesin):
linesout = []
buff = ""
pending = 0
for line in linesin:
# remove *all* trailing whitespace
line = line.rstrip()
if line.endswith('_'):
buff += line[:-1]
pending = 1
else:
linesout.append(buff + line)
buff = ""
pending = 0
if pending:
raise ValueError("last line is continued: %r" % line)
return linesout

FWIW, it works all the way back to Python 2.1

Cheers,
John,
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,270
Latest member
TopCryptoTwitterChannels_

Latest Threads

Top