Question regarding regular expressions

Kelie

Hello,

I'm trying to analyze some AutoLISP code with Python. The file to be
analyzed contains many functions. Each function begins with a "defun"
statement, which may or may not be preceded by one or more comment
lines, each beginning with ";". My goal is to export each function
into a separate file, together with its comments, if there are any.
Below is the code that I'm struggling with:

Code:
path = "C:\\AutoCAD\\LSP\\Sub.lsp"
string = file(path, 'r').read()

import re
pat = "\\;+.+\\n\\(DEFUN"
p = re.compile(pat,re.I)

iterator = p.finditer(string)
spans = [match.span() for match in iterator]

for i in range(min(15, len(spans))):
    print string[spans[i][0]:spans[i][1]]

The code above runs fine, but it only handles the case in which there
is exactly one comment line above the "defun" statement.
How do I repeat the sub-pattern "\\;+.+\\n" here?
For example, if I want to repeat this pattern 0 to 10 times, I know
"\\;+.+\\n{0:10}\\(DEFUN" does not work, but I don't know where to put
"{0:10}". As a workaround, I tried
pat = "|".join(["\\;+.+\\n"*i+ "\\(DEFUN" for i in range(11)])
and it turned out to be very slow. Any help?

Thank you.

Kelie
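
On the regex question itself: a quantifier applies only to the item immediately before it, and the range form is written with a comma, as in {0,10}. Wrapping the comment-line sub-pattern in a non-capturing group (?:...) lets the quantifier cover the whole line. What follows is only a minimal sketch, written for Python 3; the sample string is made up for illustration and stands in for the contents of Sub.lsp.

```python
import re

# Made-up sample text standing in for the real .lsp file contents.
sample = (
    "; first comment\n"
    "; second comment\n"
    "(defun foo ()\n"
    "  (princ \"foo\"))\n"
    "(defun bar ()\n"
    "  (princ \"bar\"))\n"
)

# (?:...) groups one comment line without capturing it;
# {0,10} then repeats that whole group zero to ten times.
pattern = re.compile(r"(?:;+.+\n){0,10}\(DEFUN", re.I)

matches = [m.group() for m in pattern.finditer(sample)]
for text in matches:
    print(repr(text))
```

Because the group may repeat zero times, a "defun" with no comment above it is matched as well. Like the original pattern, this one requires at least one character after the semicolons on each comment line.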
 
Kent Johnson

ISTM you don't need regex here, a simple line processor will work.
Something like this (untested):

path = "C:\\AutoCAD\\LSP\\Sub.lsp"
lines = open(path).readlines()

# Find the starts of all the functions
starts = [i for i, line in enumerate(lines) if line.startswith('(DEFUN')]

# Check for leading comments
for i, start in starts:
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1

# Now starts should be a list of line numbers for the start of each function

Kent
 
BartlebyScrivener

Kent,

Running

path = "d:/emacs files/emacsinit.txt"
lines = open(path).readlines()
# my defun lines are lowercase,
# next two lines are all on one
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
for i, start in starts:
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1
print starts

I get

File "D:\Python\findlines.py", line 7, in __main__
for i, start in starts:
TypeError: unpack non-sequence

Also, I don't understand the "i for i", but I don't understand a lot of
things yet :)

thanks,

rick
 
Felipe Almeida Lessa

On Fri, 2006-04-14 at 07:47 -0700, BartlebyScrivener wrote:
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]

This line makes a list of integers. enumerate gives you an iterator that
yields tuples consisting of (integer, object), and by "i for i, line"
you unpack each tuple into "(i, line)" and pick just "i".

for i, start in starts:

Here you try to unpack the elements of the list "starts" into "(i,
start)", but as we saw above the list contains just "i", so an exception
is raised.

I don't know what you want, but...

starts = [(i, line) for i, line in enumerate(lines) if
line.startswith('(defun')]

or

starts = [x for x in enumerate(lines) if x[1].startswith('(defun')]

....may (or may not) solve your problem.
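
To make the difference concrete, here is a small sketch; the four-element lines list is made up for illustration. It builds both kinds of list so the shapes can be compared side by side.

```python
# Made-up stand-in for the result of open(path).readlines().
lines = ["; helper\n", "(defun foo ()\n", "  (princ))\n", "(defun bar ()\n"]

# Indices only: each element is a plain int, so there is nothing
# for "for i, start in ..." to unpack.
indices = [i for i, line in enumerate(lines) if line.startswith('(defun')]

# (index, line) pairs: the tuple must be parenthesized inside the comprehension.
pairs = [(i, line) for i, line in enumerate(lines) if line.startswith('(defun')]

print(indices)  # [1, 3]
print(pairs)    # [(1, '(defun foo ()\n'), (3, '(defun bar ()\n')]
```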
 
Kent Johnson

Sorry, should be
for i, start in enumerate(starts):

start is a specific start line, and i is the index of that start line in
the starts list (so the list can be modified in place).

Kent
 
BartlebyScrivener

That's it. Thank you! Very instructive.

Final:

path = "d:/emacs files/emacsinit.txt"
lines = open(path).readlines()
# next two lines all on one
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
for i, start in enumerate(starts):
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1
print starts
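
From these line numbers, one possible way to get toward the original goal of exporting each function together with its comments is to slice the list of lines between consecutive starts. The sketch below is written for Python 3. The helper names find_starts and split_functions and the sample lines are hypothetical, and it assumes each function runs until the next start and that comments sit directly above their "defun" with no blank line in between.

```python
def find_starts(lines):
    """Return the first line index of each function, including leading comments."""
    starts = [i for i, line in enumerate(lines)
              if line.lower().startswith('(defun')]
    for i, start in enumerate(starts):
        # Walk upward while the previous line is a comment.
        while start > 0 and lines[start - 1].startswith(';'):
            starts[i] = start = start - 1
    return starts


def split_functions(lines):
    """Return one string per function, running up to the next function's start."""
    starts = find_starts(lines)
    ends = starts[1:] + [len(lines)]
    return [''.join(lines[s:e]) for s, e in zip(starts, ends)]


# Made-up sample standing in for open(path).readlines().
sample = [
    "; say hello\n",
    "; twice\n",
    "(defun hello ()\n",
    "  (princ \"hi\"))\n",
    "(defun bye ()\n",
    "  (princ \"bye\"))\n",
]

chunks = split_functions(sample)
for chunk in chunks:
    print(chunk)
```

Each element of chunks could then be written to its own file.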
 
Scott David Daniels

If you don't want to hold the whole file in memory, this gets the
starts one result at a time:

def starts(source):
    prelude = None
    for number, line in enumerate(source): # read and number a line
        if line[0] == ';':
            if prelude is None:
                prelude = number # Start of commented region
            # else: this line just extends previous prelude
        else:
            if line.startswith('(defun'):
                # You could append to a result here, but yield lets
                # the first found one get out straightaway.
                if prelude is None:
                    yield number
                else:
                    yield prelude
            prelude = None


path = "d:/emacs files/emacsinit.txt"
source = open(path)
try:
    for line in starts(source):
        print line,
    # could just do: print list(starts(source))
finally:
    source.close()
print
 
BartlebyScrivener

This is very helpful.

I wasn't the OP. I'm just learning, but I'm on the verge of making my
own file searching scripts. This will be a huge help. Thanks for
posting, and especially thanks for the comments in the code. Big help!

rick
 
