Question regarding regular expression

Kelie · Apr 14, 2006

Hello,

I'm trying to analyze some AutoLISP code with Python. The file to be
analyzed contains many functions. Each function begins with a
"defun" statement, which may or may not be preceded by one or more
comment lines beginning with ";". My goal is to export each function
into a separate file, together with its comments, if there are any.
Below is the code that I'm struggling with:

Code:

path = "C:\\AutoCAD\\LSP\\Sub.lsp"
string = file(path, 'r').read()

import re
pat = "\\;+.+\\n\\(DEFUN"
p = re.compile(pat,re.I)

iterator = p.finditer(string)
spans = [match.span() for match in iterator]

for i in range(min(15, len(spans))):
    print string[spans[i][0]:spans[i][1]]

The code above runs fine, but it only handles the case in which there
is exactly one comment line above the "defun" statement.
How do I repeat the sub-pattern "\\;+.+\\n" here?
For example, if I want to repeat this pattern 0 to 10 times, I know
"\\;+.+\\n{0:10}\\(DEFUN" does not work, but I don't know where to put
"{0:10}". As a workaround, I tried
pat = "|".join(["\\;+.+\\n"*i+ "\\(DEFUN" for i in range(11)]), and it
turned out to be very slow. Any help?

Thank you.

Kelie
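
For reference, the repetition asked about above can be expressed by grouping the comment-line sub-pattern and attaching the quantifier to the group; the range syntax uses a comma, "{0,10}", not a colon. The following is only a sketch in Python 3 under stated assumptions: the sample text is invented, the comment sub-pattern is simplified to ";.*\n", and re.M with "^" is added so that matches start at the beginning of a line.

```python
import re

# Hypothetical sample standing in for the contents of Sub.lsp.
sample = (
    "(setq x 1)\n"
    "; first comment line\n"
    ";; second comment line\n"
    "(defun foo ()\n"
    "  (princ))\n"
    "(DEFUN bar ()\n"
    "  nil)\n"
)

# (?:...) groups the comment-line sub-pattern without capturing it, so the
# {0,10} quantifier applies to the whole group instead of only to "\n".
# re.M lets ^ match at the start of every line; re.I ignores case.
pattern = re.compile(r"^(?:;.*\n){0,10}\(defun", re.I | re.M)

headers = [m.group() for m in pattern.finditer(sample)]
for header in headers:
    print(repr(header))
```

If there is no real upper limit on the number of comment lines, "*" can replace "{0,10}".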

Kent Johnson · Apr 14, 2006

ISTM you don't need regex here, a simple line processor will work.
Something like this (untested):

path = "C:\\AutoCAD\\LSP\\Sub.lsp"
lines = open(path).readlines()

# Find the starts of all the functions
starts = [i for i, line in enumerate(lines) if line.startswith('(DEFUN')]

# Check for leading comments
for i, start in starts:
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1

# Now starts should be a list of line numbers for the start of each function

Kent

BartlebyScrivener · Apr 14, 2006

Kent,

Running

path = "d:/emacs files/emacsinit.txt"
lines = open(path).readlines()
# my defun lines are lowercase,
# next two lines are all on one
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
for i, start in starts:
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1
print starts

I get

File "D:\Python\findlines.py", line 7, in __main__
for i, start in starts:
TypeError: unpack non-sequence

Also, I don't understand the "i for i", but I don't understand a lot of
things yet

thanks,

rick

Felipe Almeida Lessa · Apr 14, 2006

On Fri, 2006-04-14 at 07:47 -0700, BartlebyScrivener wrote:

starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]

This line makes a list of integers. enumerate gives you an iterator that
yields (integer, object) tuples, and with "i for i, line" you unpack
each tuple into "(i, line)" and keep just "i".

for i, start in starts:

Here you try to unpack the elements of the list "starts" into "(i,
start)", but as we saw above the list contains just "i", so an exception
is raised.

I don't know what you want, but...

starts = [(i, line) for i, line in enumerate(lines) if
line.startswith('(defun')]

or

starts = [x for x in enumerate(lines) if x[1].startswith('(defun')]

...may (or may not) solve your problem.
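
A small illustration of the unpacking described above, written for Python 3. The sample lines are invented for the example; only enumerate and the list comprehension come from the thread.

```python
# Hypothetical stand-in for the lines read from the file.
lines = ["; helper\n", "(defun a ()\n", "(setq x 1)\n", "(defun b ()\n"]

# enumerate yields (index, line) tuples.
pairs = list(enumerate(lines))

# Keeping only "i" gives a list of plain integers...
indexes = [i for i, line in enumerate(lines) if line.startswith("(defun")]

# ...while keeping the parenthesised tuple gives (index, line) pairs.
tuples = [(i, line) for i, line in enumerate(lines) if line.startswith("(defun")]

print(indexes)  # [1, 3]
print(tuples)   # [(1, '(defun a ()\n'), (3, '(defun b ()\n')]

# Unpacking each element of a list of integers into two names fails.
error = None
try:
    for i, start in indexes:
        pass
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```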

Kent Johnson · Apr 14, 2006

BartlebyScrivener said:
Kent,

Running

path = "d:/emacs files/emacsinit.txt"
lines = open(path).readlines()
# my defun lines are lowercase,
# next two lines are all on one
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
for i, start in starts:
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1
print starts

I get

File "D:\Python\findlines.py", line 7, in __main__
for i, start in starts:
TypeError: unpack non-sequence

Sorry, should be
for i, start in enumerate(starts):

start is a specific start line, i is the index of that start line in the
starts array (so the array can be modified in place).

Kent
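
Putting the correction together, here is a rough Python 3 sketch of the line-processor approach, extended to slice the file into one chunk per function, which is what the original question was after. The names function_starts and split_functions are made up for this example, and the case-insensitive comparison is an assumption carried over from the re.I flag in the first post. Writing each chunk to its own file is left out.

```python
def function_starts(lines, keyword="(defun"):
    """Return the index where each function, including leading comments, begins."""
    # Assumption: match "(defun" case-insensitively, like re.I in the question.
    starts = [i for i, line in enumerate(lines)
              if line.lower().startswith(keyword)]
    for i, start in enumerate(starts):
        # Walk backwards over the comment lines directly above the defun.
        while start > 0 and lines[start - 1].startswith(";"):
            start -= 1
        starts[i] = start  # modify the list in place
    return starts


def split_functions(lines):
    """Return one string per function, each running up to the next start."""
    starts = function_starts(lines)
    ends = starts[1:] + [len(lines)]
    return ["".join(lines[s:e]) for s, e in zip(starts, ends)]


# Hypothetical sample standing in for open(path).readlines().
sample = [
    "; add two numbers\n",
    "; (used by bar)\n",
    "(defun foo (a b)\n",
    "  (+ a b))\n",
    "(DEFUN bar ()\n",
    "  (foo 1 2))\n",
]

print(function_starts(sample))  # [0, 4]
chunks = split_functions(sample)
print(len(chunks))              # 2
```

Note that with this slicing, anything between the end of one function and the comments of the next is attached to the earlier chunk.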

BartlebyScrivener · Apr 14, 2006

That's it. Thank you! Very instructive.

Final:

path = "d:/emacs files/emacsinit.txt"
lines = open(path).readlines()
# next two lines all on one
starts = [i for i, line in enumerate(lines) if
line.startswith('(defun')]
for i, start in enumerate(starts):
    while start > 0 and lines[start-1].startswith(';'):
        starts[i] = start = start-1
print starts

Scott David Daniels · Apr 14, 2006

If you don't want to hold the whole file in memory, this yields the
starts one result at a time:

def starts(source):
    prelude = None
    for number, line in enumerate(source): # read and number a line
        if line[0] == ';':
            if prelude is None:
                prelude = number # Start of commented region
            # else: this line just extends previous prelude
        else:
            if line.startswith('(defun'):
                # You could append to a result here, but yield lets
                # the first found one get out straightaway.
                if prelude is None:
                    yield number
                else:
                    yield prelude
            prelude = None

path = "d:/emacs files/emacsinit.txt"
source = open(path)
try:
    for line in starts(source):
        print line,
    # could just do: print list(starts(source))
finally:
    source.close()
print
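
For anyone trying this on Python 3, the same generator idea might look roughly like the sketch below. It is exercised on an in-memory io.StringIO sample (invented for the example) rather than a real file, and it uses startswith(";") instead of line[0] so that an empty string would not raise an IndexError.

```python
import io


def starts(source):
    """Yield the line number where each function (or its comment block) begins."""
    prelude = None
    for number, line in enumerate(source):
        if line.startswith(";"):
            if prelude is None:
                prelude = number  # first line of a comment block
        else:
            if line.startswith("(defun"):
                yield number if prelude is None else prelude
            prelude = None  # any non-comment line ends the comment block


# Hypothetical sample; a real file object from open(path) iterates the same way.
sample = io.StringIO(
    "; comment for foo\n"
    "; more about foo\n"
    "(defun foo ()\n"
    "  nil)\n"
    "\n"
    "(defun bar ()\n"
    "  nil)\n"
)

result = list(starts(sample))
print(result)  # [0, 5]
```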

BartlebyScrivener · Apr 14, 2006

This is very helpful.

I wasn't the OP. I'm just learning, but I'm on the verge of making my
own file searching scripts. This will be a huge help. Thanks for
posting, and especially thanks for the comments in the code. Big help!

rick

Kelie · Apr 15, 2006

Thanks to both of you, Kent and Scott.
