String Manipulation Help!

Dave

OK, I'm stumped.

I'm trying to find newline characters (\n, specifically) that are NOT
in comments.

So, for example (where "<-" = a newline character):
==========================================
1: <-
2: /*<-
3: ----------------------<-
4: comment<-
5: ----------------------<-
6: */<-
7: <-
8: CODE CODE CODE<-
9: <-
==========================================

I want to return the newline characters at lines 1, 6, 7, 8, and 9 but
NOT the others.

I've tried using regular expressions but I dislike them because they
aren't immediately readable (and also I don't bloody understand the
things). I'm not opposed to using them, though, if they provide a
solution to this problem!

Thanks in advance for any suggestions anyone can provide.

- Dave
 
Kirk McDonald

Dave said:
OK, I'm stumped.

I'm trying to find newline characters (\n, specifically) that are NOT
in comments.

So, for example (where "<-" = a newline character):
==========================================
1: <-
2: /*<-
3: ----------------------<-
4: comment<-
5: ----------------------<-
6: */<-
7: <-
8: CODE CODE CODE<-
9: <-
==========================================

[snip]

Well, I'm sure there is some regex that'll do it, but here's a stupid
iterative solution:

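def newlines(s):
    nl = []
    inComment = False
    for i in range(len(s)):
        # Track whether position i falls inside a /* ... */ comment.
        if s[i:i+2] == '/*':
            inComment = True
        if s[i:i+2] == '*/':
            inComment = False
        if inComment:
            continue
        # Record the offset of each newline found outside a comment.
        if s[i] == '\n':
            nl.append(i)
    return tuple(nl)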

Your example returns:
(0, 64, 65, 80, 81)

This probably isn't as fast as a regex, but at least it works.
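
For comparison, here is a rough sketch of what a regex version might look like. It is not from the original posts; the names TOKEN and newlines_outside_comments are made up for illustration. The idea is to match either a whole comment or a single newline, and keep only the newline matches. It assumes comments are well-formed and does not handle "/*" inside string literals. Unlike the loop above, an unterminated comment is not treated as a comment.

```python
import re

# Match either a complete /* ... */ comment or a single newline.
# The comment alternative is listed first, so a comment is consumed whole
# and the newlines inside it never show up as separate matches.
TOKEN = re.compile(r"/\*.*?\*/|\n", re.DOTALL)

def newlines_outside_comments(s):
    """Return the offsets of newlines that are not inside /* ... */ comments."""
    return tuple(m.start() for m in TOKEN.finditer(s) if m.group() == "\n")

# A shortened version of the example from the question (5 dashes per rule).
sample = "\n/*\n-----\ncomment\n-----\n*/\n\nCODE CODE CODE\n\n"
print(newlines_outside_comments(sample))  # (0, 26, 27, 42, 43)
```

The offsets differ from the ones reported above only because this sample uses shorter dashed lines.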

-Kirk McDonald
 
Paul McGuire

Dave said:
OK, I'm stumped.

I'm trying to find newline characters (\n, specifically) that are NOT
in comments.

So, for example (where "<-" = a newline character):
==========================================
1: <-
2: /*<-
3: ----------------------<-
4: comment<-
5: ----------------------<-
6: */<-
7: <-
8: CODE CODE CODE<-
9: <-
==========================================

I want to return the newline characters at lines 1, 6, 7, 8, and 9 but
NOT the others.

Dave -

Pyparsing has built-in support for detecting line breaks and comments, and
the syntax is pretty simple, I think. Here's a pyparsing program that gives
your desired results:

===============================
from pyparsing import lineEnd, cStyleComment, lineno

testsource = """
/*
----------------------
comment
----------------------
*/

CODE CODE CODE

"""

# define the expression you want to search for
eol = lineEnd

# specify that you don't want to match within C-style comments
eol.ignore(cStyleComment.leaveWhitespace())

# loop through all the occurrences returned by scanString
# and print the line number of that location within the original string
for toks, startloc, endloc in eol.scanString(testsource):
    print(lineno(startloc, testsource))
===============================

The expression you are searching for is pretty basic: just a plain
end-of-line, which is pyparsing's built-in expression lineEnd. The curve you
are throwing is that you *don't* want eols inside of C-style comments.
Pyparsing allows you to designate an "ignore" expression to skip undesirable
content, and fortunately, ignoring comments happens so often during parsing
that pyparsing includes common comment expressions for C, C++, Java, Python,
and HTML. Next, pyparsing's version of re.search is scanString. scanString
returns a generator that gives the matching tokens, start location, and end
location of every occurrence of the given parse expression (in your case,
eol). Finally, in the body of our for loop, we use pyparsing's lineno
function to give us the line number of a string location within the original
string.
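
To make the last step concrete, here is a plain-Python sketch of turning a string offset into a 1-based line number. It is only an illustration of the idea, not pyparsing's own code; line_number is a hypothetical helper, and the sample string and offsets are hard-coded assumptions (a shortened version of the example with 5 dashes per rule).

```python
def line_number(loc, text):
    """Return the 1-based line number of character offset loc within text."""
    # Every newline before loc pushes the location one line further down.
    return text.count("\n", 0, loc) + 1

sample = "\n/*\n-----\ncomment\n-----\n*/\n\nCODE CODE CODE\n\n"

# Offsets of the newlines in sample that fall outside the comment.
offsets = (0, 26, 27, 42, 43)
print([line_number(loc, sample) for loc in offsets])  # [1, 6, 7, 8, 9]
```

Those are the line numbers asked for in the original question: 1, 6, 7, 8, and 9.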

About the only real wart on all this is that pyparsing implicitly skips over
leading whitespace, even when looking for expressions to be ignored. In
order not to lose eols that are just before a comment (like your line 1), we
have to modify cStyleComment to leave leading whitespace.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
 
Richard Schneiderman

I really enjoyed your article. I will try to understand this.
Will you be doing more of this in the future with more complicated examples?
 
Paul McGuire

Richard Schneiderman said:
I really enjoyed your article. I will try to understand this.
Will you be doing more of this in the future with more complicated examples?

I'm giving two presentations at PyCon at the end of February, so I think
those will be published after the conference.

Otherwise, I'll be answering pyparsing questions as they come up on c.l.py
or on the pyparsing forums on SourceForge. I'd like to compile these into
more of a book form at some point, but my work schedule is pretty crazy
right now.

Glad you liked the article,

-- Paul
 
