String Manipulation Help!

Dave · Jan 28, 2006

OK, I'm stumped.

I'm trying to find newline characters (\n, specifically) that are NOT
in comments.

So, for example (where "<-" = a newline character):
==========================================
1: <-
2: /*<-
3: ----------------------<-
4: comment<-
5: ----------------------<-
6: */<-
7: <-
8: CODE CODE CODE<-
9: <-
==========================================

I want to return the newline characters at lines 1, 6, 7, 8, and 9 but
NOT the others.

I've tried using regular expressions but I dislike them because they
aren't immediately readable (and also I don't bloody understand the
things). I'm not opposed to using them, though, if they provide a
solution to this problem!

Thanks in advance for any suggestions anyone can provide.

- Dave

Kirk McDonald · Jan 28, 2006

Dave said:
OK, I'm stumped.

I'm trying to find newline characters (\n, specifically) that are NOT
in comments.

So, for example (where "<-" = a newline character):
==========================================
1: <-
2: /*<-
3: ----------------------<-
4: comment<-
5: ----------------------<-
6: */<-
7: <-
8: CODE CODE CODE<-
9: <-
==========================================

[snip]

Well, I'm sure there is some regex that'll do it, but here's a stupid
iterative solution:

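def newlines(s):
    nl = []
    inComment = False
    for i in range(len(s)):
        # track whether position i falls inside a /* ... */ comment
        if s[i:i+2] == '/*':
            inComment = True
        if s[i:i+2] == '*/':
            inComment = False
        if inComment: continue
        # compare the single character at i, not the whole string
        if s[i] == '\n':
            nl.append(i)
    return tuple(nl)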

Your example returns:
(0, 64, 65, 80, 81)

This probably isn't as fast as a regex, but at least it works.

-Kirk McDonald
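
For comparison, here is a rough sketch of the regex route mentioned above, using only the standard re module. The names TOKEN and newline_offsets are made up for illustration. It assumes comments are plain /* ... */ blocks that do not nest, and it makes no attempt to handle comment markers inside string literals. Unlike the loop above, an unterminated /* is not treated as a comment.

```python
import re

# Match either a whole C-style comment or a single newline. The comment
# branch is tried first, so a newline inside a comment is consumed as part
# of the comment and never matched on its own.
TOKEN = re.compile(r"(?P<comment>/\*.*?\*/)|(?P<newline>\n)", re.DOTALL)


def newline_offsets(source):
    """Return offsets of the newlines that lie outside /* ... */ comments."""
    return tuple(
        m.start() for m in TOKEN.finditer(source) if m.lastgroup == "newline"
    )


sample = "\n/*\nabc\n*/\n\nCODE\n\n"
print(newline_offsets(sample))  # (0, 10, 11, 16, 17)
```

The non-greedy .*? together with re.DOTALL lets one comment span several lines while stopping at the first closing */.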

Dave · Jan 28, 2006

This is great, thanks!

Paul McGuire · Jan 28, 2006

Dave said:
OK, I'm stumped.

I'm trying to find newline characters (\n, specifically) that are NOT
in comments.

So, for example (where "<-" = a newline character):
==========================================
1: <-
2: /*<-
3: ----------------------<-
4: comment<-
5: ----------------------<-
6: */<-
7: <-
8: CODE CODE CODE<-
9: <-
==========================================

I want to return the newline characters at lines 1, 6, 7, 8, and 9 but
NOT the others.

Dave -

Pyparsing has built-in support for detecting line breaks and comments, and
the syntax is pretty simple, I think. Here's a pyparsing program that gives
your desired results:

===============================
from pyparsing import lineEnd, cStyleComment, lineno

testsource = """
/*
----------------------
comment
----------------------
*/

CODE CODE CODE

"""

# define the expression you want to search for
eol = lineEnd

# specify that you don't want to match within C-style comments
eol.ignore(cStyleComment.leaveWhitespace())

# loop through all the occurrences returned by scanString
# and print the line number of that location within the original string
for toks, startloc, endloc in eol.scanString(testsource):
    print(lineno(startloc, testsource))
===============================

The expression you are searching for is pretty basic, just a plain
end-of-line, or pyparsing's built-in expression, lineEnd. The curve you are
throwing is that you *don't* want eols inside of C-style comments.
Pyparsing allows you to designate an "ignore" expression to skip undesirable
content, and fortunately, ignoring comments happens so often during parsing
that pyparsing includes common comment expressions for C, C++, Java, Python,
and HTML. Next, pyparsing's version of re.search is scanString. scanString
returns a generator that gives the matching tokens, start location, and end
location of every occurrence of the given parse expression, in your case,
eol. Finally, in the body of our for loop, we use pyparsing's lineno
function to give us the line number of a string location within the original
string.
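
The offset-to-line-number step can also be sketched without pyparsing. This is only an approximation of the idea: line_number is a hypothetical helper, and it assumes a line number is simply one more than the count of newlines before the given offset, with no tab expansion or other preprocessing.

```python
def line_number(offset, source):
    """Return the 1-based number of the line that contains source[offset]."""
    # Every newline before the offset starts a new line, so count them.
    return source.count("\n", 0, offset) + 1


sample = "\n/*\nabc\n*/\n\nCODE\n\n"
# Offsets of the newlines outside the comment in this sample.
for offset in (0, 10, 11, 16, 17):
    print(offset, line_number(offset, sample))
```

A newline character is reported as belonging to the line it terminates, so the newline at offset 0 is on line 1.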

About the only real wart on all this is that pyparsing implicitly skips over
leading whitespace, even when looking for expressions to be ignored. In
order not to lose eols that are just before a comment (like your line 1), we
have to modify cStyleComment to leave leading whitespace.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Richard Schneiderman · Jan 28, 2006

I really enjoyed your article. I will try to understand this.
Will you be doing more of this in the future with more complicated examples?

Paul McGuire · Jan 28, 2006

Richard Schneiderman said:
I really enjoyed your article. I will try to understand this.
Will you be doing more of this in the future with more complicated examples?

I'm giving two presentations at PyCon at the end of February, so I think
those will be published after the conference.

Otherwise, I'll be answering pyparsing questions as they come up on c.l.py
or on the pyparsing forums on SourceForge. I'd like to compile these into
more of a book form at some point, but my work schedule is pretty crazy
right now.

Glad you liked the article,

-- Paul
