lorinh
Hi Folks,
I'm trying to strip C/C++ style comments (/* ... */ or // ) from
source code using Python regexps.
If I don't have to worry about comments embedded in strings, it seems
pretty straightforward (this is what I'm using now):
import re

cpp_pat = re.compile(r"""
    /\* .*? \*/ |    # C comments
    // [^\n\r]*      # C++ comments
    """, re.S | re.X)
with open('myprog.cpp') as f:
    s = f.read()
s = cpp_pat.sub(' ', s)    # sub() returns the new string
However, the sticking point is dealing with tokens like /* embedded
within a string:
const char *mystr = "This is /*trouble*/";
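To make the failure concrete, here is a minimal sketch that runs the same two-alternative pattern over that one line. The names naive_pat, line and mangled are only illustrative.

```python
import re

# Same pattern as above: it knows about comments but not about string literals.
naive_pat = re.compile(r"""
    /\* .*? \*/ |    # C comments
    // [^\n\r]*      # C++ comments
    """, re.S | re.X)

line = 'const char *mystr = "This is /*trouble*/";'

# The /*...*/ inside the string literal is treated as a comment and replaced.
mangled = naive_pat.sub(' ', line)
print(mangled)
```

The text between the quotes is altered, which changes the meaning of the program rather than just removing comments.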
I've inherited a working Perl script, which I'd like to reimplement in
Python so that I don't have to spawn a new Perl process in my Python
program each time I want to strip comments from a file. The Perl script
looks like this:
#!/usr/bin/perl -w
$/ = undef;                        # no line delimiter
$_ = <>;                           # read entire file
s! ((['"]) (?: \\. | .)*? \2) |    # skip quoted strings
   /\* .*? \*/ |                   # delete C comments
   // [^\n\r]*                     # delete C++ comments
 ! $1 || ' '                       # change comments to a single space
 !xseg;                            # ignore white space, treat as single line
                                   # evaluate result, repeat globally
print;
The Perl regexp above uses some sort of conditional to deal with this,
by replacing a quoted string with itself if the initial match is a
quoted string. Is there some equivalent feature in Python regexps?
Lorin
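For reference, one way this might be approached in Python: re.sub() accepts a function as the replacement, and that function can return the matched text unchanged when the string-literal group took part in the match. The sketch below is a direct translation of the Perl pattern under that assumption; the names strip_comments and _keep_strings are made up for illustration, and it has the same limits as the Perl version (for example, it knows nothing about preprocessor directives or backslash-newline continuations inside // comments).

```python
import re

# Same three alternatives as the Perl pattern; only the string branch is captured.
cpp_pat = re.compile(r"""
    ( (['"]) (?: \\. | . )*? \2 )    # group 1: quoted string or char literal
  | /\* .*? \*/                      # C comment
  | // [^\n\r]*                      # C++ comment
    """, re.S | re.X)


def _keep_strings(match):
    # group(1) is None when one of the comment branches matched,
    # which mirrors the Perl replacement expression  $1 || ' '.
    return match.group(1) or ' '


def strip_comments(source):
    """Return source with comments replaced by a single space each."""
    return cpp_pat.sub(_keep_strings, source)


if __name__ == '__main__':
    code = 'const char *mystr = "This is /*trouble*/"; /* gone */ int x; // tail\n'
    print(strip_comments(code))
```

Because the regex engine scans left to right and the string branch consumes the whole literal, comment markers inside quotes are never seen by the comment branches.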